[PATCHBOMB] xfsprogs: all my changes for 6.14

* [PATCHBOMB] xfsprogs: all my changes for 6.14
@ 2025-02-06 22:21 Darrick J. Wong
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:21 UTC
  To: Andrey Albershteyn; +Cc: linux-xfs, Christoph Hellwig

Hi everyone,

Now that I /think/ we're done with xfsprogs 6.13, here's a patchbomb
with all my code submissions for xfsprogs 6.14.  The first patchset
fixes some performance problems that Christoph observed with metadir
enabled and a very large number of rtgroups.

The second patchset is the usual libxfs resync; I'll only post it once
for archiving on the list.

The third and fourth patchsets implement all the userspace code needed
to handle reverse mapping and reflinks on the realtime volume.

The fifth patchset implements an "rdump" command for xfs_db.  This tool
has two intended users: support and recovery people who have a badly
damaged, unmountable filesystem and want to extract as much of a
subdirectory as they can; and people who realize that they have an old
XFSv4 filesystem containing files they want, but kernel support for that
format has been removed and finding/running an old kernel isn't possible
or as easy as running a program.

Nearly everything in here is unreviewed, but here's the list in case you
were wondering:

[PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration
  [PATCH 01/17] libxfs: unmap xmbuf pages to avoid disaster
  [PATCH 02/17] libxfs: mark xmbuf_{un,}map_page static
  [PATCH 03/17] man: document new XFS_BULK_IREQ_METADIR flag to
  [PATCH 04/17] libfrog: wrap handle construction code
  [PATCH 05/17] xfs_scrub: don't report data loss in unlinked inodes
  [PATCH 06/17] xfs_scrub: call bulkstat directly if we're only
  [PATCH 07/17] xfs_scrub: remove flags argument from
  [PATCH 08/17] xfs_scrub: selectively re-run bulkstat after re-running
  [PATCH 09/17] xfs_scrub: actually iterate all the bulkstat records
  [PATCH 10/17] xfs_scrub: don't double-scan inodes during phase 3
  [PATCH 11/17] xfs_scrub: don't (re)set the bulkstat request icount
  [PATCH 12/17] xfs_scrub: don't complain if bulkstat fails
  [PATCH 13/17] xfs_scrub: return early from bulkstat_for_inumbers if
  [PATCH 14/17] xfs_scrub: don't blow away new inodes in
  [PATCH 15/17] xfs_scrub: hoist the phase3 bulkstat single stepping
  [PATCH 16/17] xfs_scrub: ignore freed inodes when single-stepping
  [PATCH 17/17] xfs_scrub: try harder to fill the bulkstat array with
[PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14
[PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support
  [PATCH 01/27] libxfs: compute the rt rmap btree maxlevels during
  [PATCH 02/27] libxfs: add a realtime flag to the rmap update log redo
  [PATCH 03/27] libfrog: enable scrubbing of the realtime rmap
  [PATCH 04/27] man: document userspace API changes due to rt rmap
  [PATCH 05/27] xfs_db: compute average btree height
  [PATCH 06/27] xfs_db: don't abort when bmapping on a non-extents/bmbt
  [PATCH 07/27] xfs_db: display the realtime rmap btree contents
  [PATCH 08/27] xfs_db: support the realtime rmapbt
  [PATCH 09/27] xfs_db: copy the realtime rmap btree
  [PATCH 10/27] xfs_db: make fsmap query the realtime reverse mapping
  [PATCH 11/27] xfs_db: add an rgresv command
  [PATCH 12/27] xfs_spaceman: report health status of the realtime rmap
  [PATCH 13/27] xfs_repair: tidy up rmap_diffkeys
  [PATCH 14/27] xfs_repair: flag suspect long-format btree blocks
  [PATCH 15/27] xfs_repair: use realtime rmap btree data to check block
  [PATCH 16/27] xfs_repair: create a new set of incore rmap information
  [PATCH 17/27] xfs_repair: refactor realtime inode check
  [PATCH 18/27] xfs_repair: find and mark the rtrmapbt inodes
  [PATCH 19/27] xfs_repair: check existing realtime rmapbt entries
  [PATCH 20/27] xfs_repair: always check realtime file mappings against
  [PATCH 21/27] xfs_repair: rebuild the realtime rmap btree
  [PATCH 22/27] xfs_repair: check for global free space concerns with
  [PATCH 23/27] xfs_repair: rebuild the bmap btree for realtime files
  [PATCH 24/27] xfs_repair: reserve per-AG space while rebuilding rt
  [PATCH 25/27] xfs_logprint: report realtime RUIs
  [PATCH 26/27] mkfs: add some rtgroup inode helpers
  [PATCH 27/27] mkfs: create the realtime rmap inode
[PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device
  [PATCH 01/22] libxfs: compute the rt refcount btree maxlevels during
  [PATCH 02/22] libxfs: add a realtime flag to the refcount update log
  [PATCH 03/22] libxfs: apply rt extent alignment constraints to CoW
  [PATCH 04/22] libfrog: enable scrubbing of the realtime refcount data
  [PATCH 05/22] man: document userspace API changes due to rt reflink
  [PATCH 06/22] xfs_db: display the realtime refcount btree contents
  [PATCH 07/22] xfs_db: support the realtime refcountbt
  [PATCH 08/22] xfs_db: copy the realtime refcount btree
  [PATCH 09/22] xfs_db: add rtrefcount reservations to the rgresv
  [PATCH 10/22] xfs_spaceman: report health of the realtime refcount
  [PATCH 11/22] xfs_repair: allow CoW staging extents in the realtime
  [PATCH 12/22] xfs_repair: use realtime refcount btree data to check
  [PATCH 13/22] xfs_repair: find and mark the rtrefcountbt inode
  [PATCH 14/22] xfs_repair: compute refcount data for the realtime
  [PATCH 15/22] xfs_repair: check existing realtime refcountbt entries
  [PATCH 16/22] xfs_repair: reject unwritten shared extents
  [PATCH 17/22] xfs_repair: rebuild the realtime refcount btree
  [PATCH 18/22] xfs_repair: allow realtime files to have the reflink
  [PATCH 19/22] xfs_repair: validate CoW extent size hint on rtinherit
  [PATCH 20/22] xfs_logprint: report realtime CUIs
  [PATCH 21/22] mkfs: validate CoW extent size hint when rtinherit is
  [PATCH 22/22] mkfs: enable reflink on the realtime device
[PATCHSET 5/5] xfsprogs: dump fs directory trees
  [PATCH 1/4] xfs_db: pass const pointers when we're not modifying them
  [PATCH 2/4] xfs_db: use an empty transaction to try to prevent
  [PATCH 3/4] xfs_db: make listdir more generally useful
  [PATCH 4/4] xfs_db: add command to copy directory trees out of

--D

* [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration
  2025-02-06 22:21 [PATCHBOMB] xfsprogs: all my changes for 6.14 Darrick J. Wong
@ 2025-02-06 22:29 ` Darrick J. Wong
  2025-02-06 22:30   ` [PATCH 01/17] libxfs: unmap xmbuf pages to avoid disaster Darrick J. Wong
                     ` (16 more replies)
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                   ` (3 subsequent siblings)
  4 siblings, 17 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:29 UTC
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

Hi all,

Christoph and I were investigating some performance problems in
xfs_scrub on filesystems that have a lot of rtgroups, and we noticed
several problems and inefficiencies in the existing inode iteration
code.

The first observation is that two of the three callers of
scrub_scan_all_inodes (phases 5 and 6) just want to walk all the user
files in the filesystem.  They don't care about metadir directories, and
they don't care about matching inumbers data to bulkstat data for the
purpose of finding broken files.  The third caller (phase 3) does, so it
makes more sense to create a much simpler iterator for phases 5 and 6
that only calls bulkstat.

But then I started noticing other problems in the phase 3 inode
iteration code.  If the per-inumbers bulkstat iterator races with other
threads that are creating or deleting files, we can walk off the end of
the bulkstat array, miss newly allocated files, miss previously
allocated inodes if there are newly allocated ones, pointlessly try to
scan deleted files, and redundantly scan files from another inobt
record.

These races rarely happen, but they all need fixing.
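
To make the failure modes above concrete, here is a minimal sketch of
matching one inumbers chunk against a bulkstat array without running
past the end of the array or consuming records from another chunk.
Everything in it (the toy_* structs, toy_match_records, the 64-bit
allocation mask layout) is a simplified, hypothetical stand-in for
illustration; it is not the code in scrub/inodes.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for inumbers and bulkstat records. */
struct toy_inumbers {
	uint64_t	startino;	/* first inode number in this chunk */
	uint64_t	allocmask;	/* bit i set: startino + i was allocated */
};

struct toy_bulkstat {
	uint64_t	ino;
};

/*
 * Count how many inodes flagged in the inumbers mask also appear in bs[],
 * which is assumed to be sorted by inode number.  Inodes that were flagged
 * but are absent from bs[] (e.g. freed in the meantime) are counted in
 * *nr_missing.  The index never moves past nr_bs, and records that are not
 * flagged in this chunk's mask are never counted.
 */
static size_t
toy_match_records(const struct toy_inumbers *inum,
		  const struct toy_bulkstat *bs, size_t nr_bs,
		  size_t *nr_missing)
{
	size_t		i = 0, matched = 0;
	unsigned int	bit;

	assert(inum != NULL && nr_missing != NULL);
	*nr_missing = 0;

	for (bit = 0; bit < 64; bit++) {
		uint64_t ino = inum->startino + bit;

		if (!(inum->allocmask & (1ULL << bit)))
			continue;

		/* Skip records that sort before the inode we want. */
		while (i < nr_bs && bs[i].ino < ino)
			i++;

		if (i < nr_bs && bs[i].ino == ino) {
			matched++;
			i++;
		} else {
			(*nr_missing)++;
		}
	}
	return matched;
}
```

The point of the sketch is only that every array access is guarded by
i < nr_bs and every record is compared by inode number, instead of
assuming that the bulkstat array lines up one-to-one with the mask.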

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-inode-iteration-fixes
---
Commits in this patchset:
 * libxfs: unmap xmbuf pages to avoid disaster
 * libxfs: mark xmbuf_{un,}map_page static
 * man: document new XFS_BULK_IREQ_METADIR flag to bulkstat
 * libfrog: wrap handle construction code
 * xfs_scrub: don't report data loss in unlinked inodes twice
 * xfs_scrub: call bulkstat directly if we're only scanning user files
 * xfs_scrub: remove flags argument from scrub_scan_all_inodes
 * xfs_scrub: selectively re-run bulkstat after re-running inumbers
 * xfs_scrub: actually iterate all the bulkstat records
 * xfs_scrub: don't double-scan inodes during phase 3
 * xfs_scrub: don't (re)set the bulkstat request icount incorrectly
 * xfs_scrub: don't complain if bulkstat fails
 * xfs_scrub: return early from bulkstat_for_inumbers if no bulkstat data
 * xfs_scrub: don't blow away new inodes in bulkstat_single_step
 * xfs_scrub: hoist the phase3 bulkstat single stepping code
 * xfs_scrub: ignore freed inodes when single-stepping during phase 3
 * xfs_scrub: try harder to fill the bulkstat array with bulkstat()
---
 include/cache.h               |    6 
 io/parent.c                   |    9 -
 libfrog/Makefile              |    1 
 libfrog/bitmask.h             |    6 
 libfrog/handle_priv.h         |   55 ++++
 libxfs/buf_mem.c              |  159 +++++++++---
 libxfs/buf_mem.h              |    3 
 libxfs/cache.c                |   11 +
 man/man2/ioctl_xfs_bulkstat.2 |    8 +
 scrub/common.c                |    9 -
 scrub/inodes.c                |  552 ++++++++++++++++++++++++++++++++++++-----
 scrub/inodes.h                |   12 +
 scrub/phase3.c                |    7 -
 scrub/phase5.c                |   14 -
 scrub/phase6.c                |   14 +
 spaceman/health.c             |    9 -
 16 files changed, 726 insertions(+), 149 deletions(-)
 create mode 100644 libfrog/handle_priv.h

* [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14
  2025-02-06 22:21 [PATCHBOMB] xfsprogs: all my changes for 6.14 Darrick J. Wong
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
@ 2025-02-06 22:29 ` Darrick J. Wong
  2025-02-06 22:35   ` [PATCH 01/56] xfs: tidy up xfs_iroot_realloc Darrick J. Wong
                     ` (55 more replies)
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                   ` (2 subsequent siblings)
  4 siblings, 56 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:29 UTC
  To: djwong, aalbersh
  Cc: david, alexjlzheng, chandanbabu, mtodorovac69, cem, dchinner,
	linux-kernel, cmaiolino, linux-xfs, hch, hch, linux-xfs

Hi all,

Port kernel libxfs code to userspace.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=libxfs-sync-6.14
---
Commits in this patchset:
 * xfs: tidy up xfs_iroot_realloc
 * xfs: refactor the inode fork memory allocation functions
 * xfs: make xfs_iroot_realloc take the new numrecs instead of deltas
 * xfs: make xfs_iroot_realloc a bmap btree function
 * xfs: tidy up xfs_bmap_broot_realloc a bit
 * xfs: hoist the node iroot update code out of xfs_btree_new_iroot
 * xfs: hoist the node iroot update code out of xfs_btree_kill_iroot
 * xfs: add some rtgroup inode helpers
 * xfs: prepare to reuse the dquot pointer space in struct xfs_inode
 * xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
 * xfs: support storing records in the inode core root
 * xfs: allow inode-based btrees to reserve space in the data device
 * xfs: introduce realtime rmap btree ondisk definitions
 * xfs: realtime rmap btree transaction reservations
 * xfs: add realtime rmap btree operations
 * xfs: prepare rmap functions to deal with rtrmapbt
 * xfs: add a realtime flag to the rmap update log redo items
 * xfs: pretty print metadata file types in error messages
 * xfs: support file data forks containing metadata btrees
 * xfs: add realtime reverse map inode to metadata directory
 * xfs: add metadata reservations for realtime rmap btrees
 * xfs: wire up a new metafile type for the realtime rmap
 * xfs: wire up rmap map and unmap to the realtime rmapbt
 * xfs: create routine to allocate and initialize a realtime rmap btree inode
 * xfs: report realtime rmap btree corruption errors to the health system
 * xfs: scrub the realtime rmapbt
 * xfs: scrub the metadir path of rt rmap btree files
 * xfs: online repair of realtime bitmaps for a realtime group
 * xfs: online repair of the realtime rmap btree
 * xfs: create a shadow rmap btree during realtime rmap repair
 * xfs: namespace the maximum length/refcount symbols
 * xfs: introduce realtime refcount btree ondisk definitions
 * xfs: realtime refcount btree transaction reservations
 * xfs: add realtime refcount btree operations
 * xfs: prepare refcount functions to deal with rtrefcountbt
 * xfs: add a realtime flag to the refcount update log redo items
 * xfs: add realtime refcount btree inode to metadata directory
 * xfs: add metadata reservations for realtime refcount btree
 * xfs: wire up a new metafile type for the realtime refcount
 * xfs: wire up realtime refcount btree cursors
 * xfs: create routine to allocate and initialize a realtime refcount btree inode
 * xfs: update rmap to allow cow staging extents in the rt rmap
 * xfs: compute rtrmap btree max levels when reflink enabled
 * xfs: allow inodes to have the realtime and reflink flags
 * xfs: recover CoW leftovers in the realtime volume
 * xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files
 * xfs: apply rt extent alignment constraints to CoW extsize hint
 * xfs: enable extent size hints for CoW operations
 * xfs: report realtime refcount btree corruption errors to the health system
 * xfs: scrub the realtime refcount btree
 * xfs: scrub the metadir path of rt refcount btree files
 * xfs: fix the entry condition of exact EOF block allocation optimization
 * xfs: mark xfs_dir_isempty static
 * xfs: remove XFS_ILOG_NONCORE
 * xfs: constify feature checks
 * xfs/libxfs: replace kmalloc() and memcpy() with kmemdup()
---
 include/kmem.h                |    9 
 include/libxfs.h              |    2 
 include/xfs_inode.h           |    5 
 include/xfs_mount.h           |   27 +
 include/xfs_trace.h           |    7 
 io/inject.c                   |    1 
 libxfs/Makefile               |    4 
 libxfs/libxfs_priv.h          |   11 
 libxfs/xfs_ag_resv.c          |    3 
 libxfs/xfs_attr.c             |    4 
 libxfs/xfs_bmap.c             |   47 +-
 libxfs/xfs_bmap_btree.c       |  111 ++++
 libxfs/xfs_bmap_btree.h       |    3 
 libxfs/xfs_btree.c            |  410 +++++++++++++---
 libxfs/xfs_btree.h            |   28 +
 libxfs/xfs_btree_mem.c        |    1 
 libxfs/xfs_btree_staging.c    |   10 
 libxfs/xfs_defer.h            |    2 
 libxfs/xfs_dir2.c             |    9 
 libxfs/xfs_dir2.h             |    1 
 libxfs/xfs_errortag.h         |    4 
 libxfs/xfs_exchmaps.c         |    4 
 libxfs/xfs_format.h           |   51 ++
 libxfs/xfs_fs.h               |   10 
 libxfs/xfs_health.h           |    6 
 libxfs/xfs_inode_buf.c        |   65 ++-
 libxfs/xfs_inode_fork.c       |  201 +++-----
 libxfs/xfs_inode_fork.h       |    6 
 libxfs/xfs_log_format.h       |   16 -
 libxfs/xfs_metadir.c          |    3 
 libxfs/xfs_metafile.c         |  221 +++++++++
 libxfs/xfs_metafile.h         |   13 +
 libxfs/xfs_ondisk.h           |    4 
 libxfs/xfs_refcount.c         |  277 +++++++++--
 libxfs/xfs_refcount.h         |   23 +
 libxfs/xfs_rmap.c             |  178 ++++++-
 libxfs/xfs_rmap.h             |   12 
 libxfs/xfs_rtbitmap.c         |    2 
 libxfs/xfs_rtbitmap.h         |    9 
 libxfs/xfs_rtgroup.c          |   74 ++-
 libxfs/xfs_rtgroup.h          |   58 ++
 libxfs/xfs_rtrefcount_btree.c |  755 ++++++++++++++++++++++++++++++
 libxfs/xfs_rtrefcount_btree.h |  189 +++++++
 libxfs/xfs_rtrmap_btree.c     | 1034 +++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtrmap_btree.h     |  210 ++++++++
 libxfs/xfs_sb.c               |   14 +
 libxfs/xfs_shared.h           |   21 +
 libxfs/xfs_trans_resv.c       |   37 +
 libxfs/xfs_trans_space.h      |   13 +
 libxfs/xfs_types.h            |    7 
 repair/rmap.c                 |    2 
 repair/scan.c                 |    2 
 52 files changed, 3822 insertions(+), 394 deletions(-)
 create mode 100644 libxfs/xfs_rtrefcount_btree.c
 create mode 100644 libxfs/xfs_rtrefcount_btree.h
 create mode 100644 libxfs/xfs_rtrmap_btree.c
 create mode 100644 libxfs/xfs_rtrmap_btree.h


* [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support
  2025-02-06 22:21 [PATCHBOMB] xfsprogs: all my changes for 6.14 Darrick J. Wong
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
@ 2025-02-06 22:30 ` Darrick J. Wong
  2025-02-06 22:49   ` [PATCH 01/27] libxfs: compute the rt rmap btree maxlevels during initialization Darrick J. Wong
                     ` (26 more replies)
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
  2025-02-06 22:30 ` [PATCHSET 5/5] xfsprogs: dump fs directory trees Darrick J. Wong
  4 siblings, 27 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:30 UTC
  To: djwong, aalbersh; +Cc: hch, linux-xfs

Hi all,

This is the latest revision of a patchset that adds to XFS kernel
support for reverse mapping for the realtime device.  This time around
I've fixed some of the bitrot that I've noticed over the past few
months, and most notably have converted rtrmapbt to use the metadata
inode directory feature instead of burning more space in the superblock.

At the beginning of the set are patches to implement storing B+tree
leaves in an inode root, since the realtime rmapbt is rooted in an
inode, unlike the regular rmapbt which is rooted in an AG block.
Prior to this, the only btree that could be rooted in the inode fork
was the block mapping btree; if all the extent records fit in the
inode, format would be switched from 'btree' to 'extents'.
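
As a rough illustration of the difference described above, the sketch
below picks a storage layout for an inode fork.  The names
(toy_fork_format, toy_pick_format, leaf_root_ok) are hypothetical, and
the size check deliberately ignores btree block headers; it only models
the idea that the bmap fork falls back to 'extents' format when the
records fit, whereas an inode-rooted btree that can hold leaf records
keeps the btree format and stores the records in the root itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for how an inode fork stores its records. */
enum toy_fork_format {
	TOY_FMT_EXTENTS,	/* flat record array, no btree at all */
	TOY_FMT_BTREE_LEAF,	/* btree root in the inode holds the records */
	TOY_FMT_BTREE_NODE,	/* btree root in the inode holds keys/pointers */
};

/*
 * Decide the layout for nr_recs records of rec_size bytes in a fork of
 * fork_bytes bytes.  leaf_root_ok says whether this btree type may keep
 * leaf records directly in the inode root.
 */
static enum toy_fork_format
toy_pick_format(size_t nr_recs, size_t rec_size, size_t fork_bytes,
		bool leaf_root_ok)
{
	bool	fits;

	assert(rec_size > 0);

	/* Divide rather than multiply so a huge nr_recs cannot overflow. */
	fits = nr_recs <= fork_bytes / rec_size;

	if (!fits)
		return TOY_FMT_BTREE_NODE;
	return leaf_root_ok ? TOY_FMT_BTREE_LEAF : TOY_FMT_EXTENTS;
}
```

With leaf_root_ok set to false this mimics the block mapping btree
behaviour; with it set to true it mimics a tree that never leaves the
btree format, no matter how few records it has.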

The next few patches enhance the reverse mapping routines to handle
the parts that are specific to rtgroups -- adding the new btree type,
adding a new log intent item type, and wiring up the metadata directory
tree entries.

Finally, implement GETFSMAP with the rtrmapbt, scrub functionality for
the rtrmapbt and rtbitmap, and online fsck functionality.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-rmap

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-rmap

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-rmap

xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=realtime-rmap
---
Commits in this patchset:
 * libxfs: compute the rt rmap btree maxlevels during initialization
 * libxfs: add a realtime flag to the rmap update log redo items
 * libfrog: enable scrubbing of the realtime rmap
 * man: document userspace API changes due to rt rmap
 * xfs_db: compute average btree height
 * xfs_db: don't abort when bmapping on a non-extents/bmbt fork
 * xfs_db: display the realtime rmap btree contents
 * xfs_db: support the realtime rmapbt
 * xfs_db: copy the realtime rmap btree
 * xfs_db: make fsmap query the realtime reverse mapping tree
 * xfs_db: add an rgresv command
 * xfs_spaceman: report health status of the realtime rmap btree
 * xfs_repair: tidy up rmap_diffkeys
 * xfs_repair: flag suspect long-format btree blocks
 * xfs_repair: use realtime rmap btree data to check block types
 * xfs_repair: create a new set of incore rmap information for rt groups
 * xfs_repair: refactor realtime inode check
 * xfs_repair: find and mark the rtrmapbt inodes
 * xfs_repair: check existing realtime rmapbt entries against observed rmaps
 * xfs_repair: always check realtime file mappings against incore info
 * xfs_repair: rebuild the realtime rmap btree
 * xfs_repair: check for global free space concerns with default btree slack levels
 * xfs_repair: rebuild the bmap btree for realtime files
 * xfs_repair: reserve per-AG space while rebuilding rt metadata
 * xfs_logprint: report realtime RUIs
 * mkfs: add some rtgroup inode helpers
 * mkfs: create the realtime rmap inode
---
 db/bmap.c                             |   17 +
 db/bmroot.c                           |  149 +++++++++++
 db/bmroot.h                           |    2 
 db/btblock.c                          |  103 ++++++++
 db/btblock.h                          |    5 
 db/btdump.c                           |   63 +++++
 db/btheight.c                         |   36 +++
 db/field.c                            |   11 +
 db/field.h                            |    5 
 db/fsmap.c                            |  149 +++++++++++
 db/info.c                             |  119 +++++++++
 db/inode.c                            |   24 ++
 db/metadump.c                         |  120 +++++++++
 db/type.c                             |    5 
 db/type.h                             |    1 
 include/libxfs.h                      |    1 
 libfrog/scrub.c                       |   10 +
 libxfs/defer_item.c                   |   35 ++-
 libxfs/init.c                         |   19 +
 libxfs/libxfs_api_defs.h              |   24 ++
 logprint/log_misc.c                   |    2 
 logprint/log_print_all.c              |    8 +
 logprint/log_redo.c                   |   24 +-
 man/man2/ioctl_xfs_rtgroup_geometry.2 |    3 
 man/man2/ioctl_xfs_scrub_metadata.2   |   12 +
 man/man8/xfs_db.8                     |   74 +++++-
 mkfs/proto.c                          |   31 ++
 mkfs/xfs_mkfs.c                       |   87 ++++++-
 repair/Makefile                       |    1 
 repair/agbtree.c                      |    5 
 repair/bmap_repair.c                  |  109 ++++++++
 repair/bulkload.c                     |   41 +++
 repair/bulkload.h                     |    2 
 repair/dino_chunks.c                  |   13 +
 repair/dinode.c                       |  441 ++++++++++++++++++++++++++++-----
 repair/dir2.c                         |    7 +
 repair/globals.c                      |    6 
 repair/globals.h                      |    2 
 repair/incore.h                       |    1 
 repair/phase4.c                       |   14 +
 repair/phase5.c                       |  114 ++++++++-
 repair/phase6.c                       |   72 +++++
 repair/rmap.c                         |  403 ++++++++++++++++++++++++------
 repair/rmap.h                         |   15 +
 repair/rt.h                           |    4 
 repair/rtrmap_repair.c                |  265 ++++++++++++++++++++
 repair/scan.c                         |  411 ++++++++++++++++++++++++++++++-
 repair/scan.h                         |   37 +++
 repair/xfs_repair.c                   |    8 -
 scrub/repair.c                        |    1 
 spaceman/health.c                     |   10 +
 51 files changed, 2905 insertions(+), 216 deletions(-)
 create mode 100644 repair/rtrmap_repair.c


* [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device
  2025-02-06 22:21 [PATCHBOMB] xfsprogs: all my changes for 6.14 Darrick J. Wong
                   ` (2 preceding siblings ...)
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
@ 2025-02-06 22:30 ` Darrick J. Wong
  2025-02-06 22:57   ` [PATCH 01/22] libxfs: compute the rt refcount btree maxlevels during initialization Darrick J. Wong
                     ` (21 more replies)
  2025-02-06 22:30 ` [PATCHSET 5/5] xfsprogs: dump fs directory trees Darrick J. Wong
  4 siblings, 22 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:30 UTC
  To: djwong, aalbersh; +Cc: hch, linux-xfs

Hi all,

This patchset enables use of the file data block sharing feature (i.e.
reflink) on the realtime device.  It follows the same basic sequence as
the realtime rmap series -- first a few cleanups; then introduction of
the new btree format and inode fork format.  Next comes enabling CoW and
remapping for the rt device; new scrub, repair, and health reporting
code; and at the end we implement some code to lengthen write requests
so that rt extents are always CoWed fully.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-reflink

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-reflink

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-reflink

xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=realtime-reflink
---
Commits in this patchset:
 * libxfs: compute the rt refcount btree maxlevels during initialization
 * libxfs: add a realtime flag to the refcount update log redo items
 * libxfs: apply rt extent alignment constraints to CoW extsize hint
 * libfrog: enable scrubbing of the realtime refcount data
 * man: document userspace API changes due to rt reflink
 * xfs_db: display the realtime refcount btree contents
 * xfs_db: support the realtime refcountbt
 * xfs_db: copy the realtime refcount btree
 * xfs_db: add rtrefcount reservations to the rgresv command
 * xfs_spaceman: report health of the realtime refcount btree
 * xfs_repair: allow CoW staging extents in the realtime rmap records
 * xfs_repair: use realtime refcount btree data to check block types
 * xfs_repair: find and mark the rtrefcountbt inode
 * xfs_repair: compute refcount data for the realtime groups
 * xfs_repair: check existing realtime refcountbt entries against observed refcounts
 * xfs_repair: reject unwritten shared extents
 * xfs_repair: rebuild the realtime refcount btree
 * xfs_repair: allow realtime files to have the reflink flag set
 * xfs_repair: validate CoW extent size hint on rtinherit directories
 * xfs_logprint: report realtime CUIs
 * mkfs: validate CoW extent size hint when rtinherit is set
 * mkfs: enable reflink on the realtime device
---
 db/bmroot.c                         |  148 +++++++++++++++
 db/bmroot.h                         |    2 
 db/btblock.c                        |   53 ++++++
 db/btblock.h                        |    5 +
 db/btdump.c                         |    8 +
 db/btheight.c                       |    5 +
 db/field.c                          |   15 ++
 db/field.h                          |    6 +
 db/info.c                           |   13 +
 db/inode.c                          |   22 ++
 db/metadump.c                       |  114 ++++++++++++
 db/type.c                           |    5 +
 db/type.h                           |    1 
 libfrog/scrub.c                     |   10 +
 libxfs/defer_item.c                 |   34 +++-
 libxfs/init.c                       |    8 +
 libxfs/libxfs_api_defs.h            |   14 +
 libxfs/logitem.c                    |   14 +
 logprint/log_misc.c                 |    2 
 logprint/log_print_all.c            |    8 +
 logprint/log_redo.c                 |   24 ++
 man/man2/ioctl_xfs_scrub_metadata.2 |   10 +
 man/man8/xfs_db.8                   |   49 +++++
 mkfs/xfs_mkfs.c                     |   83 ++++++++-
 repair/Makefile                     |    1 
 repair/agbtree.c                    |    4 
 repair/dino_chunks.c                |   11 +
 repair/dinode.c                     |  287 +++++++++++++++++++++++++++---
 repair/dir2.c                       |    5 +
 repair/incore.h                     |    1 
 repair/phase4.c                     |   34 +++-
 repair/phase5.c                     |    6 -
 repair/phase6.c                     |   14 +
 repair/rmap.c                       |  241 ++++++++++++++++++++-----
 repair/rmap.h                       |   15 +-
 repair/rt.h                         |    4 
 repair/rtrefcount_repair.c          |  257 +++++++++++++++++++++++++++
 repair/scan.c                       |  337 ++++++++++++++++++++++++++++++++++-
 repair/scan.h                       |   33 +++
 scrub/repair.c                      |    1 
 spaceman/health.c                   |   10 +
 41 files changed, 1792 insertions(+), 122 deletions(-)
 create mode 100644 repair/rtrefcount_repair.c


* [PATCHSET 5/5] xfsprogs: dump fs directory trees
  2025-02-06 22:21 [PATCHBOMB] xfsprogs: all my changes for 6.14 Darrick J. Wong
                   ` (3 preceding siblings ...)
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
@ 2025-02-06 22:30 ` Darrick J. Wong
  2025-02-06 23:02   ` [PATCH 1/4] xfs_db: pass const pointers when we're not modifying them Darrick J. Wong
                     ` (3 more replies)
  4 siblings, 4 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:30 UTC
  To: djwong, aalbersh; +Cc: hch, linux-xfs

Hi all,

Before we disable XFS V4 support by default in the kernel ahead of
abandoning the old ondisk format in 2030, let's create a userspace
extraction tool so that in the future, people can extract their files
without needing to load an old kernel or figure out how to set up
lklfuse.  This also enables at least partial data recovery from
filesystems that cannot be mounted due to severe or unfixable
inconsistencies.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=rdump

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=rdump
---
Commits in this patchset:
 * xfs_db: pass const pointers when we're not modifying them
 * xfs_db: use an empty transaction to try to prevent livelocks in path_navigate
 * xfs_db: make listdir more generally useful
 * xfs_db: add command to copy directory trees out of filesystems
---
 db/Makefile              |    3 
 db/command.c             |    1 
 db/command.h             |    1 
 db/namei.c               |  115 ++++-
 db/namei.h               |   18 +
 db/rdump.c               |  999 ++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/libxfs_api_defs.h |    2 
 man/man8/xfs_db.8        |   26 +
 8 files changed, 1126 insertions(+), 39 deletions(-)
 create mode 100644 db/namei.h
 create mode 100644 db/rdump.c


^ permalink raw reply	[flat|nested] 225+ messages in thread

* [PATCH 01/17] libxfs: unmap xmbuf pages to avoid disaster
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
@ 2025-02-06 22:30   ` Darrick J. Wong
  2025-02-07  4:10     ` Christoph Hellwig
  2025-02-06 22:31   ` [PATCH 02/17] libxfs: mark xmbuf_{un,}map_page static Darrick J. Wong
                     ` (15 subsequent siblings)
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:30 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

It turns out that there's a maximum mappings count, so we need to be
smartish about not overflowing that with too many xmbuf buffers.  This
needs to be a global value because high-agcount filesystems will create
a large number of xmbuf caches but this is a process-global limit.
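
For readers following along, here is a minimal standalone sketch of the
accounting idea described above.  It is not the libxfs code: the names
(map_budget, parse_map_limit, and so on) are hypothetical, and it uses C11
atomics rather than the libxfs atomic_t wrappers.  It assumes the limit
arrives as one line of text in the style of /proc/sys/vm/max_map_count,
keeps half of it, and flips a sticky "unmap early" flag once the budget is
exceeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical process-wide mapping budget; not a libxfs structure. */
struct map_budget {
	long		max;		/* most mappings we allow ourselves */
	atomic_long	in_use;		/* mappings currently charged */
	atomic_bool	unmap_early;	/* sticky: drop mappings on last put */
};

/* Parse one line of limit text and keep only half for ourselves. */
static long
parse_map_limit(const char *line)
{
	char	*end = NULL;
	long	val;

	errno = 0;
	val = strtol(line, &end, 0);
	if (errno || end == line || val < 0)
		return -1;
	return val / 2;
}

/* Set up the budget; fall back to a low default if no limit is known. */
static void
map_budget_init(struct map_budget *b, const char *line)
{
	long	limit = line ? parse_map_limit(line) : -1;

	b->max = limit < 0 ? 1024 : limit;
	atomic_init(&b->in_use, 0);
	atomic_init(&b->unmap_early, false);
}

/*
 * Charge one new mapping.  Returns true once the caller should start
 * unmapping buffers as soon as their last reference is dropped.
 */
static bool
map_budget_charge(struct map_budget *b)
{
	if (atomic_load(&b->unmap_early))
		return true;
	if (atomic_fetch_add(&b->in_use, 1) + 1 > b->max)
		atomic_store(&b->unmap_early, true);
	return atomic_load(&b->unmap_early);
}

/* Uncharge one mapping; the count stops mattering once the flag is set. */
static void
map_budget_uncharge(struct map_budget *b)
{
	if (!atomic_load(&b->unmap_early)) {
		assert(atomic_load(&b->in_use) > 0);
		atomic_fetch_sub(&b->in_use, 1);
	}
}
```

A caller would charge the budget after each successful mmap and uncharge it
before each munmap.  The flag is deliberately one-way in this sketch, so a
process that has hit the limit once stays in the conservative mode.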

Cc: <linux-xfs@vger.kernel.org> # v6.9.0
Fixes: 124b388dac17f5 ("libxfs: support in-memory buffer cache targets")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 include/cache.h  |    6 +++
 libxfs/buf_mem.c |  102 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 libxfs/cache.c   |   11 ++++++
 3 files changed, 115 insertions(+), 4 deletions(-)


diff --git a/include/cache.h b/include/cache.h
index 334ad26309e26d..279bf717ba335f 100644
--- a/include/cache.h
+++ b/include/cache.h
@@ -64,6 +64,8 @@ typedef unsigned int (*cache_node_hash_t)(cache_key_t, unsigned int,
 					  unsigned int);
 typedef int (*cache_node_compare_t)(struct cache_node *, cache_key_t);
 typedef unsigned int (*cache_bulk_relse_t)(struct cache *, struct list_head *);
+typedef int (*cache_node_get_t)(struct cache_node *);
+typedef void (*cache_node_put_t)(struct cache_node *);
 
 struct cache_operations {
 	cache_node_hash_t	hash;
@@ -72,6 +74,8 @@ struct cache_operations {
 	cache_node_relse_t	relse;
 	cache_node_compare_t	compare;
 	cache_bulk_relse_t	bulkrelse;	/* optional */
+	cache_node_get_t	get;		/* optional */
+	cache_node_put_t	put;		/* optional */
 };
 
 struct cache_hash {
@@ -107,6 +111,8 @@ struct cache {
 	cache_node_relse_t	relse;		/* memory free function */
 	cache_node_compare_t	compare;	/* comparison routine */
 	cache_bulk_relse_t	bulkrelse;	/* bulk release routine */
+	cache_node_get_t	get;		/* prepare cache node after get */
+	cache_node_put_t	put;		/* prepare to put cache node */
 	unsigned int		c_hashsize;	/* hash bucket count */
 	unsigned int		c_hashshift;	/* hash key shift */
 	struct cache_hash	*c_hash;	/* hash table buckets */
diff --git a/libxfs/buf_mem.c b/libxfs/buf_mem.c
index e5b91d3cfe0486..16cb038ba10e2a 100644
--- a/libxfs/buf_mem.c
+++ b/libxfs/buf_mem.c
@@ -34,6 +34,36 @@
 unsigned int	XMBUF_BLOCKSIZE;
 unsigned int	XMBUF_BLOCKSHIFT;
 
+long		xmbuf_max_mappings;
+static atomic_t	xmbuf_mappings;
+bool		xmbuf_unmap_early = false;
+
+static long
+get_max_mmap_count(void)
+{
+	char	buffer[64];
+	char	*p = NULL;
+	long	ret = -1;
+	FILE	*file;
+
+	file = fopen("/proc/sys/vm/max_map_count", "r");
+	if (!file)
+		return -1;
+
+	while (fgets(buffer, sizeof(buffer), file)) {
+		errno = 0;
+		ret = strtol(buffer, &p, 0);
+		if (errno || p == buffer)
+			continue;
+
+		/* only take half the maximum mmap count so others can use it */
+		ret /= 2;
+		break;
+	}
+	fclose(file);
+	return ret;
+}
+
 void
 xmbuf_libinit(void)
 {
@@ -45,6 +75,14 @@ xmbuf_libinit(void)
 
 	XMBUF_BLOCKSIZE = ret;
 	XMBUF_BLOCKSHIFT = libxfs_highbit32(XMBUF_BLOCKSIZE);
+
+	/*
+	 * Figure out how many mmaps we will use simultaneously.  Pick a low
+	 * default if we can't query procfs.
+	 */
+	xmbuf_max_mappings = get_max_mmap_count();
+	if (xmbuf_max_mappings < 0)
+		xmbuf_max_mappings = 1024;
 }
 
 /* Allocate a new cache node (aka a xfs_buf) */
@@ -105,7 +143,8 @@ xmbuf_cache_relse(
 	struct xfs_buf		*bp;
 
 	bp = container_of(node, struct xfs_buf, b_node);
-	xmbuf_unmap_page(bp);
+	if (bp->b_addr)
+		xmbuf_unmap_page(bp);
 	kmem_cache_free(xfs_buf_cache, bp);
 }
 
@@ -129,13 +168,50 @@ xmbuf_cache_bulkrelse(
 	return count;
 }
 
+static int
+xmbuf_cache_node_get(
+	struct cache_node	*node)
+{
+	struct xfs_buf		*bp =
+		container_of(node, struct xfs_buf, b_node);
+	int			error;
+
+	if (bp->b_addr != NULL)
+		return 0;
+
+	error = xmbuf_map_page(bp);
+	if (error) {
+		fprintf(stderr,
+ _("%s: %s can't mmap %u bytes at xfile offset %llu: %s\n"),
+				progname, __FUNCTION__, BBTOB(bp->b_length),
+				(unsigned long long)xfs_buf_daddr(bp),
+				strerror(error));
+		return error;
+	}
+
+	return 0;
+}
+
+static void
+xmbuf_cache_node_put(
+	struct cache_node	*node)
+{
+	struct xfs_buf		*bp =
+		container_of(node, struct xfs_buf, b_node);
+
+	if (xmbuf_unmap_early)
+		xmbuf_unmap_page(bp);
+}
+
 static struct cache_operations xmbuf_bcache_operations = {
 	.hash		= libxfs_bhash,
 	.alloc		= xmbuf_cache_alloc,
 	.flush		= xmbuf_cache_flush,
 	.relse		= xmbuf_cache_relse,
 	.compare	= libxfs_bcompare,
-	.bulkrelse	= xmbuf_cache_bulkrelse
+	.bulkrelse	= xmbuf_cache_bulkrelse,
+	.get		= xmbuf_cache_node_get,
+	.put		= xmbuf_cache_node_put,
 };
 
 /*
@@ -216,8 +292,24 @@ xmbuf_map_page(
 	pos = xfile->partition_pos + BBTOB(xfs_buf_daddr(bp));
 	p = mmap(NULL, BBTOB(bp->b_length), PROT_READ | PROT_WRITE, MAP_SHARED,
 			xfile->fcb->fd, pos);
-	if (p == MAP_FAILED)
-		return -errno;
+	if (p == MAP_FAILED) {
+		if (errno == ENOMEM && !xmbuf_unmap_early) {
+#ifdef DEBUG
+			fprintf(stderr, "xmbuf could not make mappings!\n");
+#endif
+			xmbuf_unmap_early = true;
+		}
+		return errno;
+	}
+
+	if (!xmbuf_unmap_early &&
+	    atomic_inc_return(&xmbuf_mappings) > xmbuf_max_mappings) {
+#ifdef DEBUG
+		fprintf(stderr, _("xmbuf hit too many mappings (%ld)!\n"),
+					xmbuf_max_mappings);
+#endif
+		xmbuf_unmap_early = true;
+	}
 
 	bp->b_addr = p;
 	bp->b_flags |= LIBXFS_B_UPTODATE | LIBXFS_B_UNCHECKED;
@@ -230,6 +322,8 @@ void
 xmbuf_unmap_page(
 	struct xfs_buf		*bp)
 {
+	if (!xmbuf_unmap_early)
+		atomic_dec(&xmbuf_mappings);
 	munmap(bp->b_addr, BBTOB(bp->b_length));
 	bp->b_addr = NULL;
 }
diff --git a/libxfs/cache.c b/libxfs/cache.c
index 139c7c1b9e715e..af20f3854df93e 100644
--- a/libxfs/cache.c
+++ b/libxfs/cache.c
@@ -61,6 +61,8 @@ cache_init(
 	cache->compare = cache_operations->compare;
 	cache->bulkrelse = cache_operations->bulkrelse ?
 		cache_operations->bulkrelse : cache_generic_bulkrelse;
+	cache->get = cache_operations->get;
+	cache->put = cache_operations->put;
 	pthread_mutex_init(&cache->c_mutex, NULL);
 
 	for (i = 0; i < hashsize; i++) {
@@ -415,6 +417,13 @@ cache_node_get(
 			 */
 			pthread_mutex_lock(&node->cn_mutex);
 
+			if (node->cn_count == 0 && cache->get) {
+				int err = cache->get(node);
+				if (err) {
+					pthread_mutex_unlock(&node->cn_mutex);
+					goto next_object;
+				}
+			}
 			if (node->cn_count == 0) {
 				ASSERT(node->cn_priority >= 0);
 				ASSERT(!list_empty(&node->cn_mru));
@@ -503,6 +512,8 @@ cache_node_put(
 #endif
 	node->cn_count--;
 
+	if (node->cn_count == 0 && cache->put)
+		cache->put(node);
 	if (node->cn_count == 0) {
 		/* add unreferenced node to appropriate MRU for shaker */
 		mru = &cache->c_mrus[node->cn_priority];


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 02/17] libxfs: mark xmbuf_{un,}map_page static
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
  2025-02-06 22:30   ` [PATCH 01/17] libxfs: unmap xmbuf pages to avoid disaster Darrick J. Wong
@ 2025-02-06 22:31   ` Darrick J. Wong
  2025-02-06 22:31   ` [PATCH 03/17] man: document new XFS_BULK_IREQ_METADIR flag to bulkstat Darrick J. Wong
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:31 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Not used outside of buf_mem.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libxfs/buf_mem.c |   97 +++++++++++++++++++++++++++---------------------------
 libxfs/buf_mem.h |    3 --
 2 files changed, 49 insertions(+), 51 deletions(-)


diff --git a/libxfs/buf_mem.c b/libxfs/buf_mem.c
index 16cb038ba10e2a..77396fa95b4138 100644
--- a/libxfs/buf_mem.c
+++ b/libxfs/buf_mem.c
@@ -85,6 +85,55 @@ xmbuf_libinit(void)
 		xmbuf_max_mappings = 1024;
 }
 
+/* Directly map a memfd page into the buffer cache. */
+static int
+xmbuf_map_page(
+	struct xfs_buf		*bp)
+{
+	struct xfile		*xfile = bp->b_target->bt_xfile;
+	void			*p;
+	loff_t			pos;
+
+	pos = xfile->partition_pos + BBTOB(xfs_buf_daddr(bp));
+	p = mmap(NULL, BBTOB(bp->b_length), PROT_READ | PROT_WRITE, MAP_SHARED,
+			xfile->fcb->fd, pos);
+	if (p == MAP_FAILED) {
+		if (errno == ENOMEM && !xmbuf_unmap_early) {
+#ifdef DEBUG
+			fprintf(stderr, "xmbuf could not make mappings!\n");
+#endif
+			xmbuf_unmap_early = true;
+		}
+		return errno;
+	}
+
+	if (!xmbuf_unmap_early &&
+	    atomic_inc_return(&xmbuf_mappings) > xmbuf_max_mappings) {
+#ifdef DEBUG
+		fprintf(stderr, _("xmbuf hit too many mappings (%ld)!\n"),
+					xmbuf_max_mappings);
+#endif
+		xmbuf_unmap_early = true;
+	}
+
+	bp->b_addr = p;
+	bp->b_flags |= LIBXFS_B_UPTODATE | LIBXFS_B_UNCHECKED;
+	bp->b_error = 0;
+	return 0;
+}
+
+/* Unmap a memfd page that was mapped into the buffer cache. */
+static void
+xmbuf_unmap_page(
+	struct xfs_buf		*bp)
+{
+	if (!xmbuf_unmap_early)
+		atomic_dec(&xmbuf_mappings);
+	munmap(bp->b_addr, BBTOB(bp->b_length));
+	bp->b_addr = NULL;
+}
+
+
 /* Allocate a new cache node (aka a xfs_buf) */
 static struct cache_node *
 xmbuf_cache_alloc(
@@ -280,54 +329,6 @@ xmbuf_free(
 	kfree(btp);
 }
 
-/* Directly map a memfd page into the buffer cache. */
-int
-xmbuf_map_page(
-	struct xfs_buf		*bp)
-{
-	struct xfile		*xfile = bp->b_target->bt_xfile;
-	void			*p;
-	loff_t			pos;
-
-	pos = xfile->partition_pos + BBTOB(xfs_buf_daddr(bp));
-	p = mmap(NULL, BBTOB(bp->b_length), PROT_READ | PROT_WRITE, MAP_SHARED,
-			xfile->fcb->fd, pos);
-	if (p == MAP_FAILED) {
-		if (errno == ENOMEM && !xmbuf_unmap_early) {
-#ifdef DEBUG
-			fprintf(stderr, "xmbuf could not make mappings!\n");
-#endif
-			xmbuf_unmap_early = true;
-		}
-		return errno;
-	}
-
-	if (!xmbuf_unmap_early &&
-	    atomic_inc_return(&xmbuf_mappings) > xmbuf_max_mappings) {
-#ifdef DEBUG
-		fprintf(stderr, _("xmbuf hit too many mappings (%ld)!\n"),
-					xmbuf_max_mappings);
-#endif
-		xmbuf_unmap_early = true;
-	}
-
-	bp->b_addr = p;
-	bp->b_flags |= LIBXFS_B_UPTODATE | LIBXFS_B_UNCHECKED;
-	bp->b_error = 0;
-	return 0;
-}
-
-/* Unmap a memfd page that was mapped into the buffer cache. */
-void
-xmbuf_unmap_page(
-	struct xfs_buf		*bp)
-{
-	if (!xmbuf_unmap_early)
-		atomic_dec(&xmbuf_mappings);
-	munmap(bp->b_addr, BBTOB(bp->b_length));
-	bp->b_addr = NULL;
-}
-
 /* Is this a valid daddr within the buftarg? */
 bool
 xmbuf_verify_daddr(
diff --git a/libxfs/buf_mem.h b/libxfs/buf_mem.h
index f19bc6fd700b9a..6e4b2d3503b853 100644
--- a/libxfs/buf_mem.h
+++ b/libxfs/buf_mem.h
@@ -20,9 +20,6 @@ int xmbuf_alloc(struct xfs_mount *mp, const char *descr,
 		unsigned long long maxpos, struct xfs_buftarg **btpp);
 void xmbuf_free(struct xfs_buftarg *btp);
 
-int xmbuf_map_page(struct xfs_buf *bp);
-void xmbuf_unmap_page(struct xfs_buf *bp);
-
 bool xmbuf_verify_daddr(struct xfs_buftarg *btp, xfs_daddr_t daddr);
 void xmbuf_trans_bdetach(struct xfs_trans *tp, struct xfs_buf *bp);
 int xmbuf_finalize(struct xfs_buf *bp);


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 03/17] man: document new XFS_BULK_IREQ_METADIR flag to bulkstat
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
  2025-02-06 22:30   ` [PATCH 01/17] libxfs: unmap xmbuf pages to avoid disaster Darrick J. Wong
  2025-02-06 22:31   ` [PATCH 02/17] libxfs: mark xmbuf_{un,}map_page static Darrick J. Wong
@ 2025-02-06 22:31   ` Darrick J. Wong
  2025-02-07  4:10     ` Christoph Hellwig
  2025-02-06 22:31   ` [PATCH 04/17] libfrog: wrap handle construction code Darrick J. Wong
                     ` (13 subsequent siblings)
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:31 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Document this new flag.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 man/man2/ioctl_xfs_bulkstat.2 |    8 ++++++++
 1 file changed, 8 insertions(+)


diff --git a/man/man2/ioctl_xfs_bulkstat.2 b/man/man2/ioctl_xfs_bulkstat.2
index b6d51aa438111d..0dba16a6b0e2df 100644
--- a/man/man2/ioctl_xfs_bulkstat.2
+++ b/man/man2/ioctl_xfs_bulkstat.2
@@ -101,6 +101,14 @@ .SH DESCRIPTION
 via bs_extents field and bs_extents64 is assigned a value of 0. In the second
 case, bs_extents is set to (2^31 - 1) if data fork extent count is larger than
 2^31. This flag may be set independently of whether other flags have been set.
+.TP
+.B XFS_BULK_IREQ_METADIR
+Return metadata directory tree inodes in the stat output as well.
+The only fields that will be populated are
+.IR bs_ino ", " bs_gen ", " bs_mode ", " bs_sick ", and " bs_checked.
+All other fields, notably
+.IR bs_blksize ,
+will be zero.
 .RE
 .PP
 .I hdr.icount


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 04/17] libfrog: wrap handle construction code
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (2 preceding siblings ...)
  2025-02-06 22:31   ` [PATCH 03/17] man: document new XFS_BULK_IREQ_METADIR flag to bulkstat Darrick J. Wong
@ 2025-02-06 22:31   ` Darrick J. Wong
  2025-02-07  4:34     ` Christoph Hellwig
  2025-02-20 21:36     ` [PATCH v1.1] " Darrick J. Wong
  2025-02-06 22:31   ` [PATCH 05/17] xfs_scrub: don't report data loss in unlinked inodes twice Darrick J. Wong
                     ` (12 subsequent siblings)
  16 siblings, 2 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:31 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Clean up all the open-coded logic to construct a file handle from a
fshandle and some bulkstat/parent pointer information.  The new
functions are stashed in a private header file to avoid leaking the
details of xfs_handle construction in the public libhandle headers.
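
As a rough illustration of the two-step construction (fsid first, then the
fid), here is a self-contained sketch.  The demo_* types are stand-ins
invented for this example so that it compiles without the XFS headers; the
field names follow the ones used in the patch, but the layouts are an
assumption and the real definitions should be taken from the headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in layouts for illustration only; not the real XFS definitions. */
struct demo_fsid {
	uint32_t	val[2];
};

struct demo_fid {
	uint16_t	fid_len;	/* length of the rest of the fid */
	uint16_t	fid_pad;
	uint32_t	fid_gen;	/* inode generation */
	uint64_t	fid_ino;	/* inode number */
};

struct demo_handle {
	struct demo_fsid	ha_fsid;
	struct demo_fid		ha_fid;
};

/* Fill out the fsid part and put the fid part into a known state. */
static void
demo_handle_from_fshandle(
	struct demo_handle	*handle,
	const void		*fshandle,
	size_t			fshandle_len)
{
	assert(fshandle_len == sizeof(struct demo_fsid));

	memcpy(&handle->ha_fsid, fshandle, sizeof(handle->ha_fsid));
	handle->ha_fid.fid_len = sizeof(struct demo_fid) -
			sizeof(handle->ha_fid.fid_len);
	handle->ha_fid.fid_pad = 0;
	handle->ha_fid.fid_ino = 0;
	handle->ha_fid.fid_gen = 0;
}

/* Fill out the fid part from a raw inode number and generation. */
static void
demo_handle_from_inogen(
	struct demo_handle	*handle,
	uint64_t		ino,
	uint32_t		gen)
{
	handle->ha_fid.fid_ino = ino;
	handle->ha_fid.fid_gen = gen;
}
```

The split means a loop can build the fsid part once and then only rewrite
the inode number and generation for each file it visits, which is the
pattern the scrub callers in this patch follow.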

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 io/parent.c           |    9 +++-----
 libfrog/Makefile      |    1 +
 libfrog/handle_priv.h |   55 +++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/common.c        |    9 +++-----
 scrub/inodes.c        |   13 ++++--------
 scrub/phase5.c        |   12 ++++-------
 spaceman/health.c     |    9 +++-----
 7 files changed, 73 insertions(+), 35 deletions(-)
 create mode 100644 libfrog/handle_priv.h


diff --git a/io/parent.c b/io/parent.c
index 8db93d98755289..3ba3aef48cb9be 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -11,6 +11,7 @@
 #include "handle.h"
 #include "init.h"
 #include "io.h"
+#include "libfrog/handle_priv.h"
 
 static cmdinfo_t parent_cmd;
 static char *mntpt;
@@ -205,12 +206,8 @@ parent_f(
 			return 0;
 		}
 
-		memcpy(&handle, hanp, sizeof(handle));
-		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-				sizeof(handle.ha_fid.fid_len);
-		handle.ha_fid.fid_pad = 0;
-		handle.ha_fid.fid_ino = ino;
-		handle.ha_fid.fid_gen = gen;
+		handle_from_fshandle(&handle, hanp, hlen);
+		handle_from_inogen(&handle, ino, gen);
 	} else if (optind != argc) {
 		return command_usage(&parent_cmd);
 	}
diff --git a/libfrog/Makefile b/libfrog/Makefile
index 4da427789411a6..fc7e506d96bbad 100644
--- a/libfrog/Makefile
+++ b/libfrog/Makefile
@@ -53,6 +53,7 @@ fsgeom.h \
 fsproperties.h \
 fsprops.h \
 getparents.h \
+handle_priv.h \
 histogram.h \
 logging.h \
 paths.h \
diff --git a/libfrog/handle_priv.h b/libfrog/handle_priv.h
new file mode 100644
index 00000000000000..8c3634c40de1c8
--- /dev/null
+++ b/libfrog/handle_priv.h
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2025 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __LIBFROG_HANDLE_PRIV_H__
+#define __LIBFROG_HANDLE_PRIV_H__
+
+/*
+ * Private helpers to construct an xfs_handle without publishing those details
+ * in the public libhandle header files.
+ */
+
+/*
+ * Fills out the fsid part of a handle.  This does not initialize the fid part
+ * of the handle; use either of the two functions below.
+ */
+static inline void
+handle_from_fshandle(
+	struct xfs_handle	*handle,
+	const void		*fshandle,
+	size_t			fshandle_len)
+{
+	ASSERT(fshandle_len == sizeof(xfs_fsid_t));
+
+	memcpy(&handle->ha_fsid, fshandle, sizeof(handle->ha_fsid));
+	handle->ha_fid.fid_len = sizeof(xfs_fid_t) -
+			sizeof(handle->ha_fid.fid_len);
+	handle->ha_fid.fid_pad = 0;
+	handle->ha_fid.fid_ino = 0;
+	handle->ha_fid.fid_gen = 0;
+}
+
+/* Fill out the fid part of a handle from raw components. */
+static inline void
+handle_from_inogen(
+	struct xfs_handle	*handle,
+	uint64_t		ino,
+	uint32_t		gen)
+{
+	handle->ha_fid.fid_ino = ino;
+	handle->ha_fid.fid_gen = gen;
+}
+
+/* Fill out the fid part of a handle. */
+static inline void
+handle_from_bulkstat(
+	struct xfs_handle		*handle,
+	const struct xfs_bulkstat	*bstat)
+{
+	handle->ha_fid.fid_ino = bstat->bs_ino;
+	handle->ha_fid.fid_gen = bstat->bs_gen;
+}
+
+#endif /* __LIBFROG_HANDLE_PRIV_H__ */
diff --git a/scrub/common.c b/scrub/common.c
index f86546556f46dd..6eb3c026dc5ac9 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -10,6 +10,7 @@
 #include "platform_defs.h"
 #include "libfrog/paths.h"
 #include "libfrog/getparents.h"
+#include "libfrog/handle_priv.h"
 #include "xfs_scrub.h"
 #include "common.h"
 #include "progress.h"
@@ -414,12 +415,8 @@ scrub_render_ino_descr(
 	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_PARENT) {
 		struct xfs_handle handle;
 
-		memcpy(&handle.ha_fsid, ctx->fshandle, sizeof(handle.ha_fsid));
-		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-				sizeof(handle.ha_fid.fid_len);
-		handle.ha_fid.fid_pad = 0;
-		handle.ha_fid.fid_ino = ino;
-		handle.ha_fid.fid_gen = gen;
+		handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
+		handle_from_inogen(&handle, ino, gen);
 
 		ret = handle_to_path(&handle, sizeof(struct xfs_handle), 4096,
 				buf, buflen);
diff --git a/scrub/inodes.c b/scrub/inodes.c
index 3fe759e8f4867d..2b492a634ea3b2 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -19,6 +19,7 @@
 #include "descr.h"
 #include "libfrog/fsgeom.h"
 #include "libfrog/bulkstat.h"
+#include "libfrog/handle_priv.h"
 
 /*
  * Iterate a range of inodes.
@@ -209,7 +210,7 @@ scan_ag_bulkstat(
 	xfs_agnumber_t		agno,
 	void			*arg)
 {
-	struct xfs_handle	handle = { };
+	struct xfs_handle	handle;
 	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
 	struct scan_ichunk	*ichunk = arg;
 	struct xfs_inumbers_req	*ireq = ichunk_to_inumbers(ichunk);
@@ -225,12 +226,7 @@ scan_ag_bulkstat(
 	DEFINE_DESCR(dsc_inumbers, ctx, render_inumbers_from_agno);
 
 	descr_set(&dsc_inumbers, &agno);
-
-	memcpy(&handle.ha_fsid, ctx->fshandle, sizeof(handle.ha_fsid));
-	handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-			sizeof(handle.ha_fid.fid_len);
-	handle.ha_fid.fid_pad = 0;
-
+	handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
 retry:
 	bulkstat_for_inumbers(ctx, &dsc_inumbers, inumbers, breq);
 
@@ -244,8 +240,7 @@ scan_ag_bulkstat(
 			continue;
 
 		descr_set(&dsc_bulkstat, bs);
-		handle.ha_fid.fid_ino = scan_ino;
-		handle.ha_fid.fid_gen = bs->bs_gen;
+		handle_from_bulkstat(&handle, bs);
 		error = si->fn(ctx, &handle, bs, si->arg);
 		switch (error) {
 		case 0:
diff --git a/scrub/phase5.c b/scrub/phase5.c
index 22a22915dbc68d..6460d00f30f4bd 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -18,6 +18,7 @@
 #include "libfrog/bitmap.h"
 #include "libfrog/bulkstat.h"
 #include "libfrog/fakelibattr.h"
+#include "libfrog/handle_priv.h"
 #include "xfs_scrub.h"
 #include "common.h"
 #include "inodes.h"
@@ -474,9 +475,7 @@ retry_deferred_inode(
 	if (error)
 		return error;
 
-	handle->ha_fid.fid_ino = bstat.bs_ino;
-	handle->ha_fid.fid_gen = bstat.bs_gen;
-
+	handle_from_bulkstat(handle, &bstat);
 	return check_inode_names(ncs->ctx, handle, &bstat, ncs);
 }
 
@@ -487,16 +486,13 @@ retry_deferred_inode_range(
 	uint64_t		len,
 	void			*arg)
 {
-	struct xfs_handle	handle = { };
+	struct xfs_handle	handle;
 	struct ncheck_state	*ncs = arg;
 	struct scrub_ctx	*ctx = ncs->ctx;
 	uint64_t		i;
 	int			error;
 
-	memcpy(&handle.ha_fsid, ctx->fshandle, sizeof(handle.ha_fsid));
-	handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-			sizeof(handle.ha_fid.fid_len);
-	handle.ha_fid.fid_pad = 0;
+	handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
 
 	for (i = 0; i < len; i++) {
 		error = retry_deferred_inode(ncs, &handle, ino + i);
diff --git a/spaceman/health.c b/spaceman/health.c
index 4281589324cd44..0d2767df424f27 100644
--- a/spaceman/health.c
+++ b/spaceman/health.c
@@ -14,6 +14,7 @@
 #include "libfrog/bulkstat.h"
 #include "space.h"
 #include "libfrog/getparents.h"
+#include "libfrog/handle_priv.h"
 
 static cmdinfo_t health_cmd;
 static unsigned long long reported;
@@ -317,12 +318,8 @@ report_inode(
 	    (file->xfd.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_PARENT)) {
 		struct xfs_handle handle;
 
-		memcpy(&handle.ha_fsid, file->fshandle, sizeof(handle.ha_fsid));
-		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-				sizeof(handle.ha_fid.fid_len);
-		handle.ha_fid.fid_pad = 0;
-		handle.ha_fid.fid_ino = bs->bs_ino;
-		handle.ha_fid.fid_gen = bs->bs_gen;
+		handle_from_fshandle(&handle, file->fshandle, file->fshandle_len);
+		handle_from_bulkstat(&handle, bs);
 
 		ret = handle_to_path(&handle, sizeof(struct xfs_handle), 0,
 				descr, sizeof(descr) - 1);


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 05/17] xfs_scrub: don't report data loss in unlinked inodes twice
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (3 preceding siblings ...)
  2025-02-06 22:31   ` [PATCH 04/17] libfrog: wrap handle construction code Darrick J. Wong
@ 2025-02-06 22:31   ` Darrick J. Wong
  2025-02-07  4:35     ` Christoph Hellwig
  2025-02-06 22:32   ` [PATCH 06/17] xfs_scrub: call bulkstat directly if we're only scanning user files Darrick J. Wong
                     ` (11 subsequent siblings)
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:31 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If parent pointers are enabled, report_ioerr_fsmap will report lost file
data and xattrs for all files, having used the parent pointer ioctls to
generate the path of the lost file.  For unlinked files, the path lookup
will fail, but we'll report the inumber of the file that lost data.

Therefore, we don't need to do a separate scan of the unlinked inodes
in report_all_media_errors after doing the fsmap scan.

Cc: <linux-xfs@vger.kernel.org> # v6.10.0
Fixes: 9b5d1349ca5fb1 ("xfs_scrub: use parent pointers to report lost file data")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/phase6.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index fc63f5aad0bd7b..2695e645004bf1 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -569,12 +569,12 @@ report_all_media_errors(
 	 * Scan the directory tree to get file paths if we didn't already use
 	 * directory parent pointers to report the loss.
 	 */
-	if (!can_use_pptrs(ctx)) {
-		ret = scan_fs_tree(ctx, report_dir_loss, report_dirent_loss,
-				vs);
-		if (ret)
-			return ret;
-	}
+	if (can_use_pptrs(ctx))
+		return 0;
+
+	ret = scan_fs_tree(ctx, report_dir_loss, report_dirent_loss, vs);
+	if (ret)
+		return ret;
 
 	/* Scan for unlinked files. */
 	return scrub_scan_all_inodes(ctx, report_inode_loss, 0, vs);


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 06/17] xfs_scrub: call bulkstat directly if we're only scanning user files
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (4 preceding siblings ...)
  2025-02-06 22:31   ` [PATCH 05/17] xfs_scrub: don't report data loss in unlinked inodes twice Darrick J. Wong
@ 2025-02-06 22:32   ` Darrick J. Wong
  2025-02-07  4:37     ` Christoph Hellwig
  2025-02-06 22:32   ` [PATCH 07/17] xfs_scrub: remove flags argument from scrub_scan_all_inodes Darrick J. Wong
                     ` (10 subsequent siblings)
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:32 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Christoph observed xfs_scrub phase 5 consuming a lot of CPU time on a
filesystem with a very large number of rtgroups.  He traced this to
bulkstat_for_inumbers spending a lot of time trying to single-step
through inodes that were marked allocated in the inumbers record but
didn't show up in the bulkstat data.  These correspond to files in the
metadata directory tree that are not returned by the regular bulkstat.

This complex machinery isn't necessary for the inode walk that occurs
during phase 5 because phase 5 wants to open user files and check the
dirent/xattr names associated with that file.  It's not needed for phase
6 because we're only using it to report data loss in unlinked files when
parent pointers aren't enabled.

Furthermore, we don't need to do this inumbers -> bulkstat dance because
phase 3 and 4 supposedly fixed any inode that was too corrupt to be
igettable and hence reported on by bulkstat.

Fix this by creating a simpler user file iterator that walks bulkstat
across the filesystem without using inumbers.  While we're at it, fix
the obviously incorrect comments in inodes.h.
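
The shape of such a cursor-driven walk can be sketched in isolation as
follows.  Everything here is hypothetical: demo_bulkstat stands in for the
real bulkstat call and simply serves a sorted array of inode numbers, and
the walk runs synchronously instead of handing each batch to a workqueue
as the patch does.  The point is only the loop structure: ask for a batch
starting at the cursor, stop when a batch comes back empty, otherwise
advance the cursor and continue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CHUNK	4	/* batch size, chosen arbitrarily here */

/* Hypothetical request: cursor in, batch of inode numbers out. */
struct demo_req {
	uint64_t	ino;		/* cursor: first inode to consider */
	uint32_t	icount;		/* how many results we can take */
	uint32_t	ocount;		/* how many results we got */
	uint64_t	results[DEMO_CHUNK];
};

/* Hypothetical filesystem: a sorted list of allocated user inodes. */
struct demo_fs {
	const uint64_t	*inodes;
	size_t		nr;
};

/* Stand-in for bulkstat: return the next batch at or after the cursor. */
static int
demo_bulkstat(const struct demo_fs *fs, struct demo_req *req)
{
	size_t	i;

	assert(req->icount <= DEMO_CHUNK);

	req->ocount = 0;
	for (i = 0; i < fs->nr && req->ocount < req->icount; i++) {
		if (fs->inodes[i] < req->ino)
			continue;
		req->results[req->ocount++] = fs->inodes[i];
	}
	/* Advance the cursor past the last inode we returned. */
	if (req->ocount)
		req->ino = req->results[req->ocount - 1] + 1;
	return 0;
}

typedef int (*demo_iter_fn)(uint64_t ino, void *arg);

/* Walk every inode once; a nonzero callback return stops the walk. */
static int
demo_scan_user_files(const struct demo_fs *fs, demo_iter_fn fn, void *arg)
{
	struct demo_req	req = { .ino = 0, .icount = DEMO_CHUNK };
	uint32_t	i;
	int		error;

	for (;;) {
		error = demo_bulkstat(fs, &req);
		if (error)
			return error;
		if (req.ocount == 0)
			return 0;	/* empty batch: nothing left */
		for (i = 0; i < req.ocount; i++) {
			error = fn(req.results[i], arg);
			if (error)
				return error;
		}
	}
}

/* Example callback: count the inodes seen and remember the last one. */
struct demo_tally {
	unsigned int	count;
	uint64_t	last;
};

static int
demo_tally_inode(uint64_t ino, void *arg)
{
	struct demo_tally	*t = arg;

	t->count++;
	t->last = ino;
	return 0;
}
```

Because there is no separate inumbers pass in this scheme, the walker never
has to reconcile "allocated but not returned" inodes; it only sees what the
batch call chooses to return.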

Cc: <linux-xfs@vger.kernel.org> # v4.15.0
Fixes: 372d4ba99155b2 ("xfs_scrub: add inode iteration functions")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/inodes.c |  151 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/inodes.h |    6 ++
 scrub/phase5.c |    2 -
 scrub/phase6.c |    2 -
 4 files changed, 158 insertions(+), 3 deletions(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index 2b492a634ea3b2..58969131628f8f 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -445,6 +445,157 @@ scrub_scan_all_inodes(
 	return si.aborted ? -1 : 0;
 }
 
+struct user_bulkstat {
+	struct scan_inodes	*si;
+
+	/* vla, must be last */
+	struct xfs_bulkstat_req	breq;
+};
+
+/* Iterate all the user files returned by a bulkstat. */
+static void
+scan_user_files(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct xfs_handle	handle;
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	struct user_bulkstat	*ureq = arg;
+	struct xfs_bulkstat	*bs = &ureq->breq.bulkstat[0];
+	struct scan_inodes	*si = ureq->si;
+	int			i;
+	int			error = 0;
+	DEFINE_DESCR(dsc_bulkstat, ctx, render_ino_from_bulkstat);
+
+	handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
+
+	for (i = 0; !si->aborted && i < ureq->breq.hdr.ocount; i++, bs++) {
+		descr_set(&dsc_bulkstat, bs);
+		handle_from_bulkstat(&handle, bs);
+		error = si->fn(ctx, &handle, bs, si->arg);
+		switch (error) {
+		case 0:
+			break;
+		case ESTALE:
+		case ECANCELED:
+			error = 0;
+			fallthrough;
+		default:
+			goto err;
+		}
+		if (scrub_excessive_errors(ctx)) {
+			si->aborted = true;
+			goto out;
+		}
+	}
+
+err:
+	if (error) {
+		str_liberror(ctx, error, descr_render(&dsc_bulkstat));
+		si->aborted = true;
+	}
+out:
+	free(ureq);
+}
+
+/*
+ * Run one step of the user files bulkstat scan and schedule background
+ * processing of the stat data returned.  Returns 1 to keep going, or 0 to
+ * stop.
+ */
+static int
+scan_user_bulkstat(
+	struct scrub_ctx	*ctx,
+	struct scan_inodes	*si,
+	uint64_t		*cursor)
+{
+	struct user_bulkstat	*ureq;
+	const char		*what = NULL;
+	int			ret;
+
+	ureq = calloc(1, sizeof(struct user_bulkstat) +
+			 XFS_BULKSTAT_REQ_SIZE(LIBFROG_BULKSTAT_CHUNKSIZE));
+	if (!ureq) {
+		ret = ENOMEM;
+		what = _("creating bulkstat work item");
+		goto err;
+	}
+	ureq->si = si;
+	ureq->breq.hdr.icount = LIBFROG_BULKSTAT_CHUNKSIZE;
+	ureq->breq.hdr.ino = *cursor;
+
+	ret = -xfrog_bulkstat(&ctx->mnt, &ureq->breq);
+	if (ret) {
+		what = _("user files bulkstat");
+		goto err_ureq;
+	}
+	if (ureq->breq.hdr.ocount == 0) {
+		*cursor = NULLFSINO;
+		free(ureq);
+		return 0;
+	}
+
+	*cursor = ureq->breq.hdr.ino;
+
+	/* scan_user_files frees ureq; do not access it */
+	ret = -workqueue_add(&si->wq_bulkstat, scan_user_files, 0, ureq);
+	if (ret) {
+		what = _("queueing bulkstat work");
+		goto err_ureq;
+	}
+	ureq = NULL;
+
+	return 1;
+
+err_ureq:
+	free(ureq);
+err:
+	si->aborted = true;
+	str_liberror(ctx, ret, what);
+	return 0;
+}
+
+/*
+ * Scan all the user files in a filesystem in inumber order.  On error, this
+ * function will log an error message and return -1.
+ */
+int
+scrub_scan_user_files(
+	struct scrub_ctx	*ctx,
+	scrub_inode_iter_fn	fn,
+	void			*arg)
+{
+	struct scan_inodes	si = {
+		.fn		= fn,
+		.arg		= arg,
+		.nr_threads	= scrub_nproc_workqueue(ctx),
+	};
+	uint64_t		ino = 0;
+	int			ret;
+
+	/* Queue up to four bulkstat result sets per thread. */
+	ret = -workqueue_create_bound(&si.wq_bulkstat, (struct xfs_mount *)ctx,
+			si.nr_threads, si.nr_threads * 4);
+	if (ret) {
+		str_liberror(ctx, ret, _("creating bulkstat workqueue"));
+		return -1;
+	}
+
+	while ((ret = scan_user_bulkstat(ctx, &si, &ino)) == 1) {
+		/* empty */
+	}
+
+	ret = -workqueue_terminate(&si.wq_bulkstat);
+	if (ret) {
+		si.aborted = true;
+		str_liberror(ctx, ret, _("finishing bulkstat work"));
+	}
+	workqueue_destroy(&si.wq_bulkstat);
+
+	return si.aborted ? -1 : 0;
+}
+
 /* Open a file by handle, returning either the fd or -1 on error. */
 int
 scrub_open_handle(
diff --git a/scrub/inodes.h b/scrub/inodes.h
index 7a0b275e575ead..99b78fa1f76515 100644
--- a/scrub/inodes.h
+++ b/scrub/inodes.h
@@ -7,7 +7,7 @@
 #define XFS_SCRUB_INODES_H_
 
 /*
- * Visit each space mapping of an inode fork.  Return 0 to continue iteration
+ * Callback for each inode in a filesystem.  Return 0 to continue iteration
  * or a positive error code to interrupt iteraton.  If ESTALE is returned,
  * iteration will be restarted from the beginning of the inode allocation
  * group.  Any other non zero value will stop iteration.  The special return
@@ -23,6 +23,10 @@ typedef int (*scrub_inode_iter_fn)(struct scrub_ctx *ctx,
 int scrub_scan_all_inodes(struct scrub_ctx *ctx, scrub_inode_iter_fn fn,
 		unsigned int flags, void *arg);
 
+/* Scan all user-created files in the filesystem. */
+int scrub_scan_user_files(struct scrub_ctx *ctx, scrub_inode_iter_fn fn,
+		void *arg);
+
 int scrub_open_handle(struct xfs_handle *handle);
 
 #endif /* XFS_SCRUB_INODES_H_ */
diff --git a/scrub/phase5.c b/scrub/phase5.c
index 6460d00f30f4bd..577dda8064c3a8 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -882,7 +882,7 @@ _("Filesystem has errors, skipping connectivity checks."));
 
 	pthread_mutex_init(&ncs.lock, NULL);
 
-	ret = scrub_scan_all_inodes(ctx, check_inode_names, 0, &ncs);
+	ret = scrub_scan_user_files(ctx, check_inode_names, &ncs);
 	if (ret)
 		goto out_lock;
 	if (ncs.aborted) {
diff --git a/scrub/phase6.c b/scrub/phase6.c
index 2695e645004bf1..9858b932f20de5 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -577,7 +577,7 @@ report_all_media_errors(
 		return ret;
 
 	/* Scan for unlinked files. */
-	return scrub_scan_all_inodes(ctx, report_inode_loss, 0, vs);
+	return scrub_scan_user_files(ctx, report_inode_loss, vs);
 }
 
 /* Schedule a read-verify of a (data block) extent. */



* [PATCH 07/17] xfs_scrub: remove flags argument from scrub_scan_all_inodes
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (5 preceding siblings ...)
  2025-02-06 22:32   ` [PATCH 06/17] xfs_scrub: call bulkstat directly if we're only scanning user files Darrick J. Wong
@ 2025-02-06 22:32   ` Darrick J. Wong
  2025-02-07  4:38     ` Christoph Hellwig
  2025-02-06 22:32   ` [PATCH 08/17] xfs_scrub: selectively re-run bulkstat after re-running inumbers Darrick J. Wong
                     ` (9 subsequent siblings)
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:32 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that there's only one caller of scrub_scan_all_inodes, remove the
single defined flag because the function can set the METADIR bulkstat
flag itself if needed.  Clarify in the documentation that this is a
special purpose inode iterator that also picks up metadata directory
files and broken files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/inodes.c |   23 ++++++++++-------------
 scrub/inodes.h |    6 ++----
 scrub/phase3.c |    7 +------
 3 files changed, 13 insertions(+), 23 deletions(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index 58969131628f8f..c32dfb624e3e95 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -57,7 +57,6 @@ bulkstat_for_inumbers(
 {
 	struct xfs_bulkstat	*bstat = breq->bulkstat;
 	struct xfs_bulkstat	*bs;
-	unsigned int		flags = 0;
 	int			i;
 	int			error;
 
@@ -72,9 +71,6 @@ bulkstat_for_inumbers(
 			 strerror_r(error, errbuf, DESCR_BUFSZ));
 	}
 
-	if (breq->hdr.flags & XFS_BULK_IREQ_METADIR)
-		flags |= XFS_BULK_IREQ_METADIR;
-
 	/*
 	 * Check each of the stats we got back to make sure we got the inodes
 	 * we asked for.
@@ -89,7 +85,7 @@ bulkstat_for_inumbers(
 
 		/* Load the one inode. */
 		error = -xfrog_bulkstat_single(&ctx->mnt,
-				inumbers->xi_startino + i, flags, bs);
+				inumbers->xi_startino + i, breq->hdr.flags, bs);
 		if (error || bs->bs_ino != inumbers->xi_startino + i) {
 			memset(bs, 0, sizeof(struct xfs_bulkstat));
 			bs->bs_ino = inumbers->xi_startino + i;
@@ -105,7 +101,6 @@ struct scan_inodes {
 	scrub_inode_iter_fn	fn;
 	void			*arg;
 	unsigned int		nr_threads;
-	unsigned int		flags;
 	bool			aborted;
 };
 
@@ -139,6 +134,7 @@ ichunk_to_bulkstat(
 
 static inline int
 alloc_ichunk(
+	struct scrub_ctx	*ctx,
 	struct scan_inodes	*si,
 	uint32_t		agno,
 	uint64_t		startino,
@@ -164,7 +160,9 @@ alloc_ichunk(
 
 	breq = ichunk_to_bulkstat(ichunk);
 	breq->hdr.icount = LIBFROG_BULKSTAT_CHUNKSIZE;
-	if (si->flags & SCRUB_SCAN_METADIR)
+
+	/* Scan the metadata directory tree too. */
+	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_METADIR)
 		breq->hdr.flags |= XFS_BULK_IREQ_METADIR;
 
 	*ichunkp = ichunk;
@@ -302,7 +300,7 @@ scan_ag_inumbers(
 
 	descr_set(&dsc, &agno);
 
-	error = alloc_ichunk(si, agno, 0, &ichunk);
+	error = alloc_ichunk(ctx, si, agno, 0, &ichunk);
 	if (error)
 		goto err;
 	ireq = ichunk_to_inumbers(ichunk);
@@ -355,7 +353,7 @@ scan_ag_inumbers(
 		}
 
 		if (!ichunk) {
-			error = alloc_ichunk(si, agno, nextino, &ichunk);
+			error = alloc_ichunk(ctx, si, agno, nextino, &ichunk);
 			if (error)
 				goto err;
 		}
@@ -375,19 +373,18 @@ scan_ag_inumbers(
 }
 
 /*
- * Scan all the inodes in a filesystem.  On error, this function will log
- * an error message and return -1.
+ * Scan all the inodes in a filesystem, including metadata directory files and
+ * broken files.  On error, this function will log an error message and return
+ * -1.
  */
 int
 scrub_scan_all_inodes(
 	struct scrub_ctx	*ctx,
 	scrub_inode_iter_fn	fn,
-	unsigned int		flags,
 	void			*arg)
 {
 	struct scan_inodes	si = {
 		.fn		= fn,
-		.flags		= flags,
 		.arg		= arg,
 		.nr_threads	= scrub_nproc_workqueue(ctx),
 	};
diff --git a/scrub/inodes.h b/scrub/inodes.h
index 99b78fa1f76515..d68e94eb216895 100644
--- a/scrub/inodes.h
+++ b/scrub/inodes.h
@@ -17,11 +17,9 @@
 typedef int (*scrub_inode_iter_fn)(struct scrub_ctx *ctx,
 		struct xfs_handle *handle, struct xfs_bulkstat *bs, void *arg);
 
-/* Return metadata directories too. */
-#define SCRUB_SCAN_METADIR	(1 << 0)
-
+/* Scan every file in the filesystem, including metadir and corrupt ones. */
 int scrub_scan_all_inodes(struct scrub_ctx *ctx, scrub_inode_iter_fn fn,
-		unsigned int flags, void *arg);
+		void *arg);
 
 /* Scan all user-created files in the filesystem. */
 int scrub_scan_user_files(struct scrub_ctx *ctx, scrub_inode_iter_fn fn,
diff --git a/scrub/phase3.c b/scrub/phase3.c
index c90da78439425a..046a42c1da8beb 100644
--- a/scrub/phase3.c
+++ b/scrub/phase3.c
@@ -312,7 +312,6 @@ phase3_func(
 	struct scrub_inode_ctx	ictx = { .ctx = ctx };
 	uint64_t		val;
 	xfs_agnumber_t		agno;
-	unsigned int		scan_flags = 0;
 	int			err;
 
 	err = -ptvar_alloc(scrub_nproc(ctx), sizeof(struct action_list),
@@ -329,10 +328,6 @@ phase3_func(
 		goto out_ptvar;
 	}
 
-	/* Scan the metadata directory tree too. */
-	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_METADIR)
-		scan_flags |= SCRUB_SCAN_METADIR;
-
 	/*
 	 * If we already have ag/fs metadata to repair from previous phases,
 	 * we would rather not try to repair file metadata until we've tried
@@ -343,7 +338,7 @@ phase3_func(
 			ictx.always_defer_repairs = true;
 	}
 
-	err = scrub_scan_all_inodes(ctx, scrub_inode, scan_flags, &ictx);
+	err = scrub_scan_all_inodes(ctx, scrub_inode, &ictx);
 	if (!err && ictx.aborted)
 		err = ECANCELED;
 	if (err)



* [PATCH 08/17] xfs_scrub: selectively re-run bulkstat after re-running inumbers
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (6 preceding siblings ...)
  2025-02-06 22:32   ` [PATCH 07/17] xfs_scrub: remove flags argument from scrub_scan_all_inodes Darrick J. Wong
@ 2025-02-06 22:32   ` Darrick J. Wong
  2025-02-07  4:39     ` Christoph Hellwig
  2025-02-06 22:33   ` [PATCH 09/17] xfs_scrub: actually iterate all the bulkstat records Darrick J. Wong
                     ` (8 subsequent siblings)
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:32 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In the phase 3 inode scan, don't bother retrying the inumbers ->
bulkstat conversion unless inumbers returns the same startino and there
are allocated inodes.  If inumbers returns data for a totally different
inobt record, that means the whole inode chunk was freed.
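
As a rough illustration only (not part of this patch), the retry test
can be sketched in standalone C.  The demo_* names and the reply struct
are hypothetical stand-ins for the xfs_inumbers_req fields used in the
diff below, and the si->aborted check is left out for brevity:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one INUMBERS reply; not the real ioctl struct. */
struct demo_inumbers_reply {
	uint32_t	ocount;		/* number of inobt records returned */
	uint64_t	startino;	/* first inumber of the returned record */
	unsigned int	alloccount;	/* allocated inodes in that record */
};

/*
 * Retry the inumbers -> bulkstat conversion only if the kernel handed back
 * the same inobt record we asked about and it still has allocated inodes.
 */
static bool
demo_should_retry(
	uint64_t				old_startino,
	const struct demo_inumbers_reply	*r)
{
	return r->ocount > 0 &&
	       r->alloccount > 0 &&
	       r->startino == old_startino;
}

/* Sample replies for a chunk that originally started at inode 64. */
static const struct demo_inumbers_reply demo_same_chunk =
	{ .ocount = 1, .startino = 64, .alloccount = 3 };
static const struct demo_inumbers_reply demo_other_chunk =
	{ .ocount = 1, .startino = 128, .alloccount = 5 };
static const struct demo_inumbers_reply demo_emptied_chunk =
	{ .ocount = 1, .startino = 64, .alloccount = 0 };
static const struct demo_inumbers_reply demo_no_records =
	{ .ocount = 0, .startino = 0, .alloccount = 0 };
```

Only the first sample would trigger a retry; the other three describe a
chunk that was freed, emptied, or followed by nothing at all.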

Cc: <linux-xfs@vger.kernel.org> # v5.18.0
Fixes: 245c72a6eeb720 ("xfs_scrub: balance inode chunk scan across CPUs")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/inodes.c |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index c32dfb624e3e95..8bdfa0b35d6172 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -60,6 +60,8 @@ bulkstat_for_inumbers(
 	int			i;
 	int			error;
 
+	assert(inumbers->xi_allocmask != 0);
+
 	/* First we try regular bulkstat, for speed. */
 	breq->hdr.ino = inumbers->xi_startino;
 	breq->hdr.icount = inumbers->xi_alloccount;
@@ -246,11 +248,24 @@ scan_ag_bulkstat(
 		case ESTALE: {
 			stale_count++;
 			if (stale_count < 30) {
-				ireq->hdr.ino = inumbers->xi_startino;
+				uint64_t	old_startino;
+
+				ireq->hdr.ino = old_startino =
+					inumbers->xi_startino;
 				error = -xfrog_inumbers(&ctx->mnt, ireq);
 				if (error)
 					goto err;
-				goto retry;
+				/*
+				 * Retry only if inumbers returns the same
+				 * inobt record as the previous record and
+				 * there are allocated inodes in it.
+				 */
+				if (!si->aborted &&
+				    ireq->hdr.ocount > 0 &&
+				    inumbers->xi_alloccount > 0 &&
+				    inumbers->xi_startino == old_startino)
+					goto retry;
+				goto out;
 			}
 			str_info(ctx, descr_render(&dsc_bulkstat),
 _("Changed too many times during scan; giving up."));



* [PATCH 09/17] xfs_scrub: actually iterate all the bulkstat records
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (7 preceding siblings ...)
  2025-02-06 22:32   ` [PATCH 08/17] xfs_scrub: selectively re-run bulkstat after re-running inumbers Darrick J. Wong
@ 2025-02-06 22:33   ` Darrick J. Wong
  2025-02-07  4:40     ` Christoph Hellwig
  2025-02-06 22:33   ` [PATCH 10/17] xfs_scrub: don't double-scan inodes during phase 3 Darrick J. Wong
                     ` (7 subsequent siblings)
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:33 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In scan_ag_bulkstat, we have a for loop that iterates all the
xfs_bulkstat records in breq->bulkstat.  The loop condition test should
test against the array length, not the number of bits set in an
unrelated data structure.  If ocount > xi_alloccount then we miss some
inodes; if ocount < xi_alloccount then we've walked off the end of the
array.

Cc: <linux-xfs@vger.kernel.org> # v5.18.0
Fixes: 245c72a6eeb720 ("xfs_scrub: balance inode chunk scan across CPUs")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/inodes.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index 8bdfa0b35d6172..4d8b137a698004 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -216,7 +216,7 @@ scan_ag_bulkstat(
 	struct xfs_inumbers_req	*ireq = ichunk_to_inumbers(ichunk);
 	struct xfs_bulkstat_req	*breq = ichunk_to_bulkstat(ichunk);
 	struct scan_inodes	*si = ichunk->si;
-	struct xfs_bulkstat	*bs;
+	struct xfs_bulkstat	*bs = &breq->bulkstat[0];
 	struct xfs_inumbers	*inumbers = &ireq->inumbers[0];
 	uint64_t		last_ino = 0;
 	int			i;
@@ -231,8 +231,7 @@ scan_ag_bulkstat(
 	bulkstat_for_inumbers(ctx, &dsc_inumbers, inumbers, breq);
 
 	/* Iterate all the inodes. */
-	bs = &breq->bulkstat[0];
-	for (i = 0; !si->aborted && i < inumbers->xi_alloccount; i++, bs++) {
+	for (i = 0; !si->aborted && i < breq->hdr.ocount; i++, bs++) {
 		uint64_t	scan_ino = bs->bs_ino;
 
 		/* ensure forward progress if we retried */



* [PATCH 10/17] xfs_scrub: don't double-scan inodes during phase 3
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (8 preceding siblings ...)
  2025-02-06 22:33   ` [PATCH 09/17] xfs_scrub: actually iterate all the bulkstat records Darrick J. Wong
@ 2025-02-06 22:33   ` Darrick J. Wong
  2025-02-07  4:41     ` Christoph Hellwig
  2025-02-06 22:33   ` [PATCH 11/17] xfs_scrub: don't (re)set the bulkstat request icount incorrectly Darrick J. Wong
                     ` (6 subsequent siblings)
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:33 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The bulkstat ioctl only allows us to specify the starting inode number
and the length of the bulkstat array.  It is possible that a bulkstat
request for {startino = 30, icount = 10} will return stat data for inode
50.  For most bulkstat users this is ok because they're marching
linearly across all inodes in the filesystem.

Unfortunately for scrub phase 3 this is undesirable.  We only want the
inodes that belong to a specific inobt record, because we need to know
about inodes that are marked as allocated but are too corrupt to appear
in the bulkstat output.  Another worker will process the inobt record(s)
that correspond to the extra inodes, which means we can double-scan
some inodes.

Therefore, bulkstat_for_inumbers should trim out inodes that don't
correspond to the inumbers record that it is given.
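
The trimming step can be sketched in isolation as follows.  This is an
illustrative fragment, not the patch itself: DEMO_CHUNKSIZE, the demo_*
names and the sample inumbers are assumptions, and a plain array of
inode numbers stands in for the sorted bulkstat records:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CHUNKSIZE	64	/* assumed number of inodes per inobt record */

/*
 * Given inumbers sorted in ascending order, drop every trailing entry that
 * lies at or beyond startino + DEMO_CHUNKSIZE and return the new count.
 */
static unsigned int
demo_trim_ocount(
	const uint64_t	*inos,
	unsigned int	ocount,
	uint64_t	startino)
{
	const uint64_t	limit_ino = startino + DEMO_CHUNKSIZE;

	while (ocount > 0 && inos[ocount - 1] >= limit_ino)
		ocount--;
	return ocount;
}

/* A reply for startino = 30 that spilled over into later chunks. */
static const uint64_t demo_inos[] = { 30, 31, 50, 94, 200 };
```

With startino = 30 the limit is inode 94, so only the first three sample
entries would be kept for this worker.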

Cc: <linux-xfs@vger.kernel.org> # v5.3.0
Fixes: e3724c8b82a320 ("xfs_scrub: refactor xfs_iterate_inodes_range_check")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/inodes.c |   28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index 4d8b137a698004..a7ea24615e9255 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -50,15 +50,17 @@
  */
 static void
 bulkstat_for_inumbers(
-	struct scrub_ctx	*ctx,
-	struct descr		*dsc,
-	const struct xfs_inumbers *inumbers,
-	struct xfs_bulkstat_req	*breq)
+	struct scrub_ctx		*ctx,
+	struct descr			*dsc,
+	const struct xfs_inumbers	*inumbers,
+	struct xfs_bulkstat_req		*breq)
 {
-	struct xfs_bulkstat	*bstat = breq->bulkstat;
-	struct xfs_bulkstat	*bs;
-	int			i;
-	int			error;
+	struct xfs_bulkstat		*bstat = breq->bulkstat;
+	struct xfs_bulkstat		*bs;
+	const uint64_t			limit_ino =
+		inumbers->xi_startino + LIBFROG_BULKSTAT_CHUNKSIZE;
+	int				i;
+	int				error;
 
 	assert(inumbers->xi_allocmask != 0);
 
@@ -73,6 +75,16 @@ bulkstat_for_inumbers(
 			 strerror_r(error, errbuf, DESCR_BUFSZ));
 	}
 
+	/*
+	 * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE.  Reduce
+	 * ocount to ignore inodes not described by the inumbers record.
+	 */
+	for (i = breq->hdr.ocount - 1; i >= 0; i--) {
+		if (breq->bulkstat[i].bs_ino < limit_ino)
+			break;
+		breq->hdr.ocount--;
+	}
+
 	/*
 	 * Check each of the stats we got back to make sure we got the inodes
 	 * we asked for.



* [PATCH 11/17] xfs_scrub: don't (re)set the bulkstat request icount incorrectly
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (9 preceding siblings ...)
  2025-02-06 22:33   ` [PATCH 10/17] xfs_scrub: don't double-scan inodes during phase 3 Darrick J. Wong
@ 2025-02-06 22:33   ` Darrick J. Wong
  2025-02-07  4:42     ` Christoph Hellwig
  2025-02-06 22:33   ` [PATCH 12/17] xfs_scrub: don't complain if bulkstat fails Darrick J. Wong
                     ` (5 subsequent siblings)
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:33 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Don't change the bulkstat request icount in bulkstat_for_inumbers
because alloc_ichunk already set it to LIBFROG_BULKSTAT_CHUNKSIZE.
Lowering it to xi_alloccount here means that we can miss inodes at the
end of the inumbers chunk if any are allocated to the same inobt record
after the inumbers call but before the bulkstat call.

Cc: <linux-xfs@vger.kernel.org> # v5.3.0
Fixes: e3724c8b82a320 ("xfs_scrub: refactor xfs_iterate_inodes_range_check")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/inodes.c |    1 -
 1 file changed, 1 deletion(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index a7ea24615e9255..4e4408f9ff2256 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -66,7 +66,6 @@ bulkstat_for_inumbers(
 
 	/* First we try regular bulkstat, for speed. */
 	breq->hdr.ino = inumbers->xi_startino;
-	breq->hdr.icount = inumbers->xi_alloccount;
 	error = -xfrog_bulkstat(&ctx->mnt, breq);
 	if (error) {
 		char	errbuf[DESCR_BUFSZ];



* [PATCH 12/17] xfs_scrub: don't complain if bulkstat fails
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (10 preceding siblings ...)
  2025-02-06 22:33   ` [PATCH 11/17] xfs_scrub: don't (re)set the bulkstat request icount incorrectly Darrick J. Wong
@ 2025-02-06 22:33   ` Darrick J. Wong
  2025-02-07  4:42     ` Christoph Hellwig
  2025-02-06 22:34   ` [PATCH 13/17] xfs_scrub: return early from bulkstat_for_inumbers if no bulkstat data Darrick J. Wong
                     ` (4 subsequent siblings)
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:33 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If bulkstat fails, we fall back to loading the bulkstat array one
element at a time.  There's no reason to log errors.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/inodes.c |   11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index 4e4408f9ff2256..4d3ec07b2d9862 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -51,7 +51,6 @@
 static void
 bulkstat_for_inumbers(
 	struct scrub_ctx		*ctx,
-	struct descr			*dsc,
 	const struct xfs_inumbers	*inumbers,
 	struct xfs_bulkstat_req		*breq)
 {
@@ -66,13 +65,7 @@ bulkstat_for_inumbers(
 
 	/* First we try regular bulkstat, for speed. */
 	breq->hdr.ino = inumbers->xi_startino;
-	error = -xfrog_bulkstat(&ctx->mnt, breq);
-	if (error) {
-		char	errbuf[DESCR_BUFSZ];
-
-		str_info(ctx, descr_render(dsc), "%s",
-			 strerror_r(error, errbuf, DESCR_BUFSZ));
-	}
+	xfrog_bulkstat(&ctx->mnt, breq);
 
 	/*
 	 * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE.  Reduce
@@ -239,7 +232,7 @@ scan_ag_bulkstat(
 	descr_set(&dsc_inumbers, &agno);
 	handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
 retry:
-	bulkstat_for_inumbers(ctx, &dsc_inumbers, inumbers, breq);
+	bulkstat_for_inumbers(ctx, inumbers, breq);
 
 	/* Iterate all the inodes. */
 	for (i = 0; !si->aborted && i < breq->hdr.ocount; i++, bs++) {



* [PATCH 13/17] xfs_scrub: return early from bulkstat_for_inumbers if no bulkstat data
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (11 preceding siblings ...)
  2025-02-06 22:33   ` [PATCH 12/17] xfs_scrub: don't complain if bulkstat fails Darrick J. Wong
@ 2025-02-06 22:34   ` Darrick J. Wong
  2025-02-07  4:43     ` Christoph Hellwig
  2025-02-06 22:34   ` [PATCH 14/17] xfs_scrub: don't blow away new inodes in bulkstat_single_step Darrick J. Wong
                     ` (3 subsequent siblings)
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:34 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If bulkstat doesn't return an error code or any bulkstat records, we've
hit the end of the filesystem, so return early.  This can happen if the
inumbers data came from the very last inobt record in the filesystem and
every inode in that inobt record is freed immediately after INUMBERS.
There's no bug here; it's just a minor optimization.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/inodes.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index 4d3ec07b2d9862..3b9026ce8fa2f4 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -65,7 +65,9 @@ bulkstat_for_inumbers(
 
 	/* First we try regular bulkstat, for speed. */
 	breq->hdr.ino = inumbers->xi_startino;
-	xfrog_bulkstat(&ctx->mnt, breq);
+	error = -xfrog_bulkstat(&ctx->mnt, breq);
+	if (!error && !breq->hdr.ocount)
+		return;
 
 	/*
 	 * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE.  Reduce



* [PATCH 14/17] xfs_scrub: don't blow away new inodes in bulkstat_single_step
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (12 preceding siblings ...)
  2025-02-06 22:34   ` [PATCH 13/17] xfs_scrub: return early from bulkstat_for_inumbers if no bulkstat data Darrick J. Wong
@ 2025-02-06 22:34   ` Darrick J. Wong
  2025-02-07  4:46     ` Christoph Hellwig
  2025-02-06 22:34   ` [PATCH 15/17] xfs_scrub: hoist the phase3 bulkstat single stepping code Darrick J. Wong
                     ` (2 subsequent siblings)
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:34 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

bulkstat_single_step has an ugly misfeature -- given the inumbers
record, it expects to find bulkstat data for those inodes, in the exact
order that they were specified in inumbers.  If a new inode is created
after inumbers but before bulkstat, bulkstat will return stat data for
that inode, only to have bulkstat_single_step obliterate it.  Then we
fail to scan that inode.

Instead, we should use the returned bulkstat array to compute a bitmask
of inodes that bulkstat had to have seen while it was walking the inobt.
An important detail is that any inode between the @ino parameter passed
to bulkstat and the last bulkstat record it returns was seen, even if no
bstat record was produced.

Any inode set in xi_allocmask but not set in the seen_mask is missing
and needs to be loaded.  Load bstat data for those inodes into the /end/
of the array so that we don't obliterate bstat data for a newly created
inode, then re-sort the array so we always scan in ascending inumber
order.
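
To make the mask arithmetic concrete, here is a simplified standalone
sketch.  It is loosely modelled on the seen_mask_from_bulkstat helper in
the diff below but is not a copy of it: the demo_* names,
DEMO_CHUNKSIZE and the flattened argument list are assumptions made for
illustration:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CHUNKSIZE	64	/* assumed number of inodes per inobt record */

/*
 * Build a mask of chunk-relative inodes that bulkstat must have walked past:
 * everything from the inumber it was asked to start at (req_start) up to the
 * last inumber it returned (last_ino), clamped to the end of the chunk.
 */
static uint64_t
demo_seen_mask(
	uint64_t	chunk_start,
	uint64_t	req_start,
	uint64_t	last_ino)
{
	const uint64_t	limit_ino = chunk_start + DEMO_CHUNKSIZE;
	uint64_t	mask = 0;
	uint64_t	i, maxi;

	/* The results say nothing about this chunk. */
	if (req_start < chunk_start || req_start >= limit_ino ||
	    last_ino < chunk_start)
		return 0;

	maxi = last_ino - chunk_start + 1;
	if (maxi > DEMO_CHUNKSIZE)
		maxi = DEMO_CHUNKSIZE;
	for (i = req_start - chunk_start; i < maxi; i++)
		mask |= 1ULL << i;
	return mask;
}

/* Inodes marked allocated by INUMBERS that bulkstat never walked past. */
static uint64_t
demo_missing_mask(
	uint64_t	allocmask,
	uint64_t	seen_mask)
{
	return allocmask & ~seen_mask;
}
```

For a chunk starting at inode 128 with allocmask 0b11011, a bulkstat
reply whose last record is inode 131 covers bits 0 through 3, which
leaves only bit 4 (inode 132) to be loaded one at a time.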

Cc: <linux-xfs@vger.kernel.org> # v5.18.0
Fixes: 245c72a6eeb720 ("xfs_scrub: balance inode chunk scan across CPUs")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/inodes.c |  144 ++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 123 insertions(+), 21 deletions(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index 3b9026ce8fa2f4..ffdf0f2ae42c17 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -26,16 +26,41 @@
  *
  * This is a little more involved than repeatedly asking BULKSTAT for a
  * buffer's worth of stat data for some number of inodes.  We want to scan as
- * many of the inodes that the inobt thinks there are, including the ones that
- * are broken, but if we ask for n inodes starting at x, it'll skip the bad
- * ones and fill from beyond the range (x + n).
- *
- * Therefore, we ask INUMBERS to return one inobt chunk's worth of inode
- * bitmap information.  Then we try to BULKSTAT only the inodes that were
- * present in that chunk, and compare what we got against what INUMBERS said
- * was there.  If there's a mismatch, we know that we have an inode that fails
- * the verifiers but we can inject the bulkstat information to force the scrub
- * code to deal with the broken inodes.
+ * many of the inodes that the inobt thinks there are, so we use the INUMBERS
+ * ioctl to walk all the inobt records in the filesystem and spawn a worker to
+ * bulkstat and iterate.  The worker starts with an inumbers record that can
+ * look like this:
+ *
+ * {startino = S, allocmask = 0b11011}
+ *
+ * Given a starting inumber S and count C=64, bulkstat will return a sorted
+ * array of stat information.  The bs_ino of those array elements can look like
+ * any of the following:
+ *
+ * 0. [S, S+1, S+3, S+4]
+ * 1. [S+e, S+e+1, S+e+3, S+e+4, S+e+C+1...], where e >= 0
+ * 2. [S+e+n], where n >= 0
+ * 3. []
+ * 4. [], errno == EFSCORRUPTED
+ *
+ * We know that bulkstat scanned the entire inode range between S and bs_ino of
+ * the last array element, even though it only fills out an array element for
+ * allocated inodes.  Therefore, we can say in cases 0-2 that S was filled,
+ * even if there is no bstat[] record for S.  In turn, we can create a bitmask
+ * of inodes that we have seen, and set bits 0 through (bstat[-1].bs_ino - S),
+ * being careful not to set any bits past S+C.
+ *
+ * In case (0) we find that seen mask matches the inumber record
+ * exactly, so the caller can walk the stat records and move on.  In case (1)
+ * this is also true, but we must be careful to reduce the array length to
+ * avoid scanning inodes that are not in the inumber chunk.  In case (3) we
+ * conclude that there were no inodes left to scan and terminate.
+ *
+ * Inodes that are set in the allocmask but not set in the seen mask are the
+ * corrupt inodes.  For each of these cases, we try to populate the bulkstat
+ * array one inode at a time.  If the kernel returns a matching record we can
+ * use it; if instead we receive an error, we synthesize enough of a record
+ * to be able to run online scrub by handle.
  *
  * If the iteration function returns ESTALE, that means that the inode has
  * been deleted and possibly recreated since the BULKSTAT call.  We wil
@@ -43,6 +68,57 @@
  * the staleness as an error.
  */
 
+/*
+ * Return the inumber of the highest inode in the bulkstat data, assuming the
+ * records are sorted in inumber order.
+ */
+static inline uint64_t last_bstat_ino(const struct xfs_bulkstat_req *b)
+{
+	return b->hdr.ocount ? b->bulkstat[b->hdr.ocount - 1].bs_ino : 0;
+}
+
+/*
+ * Deduce the bitmask of the inodes in inums that were seen by bulkstat.  If
+ * the inode is present in the bstat array this is trivially true; or if it is
+ * not in the array but higher inumbers are present, then it was freed.
+ */
+static __u64
+seen_mask_from_bulkstat(
+	const struct xfs_inumbers	*inums,
+	__u64				breq_startino,
+	const struct xfs_bulkstat_req	*breq)
+{
+	const __u64			limit_ino =
+		inums->xi_startino + LIBFROG_BULKSTAT_CHUNKSIZE;
+	const __u64			last = last_bstat_ino(breq);
+	__u64				ret = 0;
+	int				i, maxi;
+
+	/* Ignore the bulkstat results if they don't cover inumbers */
+	if (breq_startino > limit_ino || last < inums->xi_startino)
+		return 0;
+
+	maxi = min(LIBFROG_BULKSTAT_CHUNKSIZE, last - inums->xi_startino + 1);
+	for (i = breq_startino - inums->xi_startino; i < maxi; i++)
+		ret |= 1ULL << i;
+
+	return ret;
+}
+
+#define cmp_int(l, r)		((l > r) - (l < r))
+
+/* Compare two bulkstat records by inumber. */
+static int
+compare_bstat(
+	const void		*a,
+	const void		*b)
+{
+	const struct xfs_bulkstat *ba = a;
+	const struct xfs_bulkstat *bb = b;
+
+	return cmp_int(ba->bs_ino, bb->bs_ino);
+}
+
 /*
  * Run bulkstat on an entire inode allocation group, then check that we got
  * exactly the inodes we expected.  If not, load them one at a time (or fake
@@ -54,10 +130,10 @@ bulkstat_for_inumbers(
 	const struct xfs_inumbers	*inumbers,
 	struct xfs_bulkstat_req		*breq)
 {
-	struct xfs_bulkstat		*bstat = breq->bulkstat;
-	struct xfs_bulkstat		*bs;
+	struct xfs_bulkstat		*bs = NULL;
 	const uint64_t			limit_ino =
 		inumbers->xi_startino + LIBFROG_BULKSTAT_CHUNKSIZE;
+	uint64_t			seen_mask = 0;
 	int				i;
 	int				error;
 
@@ -66,8 +142,12 @@ bulkstat_for_inumbers(
 	/* First we try regular bulkstat, for speed. */
 	breq->hdr.ino = inumbers->xi_startino;
 	error = -xfrog_bulkstat(&ctx->mnt, breq);
-	if (!error && !breq->hdr.ocount)
-		return;
+	if (!error) {
+		if (!breq->hdr.ocount)
+			return;
+		seen_mask |= seen_mask_from_bulkstat(inumbers,
+					inumbers->xi_startino, breq);
+	}
 
 	/*
 	 * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE.  Reduce
@@ -80,18 +160,33 @@ bulkstat_for_inumbers(
 	}
 
 	/*
-	 * Check each of the stats we got back to make sure we got the inodes
-	 * we asked for.
+	 * Walk the xi_allocmask looking for set bits that aren't present in
+	 * the fill mask.  For each such inode, fill the entries at the end of
+	 * the array with stat information one at a time, synthesizing them if
+	 * necessary.  At this point, (xi_allocmask & ~seen_mask) should be the
+	 * corrupt inodes.
 	 */
-	for (i = 0, bs = bstat; i < LIBFROG_BULKSTAT_CHUNKSIZE; i++) {
+	for (i = 0; i < LIBFROG_BULKSTAT_CHUNKSIZE; i++) {
+		/*
+		 * Don't single-step if inumbers said it wasn't allocated or
+		 * bulkstat actually filled it.
+		 */
 		if (!(inumbers->xi_allocmask & (1ULL << i)))
 			continue;
-		if (bs->bs_ino == inumbers->xi_startino + i) {
-			bs++;
+		if (seen_mask & (1ULL << i))
 			continue;
-		}
 
-		/* Load the one inode. */
+		assert(breq->hdr.ocount < LIBFROG_BULKSTAT_CHUNKSIZE);
+
+		if (!bs)
+			bs = &breq->bulkstat[breq->hdr.ocount];
+
+		/*
+		 * Didn't get desired stat data and we've hit the end of the
+		 * returned data.  We can't distinguish between the inode being
+		 * freed vs. the inode being too corrupt to load, so try a
+		 * bulkstat single to see if we can load the inode.
+		 */
 		error = -xfrog_bulkstat_single(&ctx->mnt,
 				inumbers->xi_startino + i, breq->hdr.flags, bs);
 		if (error || bs->bs_ino != inumbers->xi_startino + i) {
@@ -99,8 +194,15 @@ bulkstat_for_inumbers(
 			bs->bs_ino = inumbers->xi_startino + i;
 			bs->bs_blksize = ctx->mnt_sv.f_frsize;
 		}
+
+		breq->hdr.ocount++;
 		bs++;
 	}
+
+	/* If we added any entries, re-sort the array. */
+	if (bs)
+		qsort(breq->bulkstat, breq->hdr.ocount,
+				sizeof(struct xfs_bulkstat), compare_bstat);
 }
 
 /* BULKSTAT wrapper routines. */



* [PATCH 15/17] xfs_scrub: hoist the phase3 bulkstat single stepping code
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (13 preceding siblings ...)
  2025-02-06 22:34   ` [PATCH 14/17] xfs_scrub: don't blow away new inodes in bulkstat_single_step Darrick J. Wong
@ 2025-02-06 22:34   ` Darrick J. Wong
  2025-02-07  4:47     ` Christoph Hellwig
  2025-02-06 22:34   ` [PATCH 16/17] xfs_scrub: ignore freed inodes when single-stepping during phase 3 Darrick J. Wong
  2025-02-06 22:35   ` [PATCH 17/17] xfs_scrub: try harder to fill the bulkstat array with bulkstat() Darrick J. Wong
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:34 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

We're about to make the bulkstat single step loading code more complex,
so hoist it into a separate function.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/inodes.c |   89 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 53 insertions(+), 36 deletions(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index ffdf0f2ae42c17..84696a5bcda7d1 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -120,52 +120,23 @@ compare_bstat(
 }
 
 /*
- * Run bulkstat on an entire inode allocation group, then check that we got
- * exactly the inodes we expected.  If not, load them one at a time (or fake
- * it) into the bulkstat data.
+ * Walk the xi_allocmask looking for set bits that aren't present in
+ * the fill mask.  For each such inode, fill the entries at the end of
+ * the array with stat information one at a time, synthesizing them if
+ * necessary.  At this point, (xi_allocmask & ~seen_mask) should be the
+ * corrupt inodes.
  */
 static void
-bulkstat_for_inumbers(
+bulkstat_single_step(
 	struct scrub_ctx		*ctx,
 	const struct xfs_inumbers	*inumbers,
+	uint64_t			seen_mask,
 	struct xfs_bulkstat_req		*breq)
 {
 	struct xfs_bulkstat		*bs = NULL;
-	const uint64_t			limit_ino =
-		inumbers->xi_startino + LIBFROG_BULKSTAT_CHUNKSIZE;
-	uint64_t			seen_mask = 0;
 	int				i;
 	int				error;
 
-	assert(inumbers->xi_allocmask != 0);
-
-	/* First we try regular bulkstat, for speed. */
-	breq->hdr.ino = inumbers->xi_startino;
-	error = -xfrog_bulkstat(&ctx->mnt, breq);
-	if (!error) {
-		if (!breq->hdr.ocount)
-			return;
-		seen_mask |= seen_mask_from_bulkstat(inumbers,
-					inumbers->xi_startino, breq);
-	}
-
-	/*
-	 * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE.  Reduce
-	 * ocount to ignore inodes not described by the inumbers record.
-	 */
-	for (i = breq->hdr.ocount - 1; i >= 0; i--) {
-		if (breq->bulkstat[i].bs_ino < limit_ino)
-			break;
-		breq->hdr.ocount--;
-	}
-
-	/*
-	 * Walk the xi_allocmask looking for set bits that aren't present in
-	 * the fill mask.  For each such inode, fill the entries at the end of
-	 * the array with stat information one at a time, synthesizing them if
-	 * necessary.  At this point, (xi_allocmask & ~seen_mask) should be the
-	 * corrupt inodes.
-	 */
 	for (i = 0; i < LIBFROG_BULKSTAT_CHUNKSIZE; i++) {
 		/*
 		 * Don't single-step if inumbers said it wasn't allocated or
@@ -205,6 +176,52 @@ bulkstat_for_inumbers(
 				sizeof(struct xfs_bulkstat), compare_bstat);
 }
 
+/*
+ * Run bulkstat on an entire inode allocation group, then check that we got
+ * exactly the inodes we expected.  If not, load them one at a time (or fake
+ * it) into the bulkstat data.
+ */
+static void
+bulkstat_for_inumbers(
+	struct scrub_ctx		*ctx,
+	const struct xfs_inumbers	*inumbers,
+	struct xfs_bulkstat_req		*breq)
+{
+	const uint64_t			limit_ino =
+		inumbers->xi_startino + LIBFROG_BULKSTAT_CHUNKSIZE;
+	uint64_t			seen_mask = 0;
+	int				i;
+	int				error;
+
+	assert(inumbers->xi_allocmask != 0);
+
+	/* First we try regular bulkstat, for speed. */
+	breq->hdr.ino = inumbers->xi_startino;
+	error = -xfrog_bulkstat(&ctx->mnt, breq);
+	if (!error) {
+		if (!breq->hdr.ocount)
+			return;
+		seen_mask |= seen_mask_from_bulkstat(inumbers,
+					inumbers->xi_startino, breq);
+	}
+
+	/*
+	 * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE.  Reduce
+	 * ocount to ignore inodes not described by the inumbers record.
+	 */
+	for (i = breq->hdr.ocount - 1; i >= 0; i--) {
+		if (breq->bulkstat[i].bs_ino < limit_ino)
+			break;
+		breq->hdr.ocount--;
+	}
+
+	/*
+	 * Fill in any missing inodes that are mentioned in the alloc mask but
+	 * weren't previously seen by bulkstat.
+	 */
+	bulkstat_single_step(ctx, inumbers, seen_mask, breq);
+}
+
 /* BULKSTAT wrapper routines. */
 struct scan_inodes {
 	struct workqueue	wq_bulkstat;



* [PATCH 16/17] xfs_scrub: ignore freed inodes when single-stepping during phase 3
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (14 preceding siblings ...)
  2025-02-06 22:34   ` [PATCH 15/17] xfs_scrub: hoist the phase3 bulkstat single stepping code Darrick J. Wong
@ 2025-02-06 22:34   ` Darrick J. Wong
  2025-02-07  4:47     ` Christoph Hellwig
  2025-02-06 22:35   ` [PATCH 17/17] xfs_scrub: try harder to fill the bulkstat array with bulkstat() Darrick J. Wong
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:34 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

For inodes that inumbers told us were allocated but weren't loaded by
the bulkstat call, we fall back to loading bulkstat data one inode at a
time to try to find the inodes that are too corrupt to load.

However, there are a couple of outcomes of the single bulkstat call that
clearly indicate that the inode is free, not corrupt.  In this case, the
phase 3 inode scan will try to scrub the inode, only to be told ENOENT
because it doesn't exist.

As an optimization here, don't increment ocount, just move on to the
next inode in the mask.
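
A minimal sketch of that decision, for illustration only.  The names
classify_single_step and STEP_* are hypothetical and do not appear in
scrub/inodes.c; the sketch only models how the error code and the
returned inumber select between skipping the inode, using the record,
and synthesizing one:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical outcome names; these are not part of xfs_scrub. */
enum step_outcome {
	STEP_SKIP_FREED,	/* inode is free, do not bump ocount */
	STEP_USE_RECORD,	/* kernel returned the record we asked for */
	STEP_SYNTHESIZE,	/* fake a record so phase 3 can examine it */
};

/*
 * Classify the result of one single-inode bulkstat call.  @error is the
 * positive errno (0 on success), @wanted_ino is the inode we asked for,
 * and @returned_ino is the inumber in the record that came back.
 */
static enum step_outcome
classify_single_step(int error, uint64_t wanted_ino, uint64_t returned_ino)
{
	switch (error) {
	case ENOENT:
		/* Nothing was returned at all; treat the inode as gone. */
		return STEP_SKIP_FREED;
	case 0:
		/* A different inode came back, so the one we wanted was freed. */
		if (returned_ino != wanted_ino)
			return STEP_SKIP_FREED;
		return STEP_USE_RECORD;
	default:
		/* Any other error might be corruption; let phase 3 look. */
		return STEP_SYNTHESIZE;
	}
}
```

In this model only STEP_USE_RECORD and STEP_SYNTHESIZE would lead to
ocount being incremented.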

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/inodes.c |   26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index 84696a5bcda7d1..24a1dcab94c22d 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -160,10 +160,34 @@ bulkstat_single_step(
 		 */
 		error = -xfrog_bulkstat_single(&ctx->mnt,
 				inumbers->xi_startino + i, breq->hdr.flags, bs);
-		if (error || bs->bs_ino != inumbers->xi_startino + i) {
+		switch (error) {
+		case ENOENT:
+			/*
+			 * This inode wasn't found, and no results were
+			 * returned.  We've likely hit the end of the
+			 * filesystem, but we'll move on to the next inode in
+			 * the mask for the sake of caution.
+			 */
+			continue;
+		case 0:
+			/*
+			 * If a result was returned but it wasn't the inode
+			 * we were looking for, then the missing inode was
+			 * freed.  Move on to the next inode in the mask.
+			 */
+			if (bs->bs_ino != inumbers->xi_startino + i)
+				continue;
+			break;
+		default:
+			/*
+			 * Some error happened.  Synthesize a bulkstat record
+			 * so that phase3 can try to see if there's a corrupt
+			 * inode that needs repairing.
+			 */
 			memset(bs, 0, sizeof(struct xfs_bulkstat));
 			bs->bs_ino = inumbers->xi_startino + i;
 			bs->bs_blksize = ctx->mnt_sv.f_frsize;
+			break;
 		}
 
 		breq->hdr.ocount++;



* [PATCH 17/17] xfs_scrub: try harder to fill the bulkstat array with bulkstat()
  2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
                     ` (15 preceding siblings ...)
  2025-02-06 22:34   ` [PATCH 16/17] xfs_scrub: ignore freed inodes when single-stepping during phase 3 Darrick J. Wong
@ 2025-02-06 22:35   ` Darrick J. Wong
  2025-02-07  4:48     ` Christoph Hellwig
  16 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:35 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Sometimes, the last bulkstat record returned by the first xfrog_bulkstat
call in bulkstat_for_inumbers will contain an inumber less than the
highest allocated inode mentioned in the inumbers record.  This happens
either because the inodes have been freed, or because the kernel
encountered a corrupt inode during bulkstat and stopped filling up the
array.

In both cases, we can call bulkstat again to try to fill up the rest of
the array.  If there are newly allocated inodes, they'll be returned; if
we've truly hit the end of the filesystem, the kernel will return zero
records; and if the first allocated inode is indeed corrupt, the kernel
will return EFSCORRUPTED.

As an optimization to avoid the single-step code, call bulkstat with an
increasing ino parameter until the bulkstat array is full or the kernel
tells us there are no bulkstat records to return.  This speeds things
up a bit in cases where the allocmask is all ones and only the second
inode is corrupt.
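
The retry idea can be sketched against a toy model of the kernel.
Everything below is an assumption made for illustration: fake_chunk,
fake_bulkstat, missing_after_retry and ERR_CORRUPT are hypothetical and
stand in for the real bulkstat interface, which is not reproduced here.
The sketch shows that repeated bulkstat calls with an increasing start
inode leave only the genuinely unloadable inodes for single-stepping:

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK		64	/* inodes described by one inumbers record */
#define ERR_CORRUPT	1	/* stand-in for EFSCORRUPTED */

/* Hypothetical model of one inode chunk; not a real xfsprogs structure. */
struct fake_chunk {
	uint64_t	allocmask;	/* bit i set: inode i is allocated */
	uint64_t	corruptmask;	/* bit i set: inode i cannot be loaded */
};

/*
 * Stand-in for the kernel's bulkstat.  Starting at *ino, mark up to @want
 * loadable allocated inodes in *seen and advance *ino past each one.
 * Returns ERR_CORRUPT if the first allocated inode found is corrupt, and
 * returns early with a short count if a later one is.
 */
static int
fake_bulkstat(const struct fake_chunk *fc, unsigned int *ino,
	      unsigned int want, uint64_t *seen, unsigned int *ocount)
{
	unsigned int	i;

	*ocount = 0;
	for (i = *ino; i < CHUNK && *ocount < want; i++) {
		uint64_t	bit = 1ULL << i;

		if (!(fc->allocmask & bit))
			continue;
		if (fc->corruptmask & bit)
			return *ocount ? 0 : ERR_CORRUPT;
		*seen |= bit;
		(*ocount)++;
		*ino = i + 1;
	}
	return 0;
}

/*
 * Keep calling bulkstat with an increasing start inode until the chunk is
 * exhausted or the array is full.  Returns the inodes that are allocated
 * but were never returned; only those are left for single-stepping.
 */
static uint64_t
missing_after_retry(uint64_t allocmask, uint64_t corruptmask)
{
	struct fake_chunk	fc = { allocmask, corruptmask };
	uint64_t		seen = 0;
	unsigned int		ino = 0;
	unsigned int		filled = 0;

	while (ino < CHUNK && filled < CHUNK) {
		unsigned int	got;
		int		error;

		error = fake_bulkstat(&fc, &ino, CHUNK - filled, &seen, &got);
		if (error == ERR_CORRUPT) {
			/* Advance the start inode by one and try again. */
			ino++;
			continue;
		}
		if (got == 0)
			break;	/* nothing left to return */
		filled += got;
	}
	return allocmask & ~seen;
}
```

With an all-ones allocmask and only the second inode corrupt, the model
leaves exactly one inode for the single-step path instead of 63.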

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libfrog/bitmask.h |    6 +++
 scrub/inodes.c    |  110 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+), 1 deletion(-)


diff --git a/libfrog/bitmask.h b/libfrog/bitmask.h
index 719a6bfd29db38..47e39a1e09d002 100644
--- a/libfrog/bitmask.h
+++ b/libfrog/bitmask.h
@@ -42,4 +42,10 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
 	return 0;
 }
 
+/* Get high bit set out of 64-bit argument, -1 if none set */
+static inline int xfrog_highbit64(uint64_t v)
+{
+	return fls64(v) - 1;
+}
+
 #endif /* __LIBFROG_BITMASK_H_ */
diff --git a/scrub/inodes.c b/scrub/inodes.c
index 24a1dcab94c22d..2f3c87be79f783 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -20,6 +20,8 @@
 #include "libfrog/fsgeom.h"
 #include "libfrog/bulkstat.h"
 #include "libfrog/handle_priv.h"
+#include "bitops.h"
+#include "libfrog/bitmask.h"
 
 /*
  * Iterate a range of inodes.
@@ -56,6 +58,15 @@
  * avoid scanning inodes that are not in the inumber chunk.  In case (3) we
  * conclude that there were no inodes left to scan and terminate.
  *
+ * In case (2) and (4) we don't know why bulkstat returned fewer than C
+ * elements.  We might have found the end of the filesystem, or the kernel
+ * might have found a corrupt inode and stopped.  This we must investigate by
+ * trying to fill out the rest of the bstat array starting with the next
+ * inumber after the last bstat array element filled, and continuing until S'
+ * is beyond S0 + C, or the array is full.  Each time we succeed in loading
+ * new records, the kernel increases S' for us; if instead we encounter case
+ * (4), we can increment S' ourselves.
+ *
  * Inodes that are set in the allocmask but not set in the seen mask are the
  * corrupt inodes.  For each of these cases, we try to populate the bulkstat
  * array one inode at a time.  If the kernel returns a matching record we can
@@ -105,6 +116,87 @@ seen_mask_from_bulkstat(
 	return ret;
 }
 
+/*
+ * Try to fill the rest of orig_breq with bulkstat data by re-running bulkstat
+ * with increasing start_ino until we either hit the end of the inumbers info
+ * or fill up the bstat array with something.  Returns a bitmask of the inodes
+ * within inums that were filled by the bulkstat requests.
+ */
+static __u64
+bulkstat_the_rest(
+	struct scrub_ctx		*ctx,
+	const struct xfs_inumbers	*inums,
+	struct xfs_bulkstat_req		*orig_breq,
+	int				orig_error)
+{
+	struct xfs_bulkstat_req		*new_breq;
+	struct xfs_bulkstat		*old_bstat =
+		&orig_breq->bulkstat[orig_breq->hdr.ocount];
+	const __u64			limit_ino =
+		inums->xi_startino + LIBFROG_BULKSTAT_CHUNKSIZE;
+	__u64				start_ino = orig_breq->hdr.ino;
+	__u64				seen_mask = 0;
+	int				error;
+
+	assert(orig_breq->hdr.ocount < orig_breq->hdr.icount);
+
+	/*
+	 * If the first bulkstat returned a corruption error, that means
+	 * start_ino is corrupt.  Restart instead at the next inumber.
+	 */
+	if (orig_error == EFSCORRUPTED)
+		start_ino++;
+	if (start_ino >= limit_ino)
+		return 0;
+
+	error = -xfrog_bulkstat_alloc_req(
+			orig_breq->hdr.icount - orig_breq->hdr.ocount,
+			start_ino, &new_breq);
+	if (error)
+		return 0;
+	new_breq->hdr.flags = orig_breq->hdr.flags;
+
+	do {
+		/*
+		 * Fill the new bulkstat request with stat data starting at
+		 * start_ino.
+		 */
+		error = -xfrog_bulkstat(&ctx->mnt, new_breq);
+		if (error == EFSCORRUPTED) {
+			/*
+			 * start_ino is corrupt, increment and try the next
+			 * inode.
+			 */
+			start_ino++;
+			new_breq->hdr.ino = start_ino;
+			continue;
+		}
+		if (error) {
+			/*
+			 * Any other error means the caller falls back to
+			 * single stepping.
+			 */
+			break;
+		}
+		if (new_breq->hdr.ocount == 0)
+			break;
+
+		/* Copy new results to the original bstat buffer */
+		memcpy(old_bstat, new_breq->bulkstat,
+		       new_breq->hdr.ocount * sizeof(struct xfs_bulkstat));
+		orig_breq->hdr.ocount += new_breq->hdr.ocount;
+		old_bstat += new_breq->hdr.ocount;
+		seen_mask |= seen_mask_from_bulkstat(inums, start_ino,
+					new_breq);
+
+		new_breq->hdr.icount -= new_breq->hdr.ocount;
+		start_ino = new_breq->hdr.ino;
+	} while (new_breq->hdr.icount > 0 && new_breq->hdr.ino < limit_ino);
+
+	free(new_breq);
+	return seen_mask;
+}
+
 #define cmp_int(l, r)		((l > r) - (l < r))
 
 /* Compare two bulkstat records by inumber. */
@@ -200,6 +292,12 @@ bulkstat_single_step(
 				sizeof(struct xfs_bulkstat), compare_bstat);
 }
 
+/* Return the inumber of the highest allocated inode in the inumbers data. */
+static inline uint64_t last_allocmask_ino(const struct xfs_inumbers *i)
+{
+	return i->xi_startino + xfrog_highbit64(i->xi_allocmask);
+}
+
 /*
  * Run bulkstat on an entire inode allocation group, then check that we got
  * exactly the inodes we expected.  If not, load them one at a time (or fake
@@ -229,6 +327,16 @@ bulkstat_for_inumbers(
 					inumbers->xi_startino, breq);
 	}
 
+	/*
+	 * If the last allocated inode as reported by inumbers is higher than
+	 * the last inode reported by bulkstat, two things could have happened.
+	 * Either all the inodes at the high end of the cluster were freed
+	 * since the inumbers call; or bulkstat encountered a corrupt inode and
+	 * returned early.  Try to bulkstat the rest of the array.
+	 */
+	if (last_allocmask_ino(inumbers) > last_bstat_ino(breq))
+		seen_mask |= bulkstat_the_rest(ctx, inumbers, breq, error);
+
 	/*
 	 * Bulkstat might return inodes beyond xi_startino + CHUNKSIZE.  Reduce
 	 * ocount to ignore inodes not described by the inumbers record.
@@ -241,7 +349,7 @@ bulkstat_for_inumbers(
 
 	/*
 	 * Fill in any missing inodes that are mentioned in the alloc mask but
-	 * weren't previously seen by bulkstat.
+	 * weren't previously seen by bulkstat.  These are the corrupt inodes.
 	 */
 	bulkstat_single_step(ctx, inumbers, seen_mask, breq);
 }



* [PATCH 01/56] xfs: tidy up xfs_iroot_realloc
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
@ 2025-02-06 22:35   ` Darrick J. Wong
  2025-02-06 22:35   ` [PATCH 02/56] xfs: refactor the inode fork memory allocation functions Darrick J. Wong
                     ` (54 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:35 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 4f13f0a3fc6ad193e4d144a5e001b7b8f1fc4b7f

Tidy up this function a bit before we start refactoring the memory
handling and move the function to the bmbt code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_inode_fork.c |   83 +++++++++++++++++++++++------------------------
 1 file changed, 40 insertions(+), 43 deletions(-)


diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index a71a5e98bf408b..8d7d943311e9a0 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -380,33 +380,32 @@ xfs_iformat_attr_fork(
  */
 void
 xfs_iroot_realloc(
-	xfs_inode_t		*ip,
+	struct xfs_inode	*ip,
 	int			rec_diff,
 	int			whichfork)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	int			cur_max;
-	struct xfs_ifork	*ifp;
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
 	struct xfs_btree_block	*new_broot;
-	int			new_max;
-	size_t			new_size;
 	char			*np;
 	char			*op;
+	size_t			new_size;
+	short			old_size = ifp->if_broot_bytes;
+	int			cur_max;
+	int			new_max;
 
 	/*
 	 * Handle the degenerate case quietly.
 	 */
-	if (rec_diff == 0) {
+	if (rec_diff == 0)
 		return;
-	}
 
-	ifp = xfs_ifork_ptr(ip, whichfork);
 	if (rec_diff > 0) {
 		/*
 		 * If there wasn't any memory allocated before, just
 		 * allocate it now and get out.
 		 */
-		if (ifp->if_broot_bytes == 0) {
+		if (old_size == 0) {
 			new_size = xfs_bmap_broot_space_calc(mp, rec_diff);
 			ifp->if_broot = kmalloc(new_size,
 						GFP_KERNEL | __GFP_NOFAIL);
@@ -420,13 +419,13 @@ xfs_iroot_realloc(
 		 * location.  The records don't change location because
 		 * they are kept butted up against the btree block header.
 		 */
-		cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, false);
+		cur_max = xfs_bmbt_maxrecs(mp, old_size, false);
 		new_max = cur_max + rec_diff;
 		new_size = xfs_bmap_broot_space_calc(mp, new_max);
 		ifp->if_broot = krealloc(ifp->if_broot, new_size,
 					 GFP_KERNEL | __GFP_NOFAIL);
 		op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
-						     ifp->if_broot_bytes);
+						     old_size);
 		np = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
 						     (int)new_size);
 		ifp->if_broot_bytes = (int)new_size;
@@ -441,52 +440,50 @@ xfs_iroot_realloc(
 	 * if_broot buffer.  It must already exist.  If we go to zero
 	 * records, just get rid of the root and clear the status bit.
 	 */
-	ASSERT((ifp->if_broot != NULL) && (ifp->if_broot_bytes > 0));
-	cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, false);
+	ASSERT(ifp->if_broot != NULL && old_size > 0);
+	cur_max = xfs_bmbt_maxrecs(mp, old_size, false);
 	new_max = cur_max + rec_diff;
 	ASSERT(new_max >= 0);
 	if (new_max > 0)
 		new_size = xfs_bmap_broot_space_calc(mp, new_max);
 	else
 		new_size = 0;
-	if (new_size > 0) {
-		new_broot = kmalloc(new_size, GFP_KERNEL | __GFP_NOFAIL);
-		/*
-		 * First copy over the btree block header.
-		 */
-		memcpy(new_broot, ifp->if_broot,
-			xfs_bmbt_block_len(ip->i_mount));
-	} else {
-		new_broot = NULL;
+	if (new_size == 0) {
+		ifp->if_broot = NULL;
+		ifp->if_broot_bytes = 0;
+		return;
 	}
 
 	/*
-	 * Only copy the keys and pointers if there are any.
+	 * Shrink the btree root by allocating a smaller object and copying the
+	 * fields from the old object to the new object.  krealloc does nothing
+	 * if we realloc downwards.
 	 */
-	if (new_max > 0) {
-		/*
-		 * First copy the keys.
-		 */
-		op = (char *)xfs_bmbt_key_addr(mp, ifp->if_broot, 1);
-		np = (char *)xfs_bmbt_key_addr(mp, new_broot, 1);
-		memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_key_t));
+	new_broot = kmalloc(new_size, GFP_KERNEL | __GFP_NOFAIL);
+	/*
+	 * First copy over the btree block header.
+	 */
+	memcpy(new_broot, ifp->if_broot, xfs_bmbt_block_len(ip->i_mount));
+
+	/*
+	 * First copy the keys.
+	 */
+	op = (char *)xfs_bmbt_key_addr(mp, ifp->if_broot, 1);
+	np = (char *)xfs_bmbt_key_addr(mp, new_broot, 1);
+	memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_key_t));
+
+	/*
+	 * Then copy the pointers.
+	 */
+	op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1, old_size);
+	np = (char *)xfs_bmap_broot_ptr_addr(mp, new_broot, 1, (int)new_size);
+	memcpy(np, op, new_max * (uint)sizeof(xfs_fsblock_t));
 
-		/*
-		 * Then copy the pointers.
-		 */
-		op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
-						     ifp->if_broot_bytes);
-		np = (char *)xfs_bmap_broot_ptr_addr(mp, new_broot, 1,
-						     (int)new_size);
-		memcpy(np, op, new_max * (uint)sizeof(xfs_fsblock_t));
-	}
 	kfree(ifp->if_broot);
 	ifp->if_broot = new_broot;
 	ifp->if_broot_bytes = (int)new_size;
-	if (ifp->if_broot)
-		ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
-			xfs_inode_fork_size(ip, whichfork));
-	return;
+	ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
+	       xfs_inode_fork_size(ip, whichfork));
 }
 
 



* [PATCH 02/56] xfs: refactor the inode fork memory allocation functions
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
  2025-02-06 22:35   ` [PATCH 01/56] xfs: tidy up xfs_iroot_realloc Darrick J. Wong
@ 2025-02-06 22:35   ` Darrick J. Wong
  2025-02-06 22:35   ` [PATCH 03/56] xfs: make xfs_iroot_realloc take the new numrecs instead of deltas Darrick J. Wong
                     ` (53 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:35 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 6c1c55ac3c0512262817a088e805d99aad4c0867

Hoist the code that allocates, frees, and reallocates if_broot into a
single xfs_iroot_krealloc function.  Eventually we're going to push
xfs_iroot_realloc into the btree ops structure to handle multiple
inode-rooted btrees, but first let's separate out the bits that should
stay in xfs_inode_fork.c.
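
As a rough userspace illustration of folding allocate, free, and
reallocate into one helper, here is a sketch that uses malloc, realloc
and free in place of the kernel allocators.  The names fork_root,
must_alloc and root_resize are hypothetical, and aborting on allocation
failure is only an assumed stand-in for __GFP_NOFAIL:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the if_broot / if_broot_bytes pair. */
struct fork_root {
	void	*buf;
	size_t	bytes;
};

/* Userspace stand-in for a no-fail allocation: abort instead of failing. */
static void *
must_alloc(size_t size)
{
	void	*p = malloc(size);

	if (!p)
		abort();
	return p;
}

/* Resize the root buffer to @new_size bytes and return the buffer. */
static void *
root_resize(struct fork_root *r, size_t new_size)
{
	/* No size change?  No action needed. */
	if (new_size == r->bytes)
		return r->buf;

	/* New size is zero, free it. */
	if (new_size == 0) {
		free(r->buf);
		r->buf = NULL;
		r->bytes = 0;
		return NULL;
	}

	/* Shrinking: allocate a smaller object and copy the leading bytes. */
	if (r->bytes > new_size) {
		void	*old = r->buf;

		r->buf = must_alloc(new_size);
		memcpy(r->buf, old, new_size);
		free(old);
		r->bytes = new_size;
		return r->buf;
	}

	/* Growing or first allocation: realloc may return the same object. */
	r->buf = realloc(r->buf, new_size);
	if (!r->buf)
		abort();
	r->bytes = new_size;
	return r->buf;
}
```

Callers then only state the size they want; whether that means an
allocation, a copy, or a free is decided in one place.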

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_inode_fork.c |  116 +++++++++++++++++++++++++++++++----------------
 libxfs/xfs_inode_fork.h |    5 ++
 2 files changed, 82 insertions(+), 39 deletions(-)


diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index 8d7d943311e9a0..8c779c46d8217b 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -176,7 +176,7 @@ xfs_iformat_btree(
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_bmdr_block_t	*dfp;
 	struct xfs_ifork	*ifp;
-	/* REFERENCED */
+	struct xfs_btree_block	*broot;
 	int			nrecs;
 	int			size;
 	int			level;
@@ -209,16 +209,13 @@ xfs_iformat_btree(
 		return -EFSCORRUPTED;
 	}
 
-	ifp->if_broot_bytes = size;
-	ifp->if_broot = kmalloc(size,
-				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
-	ASSERT(ifp->if_broot != NULL);
+	broot = xfs_broot_alloc(ifp, size);
 	/*
 	 * Copy and convert from the on-disk structure
 	 * to the in-memory structure.
 	 */
 	xfs_bmdr_to_bmbt(ip, dfp, XFS_DFORK_SIZE(dip, ip->i_mount, whichfork),
-			 ifp->if_broot, size);
+			 broot, size);
 
 	ifp->if_bytes = 0;
 	ifp->if_data = NULL;
@@ -360,6 +357,69 @@ xfs_iformat_attr_fork(
 	return error;
 }
 
+/*
+ * Allocate the if_broot component of an inode fork so that it is @new_size
+ * bytes in size, using __GFP_NOLOCKDEP like all the other code that
+ * initializes a broot during inode load.  Returns if_broot.
+ */
+struct xfs_btree_block *
+xfs_broot_alloc(
+	struct xfs_ifork	*ifp,
+	size_t			new_size)
+{
+	ASSERT(ifp->if_broot == NULL);
+
+	ifp->if_broot = kmalloc(new_size,
+				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
+	ifp->if_broot_bytes = new_size;
+	return ifp->if_broot;
+}
+
+/*
+ * Reallocate the if_broot component of an inode fork so that it is @new_size
+ * bytes in size.  Returns if_broot.
+ */
+struct xfs_btree_block *
+xfs_broot_realloc(
+	struct xfs_ifork	*ifp,
+	size_t			new_size)
+{
+	/* No size change?  No action needed. */
+	if (new_size == ifp->if_broot_bytes)
+		return ifp->if_broot;
+
+	/* New size is zero, free it. */
+	if (new_size == 0) {
+		ifp->if_broot_bytes = 0;
+		kfree(ifp->if_broot);
+		ifp->if_broot = NULL;
+		return NULL;
+	}
+
+	/*
+	 * Shrinking the iroot means we allocate a new smaller object and copy
+	 * it.  We don't trust krealloc not to nop on realloc-down.
+	 */
+	if (ifp->if_broot_bytes > 0 && ifp->if_broot_bytes > new_size) {
+		struct xfs_btree_block	*old_broot = ifp->if_broot;
+
+		ifp->if_broot = kmalloc(new_size, GFP_KERNEL | __GFP_NOFAIL);
+		ifp->if_broot_bytes = new_size;
+		memcpy(ifp->if_broot, old_broot, new_size);
+		kfree(old_broot);
+		return ifp->if_broot;
+	}
+
+	/*
+	 * Growing the iroot means we can krealloc.  This may get us the same
+	 * object.
+	 */
+	ifp->if_broot = krealloc(ifp->if_broot, new_size,
+			GFP_KERNEL | __GFP_NOFAIL);
+	ifp->if_broot_bytes = new_size;
+	return ifp->if_broot;
+}
+
 /*
  * Reallocate the space for if_broot based on the number of records
  * being added or deleted as indicated in rec_diff.  Move the records
@@ -386,7 +446,6 @@ xfs_iroot_realloc(
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
-	struct xfs_btree_block	*new_broot;
 	char			*np;
 	char			*op;
 	size_t			new_size;
@@ -407,9 +466,7 @@ xfs_iroot_realloc(
 		 */
 		if (old_size == 0) {
 			new_size = xfs_bmap_broot_space_calc(mp, rec_diff);
-			ifp->if_broot = kmalloc(new_size,
-						GFP_KERNEL | __GFP_NOFAIL);
-			ifp->if_broot_bytes = (int)new_size;
+			xfs_broot_realloc(ifp, new_size);
 			return;
 		}
 
@@ -422,13 +479,12 @@ xfs_iroot_realloc(
 		cur_max = xfs_bmbt_maxrecs(mp, old_size, false);
 		new_max = cur_max + rec_diff;
 		new_size = xfs_bmap_broot_space_calc(mp, new_max);
-		ifp->if_broot = krealloc(ifp->if_broot, new_size,
-					 GFP_KERNEL | __GFP_NOFAIL);
+
+		xfs_broot_realloc(ifp, new_size);
 		op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
 						     old_size);
 		np = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
 						     (int)new_size);
-		ifp->if_broot_bytes = (int)new_size;
 		ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
 			xfs_inode_fork_size(ip, whichfork));
 		memmove(np, op, cur_max * (uint)sizeof(xfs_fsblock_t));
@@ -449,39 +505,21 @@ xfs_iroot_realloc(
 	else
 		new_size = 0;
 	if (new_size == 0) {
-		ifp->if_broot = NULL;
-		ifp->if_broot_bytes = 0;
+		xfs_broot_realloc(ifp, 0);
 		return;
 	}
 
 	/*
-	 * Shrink the btree root by allocating a smaller object and copying the
-	 * fields from the old object to the new object.  krealloc does nothing
-	 * if we realloc downwards.
-	 */
-	new_broot = kmalloc(new_size, GFP_KERNEL | __GFP_NOFAIL);
-	/*
-	 * First copy over the btree block header.
-	 */
-	memcpy(new_broot, ifp->if_broot, xfs_bmbt_block_len(ip->i_mount));
-
-	/*
-	 * First copy the keys.
-	 */
-	op = (char *)xfs_bmbt_key_addr(mp, ifp->if_broot, 1);
-	np = (char *)xfs_bmbt_key_addr(mp, new_broot, 1);
-	memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_key_t));
-
-	/*
-	 * Then copy the pointers.
+	 * Shrink the btree root by moving the bmbt pointers, since they are
+	 * not butted up against the btree block header, then reallocating
+	 * broot.
 	 */
 	op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1, old_size);
-	np = (char *)xfs_bmap_broot_ptr_addr(mp, new_broot, 1, (int)new_size);
-	memcpy(np, op, new_max * (uint)sizeof(xfs_fsblock_t));
+	np = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
+					     (int)new_size);
+	memmove(np, op, new_max * (uint)sizeof(xfs_fsblock_t));
 
-	kfree(ifp->if_broot);
-	ifp->if_broot = new_broot;
-	ifp->if_broot_bytes = (int)new_size;
+	xfs_broot_realloc(ifp, new_size);
 	ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
 	       xfs_inode_fork_size(ip, whichfork));
 }
diff --git a/libxfs/xfs_inode_fork.h b/libxfs/xfs_inode_fork.h
index 2373d12fd474f0..e3c5c9121044fd 100644
--- a/libxfs/xfs_inode_fork.h
+++ b/libxfs/xfs_inode_fork.h
@@ -170,6 +170,11 @@ void		xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *,
 void		xfs_idestroy_fork(struct xfs_ifork *ifp);
 void *		xfs_idata_realloc(struct xfs_inode *ip, int64_t byte_diff,
 				int whichfork);
+struct xfs_btree_block *xfs_broot_alloc(struct xfs_ifork *ifp,
+				size_t new_size);
+struct xfs_btree_block *xfs_broot_realloc(struct xfs_ifork *ifp,
+				size_t new_size);
+
 void		xfs_iroot_realloc(struct xfs_inode *, int, int);
 int		xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int);
 int		xfs_iextents_copy(struct xfs_inode *, struct xfs_bmbt_rec *,



* [PATCH 03/56] xfs: make xfs_iroot_realloc take the new numrecs instead of deltas
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
  2025-02-06 22:35   ` [PATCH 01/56] xfs: tidy up xfs_iroot_realloc Darrick J. Wong
  2025-02-06 22:35   ` [PATCH 02/56] xfs: refactor the inode fork memory allocation functions Darrick J. Wong
@ 2025-02-06 22:35   ` Darrick J. Wong
  2025-02-06 22:36   ` [PATCH 04/56] xfs: make xfs_iroot_realloc a bmap btree function Darrick J. Wong
                     ` (52 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:35 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 6a92924275ecdd768c8105f8975b971300c5ba7d

Change the calling signature of xfs_iroot_realloc to take the ifork and
the new number of records in the btree block, not a diff against the
current number.  This will make the callsites easier to understand.

Note that this function is misnamed because it is very specific to the
single type of inode-rooted btree supported.  This will be addressed in
a subsequent patch.

Return the new btree root to reduce the amount of code clutter.
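
To illustrate the difference between the two calling conventions, here
is a small sketch under assumed sizes.  HDR_BYTES, KEY_BYTES, PTR_BYTES
and all four helpers are hypothetical; in particular the real header
length depends on the filesystem format.  The point is only that the
delta form must first recover the current capacity from the buffer
size, while the absolute form needs no such step:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sizes only; these are assumptions, not on-disk constants. */
#define HDR_BYTES	24
#define KEY_BYTES	8
#define PTR_BYTES	8

/* Bytes needed for a root block holding @nrecs key/pointer pairs. */
static size_t
root_space(unsigned int nrecs)
{
	return HDR_BYTES + (size_t)nrecs * (KEY_BYTES + PTR_BYTES);
}

/* Number of key/pointer pairs that fit in a root block of @bytes. */
static unsigned int
root_maxrecs(size_t bytes)
{
	return (bytes - HDR_BYTES) / (KEY_BYTES + PTR_BYTES);
}

/* Old convention: recover the current capacity, then apply a delta. */
static size_t
size_from_delta(size_t old_size, int rec_diff)
{
	int	cur_max = old_size ? (int)root_maxrecs(old_size) : 0;
	int	new_max = cur_max + rec_diff;

	return new_max > 0 ? root_space(new_max) : 0;
}

/* New convention: the caller states the record count it wants. */
static size_t
size_from_numrecs(unsigned int new_numrecs)
{
	return new_numrecs ? root_space(new_numrecs) : 0;
}
```

Both forms arrive at the same buffer size, but a callsite written with
the second form reads as "make room for N records" rather than "change
whatever is there by N".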

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_bmap.c       |    7 ++--
 libxfs/xfs_btree.c      |   25 +++++---------
 libxfs/xfs_inode_fork.c |   83 +++++++++++++++++++++--------------------------
 libxfs/xfs_inode_fork.h |    3 +-
 4 files changed, 51 insertions(+), 67 deletions(-)


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 16ee7403a75f3b..2482f56dbb6ad6 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -609,7 +609,7 @@ xfs_bmap_btree_to_extents(
 	xfs_trans_binval(tp, cbp);
 	if (cur->bc_levels[0].bp == cbp)
 		cur->bc_levels[0].bp = NULL;
-	xfs_iroot_realloc(ip, -1, whichfork);
+	xfs_iroot_realloc(ip, whichfork, 0);
 	ASSERT(ifp->if_broot == NULL);
 	ifp->if_format = XFS_DINODE_FMT_EXTENTS;
 	*logflagsp |= XFS_ILOG_CORE | xfs_ilog_fext(whichfork);
@@ -653,12 +653,11 @@ xfs_bmap_extents_to_btree(
 	 * Make space in the inode incore. This needs to be undone if we fail
 	 * to expand the root.
 	 */
-	xfs_iroot_realloc(ip, 1, whichfork);
+	block = xfs_iroot_realloc(ip, whichfork, 1);
 
 	/*
 	 * Fill in the root.
 	 */
-	block = ifp->if_broot;
 	xfs_bmbt_init_block(ip, block, NULL, 1, 1);
 	/*
 	 * Need a cursor.  Can't allocate until bb_level is filled in.
@@ -740,7 +739,7 @@ xfs_bmap_extents_to_btree(
 out_unreserve_dquot:
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
 out_root_realloc:
-	xfs_iroot_realloc(ip, -1, whichfork);
+	xfs_iroot_realloc(ip, whichfork, 0);
 	ifp->if_format = XFS_DINODE_FMT_EXTENTS;
 	ASSERT(ifp->if_broot == NULL);
 	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index f4c4db62e2069e..85f4f7b3e9813d 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -3159,9 +3159,7 @@ xfs_btree_new_iroot(
 
 	xfs_btree_copy_ptrs(cur, pp, &nptr, 1);
 
-	xfs_iroot_realloc(cur->bc_ino.ip,
-			  1 - xfs_btree_get_numrecs(cblock),
-			  cur->bc_ino.whichfork);
+	xfs_iroot_realloc(cur->bc_ino.ip, cur->bc_ino.whichfork, 1);
 
 	xfs_btree_setbuf(cur, level, cbp);
 
@@ -3345,7 +3343,8 @@ xfs_btree_make_block_unfull(
 
 		if (numrecs < cur->bc_ops->get_dmaxrecs(cur, level)) {
 			/* A root block that can be made bigger. */
-			xfs_iroot_realloc(ip, 1, cur->bc_ino.whichfork);
+			xfs_iroot_realloc(ip, cur->bc_ino.whichfork,
+					numrecs + 1);
 			*stat = 1;
 		} else {
 			/* A root block that needs replacing */
@@ -3703,9 +3702,7 @@ STATIC int
 xfs_btree_kill_iroot(
 	struct xfs_btree_cur	*cur)
 {
-	int			whichfork = cur->bc_ino.whichfork;
 	struct xfs_inode	*ip = cur->bc_ino.ip;
-	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
 	struct xfs_btree_block	*block;
 	struct xfs_btree_block	*cblock;
 	union xfs_btree_key	*kp;
@@ -3714,7 +3711,6 @@ xfs_btree_kill_iroot(
 	union xfs_btree_ptr	*cpp;
 	struct xfs_buf		*cbp;
 	int			level;
-	int			index;
 	int			numrecs;
 	int			error;
 #ifdef DEBUG
@@ -3760,14 +3756,10 @@ xfs_btree_kill_iroot(
 	ASSERT(xfs_btree_ptr_is_null(cur, &ptr));
 #endif
 
-	index = numrecs - cur->bc_ops->get_maxrecs(cur, level);
-	if (index) {
-		xfs_iroot_realloc(cur->bc_ino.ip, index,
-				  cur->bc_ino.whichfork);
-		block = ifp->if_broot;
-	}
+	block = xfs_iroot_realloc(cur->bc_ino.ip, cur->bc_ino.whichfork,
+			numrecs);
 
-	be16_add_cpu(&block->bb_numrecs, index);
+	block->bb_numrecs = be16_to_cpu(numrecs);
 	ASSERT(block->bb_numrecs == cblock->bb_numrecs);
 
 	kp = xfs_btree_key_addr(cur, 1, block);
@@ -3947,10 +3939,11 @@ xfs_btree_delrec(
 	/*
 	 * We're at the root level.  First, shrink the root block in-memory.
 	 * Try to get rid of the next level down.  If we can't then there's
-	 * nothing left to do.
+	 * nothing left to do.  numrecs was decremented above.
 	 */
 	if (xfs_btree_at_iroot(cur, level)) {
-		xfs_iroot_realloc(cur->bc_ino.ip, -1, cur->bc_ino.whichfork);
+		xfs_iroot_realloc(cur->bc_ino.ip, cur->bc_ino.whichfork,
+				numrecs);
 
 		error = xfs_btree_kill_iroot(cur);
 		if (error)
diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index 8c779c46d8217b..baebcbc26eb29b 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -421,12 +421,10 @@ xfs_broot_realloc(
 }
 
 /*
- * Reallocate the space for if_broot based on the number of records
- * being added or deleted as indicated in rec_diff.  Move the records
- * and pointers in if_broot to fit the new size.  When shrinking this
- * will eliminate holes between the records and pointers created by
- * the caller.  When growing this will create holes to be filled in
- * by the caller.
+ * Reallocate the space for if_broot based on the number of records.  Move the
+ * records and pointers in if_broot to fit the new size.  When shrinking this
+ * will eliminate holes between the records and pointers created by the caller.
+ * When growing this will create holes to be filled in by the caller.
  *
  * The caller must not request to add more records than would fit in
  * the on-disk inode root.  If the if_broot is currently NULL, then
@@ -435,40 +433,47 @@ xfs_broot_realloc(
  * it can go to zero.
  *
  * ip -- the inode whose if_broot area is changing
- * ext_diff -- the change in the number of records, positive or negative,
- *	 requested for the if_broot array.
+ * whichfork -- which inode fork to change
+ * new_numrecs -- the new number of records requested for the if_broot array
+ *
+ * Returns the incore btree root block.
  */
-void
+struct xfs_btree_block *
 xfs_iroot_realloc(
 	struct xfs_inode	*ip,
-	int			rec_diff,
-	int			whichfork)
+	int			whichfork,
+	unsigned int		new_numrecs)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
 	char			*np;
 	char			*op;
-	size_t			new_size;
-	short			old_size = ifp->if_broot_bytes;
-	int			cur_max;
-	int			new_max;
+	unsigned int		new_size;
+	unsigned int		old_size = ifp->if_broot_bytes;
 
 	/*
-	 * Handle the degenerate case quietly.
+	 * Block mapping btrees do not support storing zero records; if this
+	 * happens, the fork is being changed to FMT_EXTENTS.  Free the broot
+	 * and get out.
 	 */
-	if (rec_diff == 0)
-		return;
+	if (new_numrecs == 0)
+		return xfs_broot_realloc(ifp, 0);
+
+	new_size = xfs_bmap_broot_space_calc(mp, new_numrecs);
+
+	/* Handle the nop case quietly. */
+	if (new_size == old_size)
+		return ifp->if_broot;
+
+	if (new_size > old_size) {
+		unsigned int	old_numrecs;
 
-	if (rec_diff > 0) {
 		/*
 		 * If there wasn't any memory allocated before, just
 		 * allocate it now and get out.
 		 */
-		if (old_size == 0) {
-			new_size = xfs_bmap_broot_space_calc(mp, rec_diff);
-			xfs_broot_realloc(ifp, new_size);
-			return;
-		}
+		if (old_size == 0)
+			return xfs_broot_realloc(ifp, new_size);
 
 		/*
 		 * If there is already an existing if_broot, then we need
@@ -476,10 +481,7 @@ xfs_iroot_realloc(
 		 * location.  The records don't change location because
 		 * they are kept butted up against the btree block header.
 		 */
-		cur_max = xfs_bmbt_maxrecs(mp, old_size, false);
-		new_max = cur_max + rec_diff;
-		new_size = xfs_bmap_broot_space_calc(mp, new_max);
-
+		old_numrecs = xfs_bmbt_maxrecs(mp, old_size, false);
 		xfs_broot_realloc(ifp, new_size);
 		op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
 						     old_size);
@@ -487,27 +489,15 @@ xfs_iroot_realloc(
 						     (int)new_size);
 		ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
 			xfs_inode_fork_size(ip, whichfork));
-		memmove(np, op, cur_max * (uint)sizeof(xfs_fsblock_t));
-		return;
+		memmove(np, op, old_numrecs * (uint)sizeof(xfs_fsblock_t));
+		return ifp->if_broot;
 	}
 
 	/*
-	 * rec_diff is less than 0.  In this case, we are shrinking the
-	 * if_broot buffer.  It must already exist.  If we go to zero
-	 * records, just get rid of the root and clear the status bit.
+	 * We're reducing, but not totally eliminating, numrecs.  In this case,
+	 * we are shrinking the if_broot buffer, so it must already exist.
 	 */
-	ASSERT(ifp->if_broot != NULL && old_size > 0);
-	cur_max = xfs_bmbt_maxrecs(mp, old_size, false);
-	new_max = cur_max + rec_diff;
-	ASSERT(new_max >= 0);
-	if (new_max > 0)
-		new_size = xfs_bmap_broot_space_calc(mp, new_max);
-	else
-		new_size = 0;
-	if (new_size == 0) {
-		xfs_broot_realloc(ifp, 0);
-		return;
-	}
+	ASSERT(ifp->if_broot != NULL && old_size > 0 && new_size > 0);
 
 	/*
 	 * Shrink the btree root by moving the bmbt pointers, since they are
@@ -517,11 +507,12 @@ xfs_iroot_realloc(
 	op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1, old_size);
 	np = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
 					     (int)new_size);
-	memmove(np, op, new_max * (uint)sizeof(xfs_fsblock_t));
+	memmove(np, op, new_numrecs * (uint)sizeof(xfs_fsblock_t));
 
 	xfs_broot_realloc(ifp, new_size);
 	ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
 	       xfs_inode_fork_size(ip, whichfork));
+	return ifp->if_broot;
 }
 
 
diff --git a/libxfs/xfs_inode_fork.h b/libxfs/xfs_inode_fork.h
index e3c5c9121044fd..d05eb0bad864e1 100644
--- a/libxfs/xfs_inode_fork.h
+++ b/libxfs/xfs_inode_fork.h
@@ -175,7 +175,8 @@ struct xfs_btree_block *xfs_broot_alloc(struct xfs_ifork *ifp,
 struct xfs_btree_block *xfs_broot_realloc(struct xfs_ifork *ifp,
 				size_t new_size);
 
-void		xfs_iroot_realloc(struct xfs_inode *, int, int);
+struct xfs_btree_block *xfs_iroot_realloc(struct xfs_inode *ip, int whichfork,
+				unsigned int new_numrecs);
 int		xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int);
 int		xfs_iextents_copy(struct xfs_inode *, struct xfs_bmbt_rec *,
 				  int);



* [PATCH 04/56] xfs: make xfs_iroot_realloc a bmap btree function
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (2 preceding siblings ...)
  2025-02-06 22:35   ` [PATCH 03/56] xfs: make xfs_iroot_realloc take the new numrecs instead of deltas Darrick J. Wong
@ 2025-02-06 22:36   ` Darrick J. Wong
  2025-02-06 22:36   ` [PATCH 05/56] xfs: tidy up xfs_bmap_broot_realloc a bit Darrick J. Wong
                     ` (51 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:36 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: eb9bff22311ca47ef4848bbdcf24dae06ae3f243

Make the inode fork btree root reallocation function part of the btree
ops because it's now mostly bmbt-specific code.
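
As a rough illustration of the dispatch pattern (not the libxfs code
itself), the sketch below routes a root resize through a per-type ops
table.  Every toy_* name is hypothetical and the root block is reduced
to a single counter; only the shape of the ->broot_realloc hook is
taken from this patch.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_cur;

/* Hypothetical stand-in for the incore root block. */
struct toy_root {
	unsigned int		maxrecs;
};

/* Hypothetical per-btree-type ops table with a root resize hook. */
struct toy_ops {
	const char		*name;
	struct toy_root		*(*broot_realloc)(struct toy_cur *cur,
					unsigned int new_numrecs);
};

struct toy_cur {
	const struct toy_ops	*ops;
	struct toy_root		*root;
};

/* Type-specific implementation: asking for zero records frees the root. */
static struct toy_root *
toy_bmap_broot_realloc(
	struct toy_cur		*cur,
	unsigned int		new_numrecs)
{
	if (new_numrecs == 0) {
		free(cur->root);
		cur->root = NULL;
		return NULL;
	}

	if (!cur->root) {
		cur->root = calloc(1, sizeof(*cur->root));
		assert(cur->root != NULL);
	}
	cur->root->maxrecs = new_numrecs;
	return cur->root;
}

static const struct toy_ops toy_bmap_ops = {
	.name			= "toy-bmap",
	.broot_realloc		= toy_bmap_broot_realloc,
};

/* Generic code only knows about the hook, not the btree type. */
static struct toy_root *
toy_generic_resize_root(
	struct toy_cur		*cur,
	unsigned int		new_numrecs)
{
	return cur->ops->broot_realloc(cur, new_numrecs);
}
```

The generic caller never names the bmap-specific function, which is
what would let another inode-rooted btree type supply its own hook.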

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_bmap.c       |    6 +--
 libxfs/xfs_bmap_btree.c |  104 +++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_bmap_btree.h |    3 +
 libxfs/xfs_btree.c      |   11 ++---
 libxfs/xfs_btree.h      |   16 +++++++
 libxfs/xfs_inode_fork.c |   96 -------------------------------------------
 libxfs/xfs_inode_fork.h |    2 -
 7 files changed, 130 insertions(+), 108 deletions(-)


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 2482f56dbb6ad6..fcb400bc768c2f 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -609,7 +609,7 @@ xfs_bmap_btree_to_extents(
 	xfs_trans_binval(tp, cbp);
 	if (cur->bc_levels[0].bp == cbp)
 		cur->bc_levels[0].bp = NULL;
-	xfs_iroot_realloc(ip, whichfork, 0);
+	xfs_bmap_broot_realloc(ip, whichfork, 0);
 	ASSERT(ifp->if_broot == NULL);
 	ifp->if_format = XFS_DINODE_FMT_EXTENTS;
 	*logflagsp |= XFS_ILOG_CORE | xfs_ilog_fext(whichfork);
@@ -653,7 +653,7 @@ xfs_bmap_extents_to_btree(
 	 * Make space in the inode incore. This needs to be undone if we fail
 	 * to expand the root.
 	 */
-	block = xfs_iroot_realloc(ip, whichfork, 1);
+	block = xfs_bmap_broot_realloc(ip, whichfork, 1);
 
 	/*
 	 * Fill in the root.
@@ -739,7 +739,7 @@ xfs_bmap_extents_to_btree(
 out_unreserve_dquot:
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
 out_root_realloc:
-	xfs_iroot_realloc(ip, whichfork, 0);
+	xfs_bmap_broot_realloc(ip, whichfork, 0);
 	ifp->if_format = XFS_DINODE_FMT_EXTENTS;
 	ASSERT(ifp->if_broot == NULL);
 	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
diff --git a/libxfs/xfs_bmap_btree.c b/libxfs/xfs_bmap_btree.c
index 62e79d8fc49784..baae9ab91fa908 100644
--- a/libxfs/xfs_bmap_btree.c
+++ b/libxfs/xfs_bmap_btree.c
@@ -515,6 +515,109 @@ xfs_bmbt_keys_contiguous(
 				 be64_to_cpu(key2->bmbt.br_startoff));
 }
 
+/*
+ * Reallocate the space for if_broot based on the number of records.  Move the
+ * records and pointers in if_broot to fit the new size.  When shrinking this
+ * will eliminate holes between the records and pointers created by the caller.
+ * When growing this will create holes to be filled in by the caller.
+ *
+ * The caller must not request to add more records than would fit in the
+ * on-disk inode root.  If the if_broot is currently NULL, then if we are
+ * adding records, one will be allocated.  The caller must also not request
+ * that the number of records go below zero, although it can go to zero.
+ *
+ * ip -- the inode whose if_broot area is changing
+ * whichfork -- which inode fork to change
+ * new_numrecs -- the new number of records requested for the if_broot array
+ *
+ * Returns the incore btree root block.
+ */
+struct xfs_btree_block *
+xfs_bmap_broot_realloc(
+	struct xfs_inode	*ip,
+	int			whichfork,
+	unsigned int		new_numrecs)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
+	char			*np;
+	char			*op;
+	unsigned int		new_size;
+	unsigned int		old_size = ifp->if_broot_bytes;
+
+	/*
+	 * Block mapping btrees do not support storing zero records; if this
+	 * happens, the fork is being changed to FMT_EXTENTS.  Free the broot
+	 * and get out.
+	 */
+	if (new_numrecs == 0)
+		return xfs_broot_realloc(ifp, 0);
+
+	new_size = xfs_bmap_broot_space_calc(mp, new_numrecs);
+
+	/* Handle the nop case quietly. */
+	if (new_size == old_size)
+		return ifp->if_broot;
+
+	if (new_size > old_size) {
+		unsigned int	old_numrecs;
+
+		/*
+		 * If there wasn't any memory allocated before, just
+		 * allocate it now and get out.
+		 */
+		if (old_size == 0)
+			return xfs_broot_realloc(ifp, new_size);
+
+		/*
+		 * If there is already an existing if_broot, then we need
+		 * to realloc() it and shift the pointers to their new
+		 * location.  The records don't change location because
+		 * they are kept butted up against the btree block header.
+		 */
+		old_numrecs = xfs_bmbt_maxrecs(mp, old_size, false);
+		xfs_broot_realloc(ifp, new_size);
+		op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
+						     old_size);
+		np = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
+						     (int)new_size);
+		ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
+			xfs_inode_fork_size(ip, whichfork));
+		memmove(np, op, old_numrecs * (uint)sizeof(xfs_fsblock_t));
+		return ifp->if_broot;
+	}
+
+	/*
+	 * We're reducing, but not totally eliminating, numrecs.  In this case,
+	 * we are shrinking the if_broot buffer, so it must already exist.
+	 */
+	ASSERT(ifp->if_broot != NULL && old_size > 0 && new_size > 0);
+
+	/*
+	 * Shrink the btree root by moving the bmbt pointers, since they are
+	 * not butted up against the btree block header, then reallocating
+	 * broot.
+	 */
+	op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1, old_size);
+	np = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
+					     (int)new_size);
+	memmove(np, op, new_numrecs * (uint)sizeof(xfs_fsblock_t));
+
+	xfs_broot_realloc(ifp, new_size);
+	ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
+	       xfs_inode_fork_size(ip, whichfork));
+	return ifp->if_broot;
+}
+
+static struct xfs_btree_block *
+xfs_bmbt_broot_realloc(
+	struct xfs_btree_cur	*cur,
+	unsigned int		new_numrecs)
+{
+	return xfs_bmap_broot_realloc(cur->bc_ino.ip, cur->bc_ino.whichfork,
+			new_numrecs);
+}
+
 const struct xfs_btree_ops xfs_bmbt_ops = {
 	.name			= "bmap",
 	.type			= XFS_BTREE_TYPE_INODE,
@@ -542,6 +645,7 @@ const struct xfs_btree_ops xfs_bmbt_ops = {
 	.keys_inorder		= xfs_bmbt_keys_inorder,
 	.recs_inorder		= xfs_bmbt_recs_inorder,
 	.keys_contiguous	= xfs_bmbt_keys_contiguous,
+	.broot_realloc		= xfs_bmbt_broot_realloc,
 };
 
 /*
diff --git a/libxfs/xfs_bmap_btree.h b/libxfs/xfs_bmap_btree.h
index 49a3bae3f6ecec..b238d559ab0369 100644
--- a/libxfs/xfs_bmap_btree.h
+++ b/libxfs/xfs_bmap_btree.h
@@ -198,4 +198,7 @@ xfs_bmap_bmdr_space(struct xfs_btree_block *bb)
 	return xfs_bmdr_space_calc(be16_to_cpu(bb->bb_numrecs));
 }
 
+struct xfs_btree_block *xfs_bmap_broot_realloc(struct xfs_inode *ip,
+		int whichfork, unsigned int new_numrecs);
+
 #endif	/* __XFS_BMAP_BTREE_H__ */
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 85f4f7b3e9813d..3d2bedc79fc270 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -3159,7 +3159,7 @@ xfs_btree_new_iroot(
 
 	xfs_btree_copy_ptrs(cur, pp, &nptr, 1);
 
-	xfs_iroot_realloc(cur->bc_ino.ip, cur->bc_ino.whichfork, 1);
+	cur->bc_ops->broot_realloc(cur, 1);
 
 	xfs_btree_setbuf(cur, level, cbp);
 
@@ -3343,8 +3343,7 @@ xfs_btree_make_block_unfull(
 
 		if (numrecs < cur->bc_ops->get_dmaxrecs(cur, level)) {
 			/* A root block that can be made bigger. */
-			xfs_iroot_realloc(ip, cur->bc_ino.whichfork,
-					numrecs + 1);
+			cur->bc_ops->broot_realloc(cur, numrecs + 1);
 			*stat = 1;
 		} else {
 			/* A root block that needs replacing */
@@ -3756,8 +3755,7 @@ xfs_btree_kill_iroot(
 	ASSERT(xfs_btree_ptr_is_null(cur, &ptr));
 #endif
 
-	block = xfs_iroot_realloc(cur->bc_ino.ip, cur->bc_ino.whichfork,
-			numrecs);
+	block = cur->bc_ops->broot_realloc(cur, numrecs);
 
 	block->bb_numrecs = be16_to_cpu(numrecs);
 	ASSERT(block->bb_numrecs == cblock->bb_numrecs);
@@ -3942,8 +3940,7 @@ xfs_btree_delrec(
 	 * nothing left to do.  numrecs was decremented above.
 	 */
 	if (xfs_btree_at_iroot(cur, level)) {
-		xfs_iroot_realloc(cur->bc_ino.ip, cur->bc_ino.whichfork,
-				numrecs);
+		cur->bc_ops->broot_realloc(cur, numrecs);
 
 		error = xfs_btree_kill_iroot(cur);
 		if (error)
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index c5bff273cae255..8380ae0a64dd5e 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -213,6 +213,22 @@ struct xfs_btree_ops {
 			       const union xfs_btree_key *key1,
 			       const union xfs_btree_key *key2,
 			       const union xfs_btree_key *mask);
+
+	/*
+	 * Reallocate the space for if_broot to fit the number of records.
+	 * Move the records and pointers in if_broot to fit the new size.  When
+	 * shrinking this will eliminate holes between the records and pointers
+	 * created by the caller.  When growing this will create holes to be
+	 * filled in by the caller.
+	 *
+	 * The caller must not request to add more records than would fit in
+	 * the on-disk inode root.  If the if_broot is currently NULL, then if
+	 * we are adding records, one will be allocated.  The caller must also
+	 * not request that the number of records go below zero, although it
+	 * can go to zero.
+	 */
+	struct xfs_btree_block *(*broot_realloc)(struct xfs_btree_cur *cur,
+				unsigned int new_numrecs);
 };
 
 /* btree geometry flags */
diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index baebcbc26eb29b..d6bbff85ffba8e 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -420,102 +420,6 @@ xfs_broot_realloc(
 	return ifp->if_broot;
 }
 
-/*
- * Reallocate the space for if_broot based on the number of records.  Move the
- * records and pointers in if_broot to fit the new size.  When shrinking this
- * will eliminate holes between the records and pointers created by the caller.
- * When growing this will create holes to be filled in by the caller.
- *
- * The caller must not request to add more records than would fit in
- * the on-disk inode root.  If the if_broot is currently NULL, then
- * if we are adding records, one will be allocated.  The caller must also
- * not request that the number of records go below zero, although
- * it can go to zero.
- *
- * ip -- the inode whose if_broot area is changing
- * whichfork -- which inode fork to change
- * new_numrecs -- the new number of records requested for the if_broot array
- *
- * Returns the incore btree root block.
- */
-struct xfs_btree_block *
-xfs_iroot_realloc(
-	struct xfs_inode	*ip,
-	int			whichfork,
-	unsigned int		new_numrecs)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
-	char			*np;
-	char			*op;
-	unsigned int		new_size;
-	unsigned int		old_size = ifp->if_broot_bytes;
-
-	/*
-	 * Block mapping btrees do not support storing zero records; if this
-	 * happens, the fork is being changed to FMT_EXTENTS.  Free the broot
-	 * and get out.
-	 */
-	if (new_numrecs == 0)
-		return xfs_broot_realloc(ifp, 0);
-
-	new_size = xfs_bmap_broot_space_calc(mp, new_numrecs);
-
-	/* Handle the nop case quietly. */
-	if (new_size == old_size)
-		return ifp->if_broot;
-
-	if (new_size > old_size) {
-		unsigned int	old_numrecs;
-
-		/*
-		 * If there wasn't any memory allocated before, just
-		 * allocate it now and get out.
-		 */
-		if (old_size == 0)
-			return xfs_broot_realloc(ifp, new_size);
-
-		/*
-		 * If there is already an existing if_broot, then we need
-		 * to realloc() it and shift the pointers to their new
-		 * location.  The records don't change location because
-		 * they are kept butted up against the btree block header.
-		 */
-		old_numrecs = xfs_bmbt_maxrecs(mp, old_size, false);
-		xfs_broot_realloc(ifp, new_size);
-		op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
-						     old_size);
-		np = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
-						     (int)new_size);
-		ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
-			xfs_inode_fork_size(ip, whichfork));
-		memmove(np, op, old_numrecs * (uint)sizeof(xfs_fsblock_t));
-		return ifp->if_broot;
-	}
-
-	/*
-	 * We're reducing, but not totally eliminating, numrecs.  In this case,
-	 * we are shrinking the if_broot buffer, so it must already exist.
-	 */
-	ASSERT(ifp->if_broot != NULL && old_size > 0 && new_size > 0);
-
-	/*
-	 * Shrink the btree root by moving the bmbt pointers, since they are
-	 * not butted up against the btree block header, then reallocating
-	 * broot.
-	 */
-	op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1, old_size);
-	np = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
-					     (int)new_size);
-	memmove(np, op, new_numrecs * (uint)sizeof(xfs_fsblock_t));
-
-	xfs_broot_realloc(ifp, new_size);
-	ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
-	       xfs_inode_fork_size(ip, whichfork));
-	return ifp->if_broot;
-}
-
-
 /*
  * This is called when the amount of space needed for if_data
  * is increased or decreased.  The change in size is indicated by
diff --git a/libxfs/xfs_inode_fork.h b/libxfs/xfs_inode_fork.h
index d05eb0bad864e1..69ed0919d60b12 100644
--- a/libxfs/xfs_inode_fork.h
+++ b/libxfs/xfs_inode_fork.h
@@ -175,8 +175,6 @@ struct xfs_btree_block *xfs_broot_alloc(struct xfs_ifork *ifp,
 struct xfs_btree_block *xfs_broot_realloc(struct xfs_ifork *ifp,
 				size_t new_size);
 
-struct xfs_btree_block *xfs_iroot_realloc(struct xfs_inode *ip, int whichfork,
-				unsigned int new_numrecs);
 int		xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int);
 int		xfs_iextents_copy(struct xfs_inode *, struct xfs_bmbt_rec *,
 				  int);



* [PATCH 05/56] xfs: tidy up xfs_bmap_broot_realloc a bit
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (3 preceding siblings ...)
  2025-02-06 22:36   ` [PATCH 04/56] xfs: make xfs_iroot_realloc a bmap btree function Darrick J. Wong
@ 2025-02-06 22:36   ` Darrick J. Wong
  2025-02-06 22:36   ` [PATCH 06/56] xfs: hoist the node iroot update code out of xfs_btree_new_iroot Darrick J. Wong
                     ` (50 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:36 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: c914081775e2e39e4afa9b4bb9e5c98202110f51

Hoist out the code that migrates broot pointers during a resize
operation to avoid code duplication and streamline the caller.  Also
use the correct bmbt pointer type for the sizeof operation.
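
The ordering of the pointer move relative to the realloc can be
sketched with a toy layout (header, then a key array, then a pointer
array, each array sized for maxrecs entries).  All toy_* names and the
8-byte header are assumptions made for illustration; the real code uses
xfs_bmap_broot_ptr_addr() and xfs_broot_realloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_HDR_BYTES	8	/* assumed header size, illustration only */

struct toy_broot {
	unsigned char	*buf;
	unsigned int	maxrecs;
};

static size_t toy_space_calc(unsigned int maxrecs)
{
	return TOY_HDR_BYTES + (size_t)maxrecs * 2 * sizeof(uint64_t);
}

/* Pointers start after the header and a key array of maxrecs entries. */
static unsigned char *toy_ptr_addr(unsigned char *buf, unsigned int maxrecs,
		unsigned int idx)
{
	return buf + TOY_HDR_BYTES + ((size_t)maxrecs + idx) * sizeof(uint64_t);
}

static int toy_broot_init(struct toy_broot *b, unsigned int maxrecs)
{
	b->buf = calloc(1, toy_space_calc(maxrecs));
	b->maxrecs = maxrecs;
	return b->buf ? 0 : -1;
}

static void toy_set(struct toy_broot *b, unsigned int idx, uint64_t key,
		uint64_t ptr)
{
	memcpy(b->buf + TOY_HDR_BYTES + idx * sizeof(key), &key, sizeof(key));
	memcpy(toy_ptr_addr(b->buf, b->maxrecs, idx), &ptr, sizeof(ptr));
}

static uint64_t toy_key(const struct toy_broot *b, unsigned int idx)
{
	uint64_t	key;

	memcpy(&key, b->buf + TOY_HDR_BYTES + idx * sizeof(key), sizeof(key));
	return key;
}

static uint64_t toy_ptr(const struct toy_broot *b, unsigned int idx)
{
	uint64_t	ptr;

	memcpy(&ptr, toy_ptr_addr(b->buf, b->maxrecs, idx), sizeof(ptr));
	return ptr;
}

/* One helper serves both directions; memmove tolerates overlap. */
static void toy_move_ptrs(unsigned char *buf, unsigned int old_max,
		unsigned int new_max, unsigned int numrecs)
{
	memmove(toy_ptr_addr(buf, new_max, 0), toy_ptr_addr(buf, old_max, 0),
			numrecs * sizeof(uint64_t));
}

static int toy_resize(struct toy_broot *b, unsigned int new_max)
{
	unsigned char	*nbuf;

	if (new_max == b->maxrecs)
		return 0;

	if (new_max > b->maxrecs) {
		/* Growing: get the space first, then slide pointers up. */
		nbuf = realloc(b->buf, toy_space_calc(new_max));
		if (!nbuf)
			return -1;
		toy_move_ptrs(nbuf, b->maxrecs, new_max, b->maxrecs);
	} else {
		/* Shrinking: slide pointers down before giving up space. */
		toy_move_ptrs(b->buf, b->maxrecs, new_max, new_max);
		nbuf = realloc(b->buf, toy_space_calc(new_max));
		if (!nbuf)
			return -1;
	}
	b->buf = nbuf;
	b->maxrecs = new_max;
	return 0;
}
```

The keys never move because they sit directly behind the header; only
the pointer array's starting offset depends on the buffer size.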

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_bmap_btree.c |   43 +++++++++++++++++++++++++------------------
 1 file changed, 25 insertions(+), 18 deletions(-)


diff --git a/libxfs/xfs_bmap_btree.c b/libxfs/xfs_bmap_btree.c
index baae9ab91fa908..40e0cb3016fc5e 100644
--- a/libxfs/xfs_bmap_btree.c
+++ b/libxfs/xfs_bmap_btree.c
@@ -515,6 +515,22 @@ xfs_bmbt_keys_contiguous(
 				 be64_to_cpu(key2->bmbt.br_startoff));
 }
 
+static inline void
+xfs_bmbt_move_ptrs(
+	struct xfs_mount	*mp,
+	struct xfs_btree_block	*broot,
+	short			old_size,
+	size_t			new_size,
+	unsigned int		numrecs)
+{
+	void			*dptr;
+	void			*sptr;
+
+	sptr = xfs_bmap_broot_ptr_addr(mp, broot, 1, old_size);
+	dptr = xfs_bmap_broot_ptr_addr(mp, broot, 1, new_size);
+	memmove(dptr, sptr, numrecs * sizeof(xfs_bmbt_ptr_t));
+}
+
 /*
  * Reallocate the space for if_broot based on the number of records.  Move the
  * records and pointers in if_broot to fit the new size.  When shrinking this
@@ -540,8 +556,7 @@ xfs_bmap_broot_realloc(
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
-	char			*np;
-	char			*op;
+	struct xfs_btree_block	*broot;
 	unsigned int		new_size;
 	unsigned int		old_size = ifp->if_broot_bytes;
 
@@ -576,15 +591,11 @@ xfs_bmap_broot_realloc(
 		 * they are kept butted up against the btree block header.
 		 */
 		old_numrecs = xfs_bmbt_maxrecs(mp, old_size, false);
-		xfs_broot_realloc(ifp, new_size);
-		op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
-						     old_size);
-		np = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
-						     (int)new_size);
-		ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
+		broot = xfs_broot_realloc(ifp, new_size);
+		ASSERT(xfs_bmap_bmdr_space(broot) <=
 			xfs_inode_fork_size(ip, whichfork));
-		memmove(np, op, old_numrecs * (uint)sizeof(xfs_fsblock_t));
-		return ifp->if_broot;
+		xfs_bmbt_move_ptrs(mp, broot, old_size, new_size, old_numrecs);
+		return broot;
 	}
 
 	/*
@@ -598,15 +609,11 @@ xfs_bmap_broot_realloc(
 	 * not butted up against the btree block header, then reallocating
 	 * broot.
 	 */
-	op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1, old_size);
-	np = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1,
-					     (int)new_size);
-	memmove(np, op, new_numrecs * (uint)sizeof(xfs_fsblock_t));
-
-	xfs_broot_realloc(ifp, new_size);
-	ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <=
+	xfs_bmbt_move_ptrs(mp, ifp->if_broot, old_size, new_size, new_numrecs);
+	broot = xfs_broot_realloc(ifp, new_size);
+	ASSERT(xfs_bmap_bmdr_space(broot) <=
 	       xfs_inode_fork_size(ip, whichfork));
-	return ifp->if_broot;
+	return broot;
 }
 
 static struct xfs_btree_block *



* [PATCH 06/56] xfs: hoist the node iroot update code out of xfs_btree_new_iroot
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (4 preceding siblings ...)
  2025-02-06 22:36   ` [PATCH 05/56] xfs: tidy up xfs_bmap_broot_realloc a bit Darrick J. Wong
@ 2025-02-06 22:36   ` Darrick J. Wong
  2025-02-06 22:36   ` [PATCH 07/56] xfs: hoist the node iroot update code out of xfs_btree_kill_iroot Darrick J. Wong
                     ` (49 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:36 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 7708951ae52132d3c4e05aee2e57d35f0d89bd49

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node root into a child block
to a separate function.  Note that the new function explicitly computes
the keys of the new child block and stores them in the root block; while
the bmap btree could rely on leaving the key alone, realtime rmap needs
to set the new high key.
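
A simplified model of the promotion step, assuming fixed-size key and
pointer arrays and a single (low) key per entry, might look like the
sketch below.  The toy_* names are hypothetical, and the high-key
handling that realtime rmap needs is deliberately not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAXRECS	8	/* assumed capacity, illustration only */

struct toy_node {
	int		level;
	int		numrecs;
	uint64_t	keys[TOY_MAXRECS];
	uint64_t	ptrs[TOY_MAXRECS];
};

/*
 * Push every keyptr of the root down into a freshly allocated child,
 * then leave the root with one keyptr that points at that child.
 */
static void
toy_promote_root(
	struct toy_node		*root,
	struct toy_node		*child,
	uint64_t		child_addr)
{
	int			numrecs = root->numrecs;

	assert(numrecs > 0 && numrecs <= TOY_MAXRECS);

	/* The child takes over the root's old level and contents. */
	child->level = root->level;
	child->numrecs = numrecs;
	memcpy(child->keys, root->keys, numrecs * sizeof(root->keys[0]));
	memcpy(child->ptrs, root->ptrs, numrecs * sizeof(root->ptrs[0]));

	/* The tree is one level taller; the root holds a single keyptr. */
	root->level++;
	root->numrecs = 1;
	root->ptrs[0] = child_addr;

	/* Recompute the root key from the child instead of assuming it. */
	root->keys[0] = child->keys[0];
}
```

With only a low key the recomputed value equals what was already in
the root's first slot, which is why the bmap btree could get away with
leaving it alone.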

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_btree.c |  117 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 76 insertions(+), 41 deletions(-)


diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 3d2bedc79fc270..87a3ed5fc7517c 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -3076,6 +3076,78 @@ xfs_btree_split(
 #define xfs_btree_split	__xfs_btree_split
 #endif /* __KERNEL__ */
 
+/*
+ * Move the keys and pointers from a root block to a separate block.
+ *
+ * Since the keyptr size does not change, all we have to do is increase the
+ * tree height, copy the keyptrs to the new internal node (cblock), shrink
+ * the root, and copy the pointers there.
+ */
+STATIC int
+xfs_btree_promote_node_iroot(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*block,
+	int			level,
+	struct xfs_buf		*cbp,
+	union xfs_btree_ptr	*cptr,
+	struct xfs_btree_block	*cblock)
+{
+	union xfs_btree_key	*ckp;
+	union xfs_btree_key	*kp;
+	union xfs_btree_ptr	*cpp;
+	union xfs_btree_ptr	*pp;
+	int			i;
+	int			error;
+	int			numrecs = xfs_btree_get_numrecs(block);
+
+	/*
+	 * Increase tree height, adjusting the root block level to match.
+	 * We cannot change the root btree node size until we've copied the
+	 * block contents to the new child block.
+	 */
+	be16_add_cpu(&block->bb_level, 1);
+	cur->bc_nlevels++;
+	cur->bc_levels[level + 1].ptr = 1;
+
+	/*
+	 * Adjust the root btree record count, then copy the keys from the old
+	 * root to the new child block.
+	 */
+	xfs_btree_set_numrecs(block, 1);
+	kp = xfs_btree_key_addr(cur, 1, block);
+	ckp = xfs_btree_key_addr(cur, 1, cblock);
+	xfs_btree_copy_keys(cur, ckp, kp, numrecs);
+
+	/* Check the pointers and copy them to the new child block. */
+	pp = xfs_btree_ptr_addr(cur, 1, block);
+	cpp = xfs_btree_ptr_addr(cur, 1, cblock);
+	for (i = 0; i < numrecs; i++) {
+		error = xfs_btree_debug_check_ptr(cur, pp, i, level);
+		if (error)
+			return error;
+	}
+	xfs_btree_copy_ptrs(cur, cpp, pp, numrecs);
+
+	/*
+	 * Set the first keyptr to point to the new child block, then shrink
+	 * the memory buffer for the root block.
+	 */
+	error = xfs_btree_debug_check_ptr(cur, cptr, 0, level);
+	if (error)
+		return error;
+	xfs_btree_copy_ptrs(cur, pp, cptr, 1);
+	xfs_btree_get_keys(cur, cblock, kp);
+
+	cur->bc_ops->broot_realloc(cur, 1);
+
+	/* Attach the new block to the cursor and log it. */
+	xfs_btree_setbuf(cur, level, cbp);
+	xfs_btree_log_block(cur, cbp, XFS_BB_ALL_BITS);
+	xfs_btree_log_keys(cur, cbp, 1, numrecs);
+	xfs_btree_log_ptrs(cur, cbp, 1, numrecs);
+	return 0;
+}
+
 /*
  * Copy the old inode root contents into a real block and make the
  * broot point to it.
@@ -3089,14 +3161,10 @@ xfs_btree_new_iroot(
 	struct xfs_buf		*cbp;		/* buffer for cblock */
 	struct xfs_btree_block	*block;		/* btree block */
 	struct xfs_btree_block	*cblock;	/* child btree block */
-	union xfs_btree_key	*ckp;		/* child key pointer */
-	union xfs_btree_ptr	*cpp;		/* child ptr pointer */
-	union xfs_btree_key	*kp;		/* pointer to btree key */
-	union xfs_btree_ptr	*pp;		/* pointer to block addr */
+	union xfs_btree_ptr	*pp;
 	union xfs_btree_ptr	nptr;		/* new block addr */
 	int			level;		/* btree level */
 	int			error;		/* error return code */
-	int			i;		/* loop counter */
 
 	XFS_BTREE_STATS_INC(cur, newroot);
 
@@ -3134,45 +3202,12 @@ xfs_btree_new_iroot(
 			cblock->bb_u.s.bb_blkno = bno;
 	}
 
-	be16_add_cpu(&block->bb_level, 1);
-	xfs_btree_set_numrecs(block, 1);
-	cur->bc_nlevels++;
-	ASSERT(cur->bc_nlevels <= cur->bc_maxlevels);
-	cur->bc_levels[level + 1].ptr = 1;
-
-	kp = xfs_btree_key_addr(cur, 1, block);
-	ckp = xfs_btree_key_addr(cur, 1, cblock);
-	xfs_btree_copy_keys(cur, ckp, kp, xfs_btree_get_numrecs(cblock));
-
-	cpp = xfs_btree_ptr_addr(cur, 1, cblock);
-	for (i = 0; i < be16_to_cpu(cblock->bb_numrecs); i++) {
-		error = xfs_btree_debug_check_ptr(cur, pp, i, level);
-		if (error)
-			goto error0;
-	}
-
-	xfs_btree_copy_ptrs(cur, cpp, pp, xfs_btree_get_numrecs(cblock));
-
-	error = xfs_btree_debug_check_ptr(cur, &nptr, 0, level);
+	error = xfs_btree_promote_node_iroot(cur, block, level, cbp, &nptr,
+			cblock);
 	if (error)
 		goto error0;
 
-	xfs_btree_copy_ptrs(cur, pp, &nptr, 1);
-
-	cur->bc_ops->broot_realloc(cur, 1);
-
-	xfs_btree_setbuf(cur, level, cbp);
-
-	/*
-	 * Do all this logging at the end so that
-	 * the root is at the right level.
-	 */
-	xfs_btree_log_block(cur, cbp, XFS_BB_ALL_BITS);
-	xfs_btree_log_keys(cur, cbp, 1, be16_to_cpu(cblock->bb_numrecs));
-	xfs_btree_log_ptrs(cur, cbp, 1, be16_to_cpu(cblock->bb_numrecs));
-
-	*logflags |=
-		XFS_ILOG_CORE | xfs_ilog_fbroot(cur->bc_ino.whichfork);
+	*logflags |= XFS_ILOG_CORE | xfs_ilog_fbroot(cur->bc_ino.whichfork);
 	*stat = 1;
 	return 0;
 error0:



* [PATCH 07/56] xfs: hoist the node iroot update code out of xfs_btree_kill_iroot
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (5 preceding siblings ...)
  2025-02-06 22:36   ` [PATCH 06/56] xfs: hoist the node iroot update code out of xfs_btree_new_iroot Darrick J. Wong
@ 2025-02-06 22:36   ` Darrick J. Wong
  2025-02-06 22:37   ` [PATCH 08/56] xfs: add some rtgroup inode helpers Darrick J. Wong
                     ` (48 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:36 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 505248719fcbf2c76594fe2ef293680d97fe426c

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node child into the root block
to a separate function.  Remove some unnecessary conditionals and clean
up a few function calls in the new function.  Note that this change
reorders the ->free_block call with respect to the change in bc_nlevels
to make it easier to support inode root leaf blocks in the next patch.
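
The inverse operation can be sketched the same way.  This is a toy
model with hypothetical toy_* names and fixed-size arrays; freeing the
child block and logging the inode are left to the caller, as in the
patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAXRECS	8	/* assumed capacity, illustration only */

struct toy_node {
	int		level;
	int		numrecs;
	uint64_t	keys[TOY_MAXRECS];
	uint64_t	ptrs[TOY_MAXRECS];
};

/*
 * Pull every keyptr of the only child back up into the root, then
 * drop the tree height by one.  The caller frees the child afterwards.
 */
static void
toy_demote_child(
	struct toy_node		*root,
	const struct toy_node	*child)
{
	int			numrecs = child->numrecs;

	/* The root must sit directly above the child it absorbs. */
	assert(root->numrecs == 1 && root->level == child->level + 1);
	assert(numrecs > 0 && numrecs <= TOY_MAXRECS);

	/* Size the root for the child's records before copying. */
	root->numrecs = numrecs;
	memcpy(root->keys, child->keys, numrecs * sizeof(root->keys[0]));
	memcpy(root->ptrs, child->ptrs, numrecs * sizeof(root->ptrs[0]));

	/* Only now does the tree shape change. */
	root->level--;
}
```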

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_btree.c |   84 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 57 insertions(+), 27 deletions(-)


diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 87a3ed5fc7517c..2d1b42d5270db8 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -3724,6 +3724,60 @@ xfs_btree_insert(
 	return error;
 }
 
+/*
+ * Move the keyptrs from a child node block to the root block.
+ *
+ * Since the keyptr size does not change, all we have to do is grow the root
+ * to fit, copy the keyptrs from the doomed child block (cblock) into the
+ * root, and decrease the tree height.
+ */
+STATIC int
+xfs_btree_demote_node_child(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*cblock,
+	int			level,
+	int			numrecs)
+{
+	struct xfs_btree_block	*block;
+	union xfs_btree_key	*ckp;
+	union xfs_btree_key	*kp;
+	union xfs_btree_ptr	*cpp;
+	union xfs_btree_ptr	*pp;
+	int			i;
+	int			error;
+
+	/*
+	 * Adjust the root btree node size and the record count to match the
+	 * doomed child so that we can copy the keyptrs ahead of changing the
+	 * tree shape.
+	 */
+	block = cur->bc_ops->broot_realloc(cur, numrecs);
+
+	xfs_btree_set_numrecs(block, numrecs);
+	ASSERT(block->bb_numrecs == cblock->bb_numrecs);
+
+	/* Copy keys from the doomed block. */
+	kp = xfs_btree_key_addr(cur, 1, block);
+	ckp = xfs_btree_key_addr(cur, 1, cblock);
+	xfs_btree_copy_keys(cur, kp, ckp, numrecs);
+
+	/* Copy pointers from the doomed block. */
+	pp = xfs_btree_ptr_addr(cur, 1, block);
+	cpp = xfs_btree_ptr_addr(cur, 1, cblock);
+	for (i = 0; i < numrecs; i++) {
+		error = xfs_btree_debug_check_ptr(cur, cpp, i, level - 1);
+		if (error)
+			return error;
+	}
+	xfs_btree_copy_ptrs(cur, pp, cpp, numrecs);
+
+	/* Decrease tree height, adjusting the root block level to match. */
+	cur->bc_levels[level - 1].bp = NULL;
+	be16_add_cpu(&block->bb_level, -1);
+	cur->bc_nlevels--;
+	return 0;
+}
+
 /*
  * Try to merge a non-leaf block back into the inode root.
  *
@@ -3739,10 +3793,6 @@ xfs_btree_kill_iroot(
 	struct xfs_inode	*ip = cur->bc_ino.ip;
 	struct xfs_btree_block	*block;
 	struct xfs_btree_block	*cblock;
-	union xfs_btree_key	*kp;
-	union xfs_btree_key	*ckp;
-	union xfs_btree_ptr	*pp;
-	union xfs_btree_ptr	*cpp;
 	struct xfs_buf		*cbp;
 	int			level;
 	int			numrecs;
@@ -3750,7 +3800,6 @@ xfs_btree_kill_iroot(
 #ifdef DEBUG
 	union xfs_btree_ptr	ptr;
 #endif
-	int			i;
 
 	ASSERT(cur->bc_ops->type == XFS_BTREE_TYPE_INODE);
 	ASSERT(cur->bc_nlevels > 1);
@@ -3790,35 +3839,16 @@ xfs_btree_kill_iroot(
 	ASSERT(xfs_btree_ptr_is_null(cur, &ptr));
 #endif
 
-	block = cur->bc_ops->broot_realloc(cur, numrecs);
-
-	block->bb_numrecs = be16_to_cpu(numrecs);
-	ASSERT(block->bb_numrecs == cblock->bb_numrecs);
-
-	kp = xfs_btree_key_addr(cur, 1, block);
-	ckp = xfs_btree_key_addr(cur, 1, cblock);
-	xfs_btree_copy_keys(cur, kp, ckp, numrecs);
-
-	pp = xfs_btree_ptr_addr(cur, 1, block);
-	cpp = xfs_btree_ptr_addr(cur, 1, cblock);
-
-	for (i = 0; i < numrecs; i++) {
-		error = xfs_btree_debug_check_ptr(cur, cpp, i, level - 1);
-		if (error)
-			return error;
-	}
-
-	xfs_btree_copy_ptrs(cur, pp, cpp, numrecs);
+	error = xfs_btree_demote_node_child(cur, cblock, level, numrecs);
+	if (error)
+		return error;
 
 	error = xfs_btree_free_block(cur, cbp);
 	if (error)
 		return error;
 
-	cur->bc_levels[level - 1].bp = NULL;
-	be16_add_cpu(&block->bb_level, -1);
 	xfs_trans_log_inode(cur->bc_tp, ip,
 		XFS_ILOG_CORE | xfs_ilog_fbroot(cur->bc_ino.whichfork));
-	cur->bc_nlevels--;
 out0:
 	return 0;
 }



* [PATCH 08/56] xfs: add some rtgroup inode helpers
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (6 preceding siblings ...)
  2025-02-06 22:36   ` [PATCH 07/56] xfs: hoist the node iroot update code out of xfs_btree_kill_iroot Darrick J. Wong
@ 2025-02-06 22:37   ` Darrick J. Wong
  2025-02-06 22:37   ` [PATCH 09/56] xfs: prepare to reuse the dquot pointer space in struct xfs_inode Darrick J. Wong
                     ` (47 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:37 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: af32541081ed6b6ad49b1ea38b5128cb319841b0

Create some simple helpers to reduce the amount of typing whenever we
access rtgroup inodes.  Conversion was done with this spatch and some
minor reformatting:

@@
expression rtg;
@@

- rtg->rtg_inodes[XFS_RTGI_BITMAP]
+ rtg_bitmap(rtg)

@@
expression rtg;
@@

- rtg->rtg_inodes[XFS_RTGI_SUMMARY]
+ rtg_summary(rtg)

and the CLI command:

$ spatch --sp-file /tmp/moo.cocci --dir fs/xfs/ --use-gitgrep --in-place

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtbitmap.c |    2 +-
 libxfs/xfs_rtgroup.c  |   18 ++++++++----------
 libxfs/xfs_rtgroup.h  |   10 ++++++++++
 3 files changed, 19 insertions(+), 11 deletions(-)


diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index b439fb3c20709f..689d5844b8bd09 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -1050,7 +1050,7 @@ xfs_rtfree_extent(
 	xfs_rtxlen_t		len)	/* length of extent freed */
 {
 	struct xfs_mount	*mp = tp->t_mountp;
-	struct xfs_inode	*rbmip = rtg->rtg_inodes[XFS_RTGI_BITMAP];
+	struct xfs_inode	*rbmip = rtg_bitmap(rtg);
 	struct xfs_rtalloc_args	args = {
 		.mp		= mp,
 		.tp		= tp,
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index aaaec2a1cef9e5..e422a7bc41a55e 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -194,10 +194,10 @@ xfs_rtgroup_lock(
 		 * Lock both realtime free space metadata inodes for a freespace
 		 * update.
 		 */
-		xfs_ilock(rtg->rtg_inodes[XFS_RTGI_BITMAP], XFS_ILOCK_EXCL);
-		xfs_ilock(rtg->rtg_inodes[XFS_RTGI_SUMMARY], XFS_ILOCK_EXCL);
+		xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_EXCL);
+		xfs_ilock(rtg_summary(rtg), XFS_ILOCK_EXCL);
 	} else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
-		xfs_ilock(rtg->rtg_inodes[XFS_RTGI_BITMAP], XFS_ILOCK_SHARED);
+		xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_SHARED);
 	}
 }
 
@@ -212,10 +212,10 @@ xfs_rtgroup_unlock(
 	       !(rtglock_flags & XFS_RTGLOCK_BITMAP));
 
 	if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
-		xfs_iunlock(rtg->rtg_inodes[XFS_RTGI_SUMMARY], XFS_ILOCK_EXCL);
-		xfs_iunlock(rtg->rtg_inodes[XFS_RTGI_BITMAP], XFS_ILOCK_EXCL);
+		xfs_iunlock(rtg_summary(rtg), XFS_ILOCK_EXCL);
+		xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_EXCL);
 	} else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
-		xfs_iunlock(rtg->rtg_inodes[XFS_RTGI_BITMAP], XFS_ILOCK_SHARED);
+		xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_SHARED);
 	}
 }
 
@@ -233,10 +233,8 @@ xfs_rtgroup_trans_join(
 	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED));
 
 	if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
-		xfs_trans_ijoin(tp, rtg->rtg_inodes[XFS_RTGI_BITMAP],
-				XFS_ILOCK_EXCL);
-		xfs_trans_ijoin(tp, rtg->rtg_inodes[XFS_RTGI_SUMMARY],
-				XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, rtg_bitmap(rtg), XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, rtg_summary(rtg), XFS_ILOCK_EXCL);
 	}
 }
 
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 2d7822644efff0..2e145ea2de8007 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -64,6 +64,16 @@ static inline xfs_rgnumber_t rtg_rgno(const struct xfs_rtgroup *rtg)
 	return rtg->rtg_group.xg_gno;
 }
 
+static inline struct xfs_inode *rtg_bitmap(const struct xfs_rtgroup *rtg)
+{
+	return rtg->rtg_inodes[XFS_RTGI_BITMAP];
+}
+
+static inline struct xfs_inode *rtg_summary(const struct xfs_rtgroup *rtg)
+{
+	return rtg->rtg_inodes[XFS_RTGI_SUMMARY];
+}
+
 /* Passive rtgroup references */
 static inline struct xfs_rtgroup *
 xfs_rtgroup_get(



* [PATCH 09/56] xfs: prepare to reuse the dquot pointer space in struct xfs_inode
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (7 preceding siblings ...)
  2025-02-06 22:37   ` [PATCH 08/56] xfs: add some rtgroup inode helpers Darrick J. Wong
@ 2025-02-06 22:37   ` Darrick J. Wong
  2025-02-06 22:37   ` [PATCH 10/56] xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions Darrick J. Wong
                     ` (46 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:37 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 84140a96cf7a5b5b48b862a79c8322aa220ce591

Files participating in the metadata directory tree are not accounted to
the quota subsystem.  Therefore, the i_[ugp]dquot pointers in struct
xfs_inode are never used and should always be NULL.

In the next patch we want to add a u64 count of fs blocks reserved for
metadata btree expansion, but we don't want every inode in the fs to pay
the memory price for this feature.  The intent is to union those three
pointers with the u64 counter, but for that to work we must guard
against all access to the dquot pointers for metadata files.
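
To illustrate the intent described above, here is a minimal sketch of how
three pointers might share storage with a u64 counter.  All of the demo_*
names and the is_metadir field are made up for illustration; this is not
the real struct xfs_inode layout, only the guarding idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_dquot;	/* opaque stand-in for a quota record */

/* Hypothetical inode fragment, not the real struct xfs_inode. */
struct demo_inode {
	bool			is_metadir;
	union {
		/* regular files: quota attachments */
		struct {
			struct demo_dquot	*udquot;
			struct demo_dquot	*gdquot;
			struct demo_dquot	*pdquot;
		};
		/* metadir files: blocks reserved for btree expansion */
		uint64_t	meta_resv_asked;
	};
};

/* Dquot reads must be guarded once the storage is shared. */
static inline struct demo_dquot *
demo_inode_udquot(const struct demo_inode *ip)
{
	if (ip->is_metadir)
		return NULL;
	return ip->udquot;
}

/* Only metadir files may use the counter side of the union. */
static inline void
demo_inode_set_resv(struct demo_inode *ip, uint64_t blocks)
{
	assert(ip->is_metadir);
	ip->meta_resv_asked = blocks;
}
```

The point of the guard is that an unguarded read of udquot on a metadir
file would interpret part of the counter as a pointer.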

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_attr.c |    4 +---
 libxfs/xfs_bmap.c |    4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)


diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 7567014abbe7f0..4b985e054ff84c 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -1003,9 +1003,7 @@ xfs_attr_add_fork(
 	unsigned int		blks;		/* space reservation */
 	int			error;		/* error return value */
 
-	if (xfs_is_metadir_inode(ip))
-		ASSERT(XFS_IS_DQDETACHED(ip));
-	else
+	if (!xfs_is_metadir_inode(ip))
 		ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
 
 	blks = XFS_ADDAFORK_SPACE_RES(mp);
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index fcb400bc768c2f..41cfadb51f4937 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -1036,9 +1036,7 @@ xfs_bmap_add_attrfork(
 	int			error;		/* error return value */
 
 	xfs_assert_ilocked(ip, XFS_ILOCK_EXCL);
-	if (xfs_is_metadir_inode(ip))
-		ASSERT(XFS_IS_DQDETACHED(ip));
-	else
+	if (!xfs_is_metadir_inode(ip))
 		ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
 	ASSERT(!xfs_inode_has_attr_fork(ip));
 



* [PATCH 10/56] xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (8 preceding siblings ...)
  2025-02-06 22:37   ` [PATCH 09/56] xfs: prepare to reuse the dquot pointer space in struct xfs_inode Darrick J. Wong
@ 2025-02-06 22:37   ` Darrick J. Wong
  2025-02-06 22:37   ` [PATCH 11/56] xfs: support storing records in the inode core root Darrick J. Wong
                     ` (45 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:37 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 953f76bf7a3622351f335c77c56ed7efb793e3e7

Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.
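
For readers unfamiliar with the address split, the following sketch shows
why the old convention was redundant: callers broke an fsbno into
(agno, agbno) only for the callee to glue the halves back together.  The
demo_* names are hypothetical and agblklog is passed as a plain parameter
here; in the real code it comes from the superblock via the mount.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t demo_fsblock_t;
typedef uint32_t demo_agnumber_t;
typedef uint32_t demo_agblock_t;

/* Combine an AG number and an AG-relative block into one fs block number. */
static inline demo_fsblock_t
demo_agb_to_fsb(unsigned int agblklog, demo_agnumber_t agno,
		demo_agblock_t agbno)
{
	assert(agblklog < 32);	/* demo limit: agbno is 32 bits wide */
	assert(agbno < ((uint64_t)1 << agblklog));
	return ((demo_fsblock_t)agno << agblklog) | agbno;
}

/* Recover the AG number from the high bits. */
static inline demo_agnumber_t
demo_fsb_to_agno(unsigned int agblklog, demo_fsblock_t fsb)
{
	return (demo_agnumber_t)(fsb >> agblklog);
}

/* Recover the AG-relative block from the low bits. */
static inline demo_agblock_t
demo_fsb_to_agbno(unsigned int agblklog, demo_fsblock_t fsb)
{
	return (demo_agblock_t)(fsb & (((demo_fsblock_t)1 << agblklog) - 1));
}
```

Splitting and recombining is a round trip, so passing the fsbno straight
through loses nothing.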

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_refcount.c |    6 ++----
 libxfs/xfs_rmap.c     |   12 +++++-------
 libxfs/xfs_rmap.h     |    8 ++++----
 3 files changed, 11 insertions(+), 15 deletions(-)


diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index 709e2a94176da5..11c0ca5ecdb3e4 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -1830,8 +1830,7 @@ xfs_refcount_alloc_cow_extent(
 	__xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len);
 
 	/* Add rmap entry */
-	xfs_rmap_alloc_extent(tp, XFS_FSB_TO_AGNO(mp, fsb),
-			XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW);
+	xfs_rmap_alloc_extent(tp, fsb, len, XFS_RMAP_OWN_COW);
 }
 
 /* Forget a CoW staging event in the refcount btree. */
@@ -1847,8 +1846,7 @@ xfs_refcount_free_cow_extent(
 		return;
 
 	/* Remove rmap entry */
-	xfs_rmap_free_extent(tp, XFS_FSB_TO_AGNO(mp, fsb),
-			XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW);
+	xfs_rmap_free_extent(tp, fsb, len, XFS_RMAP_OWN_COW);
 	__xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len);
 }
 
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index dabc7003586ce4..e23972f934f9ef 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -524,7 +524,7 @@ xfs_rmap_free_check_owner(
 	struct xfs_btree_cur	*cur,
 	uint64_t		ltoff,
 	struct xfs_rmap_irec	*rec,
-	xfs_filblks_t		len,
+	xfs_extlen_t		len,
 	uint64_t		owner,
 	uint64_t		offset,
 	unsigned int		flags)
@@ -2728,8 +2728,7 @@ xfs_rmap_convert_extent(
 void
 xfs_rmap_alloc_extent(
 	struct xfs_trans	*tp,
-	xfs_agnumber_t		agno,
-	xfs_agblock_t		bno,
+	xfs_fsblock_t		fsbno,
 	xfs_extlen_t		len,
 	uint64_t		owner)
 {
@@ -2738,7 +2737,7 @@ xfs_rmap_alloc_extent(
 	if (!xfs_rmap_update_is_needed(tp->t_mountp, XFS_DATA_FORK))
 		return;
 
-	bmap.br_startblock = XFS_AGB_TO_FSB(tp->t_mountp, agno, bno);
+	bmap.br_startblock = fsbno;
 	bmap.br_blockcount = len;
 	bmap.br_startoff = 0;
 	bmap.br_state = XFS_EXT_NORM;
@@ -2750,8 +2749,7 @@ xfs_rmap_alloc_extent(
 void
 xfs_rmap_free_extent(
 	struct xfs_trans	*tp,
-	xfs_agnumber_t		agno,
-	xfs_agblock_t		bno,
+	xfs_fsblock_t		fsbno,
 	xfs_extlen_t		len,
 	uint64_t		owner)
 {
@@ -2760,7 +2758,7 @@ xfs_rmap_free_extent(
 	if (!xfs_rmap_update_is_needed(tp->t_mountp, XFS_DATA_FORK))
 		return;
 
-	bmap.br_startblock = XFS_AGB_TO_FSB(tp->t_mountp, agno, bno);
+	bmap.br_startblock = fsbno;
 	bmap.br_blockcount = len;
 	bmap.br_startoff = 0;
 	bmap.br_state = XFS_EXT_NORM;
diff --git a/libxfs/xfs_rmap.h b/libxfs/xfs_rmap.h
index 96b4321d831007..8e2657af038e9e 100644
--- a/libxfs/xfs_rmap.h
+++ b/libxfs/xfs_rmap.h
@@ -184,10 +184,10 @@ void xfs_rmap_unmap_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 void xfs_rmap_convert_extent(struct xfs_mount *mp, struct xfs_trans *tp,
 		struct xfs_inode *ip, int whichfork,
 		struct xfs_bmbt_irec *imap);
-void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_agnumber_t agno,
-		xfs_agblock_t bno, xfs_extlen_t len, uint64_t owner);
-void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_agnumber_t agno,
-		xfs_agblock_t bno, xfs_extlen_t len, uint64_t owner);
+void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno,
+		xfs_extlen_t len, uint64_t owner);
+void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno,
+		xfs_extlen_t len, uint64_t owner);
 
 int xfs_rmap_finish_one(struct xfs_trans *tp, struct xfs_rmap_intent *ri,
 		struct xfs_btree_cur **pcur);



* [PATCH 11/56] xfs: support storing records in the inode core root
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (9 preceding siblings ...)
  2025-02-06 22:37   ` [PATCH 10/56] xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions Darrick J. Wong
@ 2025-02-06 22:37   ` Darrick J. Wong
  2025-02-06 22:38   ` [PATCH 12/56] xfs: allow inode-based btrees to reserve space in the data device Darrick J. Wong
                     ` (44 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:37 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 2f63b20b7a26c9a7c76ea5a6565ca38cd9e31282

Add the necessary flags and code so that we can support storing leaf
records in the inode root block of a btree.  This hasn't been necessary
before, but the realtime rmapbt will need to be able to do this.
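
As a rough sketch of what the new geometry flag changes, the check below
models the decision of whether a given btree level can live entirely in
the inode root.  The demo_* names and the root_capacity parameter are
assumptions for illustration; the actual bulk-load code in the patch
compares against a computed per-block average.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the bit position of the new flag; the name is a stand-in. */
#define DEMO_BTGEO_IROOT_RECORDS	(1U << 1)

/*
 * Can all items at this level be stored in the inode root?  Level 0 is
 * the leaf (record) level, which is only allowed in the root when the
 * btree type opts in via the geometry flag.
 */
static inline bool
demo_level_fits_in_iroot(unsigned int geom_flags, unsigned int level,
		uint64_t nr_this_level, uint64_t root_capacity)
{
	bool	leaf_root_ok = (geom_flags & DEMO_BTGEO_IROOT_RECORDS) != 0;

	assert(root_capacity > 0);

	if (level == 0 && !leaf_root_ok)
		return false;
	return nr_this_level <= root_capacity;
}
```

Without the flag (as for bmap btrees) a one-level tree can never
terminate in the root; with it, a small enough record set can.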

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_btree.c         |  138 ++++++++++++++++++++++++++++++++++++++++----
 libxfs/xfs_btree.h         |    2 -
 libxfs/xfs_btree_staging.c |    9 ++-
 3 files changed, 132 insertions(+), 17 deletions(-)


diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 2d1b42d5270db8..c5a0280a55f3b5 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -1535,12 +1535,16 @@ xfs_btree_log_recs(
 	int			first,
 	int			last)
 {
+	if (!bp) {
+		xfs_trans_log_inode(cur->bc_tp, cur->bc_ino.ip,
+				xfs_ilog_fbroot(cur->bc_ino.whichfork));
+		return;
+	}
 
 	xfs_trans_buf_set_type(cur->bc_tp, bp, XFS_BLFT_BTREE_BUF);
 	xfs_trans_log_buf(cur->bc_tp, bp,
 			  xfs_btree_rec_offset(cur, first),
 			  xfs_btree_rec_offset(cur, last + 1) - 1);
-
 }
 
 /*
@@ -3076,6 +3080,59 @@ xfs_btree_split(
 #define xfs_btree_split	__xfs_btree_split
 #endif /* __KERNEL__ */
 
+/* Move the records from a root leaf block to a separate block. */
+STATIC void
+xfs_btree_promote_leaf_iroot(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*block,
+	struct xfs_buf		*cbp,
+	union xfs_btree_ptr	*cptr,
+	struct xfs_btree_block	*cblock)
+{
+	union xfs_btree_rec	*rp;
+	union xfs_btree_rec	*crp;
+	union xfs_btree_key	*kp;
+	union xfs_btree_ptr	*pp;
+	struct xfs_btree_block	*broot;
+	int			numrecs = xfs_btree_get_numrecs(block);
+
+	/* Copy the records from the leaf broot into the new child block. */
+	rp = xfs_btree_rec_addr(cur, 1, block);
+	crp = xfs_btree_rec_addr(cur, 1, cblock);
+	xfs_btree_copy_recs(cur, crp, rp, numrecs);
+
+	/*
+	 * Increment the tree height.
+	 *
+	 * Trickery here: The amount of memory that we need per record for the
+	 * ifork's btree root block may change when we convert the broot from a
+	 * leaf to a node block.  Free the existing leaf broot so that nobody
+	 * thinks we need to migrate node pointers when we realloc the broot
+	 * buffer after bumping nlevels.
+	 */
+	cur->bc_ops->broot_realloc(cur, 0);
+	cur->bc_nlevels++;
+	cur->bc_levels[1].ptr = 1;
+
+	/*
+	 * Allocate a new node broot and initialize it to point to the new
+	 * child block.
+	 */
+	broot = cur->bc_ops->broot_realloc(cur, 1);
+	xfs_btree_init_block(cur->bc_mp, broot, cur->bc_ops,
+			cur->bc_nlevels - 1, 1, cur->bc_ino.ip->i_ino);
+
+	pp = xfs_btree_ptr_addr(cur, 1, broot);
+	kp = xfs_btree_key_addr(cur, 1, broot);
+	xfs_btree_copy_ptrs(cur, pp, cptr, 1);
+	xfs_btree_get_keys(cur, cblock, kp);
+
+	/* Attach the new block to the cursor and log it. */
+	xfs_btree_setbuf(cur, 0, cbp);
+	xfs_btree_log_block(cur, cbp, XFS_BB_ALL_BITS);
+	xfs_btree_log_recs(cur, cbp, 1, numrecs);
+}
+
 /*
  * Move the keys and pointers from a root block to a separate block.
  *
@@ -3161,7 +3218,7 @@ xfs_btree_new_iroot(
 	struct xfs_buf		*cbp;		/* buffer for cblock */
 	struct xfs_btree_block	*block;		/* btree block */
 	struct xfs_btree_block	*cblock;	/* child btree block */
-	union xfs_btree_ptr	*pp;
+	union xfs_btree_ptr	aptr;
 	union xfs_btree_ptr	nptr;		/* new block addr */
 	int			level;		/* btree level */
 	int			error;		/* error return code */
@@ -3173,10 +3230,15 @@ xfs_btree_new_iroot(
 	level = cur->bc_nlevels - 1;
 
 	block = xfs_btree_get_iroot(cur);
-	pp = xfs_btree_ptr_addr(cur, 1, block);
+	ASSERT(level > 0 || (cur->bc_ops->geom_flags & XFS_BTGEO_IROOT_RECORDS));
+	if (level > 0)
+		aptr = *xfs_btree_ptr_addr(cur, 1, block);
+	else
+		aptr.l = cpu_to_be64(XFS_INO_TO_FSB(cur->bc_mp,
+				cur->bc_ino.ip->i_ino));
 
 	/* Allocate the new block. If we can't do it, we're toast. Give up. */
-	error = xfs_btree_alloc_block(cur, pp, &nptr, stat);
+	error = xfs_btree_alloc_block(cur, &aptr, &nptr, stat);
 	if (error)
 		goto error0;
 	if (*stat == 0)
@@ -3202,10 +3264,14 @@ xfs_btree_new_iroot(
 			cblock->bb_u.s.bb_blkno = bno;
 	}
 
-	error = xfs_btree_promote_node_iroot(cur, block, level, cbp, &nptr,
-			cblock);
-	if (error)
-		goto error0;
+	if (level > 0) {
+		error = xfs_btree_promote_node_iroot(cur, block, level, cbp,
+				&nptr, cblock);
+		if (error)
+			goto error0;
+	} else {
+		xfs_btree_promote_leaf_iroot(cur, block, cbp, &nptr, cblock);
+	}
 
 	*logflags |= XFS_ILOG_CORE | xfs_ilog_fbroot(cur->bc_ino.whichfork);
 	*stat = 1;
@@ -3724,6 +3790,43 @@ xfs_btree_insert(
 	return error;
 }
 
+/* Move the records from a child leaf block to the root block. */
+STATIC void
+xfs_btree_demote_leaf_child(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_block	*cblock,
+	int			numrecs)
+{
+	union xfs_btree_rec	*rp;
+	union xfs_btree_rec	*crp;
+	struct xfs_btree_block	*broot;
+
+	/*
+	 * Decrease the tree height.
+	 *
+	 * Trickery here: The amount of memory that we need per record for the
+	 * ifork's btree root block may change when we convert the broot from a
+	 * node to a leaf.  Free the old node broot so that we can get a fresh
+	 * leaf broot.
+	 */
+	cur->bc_ops->broot_realloc(cur, 0);
+	cur->bc_nlevels--;
+
+	/*
+	 * Allocate a new leaf broot and copy the records from the old child.
+	 * Detach the old child from the cursor.
+	 */
+	broot = cur->bc_ops->broot_realloc(cur, numrecs);
+	xfs_btree_init_block(cur->bc_mp, broot, cur->bc_ops, 0, numrecs,
+			cur->bc_ino.ip->i_ino);
+
+	rp = xfs_btree_rec_addr(cur, 1, broot);
+	crp = xfs_btree_rec_addr(cur, 1, cblock);
+	xfs_btree_copy_recs(cur, rp, crp, numrecs);
+
+	cur->bc_levels[0].bp = NULL;
+}
+
 /*
  * Move the keyptrs from a child node block to the root block.
  *
@@ -3802,14 +3905,19 @@ xfs_btree_kill_iroot(
 #endif
 
 	ASSERT(cur->bc_ops->type == XFS_BTREE_TYPE_INODE);
-	ASSERT(cur->bc_nlevels > 1);
+	ASSERT((cur->bc_ops->geom_flags & XFS_BTGEO_IROOT_RECORDS) ||
+	       cur->bc_nlevels > 1);
 
 	/*
 	 * Don't deal with the root block needs to be a leaf case.
 	 * We're just going to turn the thing back into extents anyway.
 	 */
 	level = cur->bc_nlevels - 1;
-	if (level == 1)
+	if (level == 1 && !(cur->bc_ops->geom_flags & XFS_BTGEO_IROOT_RECORDS))
+		goto out0;
+
+	/* If we're already a leaf, jump out. */
+	if (level == 0)
 		goto out0;
 
 	/*
@@ -3839,9 +3947,13 @@ xfs_btree_kill_iroot(
 	ASSERT(xfs_btree_ptr_is_null(cur, &ptr));
 #endif
 
-	error = xfs_btree_demote_node_child(cur, cblock, level, numrecs);
-	if (error)
-		return error;
+	if (level > 1) {
+		error = xfs_btree_demote_node_child(cur, cblock, level,
+				numrecs);
+		if (error)
+			return error;
+	} else
+		xfs_btree_demote_leaf_child(cur, cblock, numrecs);
 
 	error = xfs_btree_free_block(cur, cbp);
 	if (error)
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 8380ae0a64dd5e..3b8c2ccad90847 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -233,7 +233,7 @@ struct xfs_btree_ops {
 
 /* btree geometry flags */
 #define XFS_BTGEO_OVERLAPPING		(1U << 0) /* overlapping intervals */
-
+#define XFS_BTGEO_IROOT_RECORDS		(1U << 1) /* iroot can store records */
 
 union xfs_btree_irec {
 	struct xfs_alloc_rec_incore	a;
diff --git a/libxfs/xfs_btree_staging.c b/libxfs/xfs_btree_staging.c
index 2f5b1d0b685d2a..b3afb4a142a5e0 100644
--- a/libxfs/xfs_btree_staging.c
+++ b/libxfs/xfs_btree_staging.c
@@ -573,6 +573,7 @@ xfs_btree_bload_compute_geometry(
 	struct xfs_btree_bload	*bbl,
 	uint64_t		nr_records)
 {
+	const struct xfs_btree_ops *ops = cur->bc_ops;
 	uint64_t		nr_blocks = 0;
 	uint64_t		nr_this_level;
 
@@ -599,7 +600,7 @@ xfs_btree_bload_compute_geometry(
 		xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
 				&avg_per_block, &level_blocks, &dontcare64);
 
-		if (cur->bc_ops->type == XFS_BTREE_TYPE_INODE) {
+		if (ops->type == XFS_BTREE_TYPE_INODE) {
 			/*
 			 * If all the items we want to store at this level
 			 * would fit in the inode root block, then we have our
@@ -607,7 +608,9 @@ xfs_btree_bload_compute_geometry(
 			 *
 			 * Note that bmap btrees forbid records in the root.
 			 */
-			if (level != 0 && nr_this_level <= avg_per_block) {
+			if ((level != 0 ||
+			     (ops->geom_flags & XFS_BTGEO_IROOT_RECORDS)) &&
+			    nr_this_level <= avg_per_block) {
 				nr_blocks++;
 				break;
 			}
@@ -658,7 +661,7 @@ xfs_btree_bload_compute_geometry(
 		return -EOVERFLOW;
 
 	bbl->btree_height = cur->bc_nlevels;
-	if (cur->bc_ops->type == XFS_BTREE_TYPE_INODE)
+	if (ops->type == XFS_BTREE_TYPE_INODE)
 		bbl->nr_blocks = nr_blocks - 1;
 	else
 		bbl->nr_blocks = nr_blocks;



* [PATCH 12/56] xfs: allow inode-based btrees to reserve space in the data device
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (10 preceding siblings ...)
  2025-02-06 22:37   ` [PATCH 11/56] xfs: support storing records in the inode core root Darrick J. Wong
@ 2025-02-06 22:38   ` Darrick J. Wong
  2025-02-06 22:38   ` [PATCH 13/56] xfs: introduce realtime rmap btree ondisk definitions Darrick J. Wong
                     ` (43 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:38 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 05290bd5c6236b8ad659157edb36bd2d38f46d3e

Create a new space reservation scheme so that btree metadata for the
realtime volume can reserve space in the data device to avoid space
underruns.

Back when we were testing the rmap and refcount btrees for the data
device, people observed occasional shutdowns when xfs_btree_split was
called for either of those two btrees.  This happened when certain
operations (mostly writeback ioends) created new rmap or refcount
records, which would expand the size of the btree.  If there were no
free blocks available the allocation would fail and the split would shut
down the filesystem.

I considered pre-reserving blocks for btree expansion at the time of a
write() call, but there wasn't any good way to attach the reservations
to an inode and keep them there all the way to ioend processing.  Unlike
delalloc reservations which have that indlen mechanism, there's no way
to do that for mapped extents; and indlen blocks are given back during
the delalloc -> unwritten transition.

The solution was to reserve sufficient blocks for rmap/refcount btree
expansion at mount time.  This is what the XFS_AG_RESV_* flags provide;
any expansion of those two btrees can come from the pre-reserved space.

This patch brings that pre-reservation ability to inode-rooted btrees so
that the rt rmap and refcount btrees can also save room for future
expansion.
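
The accounting can be pictured with a small standalone model, sketched
here under simplifying assumptions: a per-file reservation that is drawn
down first, a global free-block pool for overruns, and a "critical" test
based on maximum btree height and 10% of the original ask.  All demo_*
names are hypothetical.  Unlike the real code, this model ignores
transactions and per-cpu counters, and simply fails when the pool cannot
cover an overrun instead of stealing from a transaction's reservation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_pool {
	uint64_t	fdblocks;	/* global free blocks (excludes reservations) */
};

struct demo_metafile {
	uint64_t	resv_asked;	/* size of the reservation requested */
	uint64_t	resv_left;	/* reserved blocks not yet consumed */
	uint64_t	nblocks;	/* blocks currently owned by the file */
};

static inline uint64_t demo_min(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Allocate len blocks: reservation first, then the global pool. */
static bool
demo_resv_alloc(struct demo_pool *pool, struct demo_metafile *mf, uint64_t len)
{
	uint64_t	from_resv = demo_min(len, mf->resv_left);
	uint64_t	from_pool = len - from_resv;

	if (from_pool > pool->fdblocks)
		return false;
	mf->resv_left -= from_resv;
	pool->fdblocks -= from_pool;
	mf->nblocks += len;
	return true;
}

/* Free len blocks: refill the reservation up to the ask, rest to the pool. */
static void
demo_resv_free(struct demo_pool *pool, struct demo_metafile *mf, uint64_t len)
{
	uint64_t	held, to_resv = 0;

	assert(len <= mf->nblocks);
	mf->nblocks -= len;
	held = mf->nblocks + mf->resv_left;
	if (mf->resv_asked > held)
		to_resv = demo_min(mf->resv_asked - held, len);
	mf->resv_left += to_resv;
	pool->fdblocks += len - to_resv;
}

/* Could reservation plus free space supply at least want blocks? */
static bool
demo_can_cover(const struct demo_pool *pool, const struct demo_metafile *mf,
		uint64_t want)
{
	if (mf->resv_left >= want)
		return true;
	return pool->fdblocks >= want - mf->resv_left;
}

/* Critical: cannot cover max btree height, or 10% of the original ask. */
static bool
demo_resv_critical(const struct demo_pool *pool,
		const struct demo_metafile *mf, unsigned int maxlevels)
{
	if (!demo_can_cover(pool, mf, maxlevels))
		return true;
	return !demo_can_cover(pool, mf, mf->resv_asked / 10);
}
```

In this model a file only becomes critical when both its own reservation
and the shared pool are nearly exhausted, which matches the idea that the
reservation is a floor rather than a cap.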

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_inode.h   |    5 +
 include/xfs_mount.h   |    1 
 include/xfs_trace.h   |    7 ++
 io/inject.c           |    1 
 libxfs/libxfs_priv.h  |   11 +++
 libxfs/xfs_ag_resv.c  |    3 +
 libxfs/xfs_errortag.h |    4 +
 libxfs/xfs_metadir.c  |    3 +
 libxfs/xfs_metafile.c |  203 +++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_metafile.h |   11 +++
 libxfs/xfs_types.h    |    7 ++
 11 files changed, 254 insertions(+), 2 deletions(-)


diff --git a/include/xfs_inode.h b/include/xfs_inode.h
index 30e171696c80e2..5bb31eb4aa5305 100644
--- a/include/xfs_inode.h
+++ b/include/xfs_inode.h
@@ -224,7 +224,10 @@ typedef struct xfs_inode {
 	struct xfs_ifork	i_df;		/* data fork */
 	struct xfs_ifork	i_af;		/* attribute fork */
 	struct xfs_inode_log_item *i_itemp;	/* logging information */
-	unsigned int		i_delayed_blks;	/* count of delay alloc blks */
+	uint64_t		i_delayed_blks;	/* count of delay alloc blks */
+	/* Space that has been set aside to root a btree in this file. */
+	uint64_t		i_meta_resv_asked;
+
 	xfs_fsize_t		i_disk_size;	/* number of bytes in file */
 	xfs_rfsblock_t		i_nblocks;	/* # of direct & btree blocks */
 	prid_t			i_projid;	/* owner's project id */
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 19d08cf047f202..532bff8513bf53 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -115,6 +115,7 @@ typedef struct xfs_mount {
 	uint			m_rmap_maxlevels; /* max rmap btree levels */
 	uint			m_refc_maxlevels; /* max refc btree levels */
 	unsigned int		m_agbtree_maxlevels; /* max level of all AG btrees */
+	unsigned int		m_rtbtree_maxlevels; /* max level of all rt btrees */
 	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
 	uint			m_alloc_set_aside; /* space we can't use */
 	uint			m_ag_max_usable; /* max space per AG */
diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index a53ce092c8ea3b..30166c11dd597b 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -390,4 +390,11 @@
 #define trace_xfs_group_put(...)		((void) 0)
 #define trace_xfs_group_rele(...)		((void) 0)
 
+#define trace_xfs_metafile_resv_alloc_space(...)	((void) 0)
+#define trace_xfs_metafile_resv_critical(...)	((void) 0)
+#define trace_xfs_metafile_resv_free(...)		((void) 0)
+#define trace_xfs_metafile_resv_free_space(...)	((void) 0)
+#define trace_xfs_metafile_resv_init(...)		((void) 0)
+#define trace_xfs_metafile_resv_init_error(...)	((void) 0)
+
 #endif /* __TRACE_H__ */
diff --git a/io/inject.c b/io/inject.c
index 4aeb6da326b4fd..7b9a76406cc54d 100644
--- a/io/inject.c
+++ b/io/inject.c
@@ -64,6 +64,7 @@ error_tag(char *name)
 		{ XFS_ERRTAG_WB_DELAY_MS,		"wb_delay_ms" },
 		{ XFS_ERRTAG_WRITE_DELAY_MS,		"write_delay_ms" },
 		{ XFS_ERRTAG_EXCHMAPS_FINISH_ONE,	"exchmaps_finish_one" },
+		{ XFS_ERRTAG_METAFILE_RESV_CRITICAL,	"metafile_resv_crit" },
 		{ XFS_ERRTAG_MAX,			NULL }
 	};
 	int	count;
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index a1401b2c1e409b..7e5c125b581a2f 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -219,6 +219,17 @@ uint32_t get_random_u32(void);
 #define get_random_u32()	(0)
 #endif
 
+static inline int
+__percpu_counter_compare(uint64_t *count, int64_t rhs, int32_t batch)
+{
+	if (*count > rhs)
+		return 1;
+	else if (*count < rhs)
+		return -1;
+	return 0;
+}
+
+
 #define PAGE_SIZE		getpagesize()
 extern unsigned int PAGE_SHIFT;
 
diff --git a/libxfs/xfs_ag_resv.c b/libxfs/xfs_ag_resv.c
index f5cbaa94664f22..83cac20331fd34 100644
--- a/libxfs/xfs_ag_resv.c
+++ b/libxfs/xfs_ag_resv.c
@@ -113,6 +113,7 @@ xfs_ag_resv_needed(
 	case XFS_AG_RESV_RMAPBT:
 		len -= xfs_perag_resv(pag, type)->ar_reserved;
 		break;
+	case XFS_AG_RESV_METAFILE:
 	case XFS_AG_RESV_NONE:
 		/* empty */
 		break;
@@ -346,6 +347,7 @@ xfs_ag_resv_alloc_extent(
 
 	switch (type) {
 	case XFS_AG_RESV_AGFL:
+	case XFS_AG_RESV_METAFILE:
 		return;
 	case XFS_AG_RESV_METADATA:
 	case XFS_AG_RESV_RMAPBT:
@@ -388,6 +390,7 @@ xfs_ag_resv_free_extent(
 
 	switch (type) {
 	case XFS_AG_RESV_AGFL:
+	case XFS_AG_RESV_METAFILE:
 		return;
 	case XFS_AG_RESV_METADATA:
 	case XFS_AG_RESV_RMAPBT:
diff --git a/libxfs/xfs_errortag.h b/libxfs/xfs_errortag.h
index 7002d7676a7884..a53c5d40e084dc 100644
--- a/libxfs/xfs_errortag.h
+++ b/libxfs/xfs_errortag.h
@@ -64,7 +64,8 @@
 #define XFS_ERRTAG_WB_DELAY_MS				42
 #define XFS_ERRTAG_WRITE_DELAY_MS			43
 #define XFS_ERRTAG_EXCHMAPS_FINISH_ONE			44
-#define XFS_ERRTAG_MAX					45
+#define XFS_ERRTAG_METAFILE_RESV_CRITICAL		45
+#define XFS_ERRTAG_MAX					46
 
 /*
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@@ -113,5 +114,6 @@
 #define XFS_RANDOM_WB_DELAY_MS				3000
 #define XFS_RANDOM_WRITE_DELAY_MS			3000
 #define XFS_RANDOM_EXCHMAPS_FINISH_ONE			1
+#define XFS_RANDOM_METAFILE_RESV_CRITICAL		4
 
 #endif /* __XFS_ERRORTAG_H_ */
diff --git a/libxfs/xfs_metadir.c b/libxfs/xfs_metadir.c
index b5f05925e73a4e..253fbf48e170e0 100644
--- a/libxfs/xfs_metadir.c
+++ b/libxfs/xfs_metadir.c
@@ -28,6 +28,9 @@
 #include "xfs_dir2_priv.h"
 #include "xfs_parent.h"
 #include "xfs_health.h"
+#include "xfs_errortag.h"
+#include "xfs_btree.h"
+#include "xfs_alloc.h"
 
 /*
  * Metadata Directory Tree
diff --git a/libxfs/xfs_metafile.c b/libxfs/xfs_metafile.c
index 3bd9493373115a..7f673d706aada8 100644
--- a/libxfs/xfs_metafile.c
+++ b/libxfs/xfs_metafile.c
@@ -17,6 +17,8 @@
 #include "xfs_metafile.h"
 #include "xfs_trace.h"
 #include "xfs_inode.h"
+#include "xfs_errortag.h"
+#include "xfs_alloc.h"
 
 /* Set up an inode to be recognized as a metadata directory inode. */
 void
@@ -50,3 +52,204 @@ xfs_metafile_clear_iflag(
 	ip->i_diflags2 &= ~XFS_DIFLAG2_METADATA;
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 }
+
+/*
+ * Is the amount of space that could be allocated towards a given metadata
+ * file at or beneath a certain threshold?
+ */
+static inline bool
+xfs_metafile_resv_can_cover(
+	struct xfs_inode	*ip,
+	int64_t			rhs)
+{
+	/*
+	 * The amount of space that can be allocated to this metadata file is
+	 * the remaining reservation for the particular metadata file + the
+	 * global free block count.  Take care of the first case to avoid
+	 * touching the per-cpu counter.
+	 */
+	if (ip->i_delayed_blks >= rhs)
+		return true;
+
+	/*
+	 * There aren't enough blocks left in the inode's reservation, but it
+	 * isn't critical unless there also isn't enough free space.
+	 */
+	return __percpu_counter_compare(&ip->i_mount->m_fdblocks,
+			rhs - ip->i_delayed_blks, 2048) >= 0;
+}
+
+/*
+ * Is this metadata file critically low on blocks?  For now we'll define that
+ * as the number of blocks we can get our hands on being less than 10% of what
+ * we reserved or less than some arbitrary number (maximum btree height).
+ */
+bool
+xfs_metafile_resv_critical(
+	struct xfs_inode	*ip)
+{
+	uint64_t		asked_low_water;
+
+	if (!ip)
+		return false;
+
+	ASSERT(xfs_is_metadir_inode(ip));
+	trace_xfs_metafile_resv_critical(ip, 0);
+
+	if (!xfs_metafile_resv_can_cover(ip, ip->i_mount->m_rtbtree_maxlevels))
+		return true;
+
+	asked_low_water = div_u64(ip->i_meta_resv_asked, 10);
+	if (!xfs_metafile_resv_can_cover(ip, asked_low_water))
+		return true;
+
+	return XFS_TEST_ERROR(false, ip->i_mount,
+			XFS_ERRTAG_METAFILE_RESV_CRITICAL);
+}
+
+/* Allocate a block from the metadata file's reservation. */
+void
+xfs_metafile_resv_alloc_space(
+	struct xfs_inode	*ip,
+	struct xfs_alloc_arg	*args)
+{
+	int64_t			len = args->len;
+
+	ASSERT(xfs_is_metadir_inode(ip));
+	ASSERT(args->resv == XFS_AG_RESV_METAFILE);
+
+	trace_xfs_metafile_resv_alloc_space(ip, args->len);
+
+	/*
+	 * Allocate the blocks from the metadata inode's block reservation
+	 * and update the ondisk sb counter.
+	 */
+	if (ip->i_delayed_blks > 0) {
+		int64_t		from_resv;
+
+		from_resv = min_t(int64_t, len, ip->i_delayed_blks);
+		ip->i_delayed_blks -= from_resv;
+		xfs_mod_delalloc(ip, 0, -from_resv);
+		xfs_trans_mod_sb(args->tp, XFS_TRANS_SB_RES_FDBLOCKS,
+				-from_resv);
+		len -= from_resv;
+	}
+
+	/*
+	 * Any allocation in excess of the reservation requires in-core and
+	 * on-disk fdblocks updates.  If we can grab @len blocks from the
+	 * in-core fdblocks then all we need to do is update the on-disk
+	 * superblock; if not, then try to steal some from the transaction's
+	 * block reservation.  Overruns are only expected for rmap btrees.
+	 */
+	if (len) {
+		unsigned int	field;
+		int		error;
+
+		error = xfs_dec_fdblocks(ip->i_mount, len, true);
+		if (error)
+			field = XFS_TRANS_SB_FDBLOCKS;
+		else
+			field = XFS_TRANS_SB_RES_FDBLOCKS;
+
+		xfs_trans_mod_sb(args->tp, field, -len);
+	}
+
+	ip->i_nblocks += args->len;
+	xfs_trans_log_inode(args->tp, ip, XFS_ILOG_CORE);
+}
+
+/* Free a block to the metadata file's reservation. */
+void
+xfs_metafile_resv_free_space(
+	struct xfs_inode	*ip,
+	struct xfs_trans	*tp,
+	xfs_filblks_t		len)
+{
+	int64_t			to_resv;
+
+	ASSERT(xfs_is_metadir_inode(ip));
+	trace_xfs_metafile_resv_free_space(ip, len);
+
+	ip->i_nblocks -= len;
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+	/*
+	 * Add the freed blocks back into the inode's delalloc reservation
+	 * until it reaches the maximum size.  Update the ondisk fdblocks only.
+	 */
+	to_resv = ip->i_meta_resv_asked - (ip->i_nblocks + ip->i_delayed_blks);
+	if (to_resv > 0) {
+		to_resv = min_t(int64_t, to_resv, len);
+		ip->i_delayed_blks += to_resv;
+		xfs_mod_delalloc(ip, 0, to_resv);
+		xfs_trans_mod_sb(tp, XFS_TRANS_SB_RES_FDBLOCKS, to_resv);
+		len -= to_resv;
+	}
+
+	/*
+	 * Everything else goes back to the filesystem, so update the in-core
+	 * and on-disk counters.
+	 */
+	if (len)
+		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, len);
+}
+
+/* Release a metadata file's space reservation. */
+void
+xfs_metafile_resv_free(
+	struct xfs_inode	*ip)
+{
+	/* Non-btree metadata inodes don't need space reservations. */
+	if (!ip || !ip->i_meta_resv_asked)
+		return;
+
+	ASSERT(xfs_is_metadir_inode(ip));
+	trace_xfs_metafile_resv_free(ip, 0);
+
+	if (ip->i_delayed_blks) {
+		xfs_mod_delalloc(ip, 0, -ip->i_delayed_blks);
+		xfs_add_fdblocks(ip->i_mount, ip->i_delayed_blks);
+		ip->i_delayed_blks = 0;
+	}
+	ip->i_meta_resv_asked = 0;
+}
+
+/* Set up a metadata file's space reservation. */
+int
+xfs_metafile_resv_init(
+	struct xfs_inode	*ip,
+	xfs_filblks_t		ask)
+{
+	xfs_filblks_t		hidden_space;
+	xfs_filblks_t		used;
+	int			error;
+
+	if (!ip || ip->i_meta_resv_asked > 0)
+		return 0;
+
+	ASSERT(xfs_is_metadir_inode(ip));
+
+	/*
+	 * Space taken by all other metadata btrees is accounted on-disk as
+	 * used space.  We therefore only hide the space that is reserved but
+	 * not used by the trees.
+	 */
+	used = ip->i_nblocks;
+	if (used > ask)
+		ask = used;
+	hidden_space = ask - used;
+
+	error = xfs_dec_fdblocks(ip->i_mount, hidden_space, true);
+	if (error) {
+		trace_xfs_metafile_resv_init_error(ip, error, _RET_IP_);
+		return error;
+	}
+
+	xfs_mod_delalloc(ip, 0, hidden_space);
+	ip->i_delayed_blks = hidden_space;
+	ip->i_meta_resv_asked = ask;
+
+	trace_xfs_metafile_resv_init(ip, ask);
+	return 0;
+}
diff --git a/libxfs/xfs_metafile.h b/libxfs/xfs_metafile.h
index acec400123db05..8d8f08a6071c23 100644
--- a/libxfs/xfs_metafile.h
+++ b/libxfs/xfs_metafile.h
@@ -21,6 +21,17 @@ void xfs_metafile_set_iflag(struct xfs_trans *tp, struct xfs_inode *ip,
 		enum xfs_metafile_type metafile_type);
 void xfs_metafile_clear_iflag(struct xfs_trans *tp, struct xfs_inode *ip);
 
+/* Space reservations for metadata inodes. */
+struct xfs_alloc_arg;
+
+bool xfs_metafile_resv_critical(struct xfs_inode *ip);
+void xfs_metafile_resv_alloc_space(struct xfs_inode *ip,
+		struct xfs_alloc_arg *args);
+void xfs_metafile_resv_free_space(struct xfs_inode *ip, struct xfs_trans *tp,
+		xfs_filblks_t len);
+void xfs_metafile_resv_free(struct xfs_inode *ip);
+int xfs_metafile_resv_init(struct xfs_inode *ip, xfs_filblks_t ask);
+
 /* Code specific to kernel/userspace; must be provided externally. */
 
 int xfs_trans_metafile_iget(struct xfs_trans *tp, xfs_ino_t ino,
diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h
index bf33c2b1e43e5f..ca2401c1facda7 100644
--- a/libxfs/xfs_types.h
+++ b/libxfs/xfs_types.h
@@ -202,6 +202,13 @@ enum xfs_ag_resv_type {
 	 * altering fdblocks.  If you think you need this you're wrong.
 	 */
 	XFS_AG_RESV_IGNORE,
+
+	/*
+	 * This allocation activity is being done on behalf of a metadata file.
+	 * These files maintain their own permanent space reservations and are
+	 * required to adjust fdblocks using the xfs_metafile_resv_* helpers.
+	 */
+	XFS_AG_RESV_METAFILE,
 };
 
 /* Results of scanning a btree keyspace to check occupancy. */



* [PATCH 13/56] xfs: introduce realtime rmap btree ondisk definitions
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (11 preceding siblings ...)
  2025-02-06 22:38   ` [PATCH 12/56] xfs: allow inode-based btrees to reserve space in the data device Darrick J. Wong
@ 2025-02-06 22:38   ` Darrick J. Wong
  2025-02-06 22:38   ` [PATCH 14/56] xfs: realtime rmap btree transaction reservations Darrick J. Wong
                     ` (42 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:38 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: fc6856c6ff08642e3e8437f0416d70a5e1807010

Add the ondisk structure definitions for realtime rmap btrees. The
realtime rmap btree will be rooted from a hidden inode so it needs to
have a separate btree block magic and pointer format.

Next, add everything needed to read, write and manipulate rmap btree
blocks. This prepares the way for connecting the btree operations
implementation, though embedding the rtrmap btree root in the inode
comes later in the series.
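
As a rough illustration of the block geometry that these definitions imply,
the sketch below mirrors the arithmetic of xfs_rtrmapbt_maxrecs() in a
standalone form.  The function name and the SKETCH_* constants are
hypothetical.  The 72-byte long-format CRC block header, 24-byte rmap record
and 20-byte rmap key are assumptions that this patch does not state; only
the 8-byte pointer size is checked by the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed ondisk sizes (hypothetical names): long-format CRC btree block
 * header, rmap record, rmap key, and the 64-bit inode-rooted btree pointer.
 */
#define SKETCH_LBLOCK_CRC_LEN	72u
#define SKETCH_RMAP_REC_LEN	24u
#define SKETCH_RMAP_KEY_LEN	20u
#define SKETCH_RTRMAP_PTR_LEN	8u

/* Records (leaf) or key/pointer pairs (node) that fit in one block. */
static unsigned int
sketch_rtrmapbt_maxrecs(
	unsigned int	blocksize,
	bool		leaf)
{
	unsigned int	blocklen;

	/* The block header is subtracted before dividing up the space. */
	assert(blocksize > SKETCH_LBLOCK_CRC_LEN);
	blocklen = blocksize - SKETCH_LBLOCK_CRC_LEN;

	if (leaf)
		return blocklen / SKETCH_RMAP_REC_LEN;

	/* Overlapping btree: each pointer carries a low key and a high key. */
	return blocklen / (2 * SKETCH_RMAP_KEY_LEN + SKETCH_RTRMAP_PTR_LEN);
}
```

Under those assumed sizes, a 4096-byte block would hold 167 leaf records or
83 key/pointer pairs, and the minimum record counts are half of those values,
rounded down.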

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/libxfs.h          |    1 
 include/xfs_mount.h       |    9 ++
 libxfs/Makefile           |    2 
 libxfs/xfs_btree.c        |    5 +
 libxfs/xfs_format.h       |   10 ++
 libxfs/xfs_ondisk.h       |    1 
 libxfs/xfs_rtrmap_btree.c |  269 +++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtrmap_btree.h |   82 ++++++++++++++
 libxfs/xfs_sb.c           |    6 +
 libxfs/xfs_shared.h       |    7 +
 10 files changed, 392 insertions(+)
 create mode 100644 libxfs/xfs_rtrmap_btree.c
 create mode 100644 libxfs/xfs_rtrmap_btree.h


diff --git a/include/libxfs.h b/include/libxfs.h
index 5689f58c640168..af08f176d79a9e 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -99,6 +99,7 @@ struct iomap;
 #include "xfs_metadir.h"
 #include "xfs_rtgroup.h"
 #include "xfs_rtbitmap.h"
+#include "xfs_rtrmap_btree.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 532bff8513bf53..0d9b99ef43b4c1 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -108,11 +108,14 @@ typedef struct xfs_mount {
 	uint			m_bmap_dmnr[2];	/* XFS_BMAP_BLOCK_DMINRECS */
 	uint			m_rmap_mxr[2];	/* max rmap btree records */
 	uint			m_rmap_mnr[2];	/* min rmap btree records */
+	uint			m_rtrmap_mxr[2]; /* max rtrmap btree records */
+	uint			m_rtrmap_mnr[2]; /* min rtrmap btree records */
 	uint			m_refc_mxr[2];	/* max refc btree records */
 	uint			m_refc_mnr[2];	/* min refc btree records */
 	uint			m_alloc_maxlevels; /* max alloc btree levels */
 	uint			m_bm_maxlevels[2];  /* max bmap btree levels */
 	uint			m_rmap_maxlevels; /* max rmap btree levels */
+	uint			m_rtrmap_maxlevels; /* max rtrmap btree levels */
 	uint			m_refc_maxlevels; /* max refc btree levels */
 	unsigned int		m_agbtree_maxlevels; /* max level of all AG btrees */
 	unsigned int		m_rtbtree_maxlevels; /* max level of all rt btrees */
@@ -259,6 +262,12 @@ static inline bool xfs_has_rtsb(struct xfs_mount *mp)
 	return xfs_has_rtgroups(mp) && xfs_has_realtime(mp);
 }
 
+static inline bool xfs_has_rtrmapbt(struct xfs_mount *mp)
+{
+	return xfs_has_rtgroups(mp) && xfs_has_realtime(mp) &&
+	       xfs_has_rmapbt(mp);
+}
+
 /* Kernel mount features that we don't support */
 #define __XFS_UNSUPP_FEAT(name) \
 static inline bool xfs_has_ ## name (struct xfs_mount *mp) \
diff --git a/libxfs/Makefile b/libxfs/Makefile
index 42c9f3cc439dc5..d152667eeec275 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -65,6 +65,7 @@ HFILES = \
 	xfs_rmap_btree.h \
 	xfs_rtbitmap.h \
 	xfs_rtgroup.h \
+	xfs_rtrmap_btree.h \
 	xfs_sb.h \
 	xfs_shared.h \
 	xfs_trans_resv.h \
@@ -126,6 +127,7 @@ CFILES = buf_mem.c \
 	xfs_rmap_btree.c \
 	xfs_rtbitmap.c \
 	xfs_rtgroup.c \
+	xfs_rtrmap_btree.c \
 	xfs_sb.c \
 	xfs_symlink_remote.c \
 	xfs_trans_inode.c \
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index c5a0280a55f3b5..8b20458a6cb4c4 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -28,6 +28,7 @@
 #include "xfile.h"
 #include "buf_mem.h"
 #include "xfs_btree_mem.h"
+#include "xfs_rtrmap_btree.h"
 
 /*
  * Btree magic numbers.
@@ -5523,6 +5524,9 @@ xfs_btree_init_cur_caches(void)
 	if (error)
 		goto err;
 	error = xfs_refcountbt_init_cur_cache();
+	if (error)
+		goto err;
+	error = xfs_rtrmapbt_init_cur_cache();
 	if (error)
 		goto err;
 
@@ -5541,6 +5545,7 @@ xfs_btree_destroy_cur_caches(void)
 	xfs_bmbt_destroy_cur_cache();
 	xfs_rmapbt_destroy_cur_cache();
 	xfs_refcountbt_destroy_cur_cache();
+	xfs_rtrmapbt_destroy_cur_cache();
 }
 
 /* Move the btree cursor before the first record. */
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 4d47a3e723aa13..469fc7afa591b4 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1725,6 +1725,16 @@ typedef __be32 xfs_rmap_ptr_t;
 	 XFS_FIBT_BLOCK(mp) + 1 : \
 	 XFS_IBT_BLOCK(mp) + 1)
 
+/*
+ * Realtime Reverse mapping btree format definitions
+ *
+ * This is a btree for reverse mapping records for realtime volumes
+ */
+#define	XFS_RTRMAP_CRC_MAGIC	0x4d415052	/* 'MAPR' */
+
+/* inode-based btree pointer type */
+typedef __be64 xfs_rtrmap_ptr_t;
+
 /*
  * Reference Count Btree format definitions
  *
diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index ad0dedf00f1898..2c50877a1a2f0b 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -83,6 +83,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(union xfs_rtword_raw,		4);
 	XFS_CHECK_STRUCT_SIZE(union xfs_suminfo_raw,		4);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo,		48);
+	XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t,			8);
 
 	/*
 	 * m68k has problems with struct xfs_attr_leaf_name_remote, but we pad
diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c
new file mode 100644
index 00000000000000..a06cb09133db87
--- /dev/null
+++ b/libxfs/xfs_rtrmap_btree.c
@@ -0,0 +1,269 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2018-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs_priv.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_inode.h"
+#include "xfs_trans.h"
+#include "xfs_alloc.h"
+#include "xfs_btree.h"
+#include "xfs_btree_staging.h"
+#include "xfs_rtrmap_btree.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_rtgroup.h"
+
+static struct kmem_cache	*xfs_rtrmapbt_cur_cache;
+
+/*
+ * Realtime Reverse Map btree.
+ *
+ * This is a btree used to track the owner(s) of a given extent in the realtime
+ * device.  See the comments in xfs_rmap_btree.c for more information.
+ *
+ * This tree is basically the same as the regular rmap btree except that it
+ * is rooted in an inode and does not live in free space.
+ */
+
+static struct xfs_btree_cur *
+xfs_rtrmapbt_dup_cursor(
+	struct xfs_btree_cur	*cur)
+{
+	return xfs_rtrmapbt_init_cursor(cur->bc_tp, to_rtg(cur->bc_group));
+}
+
+static xfs_failaddr_t
+xfs_rtrmapbt_verify(
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
+	xfs_failaddr_t		fa;
+	int			level;
+
+	if (!xfs_verify_magic(bp, block->bb_magic))
+		return __this_address;
+
+	if (!xfs_has_rmapbt(mp))
+		return __this_address;
+	fa = xfs_btree_fsblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN);
+	if (fa)
+		return fa;
+	level = be16_to_cpu(block->bb_level);
+	if (level > mp->m_rtrmap_maxlevels)
+		return __this_address;
+
+	return xfs_btree_fsblock_verify(bp, mp->m_rtrmap_mxr[level != 0]);
+}
+
+static void
+xfs_rtrmapbt_read_verify(
+	struct xfs_buf	*bp)
+{
+	xfs_failaddr_t	fa;
+
+	if (!xfs_btree_fsblock_verify_crc(bp))
+		xfs_verifier_error(bp, -EFSBADCRC, __this_address);
+	else {
+		fa = xfs_rtrmapbt_verify(bp);
+		if (fa)
+			xfs_verifier_error(bp, -EFSCORRUPTED, fa);
+	}
+
+	if (bp->b_error)
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+}
+
+static void
+xfs_rtrmapbt_write_verify(
+	struct xfs_buf	*bp)
+{
+	xfs_failaddr_t	fa;
+
+	fa = xfs_rtrmapbt_verify(bp);
+	if (fa) {
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+		xfs_verifier_error(bp, -EFSCORRUPTED, fa);
+		return;
+	}
+	xfs_btree_fsblock_calc_crc(bp);
+
+}
+
+const struct xfs_buf_ops xfs_rtrmapbt_buf_ops = {
+	.name			= "xfs_rtrmapbt",
+	.magic			= { 0, cpu_to_be32(XFS_RTRMAP_CRC_MAGIC) },
+	.verify_read		= xfs_rtrmapbt_read_verify,
+	.verify_write		= xfs_rtrmapbt_write_verify,
+	.verify_struct		= xfs_rtrmapbt_verify,
+};
+
+const struct xfs_btree_ops xfs_rtrmapbt_ops = {
+	.name			= "rtrmap",
+	.type			= XFS_BTREE_TYPE_INODE,
+	.geom_flags		= XFS_BTGEO_OVERLAPPING |
+				  XFS_BTGEO_IROOT_RECORDS,
+
+	.rec_len		= sizeof(struct xfs_rmap_rec),
+	/* Overlapping btree; 2 keys per pointer. */
+	.key_len		= 2 * sizeof(struct xfs_rmap_key),
+	.ptr_len		= XFS_BTREE_LONG_PTR_LEN,
+
+	.lru_refs		= XFS_RMAP_BTREE_REF,
+	.statoff		= XFS_STATS_CALC_INDEX(xs_rtrmap_2),
+
+	.dup_cursor		= xfs_rtrmapbt_dup_cursor,
+	.buf_ops		= &xfs_rtrmapbt_buf_ops,
+};
+
+/* Allocate a new rt rmap btree cursor. */
+struct xfs_btree_cur *
+xfs_rtrmapbt_init_cursor(
+	struct xfs_trans	*tp,
+	struct xfs_rtgroup	*rtg)
+{
+	struct xfs_inode	*ip = NULL;
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	struct xfs_btree_cur	*cur;
+
+	return NULL; /* XXX */
+
+	xfs_assert_ilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL);
+
+	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_rtrmapbt_ops,
+			mp->m_rtrmap_maxlevels, xfs_rtrmapbt_cur_cache);
+
+	cur->bc_ino.ip = ip;
+	cur->bc_group = xfs_group_hold(rtg_group(rtg));
+	cur->bc_ino.whichfork = XFS_DATA_FORK;
+	cur->bc_nlevels = be16_to_cpu(ip->i_df.if_broot->bb_level) + 1;
+	cur->bc_ino.forksize = xfs_inode_fork_size(ip, XFS_DATA_FORK);
+
+	return cur;
+}
+
+/*
+ * Install a new rt reverse mapping btree root.  Caller is responsible for
+ * invalidating and freeing the old btree blocks.
+ */
+void
+xfs_rtrmapbt_commit_staged_btree(
+	struct xfs_btree_cur	*cur,
+	struct xfs_trans	*tp)
+{
+	struct xbtree_ifakeroot	*ifake = cur->bc_ino.ifake;
+	struct xfs_ifork	*ifp;
+	int			flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT;
+
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+
+	/*
+	 * Free any resources hanging off the real fork, then shallow-copy the
+	 * staging fork's contents into the real fork to transfer everything
+	 * we just built.
+	 */
+	ifp = xfs_ifork_ptr(cur->bc_ino.ip, XFS_DATA_FORK);
+	xfs_idestroy_fork(ifp);
+	memcpy(ifp, ifake->if_fork, sizeof(struct xfs_ifork));
+
+	cur->bc_ino.ip->i_projid = cur->bc_group->xg_gno;
+	xfs_trans_log_inode(tp, cur->bc_ino.ip, flags);
+	xfs_btree_commit_ifakeroot(cur, tp, XFS_DATA_FORK);
+}
+
+/* Calculate number of records in an rt reverse mapping btree block. */
+static inline unsigned int
+xfs_rtrmapbt_block_maxrecs(
+	unsigned int		blocklen,
+	bool			leaf)
+{
+	if (leaf)
+		return blocklen / sizeof(struct xfs_rmap_rec);
+	return blocklen /
+		(2 * sizeof(struct xfs_rmap_key) + sizeof(xfs_rtrmap_ptr_t));
+}
+
+/*
+ * Calculate number of records in an rt reverse mapping btree block.
+ */
+unsigned int
+xfs_rtrmapbt_maxrecs(
+	struct xfs_mount	*mp,
+	unsigned int		blocklen,
+	bool			leaf)
+{
+	blocklen -= XFS_RTRMAP_BLOCK_LEN;
+	return xfs_rtrmapbt_block_maxrecs(blocklen, leaf);
+}
+
+/* Compute the max possible height for realtime reverse mapping btrees. */
+unsigned int
+xfs_rtrmapbt_maxlevels_ondisk(void)
+{
+	unsigned int		minrecs[2];
+	unsigned int		blocklen;
+
+	blocklen = XFS_MIN_CRC_BLOCKSIZE - XFS_BTREE_LBLOCK_CRC_LEN;
+
+	minrecs[0] = xfs_rtrmapbt_block_maxrecs(blocklen, true) / 2;
+	minrecs[1] = xfs_rtrmapbt_block_maxrecs(blocklen, false) / 2;
+
+	/* We need at most one record for every block in an rt group. */
+	return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_RGBLOCKS);
+}
+
+int __init
+xfs_rtrmapbt_init_cur_cache(void)
+{
+	xfs_rtrmapbt_cur_cache = kmem_cache_create("xfs_rtrmapbt_cur",
+			xfs_btree_cur_sizeof(xfs_rtrmapbt_maxlevels_ondisk()),
+			0, 0, NULL);
+
+	if (!xfs_rtrmapbt_cur_cache)
+		return -ENOMEM;
+	return 0;
+}
+
+void
+xfs_rtrmapbt_destroy_cur_cache(void)
+{
+	kmem_cache_destroy(xfs_rtrmapbt_cur_cache);
+	xfs_rtrmapbt_cur_cache = NULL;
+}
+
+/* Compute the maximum height of an rt reverse mapping btree. */
+void
+xfs_rtrmapbt_compute_maxlevels(
+	struct xfs_mount	*mp)
+{
+	unsigned int		d_maxlevels, r_maxlevels;
+
+	if (!xfs_has_rtrmapbt(mp)) {
+		mp->m_rtrmap_maxlevels = 0;
+		return;
+	}
+
+	/*
+	 * The realtime rmapbt lives on the data device, which means that its
+	 * maximum height is constrained by the size of the data device and
+	 * the height required to store one rmap record for each block in an
+	 * rt group.
+	 */
+	d_maxlevels = xfs_btree_space_to_height(mp->m_rtrmap_mnr,
+				mp->m_sb.sb_dblocks);
+	r_maxlevels = xfs_btree_compute_maxlevels(mp->m_rtrmap_mnr,
+				mp->m_groups[XG_TYPE_RTG].blocks);
+
+	/* Add one level to handle the inode root level. */
+	mp->m_rtrmap_maxlevels = min(d_maxlevels, r_maxlevels) + 1;
+}
diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h
new file mode 100644
index 00000000000000..7d1a3a49a2d69b
--- /dev/null
+++ b/libxfs/xfs_rtrmap_btree.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2018-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_RTRMAP_BTREE_H__
+#define __XFS_RTRMAP_BTREE_H__
+
+struct xfs_buf;
+struct xfs_btree_cur;
+struct xfs_mount;
+struct xbtree_ifakeroot;
+struct xfs_rtgroup;
+
+/* rmaps only exist on crc enabled filesystems */
+#define XFS_RTRMAP_BLOCK_LEN	XFS_BTREE_LBLOCK_CRC_LEN
+
+struct xfs_btree_cur *xfs_rtrmapbt_init_cursor(struct xfs_trans *tp,
+		struct xfs_rtgroup *rtg);
+struct xfs_btree_cur *xfs_rtrmapbt_stage_cursor(struct xfs_mount *mp,
+		struct xfs_rtgroup *rtg, struct xfs_inode *ip,
+		struct xbtree_ifakeroot *ifake);
+void xfs_rtrmapbt_commit_staged_btree(struct xfs_btree_cur *cur,
+		struct xfs_trans *tp);
+unsigned int xfs_rtrmapbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen,
+		bool leaf);
+void xfs_rtrmapbt_compute_maxlevels(struct xfs_mount *mp);
+
+/*
+ * Addresses of records, keys, and pointers within an incore rtrmapbt block.
+ *
+ * (note that some of these may appear unused, but they are used in userspace)
+ */
+static inline struct xfs_rmap_rec *
+xfs_rtrmap_rec_addr(
+	struct xfs_btree_block	*block,
+	unsigned int		index)
+{
+	return (struct xfs_rmap_rec *)
+		((char *)block + XFS_RTRMAP_BLOCK_LEN +
+		 (index - 1) * sizeof(struct xfs_rmap_rec));
+}
+
+static inline struct xfs_rmap_key *
+xfs_rtrmap_key_addr(
+	struct xfs_btree_block	*block,
+	unsigned int		index)
+{
+	return (struct xfs_rmap_key *)
+		((char *)block + XFS_RTRMAP_BLOCK_LEN +
+		 (index - 1) * 2 * sizeof(struct xfs_rmap_key));
+}
+
+static inline struct xfs_rmap_key *
+xfs_rtrmap_high_key_addr(
+	struct xfs_btree_block	*block,
+	unsigned int		index)
+{
+	return (struct xfs_rmap_key *)
+		((char *)block + XFS_RTRMAP_BLOCK_LEN +
+		 sizeof(struct xfs_rmap_key) +
+		 (index - 1) * 2 * sizeof(struct xfs_rmap_key));
+}
+
+static inline xfs_rtrmap_ptr_t *
+xfs_rtrmap_ptr_addr(
+	struct xfs_btree_block	*block,
+	unsigned int		index,
+	unsigned int		maxrecs)
+{
+	return (xfs_rtrmap_ptr_t *)
+		((char *)block + XFS_RTRMAP_BLOCK_LEN +
+		 maxrecs * 2 * sizeof(struct xfs_rmap_key) +
+		 (index - 1) * sizeof(xfs_rtrmap_ptr_t));
+}
+
+unsigned int xfs_rtrmapbt_maxlevels_ondisk(void);
+
+int __init xfs_rtrmapbt_init_cur_cache(void);
+void xfs_rtrmapbt_destroy_cur_cache(void);
+
+#endif /* __XFS_RTRMAP_BTREE_H__ */
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index ff23803a8065bf..467d64e199d22e 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -25,6 +25,7 @@
 #include "xfs_ag.h"
 #include "xfs_rtbitmap.h"
 #include "xfs_rtgroup.h"
+#include "xfs_rtrmap_btree.h"
 
 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -1212,6 +1213,11 @@ xfs_sb_mount_common(
 	mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2;
 	mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2;
 
+	mp->m_rtrmap_mxr[0] = xfs_rtrmapbt_maxrecs(mp, sbp->sb_blocksize, true);
+	mp->m_rtrmap_mxr[1] = xfs_rtrmapbt_maxrecs(mp, sbp->sb_blocksize, false);
+	mp->m_rtrmap_mnr[0] = mp->m_rtrmap_mxr[0] / 2;
+	mp->m_rtrmap_mnr[1] = mp->m_rtrmap_mxr[1] / 2;
+
 	mp->m_refc_mxr[0] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, true);
 	mp->m_refc_mxr[1] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, false);
 	mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2;
diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h
index e7efdb9ceaf382..da23dac22c3f08 100644
--- a/libxfs/xfs_shared.h
+++ b/libxfs/xfs_shared.h
@@ -42,6 +42,7 @@ extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops;
 extern const struct xfs_buf_ops xfs_rtsummary_buf_ops;
 extern const struct xfs_buf_ops xfs_rtbuf_ops;
 extern const struct xfs_buf_ops xfs_rtsb_buf_ops;
+extern const struct xfs_buf_ops xfs_rtrmapbt_buf_ops;
 extern const struct xfs_buf_ops xfs_sb_buf_ops;
 extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops;
 extern const struct xfs_buf_ops xfs_symlink_buf_ops;
@@ -55,6 +56,7 @@ extern const struct xfs_btree_ops xfs_bmbt_ops;
 extern const struct xfs_btree_ops xfs_refcountbt_ops;
 extern const struct xfs_btree_ops xfs_rmapbt_ops;
 extern const struct xfs_btree_ops xfs_rmapbt_mem_ops;
+extern const struct xfs_btree_ops xfs_rtrmapbt_ops;
 
 static inline bool xfs_btree_is_bno(const struct xfs_btree_ops *ops)
 {
@@ -100,6 +102,11 @@ static inline bool xfs_btree_is_mem_rmap(const struct xfs_btree_ops *ops)
 # define xfs_btree_is_mem_rmap(...)	(false)
 #endif
 
+static inline bool xfs_btree_is_rtrmap(const struct xfs_btree_ops *ops)
+{
+	return ops == &xfs_rtrmapbt_ops;
+}
+
 /* log size calculation functions */
 int	xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes);
 int	xfs_log_calc_minimum_size(struct xfs_mount *);



* [PATCH 14/56] xfs: realtime rmap btree transaction reservations
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (12 preceding siblings ...)
  2025-02-06 22:38   ` [PATCH 13/56] xfs: introduce realtime rmap btree ondisk definitions Darrick J. Wong
@ 2025-02-06 22:38   ` Darrick J. Wong
  2025-02-06 22:39   ` [PATCH 15/56] xfs: add realtime rmap btree operations Darrick J. Wong
                     ` (41 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:38 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: e1c76fce50bb750dff236aa51a3b698de4f7132c

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrmapbt to add the record and a second
split in the regular rmapbt to record the rtrmapbt split.
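
The per-extent block count can be sketched in isolation as follows.  This is
a simplified model of the xfs_rtalloc_block_count() change in this patch, not
the real function: the sketch_* names are hypothetical and the inputs that
the real code derives from the mount are passed in as plain parameters.

```c
#include <assert.h>
#include <stdbool.h>

/* Larger of two unsigned values. */
static unsigned int
sketch_max(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * t1: rtbitmap blocks plus one rtsummary block, per operation.
 * t2: rtrmapbt blocks logged by the second transaction, per operation.
 * The two happen in separate transactions, so the larger one is reserved.
 */
static unsigned int
sketch_rtalloc_block_count(
	unsigned int	rtbmp_blocks,
	unsigned int	rtrmap_maxlevels,
	bool		has_rmapbt,
	unsigned int	num_ops)
{
	unsigned int	t1, t2 = 0;

	t1 = (rtbmp_blocks + 1) * num_ops;

	if (has_rmapbt) {
		/* Guard the subtraction below against a zero-height tree. */
		assert(rtrmap_maxlevels >= 1);
		t2 = num_ops * (2 * rtrmap_maxlevels - 1);
	}

	return sketch_max(t1, t2);
}
```

For example, with one rtbitmap block, a five-level rtrmapbt and two
operations, t1 is 4 and t2 is 18, so the rtrmapbt term determines the
reservation.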

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_exchmaps.c    |    4 +++-
 libxfs/xfs_trans_resv.c  |   12 ++++++++++--
 libxfs/xfs_trans_space.h |   13 +++++++++++++
 3 files changed, 26 insertions(+), 3 deletions(-)


diff --git a/libxfs/xfs_exchmaps.c b/libxfs/xfs_exchmaps.c
index 08bfe6f1b3422d..2d7513549429b4 100644
--- a/libxfs/xfs_exchmaps.c
+++ b/libxfs/xfs_exchmaps.c
@@ -659,7 +659,9 @@ xfs_exchmaps_rmapbt_blocks(
 	if (!xfs_has_rmapbt(mp))
 		return 0;
 	if (XFS_IS_REALTIME_INODE(req->ip1))
-		return 0;
+		return howmany_64(req->nr_exchanges,
+					XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp)) *
+			XFS_RTRMAPADD_SPACE_RES(mp);
 
 	return howmany_64(req->nr_exchanges,
 					XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)) *
diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c
index 93047e149693d6..cdfac7a12906c2 100644
--- a/libxfs/xfs_trans_resv.c
+++ b/libxfs/xfs_trans_resv.c
@@ -210,7 +210,9 @@ xfs_calc_inode_chunk_res(
  * Per-extent log reservation for the btree changes involved in freeing or
  * allocating a realtime extent.  We have to be able to log as many rtbitmap
  * blocks as needed to mark inuse XFS_BMBT_MAX_EXTLEN blocks' worth of realtime
- * extents, as well as the realtime summary block.
+ * extents, as well as the realtime summary block (t1).  Realtime rmap btree
+ * operations happen in a second transaction, so factor in a couple of rtrmapbt
+ * splits (t2).
  */
 static unsigned int
 xfs_rtalloc_block_count(
@@ -219,10 +221,16 @@ xfs_rtalloc_block_count(
 {
 	unsigned int		rtbmp_blocks;
 	xfs_rtxlen_t		rtxlen;
+	unsigned int		t1, t2 = 0;
 
 	rtxlen = xfs_extlen_to_rtxlen(mp, XFS_MAX_BMBT_EXTLEN);
 	rtbmp_blocks = xfs_rtbitmap_blockcount_len(mp, rtxlen);
-	return (rtbmp_blocks + 1) * num_ops;
+	t1 = (rtbmp_blocks + 1) * num_ops;
+
+	if (xfs_has_rmapbt(mp))
+		t2 = num_ops * (2 * mp->m_rtrmap_maxlevels - 1);
+
+	return max(t1, t2);
 }
 
 /*
diff --git a/libxfs/xfs_trans_space.h b/libxfs/xfs_trans_space.h
index 1155ff2d37e29f..d89b570aafcc64 100644
--- a/libxfs/xfs_trans_space.h
+++ b/libxfs/xfs_trans_space.h
@@ -14,6 +14,19 @@
 #define XFS_MAX_CONTIG_BMAPS_PER_BLOCK(mp)    \
 		(((mp)->m_bmap_dmxr[0]) - ((mp)->m_bmap_dmnr[0]))
 
+/* Worst case number of realtime rmaps that can be held in a block. */
+#define XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp)    \
+		(((mp)->m_rtrmap_mxr[0]) - ((mp)->m_rtrmap_mnr[0]))
+
+/* Adding one realtime rmap could split every level to the top of the tree. */
+#define XFS_RTRMAPADD_SPACE_RES(mp) ((mp)->m_rtrmap_maxlevels)
+
+/* Blocks we might need to add "b" realtime rmaps to a tree. */
+#define XFS_NRTRMAPADD_SPACE_RES(mp, b) \
+	((((b) + XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp) - 1) / \
+	  XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp)) * \
+	  XFS_RTRMAPADD_SPACE_RES(mp))
+
 /* Worst case number of rmaps that can be held in a block. */
 #define XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)    \
 		(((mp)->m_rmap_mxr[0]) - ((mp)->m_rmap_mnr[0]))



* [PATCH 15/56] xfs: add realtime rmap btree operations
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (13 preceding siblings ...)
  2025-02-06 22:38   ` [PATCH 14/56] xfs: realtime rmap btree transaction reservations Darrick J. Wong
@ 2025-02-06 22:39   ` Darrick J. Wong
  2025-02-06 22:39   ` [PATCH 16/56] xfs: prepare rmap functions to deal with rtrmapbt Darrick J. Wong
                     ` (40 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:39 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: d386b4024372ea2f06aaa0f2c6c380b45ba0536e

Implement the generic btree operations needed to manipulate rtrmap
btree blocks. This is different from the regular rmapbt in that we
allocate space from the filesystem at large, and are constrained
neither to the free space nor to any particular AG.
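
Blocks allocated this way are charged through the metadata file's space
reservation (xfs_metafile_resv_alloc_space and xfs_metafile_resv_free_space
elsewhere in this series).  The sketch below is a simplified standalone model
of that accounting, assuming a single free block counter and no transactions;
the struct and all sketch_* names are hypothetical and only loosely follow
the real fields.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simplified stand-in for the inode and mount counters. */
struct sketch_metafile {
	int64_t	asked;		/* like i_meta_resv_asked */
	int64_t	nblocks;	/* like i_nblocks: blocks in use */
	int64_t	delayed;	/* like i_delayed_blks: reserved, unused */
	int64_t	fdblocks;	/* free blocks visible to everyone else */
};

static int64_t
sketch_min(int64_t a, int64_t b)
{
	return a < b ? a : b;
}

/* Hide the reserved-but-unused blocks from the free space counter. */
static int
sketch_resv_init(struct sketch_metafile *mf, int64_t ask)
{
	int64_t	hidden;

	if (mf->nblocks > ask)
		ask = mf->nblocks;
	hidden = ask - mf->nblocks;
	if (mf->fdblocks < hidden)
		return -ENOSPC;

	mf->fdblocks -= hidden;
	mf->delayed = hidden;
	mf->asked = ask;
	return 0;
}

/* Take blocks from the reservation first, then from free space. */
static void
sketch_resv_alloc(struct sketch_metafile *mf, int64_t len)
{
	int64_t	from_resv;

	assert(len >= 0);
	from_resv = sketch_min(len, mf->delayed);
	mf->delayed -= from_resv;
	/*
	 * Overrun: the real code may instead fall back to the transaction's
	 * block reservation; this model always charges free space.
	 */
	mf->fdblocks -= len - from_resv;
	mf->nblocks += len;
}

/* Refill the reservation up to what was asked; return the rest. */
static void
sketch_resv_free(struct sketch_metafile *mf, int64_t len)
{
	int64_t	to_resv;

	assert(len >= 0 && len <= mf->nblocks);
	mf->nblocks -= len;

	to_resv = mf->asked - (mf->nblocks + mf->delayed);
	if (to_resv > 0) {
		to_resv = sketch_min(to_resv, len);
		mf->delayed += to_resv;
		len -= to_resv;
	}
	mf->fdblocks += len;
}
```

In this model the sum of nblocks, delayed and fdblocks stays constant across
init, alloc and free, which is the property the real helpers preserve with
more elaborate in-core and on-disk counter updates.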

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_btree.c        |   67 +++++++++++
 libxfs/xfs_btree.h        |    6 +
 libxfs/xfs_rtrmap_btree.c |  271 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 344 insertions(+)


diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 8b20458a6cb4c4..7cbd8645b9a816 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -29,6 +29,9 @@
 #include "buf_mem.h"
 #include "xfs_btree_mem.h"
 #include "xfs_rtrmap_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_rmap.h"
+#include "xfs_metafile.h"
 
 /*
  * Btree magic numbers.
@@ -5574,3 +5577,67 @@ xfs_btree_goto_left_edge(
 
 	return 0;
 }
+
+/* Allocate a block for an inode-rooted metadata btree. */
+int
+xfs_btree_alloc_metafile_block(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_ptr	*start,
+	union xfs_btree_ptr		*new,
+	int				*stat)
+{
+	struct xfs_alloc_arg		args = {
+		.mp			= cur->bc_mp,
+		.tp			= cur->bc_tp,
+		.resv			= XFS_AG_RESV_METAFILE,
+		.minlen			= 1,
+		.maxlen			= 1,
+		.prod			= 1,
+	};
+	struct xfs_inode		*ip = cur->bc_ino.ip;
+	int				error;
+
+	ASSERT(xfs_is_metadir_inode(ip));
+
+	xfs_rmap_ino_bmbt_owner(&args.oinfo, ip->i_ino, cur->bc_ino.whichfork);
+	error = xfs_alloc_vextent_start_ag(&args,
+			XFS_INO_TO_FSB(cur->bc_mp, ip->i_ino));
+	if (error)
+		return error;
+	if (args.fsbno == NULLFSBLOCK) {
+		*stat = 0;
+		return 0;
+	}
+	ASSERT(args.len == 1);
+
+	xfs_metafile_resv_alloc_space(ip, &args);
+
+	new->l = cpu_to_be64(args.fsbno);
+	*stat = 1;
+	return 0;
+}
+
+/* Free a block from an inode-rooted metadata btree. */
+int
+xfs_btree_free_metafile_block(
+	struct xfs_btree_cur	*cur,
+	struct xfs_buf		*bp)
+{
+	struct xfs_owner_info	oinfo;
+	struct xfs_mount	*mp = cur->bc_mp;
+	struct xfs_inode	*ip = cur->bc_ino.ip;
+	struct xfs_trans	*tp = cur->bc_tp;
+	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp));
+	int			error;
+
+	ASSERT(xfs_is_metadir_inode(ip));
+
+	xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork);
+	error = xfs_free_extent_later(tp, fsbno, 1, &oinfo, XFS_AG_RESV_METAFILE,
+			0);
+	if (error)
+		return error;
+
+	xfs_metafile_resv_free_space(ip, tp, 1);
+	return 0;
+}
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 3b8c2ccad90847..ee82dc777d6d5b 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -703,4 +703,10 @@ xfs_btree_at_iroot(
 	       level == cur->bc_nlevels - 1;
 }
 
+int xfs_btree_alloc_metafile_block(struct xfs_btree_cur *cur,
+		const union xfs_btree_ptr *start, union xfs_btree_ptr *newp,
+		int *stat);
+int xfs_btree_free_metafile_block(struct xfs_btree_cur *cur,
+		struct xfs_buf *bp);
+
 #endif	/* __XFS_BTREE_H__ */
diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c
index a06cb09133db87..3c135648dfce4e 100644
--- a/libxfs/xfs_rtrmap_btree.c
+++ b/libxfs/xfs_rtrmap_btree.c
@@ -18,10 +18,12 @@
 #include "xfs_alloc.h"
 #include "xfs_btree.h"
 #include "xfs_btree_staging.h"
+#include "xfs_rmap.h"
 #include "xfs_rtrmap_btree.h"
 #include "xfs_trace.h"
 #include "xfs_cksum.h"
 #include "xfs_rtgroup.h"
+#include "xfs_bmap.h"
 
 static struct kmem_cache	*xfs_rtrmapbt_cur_cache;
 
@@ -42,6 +44,182 @@ xfs_rtrmapbt_dup_cursor(
 	return xfs_rtrmapbt_init_cursor(cur->bc_tp, to_rtg(cur->bc_group));
 }
 
+STATIC int
+xfs_rtrmapbt_get_minrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	if (level == cur->bc_nlevels - 1) {
+		struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
+
+		return xfs_rtrmapbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes,
+				level == 0) / 2;
+	}
+
+	return cur->bc_mp->m_rtrmap_mnr[level != 0];
+}
+
+STATIC int
+xfs_rtrmapbt_get_maxrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	if (level == cur->bc_nlevels - 1) {
+		struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
+
+		return xfs_rtrmapbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes,
+				level == 0);
+	}
+
+	return cur->bc_mp->m_rtrmap_mxr[level != 0];
+}
+
+/*
+ * Convert the ondisk record's offset field into the ondisk key's offset field.
+ * Fork and bmbt are significant parts of the rmap record key, but written
+ * status is merely a record attribute.
+ */
+static inline __be64 ondisk_rec_offset_to_key(const union xfs_btree_rec *rec)
+{
+	return rec->rmap.rm_offset & ~cpu_to_be64(XFS_RMAP_OFF_UNWRITTEN);
+}
+
+STATIC void
+xfs_rtrmapbt_init_key_from_rec(
+	union xfs_btree_key		*key,
+	const union xfs_btree_rec	*rec)
+{
+	key->rmap.rm_startblock = rec->rmap.rm_startblock;
+	key->rmap.rm_owner = rec->rmap.rm_owner;
+	key->rmap.rm_offset = ondisk_rec_offset_to_key(rec);
+}
+
+STATIC void
+xfs_rtrmapbt_init_high_key_from_rec(
+	union xfs_btree_key		*key,
+	const union xfs_btree_rec	*rec)
+{
+	uint64_t			off;
+	int				adj;
+
+	adj = be32_to_cpu(rec->rmap.rm_blockcount) - 1;
+
+	key->rmap.rm_startblock = rec->rmap.rm_startblock;
+	be32_add_cpu(&key->rmap.rm_startblock, adj);
+	key->rmap.rm_owner = rec->rmap.rm_owner;
+	key->rmap.rm_offset = ondisk_rec_offset_to_key(rec);
+	if (XFS_RMAP_NON_INODE_OWNER(be64_to_cpu(rec->rmap.rm_owner)) ||
+	    XFS_RMAP_IS_BMBT_BLOCK(be64_to_cpu(rec->rmap.rm_offset)))
+		return;
+	off = be64_to_cpu(key->rmap.rm_offset);
+	off = (XFS_RMAP_OFF(off) + adj) | (off & ~XFS_RMAP_OFF_MASK);
+	key->rmap.rm_offset = cpu_to_be64(off);
+}
+
+STATIC void
+xfs_rtrmapbt_init_rec_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_rec	*rec)
+{
+	rec->rmap.rm_startblock = cpu_to_be32(cur->bc_rec.r.rm_startblock);
+	rec->rmap.rm_blockcount = cpu_to_be32(cur->bc_rec.r.rm_blockcount);
+	rec->rmap.rm_owner = cpu_to_be64(cur->bc_rec.r.rm_owner);
+	rec->rmap.rm_offset = cpu_to_be64(
+			xfs_rmap_irec_offset_pack(&cur->bc_rec.r));
+}
+
+STATIC void
+xfs_rtrmapbt_init_ptr_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr)
+{
+	ptr->l = 0;
+}
+
+/*
+ * Mask the appropriate parts of the ondisk key field for a key comparison.
+ * Fork and bmbt are significant parts of the rmap record key, but written
+ * status is merely a record attribute.
+ */
+static inline uint64_t offset_keymask(uint64_t offset)
+{
+	return offset & ~XFS_RMAP_OFF_UNWRITTEN;
+}
+
+STATIC int64_t
+xfs_rtrmapbt_key_diff(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*key)
+{
+	struct xfs_rmap_irec		*rec = &cur->bc_rec.r;
+	const struct xfs_rmap_key	*kp = &key->rmap;
+	__u64				x, y;
+	int64_t				d;
+
+	d = (int64_t)be32_to_cpu(kp->rm_startblock) - rec->rm_startblock;
+	if (d)
+		return d;
+
+	x = be64_to_cpu(kp->rm_owner);
+	y = rec->rm_owner;
+	if (x > y)
+		return 1;
+	else if (y > x)
+		return -1;
+
+	x = offset_keymask(be64_to_cpu(kp->rm_offset));
+	y = offset_keymask(xfs_rmap_irec_offset_pack(rec));
+	if (x > y)
+		return 1;
+	else if (y > x)
+		return -1;
+	return 0;
+}
+
+STATIC int64_t
+xfs_rtrmapbt_diff_two_keys(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*k1,
+	const union xfs_btree_key	*k2,
+	const union xfs_btree_key	*mask)
+{
+	const struct xfs_rmap_key	*kp1 = &k1->rmap;
+	const struct xfs_rmap_key	*kp2 = &k2->rmap;
+	int64_t				d;
+	__u64				x, y;
+
+	/* Doesn't make sense to mask off the physical space part */
+	ASSERT(!mask || mask->rmap.rm_startblock);
+
+	d = (int64_t)be32_to_cpu(kp1->rm_startblock) -
+		     be32_to_cpu(kp2->rm_startblock);
+	if (d)
+		return d;
+
+	if (!mask || mask->rmap.rm_owner) {
+		x = be64_to_cpu(kp1->rm_owner);
+		y = be64_to_cpu(kp2->rm_owner);
+		if (x > y)
+			return 1;
+		else if (y > x)
+			return -1;
+	}
+
+	if (!mask || mask->rmap.rm_offset) {
+		/* Doesn't make sense to allow offset but not owner */
+		ASSERT(!mask || mask->rmap.rm_owner);
+
+		x = offset_keymask(be64_to_cpu(kp1->rm_offset));
+		y = offset_keymask(be64_to_cpu(kp2->rm_offset));
+		if (x > y)
+			return 1;
+		else if (y > x)
+			return -1;
+	}
+
+	return 0;
+}
+
 static xfs_failaddr_t
 xfs_rtrmapbt_verify(
 	struct xfs_buf		*bp)
@@ -108,6 +286,86 @@ const struct xfs_buf_ops xfs_rtrmapbt_buf_ops = {
 	.verify_struct		= xfs_rtrmapbt_verify,
 };
 
+STATIC int
+xfs_rtrmapbt_keys_inorder(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*k1,
+	const union xfs_btree_key	*k2)
+{
+	uint32_t			x;
+	uint32_t			y;
+	uint64_t			a;
+	uint64_t			b;
+
+	x = be32_to_cpu(k1->rmap.rm_startblock);
+	y = be32_to_cpu(k2->rmap.rm_startblock);
+	if (x < y)
+		return 1;
+	else if (x > y)
+		return 0;
+	a = be64_to_cpu(k1->rmap.rm_owner);
+	b = be64_to_cpu(k2->rmap.rm_owner);
+	if (a < b)
+		return 1;
+	else if (a > b)
+		return 0;
+	a = offset_keymask(be64_to_cpu(k1->rmap.rm_offset));
+	b = offset_keymask(be64_to_cpu(k2->rmap.rm_offset));
+	if (a <= b)
+		return 1;
+	return 0;
+}
+
+STATIC int
+xfs_rtrmapbt_recs_inorder(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_rec	*r1,
+	const union xfs_btree_rec	*r2)
+{
+	uint32_t			x;
+	uint32_t			y;
+	uint64_t			a;
+	uint64_t			b;
+
+	x = be32_to_cpu(r1->rmap.rm_startblock);
+	y = be32_to_cpu(r2->rmap.rm_startblock);
+	if (x < y)
+		return 1;
+	else if (x > y)
+		return 0;
+	a = be64_to_cpu(r1->rmap.rm_owner);
+	b = be64_to_cpu(r2->rmap.rm_owner);
+	if (a < b)
+		return 1;
+	else if (a > b)
+		return 0;
+	a = offset_keymask(be64_to_cpu(r1->rmap.rm_offset));
+	b = offset_keymask(be64_to_cpu(r2->rmap.rm_offset));
+	if (a <= b)
+		return 1;
+	return 0;
+}
+
+STATIC enum xbtree_key_contig
+xfs_rtrmapbt_keys_contiguous(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*key1,
+	const union xfs_btree_key	*key2,
+	const union xfs_btree_key	*mask)
+{
+	ASSERT(!mask || mask->rmap.rm_startblock);
+
+	/*
+	 * We only support checking contiguity of the physical space component.
+	 * If any callers ever need more specificity than that, they'll have to
+	 * implement it here.
+	 */
+	ASSERT(!mask || (!mask->rmap.rm_owner && !mask->rmap.rm_offset));
+
+	return xbtree_key_contig(be32_to_cpu(key1->rmap.rm_startblock),
+				 be32_to_cpu(key2->rmap.rm_startblock));
+}
+
 const struct xfs_btree_ops xfs_rtrmapbt_ops = {
 	.name			= "rtrmap",
 	.type			= XFS_BTREE_TYPE_INODE,
@@ -123,7 +381,20 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = {
 	.statoff		= XFS_STATS_CALC_INDEX(xs_rtrmap_2),
 
 	.dup_cursor		= xfs_rtrmapbt_dup_cursor,
+	.alloc_block		= xfs_btree_alloc_metafile_block,
+	.free_block		= xfs_btree_free_metafile_block,
+	.get_minrecs		= xfs_rtrmapbt_get_minrecs,
+	.get_maxrecs		= xfs_rtrmapbt_get_maxrecs,
+	.init_key_from_rec	= xfs_rtrmapbt_init_key_from_rec,
+	.init_high_key_from_rec	= xfs_rtrmapbt_init_high_key_from_rec,
+	.init_rec_from_cur	= xfs_rtrmapbt_init_rec_from_cur,
+	.init_ptr_from_cur	= xfs_rtrmapbt_init_ptr_from_cur,
+	.key_diff		= xfs_rtrmapbt_key_diff,
 	.buf_ops		= &xfs_rtrmapbt_buf_ops,
+	.diff_two_keys		= xfs_rtrmapbt_diff_two_keys,
+	.keys_inorder		= xfs_rtrmapbt_keys_inorder,
+	.recs_inorder		= xfs_rtrmapbt_recs_inorder,
+	.keys_contiguous	= xfs_rtrmapbt_keys_contiguous,
 };
 
 /* Allocate a new rt rmap btree cursor. */
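
The key ordering used by the comparison callbacks in this patch can be sketched in isolation. This is only an illustration, not the libxfs code: the struct is a host-endian stand-in for the ondisk key, every demo_* name is hypothetical, and the bit position used for the unwritten flag is an assumption. It shows keys being compared by start block, then owner, then offset, with the unwritten bit masked off because it is a record attribute rather than part of the key.

```c
#include <stdint.h>

/* Assumed position of the "unwritten" flag bit; illustration only. */
#define DEMO_RMAP_OFF_UNWRITTEN	(1ULL << 61)

/* Host-endian stand-in for the ondisk rmap key (hypothetical type). */
struct demo_rmap_key {
	uint32_t	startblock;
	uint64_t	owner;
	uint64_t	offset;		/* file offset plus flag bits */
};

/* Drop the unwritten bit so that it never influences key ordering. */
static inline uint64_t demo_offset_keymask(uint64_t offset)
{
	return offset & ~DEMO_RMAP_OFF_UNWRITTEN;
}

/* Three-way compare: returns -1, 0 or 1 for a < b, a == b, a > b. */
static int demo_key_cmp(const struct demo_rmap_key *a,
			const struct demo_rmap_key *b)
{
	uint64_t	x, y;

	if (a->startblock != b->startblock)
		return a->startblock > b->startblock ? 1 : -1;
	if (a->owner != b->owner)
		return a->owner > b->owner ? 1 : -1;

	x = demo_offset_keymask(a->offset);
	y = demo_offset_keymask(b->offset);
	if (x != y)
		return x > y ? 1 : -1;
	return 0;
}
```

Two keys that differ only in the unwritten bit compare as equal here, which matches the intent stated in the patch comments.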



* [PATCH 16/56] xfs: prepare rmap functions to deal with rtrmapbt
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (14 preceding siblings ...)
  2025-02-06 22:39   ` [PATCH 15/56] xfs: add realtime rmap btree operations Darrick J. Wong
@ 2025-02-06 22:39   ` Darrick J. Wong
  2025-02-06 22:39   ` [PATCH 17/56] xfs: add a realtime flag to the rmap update log redo items Darrick J. Wong
                     ` (39 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:39 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: adafb31c80e608e63adcf8cae5675db00c734149

Prepare the high-level rmap functions to deal with the new realtime
rmapbt and its slightly different conventions.  Provide the ability
to talk to either the rmapbt or the rtrmapbt format from the same
high-level code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rmap.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rmap.h    |    3 ++
 libxfs/xfs_rtgroup.h |   26 +++++++++++++++++++++
 3 files changed, 92 insertions(+)
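
For readers skimming the series, the shape of the new record check can be sketched as below. This is a simplified illustration, not the libxfs code: all demo_* names, the flag bit values and the high-bit test for non-inode owners are assumptions, and the inode-number and file-offset checks done by the real function are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits and owner code, for illustration only. */
#define DEMO_RMAP_ATTR_FORK	(1U << 0)
#define DEMO_RMAP_BMBT_BLOCK	(1U << 1)
#define DEMO_RMAP_UNWRITTEN	(1U << 2)
#define DEMO_OWN_FS		((uint64_t)-3)

struct demo_rtgroup {
	uint32_t	nr_blocks;	/* blocks in this rtgroup */
	uint32_t	rextsize;	/* blocks per realtime extent */
};

struct demo_irec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
	uint64_t	offset;
	unsigned int	flags;
};

/* Assumption: non-inode owners are encoded with the top bit set. */
static bool demo_non_inode_owner(uint64_t owner)
{
	return (owner >> 63) & 1;
}

/* Metadata records: offset 0, written, OWN_FS covering the first rt extent. */
static bool demo_check_meta(const struct demo_rtgroup *rtg,
			    const struct demo_irec *r)
{
	if (r->offset != 0 || (r->flags & DEMO_RMAP_UNWRITTEN))
		return false;
	return r->owner == DEMO_OWN_FS && r->startblock == 0 &&
	       r->blockcount == rtg->rextsize;
}

/* File records: the extent must lie entirely inside the group. */
static bool demo_check_inode(const struct demo_rtgroup *rtg,
			     const struct demo_irec *r)
{
	return r->startblock < rtg->nr_blocks &&
	       r->blockcount <= rtg->nr_blocks - r->startblock;
}

/* Top-level check: no empty extents, no attr fork or bmbt block records. */
static bool demo_rtrmap_check(const struct demo_rtgroup *rtg,
			      const struct demo_irec *r)
{
	if (r->blockcount == 0)
		return false;
	if (r->flags & (DEMO_RMAP_BMBT_BLOCK | DEMO_RMAP_ATTR_FORK))
		return false;
	if (demo_non_inode_owner(r->owner))
		return demo_check_meta(rtg, r);
	return demo_check_inode(rtg, r);
}
```

The split into a metadata path and an inode path mirrors the helper functions in the diff below; returning a bool instead of a failure address is a simplification made for the sketch.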


diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index e23972f934f9ef..f1bc677c56e8d9 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -24,6 +24,7 @@
 #include "xfs_ag.h"
 #include "xfs_health.h"
 #include "defer_item.h"
+#include "xfs_rtgroup.h"
 
 struct kmem_cache	*xfs_rmap_intent_cache;
 
@@ -263,11 +264,69 @@ xfs_rmap_check_irec(
 	return NULL;
 }
 
+static xfs_failaddr_t
+xfs_rtrmap_check_meta_irec(
+	struct xfs_rtgroup		*rtg,
+	const struct xfs_rmap_irec	*irec)
+{
+	struct xfs_mount		*mp = rtg_mount(rtg);
+
+	if (irec->rm_offset != 0)
+		return __this_address;
+	if (irec->rm_flags & XFS_RMAP_UNWRITTEN)
+		return __this_address;
+
+	switch (irec->rm_owner) {
+	case XFS_RMAP_OWN_FS:
+		if (irec->rm_startblock != 0)
+			return __this_address;
+		if (irec->rm_blockcount != mp->m_sb.sb_rextsize)
+			return __this_address;
+		return NULL;
+	default:
+		return __this_address;
+	}
+
+	return NULL;
+}
+
+static xfs_failaddr_t
+xfs_rtrmap_check_inode_irec(
+	struct xfs_rtgroup		*rtg,
+	const struct xfs_rmap_irec	*irec)
+{
+	struct xfs_mount		*mp = rtg_mount(rtg);
+
+	if (!xfs_verify_ino(mp, irec->rm_owner))
+		return __this_address;
+	if (!xfs_verify_rgbext(rtg, irec->rm_startblock, irec->rm_blockcount))
+		return __this_address;
+	if (!xfs_verify_fileext(mp, irec->rm_offset, irec->rm_blockcount))
+		return __this_address;
+	return NULL;
+}
+
+xfs_failaddr_t
+xfs_rtrmap_check_irec(
+	struct xfs_rtgroup		*rtg,
+	const struct xfs_rmap_irec	*irec)
+{
+	if (irec->rm_blockcount == 0)
+		return __this_address;
+	if (irec->rm_flags & (XFS_RMAP_BMBT_BLOCK | XFS_RMAP_ATTR_FORK))
+		return __this_address;
+	if (XFS_RMAP_NON_INODE_OWNER(irec->rm_owner))
+		return xfs_rtrmap_check_meta_irec(rtg, irec);
+	return xfs_rtrmap_check_inode_irec(rtg, irec);
+}
+
 static inline xfs_failaddr_t
 xfs_rmap_check_btrec(
 	struct xfs_btree_cur		*cur,
 	const struct xfs_rmap_irec	*irec)
 {
+	if (xfs_btree_is_rtrmap(cur->bc_ops))
+		return xfs_rtrmap_check_irec(to_rtg(cur->bc_group), irec);
 	return xfs_rmap_check_irec(to_perag(cur->bc_group), irec);
 }
 
@@ -282,6 +341,10 @@ xfs_rmap_complain_bad_rec(
 	if (xfs_btree_is_mem_rmap(cur->bc_ops))
 		xfs_warn(mp,
  "In-Memory Reverse Mapping BTree record corruption detected at %pS!", fa);
+	else if (xfs_btree_is_rtrmap(cur->bc_ops))
+		xfs_warn(mp,
+ "RT Reverse Mapping BTree record corruption in rtgroup %u detected at %pS!",
+				cur->bc_group->xg_gno, fa);
 	else
 		xfs_warn(mp,
  "Reverse Mapping BTree record corruption in AG %d detected at %pS!",
diff --git a/libxfs/xfs_rmap.h b/libxfs/xfs_rmap.h
index 8e2657af038e9e..1b19f54b65047f 100644
--- a/libxfs/xfs_rmap.h
+++ b/libxfs/xfs_rmap.h
@@ -7,6 +7,7 @@
 #define __XFS_RMAP_H__
 
 struct xfs_perag;
+struct xfs_rtgroup;
 
 static inline void
 xfs_rmap_ino_bmbt_owner(
@@ -206,6 +207,8 @@ xfs_failaddr_t xfs_rmap_btrec_to_irec(const union xfs_btree_rec *rec,
 		struct xfs_rmap_irec *irec);
 xfs_failaddr_t xfs_rmap_check_irec(struct xfs_perag *pag,
 		const struct xfs_rmap_irec *irec);
+xfs_failaddr_t xfs_rtrmap_check_irec(struct xfs_rtgroup *rtg,
+		const struct xfs_rmap_irec *irec);
 
 int xfs_rmap_has_records(struct xfs_btree_cur *cur, xfs_agblock_t bno,
 		xfs_extlen_t len, enum xbtree_recpacking *outcome);
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 2e145ea2de8007..bff319fa5a7173 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -132,6 +132,32 @@ xfs_rtgroup_next(
 	return xfs_rtgroup_next_range(mp, rtg, 0, mp->m_sb.sb_rgcount - 1);
 }
 
+static inline bool
+xfs_verify_rgbno(
+	struct xfs_rtgroup	*rtg,
+	xfs_rgblock_t		rgbno)
+{
+	ASSERT(xfs_has_rtgroups(rtg_mount(rtg)));
+
+	return xfs_verify_gbno(rtg_group(rtg), rgbno);
+}
+
+/*
+ * Check that [@rgbno,@len] is a valid extent range in @rtg.
+ *
+ * Must only be used for RTG-enabled file systems.
+ */
+static inline bool
+xfs_verify_rgbext(
+	struct xfs_rtgroup	*rtg,
+	xfs_rgblock_t		rgbno,
+	xfs_extlen_t		len)
+{
+	ASSERT(xfs_has_rtgroups(rtg_mount(rtg)));
+
+	return xfs_verify_gbext(rtg_group(rtg), rgbno, len);
+}
+
 static inline xfs_rtblock_t
 xfs_rgbno_to_rtb(
 	struct xfs_rtgroup	*rtg,



* [PATCH 17/56] xfs: add a realtime flag to the rmap update log redo items
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (15 preceding siblings ...)
  2025-02-06 22:39   ` [PATCH 16/56] xfs: prepare rmap functions to deal with rtrmapbt Darrick J. Wong
@ 2025-02-06 22:39   ` Darrick J. Wong
  2025-02-06 22:39   ` [PATCH 18/56] xfs: pretty print metadata file types in error messages Darrick J. Wong
                     ` (38 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:39 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 9e823fc27419b09718fff74ae2297b25ae6fb317

Extend the rmap update (RUI) log items to handle realtime volumes by
adding a new log intent item type.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_defer.h      |    1 +
 libxfs/xfs_log_format.h |    6 +++++-
 libxfs/xfs_refcount.c   |    4 ++--
 libxfs/xfs_rmap.c       |   17 ++++++++++++-----
 libxfs/xfs_rmap.h       |    5 +++--
 5 files changed, 23 insertions(+), 10 deletions(-)
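
As a rough illustration of what the new flag enables, the sketch below picks a log item type from the realtime flag carried by an intent. This is an assumption about how the flag might be consumed, since the selection logic is not part of this patch. The demo_* names are hypothetical, the two RT type codes are copied from the xfs_log_format.h hunk below, and the non-RT codes are assumed values.

```c
#include <stdbool.h>
#include <stdint.h>

/* RT codes come from this patch; the non-RT codes are assumed values. */
#define DEMO_LI_RUI	0x1240
#define DEMO_LI_RUD	0x1241
#define DEMO_LI_RUI_RT	0x124c
#define DEMO_LI_RUD_RT	0x124d

/* Hypothetical, cut-down intent carrying only what the sketch needs. */
struct demo_rmap_intent {
	uint64_t	owner;
	bool		realtime;	/* plays the role of ri_realtime */
};

/* Log item type for the intent ("I want to update this rmap"). */
static uint16_t demo_intent_type(const struct demo_rmap_intent *ri)
{
	return ri->realtime ? DEMO_LI_RUI_RT : DEMO_LI_RUI;
}

/* Log item type for the matching done item. */
static uint16_t demo_done_type(const struct demo_rmap_intent *ri)
{
	return ri->realtime ? DEMO_LI_RUD_RT : DEMO_LI_RUD;
}
```

Keeping separate type codes for the two devices would let log recovery tell from the item type alone which volume an update targets.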


diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h
index ec51b8465e61cb..1e2477eaa5a844 100644
--- a/libxfs/xfs_defer.h
+++ b/libxfs/xfs_defer.h
@@ -69,6 +69,7 @@ struct xfs_defer_op_type {
 extern const struct xfs_defer_op_type xfs_bmap_update_defer_type;
 extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
 extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
+extern const struct xfs_defer_op_type xfs_rtrmap_update_defer_type;
 extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
 extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
 extern const struct xfs_defer_op_type xfs_rtextent_free_defer_type;
diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index 15dec19b6c32ad..a7e0e479454d3d 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -250,6 +250,8 @@ typedef struct xfs_trans_header {
 #define	XFS_LI_XMD		0x1249  /* mapping exchange done */
 #define	XFS_LI_EFI_RT		0x124a	/* realtime extent free intent */
 #define	XFS_LI_EFD_RT		0x124b	/* realtime extent free done */
+#define	XFS_LI_RUI_RT		0x124c	/* realtime rmap update intent */
+#define	XFS_LI_RUD_RT		0x124d	/* realtime rmap update done */
 
 #define XFS_LI_TYPE_DESC \
 	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
@@ -271,7 +273,9 @@ typedef struct xfs_trans_header {
 	{ XFS_LI_XMI,		"XFS_LI_XMI" }, \
 	{ XFS_LI_XMD,		"XFS_LI_XMD" }, \
 	{ XFS_LI_EFI_RT,	"XFS_LI_EFI_RT" }, \
-	{ XFS_LI_EFD_RT,	"XFS_LI_EFD_RT" }
+	{ XFS_LI_EFD_RT,	"XFS_LI_EFD_RT" }, \
+	{ XFS_LI_RUI_RT,	"XFS_LI_RUI_RT" }, \
+	{ XFS_LI_RUD_RT,	"XFS_LI_RUD_RT" }
 
 /*
  * Inode Log Item Format definitions.
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index 11c0ca5ecdb3e4..fb7c56a5a32921 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -1830,7 +1830,7 @@ xfs_refcount_alloc_cow_extent(
 	__xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len);
 
 	/* Add rmap entry */
-	xfs_rmap_alloc_extent(tp, fsb, len, XFS_RMAP_OWN_COW);
+	xfs_rmap_alloc_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW);
 }
 
 /* Forget a CoW staging event in the refcount btree. */
@@ -1846,7 +1846,7 @@ xfs_refcount_free_cow_extent(
 		return;
 
 	/* Remove rmap entry */
-	xfs_rmap_free_extent(tp, fsb, len, XFS_RMAP_OWN_COW);
+	xfs_rmap_free_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW);
 	__xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len);
 }
 
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index f1bc677c56e8d9..a1a57cd0c62c10 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -2709,6 +2709,7 @@ __xfs_rmap_add(
 	struct xfs_trans		*tp,
 	enum xfs_rmap_intent_type	type,
 	uint64_t			owner,
+	bool				isrt,
 	int				whichfork,
 	struct xfs_bmbt_irec		*bmap)
 {
@@ -2720,6 +2721,7 @@ __xfs_rmap_add(
 	ri->ri_owner = owner;
 	ri->ri_whichfork = whichfork;
 	ri->ri_bmap = *bmap;
+	ri->ri_realtime = isrt;
 
 	xfs_rmap_defer_add(tp, ri);
 }
@@ -2733,6 +2735,7 @@ xfs_rmap_map_extent(
 	struct xfs_bmbt_irec	*PREV)
 {
 	enum xfs_rmap_intent_type type = XFS_RMAP_MAP;
+	bool			isrt = xfs_ifork_is_realtime(ip, whichfork);
 
 	if (!xfs_rmap_update_is_needed(tp->t_mountp, whichfork))
 		return;
@@ -2740,7 +2743,7 @@ xfs_rmap_map_extent(
 	if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip))
 		type = XFS_RMAP_MAP_SHARED;
 
-	__xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV);
+	__xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV);
 }
 
 /* Unmap an extent out of a file. */
@@ -2752,6 +2755,7 @@ xfs_rmap_unmap_extent(
 	struct xfs_bmbt_irec	*PREV)
 {
 	enum xfs_rmap_intent_type type = XFS_RMAP_UNMAP;
+	bool			isrt = xfs_ifork_is_realtime(ip, whichfork);
 
 	if (!xfs_rmap_update_is_needed(tp->t_mountp, whichfork))
 		return;
@@ -2759,7 +2763,7 @@ xfs_rmap_unmap_extent(
 	if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip))
 		type = XFS_RMAP_UNMAP_SHARED;
 
-	__xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV);
+	__xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV);
 }
 
 /*
@@ -2777,6 +2781,7 @@ xfs_rmap_convert_extent(
 	struct xfs_bmbt_irec	*PREV)
 {
 	enum xfs_rmap_intent_type type = XFS_RMAP_CONVERT;
+	bool			isrt = xfs_ifork_is_realtime(ip, whichfork);
 
 	if (!xfs_rmap_update_is_needed(mp, whichfork))
 		return;
@@ -2784,13 +2789,14 @@ xfs_rmap_convert_extent(
 	if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip))
 		type = XFS_RMAP_CONVERT_SHARED;
 
-	__xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV);
+	__xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV);
 }
 
 /* Schedule the creation of an rmap for non-file data. */
 void
 xfs_rmap_alloc_extent(
 	struct xfs_trans	*tp,
+	bool			isrt,
 	xfs_fsblock_t		fsbno,
 	xfs_extlen_t		len,
 	uint64_t		owner)
@@ -2805,13 +2811,14 @@ xfs_rmap_alloc_extent(
 	bmap.br_startoff = 0;
 	bmap.br_state = XFS_EXT_NORM;
 
-	__xfs_rmap_add(tp, XFS_RMAP_ALLOC, owner, XFS_DATA_FORK, &bmap);
+	__xfs_rmap_add(tp, XFS_RMAP_ALLOC, owner, isrt, XFS_DATA_FORK, &bmap);
 }
 
 /* Schedule the deletion of an rmap for non-file data. */
 void
 xfs_rmap_free_extent(
 	struct xfs_trans	*tp,
+	bool			isrt,
 	xfs_fsblock_t		fsbno,
 	xfs_extlen_t		len,
 	uint64_t		owner)
@@ -2826,7 +2833,7 @@ xfs_rmap_free_extent(
 	bmap.br_startoff = 0;
 	bmap.br_state = XFS_EXT_NORM;
 
-	__xfs_rmap_add(tp, XFS_RMAP_FREE, owner, XFS_DATA_FORK, &bmap);
+	__xfs_rmap_add(tp, XFS_RMAP_FREE, owner, isrt, XFS_DATA_FORK, &bmap);
 }
 
 /* Compare rmap records.  Returns -1 if a < b, 1 if a > b, and 0 if equal. */
diff --git a/libxfs/xfs_rmap.h b/libxfs/xfs_rmap.h
index 1b19f54b65047f..5f39f6e53cd19a 100644
--- a/libxfs/xfs_rmap.h
+++ b/libxfs/xfs_rmap.h
@@ -175,6 +175,7 @@ struct xfs_rmap_intent {
 	uint64_t				ri_owner;
 	struct xfs_bmbt_irec			ri_bmap;
 	struct xfs_group			*ri_group;
+	bool					ri_realtime;
 };
 
 /* functions for updating the rmapbt based on bmbt map/unmap operations */
@@ -185,9 +186,9 @@ void xfs_rmap_unmap_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 void xfs_rmap_convert_extent(struct xfs_mount *mp, struct xfs_trans *tp,
 		struct xfs_inode *ip, int whichfork,
 		struct xfs_bmbt_irec *imap);
-void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno,
+void xfs_rmap_alloc_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsbno,
 		xfs_extlen_t len, uint64_t owner);
-void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno,
+void xfs_rmap_free_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsbno,
 		xfs_extlen_t len, uint64_t owner);
 
 int xfs_rmap_finish_one(struct xfs_trans *tp, struct xfs_rmap_intent *ri,



* [PATCH 18/56] xfs: pretty print metadata file types in error messages
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (16 preceding siblings ...)
  2025-02-06 22:39   ` [PATCH 17/56] xfs: add a realtime flag to the rmap update log redo items Darrick J. Wong
@ 2025-02-06 22:39   ` Darrick J. Wong
  2025-02-06 22:40   ` [PATCH 19/56] xfs: support file data forks containing metadata btrees Darrick J. Wong
                     ` (37 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:39 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 219ee99d3673ded7abbc13ddd4d7847e92661e2c

Create a helper function to turn a metadata file type code into a
printable string, and use this to complain about lockdep problems with
rtgroup inodes.  We'll use this more in the next patch.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_metafile.c |   18 ++++++++++++++++++
 libxfs/xfs_metafile.h |    2 ++
 libxfs/xfs_rtgroup.c  |    3 ++-
 3 files changed, 22 insertions(+), 1 deletion(-)
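
The lookup pattern used by the new helper can be sketched on its own as follows. The enum, the table contents and the demo_* names are made up for the example. The fallback wrapper is not in the patch; it is included only to show one way a caller could avoid passing NULL to a %s conversion.

```c
#include <stddef.h>

/* Hypothetical type codes; the real list lives in xfs_format.h. */
enum demo_metafile_type {
	DEMO_METAFILE_UNKNOWN,
	DEMO_METAFILE_RTBITMAP,
	DEMO_METAFILE_RTSUMMARY,
	DEMO_METAFILE_MAX
};

static const struct {
	enum demo_metafile_type	type;
	const char		*name;
} demo_type_strs[] = {
	{ DEMO_METAFILE_UNKNOWN,	"unknown" },
	{ DEMO_METAFILE_RTBITMAP,	"rtbitmap" },
	{ DEMO_METAFILE_RTSUMMARY,	"rtsummary" },
};

/* Linear search of the table; returns NULL for codes with no entry. */
static const char *demo_metafile_type_str(enum demo_metafile_type type)
{
	size_t	i;

	for (i = 0; i < sizeof(demo_type_strs) / sizeof(demo_type_strs[0]); i++) {
		if (demo_type_strs[i].type == type)
			return demo_type_strs[i].name;
	}
	return NULL;
}

/* Optional wrapper (not in the patch): never hands NULL to a formatter. */
static const char *demo_metafile_type_str_safe(enum demo_metafile_type type)
{
	const char	*s = demo_metafile_type_str(type);

	return s ? s : "?";
}
```

A linear scan is adequate here because the table is tiny and the helper is only used on error reporting paths.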


diff --git a/libxfs/xfs_metafile.c b/libxfs/xfs_metafile.c
index 7f673d706aada8..4488e38a87345a 100644
--- a/libxfs/xfs_metafile.c
+++ b/libxfs/xfs_metafile.c
@@ -20,6 +20,24 @@
 #include "xfs_errortag.h"
 #include "xfs_alloc.h"
 
+static const struct {
+	enum xfs_metafile_type	mtype;
+	const char		*name;
+} xfs_metafile_type_strs[] = { XFS_METAFILE_TYPE_STR };
+
+const char *
+xfs_metafile_type_str(enum xfs_metafile_type metatype)
+{
+	unsigned int	i;
+
+	for (i = 0; i < ARRAY_SIZE(xfs_metafile_type_strs); i++) {
+		if (xfs_metafile_type_strs[i].mtype == metatype)
+			return xfs_metafile_type_strs[i].name;
+	}
+
+	return NULL;
+}
+
 /* Set up an inode to be recognized as a metadata directory inode. */
 void
 xfs_metafile_set_iflag(
diff --git a/libxfs/xfs_metafile.h b/libxfs/xfs_metafile.h
index 8d8f08a6071c23..95af4b52e5a75f 100644
--- a/libxfs/xfs_metafile.h
+++ b/libxfs/xfs_metafile.h
@@ -6,6 +6,8 @@
 #ifndef __XFS_METAFILE_H__
 #define __XFS_METAFILE_H__
 
+const char *xfs_metafile_type_str(enum xfs_metafile_type metatype);
+
 /* All metadata files must have these flags set. */
 #define XFS_METAFILE_DIFLAGS	(XFS_DIFLAG_IMMUTABLE | \
 				 XFS_DIFLAG_SYNC | \
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index e422a7bc41a55e..b9da90c5062dc1 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -279,7 +279,8 @@ xfs_rtginode_ilock_print_fn(
 	const struct xfs_inode *ip =
 		container_of(m, struct xfs_inode, i_lock.dep_map);
 
-	printk(KERN_CONT " rgno=%u", ip->i_projid);
+	printk(KERN_CONT " rgno=%u metatype=%s", ip->i_projid,
+			xfs_metafile_type_str(ip->i_metatype));
 }
 
 /*



* [PATCH 19/56] xfs: support file data forks containing metadata btrees
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (17 preceding siblings ...)
  2025-02-06 22:39   ` [PATCH 18/56] xfs: pretty print metadata file types in error messages Darrick J. Wong
@ 2025-02-06 22:40   ` Darrick J. Wong
  2025-02-06 22:40   ` [PATCH 20/56] xfs: add realtime reverse map inode to metadata directory Darrick J. Wong
                     ` (36 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:40 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 702c90f451622384d6c65897b619f647704b06a9

Create a new fork format type for metadata btrees.  This fork type
requires that the inode is in the metadata directory tree, and only
applies to the data fork.  The actual type of the metadata btree itself
is determined by the di_metatype field.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_format.h     |    6 ++++--
 libxfs/xfs_inode_buf.c  |   23 ++++++++++++++++++++---
 libxfs/xfs_inode_fork.c |   19 +++++++++++++++++++
 3 files changed, 43 insertions(+), 5 deletions(-)
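
The two verifier rules this patch introduces can be sketched as follows. This is a simplified model with hypothetical demo_* types, not the libxfs verifier, and it does not model the per-metatype switch, which rejects every type until later patches add cases to it.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_fmt {
	DEMO_FMT_EXTENTS,
	DEMO_FMT_BTREE,
	DEMO_FMT_META_BTREE
};

/* Hypothetical, cut-down view of the ondisk inode fields involved. */
struct demo_dinode {
	enum demo_fmt	data_fmt;
	bool		metadata_flag;	/* stands in for XFS_DIFLAG2_METADATA */
	uint64_t	nextents;	/* data + attr fork extent count */
	uint64_t	nblocks;
};

/* A META_BTREE data fork needs the metadir feature and the metadata flag. */
static bool demo_meta_btree_fork_ok(bool has_metadir,
				    const struct demo_dinode *d)
{
	if (d->data_fmt != DEMO_FMT_META_BTREE)
		return true;
	return has_metadir && d->metadata_flag;
}

/*
 * Blocks without extents normally indicate corruption, but inodes holding
 * a metadata btree always report a zero extent count.
 */
static bool demo_extent_counts_ok(const struct demo_dinode *d)
{
	if (d->data_fmt == DEMO_FMT_META_BTREE)
		return true;
	return !(d->nextents == 0 && d->nblocks != 0);
}
```

The second function corresponds to the check that the diff moves to the end of xfs_dinode_verify and makes conditional on the data fork format.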


diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 469fc7afa591b4..41ea4283c43cb4 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -997,7 +997,8 @@ enum xfs_dinode_fmt {
 	XFS_DINODE_FMT_LOCAL,		/* bulk data */
 	XFS_DINODE_FMT_EXTENTS,		/* struct xfs_bmbt_rec */
 	XFS_DINODE_FMT_BTREE,		/* struct xfs_bmdr_block */
-	XFS_DINODE_FMT_UUID		/* added long ago, but never used */
+	XFS_DINODE_FMT_UUID,		/* added long ago, but never used */
+	XFS_DINODE_FMT_META_BTREE,	/* metadata btree */
 };
 
 #define XFS_INODE_FORMAT_STR \
@@ -1005,7 +1006,8 @@ enum xfs_dinode_fmt {
 	{ XFS_DINODE_FMT_LOCAL,		"local" }, \
 	{ XFS_DINODE_FMT_EXTENTS,	"extent" }, \
 	{ XFS_DINODE_FMT_BTREE,		"btree" }, \
-	{ XFS_DINODE_FMT_UUID,		"uuid" }
+	{ XFS_DINODE_FMT_UUID,		"uuid" }, \
+	{ XFS_DINODE_FMT_META_BTREE,	"meta_btree" }
 
 /*
  * Max values for extnum and aextnum.
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 98482cb4948284..10ce2e34969d99 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -438,6 +438,16 @@ xfs_dinode_verify_fork(
 		if (di_nextents > max_extents)
 			return __this_address;
 		break;
+	case XFS_DINODE_FMT_META_BTREE:
+		if (!xfs_has_metadir(mp))
+			return __this_address;
+		if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA)))
+			return __this_address;
+		switch (be16_to_cpu(dip->di_metatype)) {
+		default:
+			return __this_address;
+		}
+		break;
 	default:
 		return __this_address;
 	}
@@ -457,6 +467,10 @@ xfs_dinode_verify_forkoff(
 		if (dip->di_forkoff != (roundup(sizeof(xfs_dev_t), 8) >> 3))
 			return __this_address;
 		break;
+	case XFS_DINODE_FMT_META_BTREE:
+		if (!xfs_has_metadir(mp) || !xfs_has_parent(mp))
+			return __this_address;
+		fallthrough;
 	case XFS_DINODE_FMT_LOCAL:	/* fall through ... */
 	case XFS_DINODE_FMT_EXTENTS:    /* fall through ... */
 	case XFS_DINODE_FMT_BTREE:
@@ -634,9 +648,6 @@ xfs_dinode_verify(
 	if (mode && nextents + naextents > nblocks)
 		return __this_address;
 
-	if (nextents + naextents == 0 && nblocks != 0)
-		return __this_address;
-
 	if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents)
 		return __this_address;
 
@@ -740,6 +751,12 @@ xfs_dinode_verify(
 			return fa;
 	}
 
+	/* metadata inodes containing btrees always have zero extent count */
+	if (XFS_DFORK_FORMAT(dip, XFS_DATA_FORK) != XFS_DINODE_FMT_META_BTREE) {
+		if (nextents + naextents == 0 && nblocks != 0)
+			return __this_address;
+	}
+
 	return NULL;
 }
 
diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index d6bbff85ffba8e..b66dc4ad0f52ef 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -265,6 +265,12 @@ xfs_iformat_data_fork(
 			return xfs_iformat_extents(ip, dip, XFS_DATA_FORK);
 		case XFS_DINODE_FMT_BTREE:
 			return xfs_iformat_btree(ip, dip, XFS_DATA_FORK);
+		case XFS_DINODE_FMT_META_BTREE:
+			switch (ip->i_metatype) {
+			default:
+				break;
+			}
+			fallthrough;
 		default:
 			xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__,
 					dip, sizeof(*dip), __this_address);
@@ -599,6 +605,19 @@ xfs_iflush_fork(
 		}
 		break;
 
+	case XFS_DINODE_FMT_META_BTREE:
+		ASSERT(whichfork == XFS_DATA_FORK);
+
+		if (!(iip->ili_fields & brootflag[whichfork]))
+			break;
+
+		switch (ip->i_metatype) {
+		default:
+			ASSERT(0);
+			break;
+		}
+		break;
+
 	default:
 		ASSERT(0);
 		break;



* [PATCH 20/56] xfs: add realtime reverse map inode to metadata directory
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (18 preceding siblings ...)
  2025-02-06 22:40   ` [PATCH 19/56] xfs: support file data forks containing metadata btrees Darrick J. Wong
@ 2025-02-06 22:40   ` Darrick J. Wong
  2025-02-06 22:40   ` [PATCH 21/56] xfs: add metadata reservations for realtime rmap btrees Darrick J. Wong
                     ` (35 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:40 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 6b08901a6e8fcda555f3ad39abd73bb0dd37f231

Add a metadir path to select the realtime rmap btree inode and load
it at mount time.  The rtrmapbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_format.h       |    4 +++-
 libxfs/xfs_inode_buf.c    |    9 +++++++++
 libxfs/xfs_inode_fork.c   |    6 ++++++
 libxfs/xfs_rtgroup.c      |   20 ++++++++++++++++++--
 libxfs/xfs_rtgroup.h      |    8 ++++++++
 libxfs/xfs_rtrmap_btree.c |    6 +++---
 6 files changed, 47 insertions(+), 6 deletions(-)
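
The per-inode format mask added to the rtgroup inode ops table can be illustrated with a small standalone sketch. The demo_* names are hypothetical and the enum only mimics the ordering of the real format codes. Each inode type lists the data fork formats it accepts as a bitmask, and the load path tests the inode's format against that mask.

```c
#include <stdbool.h>

/* Stand-in for the data fork format codes (values are illustrative). */
enum demo_fmt {
	DEMO_FMT_DEV,
	DEMO_FMT_LOCAL,
	DEMO_FMT_EXTENTS,
	DEMO_FMT_BTREE,
	DEMO_FMT_UUID,
	DEMO_FMT_META_BTREE
};

#define DEMO_FMT_BIT(fmt)	(1U << (fmt))

/* Bitmap and summary inodes are ordinary files: extents or bmap btree. */
static const unsigned int demo_bitmap_fmt_mask =
	DEMO_FMT_BIT(DEMO_FMT_EXTENTS) | DEMO_FMT_BIT(DEMO_FMT_BTREE);

/* The rmap inode only makes sense with a metadata btree root. */
static const unsigned int demo_rmap_fmt_mask =
	DEMO_FMT_BIT(DEMO_FMT_META_BTREE);

/* True if @fmt is one of the formats permitted by @mask. */
static bool demo_fmt_allowed(unsigned int mask, enum demo_fmt fmt)
{
	return (DEMO_FMT_BIT(fmt) & mask) != 0;
}
```

Encoding the accepted formats as data lets a single generic check serve every rtgroup inode type without open-coding a comparison for each one.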


diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 41ea4283c43cb4..f32c9fda5a195f 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -857,6 +857,7 @@ enum xfs_metafile_type {
 	XFS_METAFILE_PRJQUOTA,		/* project quota */
 	XFS_METAFILE_RTBITMAP,		/* rt bitmap */
 	XFS_METAFILE_RTSUMMARY,		/* rt summary */
+	XFS_METAFILE_RTRMAP,		/* rt rmap */
 
 	XFS_METAFILE_MAX
 } __packed;
@@ -868,7 +869,8 @@ enum xfs_metafile_type {
 	{ XFS_METAFILE_GRPQUOTA,	"grpquota" }, \
 	{ XFS_METAFILE_PRJQUOTA,	"prjquota" }, \
 	{ XFS_METAFILE_RTBITMAP,	"rtbitmap" }, \
-	{ XFS_METAFILE_RTSUMMARY,	"rtsummary" }
+	{ XFS_METAFILE_RTSUMMARY,	"rtsummary" }, \
+	{ XFS_METAFILE_RTRMAP,		"rtrmap" }
 
 /*
  * On-disk inode structure.
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 10ce2e34969d99..0278fbb6379701 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -444,6 +444,15 @@ xfs_dinode_verify_fork(
 		if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA)))
 			return __this_address;
 		switch (be16_to_cpu(dip->di_metatype)) {
+		case XFS_METAFILE_RTRMAP:
+			/*
+			 * growfs must create the rtrmap inodes before adding a
+			 * realtime volume to the filesystem, so we cannot use
+			 * the rtrmapbt predicate here.
+			 */
+			if (!xfs_has_rmapbt(mp))
+				return __this_address;
+			break;
 		default:
 			return __this_address;
 		}
diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index b66dc4ad0f52ef..7c481d159cce15 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -267,6 +267,9 @@ xfs_iformat_data_fork(
 			return xfs_iformat_btree(ip, dip, XFS_DATA_FORK);
 		case XFS_DINODE_FMT_META_BTREE:
 			switch (ip->i_metatype) {
+			case XFS_METAFILE_RTRMAP:
+				ASSERT(0); /* to be implemented later */
+				return -EFSCORRUPTED;
 			default:
 				break;
 			}
@@ -612,6 +615,9 @@ xfs_iflush_fork(
 			break;
 
 		switch (ip->i_metatype) {
+		case XFS_METAFILE_RTRMAP:
+			ASSERT(0); /* to be implemented later */
+			break;
 		default:
 			ASSERT(0);
 			break;
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index b9da90c5062dc1..d46ce8e7fa6e85 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -312,6 +312,8 @@ struct xfs_rtginode_ops {
 
 	unsigned int		sick;	/* rtgroup sickness flag */
 
+	unsigned int		fmt_mask; /* all valid data fork formats */
+
 	/* Does the fs have this feature? */
 	bool			(*enabled)(struct xfs_mount *mp);
 
@@ -327,14 +329,29 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
 		.name		= "bitmap",
 		.metafile_type	= XFS_METAFILE_RTBITMAP,
 		.sick		= XFS_SICK_RG_BITMAP,
+		.fmt_mask	= (1U << XFS_DINODE_FMT_EXTENTS) |
+				  (1U << XFS_DINODE_FMT_BTREE),
 		.create		= xfs_rtbitmap_create,
 	},
 	[XFS_RTGI_SUMMARY] = {
 		.name		= "summary",
 		.metafile_type	= XFS_METAFILE_RTSUMMARY,
 		.sick		= XFS_SICK_RG_SUMMARY,
+		.fmt_mask	= (1U << XFS_DINODE_FMT_EXTENTS) |
+				  (1U << XFS_DINODE_FMT_BTREE),
 		.create		= xfs_rtsummary_create,
 	},
+	[XFS_RTGI_RMAP] = {
+		.name		= "rmap",
+		.metafile_type	= XFS_METAFILE_RTRMAP,
+		.fmt_mask	= 1U << XFS_DINODE_FMT_META_BTREE,
+		/*
+		 * growfs must create the rtrmap inodes before adding a
+		 * realtime volume to the filesystem, so we cannot use the
+		 * rtrmapbt predicate here.
+		 */
+		.enabled	= xfs_has_rmapbt,
+	},
 };
 
 /* Return the shortname of this rtgroup inode. */
@@ -431,8 +448,7 @@ xfs_rtginode_load(
 		return error;
 	}
 
-	if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS &&
-			       ip->i_df.if_format != XFS_DINODE_FMT_BTREE)) {
+	if (XFS_IS_CORRUPT(mp, !((1U << ip->i_df.if_format) & ops->fmt_mask))) {
 		xfs_irele(ip);
 		xfs_rtginode_mark_sick(rtg, type);
 		return -EFSCORRUPTED;
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index bff319fa5a7173..09ec9f0e660160 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -14,6 +14,7 @@ struct xfs_trans;
 enum xfs_rtg_inodes {
 	XFS_RTGI_BITMAP,	/* allocation bitmap */
 	XFS_RTGI_SUMMARY,	/* allocation summary */
+	XFS_RTGI_RMAP,		/* rmap btree inode */
 
 	XFS_RTGI_MAX,
 };
@@ -74,6 +75,11 @@ static inline struct xfs_inode *rtg_summary(const struct xfs_rtgroup *rtg)
 	return rtg->rtg_inodes[XFS_RTGI_SUMMARY];
 }
 
+static inline struct xfs_inode *rtg_rmap(const struct xfs_rtgroup *rtg)
+{
+	return rtg->rtg_inodes[XFS_RTGI_RMAP];
+}
+
 /* Passive rtgroup references */
 static inline struct xfs_rtgroup *
 xfs_rtgroup_get(
@@ -284,6 +290,8 @@ int xfs_rtginode_create(struct xfs_rtgroup *rtg, enum xfs_rtg_inodes type,
 		bool init);
 void xfs_rtginode_irele(struct xfs_inode **ipp);
 
+void xfs_rtginode_irele(struct xfs_inode **ipp);
+
 static inline const char *xfs_rtginode_path(xfs_rgnumber_t rgno,
 		enum xfs_rtg_inodes type)
 {
diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c
index 3c135648dfce4e..4c3b4a302bd778 100644
--- a/libxfs/xfs_rtrmap_btree.c
+++ b/libxfs/xfs_rtrmap_btree.c
@@ -18,6 +18,7 @@
 #include "xfs_alloc.h"
 #include "xfs_btree.h"
 #include "xfs_btree_staging.h"
+#include "xfs_metafile.h"
 #include "xfs_rmap.h"
 #include "xfs_rtrmap_btree.h"
 #include "xfs_trace.h"
@@ -403,12 +404,10 @@ xfs_rtrmapbt_init_cursor(
 	struct xfs_trans	*tp,
 	struct xfs_rtgroup	*rtg)
 {
-	struct xfs_inode	*ip = NULL;
+	struct xfs_inode	*ip = rtg_rmap(rtg);
 	struct xfs_mount	*mp = rtg_mount(rtg);
 	struct xfs_btree_cur	*cur;
 
-	return NULL; /* XXX */
-
 	xfs_assert_ilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL);
 
 	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_rtrmapbt_ops,
@@ -437,6 +436,7 @@ xfs_rtrmapbt_commit_staged_btree(
 	int			flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT;
 
 	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+	ASSERT(ifake->if_fork->if_format == XFS_DINODE_FMT_META_BTREE);
 
 	/*
 	 * Free any resources hanging off the real fork, then shallow-copy the
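
Note on the fmt_mask check in xfs_rtginode_load() above: the patch replaces
a hard-coded pair of format comparisons with a per-inode-type bitmask, one
bit per acceptable data fork format. The following is only a minimal sketch
of that idea. The sketch_* names and the enum values are hypothetical
stand-ins, not the real XFS_DINODE_FMT_* definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the XFS_DINODE_FMT_* data fork formats. */
enum sketch_fork_format {
	SKETCH_FMT_EXTENTS,
	SKETCH_FMT_BTREE,
	SKETCH_FMT_META_BTREE,
	SKETCH_FMT_MAX,
};

/* One bit per acceptable format, mirroring the new fmt_mask field. */
static const unsigned int sketch_bitmap_fmt_mask =
	(1U << SKETCH_FMT_EXTENTS) | (1U << SKETCH_FMT_BTREE);
static const unsigned int sketch_rmap_fmt_mask =
	1U << SKETCH_FMT_META_BTREE;

/* Return true if @format has its bit set in @fmt_mask. */
static bool
sketch_format_allowed(
	unsigned int		fmt_mask,
	enum sketch_fork_format	format)
{
	assert(format < SKETCH_FMT_MAX);
	return ((1U << format) & fmt_mask) != 0;
}
```

With a mask per inode type, adding the rmap inode (which only accepts the
metadata btree format) needs a new table entry and no new branch in the
loader.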



* [PATCH 21/56] xfs: add metadata reservations for realtime rmap btrees
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (19 preceding siblings ...)
  2025-02-06 22:40   ` [PATCH 20/56] xfs: add realtime reverse map inode to metadata directory Darrick J. Wong
@ 2025-02-06 22:40   ` Darrick J. Wong
  2025-02-06 22:40   ` [PATCH 22/56] xfs: wire up a new metafile type for the realtime rmap Darrick J. Wong
                     ` (34 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:40 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 8491a55cfc73ff5c2c637a70ade51d4d08abb90a

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime rmap btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtrmap_btree.c |   41 +++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtrmap_btree.h |    2 ++
 2 files changed, 43 insertions(+)


diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c
index 4c3b4a302bd778..08d1e75f190854 100644
--- a/libxfs/xfs_rtrmap_btree.c
+++ b/libxfs/xfs_rtrmap_btree.c
@@ -538,3 +538,44 @@ xfs_rtrmapbt_compute_maxlevels(
 	/* Add one level to handle the inode root level. */
 	mp->m_rtrmap_maxlevels = min(d_maxlevels, r_maxlevels) + 1;
 }
+
+/* Calculate the rtrmap btree size for some records. */
+static unsigned long long
+xfs_rtrmapbt_calc_size(
+	struct xfs_mount	*mp,
+	unsigned long long	len)
+{
+	return xfs_btree_calc_size(mp->m_rtrmap_mnr, len);
+}
+
+/*
+ * Calculate the maximum rmap btree size.
+ */
+static unsigned long long
+xfs_rtrmapbt_max_size(
+	struct xfs_mount	*mp,
+	xfs_rtblock_t		rtblocks)
+{
+	/* Bail out if we're uninitialized, which can happen in mkfs. */
+	if (mp->m_rtrmap_mxr[0] == 0)
+		return 0;
+
+	return xfs_rtrmapbt_calc_size(mp, rtblocks);
+}
+
+/*
+ * Figure out how many blocks to reserve and how many are used by this btree.
+ */
+xfs_filblks_t
+xfs_rtrmapbt_calc_reserves(
+	struct xfs_mount	*mp)
+{
+	uint32_t		blocks = mp->m_groups[XG_TYPE_RTG].blocks;
+
+	if (!xfs_has_rtrmapbt(mp))
+		return 0;
+
+	/* Reserve 1% of the rtgroup or enough for 1 block per record. */
+	return max_t(xfs_filblks_t, blocks / 100,
+			xfs_rtrmapbt_max_size(mp, blocks));
+}
diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h
index 7d1a3a49a2d69b..eaa2942297e20c 100644
--- a/libxfs/xfs_rtrmap_btree.h
+++ b/libxfs/xfs_rtrmap_btree.h
@@ -79,4 +79,6 @@ unsigned int xfs_rtrmapbt_maxlevels_ondisk(void);
 int __init xfs_rtrmapbt_init_cur_cache(void);
 void xfs_rtrmapbt_destroy_cur_cache(void);
 
+xfs_filblks_t xfs_rtrmapbt_calc_reserves(struct xfs_mount *mp);
+
 #endif /* __XFS_RTRMAP_BTREE_H__ */
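
Note on the reservation arithmetic above: xfs_rtrmapbt_calc_reserves()
returns the larger of 1% of the rtgroup and a worst-case btree size computed
from the minimum records per block. The sketch below is a standalone
approximation of that calculation. It assumes that the generic helper
xfs_btree_calc_size() sums one level at a time with round-up division, which
this patch does not show. All sketch_* names are hypothetical, and the
feature check and the mkfs bail-out are left out.

```c
#include <assert.h>

/* Round-up division; stands in for a howmany()-style helper. */
static unsigned long long
sketch_div_round_up(
	unsigned long long	n,
	unsigned int		d)
{
	assert(d > 0);
	return (n + d - 1) / d;
}

/*
 * Worst-case block count for a btree holding @records records, assuming
 * every block is only minimally full.  limits[0] is the minimum number of
 * records per leaf block, limits[1] the minimum per node block.
 */
static unsigned long long
sketch_btree_calc_size(
	const unsigned int	limits[2],
	unsigned long long	records)
{
	unsigned long long	level_blocks;
	unsigned long long	blocks;

	assert(limits[1] > 1);	/* otherwise the loop below cannot converge */

	level_blocks = sketch_div_round_up(records, limits[0]);
	blocks = level_blocks;

	/* Add interior levels until a single root block remains. */
	while (level_blocks > 1) {
		level_blocks = sketch_div_round_up(level_blocks, limits[1]);
		blocks += level_blocks;
	}
	return blocks;
}

/* Larger of 1% of the group and the worst-case btree size. */
static unsigned long long
sketch_calc_reserves(
	const unsigned int	limits[2],
	unsigned long long	group_blocks)
{
	unsigned long long	one_percent = group_blocks / 100;
	unsigned long long	max_size;

	max_size = sketch_btree_calc_size(limits, group_blocks);
	return one_percent > max_size ? one_percent : max_size;
}
```

For example, with made-up limits of 10 records per leaf and 5 per node, a
1000-block group yields 100 + 20 + 4 + 1 = 125 btree blocks, which exceeds
the 10 blocks that 1% would give, so the btree estimate wins.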



* [PATCH 22/56] xfs: wire up a new metafile type for the realtime rmap
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (20 preceding siblings ...)
  2025-02-06 22:40   ` [PATCH 21/56] xfs: add metadata reservations for realtime rmap btrees Darrick J. Wong
@ 2025-02-06 22:40   ` Darrick J. Wong
  2025-02-06 22:41   ` [PATCH 23/56] xfs: wire up rmap map and unmap to the realtime rmapbt Darrick J. Wong
                     ` (33 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:40 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: f33659e8a114e2c17108227d30a2bdf398e39bdb

Plumb in the pieces we need to embed the root of the realtime rmap btree
in an inode's data fork, complete with new metafile type and on-disk
interpretation functions.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_format.h       |    8 +
 libxfs/xfs_inode_fork.c   |    6 +
 libxfs/xfs_ondisk.h       |    1 
 libxfs/xfs_rtrmap_btree.c |  251 +++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtrmap_btree.h |  112 ++++++++++++++++++++
 5 files changed, 375 insertions(+), 3 deletions(-)


diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index f32c9fda5a195f..fba4e59aded4a0 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1736,6 +1736,14 @@ typedef __be32 xfs_rmap_ptr_t;
  */
 #define	XFS_RTRMAP_CRC_MAGIC	0x4d415052	/* 'MAPR' */
 
+/*
+ * rtrmap root header, on-disk form only.
+ */
+struct xfs_rtrmap_root {
+	__be16		bb_level;	/* 0 is a leaf */
+	__be16		bb_numrecs;	/* current # of data records */
+};
+
 /* inode-based btree pointer type */
 typedef __be64 xfs_rtrmap_ptr_t;
 
diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index 7c481d159cce15..e4f866969bf1b6 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -25,6 +25,7 @@
 #include "xfs_errortag.h"
 #include "xfs_health.h"
 #include "xfs_symlink_remote.h"
+#include "xfs_rtrmap_btree.h"
 
 struct kmem_cache *xfs_ifork_cache;
 
@@ -268,8 +269,7 @@ xfs_iformat_data_fork(
 		case XFS_DINODE_FMT_META_BTREE:
 			switch (ip->i_metatype) {
 			case XFS_METAFILE_RTRMAP:
-				ASSERT(0); /* to be implemented later */
-				return -EFSCORRUPTED;
+				return xfs_iformat_rtrmap(ip, dip);
 			default:
 				break;
 			}
@@ -616,7 +616,7 @@ xfs_iflush_fork(
 
 		switch (ip->i_metatype) {
 		case XFS_METAFILE_RTRMAP:
-			ASSERT(0); /* to be implemented later */
+			xfs_iflush_rtrmap(ip, dip);
 			break;
 		default:
 			ASSERT(0);
diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index 2c50877a1a2f0b..07e2f5fb3a94ae 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -84,6 +84,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(union xfs_suminfo_raw,		4);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo,		48);
 	XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t,			8);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_rtrmap_root,		4);
 
 	/*
 	 * m68k has problems with struct xfs_attr_leaf_name_remote, but we pad
diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c
index 08d1e75f190854..39fd9a0ab986e6 100644
--- a/libxfs/xfs_rtrmap_btree.c
+++ b/libxfs/xfs_rtrmap_btree.c
@@ -75,6 +75,39 @@ xfs_rtrmapbt_get_maxrecs(
 	return cur->bc_mp->m_rtrmap_mxr[level != 0];
 }
 
+/* Calculate number of records in the ondisk realtime rmap btree inode root. */
+unsigned int
+xfs_rtrmapbt_droot_maxrecs(
+	unsigned int		blocklen,
+	bool			leaf)
+{
+	blocklen -= sizeof(struct xfs_rtrmap_root);
+
+	if (leaf)
+		return blocklen / sizeof(struct xfs_rmap_rec);
+	return blocklen / (2 * sizeof(struct xfs_rmap_key) +
+			sizeof(xfs_rtrmap_ptr_t));
+}
+
+/*
+ * Get the maximum records we could store in the on-disk format.
+ *
+ * For non-root nodes this is equivalent to xfs_rtrmapbt_get_maxrecs, but
+ * for the root node this checks the available space in the dinode fork
+ * so that we can resize the in-memory buffer to match it.  After a
+ * resize to the maximum size this function returns the same value
+ * as xfs_rtrmapbt_get_maxrecs for the root node, too.
+ */
+STATIC int
+xfs_rtrmapbt_get_dmaxrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	if (level != cur->bc_nlevels - 1)
+		return cur->bc_mp->m_rtrmap_mxr[level != 0];
+	return xfs_rtrmapbt_droot_maxrecs(cur->bc_ino.forksize, level == 0);
+}
+
 /*
  * Convert the ondisk record's offset field into the ondisk key's offset field.
  * Fork and bmbt are significant parts of the rmap record key, but written
@@ -367,6 +400,87 @@ xfs_rtrmapbt_keys_contiguous(
 				 be32_to_cpu(key2->rmap.rm_startblock));
 }
 
+static inline void
+xfs_rtrmapbt_move_ptrs(
+	struct xfs_mount	*mp,
+	struct xfs_btree_block	*broot,
+	short			old_size,
+	size_t			new_size,
+	unsigned int		numrecs)
+{
+	void			*dptr;
+	void			*sptr;
+
+	sptr = xfs_rtrmap_broot_ptr_addr(mp, broot, 1, old_size);
+	dptr = xfs_rtrmap_broot_ptr_addr(mp, broot, 1, new_size);
+	memmove(dptr, sptr, numrecs * sizeof(xfs_rtrmap_ptr_t));
+}
+
+static struct xfs_btree_block *
+xfs_rtrmapbt_broot_realloc(
+	struct xfs_btree_cur	*cur,
+	unsigned int		new_numrecs)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
+	struct xfs_btree_block	*broot;
+	unsigned int		new_size;
+	unsigned int		old_size = ifp->if_broot_bytes;
+	const unsigned int	level = cur->bc_nlevels - 1;
+
+	new_size = xfs_rtrmap_broot_space_calc(mp, level, new_numrecs);
+
+	/* Handle the nop case quietly. */
+	if (new_size == old_size)
+		return ifp->if_broot;
+
+	if (new_size > old_size) {
+		unsigned int	old_numrecs;
+
+		/*
+		 * If there wasn't any memory allocated before, just allocate
+		 * it now and get out.
+		 */
+		if (old_size == 0)
+			return xfs_broot_realloc(ifp, new_size);
+
+		/*
+		 * If there is already an existing if_broot, then we need to
+		 * realloc it and possibly move the node block pointers because
+		 * those are not butted up against the btree block header.
+		 */
+		old_numrecs = xfs_rtrmapbt_maxrecs(mp, old_size, level == 0);
+		broot = xfs_broot_realloc(ifp, new_size);
+		if (level > 0)
+			xfs_rtrmapbt_move_ptrs(mp, broot, old_size, new_size,
+					old_numrecs);
+		goto out_broot;
+	}
+
+	/*
+	 * We're reducing numrecs.  If we're going all the way to zero, just
+	 * free the block.
+	 */
+	ASSERT(ifp->if_broot != NULL && old_size > 0);
+	if (new_size == 0)
+		return xfs_broot_realloc(ifp, 0);
+
+	/*
+	 * Shrink the btree root by possibly moving the rtrmapbt pointers,
+	 * since they are not butted up against the btree block header.  Then
+	 * reallocate broot.
+	 */
+	if (level > 0)
+		xfs_rtrmapbt_move_ptrs(mp, ifp->if_broot, old_size, new_size,
+				new_numrecs);
+	broot = xfs_broot_realloc(ifp, new_size);
+
+out_broot:
+	ASSERT(xfs_rtrmap_droot_space(broot) <=
+	       xfs_inode_fork_size(cur->bc_ino.ip, cur->bc_ino.whichfork));
+	return broot;
+}
+
 const struct xfs_btree_ops xfs_rtrmapbt_ops = {
 	.name			= "rtrmap",
 	.type			= XFS_BTREE_TYPE_INODE,
@@ -386,6 +500,7 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = {
 	.free_block		= xfs_btree_free_metafile_block,
 	.get_minrecs		= xfs_rtrmapbt_get_minrecs,
 	.get_maxrecs		= xfs_rtrmapbt_get_maxrecs,
+	.get_dmaxrecs		= xfs_rtrmapbt_get_dmaxrecs,
 	.init_key_from_rec	= xfs_rtrmapbt_init_key_from_rec,
 	.init_high_key_from_rec	= xfs_rtrmapbt_init_high_key_from_rec,
 	.init_rec_from_cur	= xfs_rtrmapbt_init_rec_from_cur,
@@ -396,6 +511,7 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = {
 	.keys_inorder		= xfs_rtrmapbt_keys_inorder,
 	.recs_inorder		= xfs_rtrmapbt_recs_inorder,
 	.keys_contiguous	= xfs_rtrmapbt_keys_contiguous,
+	.broot_realloc		= xfs_rtrmapbt_broot_realloc,
 };
 
 /* Allocate a new rt rmap btree cursor. */
@@ -579,3 +695,138 @@ xfs_rtrmapbt_calc_reserves(
 	return max_t(xfs_filblks_t, blocks / 100,
 			xfs_rtrmapbt_max_size(mp, blocks));
 }
+
+/* Convert on-disk form of btree root to in-memory form. */
+STATIC void
+xfs_rtrmapbt_from_disk(
+	struct xfs_inode	*ip,
+	struct xfs_rtrmap_root	*dblock,
+	unsigned int		dblocklen,
+	struct xfs_btree_block	*rblock)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_rmap_key	*fkp;
+	__be64			*fpp;
+	struct xfs_rmap_key	*tkp;
+	__be64			*tpp;
+	struct xfs_rmap_rec	*frp;
+	struct xfs_rmap_rec	*trp;
+	unsigned int		rblocklen = xfs_rtrmap_broot_space(mp, dblock);
+	unsigned int		numrecs;
+	unsigned int		maxrecs;
+
+	xfs_btree_init_block(mp, rblock, &xfs_rtrmapbt_ops, 0, 0, ip->i_ino);
+
+	rblock->bb_level = dblock->bb_level;
+	rblock->bb_numrecs = dblock->bb_numrecs;
+	numrecs = be16_to_cpu(dblock->bb_numrecs);
+
+	if (be16_to_cpu(rblock->bb_level) > 0) {
+		maxrecs = xfs_rtrmapbt_droot_maxrecs(dblocklen, false);
+		fkp = xfs_rtrmap_droot_key_addr(dblock, 1);
+		tkp = xfs_rtrmap_key_addr(rblock, 1);
+		fpp = xfs_rtrmap_droot_ptr_addr(dblock, 1, maxrecs);
+		tpp = xfs_rtrmap_broot_ptr_addr(mp, rblock, 1, rblocklen);
+		memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs);
+		memcpy(tpp, fpp, sizeof(*fpp) * numrecs);
+	} else {
+		frp = xfs_rtrmap_droot_rec_addr(dblock, 1);
+		trp = xfs_rtrmap_rec_addr(rblock, 1);
+		memcpy(trp, frp, sizeof(*frp) * numrecs);
+	}
+}
+
+/* Load a realtime reverse mapping btree root in from disk. */
+int
+xfs_iformat_rtrmap(
+	struct xfs_inode	*ip,
+	struct xfs_dinode	*dip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_rtrmap_root	*dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK);
+	struct xfs_btree_block	*broot;
+	unsigned int		numrecs;
+	unsigned int		level;
+	int			dsize;
+
+	/*
+	 * growfs must create the rtrmap inodes before adding a realtime volume
+	 * to the filesystem, so we cannot use the rtrmapbt predicate here.
+	 */
+	if (!xfs_has_rmapbt(ip->i_mount))
+		return -EFSCORRUPTED;
+
+	dsize = XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK);
+	numrecs = be16_to_cpu(dfp->bb_numrecs);
+	level = be16_to_cpu(dfp->bb_level);
+
+	if (level > mp->m_rtrmap_maxlevels ||
+	    xfs_rtrmap_droot_space_calc(level, numrecs) > dsize)
+		return -EFSCORRUPTED;
+
+	broot = xfs_broot_alloc(xfs_ifork_ptr(ip, XFS_DATA_FORK),
+			xfs_rtrmap_broot_space_calc(mp, level, numrecs));
+	if (broot)
+		xfs_rtrmapbt_from_disk(ip, dfp, dsize, broot);
+	return 0;
+}
+
+/* Convert in-memory form of btree root to on-disk form. */
+void
+xfs_rtrmapbt_to_disk(
+	struct xfs_mount	*mp,
+	struct xfs_btree_block	*rblock,
+	unsigned int		rblocklen,
+	struct xfs_rtrmap_root	*dblock,
+	unsigned int		dblocklen)
+{
+	struct xfs_rmap_key	*fkp;
+	__be64			*fpp;
+	struct xfs_rmap_key	*tkp;
+	__be64			*tpp;
+	struct xfs_rmap_rec	*frp;
+	struct xfs_rmap_rec	*trp;
+	unsigned int		numrecs;
+	unsigned int		maxrecs;
+
+	ASSERT(rblock->bb_magic == cpu_to_be32(XFS_RTRMAP_CRC_MAGIC));
+	ASSERT(uuid_equal(&rblock->bb_u.l.bb_uuid, &mp->m_sb.sb_meta_uuid));
+	ASSERT(rblock->bb_u.l.bb_blkno == cpu_to_be64(XFS_BUF_DADDR_NULL));
+	ASSERT(rblock->bb_u.l.bb_leftsib == cpu_to_be64(NULLFSBLOCK));
+	ASSERT(rblock->bb_u.l.bb_rightsib == cpu_to_be64(NULLFSBLOCK));
+
+	dblock->bb_level = rblock->bb_level;
+	dblock->bb_numrecs = rblock->bb_numrecs;
+	numrecs = be16_to_cpu(rblock->bb_numrecs);
+
+	if (be16_to_cpu(rblock->bb_level) > 0) {
+		maxrecs = xfs_rtrmapbt_droot_maxrecs(dblocklen, false);
+		fkp = xfs_rtrmap_key_addr(rblock, 1);
+		tkp = xfs_rtrmap_droot_key_addr(dblock, 1);
+		fpp = xfs_rtrmap_broot_ptr_addr(mp, rblock, 1, rblocklen);
+		tpp = xfs_rtrmap_droot_ptr_addr(dblock, 1, maxrecs);
+		memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs);
+		memcpy(tpp, fpp, sizeof(*fpp) * numrecs);
+	} else {
+		frp = xfs_rtrmap_rec_addr(rblock, 1);
+		trp = xfs_rtrmap_droot_rec_addr(dblock, 1);
+		memcpy(trp, frp, sizeof(*frp) * numrecs);
+	}
+}
+
+/* Flush a realtime reverse mapping btree root out to disk. */
+void
+xfs_iflush_rtrmap(
+	struct xfs_inode	*ip,
+	struct xfs_dinode	*dip)
+{
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
+	struct xfs_rtrmap_root	*dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK);
+
+	ASSERT(ifp->if_broot != NULL);
+	ASSERT(ifp->if_broot_bytes > 0);
+	ASSERT(xfs_rtrmap_droot_space(ifp->if_broot) <=
+			xfs_inode_fork_size(ip, XFS_DATA_FORK));
+	xfs_rtrmapbt_to_disk(ip->i_mount, ifp->if_broot, ifp->if_broot_bytes,
+			dfp, XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK));
+}
diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h
index eaa2942297e20c..e97695066920ee 100644
--- a/libxfs/xfs_rtrmap_btree.h
+++ b/libxfs/xfs_rtrmap_btree.h
@@ -25,6 +25,7 @@ void xfs_rtrmapbt_commit_staged_btree(struct xfs_btree_cur *cur,
 unsigned int xfs_rtrmapbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen,
 		bool leaf);
 void xfs_rtrmapbt_compute_maxlevels(struct xfs_mount *mp);
+unsigned int xfs_rtrmapbt_droot_maxrecs(unsigned int blocklen, bool leaf);
 
 /*
  * Addresses of records, keys, and pointers within an incore rtrmapbt block.
@@ -81,4 +82,115 @@ void xfs_rtrmapbt_destroy_cur_cache(void);
 
 xfs_filblks_t xfs_rtrmapbt_calc_reserves(struct xfs_mount *mp);
 
+/* Addresses of key, pointers, and records within an ondisk rtrmapbt block. */
+
+static inline struct xfs_rmap_rec *
+xfs_rtrmap_droot_rec_addr(
+	struct xfs_rtrmap_root	*block,
+	unsigned int		index)
+{
+	return (struct xfs_rmap_rec *)
+		((char *)(block + 1) +
+		 (index - 1) * sizeof(struct xfs_rmap_rec));
+}
+
+static inline struct xfs_rmap_key *
+xfs_rtrmap_droot_key_addr(
+	struct xfs_rtrmap_root	*block,
+	unsigned int		index)
+{
+	return (struct xfs_rmap_key *)
+		((char *)(block + 1) +
+		 (index - 1) * 2 * sizeof(struct xfs_rmap_key));
+}
+
+static inline xfs_rtrmap_ptr_t *
+xfs_rtrmap_droot_ptr_addr(
+	struct xfs_rtrmap_root	*block,
+	unsigned int		index,
+	unsigned int		maxrecs)
+{
+	return (xfs_rtrmap_ptr_t *)
+		((char *)(block + 1) +
+		 maxrecs * 2 * sizeof(struct xfs_rmap_key) +
+		 (index - 1) * sizeof(xfs_rtrmap_ptr_t));
+}
+
+/*
+ * Address of pointers within the incore btree root.
+ *
+ * These are to be used when we know the size of the block and
+ * we don't have a cursor.
+ */
+static inline xfs_rtrmap_ptr_t *
+xfs_rtrmap_broot_ptr_addr(
+	struct xfs_mount	*mp,
+	struct xfs_btree_block	*bb,
+	unsigned int		index,
+	unsigned int		block_size)
+{
+	return xfs_rtrmap_ptr_addr(bb, index,
+			xfs_rtrmapbt_maxrecs(mp, block_size, false));
+}
+
+/*
+ * Compute the space required for the incore btree root containing the given
+ * number of records.
+ */
+static inline size_t
+xfs_rtrmap_broot_space_calc(
+	struct xfs_mount	*mp,
+	unsigned int		level,
+	unsigned int		nrecs)
+{
+	size_t			sz = XFS_RTRMAP_BLOCK_LEN;
+
+	if (level > 0)
+		return sz + nrecs * (2 * sizeof(struct xfs_rmap_key) +
+					 sizeof(xfs_rtrmap_ptr_t));
+	return sz + nrecs * sizeof(struct xfs_rmap_rec);
+}
+
+/*
+ * Compute the space required for the incore btree root given the ondisk
+ * btree root block.
+ */
+static inline size_t
+xfs_rtrmap_broot_space(struct xfs_mount *mp, struct xfs_rtrmap_root *bb)
+{
+	return xfs_rtrmap_broot_space_calc(mp, be16_to_cpu(bb->bb_level),
+			be16_to_cpu(bb->bb_numrecs));
+}
+
+/* Compute the space required for the ondisk root block. */
+static inline size_t
+xfs_rtrmap_droot_space_calc(
+	unsigned int		level,
+	unsigned int		nrecs)
+{
+	size_t			sz = sizeof(struct xfs_rtrmap_root);
+
+	if (level > 0)
+		return sz + nrecs * (2 * sizeof(struct xfs_rmap_key) +
+					 sizeof(xfs_rtrmap_ptr_t));
+	return sz + nrecs * sizeof(struct xfs_rmap_rec);
+}
+
+/*
+ * Compute the space required for the ondisk root block given an incore root
+ * block.
+ */
+static inline size_t
+xfs_rtrmap_droot_space(struct xfs_btree_block *bb)
+{
+	return xfs_rtrmap_droot_space_calc(be16_to_cpu(bb->bb_level),
+			be16_to_cpu(bb->bb_numrecs));
+}
+
+int xfs_iformat_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip);
+void xfs_rtrmapbt_to_disk(struct xfs_mount *mp, struct xfs_btree_block *rblock,
+		unsigned int rblocklen, struct xfs_rtrmap_root *dblock,
+		unsigned int dblocklen);
+void xfs_iflush_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip);
+
 #endif /* __XFS_RTRMAP_BTREE_H__ */
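
Note on the on-disk root layout above: in a node root, the key pairs are
packed directly after the 4-byte header and the pointers start after space
for maxrecs key pairs. That is why the pointer array has to be moved whenever
the root is resized. The sketch below reproduces the size and offset
arithmetic with plain integers. The header and pointer sizes come from the
patch; the record and key sizes (24 and 20 bytes) are assumptions about
struct xfs_rmap_rec and struct xfs_rmap_key, and all sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ROOT_HDR_LEN	4	/* struct xfs_rtrmap_root, two __be16 */
#define SKETCH_REC_LEN		24	/* assumed sizeof(struct xfs_rmap_rec) */
#define SKETCH_KEY_LEN		20	/* assumed sizeof(struct xfs_rmap_key) */
#define SKETCH_PTR_LEN		8	/* xfs_rtrmap_ptr_t is a __be64 */

/* Bytes used by one node entry: a low key, a high key, and a pointer. */
#define SKETCH_NODE_ENTRY_LEN	(2 * SKETCH_KEY_LEN + SKETCH_PTR_LEN)

/* How many records fit in an on-disk root of @blocklen bytes? */
static unsigned int
sketch_droot_maxrecs(
	unsigned int		blocklen,
	bool			leaf)
{
	assert(blocklen >= SKETCH_ROOT_HDR_LEN);
	blocklen -= SKETCH_ROOT_HDR_LEN;

	if (leaf)
		return blocklen / SKETCH_REC_LEN;
	return blocklen / SKETCH_NODE_ENTRY_LEN;
}

/* Bytes needed by an on-disk root at @level holding @nrecs records. */
static size_t
sketch_droot_space_calc(
	unsigned int		level,
	unsigned int		nrecs)
{
	size_t			sz = SKETCH_ROOT_HDR_LEN;

	if (level > 0)
		return sz + (size_t)nrecs * SKETCH_NODE_ENTRY_LEN;
	return sz + (size_t)nrecs * SKETCH_REC_LEN;
}

/* Byte offset of 1-based pointer @index in a node root sized for @maxrecs. */
static size_t
sketch_droot_ptr_offset(
	unsigned int		index,
	unsigned int		maxrecs)
{
	assert(index >= 1);
	return SKETCH_ROOT_HDR_LEN +
	       (size_t)maxrecs * 2 * SKETCH_KEY_LEN +
	       (size_t)(index - 1) * SKETCH_PTR_LEN;
}
```

The first pointer's offset depends on maxrecs rather than on the number of
records actually present, so changing the root size changes where every
pointer lives even if the keys stay put.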



* [PATCH 23/56] xfs: wire up rmap map and unmap to the realtime rmapbt
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (21 preceding siblings ...)
  2025-02-06 22:40   ` [PATCH 22/56] xfs: wire up a new metafile type for the realtime rmap Darrick J. Wong
@ 2025-02-06 22:41   ` Darrick J. Wong
  2025-02-06 22:41   ` [PATCH 24/56] xfs: create routine to allocate and initialize a realtime rmap btree inode Darrick J. Wong
                     ` (32 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:41 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 609a592865c9e66a1c00eb7b8ee7436eea3c39a3

Connect the map and unmap reverse-mapping operations to the realtime
rmapbt via the deferred operation callbacks.  This enables us to
perform rmap operations against the correct btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rmap.c    |   78 ++++++++++++++++++++++++++++++++++----------------
 libxfs/xfs_rtgroup.c |    9 ++++++
 libxfs/xfs_rtgroup.h |    5 +++
 3 files changed, 66 insertions(+), 26 deletions(-)


diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index a1a57cd0c62c10..551f158e5424f3 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -25,6 +25,7 @@
 #include "xfs_health.h"
 #include "defer_item.h"
 #include "xfs_rtgroup.h"
+#include "xfs_rtrmap_btree.h"
 
 struct kmem_cache	*xfs_rmap_intent_cache;
 
@@ -2618,6 +2619,47 @@ __xfs_rmap_finish_intent(
 	}
 }
 
+static int
+xfs_rmap_finish_init_cursor(
+	struct xfs_trans		*tp,
+	struct xfs_rmap_intent		*ri,
+	struct xfs_btree_cur		**pcur)
+{
+	struct xfs_perag		*pag = to_perag(ri->ri_group);
+	struct xfs_buf			*agbp = NULL;
+	int				error;
+
+	/*
+	 * Refresh the freelist before we start changing the rmapbt, because a
+	 * shape change could cause us to allocate blocks.
+	 */
+	error = xfs_free_extent_fix_freelist(tp, pag, &agbp);
+	if (error) {
+		xfs_ag_mark_sick(pag, XFS_SICK_AG_AGFL);
+		return error;
+	}
+	if (XFS_IS_CORRUPT(tp->t_mountp, !agbp)) {
+		xfs_ag_mark_sick(pag, XFS_SICK_AG_AGFL);
+		return -EFSCORRUPTED;
+	}
+	*pcur = xfs_rmapbt_init_cursor(tp->t_mountp, tp, agbp, pag);
+	return 0;
+}
+
+static int
+xfs_rtrmap_finish_init_cursor(
+	struct xfs_trans		*tp,
+	struct xfs_rmap_intent		*ri,
+	struct xfs_btree_cur		**pcur)
+{
+	struct xfs_rtgroup		*rtg = to_rtg(ri->ri_group);
+
+	xfs_rtgroup_lock(rtg, XFS_RTGLOCK_RMAP);
+	xfs_rtgroup_trans_join(tp, rtg, XFS_RTGLOCK_RMAP);
+	*pcur = xfs_rtrmapbt_init_cursor(tp, rtg);
+	return 0;
+}
+
 /*
  * Process one of the deferred rmap operations.  We pass back the
  * btree cursor to maintain our lock on the rmapbt between calls.
@@ -2633,8 +2675,6 @@ xfs_rmap_finish_one(
 {
 	struct xfs_owner_info		oinfo;
 	struct xfs_mount		*mp = tp->t_mountp;
-	struct xfs_btree_cur		*rcur = *pcur;
-	struct xfs_buf			*agbp = NULL;
 	xfs_agblock_t			bno;
 	bool				unwritten;
 	int				error = 0;
@@ -2648,38 +2688,26 @@ xfs_rmap_finish_one(
 	 * If we haven't gotten a cursor or the cursor AG doesn't match
 	 * the startblock, get one now.
 	 */
-	if (rcur != NULL && rcur->bc_group != ri->ri_group) {
-		xfs_btree_del_cursor(rcur, 0);
-		rcur = NULL;
+	if (*pcur != NULL && (*pcur)->bc_group != ri->ri_group) {
+		xfs_btree_del_cursor(*pcur, 0);
 		*pcur = NULL;
 	}
-	if (rcur == NULL) {
-		struct xfs_perag	*pag = to_perag(ri->ri_group);
-
-		/*
-		 * Refresh the freelist before we start changing the
-		 * rmapbt, because a shape change could cause us to
-		 * allocate blocks.
-		 */
-		error = xfs_free_extent_fix_freelist(tp, pag, &agbp);
-		if (error) {
-			xfs_ag_mark_sick(pag, XFS_SICK_AG_AGFL);
+	if (*pcur == NULL) {
+		if (ri->ri_group->xg_type == XG_TYPE_RTG)
+			error = xfs_rtrmap_finish_init_cursor(tp, ri, pcur);
+		else
+			error = xfs_rmap_finish_init_cursor(tp, ri, pcur);
+		if (error)
 			return error;
-		}
-		if (XFS_IS_CORRUPT(tp->t_mountp, !agbp)) {
-			xfs_ag_mark_sick(pag, XFS_SICK_AG_AGFL);
-			return -EFSCORRUPTED;
-		}
-
-		*pcur = rcur = xfs_rmapbt_init_cursor(mp, tp, agbp, pag);
 	}
 
 	xfs_rmap_ino_owner(&oinfo, ri->ri_owner, ri->ri_whichfork,
 			ri->ri_bmap.br_startoff);
 	unwritten = ri->ri_bmap.br_state == XFS_EXT_UNWRITTEN;
-	bno = XFS_FSB_TO_AGBNO(rcur->bc_mp, ri->ri_bmap.br_startblock);
 
-	error = __xfs_rmap_finish_intent(rcur, ri->ri_type, bno,
+	bno = xfs_fsb_to_gbno(mp, ri->ri_bmap.br_startblock,
+			ri->ri_group->xg_type);
+	error = __xfs_rmap_finish_intent(*pcur, ri->ri_type, bno,
 			ri->ri_bmap.br_blockcount, &oinfo, unwritten);
 	if (error)
 		return error;
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index d46ce8e7fa6e85..f0c45e75e52c3e 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -199,6 +199,9 @@ xfs_rtgroup_lock(
 	} else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
 		xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_SHARED);
 	}
+
+	if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg_rmap(rtg))
+		xfs_ilock(rtg_rmap(rtg), XFS_ILOCK_EXCL);
 }
 
 /* Unlock metadata inodes associated with this rt group. */
@@ -211,6 +214,9 @@ xfs_rtgroup_unlock(
 	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) ||
 	       !(rtglock_flags & XFS_RTGLOCK_BITMAP));
 
+	if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg_rmap(rtg))
+		xfs_iunlock(rtg_rmap(rtg), XFS_ILOCK_EXCL);
+
 	if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
 		xfs_iunlock(rtg_summary(rtg), XFS_ILOCK_EXCL);
 		xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_EXCL);
@@ -236,6 +242,9 @@ xfs_rtgroup_trans_join(
 		xfs_trans_ijoin(tp, rtg_bitmap(rtg), XFS_ILOCK_EXCL);
 		xfs_trans_ijoin(tp, rtg_summary(rtg), XFS_ILOCK_EXCL);
 	}
+
+	if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg_rmap(rtg))
+		xfs_trans_ijoin(tp, rtg_rmap(rtg), XFS_ILOCK_EXCL);
 }
 
 /* Retrieve rt group geometry. */
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 09ec9f0e660160..6ff222a053674d 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -265,9 +265,12 @@ int xfs_update_last_rtgroup_size(struct xfs_mount *mp,
 #define XFS_RTGLOCK_BITMAP		(1U << 0)
 /* Lock the rt bitmap inode in shared mode */
 #define XFS_RTGLOCK_BITMAP_SHARED	(1U << 1)
+/* Lock the rt rmap inode in exclusive mode */
+#define XFS_RTGLOCK_RMAP		(1U << 2)
 
 #define XFS_RTGLOCK_ALL_FLAGS	(XFS_RTGLOCK_BITMAP | \
-				 XFS_RTGLOCK_BITMAP_SHARED)
+				 XFS_RTGLOCK_BITMAP_SHARED | \
+				 XFS_RTGLOCK_RMAP)
 
 void xfs_rtgroup_lock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags);
 void xfs_rtgroup_unlock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags);
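
Note on the cursor handling in xfs_rmap_finish_one() above: the cached cursor
is reused while consecutive intents target the same group. It is dropped and
rebuilt when the group changes, and the group type decides which btree the
new cursor walks. The following sketch models only that control flow. It is
not the libxfs API: every sketch_* name is hypothetical, a static slot stands
in for cursor allocation, and locking, freelist fixing, and error handling
are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the XG_TYPE_* group types. */
enum sketch_group_type { SKETCH_GROUP_AG, SKETCH_GROUP_RTG };

struct sketch_group {
	enum sketch_group_type	type;
	unsigned int		index;
};

struct sketch_cursor {
	const struct sketch_group	*group;
	int				is_rt;	/* walks the rt rmap btree? */
};

/* A single static slot stands in for cursor allocation. */
static struct sketch_cursor	sketch_cursor_slot;
static unsigned int		sketch_cursors_created;

/* Build a cursor for @group, picking the btree from the group type. */
static struct sketch_cursor *
sketch_init_cursor(
	const struct sketch_group	*group)
{
	sketch_cursor_slot.group = group;
	sketch_cursor_slot.is_rt = (group->type == SKETCH_GROUP_RTG);
	sketch_cursors_created++;
	return &sketch_cursor_slot;
}

/* Reuse @cur if it already points at @group, else replace it. */
static struct sketch_cursor *
sketch_get_cursor(
	struct sketch_cursor		*cur,
	const struct sketch_group	*group)
{
	assert(group != NULL);

	if (cur != NULL && cur->group != group)
		cur = NULL;	/* the real code deletes the old cursor here */
	if (cur == NULL)
		cur = sketch_init_cursor(group);
	return cur;
}
```

Calling sketch_get_cursor() twice for the same group creates one cursor, and
switching from an AG to an rtgroup creates a second one that targets the
realtime btree.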



* [PATCH 24/56] xfs: create routine to allocate and initialize a realtime rmap btree inode
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (22 preceding siblings ...)
  2025-02-06 22:41   ` [PATCH 23/56] xfs: wire up rmap map and unmap to the realtime rmapbt Darrick J. Wong
@ 2025-02-06 22:41   ` Darrick J. Wong
  2025-02-06 22:41   ` [PATCH 25/56] xfs: report realtime rmap btree corruption errors to the health system Darrick J. Wong
                     ` (31 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:41 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 71b8acb42be60e11810eb43a6f470589fcf7b7dd

Create a library routine to allocate and initialize an empty realtime
rmapbt inode.  We'll use this for mkfs and repair.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtgroup.c      |    2 ++
 libxfs/xfs_rtrmap_btree.c |   54 +++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtrmap_btree.h |    5 ++++
 3 files changed, 61 insertions(+)


diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index f0c45e75e52c3e..c40b02be0f41b9 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -30,6 +30,7 @@
 #include "xfs_rtbitmap.h"
 #include "xfs_metafile.h"
 #include "xfs_metadir.h"
+#include "xfs_rtrmap_btree.h"
 
 /* Find the first usable fsblock in this rtgroup. */
 static inline uint32_t
@@ -360,6 +361,7 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
 		 * rtrmapbt predicate here.
 		 */
 		.enabled	= xfs_has_rmapbt,
+		.create		= xfs_rtrmapbt_create,
 	},
 };
 
diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c
index 39fd9a0ab986e6..a14f2bec8aaa52 100644
--- a/libxfs/xfs_rtrmap_btree.c
+++ b/libxfs/xfs_rtrmap_btree.c
@@ -830,3 +830,57 @@ xfs_iflush_rtrmap(
 	xfs_rtrmapbt_to_disk(ip->i_mount, ifp->if_broot, ifp->if_broot_bytes,
 			dfp, XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK));
 }
+
+/*
+ * Create a realtime rmap btree inode.
+ */
+int
+xfs_rtrmapbt_create(
+	struct xfs_rtgroup	*rtg,
+	struct xfs_inode	*ip,
+	struct xfs_trans	*tp,
+	bool			init)
+{
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_btree_block	*broot;
+
+	ifp->if_format = XFS_DINODE_FMT_META_BTREE;
+	ASSERT(ifp->if_broot_bytes == 0);
+	ASSERT(ifp->if_bytes == 0);
+
+	/* Initialize the empty incore btree root. */
+	broot = xfs_broot_realloc(ifp, xfs_rtrmap_broot_space_calc(mp, 0, 0));
+	if (broot)
+		xfs_btree_init_block(mp, broot, &xfs_rtrmapbt_ops, 0, 0,
+				ip->i_ino);
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE | XFS_ILOG_DBROOT);
+
+	return 0;
+}
+
+/*
+ * Initialize an rmap for a realtime superblock using the potentially updated
+ * rt geometry in the provided @mp.
+ */
+int
+xfs_rtrmapbt_init_rtsb(
+	struct xfs_mount	*mp,
+	struct xfs_rtgroup	*rtg,
+	struct xfs_trans	*tp)
+{
+	struct xfs_rmap_irec	rmap = {
+		.rm_blockcount	= mp->m_sb.sb_rextsize,
+		.rm_owner	= XFS_RMAP_OWN_FS,
+	};
+	struct xfs_btree_cur	*cur;
+	int			error;
+
+	ASSERT(xfs_has_rtsb(mp));
+	ASSERT(rtg_rgno(rtg) == 0);
+
+	cur = xfs_rtrmapbt_init_cursor(tp, rtg);
+	error = xfs_rmap_map_raw(cur, &rmap);
+	xfs_btree_del_cursor(cur, error);
+	return error;
+}
diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h
index e97695066920ee..bf73460be274d1 100644
--- a/libxfs/xfs_rtrmap_btree.h
+++ b/libxfs/xfs_rtrmap_btree.h
@@ -193,4 +193,9 @@ void xfs_rtrmapbt_to_disk(struct xfs_mount *mp, struct xfs_btree_block *rblock,
 		unsigned int dblocklen);
 void xfs_iflush_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip);
 
+int xfs_rtrmapbt_create(struct xfs_rtgroup *rtg, struct xfs_inode *ip,
+		struct xfs_trans *tp, bool init);
+int xfs_rtrmapbt_init_rtsb(struct xfs_mount *mp, struct xfs_rtgroup *rtg,
+		struct xfs_trans *tp);
+
 #endif /* __XFS_RTRMAP_BTREE_H__ */



* [PATCH 25/56] xfs: report realtime rmap btree corruption errors to the health system
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (23 preceding siblings ...)
  2025-02-06 22:41   ` [PATCH 24/56] xfs: create routine to allocate and initialize a realtime rmap btree inode Darrick J. Wong
@ 2025-02-06 22:41   ` Darrick J. Wong
  2025-02-06 22:41   ` [PATCH 26/56] xfs: scrub the realtime rmapbt Darrick J. Wong
                     ` (30 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:41 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 6d4933c221958d1e1848d5092a3e3d1c6e4a6f92

Whenever we encounter corrupt realtime rmap btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
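A minimal sketch, not part of the patch, of how a per-group sick mask like
the one extended below can be set and queried.  The bit values mirror the
XFS_SICK_RG_* definitions in the xfs_health.h hunk; struct rg_health and
the rg_* helpers are hypothetical names used only for illustration and are
not the kernel's health-tracking API.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative copies of the XFS_SICK_RG_* bits from the hunk below. */
#define SICK_RG_SUPER	(1u << 0)
#define SICK_RG_BITMAP	(1u << 1)
#define SICK_RG_SUMMARY	(1u << 2)
#define SICK_RG_RMAPBT	(1u << 3)

#define SICK_RG_PRIMARY	(SICK_RG_SUPER | SICK_RG_BITMAP | \
			 SICK_RG_SUMMARY | SICK_RG_RMAPBT)

/* Hypothetical per-group health state; not the kernel's structure. */
struct rg_health {
	unsigned int	sick;
};

/* Record that some metadata covered by @mask was found to be corrupt. */
static void rg_mark_sick(struct rg_health *h, unsigned int mask)
{
	/* Ignore bits that are not part of the primary mask. */
	h->sick |= mask & SICK_RG_PRIMARY;
}

/* Report whether any metadata covered by @mask is marked corrupt. */
static bool rg_is_sick(const struct rg_health *h, unsigned int mask)
{
	return (h->sick & mask) != 0;
}
```

In this sketch, adding the RMAPBT bit to the primary mask is what lets a
corrupt rtrmap btree show up in the group's state at all; without it the
bit would be filtered out in rg_mark_sick().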
 libxfs/xfs_btree.h        |    2 +-
 libxfs/xfs_fs.h           |    1 +
 libxfs/xfs_health.h       |    4 +++-
 libxfs/xfs_rtgroup.c      |    1 +
 libxfs/xfs_rtrmap_btree.c |   10 ++++++++--
 5 files changed, 14 insertions(+), 4 deletions(-)


diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index ee82dc777d6d5b..dbc047b2fb2cf5 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -135,7 +135,7 @@ struct xfs_btree_ops {
 	/* offset of btree stats array */
 	unsigned int		statoff;
 
-	/* sick mask for health reporting (only for XFS_BTREE_TYPE_AG) */
+	/* sick mask for health reporting (not for bmap btrees) */
 	unsigned int		sick_mask;
 
 	/* cursor operations */
diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 41ce4d3d650ec7..7cca458ff81245 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -993,6 +993,7 @@ struct xfs_rtgroup_geometry {
 #define XFS_RTGROUP_GEOM_SICK_SUPER	(1U << 0)  /* superblock */
 #define XFS_RTGROUP_GEOM_SICK_BITMAP	(1U << 1)  /* rtbitmap */
 #define XFS_RTGROUP_GEOM_SICK_SUMMARY	(1U << 2)  /* rtsummary */
+#define XFS_RTGROUP_GEOM_SICK_RMAPBT	(1U << 3)  /* reverse mappings */
 
 /*
  * ioctl commands that are used by Linux filesystems
diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h
index d34986ac18c3fa..5c8a0aff6ba6e9 100644
--- a/libxfs/xfs_health.h
+++ b/libxfs/xfs_health.h
@@ -70,6 +70,7 @@ struct xfs_rtgroup;
 #define XFS_SICK_RG_SUPER	(1 << 0)  /* rt group superblock */
 #define XFS_SICK_RG_BITMAP	(1 << 1)  /* rt group bitmap */
 #define XFS_SICK_RG_SUMMARY	(1 << 2)  /* rt groups summary */
+#define XFS_SICK_RG_RMAPBT	(1 << 3)  /* reverse mappings */
 
 /* Observable health issues for AG metadata. */
 #define XFS_SICK_AG_SB		(1 << 0)  /* superblock */
@@ -115,7 +116,8 @@ struct xfs_rtgroup;
 
 #define XFS_SICK_RG_PRIMARY	(XFS_SICK_RG_SUPER | \
 				 XFS_SICK_RG_BITMAP | \
-				 XFS_SICK_RG_SUMMARY)
+				 XFS_SICK_RG_SUMMARY | \
+				 XFS_SICK_RG_RMAPBT)
 
 #define XFS_SICK_AG_PRIMARY	(XFS_SICK_AG_SB | \
 				 XFS_SICK_AG_AGF | \
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index c40b02be0f41b9..7e69b6bf96f1ac 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -354,6 +354,7 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
 	[XFS_RTGI_RMAP] = {
 		.name		= "rmap",
 		.metafile_type	= XFS_METAFILE_RTRMAP,
+		.sick		= XFS_SICK_RG_RMAPBT,
 		.fmt_mask	= 1U << XFS_DINODE_FMT_META_BTREE,
 		/*
 		 * growfs must create the rtrmap inodes before adding a
diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c
index a14f2bec8aaa52..387c9f17118d52 100644
--- a/libxfs/xfs_rtrmap_btree.c
+++ b/libxfs/xfs_rtrmap_btree.c
@@ -25,6 +25,7 @@
 #include "xfs_cksum.h"
 #include "xfs_rtgroup.h"
 #include "xfs_bmap.h"
+#include "xfs_health.h"
 
 static struct kmem_cache	*xfs_rtrmapbt_cur_cache;
 
@@ -494,6 +495,7 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = {
 
 	.lru_refs		= XFS_RMAP_BTREE_REF,
 	.statoff		= XFS_STATS_CALC_INDEX(xs_rtrmap_2),
+	.sick_mask		= XFS_SICK_RG_RMAPBT,
 
 	.dup_cursor		= xfs_rtrmapbt_dup_cursor,
 	.alloc_block		= xfs_btree_alloc_metafile_block,
@@ -753,16 +755,20 @@ xfs_iformat_rtrmap(
 	 * growfs must create the rtrmap inodes before adding a realtime volume
 	 * to the filesystem, so we cannot use the rtrmapbt predicate here.
 	 */
-	if (!xfs_has_rmapbt(ip->i_mount))
+	if (!xfs_has_rmapbt(ip->i_mount)) {
+		xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE);
 		return -EFSCORRUPTED;
+	}
 
 	dsize = XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK);
 	numrecs = be16_to_cpu(dfp->bb_numrecs);
 	level = be16_to_cpu(dfp->bb_level);
 
 	if (level > mp->m_rtrmap_maxlevels ||
-	    xfs_rtrmap_droot_space_calc(level, numrecs) > dsize)
+	    xfs_rtrmap_droot_space_calc(level, numrecs) > dsize) {
+		xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE);
 		return -EFSCORRUPTED;
+	}
 
 	broot = xfs_broot_alloc(xfs_ifork_ptr(ip, XFS_DATA_FORK),
 			xfs_rtrmap_broot_space_calc(mp, level, numrecs));



* [PATCH 26/56] xfs: scrub the realtime rmapbt
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (24 preceding siblings ...)
  2025-02-06 22:41   ` [PATCH 25/56] xfs: report realtime rmap btree corruption errors to the health system Darrick J. Wong
@ 2025-02-06 22:41   ` Darrick J. Wong
  2025-02-06 22:42   ` [PATCH 27/56] xfs: scrub the metadir path of rt rmap btree files Darrick J. Wong
                     ` (29 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:41 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 9a6cc4f6d081fddc0d5ff96744a2507d3559f949

Check the realtime reverse mapping btree against the rtbitmap, and
modify the rtbitmap scrub to check against the rtrmapbt.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_fs.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 7cca458ff81245..34fcbcd0bcd5e3 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -737,9 +737,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_DIRTREE	28	/* directory tree structure */
 #define XFS_SCRUB_TYPE_METAPATH	29	/* metadata directory tree paths */
 #define XFS_SCRUB_TYPE_RGSUPER	30	/* realtime superblock */
+#define XFS_SCRUB_TYPE_RTRMAPBT	31	/* rtgroup reverse mapping btree */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	31
+#define XFS_SCRUB_TYPE_NR	32
 
 /*
  * This special type code only applies to the vectored scrub implementation.



* [PATCH 27/56] xfs: scrub the metadir path of rt rmap btree files
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (25 preceding siblings ...)
  2025-02-06 22:41   ` [PATCH 26/56] xfs: scrub the realtime rmapbt Darrick J. Wong
@ 2025-02-06 22:42   ` Darrick J. Wong
  2025-02-06 22:42   ` [PATCH 28/56] xfs: online repair of realtime bitmaps for a realtime group Darrick J. Wong
                     ` (28 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:42 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 366243cc99b7e80236a19d7391b68d0f47677f4f

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the rmap btree file for each rt group.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_fs.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 34fcbcd0bcd5e3..d42d3a5617e314 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -830,9 +830,10 @@ struct xfs_scrub_vec_head {
 #define XFS_SCRUB_METAPATH_USRQUOTA	(5)  /* user quota */
 #define XFS_SCRUB_METAPATH_GRPQUOTA	(6)  /* group quota */
 #define XFS_SCRUB_METAPATH_PRJQUOTA	(7)  /* project quota */
+#define XFS_SCRUB_METAPATH_RTRMAPBT	(8)  /* realtime reverse mapping */
 
 /* Number of metapath sm_ino values */
-#define XFS_SCRUB_METAPATH_NR		(8)
+#define XFS_SCRUB_METAPATH_NR		(9)
 
 /*
  * ioctl limits



* [PATCH 28/56] xfs: online repair of realtime bitmaps for a realtime group
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (26 preceding siblings ...)
  2025-02-06 22:42   ` [PATCH 27/56] xfs: scrub the metadir path of rt rmap btree files Darrick J. Wong
@ 2025-02-06 22:42   ` Darrick J. Wong
  2025-02-06 22:42   ` [PATCH 29/56] xfs: online repair of the realtime rmap btree Darrick J. Wong
                     ` (27 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:42 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 8defee8dff2b202702cdf33f6d8577adf9ad3e82

For a given rt group, regenerate the bitmap contents from the group's
realtime rmap btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
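A rough sketch, not part of the patch, of the idea in the commit message:
start from "everything free" and clear every rt extent that some reverse
mapping touches.  The toy_* names, the one-byte-per-extent bitmap, and the
"1 means free" convention are simplifying assumptions for illustration;
the kernel repair code works on the real rtbitmap file and is considerably
more involved.  Only the modulo in toy_rgbno_to_rtxoff() is taken directly
from the helper added below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified rmap record: a run of mapped rtgroup blocks. */
struct toy_rmap {
	uint32_t	startblock;
	uint32_t	blockcount;
};

/*
 * Rebuild a toy free-extent map (one byte per rt extent, 1 = free) from
 * rmap records.  Any extent touched by a mapping is marked in use.
 */
static void toy_rebuild_bitmap(uint8_t *bitmap, uint32_t nr_rtx,
		uint32_t rextsize, const struct toy_rmap *recs, size_t nr)
{
	size_t		i;
	uint32_t	rtx;

	memset(bitmap, 1, nr_rtx);	/* start with everything free */
	for (i = 0; i < nr; i++) {
		uint32_t	first, last;

		if (recs[i].blockcount == 0)
			continue;
		first = recs[i].startblock / rextsize;
		last = (recs[i].startblock + recs[i].blockcount - 1) /
				rextsize;
		for (rtx = first; rtx <= last && rtx < nr_rtx; rtx++)
			bitmap[rtx] = 0;
	}
}

/* Same arithmetic as xfs_rgbno_to_rtxoff() in the hunk below. */
static uint32_t toy_rgbno_to_rtxoff(uint32_t rextsize, uint32_t rgbno)
{
	return rgbno % rextsize;
}
```

A nonzero extent offset means the block sits in the middle of an rt
extent, which is presumably why repair wants the offset helper when
mappings are not aligned to extent boundaries.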
 libxfs/xfs_rtbitmap.h |    9 +++++++++
 1 file changed, 9 insertions(+)


diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h
index 16563a44bd138a..22e5d9cd95f47c 100644
--- a/libxfs/xfs_rtbitmap.h
+++ b/libxfs/xfs_rtbitmap.h
@@ -135,6 +135,15 @@ xfs_rtb_to_rtx(
 	return div_u64(rtbno, mp->m_sb.sb_rextsize);
 }
 
+/* Return the offset of a rtgroup block number within an rt extent. */
+static inline xfs_extlen_t
+xfs_rgbno_to_rtxoff(
+	struct xfs_mount	*mp,
+	xfs_rgblock_t		rgbno)
+{
+	return rgbno % mp->m_sb.sb_rextsize;
+}
+
 /* Return the offset of an rt block number within an rt extent. */
 static inline xfs_extlen_t
 xfs_rtb_to_rtxoff(



* [PATCH 29/56] xfs: online repair of the realtime rmap btree
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (27 preceding siblings ...)
  2025-02-06 22:42   ` [PATCH 28/56] xfs: online repair of realtime bitmaps for a realtime group Darrick J. Wong
@ 2025-02-06 22:42   ` Darrick J. Wong
  2025-02-06 22:42   ` [PATCH 30/56] xfs: create a shadow rmap btree during realtime rmap repair Darrick J. Wong
                     ` (26 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:42 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 6a849bd81b69ccbda5b766cc700f0be86194e4d1

Repair the realtime rmap btree while mounted.  Similar to the regular
rmap btree repair code, we walk the data fork mappings of every realtime
file in the filesystem to collect reverse-mapping records in an xfarray.
Then we sort the xfarray, and use the btree bulk loader to create a new
rtrmap btree ondisk.  Finally, we swap the btree roots, and reap the old
blocks in the usual way.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
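A minimal sketch, not part of the patch, of the collect/sort/bulk-load
shape described in the commit message.  It uses a plain C array and
qsort() in place of the xfarray, and a simple ceiling division in place
of the btree bulk loader's geometry calculation; the toy_* names and the
(startblock, owner, offset) sort order are assumptions made for
illustration, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified reverse-mapping record. */
struct toy_rmap_rec {
	uint64_t	startblock;
	uint64_t	owner;
	uint64_t	offset;
};

/* Order records by startblock, then owner, then file offset. */
static int toy_rmap_cmp(const void *a, const void *b)
{
	const struct toy_rmap_rec	*ra = a;
	const struct toy_rmap_rec	*rb = b;

	if (ra->startblock != rb->startblock)
		return ra->startblock < rb->startblock ? -1 : 1;
	if (ra->owner != rb->owner)
		return ra->owner < rb->owner ? -1 : 1;
	if (ra->offset != rb->offset)
		return ra->offset < rb->offset ? -1 : 1;
	return 0;
}

/* Step 2: sort the collected records into btree key order. */
static void toy_sort_rmaps(struct toy_rmap_rec *recs, size_t nr)
{
	qsort(recs, nr, sizeof(*recs), toy_rmap_cmp);
}

/* Step 3 (part): leaf blocks needed to hold @nr_recs sorted records. */
static size_t toy_leaf_blocks(size_t nr_recs, size_t recs_per_block)
{
	return (nr_recs + recs_per_block - 1) / recs_per_block;
}
```

Sorting first is what makes a bulk load possible: the loader can fill
leaf blocks left to right without ever splitting a block.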
 libxfs/xfs_btree_staging.c |    1 +
 libxfs/xfs_rtrmap_btree.c  |    2 +-
 libxfs/xfs_rtrmap_btree.h  |    3 +++
 3 files changed, 5 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_btree_staging.c b/libxfs/xfs_btree_staging.c
index b3afb4a142a5e0..d82665ef78398e 100644
--- a/libxfs/xfs_btree_staging.c
+++ b/libxfs/xfs_btree_staging.c
@@ -134,6 +134,7 @@ xfs_btree_stage_ifakeroot(
 	cur->bc_ino.ifake = ifake;
 	cur->bc_nlevels = ifake->if_levels;
 	cur->bc_ino.forksize = ifake->if_fork_size;
+	cur->bc_ino.whichfork = XFS_STAGING_FORK;
 	cur->bc_flags |= XFS_BTREE_STAGING;
 }
 
diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c
index 387c9f17118d52..ac51e736e7e489 100644
--- a/libxfs/xfs_rtrmap_btree.c
+++ b/libxfs/xfs_rtrmap_btree.c
@@ -658,7 +658,7 @@ xfs_rtrmapbt_compute_maxlevels(
 }
 
 /* Calculate the rtrmap btree size for some records. */
-static unsigned long long
+unsigned long long
 xfs_rtrmapbt_calc_size(
 	struct xfs_mount	*mp,
 	unsigned long long	len)
diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h
index bf73460be274d1..ad76ac7938b602 100644
--- a/libxfs/xfs_rtrmap_btree.h
+++ b/libxfs/xfs_rtrmap_btree.h
@@ -198,4 +198,7 @@ int xfs_rtrmapbt_create(struct xfs_rtgroup *rtg, struct xfs_inode *ip,
 int xfs_rtrmapbt_init_rtsb(struct xfs_mount *mp, struct xfs_rtgroup *rtg,
 		struct xfs_trans *tp);
 
+unsigned long long xfs_rtrmapbt_calc_size(struct xfs_mount *mp,
+		unsigned long long len);
+
 #endif /* __XFS_RTRMAP_BTREE_H__ */



* [PATCH 30/56] xfs: create a shadow rmap btree during realtime rmap repair
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (28 preceding siblings ...)
  2025-02-06 22:42   ` [PATCH 29/56] xfs: online repair of the realtime rmap btree Darrick J. Wong
@ 2025-02-06 22:42   ` Darrick J. Wong
  2025-02-06 22:43   ` [PATCH 31/56] xfs: namespace the maximum length/refcount symbols Darrick J. Wong
                     ` (25 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:42 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 4a61f12eb11958f157e054d386466627445644cd

Create an in-memory btree of rmap records instead of an array.  This
enables us to do live record collection instead of freezing the fs.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_btree_mem.c    |    1 
 libxfs/xfs_rmap.c         |    3 +
 libxfs/xfs_rtrmap_btree.c |  118 +++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtrmap_btree.h |    6 ++
 libxfs/xfs_shared.h       |    7 +++
 5 files changed, 134 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_btree_mem.c b/libxfs/xfs_btree_mem.c
index 8e3efdbccc156a..2b98b8d01dce0d 100644
--- a/libxfs/xfs_btree_mem.c
+++ b/libxfs/xfs_btree_mem.c
@@ -17,6 +17,7 @@
 #include "xfs_btree_mem.h"
 #include "xfs_ag.h"
 #include "xfs_trace.h"
+#include "xfs_rtgroup.h"
 
 /* Set the root of an in-memory btree. */
 void
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 551f158e5424f3..3748bfc7a9dfc1 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -326,7 +326,8 @@ xfs_rmap_check_btrec(
 	struct xfs_btree_cur		*cur,
 	const struct xfs_rmap_irec	*irec)
 {
-	if (xfs_btree_is_rtrmap(cur->bc_ops))
+	if (xfs_btree_is_rtrmap(cur->bc_ops) ||
+	    xfs_btree_is_mem_rtrmap(cur->bc_ops))
 		return xfs_rtrmap_check_irec(to_rtg(cur->bc_group), irec);
 	return xfs_rmap_check_irec(to_perag(cur->bc_group), irec);
 }
diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c
index ac51e736e7e489..10055110b8cf42 100644
--- a/libxfs/xfs_rtrmap_btree.c
+++ b/libxfs/xfs_rtrmap_btree.c
@@ -26,6 +26,9 @@
 #include "xfs_rtgroup.h"
 #include "xfs_bmap.h"
 #include "xfs_health.h"
+#include "xfile.h"
+#include "buf_mem.h"
+#include "xfs_btree_mem.h"
 
 static struct kmem_cache	*xfs_rtrmapbt_cur_cache;
 
@@ -540,6 +543,121 @@ xfs_rtrmapbt_init_cursor(
 	return cur;
 }
 
+#ifdef CONFIG_XFS_BTREE_IN_MEM
+/*
+ * Validate an in-memory realtime rmap btree block.  Callers are allowed to
+ * generate an in-memory btree even if the ondisk feature is not enabled.
+ */
+static xfs_failaddr_t
+xfs_rtrmapbt_mem_verify(
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = bp->b_mount;
+	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
+	xfs_failaddr_t		fa;
+	unsigned int		level;
+	unsigned int		maxrecs;
+
+	if (!xfs_verify_magic(bp, block->bb_magic))
+		return __this_address;
+
+	fa = xfs_btree_fsblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN);
+	if (fa)
+		return fa;
+
+	level = be16_to_cpu(block->bb_level);
+	if (xfs_has_rmapbt(mp)) {
+		if (level >= mp->m_rtrmap_maxlevels)
+			return __this_address;
+	} else {
+		if (level >= xfs_rtrmapbt_maxlevels_ondisk())
+			return __this_address;
+	}
+
+	maxrecs = xfs_rtrmapbt_maxrecs(mp, XFBNO_BLOCKSIZE, level == 0);
+	return xfs_btree_memblock_verify(bp, maxrecs);
+}
+
+static void
+xfs_rtrmapbt_mem_rw_verify(
+	struct xfs_buf	*bp)
+{
+	xfs_failaddr_t	fa = xfs_rtrmapbt_mem_verify(bp);
+
+	if (fa)
+		xfs_verifier_error(bp, -EFSCORRUPTED, fa);
+}
+
+/* skip crc checks on in-memory btrees to save time */
+static const struct xfs_buf_ops xfs_rtrmapbt_mem_buf_ops = {
+	.name			= "xfs_rtrmapbt_mem",
+	.magic			= { 0, cpu_to_be32(XFS_RTRMAP_CRC_MAGIC) },
+	.verify_read		= xfs_rtrmapbt_mem_rw_verify,
+	.verify_write		= xfs_rtrmapbt_mem_rw_verify,
+	.verify_struct		= xfs_rtrmapbt_mem_verify,
+};
+
+const struct xfs_btree_ops xfs_rtrmapbt_mem_ops = {
+	.type			= XFS_BTREE_TYPE_MEM,
+	.geom_flags		= XFS_BTGEO_OVERLAPPING,
+
+	.rec_len		= sizeof(struct xfs_rmap_rec),
+	/* Overlapping btree; 2 keys per pointer. */
+	.key_len		= 2 * sizeof(struct xfs_rmap_key),
+	.ptr_len		= XFS_BTREE_LONG_PTR_LEN,
+
+	.lru_refs		= XFS_RMAP_BTREE_REF,
+	.statoff		= XFS_STATS_CALC_INDEX(xs_rtrmap_mem_2),
+
+	.dup_cursor		= xfbtree_dup_cursor,
+	.set_root		= xfbtree_set_root,
+	.alloc_block		= xfbtree_alloc_block,
+	.free_block		= xfbtree_free_block,
+	.get_minrecs		= xfbtree_get_minrecs,
+	.get_maxrecs		= xfbtree_get_maxrecs,
+	.init_key_from_rec	= xfs_rtrmapbt_init_key_from_rec,
+	.init_high_key_from_rec	= xfs_rtrmapbt_init_high_key_from_rec,
+	.init_rec_from_cur	= xfs_rtrmapbt_init_rec_from_cur,
+	.init_ptr_from_cur	= xfbtree_init_ptr_from_cur,
+	.key_diff		= xfs_rtrmapbt_key_diff,
+	.buf_ops		= &xfs_rtrmapbt_mem_buf_ops,
+	.diff_two_keys		= xfs_rtrmapbt_diff_two_keys,
+	.keys_inorder		= xfs_rtrmapbt_keys_inorder,
+	.recs_inorder		= xfs_rtrmapbt_recs_inorder,
+	.keys_contiguous	= xfs_rtrmapbt_keys_contiguous,
+};
+
+/* Create a cursor for an in-memory btree. */
+struct xfs_btree_cur *
+xfs_rtrmapbt_mem_cursor(
+	struct xfs_rtgroup	*rtg,
+	struct xfs_trans	*tp,
+	struct xfbtree		*xfbt)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	struct xfs_btree_cur	*cur;
+
+	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_rtrmapbt_mem_ops,
+			mp->m_rtrmap_maxlevels, xfs_rtrmapbt_cur_cache);
+	cur->bc_mem.xfbtree = xfbt;
+	cur->bc_nlevels = xfbt->nlevels;
+	cur->bc_group = xfs_group_hold(rtg_group(rtg));
+	return cur;
+}
+
+/* Create an in-memory realtime rmap btree. */
+int
+xfs_rtrmapbt_mem_init(
+	struct xfs_mount	*mp,
+	struct xfbtree		*xfbt,
+	struct xfs_buftarg	*btp,
+	xfs_rgnumber_t		rgno)
+{
+	xfbt->owner = rgno;
+	return xfbtree_init(mp, xfbt, btp, &xfs_rtrmapbt_mem_ops);
+}
+#endif /* CONFIG_XFS_BTREE_IN_MEM */
+
 /*
  * Install a new rt reverse mapping btree root.  Caller is responsible for
  * invalidating and freeing the old btree blocks.
diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h
index ad76ac7938b602..9d0915089891a5 100644
--- a/libxfs/xfs_rtrmap_btree.h
+++ b/libxfs/xfs_rtrmap_btree.h
@@ -11,6 +11,7 @@ struct xfs_btree_cur;
 struct xfs_mount;
 struct xbtree_ifakeroot;
 struct xfs_rtgroup;
+struct xfbtree;
 
 /* rmaps only exist on crc enabled filesystems */
 #define XFS_RTRMAP_BLOCK_LEN	XFS_BTREE_LBLOCK_CRC_LEN
@@ -201,4 +202,9 @@ int xfs_rtrmapbt_init_rtsb(struct xfs_mount *mp, struct xfs_rtgroup *rtg,
 unsigned long long xfs_rtrmapbt_calc_size(struct xfs_mount *mp,
 		unsigned long long len);
 
+struct xfs_btree_cur *xfs_rtrmapbt_mem_cursor(struct xfs_rtgroup *rtg,
+		struct xfs_trans *tp, struct xfbtree *xfbtree);
+int xfs_rtrmapbt_mem_init(struct xfs_mount *mp, struct xfbtree *xfbtree,
+		struct xfs_buftarg *btp, xfs_rgnumber_t rgno);
+
 #endif /* __XFS_RTRMAP_BTREE_H__ */
diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h
index da23dac22c3f08..960716c387cc2b 100644
--- a/libxfs/xfs_shared.h
+++ b/libxfs/xfs_shared.h
@@ -57,6 +57,7 @@ extern const struct xfs_btree_ops xfs_refcountbt_ops;
 extern const struct xfs_btree_ops xfs_rmapbt_ops;
 extern const struct xfs_btree_ops xfs_rmapbt_mem_ops;
 extern const struct xfs_btree_ops xfs_rtrmapbt_ops;
+extern const struct xfs_btree_ops xfs_rtrmapbt_mem_ops;
 
 static inline bool xfs_btree_is_bno(const struct xfs_btree_ops *ops)
 {
@@ -98,8 +99,14 @@ static inline bool xfs_btree_is_mem_rmap(const struct xfs_btree_ops *ops)
 {
 	return ops == &xfs_rmapbt_mem_ops;
 }
+
+static inline bool xfs_btree_is_mem_rtrmap(const struct xfs_btree_ops *ops)
+{
+	return ops == &xfs_rtrmapbt_mem_ops;
+}
 #else
 # define xfs_btree_is_mem_rmap(...)	(false)
+# define xfs_btree_is_mem_rtrmap(...)	(false)
 #endif
 
 static inline bool xfs_btree_is_rtrmap(const struct xfs_btree_ops *ops)



* [PATCH 31/56] xfs: namespace the maximum length/refcount symbols
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (29 preceding siblings ...)
  2025-02-06 22:42   ` [PATCH 30/56] xfs: create a shadow rmap btree during realtime rmap repair Darrick J. Wong
@ 2025-02-06 22:43   ` Darrick J. Wong
  2025-02-06 22:43   ` [PATCH 32/56] xfs: introduce realtime refcount btree ondisk definitions Darrick J. Wong
                     ` (24 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:43 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 70fcf6866578e69635399e806273376f5e0b8e2b

Actually namespace these variables properly, so that readers can tell
that this is an XFS symbol, and that it's for the refcount
functionality.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
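A small standalone sketch, not part of the patch, of the "pinned at the
maximum" behaviour that the renamed XFS_REFC_REFCOUNT_MAX symbol guards
in xfs_refc_merge_refcount() below.  It assumes a 32-bit reference count;
the TOY_/toy_ names are hypothetical stand-ins for the libxfs symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for XFS_REFC_REFCOUNT_MAX, assuming a 32-bit refcount type. */
#define TOY_REFC_REFCOUNT_MAX	((uint32_t)~0U)

/*
 * Apply an increment (+1) or decrement (-1) to a reference count.  Once
 * the count has reached the maximum it stays there, because the true
 * number of owners is no longer known.
 */
static uint32_t toy_merge_refcount(uint32_t refcount, int adjust)
{
	if (refcount == TOY_REFC_REFCOUNT_MAX)
		return TOY_REFC_REFCOUNT_MAX;
	return refcount + adjust;
}
```

The decrement case is the interesting one: a pinned record is never
decremented either, so the blocks it covers are never freed through the
refcount path.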
 libxfs/xfs_format.h   |    4 ++--
 libxfs/xfs_refcount.c |   18 +++++++++---------
 repair/rmap.c         |    2 +-
 repair/scan.c         |    2 +-
 4 files changed, 13 insertions(+), 13 deletions(-)


diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index fba4e59aded4a0..16696bc3ff9445 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1790,8 +1790,8 @@ struct xfs_refcount_key {
 	__be32		rc_startblock;	/* starting block number */
 };
 
-#define MAXREFCOUNT	((xfs_nlink_t)~0U)
-#define MAXREFCEXTLEN	((xfs_extlen_t)~0U)
+#define XFS_REFC_REFCOUNT_MAX	((xfs_nlink_t)~0U)
+#define XFS_REFC_LEN_MAX	((xfs_extlen_t)~0U)
 
 /* btree pointer type */
 typedef __be32 xfs_refcount_ptr_t;
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index fb7c56a5a32921..1aeea0b161849d 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -127,7 +127,7 @@ xfs_refcount_check_irec(
 	struct xfs_perag		*pag,
 	const struct xfs_refcount_irec	*irec)
 {
-	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
+	if (irec->rc_blockcount == 0 || irec->rc_blockcount > XFS_REFC_LEN_MAX)
 		return __this_address;
 
 	if (!xfs_refcount_check_domain(irec))
@@ -137,7 +137,7 @@ xfs_refcount_check_irec(
 	if (!xfs_verify_agbext(pag, irec->rc_startblock, irec->rc_blockcount))
 		return __this_address;
 
-	if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT)
+	if (irec->rc_refcount == 0 || irec->rc_refcount > XFS_REFC_REFCOUNT_MAX)
 		return __this_address;
 
 	return NULL;
@@ -852,9 +852,9 @@ xfs_refc_merge_refcount(
 	const struct xfs_refcount_irec	*irec,
 	enum xfs_refc_adjust_op		adjust)
 {
-	/* Once a record hits MAXREFCOUNT, it is pinned there forever */
-	if (irec->rc_refcount == MAXREFCOUNT)
-		return MAXREFCOUNT;
+	/* Once a record hits XFS_REFC_REFCOUNT_MAX, it is pinned forever */
+	if (irec->rc_refcount == XFS_REFC_REFCOUNT_MAX)
+		return XFS_REFC_REFCOUNT_MAX;
 	return irec->rc_refcount + adjust;
 }
 
@@ -897,7 +897,7 @@ xfs_refc_want_merge_center(
 	 * hence we need to catch u32 addition overflows here.
 	 */
 	ulen += cleft->rc_blockcount + right->rc_blockcount;
-	if (ulen >= MAXREFCEXTLEN)
+	if (ulen >= XFS_REFC_LEN_MAX)
 		return false;
 
 	*ulenp = ulen;
@@ -932,7 +932,7 @@ xfs_refc_want_merge_left(
 	 * hence we need to catch u32 addition overflows here.
 	 */
 	ulen += cleft->rc_blockcount;
-	if (ulen >= MAXREFCEXTLEN)
+	if (ulen >= XFS_REFC_LEN_MAX)
 		return false;
 
 	return true;
@@ -966,7 +966,7 @@ xfs_refc_want_merge_right(
 	 * hence we need to catch u32 addition overflows here.
 	 */
 	ulen += cright->rc_blockcount;
-	if (ulen >= MAXREFCEXTLEN)
+	if (ulen >= XFS_REFC_LEN_MAX)
 		return false;
 
 	return true;
@@ -1195,7 +1195,7 @@ xfs_refcount_adjust_extents(
 		 * Adjust the reference count and either update the tree
 		 * (incr) or free the blocks (decr).
 		 */
-		if (ext.rc_refcount == MAXREFCOUNT)
+		if (ext.rc_refcount == XFS_REFC_REFCOUNT_MAX)
 			goto skip;
 		ext.rc_refcount += adj;
 		trace_xfs_refcount_modify_extent(cur, &ext);
diff --git a/repair/rmap.c b/repair/rmap.c
index 1c6a8691b8cb2c..bd91c721e20e4e 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -793,7 +793,7 @@ refcount_emit(
 		agno, agbno, len, nr_rmaps);
 	rlrec.rc_startblock = agbno;
 	rlrec.rc_blockcount = len;
-	nr_rmaps = min(nr_rmaps, MAXREFCOUNT);
+	nr_rmaps = min(nr_rmaps, XFS_REFC_REFCOUNT_MAX);
 	rlrec.rc_refcount = nr_rmaps;
 	rlrec.rc_domain = XFS_REFC_DOMAIN_SHARED;
 
diff --git a/repair/scan.c b/repair/scan.c
index 221d660e81fdb4..88fbda6b83f61a 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -1503,7 +1503,7 @@ _("extent (%u/%u) len %u claimed, state is %d\n"),
 						break;
 					}
 				}
-			} else if (nr < 2 || nr > MAXREFCOUNT) {
+			} else if (nr < 2 || nr > XFS_REFC_REFCOUNT_MAX) {
 				do_warn(
 	_("invalid reference count %u in record %u of %s btree block %u/%u\n"),
 					nr, i, name, agno, bno);



* [PATCH 32/56] xfs: introduce realtime refcount btree ondisk definitions
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (30 preceding siblings ...)
  2025-02-06 22:43   ` [PATCH 31/56] xfs: namespace the maximum length/refcount symbols Darrick J. Wong
@ 2025-02-06 22:43   ` Darrick J. Wong
  2025-02-06 22:43   ` [PATCH 33/56] xfs: realtime refcount btree transaction reservations Darrick J. Wong
                     ` (23 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:43 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 9abe03a0e4f978615a2b1b484b8d09ca84c16ea0

Add the ondisk structure definitions for realtime refcount btrees. The
realtime refcount btree will be rooted from a hidden inode so it needs
to have a separate btree block magic and pointer format.

Next, add everything needed to read, write and manipulate refcount btree
blocks. This prepares the way for connecting the btree operations
implementation, though the changes to actually root the rtrefcount btree
in an inode come later.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
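A short sketch, not part of the patch, showing how a separate block magic
lets a reader tell btree block types apart.  The magic value is the one
defined in the xfs_format.h hunk below; the toy_* helpers are hypothetical
and only demonstrate that the value, stored big-endian as XFS ondisk
fields are, spells "RCNT".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as XFS_RTREFC_CRC_MAGIC in the hunk below. */
#define TOY_RTREFC_CRC_MAGIC	0x52434e54u	/* 'RCNT' */

/* Store a 32-bit value in big-endian byte order. */
static void toy_put_be32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}

/* Load a 32-bit big-endian value. */
static uint32_t toy_get_be32(const uint8_t in[4])
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	       ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Does this block header start with the rtrefcount btree magic? */
static bool toy_is_rtrefcount_block(const uint8_t hdr[4])
{
	return toy_get_be32(hdr) == TOY_RTREFC_CRC_MAGIC;
}
```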
 include/libxfs.h              |    1 
 include/xfs_mount.h           |    9 +
 libxfs/Makefile               |    2 
 libxfs/xfs_btree.c            |    5 +
 libxfs/xfs_format.h           |    9 +
 libxfs/xfs_ondisk.h           |    1 
 libxfs/xfs_rtrefcount_btree.c |  271 +++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtrefcount_btree.h |   70 +++++++++++
 libxfs/xfs_sb.c               |    8 +
 libxfs/xfs_shared.h           |    7 +
 10 files changed, 383 insertions(+)
 create mode 100644 libxfs/xfs_rtrefcount_btree.c
 create mode 100644 libxfs/xfs_rtrefcount_btree.h


diff --git a/include/libxfs.h b/include/libxfs.h
index af08f176d79a9e..79f8e1ff03d3f5 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -86,6 +86,7 @@ struct iomap;
 #include "xfs_rmap_btree.h"
 #include "xfs_rmap.h"
 #include "xfs_refcount_btree.h"
+#include "xfs_rtrefcount_btree.h"
 #include "xfs_refcount.h"
 #include "xfs_btree_staging.h"
 #include "xfs_rtbitmap.h"
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 0d9b99ef43b4c1..efe7738bcb416a 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -112,11 +112,14 @@ typedef struct xfs_mount {
 	uint			m_rtrmap_mnr[2]; /* min rtrmap btree records */
 	uint			m_refc_mxr[2];	/* max refc btree records */
 	uint			m_refc_mnr[2];	/* min refc btree records */
+	unsigned int		m_rtrefc_mxr[2]; /* max rtrefc btree records */
+	unsigned int		m_rtrefc_mnr[2]; /* min rtrefc btree records */
 	uint			m_alloc_maxlevels; /* max alloc btree levels */
 	uint			m_bm_maxlevels[2];  /* max bmap btree levels */
 	uint			m_rmap_maxlevels; /* max rmap btree levels */
 	uint			m_rtrmap_maxlevels; /* max rtrmap btree level */
 	uint			m_refc_maxlevels; /* max refc btree levels */
+	unsigned int		m_rtrefc_maxlevels; /* max rtrefc btree level */
 	unsigned int		m_agbtree_maxlevels; /* max level of all AG btrees */
 	unsigned int		m_rtbtree_maxlevels; /* max level of all rt btrees */
 	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
@@ -268,6 +271,12 @@ static inline bool xfs_has_rtrmapbt(struct xfs_mount *mp)
 	       xfs_has_rmapbt(mp);
 }
 
+static inline bool xfs_has_rtreflink(struct xfs_mount *mp)
+{
+	return xfs_has_metadir(mp) && xfs_has_realtime(mp) &&
+	       xfs_has_reflink(mp);
+}
+
 /* Kernel mount features that we don't support */
 #define __XFS_UNSUPP_FEAT(name) \
 static inline bool xfs_has_ ## name (struct xfs_mount *mp) \
diff --git a/libxfs/Makefile b/libxfs/Makefile
index d152667eeec275..f3daa598ca9721 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -61,6 +61,7 @@ HFILES = \
 	xfs_quota_defs.h \
 	xfs_refcount.h \
 	xfs_refcount_btree.h \
+	xfs_rtrefcount_btree.h \
 	xfs_rmap.h \
 	xfs_rmap_btree.h \
 	xfs_rtbitmap.h \
@@ -123,6 +124,7 @@ CFILES = buf_mem.c \
 	xfs_parent.c \
 	xfs_refcount.c \
 	xfs_refcount_btree.c \
+	xfs_rtrefcount_btree.c \
 	xfs_rmap.c \
 	xfs_rmap_btree.c \
 	xfs_rtbitmap.c \
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 7cbd8645b9a816..fce215f924898d 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -32,6 +32,7 @@
 #include "xfs_bmap.h"
 #include "xfs_rmap.h"
 #include "xfs_metafile.h"
+#include "xfs_rtrefcount_btree.h"
 
 /*
  * Btree magic numbers.
@@ -5530,6 +5531,9 @@ xfs_btree_init_cur_caches(void)
 	if (error)
 		goto err;
 	error = xfs_rtrmapbt_init_cur_cache();
+	if (error)
+		goto err;
+	error = xfs_rtrefcountbt_init_cur_cache();
 	if (error)
 		goto err;
 
@@ -5549,6 +5553,7 @@ xfs_btree_destroy_cur_caches(void)
 	xfs_rmapbt_destroy_cur_cache();
 	xfs_refcountbt_destroy_cur_cache();
 	xfs_rtrmapbt_destroy_cur_cache();
+	xfs_rtrefcountbt_destroy_cur_cache();
 }
 
 /* Move the btree cursor before the first record. */
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 16696bc3ff9445..17f7c0d1aaa452 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1796,6 +1796,15 @@ struct xfs_refcount_key {
 /* btree pointer type */
 typedef __be32 xfs_refcount_ptr_t;
 
+/*
+ * Realtime Reference Count btree format definitions
+ *
+ * This is a btree for reference count records for realtime volumes
+ */
+#define	XFS_RTREFC_CRC_MAGIC	0x52434e54	/* 'RCNT' */
+
+/* inode-rooted btree pointer type */
+typedef __be64 xfs_rtrefcount_ptr_t;
 
 /*
  * BMAP Btree format definitions
diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index 07e2f5fb3a94ae..efb035050c009c 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -85,6 +85,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo,		48);
 	XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t,			8);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_rtrmap_root,		4);
+	XFS_CHECK_STRUCT_SIZE(xfs_rtrefcount_ptr_t,		8);
 
 	/*
 	 * m68k has problems with struct xfs_attr_leaf_name_remote, but we pad
diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c
new file mode 100644
index 00000000000000..d92ef06b04a73e
--- /dev/null
+++ b/libxfs/xfs_rtrefcount_btree.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2021-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs_priv.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_inode.h"
+#include "xfs_trans.h"
+#include "xfs_alloc.h"
+#include "xfs_btree.h"
+#include "xfs_btree_staging.h"
+#include "xfs_rtrefcount_btree.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_rtgroup.h"
+#include "xfs_rtbitmap.h"
+
+static struct kmem_cache	*xfs_rtrefcountbt_cur_cache;
+
+/*
+ * Realtime Reference Count btree.
+ *
+ * This is a btree used to track the owner(s) of a given extent in the realtime
+ * device.  See the comments in xfs_refcount_btree.c for more information.
+ *
+ * This tree is basically the same as the regular refcount btree except that
+ * it's rooted in an inode.
+ */
+
+static struct xfs_btree_cur *
+xfs_rtrefcountbt_dup_cursor(
+	struct xfs_btree_cur	*cur)
+{
+	return xfs_rtrefcountbt_init_cursor(cur->bc_tp, to_rtg(cur->bc_group));
+}
+
+static xfs_failaddr_t
+xfs_rtrefcountbt_verify(
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
+	xfs_failaddr_t		fa;
+	int			level;
+
+	if (!xfs_verify_magic(bp, block->bb_magic))
+		return __this_address;
+
+	if (!xfs_has_reflink(mp))
+		return __this_address;
+	fa = xfs_btree_fsblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN);
+	if (fa)
+		return fa;
+	level = be16_to_cpu(block->bb_level);
+	if (level > mp->m_rtrefc_maxlevels)
+		return __this_address;
+
+	return xfs_btree_fsblock_verify(bp, mp->m_rtrefc_mxr[level != 0]);
+}
+
+static void
+xfs_rtrefcountbt_read_verify(
+	struct xfs_buf	*bp)
+{
+	xfs_failaddr_t	fa;
+
+	if (!xfs_btree_fsblock_verify_crc(bp))
+		xfs_verifier_error(bp, -EFSBADCRC, __this_address);
+	else {
+		fa = xfs_rtrefcountbt_verify(bp);
+		if (fa)
+			xfs_verifier_error(bp, -EFSCORRUPTED, fa);
+	}
+
+	if (bp->b_error)
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+}
+
+static void
+xfs_rtrefcountbt_write_verify(
+	struct xfs_buf	*bp)
+{
+	xfs_failaddr_t	fa;
+
+	fa = xfs_rtrefcountbt_verify(bp);
+	if (fa) {
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+		xfs_verifier_error(bp, -EFSCORRUPTED, fa);
+		return;
+	}
+	xfs_btree_fsblock_calc_crc(bp);
+
+}
+
+const struct xfs_buf_ops xfs_rtrefcountbt_buf_ops = {
+	.name			= "xfs_rtrefcountbt",
+	.magic			= { 0, cpu_to_be32(XFS_RTREFC_CRC_MAGIC) },
+	.verify_read		= xfs_rtrefcountbt_read_verify,
+	.verify_write		= xfs_rtrefcountbt_write_verify,
+	.verify_struct		= xfs_rtrefcountbt_verify,
+};
+
+const struct xfs_btree_ops xfs_rtrefcountbt_ops = {
+	.name			= "rtrefcount",
+	.type			= XFS_BTREE_TYPE_INODE,
+	.geom_flags		= XFS_BTGEO_IROOT_RECORDS,
+
+	.rec_len		= sizeof(struct xfs_refcount_rec),
+	.key_len		= sizeof(struct xfs_refcount_key),
+	.ptr_len		= XFS_BTREE_LONG_PTR_LEN,
+
+	.lru_refs		= XFS_REFC_BTREE_REF,
+	.statoff		= XFS_STATS_CALC_INDEX(xs_rtrefcbt_2),
+
+	.dup_cursor		= xfs_rtrefcountbt_dup_cursor,
+	.buf_ops		= &xfs_rtrefcountbt_buf_ops,
+};
+
+/* Allocate a new rt refcount btree cursor. */
+struct xfs_btree_cur *
+xfs_rtrefcountbt_init_cursor(
+	struct xfs_trans	*tp,
+	struct xfs_rtgroup	*rtg)
+{
+	struct xfs_inode	*ip = NULL;
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	struct xfs_btree_cur	*cur;
+
+	return NULL; /* XXX */
+
+	xfs_assert_ilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL);
+
+	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_rtrefcountbt_ops,
+			mp->m_rtrefc_maxlevels, xfs_rtrefcountbt_cur_cache);
+
+	cur->bc_ino.ip = ip;
+	cur->bc_refc.nr_ops = 0;
+	cur->bc_refc.shape_changes = 0;
+	cur->bc_group = xfs_group_hold(rtg_group(rtg));
+	cur->bc_nlevels = be16_to_cpu(ip->i_df.if_broot->bb_level) + 1;
+	cur->bc_ino.forksize = xfs_inode_fork_size(ip, XFS_DATA_FORK);
+	cur->bc_ino.whichfork = XFS_DATA_FORK;
+	return cur;
+}
+
+/*
+ * Install a new rt refcount btree root.  Caller is responsible for
+ * invalidating and freeing the old btree blocks.
+ */
+void
+xfs_rtrefcountbt_commit_staged_btree(
+	struct xfs_btree_cur	*cur,
+	struct xfs_trans	*tp)
+{
+	struct xbtree_ifakeroot	*ifake = cur->bc_ino.ifake;
+	struct xfs_ifork	*ifp;
+	int			flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT;
+
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+
+	/*
+	 * Free any resources hanging off the real fork, then shallow-copy the
+	 * staging fork's contents into the real fork to transfer everything
+	 * we just built.
+	 */
+	ifp = xfs_ifork_ptr(cur->bc_ino.ip, XFS_DATA_FORK);
+	xfs_idestroy_fork(ifp);
+	memcpy(ifp, ifake->if_fork, sizeof(struct xfs_ifork));
+
+	cur->bc_ino.ip->i_projid = cur->bc_group->xg_gno;
+	xfs_trans_log_inode(tp, cur->bc_ino.ip, flags);
+	xfs_btree_commit_ifakeroot(cur, tp, XFS_DATA_FORK);
+}
+
+/* Calculate number of records in a realtime refcount btree block. */
+static inline unsigned int
+xfs_rtrefcountbt_block_maxrecs(
+	unsigned int		blocklen,
+	bool			leaf)
+{
+
+	if (leaf)
+		return blocklen / sizeof(struct xfs_refcount_rec);
+	return blocklen / (sizeof(struct xfs_refcount_key) +
+			   sizeof(xfs_rtrefcount_ptr_t));
+}
+
+/*
+ * Calculate number of records in a realtime refcount btree block.
+ */
+unsigned int
+xfs_rtrefcountbt_maxrecs(
+	struct xfs_mount	*mp,
+	unsigned int		blocklen,
+	bool			leaf)
+{
+	blocklen -= XFS_RTREFCOUNT_BLOCK_LEN;
+	return xfs_rtrefcountbt_block_maxrecs(blocklen, leaf);
+}
+
+/* Compute the max possible height for realtime refcount btrees. */
+unsigned int
+xfs_rtrefcountbt_maxlevels_ondisk(void)
+{
+	unsigned int		minrecs[2];
+	unsigned int		blocklen;
+
+	blocklen = XFS_MIN_CRC_BLOCKSIZE - XFS_BTREE_LBLOCK_CRC_LEN;
+
+	minrecs[0] = xfs_rtrefcountbt_block_maxrecs(blocklen, true) / 2;
+	minrecs[1] = xfs_rtrefcountbt_block_maxrecs(blocklen, false) / 2;
+
+	/* We need at most one record for every block in an rt group. */
+	return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_RGBLOCKS);
+}
+
+int __init
+xfs_rtrefcountbt_init_cur_cache(void)
+{
+	xfs_rtrefcountbt_cur_cache = kmem_cache_create("xfs_rtrefcountbt_cur",
+			xfs_btree_cur_sizeof(
+					xfs_rtrefcountbt_maxlevels_ondisk()),
+			0, 0, NULL);
+
+	if (!xfs_rtrefcountbt_cur_cache)
+		return -ENOMEM;
+	return 0;
+}
+
+void
+xfs_rtrefcountbt_destroy_cur_cache(void)
+{
+	kmem_cache_destroy(xfs_rtrefcountbt_cur_cache);
+	xfs_rtrefcountbt_cur_cache = NULL;
+}
+
+/* Compute the maximum height of a realtime refcount btree. */
+void
+xfs_rtrefcountbt_compute_maxlevels(
+	struct xfs_mount	*mp)
+{
+	unsigned int		d_maxlevels, r_maxlevels;
+
+	if (!xfs_has_rtreflink(mp)) {
+		mp->m_rtrefc_maxlevels = 0;
+		return;
+	}
+
+	/*
+	 * The realtime refcountbt lives on the data device, which means that
+	 * its maximum height is constrained by the size of the data device and
+	 * the height required to store one refcount record for each rtextent
+	 * in an rt group.
+	 */
+	d_maxlevels = xfs_btree_space_to_height(mp->m_rtrefc_mnr,
+				mp->m_sb.sb_dblocks);
+	r_maxlevels = xfs_btree_compute_maxlevels(mp->m_rtrefc_mnr,
+				mp->m_sb.sb_rgextents);
+
+	/* Add one level to handle the inode root level. */
+	mp->m_rtrefc_maxlevels = min(d_maxlevels, r_maxlevels) + 1;
+}
diff --git a/libxfs/xfs_rtrefcount_btree.h b/libxfs/xfs_rtrefcount_btree.h
new file mode 100644
index 00000000000000..b713b33818800c
--- /dev/null
+++ b/libxfs/xfs_rtrefcount_btree.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2021-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_RTREFCOUNT_BTREE_H__
+#define __XFS_RTREFCOUNT_BTREE_H__
+
+struct xfs_buf;
+struct xfs_btree_cur;
+struct xfs_mount;
+struct xbtree_ifakeroot;
+struct xfs_rtgroup;
+
+/* refcounts only exist on crc enabled filesystems */
+#define XFS_RTREFCOUNT_BLOCK_LEN	XFS_BTREE_LBLOCK_CRC_LEN
+
+struct xfs_btree_cur *xfs_rtrefcountbt_init_cursor(struct xfs_trans *tp,
+		struct xfs_rtgroup *rtg);
+struct xfs_btree_cur *xfs_rtrefcountbt_stage_cursor(struct xfs_mount *mp,
+		struct xfs_rtgroup *rtg, struct xfs_inode *ip,
+		struct xbtree_ifakeroot *ifake);
+void xfs_rtrefcountbt_commit_staged_btree(struct xfs_btree_cur *cur,
+		struct xfs_trans *tp);
+unsigned int xfs_rtrefcountbt_maxrecs(struct xfs_mount *mp,
+		unsigned int blocklen, bool leaf);
+void xfs_rtrefcountbt_compute_maxlevels(struct xfs_mount *mp);
+
+/*
+ * Addresses of records, keys, and pointers within an incore rtrefcountbt block.
+ *
+ * (note that some of these may appear unused, but they are used in userspace)
+ */
+static inline struct xfs_refcount_rec *
+xfs_rtrefcount_rec_addr(
+	struct xfs_btree_block	*block,
+	unsigned int		index)
+{
+	return (struct xfs_refcount_rec *)
+		((char *)block + XFS_RTREFCOUNT_BLOCK_LEN +
+		 (index - 1) * sizeof(struct xfs_refcount_rec));
+}
+
+static inline struct xfs_refcount_key *
+xfs_rtrefcount_key_addr(
+	struct xfs_btree_block	*block,
+	unsigned int		index)
+{
+	return (struct xfs_refcount_key *)
+		((char *)block + XFS_RTREFCOUNT_BLOCK_LEN +
+		 (index - 1) * sizeof(struct xfs_refcount_key));
+}
+
+static inline xfs_rtrefcount_ptr_t *
+xfs_rtrefcount_ptr_addr(
+	struct xfs_btree_block	*block,
+	unsigned int		index,
+	unsigned int		maxrecs)
+{
+	return (xfs_rtrefcount_ptr_t *)
+		((char *)block + XFS_RTREFCOUNT_BLOCK_LEN +
+		 maxrecs * sizeof(struct xfs_refcount_key) +
+		 (index - 1) * sizeof(xfs_rtrefcount_ptr_t));
+}
+
+unsigned int xfs_rtrefcountbt_maxlevels_ondisk(void);
+int __init xfs_rtrefcountbt_init_cur_cache(void);
+void xfs_rtrefcountbt_destroy_cur_cache(void);
+
+#endif	/* __XFS_RTREFCOUNT_BTREE_H__ */
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 467d64e199d22e..50a43c0328cb93 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -26,6 +26,7 @@
 #include "xfs_rtbitmap.h"
 #include "xfs_rtgroup.h"
 #include "xfs_rtrmap_btree.h"
+#include "xfs_rtrefcount_btree.h"
 
 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -1223,6 +1224,13 @@ xfs_sb_mount_common(
 	mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2;
 	mp->m_refc_mnr[1] = mp->m_refc_mxr[1] / 2;
 
+	mp->m_rtrefc_mxr[0] = xfs_rtrefcountbt_maxrecs(mp, sbp->sb_blocksize,
+			true);
+	mp->m_rtrefc_mxr[1] = xfs_rtrefcountbt_maxrecs(mp, sbp->sb_blocksize,
+			false);
+	mp->m_rtrefc_mnr[0] = mp->m_rtrefc_mxr[0] / 2;
+	mp->m_rtrefc_mnr[1] = mp->m_rtrefc_mxr[1] / 2;
+
 	mp->m_bsize = XFS_FSB_TO_BB(mp, 1);
 	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
 	mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h
index 960716c387cc2b..b1e0d9bc1f7d70 100644
--- a/libxfs/xfs_shared.h
+++ b/libxfs/xfs_shared.h
@@ -42,6 +42,7 @@ extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops;
 extern const struct xfs_buf_ops xfs_rtsummary_buf_ops;
 extern const struct xfs_buf_ops xfs_rtbuf_ops;
 extern const struct xfs_buf_ops xfs_rtsb_buf_ops;
+extern const struct xfs_buf_ops xfs_rtrefcountbt_buf_ops;
 extern const struct xfs_buf_ops xfs_rtrmapbt_buf_ops;
 extern const struct xfs_buf_ops xfs_sb_buf_ops;
 extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops;
@@ -58,6 +59,7 @@ extern const struct xfs_btree_ops xfs_rmapbt_ops;
 extern const struct xfs_btree_ops xfs_rmapbt_mem_ops;
 extern const struct xfs_btree_ops xfs_rtrmapbt_ops;
 extern const struct xfs_btree_ops xfs_rtrmapbt_mem_ops;
+extern const struct xfs_btree_ops xfs_rtrefcountbt_ops;
 
 static inline bool xfs_btree_is_bno(const struct xfs_btree_ops *ops)
 {
@@ -114,6 +116,11 @@ static inline bool xfs_btree_is_rtrmap(const struct xfs_btree_ops *ops)
 	return ops == &xfs_rtrmapbt_ops;
 }
 
+static inline bool xfs_btree_is_rtrefcount(const struct xfs_btree_ops *ops)
+{
+	return ops == &xfs_rtrefcountbt_ops;
+}
+
 /* log size calculation functions */
 int	xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes);
 int	xfs_log_calc_minimum_size(struct xfs_mount *);
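
For readers following the block geometry in this patch, here is a rough
standalone sketch of how the maxrecs and pointer-address calculations fit
together.  The sketch_* names are hypothetical, and the header, record and
key sizes are assumptions about definitions outside this patch; only the
8-byte pointer size is confirmed by the XFS_CHECK_STRUCT_SIZE() line above.

```c
#include <stddef.h>

/*
 * Assumed sizes, for illustration only.  The pointer size matches the
 * ondisk check added in this patch; the rest are assumptions.
 */
#define SKETCH_BLOCK_HDR_LEN	72u	/* assumed XFS_BTREE_LBLOCK_CRC_LEN */
#define SKETCH_REC_LEN		12u	/* assumed sizeof(struct xfs_refcount_rec) */
#define SKETCH_KEY_LEN		4u	/* assumed sizeof(struct xfs_refcount_key) */
#define SKETCH_PTR_LEN		8u	/* sizeof(xfs_rtrefcount_ptr_t) */

/* Records (leaf) or key/pointer pairs (node) that fit in one fs block. */
static unsigned int
sketch_maxrecs(
	unsigned int	blocksize,
	int		leaf)
{
	unsigned int	blocklen = blocksize - SKETCH_BLOCK_HDR_LEN;

	if (leaf)
		return blocklen / SKETCH_REC_LEN;
	return blocklen / (SKETCH_KEY_LEN + SKETCH_PTR_LEN);
}

/*
 * Byte offset of the 1-based pointer @index within a node block.  All
 * @maxrecs key slots come first, followed by the pointer array.
 */
static size_t
sketch_ptr_offset(
	unsigned int	index,
	unsigned int	maxrecs)
{
	return SKETCH_BLOCK_HDR_LEN + (size_t)maxrecs * SKETCH_KEY_LEN +
	       (size_t)(index - 1) * SKETCH_PTR_LEN;
}
```

Under these assumed sizes a node entry (key plus pointer) is as large as a
leaf record, so both block types hold the same number of entries for a
given block size.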


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 33/56] xfs: realtime refcount btree transaction reservations
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (31 preceding siblings ...)
  2025-02-06 22:43   ` [PATCH 32/56] xfs: introduce realtime refcount btree ondisk definitions Darrick J. Wong
@ 2025-02-06 22:43   ` Darrick J. Wong
  2025-02-06 22:43   ` [PATCH 34/56] xfs: add realtime refcount btree operations Darrick J. Wong
                     ` (22 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:43 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 2003c6a8754e307970c101a20baf8fb67d0588f2

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrefcountbt to add the record and a second
split in the regular refcountbt to record the rtrefcountbt split.
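
As a rough illustration of the arithmetic (not the libxfs code itself),
the reservation can be modelled as the larger of a data-device estimate
and a realtime estimate.  Everything named sketch_* below is hypothetical;
in particular the per-buffer overhead constant and the inode reservation
field are stand-ins for whatever xfs_calc_buf_res() and
xfs_calc_inode_res() really compute.

```c
/* Made-up per-buffer logging overhead, for illustration only. */
#define SKETCH_BUF_OVERHEAD	128u

struct sketch_geom {
	unsigned int	blksz;			/* fs block size in bytes */
	unsigned int	sectsz;			/* sector size in bytes */
	unsigned int	inode_res;		/* assumed cost of logging one inode */
	unsigned int	refc_maxlevels;		/* max refcountbt height */
	unsigned int	rtrefc_maxlevels;	/* max rtrefcountbt height */
	int		has_realtime;
};

/* Blocks touched by @num_ops updates if every btree level splits. */
static unsigned int
sketch_btree_block_count(
	unsigned int	maxlevels,
	unsigned int	num_ops)
{
	return num_ops * (2 * maxlevels - 1);
}

/* Hypothetical stand-in for xfs_calc_buf_res(). */
static unsigned int
sketch_buf_res(
	unsigned int	nbufs,
	unsigned int	size)
{
	return nbufs * (size + SKETCH_BUF_OVERHEAD);
}

/* Take the larger of the data device (t1) and realtime (t2) estimates. */
static unsigned int
sketch_refcount_res(
	const struct sketch_geom	*g,
	unsigned int			nr_ops)
{
	unsigned int	t1, t2 = 0;

	t1 = sketch_buf_res(nr_ops, g->sectsz) +
	     sketch_buf_res(sketch_btree_block_count(g->refc_maxlevels,
				nr_ops), g->blksz);

	if (g->has_realtime)
		t2 = g->inode_res +
		     sketch_buf_res(sketch_btree_block_count(
				g->rtrefc_maxlevels, nr_ops), g->blksz);

	return t1 > t2 ? t1 : t2;
}
```

The max() rather than a sum reflects that a single refcount update targets
either the data device or the realtime volume, never both.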

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_trans_resv.c |   25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)


diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c
index cdfac7a12906c2..4034c52790de8b 100644
--- a/libxfs/xfs_trans_resv.c
+++ b/libxfs/xfs_trans_resv.c
@@ -89,6 +89,14 @@ xfs_refcountbt_block_count(
 	return num_ops * (2 * mp->m_refc_maxlevels - 1);
 }
 
+static unsigned int
+xfs_rtrefcountbt_block_count(
+	struct xfs_mount	*mp,
+	unsigned int		num_ops)
+{
+	return num_ops * (2 * mp->m_rtrefc_maxlevels - 1);
+}
+
 /*
  * Logging inodes is really tricksy. They are logged in memory format,
  * which means that what we write into the log doesn't directly translate into
@@ -256,10 +264,13 @@ xfs_rtalloc_block_count(
  * Compute the log reservation required to handle the refcount update
  * transaction.  Refcount updates are always done via deferred log items.
  *
- * This is calculated as:
+ * This is calculated as the max of:
  * Data device refcount updates (t1):
  *    the agfs of the ags containing the blocks: nr_ops * sector size
  *    the refcount btrees: nr_ops * 1 trees * (2 * max depth - 1) * block size
+ * Realtime refcount updates (t2):
+ *    the rt refcount inode
+ *    the rtrefcount btrees: nr_ops * 1 trees * (2 * max depth - 1) * block size
  */
 static unsigned int
 xfs_calc_refcountbt_reservation(
@@ -267,12 +278,20 @@ xfs_calc_refcountbt_reservation(
 	unsigned int		nr_ops)
 {
 	unsigned int		blksz = XFS_FSB_TO_B(mp, 1);
+	unsigned int		t1, t2 = 0;
 
 	if (!xfs_has_reflink(mp))
 		return 0;
 
-	return xfs_calc_buf_res(nr_ops, mp->m_sb.sb_sectsize) +
-	       xfs_calc_buf_res(xfs_refcountbt_block_count(mp, nr_ops), blksz);
+	t1 = xfs_calc_buf_res(nr_ops, mp->m_sb.sb_sectsize) +
+	     xfs_calc_buf_res(xfs_refcountbt_block_count(mp, nr_ops), blksz);
+
+	if (xfs_has_realtime(mp))
+		t2 = xfs_calc_inode_res(mp, 1) +
+		     xfs_calc_buf_res(xfs_rtrefcountbt_block_count(mp, nr_ops),
+				     blksz);
+
+	return max(t1, t2);
 }
 
 /*


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 34/56] xfs: add realtime refcount btree operations
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (32 preceding siblings ...)
  2025-02-06 22:43   ` [PATCH 33/56] xfs: realtime refcount btree transaction reservations Darrick J. Wong
@ 2025-02-06 22:43   ` Darrick J. Wong
  2025-02-06 22:44   ` [PATCH 35/56] xfs: prepare refcount functions to deal with rtrefcountbt Darrick J. Wong
                     ` (21 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:43 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 1a6f88ea538db9b3d8aef86112894e7e6d098287

Implement the generic btree operations needed to manipulate rtrefcount
btree blocks.  This differs from the regular refcountbt in that we
allocate space from the filesystem at large, and are constrained neither
to the free space nor to any particular AG.
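
To make the key and ordering callbacks below easier to follow, here is a
minimal host-endian sketch of the same comparisons.  The sketch_* names
and the plain uint32_t fields are assumptions for illustration; the real
callbacks operate on big-endian ondisk records via be32_to_cpu().

```c
#include <stdint.h>

/* Host-endian stand-in for an ondisk refcount record. */
struct sketch_refc_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint32_t	refcount;
};

/* The low key of a record is simply its start block. */
static uint32_t
sketch_low_key(const struct sketch_refc_rec *rec)
{
	return rec->startblock;
}

/* The high key is the last block that the record covers. */
static uint32_t
sketch_high_key(const struct sketch_refc_rec *rec)
{
	return rec->startblock + rec->blockcount - 1;
}

/* Two records are in order if the first ends at or before the second starts. */
static int
sketch_recs_inorder(
	const struct sketch_refc_rec	*r1,
	const struct sketch_refc_rec	*r2)
{
	return r1->startblock + r1->blockcount <= r2->startblock;
}

/* Widen to 64 bits before subtracting so the sign of the result is reliable. */
static int64_t
sketch_diff_two_keys(
	uint32_t	k1,
	uint32_t	k2)
{
	return (int64_t)k1 - k2;
}
```

The widening cast matters because a plain 32-bit subtraction of two
unsigned keys would wrap instead of going negative.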

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtrefcount_btree.c |  148 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 148 insertions(+)


diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c
index d92ef06b04a73e..ea05b1a48eab62 100644
--- a/libxfs/xfs_rtrefcount_btree.c
+++ b/libxfs/xfs_rtrefcount_btree.c
@@ -19,6 +19,7 @@
 #include "xfs_btree.h"
 #include "xfs_btree_staging.h"
 #include "xfs_rtrefcount_btree.h"
+#include "xfs_refcount.h"
 #include "xfs_trace.h"
 #include "xfs_cksum.h"
 #include "xfs_rtgroup.h"
@@ -43,6 +44,106 @@ xfs_rtrefcountbt_dup_cursor(
 	return xfs_rtrefcountbt_init_cursor(cur->bc_tp, to_rtg(cur->bc_group));
 }
 
+STATIC int
+xfs_rtrefcountbt_get_minrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	if (level == cur->bc_nlevels - 1) {
+		struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
+
+		return xfs_rtrefcountbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes,
+				level == 0) / 2;
+	}
+
+	return cur->bc_mp->m_rtrefc_mnr[level != 0];
+}
+
+STATIC int
+xfs_rtrefcountbt_get_maxrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	if (level == cur->bc_nlevels - 1) {
+		struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
+
+		return xfs_rtrefcountbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes,
+				level == 0);
+	}
+
+	return cur->bc_mp->m_rtrefc_mxr[level != 0];
+}
+
+STATIC void
+xfs_rtrefcountbt_init_key_from_rec(
+	union xfs_btree_key		*key,
+	const union xfs_btree_rec	*rec)
+{
+	key->refc.rc_startblock = rec->refc.rc_startblock;
+}
+
+STATIC void
+xfs_rtrefcountbt_init_high_key_from_rec(
+	union xfs_btree_key		*key,
+	const union xfs_btree_rec	*rec)
+{
+	__u32				x;
+
+	x = be32_to_cpu(rec->refc.rc_startblock);
+	x += be32_to_cpu(rec->refc.rc_blockcount) - 1;
+	key->refc.rc_startblock = cpu_to_be32(x);
+}
+
+STATIC void
+xfs_rtrefcountbt_init_rec_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_rec	*rec)
+{
+	const struct xfs_refcount_irec *irec = &cur->bc_rec.rc;
+	uint32_t		start;
+
+	start = xfs_refcount_encode_startblock(irec->rc_startblock,
+			irec->rc_domain);
+	rec->refc.rc_startblock = cpu_to_be32(start);
+	rec->refc.rc_blockcount = cpu_to_be32(cur->bc_rec.rc.rc_blockcount);
+	rec->refc.rc_refcount = cpu_to_be32(cur->bc_rec.rc.rc_refcount);
+}
+
+STATIC void
+xfs_rtrefcountbt_init_ptr_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr)
+{
+	ptr->l = 0;
+}
+
+STATIC int64_t
+xfs_rtrefcountbt_key_diff(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*key)
+{
+	const struct xfs_refcount_key	*kp = &key->refc;
+	const struct xfs_refcount_irec	*irec = &cur->bc_rec.rc;
+	uint32_t			start;
+
+	start = xfs_refcount_encode_startblock(irec->rc_startblock,
+			irec->rc_domain);
+	return (int64_t)be32_to_cpu(kp->rc_startblock) - start;
+}
+
+STATIC int64_t
+xfs_rtrefcountbt_diff_two_keys(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*k1,
+	const union xfs_btree_key	*k2,
+	const union xfs_btree_key	*mask)
+{
+	ASSERT(!mask || mask->refc.rc_startblock);
+
+	return (int64_t)be32_to_cpu(k1->refc.rc_startblock) -
+			be32_to_cpu(k2->refc.rc_startblock);
+}
+
 static xfs_failaddr_t
 xfs_rtrefcountbt_verify(
 	struct xfs_buf		*bp)
@@ -109,6 +210,40 @@ const struct xfs_buf_ops xfs_rtrefcountbt_buf_ops = {
 	.verify_struct		= xfs_rtrefcountbt_verify,
 };
 
+STATIC int
+xfs_rtrefcountbt_keys_inorder(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*k1,
+	const union xfs_btree_key	*k2)
+{
+	return be32_to_cpu(k1->refc.rc_startblock) <
+	       be32_to_cpu(k2->refc.rc_startblock);
+}
+
+STATIC int
+xfs_rtrefcountbt_recs_inorder(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_rec	*r1,
+	const union xfs_btree_rec	*r2)
+{
+	return  be32_to_cpu(r1->refc.rc_startblock) +
+		be32_to_cpu(r1->refc.rc_blockcount) <=
+		be32_to_cpu(r2->refc.rc_startblock);
+}
+
+STATIC enum xbtree_key_contig
+xfs_rtrefcountbt_keys_contiguous(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*key1,
+	const union xfs_btree_key	*key2,
+	const union xfs_btree_key	*mask)
+{
+	ASSERT(!mask || mask->refc.rc_startblock);
+
+	return xbtree_key_contig(be32_to_cpu(key1->refc.rc_startblock),
+				 be32_to_cpu(key2->refc.rc_startblock));
+}
+
 const struct xfs_btree_ops xfs_rtrefcountbt_ops = {
 	.name			= "rtrefcount",
 	.type			= XFS_BTREE_TYPE_INODE,
@@ -122,7 +257,20 @@ const struct xfs_btree_ops xfs_rtrefcountbt_ops = {
 	.statoff		= XFS_STATS_CALC_INDEX(xs_rtrefcbt_2),
 
 	.dup_cursor		= xfs_rtrefcountbt_dup_cursor,
+	.alloc_block		= xfs_btree_alloc_metafile_block,
+	.free_block		= xfs_btree_free_metafile_block,
+	.get_minrecs		= xfs_rtrefcountbt_get_minrecs,
+	.get_maxrecs		= xfs_rtrefcountbt_get_maxrecs,
+	.init_key_from_rec	= xfs_rtrefcountbt_init_key_from_rec,
+	.init_high_key_from_rec	= xfs_rtrefcountbt_init_high_key_from_rec,
+	.init_rec_from_cur	= xfs_rtrefcountbt_init_rec_from_cur,
+	.init_ptr_from_cur	= xfs_rtrefcountbt_init_ptr_from_cur,
+	.key_diff		= xfs_rtrefcountbt_key_diff,
 	.buf_ops		= &xfs_rtrefcountbt_buf_ops,
+	.diff_two_keys		= xfs_rtrefcountbt_diff_two_keys,
+	.keys_inorder		= xfs_rtrefcountbt_keys_inorder,
+	.recs_inorder		= xfs_rtrefcountbt_recs_inorder,
+	.keys_contiguous	= xfs_rtrefcountbt_keys_contiguous,
 };
 
 /* Allocate a new rt refcount btree cursor. */


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 35/56] xfs: prepare refcount functions to deal with rtrefcountbt
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (33 preceding siblings ...)
  2025-02-06 22:43   ` [PATCH 34/56] xfs: add realtime refcount btree operations Darrick J. Wong
@ 2025-02-06 22:44   ` Darrick J. Wong
  2025-02-06 22:44   ` [PATCH 36/56] xfs: add a realtime flag to the refcount update log redo items Darrick J. Wong
                     ` (20 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:44 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 01cef1db246ee8b094fca6df23ea6d4335748181

Prepare the high-level refcount functions to deal with the new realtime
refcountbt and its slightly different conventions.  Provide the ability
to talk to either refcountbt or rtrefcountbt formats from the same high
level code.

Note that we leave the _recover_cow_leftovers functions for a separate
patch so that we can convert them all at once.
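
The dispatch pattern used here (compare the cursor's ops pointer, then
call the matching record checker) can be sketched in isolation as follows.
All sketch_* types and names are hypothetical, and both branches share one
simplified sanity check, whereas the real code calls different verifiers
for AGs and rtgroups.

```c
#include <stdint.h>

enum sketch_result {
	SKETCH_OK = 0,
	SKETCH_BAD_AG_REC,
	SKETCH_BAD_RTG_REC,
};

/* Each btree type is identified by the address of its ops structure. */
struct sketch_ops {
	const char	*name;
};

static const struct sketch_ops sketch_refcountbt_ops = { "refcount" };
static const struct sketch_ops sketch_rtrefcountbt_ops = { "rtrefcount" };

struct sketch_cur {
	const struct sketch_ops	*ops;
	uint32_t		group_blocks;	/* size of the AG or rtgroup */
};

struct sketch_irec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint32_t	refcount;
};

/* Simplified check: nonzero length and refcount, extent inside the group. */
static int
sketch_rec_is_sane(
	const struct sketch_cur		*cur,
	const struct sketch_irec	*irec)
{
	uint64_t	end = (uint64_t)irec->startblock + irec->blockcount;

	return irec->blockcount != 0 && irec->refcount != 0 &&
	       end <= cur->group_blocks;
}

/* Pick the failure domain from the cursor type, as the patch does. */
static enum sketch_result
sketch_check_btrec(
	const struct sketch_cur		*cur,
	const struct sketch_irec	*irec)
{
	if (cur->ops == &sketch_rtrefcountbt_ops)
		return sketch_rec_is_sane(cur, irec) ?
				SKETCH_OK : SKETCH_BAD_RTG_REC;
	return sketch_rec_is_sane(cur, irec) ? SKETCH_OK : SKETCH_BAD_AG_REC;
}
```

Keying off the ops pointer keeps the callers unchanged: they pass the
cursor they already hold and never need to know which kind of group backs
it.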

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_refcount.c |   53 ++++++++++++++++++++++++++++++++++++++++++-------
 libxfs/xfs_refcount.h |    3 +++
 2 files changed, 48 insertions(+), 8 deletions(-)


diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index 1aeea0b161849d..83e039c3c3afa8 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -24,6 +24,7 @@
 #include "xfs_ag.h"
 #include "xfs_health.h"
 #include "defer_item.h"
+#include "xfs_rtgroup.h"
 
 struct kmem_cache	*xfs_refcount_intent_cache;
 
@@ -143,6 +144,37 @@ xfs_refcount_check_irec(
 	return NULL;
 }
 
+xfs_failaddr_t
+xfs_rtrefcount_check_irec(
+	struct xfs_rtgroup		*rtg,
+	const struct xfs_refcount_irec	*irec)
+{
+	if (irec->rc_blockcount == 0 || irec->rc_blockcount > XFS_REFC_LEN_MAX)
+		return __this_address;
+
+	if (!xfs_refcount_check_domain(irec))
+		return __this_address;
+
+	/* check for valid extent range, including overflow */
+	if (!xfs_verify_rgbext(rtg, irec->rc_startblock, irec->rc_blockcount))
+		return __this_address;
+
+	if (irec->rc_refcount == 0 || irec->rc_refcount > XFS_REFC_REFCOUNT_MAX)
+		return __this_address;
+
+	return NULL;
+}
+
+static inline xfs_failaddr_t
+xfs_refcount_check_btrec(
+	struct xfs_btree_cur		*cur,
+	const struct xfs_refcount_irec	*irec)
+{
+	if (xfs_btree_is_rtrefcount(cur->bc_ops))
+		return xfs_rtrefcount_check_irec(to_rtg(cur->bc_group), irec);
+	return xfs_refcount_check_irec(to_perag(cur->bc_group), irec);
+}
+
 static inline int
 xfs_refcount_complain_bad_rec(
 	struct xfs_btree_cur		*cur,
@@ -151,9 +183,15 @@ xfs_refcount_complain_bad_rec(
 {
 	struct xfs_mount		*mp = cur->bc_mp;
 
-	xfs_warn(mp,
+	if (xfs_btree_is_rtrefcount(cur->bc_ops)) {
+		xfs_warn(mp,
+ "RT Refcount BTree record corruption in rtgroup %u detected at %pS!",
+				cur->bc_group->xg_gno, fa);
+	} else {
+		xfs_warn(mp,
  "Refcount BTree record corruption in AG %d detected at %pS!",
 				cur->bc_group->xg_gno, fa);
+	}
 	xfs_warn(mp,
 		"Start block 0x%x, block count 0x%x, references 0x%x",
 		irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount);
@@ -179,7 +217,7 @@ xfs_refcount_get_rec(
 		return error;
 
 	xfs_refcount_btrec_to_irec(rec, irec);
-	fa = xfs_refcount_check_irec(to_perag(cur->bc_group), irec);
+	fa = xfs_refcount_check_btrec(cur, irec);
 	if (fa)
 		return xfs_refcount_complain_bad_rec(cur, fa, irec);
 
@@ -1064,7 +1102,7 @@ xfs_refcount_still_have_space(
 	 */
 	overhead = xfs_allocfree_block_count(cur->bc_mp,
 				cur->bc_refc.shape_changes);
-	overhead += cur->bc_mp->m_refc_maxlevels;
+	overhead += cur->bc_maxlevels;
 	overhead *= cur->bc_mp->m_sb.sb_blocksize;
 
 	/*
@@ -1116,7 +1154,7 @@ xfs_refcount_adjust_extents(
 		if (error)
 			goto out_error;
 		if (!found_rec || ext.rc_domain != XFS_REFC_DOMAIN_SHARED) {
-			ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks;
+			ext.rc_startblock = xfs_group_max_blocks(cur->bc_group);
 			ext.rc_blockcount = 0;
 			ext.rc_refcount = 0;
 			ext.rc_domain = XFS_REFC_DOMAIN_SHARED;
@@ -1665,7 +1703,7 @@ xfs_refcount_adjust_cow_extents(
 		goto out_error;
 	}
 	if (!found_rec) {
-		ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks;
+		ext.rc_startblock = xfs_group_max_blocks(cur->bc_group);
 		ext.rc_blockcount = 0;
 		ext.rc_refcount = 0;
 		ext.rc_domain = XFS_REFC_DOMAIN_COW;
@@ -1876,8 +1914,7 @@ xfs_refcount_recover_extent(
 	INIT_LIST_HEAD(&rr->rr_list);
 	xfs_refcount_btrec_to_irec(rec, &rr->rr_rrec);
 
-	if (xfs_refcount_check_irec(to_perag(cur->bc_group), &rr->rr_rrec) !=
-			NULL ||
+	if (xfs_refcount_check_btrec(cur, &rr->rr_rrec) != NULL ||
 	    XFS_IS_CORRUPT(cur->bc_mp,
 			   rr->rr_rrec.rc_domain != XFS_REFC_DOMAIN_COW)) {
 		xfs_btree_mark_sick(cur);
@@ -2025,7 +2062,7 @@ xfs_refcount_query_range_helper(
 	xfs_failaddr_t			fa;
 
 	xfs_refcount_btrec_to_irec(rec, &irec);
-	fa = xfs_refcount_check_irec(to_perag(cur->bc_group), &irec);
+	fa = xfs_refcount_check_btrec(cur, &irec);
 	if (fa)
 		return xfs_refcount_complain_bad_rec(cur, fa, &irec);
 
diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h
index 62d78afcf1f3ff..9cd58d48716f83 100644
--- a/libxfs/xfs_refcount.h
+++ b/libxfs/xfs_refcount.h
@@ -12,6 +12,7 @@ struct xfs_perag;
 struct xfs_btree_cur;
 struct xfs_bmbt_irec;
 struct xfs_refcount_irec;
+struct xfs_rtgroup;
 
 extern int xfs_refcount_lookup_le(struct xfs_btree_cur *cur,
 		enum xfs_refc_domain domain, xfs_agblock_t bno, int *stat);
@@ -120,6 +121,8 @@ extern void xfs_refcount_btrec_to_irec(const union xfs_btree_rec *rec,
 		struct xfs_refcount_irec *irec);
 xfs_failaddr_t xfs_refcount_check_irec(struct xfs_perag *pag,
 		const struct xfs_refcount_irec *irec);
+xfs_failaddr_t xfs_rtrefcount_check_irec(struct xfs_rtgroup *rtg,
+		const struct xfs_refcount_irec *irec);
 extern int xfs_refcount_insert(struct xfs_btree_cur *cur,
 		struct xfs_refcount_irec *irec, int *stat);
 


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 36/56] xfs: add a realtime flag to the refcount update log redo items
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (34 preceding siblings ...)
  2025-02-06 22:44   ` [PATCH 35/56] xfs: prepare refcount functions to deal with rtrefcountbt Darrick J. Wong
@ 2025-02-06 22:44   ` Darrick J. Wong
  2025-02-06 22:44   ` [PATCH 37/56] xfs: add realtime refcount btree inode to metadata directory Darrick J. Wong
                     ` (19 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:44 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: fd9300679ccec20c6ee1b95458ab0bcf0db628d5

Extend the refcount update (CUI) log items with a new realtime flag that
indicates that the updates apply against the realtime refcountbt.  We'll
wire up the actual refcount code later.
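
One way to picture the effect of the new flag is a helper that maps it to
a log item type.  This is only a sketch under assumptions: the sketch_*
helpers are hypothetical, the two _RT values are taken from the
xfs_log_format.h hunk below, and the two non-realtime values are my
assumption about the existing definitions, which this patch does not show.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LI_CUI		0x1242	/* assumed existing XFS_LI_CUI value */
#define SKETCH_LI_CUD		0x1243	/* assumed existing XFS_LI_CUD value */
#define SKETCH_LI_CUI_RT	0x124e	/* XFS_LI_CUI_RT from this patch */
#define SKETCH_LI_CUD_RT	0x124f	/* XFS_LI_CUD_RT from this patch */

/* Choose the intent item type for a refcount update. */
static uint16_t
sketch_cui_type(bool isrt)
{
	return isrt ? SKETCH_LI_CUI_RT : SKETCH_LI_CUI;
}

/* Choose the matching done item type. */
static uint16_t
sketch_cud_type(bool isrt)
{
	return isrt ? SKETCH_LI_CUD_RT : SKETCH_LI_CUD;
}

/* Does this log item type describe a realtime refcount update? */
static bool
sketch_li_is_rt_refcount(uint16_t type)
{
	return type == SKETCH_LI_CUI_RT || type == SKETCH_LI_CUD_RT;
}
```

A caller in the style of the xfs_bmap.c hunks would pass the same isrt
value it derives from the inode or fork.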

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_bmap.c       |   10 ++++---
 libxfs/xfs_defer.h      |    1 +
 libxfs/xfs_log_format.h |    6 ++++
 libxfs/xfs_refcount.c   |   63 ++++++++++++++++++++++++++++++++++-------------
 libxfs/xfs_refcount.h   |   17 +++++++------
 5 files changed, 67 insertions(+), 30 deletions(-)


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 41cfadb51f4937..9cfbdb85c975a5 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -4558,8 +4558,9 @@ xfs_bmapi_write(
 			 * the refcount btree for orphan recovery.
 			 */
 			if (whichfork == XFS_COW_FORK)
-				xfs_refcount_alloc_cow_extent(tp, bma.blkno,
-						bma.length);
+				xfs_refcount_alloc_cow_extent(tp,
+						XFS_IS_REALTIME_INODE(ip),
+						bma.blkno, bma.length);
 		}
 
 		/* Deal with the allocated space we found.  */
@@ -4734,7 +4735,8 @@ xfs_bmapi_convert_one_delalloc(
 		*seq = READ_ONCE(ifp->if_seq);
 
 	if (whichfork == XFS_COW_FORK)
-		xfs_refcount_alloc_cow_extent(tp, bma.blkno, bma.length);
+		xfs_refcount_alloc_cow_extent(tp, XFS_IS_REALTIME_INODE(ip),
+				bma.blkno, bma.length);
 
 	error = xfs_bmap_btree_to_extents(tp, ip, bma.cur, &bma.logflags,
 			whichfork);
@@ -5382,7 +5384,7 @@ xfs_bmap_del_extent_real(
 		bool	isrt = xfs_ifork_is_realtime(ip, whichfork);
 
 		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
-			xfs_refcount_decrease_extent(tp, del);
+			xfs_refcount_decrease_extent(tp, isrt, del);
 		} else if (isrt && !xfs_has_rtgroups(mp)) {
 			error = xfs_bmap_free_rtblocks(tp, del);
 		} else {
diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h
index 1e2477eaa5a844..9effd95ddcd4ee 100644
--- a/libxfs/xfs_defer.h
+++ b/libxfs/xfs_defer.h
@@ -68,6 +68,7 @@ struct xfs_defer_op_type {
 
 extern const struct xfs_defer_op_type xfs_bmap_update_defer_type;
 extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
+extern const struct xfs_defer_op_type xfs_rtrefcount_update_defer_type;
 extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
 extern const struct xfs_defer_op_type xfs_rtrmap_update_defer_type;
 extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index a7e0e479454d3d..ec7157eaba5f68 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -252,6 +252,8 @@ typedef struct xfs_trans_header {
 #define	XFS_LI_EFD_RT		0x124b	/* realtime extent free done */
 #define	XFS_LI_RUI_RT		0x124c	/* realtime rmap update intent */
 #define	XFS_LI_RUD_RT		0x124d	/* realtime rmap update done */
+#define	XFS_LI_CUI_RT		0x124e	/* realtime refcount update intent */
+#define	XFS_LI_CUD_RT		0x124f	/* realtime refcount update done */
 
 #define XFS_LI_TYPE_DESC \
 	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
@@ -275,7 +277,9 @@ typedef struct xfs_trans_header {
 	{ XFS_LI_EFI_RT,	"XFS_LI_EFI_RT" }, \
 	{ XFS_LI_EFD_RT,	"XFS_LI_EFD_RT" }, \
 	{ XFS_LI_RUI_RT,	"XFS_LI_RUI_RT" }, \
-	{ XFS_LI_RUD_RT,	"XFS_LI_RUD_RT" }
+	{ XFS_LI_RUD_RT,	"XFS_LI_RUD_RT" }, \
+	{ XFS_LI_CUI_RT,	"XFS_LI_CUI_RT" }, \
+	{ XFS_LI_CUD_RT,	"XFS_LI_CUD_RT" }
 
 /*
  * Inode Log Item Format definitions.
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index 83e039c3c3afa8..ef08c26c75b32c 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -1122,6 +1122,22 @@ xfs_refcount_still_have_space(
 		cur->bc_refc.nr_ops * XFS_REFCOUNT_ITEM_OVERHEAD;
 }
 
+/* Schedule an extent free. */
+static int
+xrefc_free_extent(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*rec)
+{
+	unsigned int			flags = 0;
+
+	if (xfs_btree_is_rtrefcount(cur->bc_ops))
+		flags |= XFS_FREE_EXTENT_REALTIME;
+
+	return xfs_free_extent_later(cur->bc_tp,
+			xfs_gbno_to_fsb(cur->bc_group, rec->rc_startblock),
+			rec->rc_blockcount, NULL, XFS_AG_RESV_NONE, flags);
+}
+
 /*
  * Adjust the refcounts of middle extents.  At this point we should have
  * split extents that crossed the adjustment range; merged with adjacent
@@ -1138,7 +1154,6 @@ xfs_refcount_adjust_extents(
 	struct xfs_refcount_irec	ext, tmp;
 	int				error;
 	int				found_rec, found_tmp;
-	xfs_fsblock_t			fsbno;
 
 	/* Merging did all the work already. */
 	if (*aglen == 0)
@@ -1191,11 +1206,7 @@ xfs_refcount_adjust_extents(
 					goto out_error;
 				}
 			} else {
-				fsbno = xfs_agbno_to_fsb(to_perag(cur->bc_group),
-						tmp.rc_startblock);
-				error = xfs_free_extent_later(cur->bc_tp, fsbno,
-						  tmp.rc_blockcount, NULL,
-						  XFS_AG_RESV_NONE, 0);
+				error = xrefc_free_extent(cur, &tmp);
 				if (error)
 					goto out_error;
 			}
@@ -1253,11 +1264,7 @@ xfs_refcount_adjust_extents(
 			}
 			goto advloop;
 		} else {
-			fsbno = xfs_agbno_to_fsb(to_perag(cur->bc_group),
-					ext.rc_startblock);
-			error = xfs_free_extent_later(cur->bc_tp, fsbno,
-					ext.rc_blockcount, NULL,
-					XFS_AG_RESV_NONE, 0);
+			error = xrefc_free_extent(cur, &ext);
 			if (error)
 				goto out_error;
 		}
@@ -1453,6 +1460,20 @@ xfs_refcount_finish_one(
 	return error;
 }
 
+/*
+ * Process one of the deferred realtime refcount operations.  We pass back the
+ * btree cursor to maintain our lock on the btree between calls.
+ */
+int
+xfs_rtrefcount_finish_one(
+	struct xfs_trans		*tp,
+	struct xfs_refcount_intent	*ri,
+	struct xfs_btree_cur		**pcur)
+{
+	ASSERT(0);
+	return -EFSCORRUPTED;
+}
+
 /*
  * Record a refcount intent for later processing.
  */
@@ -1460,6 +1481,7 @@ static void
 __xfs_refcount_add(
 	struct xfs_trans		*tp,
 	enum xfs_refcount_intent_type	type,
+	bool				isrt,
 	xfs_fsblock_t			startblock,
 	xfs_extlen_t			blockcount)
 {
@@ -1471,6 +1493,7 @@ __xfs_refcount_add(
 	ri->ri_type = type;
 	ri->ri_startblock = startblock;
 	ri->ri_blockcount = blockcount;
+	ri->ri_realtime = isrt;
 
 	xfs_refcount_defer_add(tp, ri);
 }
@@ -1481,12 +1504,13 @@ __xfs_refcount_add(
 void
 xfs_refcount_increase_extent(
 	struct xfs_trans		*tp,
+	bool				isrt,
 	struct xfs_bmbt_irec		*PREV)
 {
 	if (!xfs_has_reflink(tp->t_mountp))
 		return;
 
-	__xfs_refcount_add(tp, XFS_REFCOUNT_INCREASE, PREV->br_startblock,
+	__xfs_refcount_add(tp, XFS_REFCOUNT_INCREASE, isrt, PREV->br_startblock,
 			PREV->br_blockcount);
 }
 
@@ -1496,12 +1520,13 @@ xfs_refcount_increase_extent(
 void
 xfs_refcount_decrease_extent(
 	struct xfs_trans		*tp,
+	bool				isrt,
 	struct xfs_bmbt_irec		*PREV)
 {
 	if (!xfs_has_reflink(tp->t_mountp))
 		return;
 
-	__xfs_refcount_add(tp, XFS_REFCOUNT_DECREASE, PREV->br_startblock,
+	__xfs_refcount_add(tp, XFS_REFCOUNT_DECREASE, isrt, PREV->br_startblock,
 			PREV->br_blockcount);
 }
 
@@ -1857,6 +1882,7 @@ __xfs_refcount_cow_free(
 void
 xfs_refcount_alloc_cow_extent(
 	struct xfs_trans		*tp,
+	bool				isrt,
 	xfs_fsblock_t			fsb,
 	xfs_extlen_t			len)
 {
@@ -1865,16 +1891,17 @@ xfs_refcount_alloc_cow_extent(
 	if (!xfs_has_reflink(mp))
 		return;
 
-	__xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len);
+	__xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, isrt, fsb, len);
 
 	/* Add rmap entry */
-	xfs_rmap_alloc_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW);
+	xfs_rmap_alloc_extent(tp, isrt, fsb, len, XFS_RMAP_OWN_COW);
 }
 
 /* Forget a CoW staging event in the refcount btree. */
 void
 xfs_refcount_free_cow_extent(
 	struct xfs_trans		*tp,
+	bool				isrt,
 	xfs_fsblock_t			fsb,
 	xfs_extlen_t			len)
 {
@@ -1884,8 +1911,8 @@ xfs_refcount_free_cow_extent(
 		return;
 
 	/* Remove rmap entry */
-	xfs_rmap_free_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW);
-	__xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len);
+	xfs_rmap_free_extent(tp, isrt, fsb, len, XFS_RMAP_OWN_COW);
+	__xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, isrt, fsb, len);
 }
 
 struct xfs_refcount_recovery {
@@ -1991,7 +2018,7 @@ xfs_refcount_recover_cow_leftovers(
 
 		/* Free the orphan record */
 		fsb = xfs_agbno_to_fsb(pag, rr->rr_rrec.rc_startblock);
-		xfs_refcount_free_cow_extent(tp, fsb,
+		xfs_refcount_free_cow_extent(tp, false, fsb,
 				rr->rr_rrec.rc_blockcount);
 
 		/* Free the block. */
diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h
index 9cd58d48716f83..be11df25abcc5b 100644
--- a/libxfs/xfs_refcount.h
+++ b/libxfs/xfs_refcount.h
@@ -61,6 +61,7 @@ struct xfs_refcount_intent {
 	enum xfs_refcount_intent_type		ri_type;
 	xfs_extlen_t				ri_blockcount;
 	xfs_fsblock_t				ri_startblock;
+	bool					ri_realtime;
 };
 
 /* Check that the refcount is appropriate for the record domain. */
@@ -75,22 +76,24 @@ xfs_refcount_check_domain(
 	return true;
 }
 
-void xfs_refcount_increase_extent(struct xfs_trans *tp,
+void xfs_refcount_increase_extent(struct xfs_trans *tp, bool isrt,
 		struct xfs_bmbt_irec *irec);
-void xfs_refcount_decrease_extent(struct xfs_trans *tp,
+void xfs_refcount_decrease_extent(struct xfs_trans *tp, bool isrt,
 		struct xfs_bmbt_irec *irec);
 
-extern int xfs_refcount_finish_one(struct xfs_trans *tp,
+int xfs_refcount_finish_one(struct xfs_trans *tp,
+		struct xfs_refcount_intent *ri, struct xfs_btree_cur **pcur);
+int xfs_rtrefcount_finish_one(struct xfs_trans *tp,
 		struct xfs_refcount_intent *ri, struct xfs_btree_cur **pcur);
 
 extern int xfs_refcount_find_shared(struct xfs_btree_cur *cur,
 		xfs_agblock_t agbno, xfs_extlen_t aglen, xfs_agblock_t *fbno,
 		xfs_extlen_t *flen, bool find_end_of_shared);
 
-void xfs_refcount_alloc_cow_extent(struct xfs_trans *tp, xfs_fsblock_t fsb,
-		xfs_extlen_t len);
-void xfs_refcount_free_cow_extent(struct xfs_trans *tp, xfs_fsblock_t fsb,
-		xfs_extlen_t len);
+void xfs_refcount_alloc_cow_extent(struct xfs_trans *tp, bool isrt,
+		xfs_fsblock_t fsb, xfs_extlen_t len);
+void xfs_refcount_free_cow_extent(struct xfs_trans *tp, bool isrt,
+		xfs_fsblock_t fsb, xfs_extlen_t len);
 extern int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp,
 		struct xfs_perag *pag);
 



* [PATCH 37/56] xfs: add realtime refcount btree inode to metadata directory
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (35 preceding siblings ...)
  2025-02-06 22:44   ` [PATCH 36/56] xfs: add a realtime flag to the refcount update log redo items Darrick J. Wong
@ 2025-02-06 22:44   ` Darrick J. Wong
  2025-02-06 22:45   ` [PATCH 38/56] xfs: add metadata reservations for realtime refcount btree Darrick J. Wong
                     ` (18 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:44 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: eaed472c40527e526217aff3737816b44b08b363

Add a metadir path to select the realtime refcount btree inode and load
it at mount time.  The rtrefcountbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_format.h           |    4 +++-
 libxfs/xfs_inode_buf.c        |    5 +++++
 libxfs/xfs_inode_fork.c       |    6 ++++++
 libxfs/xfs_rtgroup.c          |    7 +++++++
 libxfs/xfs_rtgroup.h          |    6 ++++++
 libxfs/xfs_rtrefcount_btree.c |    6 +++---
 6 files changed, 30 insertions(+), 4 deletions(-)
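
The xfs_rtgroup.c hunk adds one more entry to a table of per-rtgroup metadata inodes, each gated by a feature predicate. A rough standalone sketch of that table-driven pattern follows. Everything prefixed toy_ or TOY_ is hypothetical, the feature bitmask replaces the real struct xfs_mount queries, and treating a missing predicate as "always enabled" is an assumption of this sketch rather than something shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature bits; the real predicates take a struct xfs_mount. */
#define TOY_FEAT_RMAPBT		(1u << 0)
#define TOY_FEAT_REFLINK	(1u << 1)

enum toy_rtg_inodes {
	TOY_RTGI_BITMAP,
	TOY_RTGI_SUMMARY,
	TOY_RTGI_RMAP,
	TOY_RTGI_REFCOUNT,	/* the slot this patch introduces */
	TOY_RTGI_MAX,
};

struct toy_rtginode_ops {
	const char	*name;
	/* NULL means "always present" in this sketch. */
	bool		(*enabled)(unsigned int features);
};

static bool toy_has_rmapbt(unsigned int features)
{
	return (features & TOY_FEAT_RMAPBT) != 0;
}

static bool toy_has_reflink(unsigned int features)
{
	return (features & TOY_FEAT_REFLINK) != 0;
}

/* Designated initializers keep each entry next to its enum slot. */
static const struct toy_rtginode_ops toy_rtginode_ops[TOY_RTGI_MAX] = {
	[TOY_RTGI_BITMAP]	= { .name = "bitmap" },
	[TOY_RTGI_SUMMARY]	= { .name = "summary" },
	[TOY_RTGI_RMAP]		= { .name = "rmap",
				    .enabled = toy_has_rmapbt },
	[TOY_RTGI_REFCOUNT]	= { .name = "refcount",
				    .enabled = toy_has_reflink },
};

/* Should this metadata inode exist for the given feature set? */
static bool
toy_rtginode_enabled(enum toy_rtg_inodes type, unsigned int features)
{
	const struct toy_rtginode_ops	*ops;

	assert(type < TOY_RTGI_MAX);
	ops = &toy_rtginode_ops[type];
	return !ops->enabled || ops->enabled(features);
}
```

Note that the real entry keys off xfs_has_reflink rather than a realtime-specific predicate, for the growfs reason given in the code comments.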


diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 17f7c0d1aaa452..b6828f92c131fb 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -858,6 +858,7 @@ enum xfs_metafile_type {
 	XFS_METAFILE_RTBITMAP,		/* rt bitmap */
 	XFS_METAFILE_RTSUMMARY,		/* rt summary */
 	XFS_METAFILE_RTRMAP,		/* rt rmap */
+	XFS_METAFILE_RTREFCOUNT,	/* rt refcount */
 
 	XFS_METAFILE_MAX
 } __packed;
@@ -870,7 +871,8 @@ enum xfs_metafile_type {
 	{ XFS_METAFILE_PRJQUOTA,	"prjquota" }, \
 	{ XFS_METAFILE_RTBITMAP,	"rtbitmap" }, \
 	{ XFS_METAFILE_RTSUMMARY,	"rtsummary" }, \
-	{ XFS_METAFILE_RTRMAP,		"rtrmap" }
+	{ XFS_METAFILE_RTRMAP,		"rtrmap" }, \
+	{ XFS_METAFILE_RTREFCOUNT,	"rtrefcount" }
 
 /*
  * On-disk inode structure.
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 0278fbb6379701..0be0625857002c 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -453,6 +453,11 @@ xfs_dinode_verify_fork(
 			if (!xfs_has_rmapbt(mp))
 				return __this_address;
 			break;
+		case XFS_METAFILE_RTREFCOUNT:
+			/* same comment about growfs and rmap inodes applies */
+			if (!xfs_has_reflink(mp))
+				return __this_address;
+			break;
 		default:
 			return __this_address;
 		}
diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index e4f866969bf1b6..f690c746805fad 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -270,6 +270,9 @@ xfs_iformat_data_fork(
 			switch (ip->i_metatype) {
 			case XFS_METAFILE_RTRMAP:
 				return xfs_iformat_rtrmap(ip, dip);
+			case XFS_METAFILE_RTREFCOUNT:
+				ASSERT(0); /* to be implemented later */
+				return -EFSCORRUPTED;
 			default:
 				break;
 			}
@@ -618,6 +621,9 @@ xfs_iflush_fork(
 		case XFS_METAFILE_RTRMAP:
 			xfs_iflush_rtrmap(ip, dip);
 			break;
+		case XFS_METAFILE_RTREFCOUNT:
+			ASSERT(0); /* to be implemented later */
+			break;
 		default:
 			ASSERT(0);
 			break;
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 7e69b6bf96f1ac..f7bacce8500c0c 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -364,6 +364,13 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
 		.enabled	= xfs_has_rmapbt,
 		.create		= xfs_rtrmapbt_create,
 	},
+	[XFS_RTGI_REFCOUNT] = {
+		.name		= "refcount",
+		.metafile_type	= XFS_METAFILE_RTREFCOUNT,
+		.fmt_mask	= 1U << XFS_DINODE_FMT_META_BTREE,
+		/* same comment about growfs and rmap inodes applies here */
+		.enabled	= xfs_has_reflink,
+	},
 };
 
 /* Return the shortname of this rtgroup inode. */
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 6ff222a053674d..2663f2d849e295 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -15,6 +15,7 @@ enum xfs_rtg_inodes {
 	XFS_RTGI_BITMAP,	/* allocation bitmap */
 	XFS_RTGI_SUMMARY,	/* allocation summary */
 	XFS_RTGI_RMAP,		/* rmap btree inode */
+	XFS_RTGI_REFCOUNT,	/* refcount btree inode */
 
 	XFS_RTGI_MAX,
 };
@@ -80,6 +81,11 @@ static inline struct xfs_inode *rtg_rmap(const struct xfs_rtgroup *rtg)
 	return rtg->rtg_inodes[XFS_RTGI_RMAP];
 }
 
+static inline struct xfs_inode *rtg_refcount(const struct xfs_rtgroup *rtg)
+{
+	return rtg->rtg_inodes[XFS_RTGI_REFCOUNT];
+}
+
 /* Passive rtgroup references */
 static inline struct xfs_rtgroup *
 xfs_rtgroup_get(
diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c
index ea05b1a48eab62..46c62bd63458f4 100644
--- a/libxfs/xfs_rtrefcount_btree.c
+++ b/libxfs/xfs_rtrefcount_btree.c
@@ -24,6 +24,7 @@
 #include "xfs_cksum.h"
 #include "xfs_rtgroup.h"
 #include "xfs_rtbitmap.h"
+#include "xfs_metafile.h"
 
 static struct kmem_cache	*xfs_rtrefcountbt_cur_cache;
 
@@ -279,12 +280,10 @@ xfs_rtrefcountbt_init_cursor(
 	struct xfs_trans	*tp,
 	struct xfs_rtgroup	*rtg)
 {
-	struct xfs_inode	*ip = NULL;
+	struct xfs_inode	*ip = rtg_refcount(rtg);
 	struct xfs_mount	*mp = rtg_mount(rtg);
 	struct xfs_btree_cur	*cur;
 
-	return NULL; /* XXX */
-
 	xfs_assert_ilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL);
 
 	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_rtrefcountbt_ops,
@@ -314,6 +313,7 @@ xfs_rtrefcountbt_commit_staged_btree(
 	int			flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT;
 
 	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+	ASSERT(ifake->if_fork->if_format == XFS_DINODE_FMT_META_BTREE);
 
 	/*
 	 * Free any resources hanging off the real fork, then shallow-copy the



* [PATCH 38/56] xfs: add metadata reservations for realtime refcount btree
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (36 preceding siblings ...)
  2025-02-06 22:44   ` [PATCH 37/56] xfs: add realtime refcount btree inode to metadata directory Darrick J. Wong
@ 2025-02-06 22:45   ` Darrick J. Wong
  2025-02-06 22:45   ` [PATCH 39/56] xfs: wire up a new metafile type for the realtime refcount Darrick J. Wong
                     ` (17 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:45 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: bf0b99411335db18a9ed4fcef278ce9e313f6076

Reserve free blocks in the data volume so that there is always enough
space to handle expansion of the realtime refcount btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtrefcount_btree.c |   38 ++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtrefcount_btree.h |    4 ++++
 2 files changed, 42 insertions(+)
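
The reservation is a worst-case btree size for one record per rt extent in a group. As a hedged illustration of that kind of estimate, here is a standalone sketch modeled on my understanding of how a helper like xfs_btree_calc_size behaves; it is not copied from libxfs, and the toy_* names and the two-element min_recs array (standing in for mp->m_rtrefc_mnr) are assumptions.

```c
#include <assert.h>

/* Round-up division. */
static unsigned long long
toy_howmany(unsigned long long n, unsigned long long d)
{
	return (n + d - 1) / d;
}

/*
 * Estimate how many blocks a btree needs to index @records records.
 * min_recs[0] is the minimum number of records in a leaf block and
 * min_recs[1] the minimum in a node block.  Sizing by the minimum
 * fill gives the largest tree that many records could produce.
 */
static unsigned long long
toy_btree_calc_size(const unsigned int min_recs[2],
		    unsigned long long records)
{
	unsigned long long	level_blocks;
	unsigned long long	blocks;

	/* A node fanout of 1 would never converge to a single root. */
	assert(min_recs[0] > 0 && min_recs[1] > 1);

	level_blocks = toy_howmany(records, min_recs[0]);
	blocks = level_blocks;

	/* Add one level at a time until a single block remains. */
	while (level_blocks > 1) {
		level_blocks = toy_howmany(level_blocks, min_recs[1]);
		blocks += level_blocks;
	}
	return blocks;
}
```

With min_recs = {10, 5} and 1000 records this counts 100 leaf blocks plus 20, 4 and 1 node blocks, so 125 blocks in total.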


diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c
index 46c62bd63458f4..9fa8f1d6eb6749 100644
--- a/libxfs/xfs_rtrefcount_btree.c
+++ b/libxfs/xfs_rtrefcount_btree.c
@@ -417,3 +417,41 @@ xfs_rtrefcountbt_compute_maxlevels(
 	/* Add one level to handle the inode root level. */
 	mp->m_rtrefc_maxlevels = min(d_maxlevels, r_maxlevels) + 1;
 }
+
+/* Calculate the rtrefcount btree size for some records. */
+unsigned long long
+xfs_rtrefcountbt_calc_size(
+	struct xfs_mount	*mp,
+	unsigned long long	len)
+{
+	return xfs_btree_calc_size(mp->m_rtrefc_mnr, len);
+}
+
+/*
+ * Calculate the maximum refcount btree size.
+ */
+static unsigned long long
+xfs_rtrefcountbt_max_size(
+	struct xfs_mount	*mp,
+	xfs_rtblock_t		rtblocks)
+{
+	/* Bail out if we're uninitialized, which can happen in mkfs. */
+	if (mp->m_rtrefc_mxr[0] == 0)
+		return 0;
+
+	return xfs_rtrefcountbt_calc_size(mp, rtblocks);
+}
+
+/*
+ * Figure out how many blocks to reserve and how many are used by this btree.
+ * We need enough space to hold one record for every rt extent in the rtgroup.
+ */
+xfs_filblks_t
+xfs_rtrefcountbt_calc_reserves(
+	struct xfs_mount	*mp)
+{
+	if (!xfs_has_rtreflink(mp))
+		return 0;
+
+	return xfs_rtrefcountbt_max_size(mp, mp->m_sb.sb_rgextents);
+}
diff --git a/libxfs/xfs_rtrefcount_btree.h b/libxfs/xfs_rtrefcount_btree.h
index b713b33818800c..3cd44590c9304c 100644
--- a/libxfs/xfs_rtrefcount_btree.h
+++ b/libxfs/xfs_rtrefcount_btree.h
@@ -67,4 +67,8 @@ unsigned int xfs_rtrefcountbt_maxlevels_ondisk(void);
 int __init xfs_rtrefcountbt_init_cur_cache(void);
 void xfs_rtrefcountbt_destroy_cur_cache(void);
 
+xfs_filblks_t xfs_rtrefcountbt_calc_reserves(struct xfs_mount *mp);
+unsigned long long xfs_rtrefcountbt_calc_size(struct xfs_mount *mp,
+		unsigned long long len);
+
 #endif	/* __XFS_RTREFCOUNT_BTREE_H__ */



* [PATCH 39/56] xfs: wire up a new metafile type for the realtime refcount
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (37 preceding siblings ...)
  2025-02-06 22:45   ` [PATCH 38/56] xfs: add metadata reservations for realtime refcount btree Darrick J. Wong
@ 2025-02-06 22:45   ` Darrick J. Wong
  2025-02-06 22:45   ` [PATCH 40/56] xfs: wire up realtime refcount btree cursors Darrick J. Wong
                     ` (16 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:45 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: f0415af60f482a2192065be8b334b409495ca8a3

Plumb in the pieces we need to embed the root of the realtime refcount
btree in an inode's data fork, complete with metafile type and on-disk
interpretation functions.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_format.h           |    8 +
 libxfs/xfs_inode_fork.c       |    6 -
 libxfs/xfs_ondisk.h           |    1 
 libxfs/xfs_rtrefcount_btree.c |  264 +++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtrefcount_btree.h |  112 +++++++++++++++++
 5 files changed, 388 insertions(+), 3 deletions(-)
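
Most of the new code is arithmetic on how many records fit in the inode fork that holds the btree root. The standalone sketch below mirrors the formulas in xfs_rtrefcountbt_droot_maxrecs, xfs_rtrefcount_droot_space_calc and the sanity check in xfs_iformat_rtrefcount. The toy_* names are hypothetical. The 4-byte root header and 8-byte pointer sizes come from the xfs_ondisk.h checks in this patch; the 12-byte record and 4-byte key sizes are assumptions about struct xfs_refcount_rec and struct xfs_refcount_key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ROOT_HDR_LEN	4	/* struct xfs_rtrefcount_root */
#define TOY_PTR_LEN		8	/* xfs_rtrefcount_ptr_t */
#define TOY_REC_LEN		12	/* assumed sizeof(struct xfs_refcount_rec) */
#define TOY_KEY_LEN		4	/* assumed sizeof(struct xfs_refcount_key) */

/* Records that fit in an ondisk root of @blocklen bytes. */
static unsigned int
toy_droot_maxrecs(unsigned int blocklen, bool leaf)
{
	assert(blocklen >= TOY_ROOT_HDR_LEN);
	blocklen -= TOY_ROOT_HDR_LEN;

	if (leaf)
		return blocklen / TOY_REC_LEN;
	/* Same divisor as the patch: two keys plus one pointer. */
	return blocklen / (2 * TOY_KEY_LEN + TOY_PTR_LEN);
}

/* Bytes needed for an ondisk root holding @nrecs records. */
static size_t
toy_droot_space_calc(unsigned int level, unsigned int nrecs)
{
	size_t	sz = TOY_ROOT_HDR_LEN;

	if (level > 0)
		return sz + nrecs * (size_t)(TOY_KEY_LEN + TOY_PTR_LEN);
	return sz + nrecs * (size_t)TOY_REC_LEN;
}

/* Reject roots that are too tall or too big for the fork. */
static bool
toy_droot_is_sane(unsigned int level, unsigned int nrecs,
		  unsigned int maxlevels, size_t fork_size)
{
	if (level > maxlevels)
		return false;
	return toy_droot_space_calc(level, nrecs) <= fork_size;
}
```

For a hypothetical 336-byte data fork, this gives 27 leaf records or 20 node entries, and a 28-record leaf root would be rejected because it needs 340 bytes.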


diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index b6828f92c131fb..b1007fb661ba73 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1805,6 +1805,14 @@ typedef __be32 xfs_refcount_ptr_t;
  */
 #define	XFS_RTREFC_CRC_MAGIC	0x52434e54	/* 'RCNT' */
 
+/*
+ * rt refcount root header, on-disk form only.
+ */
+struct xfs_rtrefcount_root {
+	__be16		bb_level;	/* 0 is a leaf */
+	__be16		bb_numrecs;	/* current # of data records */
+};
+
 /* inode-rooted btree pointer type */
 typedef __be64 xfs_rtrefcount_ptr_t;
 
diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c
index f690c746805fad..52740e8e883424 100644
--- a/libxfs/xfs_inode_fork.c
+++ b/libxfs/xfs_inode_fork.c
@@ -26,6 +26,7 @@
 #include "xfs_health.h"
 #include "xfs_symlink_remote.h"
 #include "xfs_rtrmap_btree.h"
+#include "xfs_rtrefcount_btree.h"
 
 struct kmem_cache *xfs_ifork_cache;
 
@@ -271,8 +272,7 @@ xfs_iformat_data_fork(
 			case XFS_METAFILE_RTRMAP:
 				return xfs_iformat_rtrmap(ip, dip);
 			case XFS_METAFILE_RTREFCOUNT:
-				ASSERT(0); /* to be implemented later */
-				return -EFSCORRUPTED;
+				return xfs_iformat_rtrefcount(ip, dip);
 			default:
 				break;
 			}
@@ -622,7 +622,7 @@ xfs_iflush_fork(
 			xfs_iflush_rtrmap(ip, dip);
 			break;
 		case XFS_METAFILE_RTREFCOUNT:
-			ASSERT(0); /* to be implemented later */
+			xfs_iflush_rtrefcount(ip, dip);
 			break;
 		default:
 			ASSERT(0);
diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index efb035050c009c..a85ecddaa48eed 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -86,6 +86,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t,			8);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_rtrmap_root,		4);
 	XFS_CHECK_STRUCT_SIZE(xfs_rtrefcount_ptr_t,		8);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_rtrefcount_root,	4);
 
 	/*
 	 * m68k has problems with struct xfs_attr_leaf_name_remote, but we pad
diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c
index 9fa8f1d6eb6749..cf9f2ee5314dda 100644
--- a/libxfs/xfs_rtrefcount_btree.c
+++ b/libxfs/xfs_rtrefcount_btree.c
@@ -75,6 +75,41 @@ xfs_rtrefcountbt_get_maxrecs(
 	return cur->bc_mp->m_rtrefc_mxr[level != 0];
 }
 
+/*
+ * Calculate number of records in a realtime refcount btree inode root.
+ */
+unsigned int
+xfs_rtrefcountbt_droot_maxrecs(
+	unsigned int		blocklen,
+	bool			leaf)
+{
+	blocklen -= sizeof(struct xfs_rtrefcount_root);
+
+	if (leaf)
+		return blocklen / sizeof(struct xfs_refcount_rec);
+	return blocklen / (2 * sizeof(struct xfs_refcount_key) +
+			sizeof(xfs_rtrefcount_ptr_t));
+}
+
+/*
+ * Get the maximum records we could store in the on-disk format.
+ *
+ * For non-root nodes this is equivalent to xfs_rtrefcountbt_get_maxrecs, but
+ * for the root node this checks the available space in the dinode fork so that
+ * we can resize the in-memory buffer to match it.  After a resize to the
+ * maximum size this function returns the same value as
+ * xfs_rtrefcountbt_get_maxrecs for the root node, too.
+ */
+STATIC int
+xfs_rtrefcountbt_get_dmaxrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	if (level != cur->bc_nlevels - 1)
+		return cur->bc_mp->m_rtrefc_mxr[level != 0];
+	return xfs_rtrefcountbt_droot_maxrecs(cur->bc_ino.forksize, level == 0);
+}
+
 STATIC void
 xfs_rtrefcountbt_init_key_from_rec(
 	union xfs_btree_key		*key,
@@ -245,6 +280,87 @@ xfs_rtrefcountbt_keys_contiguous(
 				 be32_to_cpu(key2->refc.rc_startblock));
 }
 
+static inline void
+xfs_rtrefcountbt_move_ptrs(
+	struct xfs_mount	*mp,
+	struct xfs_btree_block	*broot,
+	short			old_size,
+	size_t			new_size,
+	unsigned int		numrecs)
+{
+	void			*dptr;
+	void			*sptr;
+
+	sptr = xfs_rtrefcount_broot_ptr_addr(mp, broot, 1, old_size);
+	dptr = xfs_rtrefcount_broot_ptr_addr(mp, broot, 1, new_size);
+	memmove(dptr, sptr, numrecs * sizeof(xfs_rtrefcount_ptr_t));
+}
+
+static struct xfs_btree_block *
+xfs_rtrefcountbt_broot_realloc(
+	struct xfs_btree_cur	*cur,
+	unsigned int		new_numrecs)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
+	struct xfs_btree_block	*broot;
+	unsigned int		new_size;
+	unsigned int		old_size = ifp->if_broot_bytes;
+	const unsigned int	level = cur->bc_nlevels - 1;
+
+	new_size = xfs_rtrefcount_broot_space_calc(mp, level, new_numrecs);
+
+	/* Handle the nop case quietly. */
+	if (new_size == old_size)
+		return ifp->if_broot;
+
+	if (new_size > old_size) {
+		unsigned int	old_numrecs;
+
+		/*
+		 * If there wasn't any memory allocated before, just allocate
+		 * it now and get out.
+		 */
+		if (old_size == 0)
+			return xfs_broot_realloc(ifp, new_size);
+
+		/*
+		 * If there is already an existing if_broot, then we need to
+		 * realloc it and possibly move the node block pointers because
+		 * those are not butted up against the btree block header.
+		 */
+		old_numrecs = xfs_rtrefcountbt_maxrecs(mp, old_size, level);
+		broot = xfs_broot_realloc(ifp, new_size);
+		if (level > 0)
+			xfs_rtrefcountbt_move_ptrs(mp, broot, old_size,
+					new_size, old_numrecs);
+		goto out_broot;
+	}
+
+	/*
+	 * We're reducing numrecs.  If we're going all the way to zero, just
+	 * free the block.
+	 */
+	ASSERT(ifp->if_broot != NULL && old_size > 0);
+	if (new_size == 0)
+		return xfs_broot_realloc(ifp, 0);
+
+	/*
+	 * Shrink the btree root by possibly moving the rtrefcountbt pointers,
+	 * since they are not butted up against the btree block header.  Then
+	 * reallocate broot.
+	 */
+	if (level > 0)
+		xfs_rtrefcountbt_move_ptrs(mp, ifp->if_broot, old_size,
+				new_size, new_numrecs);
+	broot = xfs_broot_realloc(ifp, new_size);
+
+out_broot:
+	ASSERT(xfs_rtrefcount_droot_space(broot) <=
+	       xfs_inode_fork_size(cur->bc_ino.ip, cur->bc_ino.whichfork));
+	return broot;
+}
+
 const struct xfs_btree_ops xfs_rtrefcountbt_ops = {
 	.name			= "rtrefcount",
 	.type			= XFS_BTREE_TYPE_INODE,
@@ -262,6 +378,7 @@ const struct xfs_btree_ops xfs_rtrefcountbt_ops = {
 	.free_block		= xfs_btree_free_metafile_block,
 	.get_minrecs		= xfs_rtrefcountbt_get_minrecs,
 	.get_maxrecs		= xfs_rtrefcountbt_get_maxrecs,
+	.get_dmaxrecs		= xfs_rtrefcountbt_get_dmaxrecs,
 	.init_key_from_rec	= xfs_rtrefcountbt_init_key_from_rec,
 	.init_high_key_from_rec	= xfs_rtrefcountbt_init_high_key_from_rec,
 	.init_rec_from_cur	= xfs_rtrefcountbt_init_rec_from_cur,
@@ -272,6 +389,7 @@ const struct xfs_btree_ops xfs_rtrefcountbt_ops = {
 	.keys_inorder		= xfs_rtrefcountbt_keys_inorder,
 	.recs_inorder		= xfs_rtrefcountbt_recs_inorder,
 	.keys_contiguous	= xfs_rtrefcountbt_keys_contiguous,
+	.broot_realloc		= xfs_rtrefcountbt_broot_realloc,
 };
 
 /* Allocate a new rt refcount btree cursor. */
@@ -455,3 +573,149 @@ xfs_rtrefcountbt_calc_reserves(
 
 	return xfs_rtrefcountbt_max_size(mp, mp->m_sb.sb_rgextents);
 }
+
+/*
+ * Convert on-disk form of btree root to in-memory form.
+ */
+STATIC void
+xfs_rtrefcountbt_from_disk(
+	struct xfs_inode		*ip,
+	struct xfs_rtrefcount_root	*dblock,
+	int				dblocklen,
+	struct xfs_btree_block		*rblock)
+{
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_refcount_key	*fkp;
+	__be64				*fpp;
+	struct xfs_refcount_key	*tkp;
+	__be64				*tpp;
+	struct xfs_refcount_rec	*frp;
+	struct xfs_refcount_rec	*trp;
+	unsigned int			numrecs;
+	unsigned int			maxrecs;
+	unsigned int			rblocklen;
+
+	rblocklen = xfs_rtrefcount_broot_space(mp, dblock);
+
+	xfs_btree_init_block(mp, rblock, &xfs_rtrefcountbt_ops, 0, 0,
+			ip->i_ino);
+
+	rblock->bb_level = dblock->bb_level;
+	rblock->bb_numrecs = dblock->bb_numrecs;
+
+	if (be16_to_cpu(rblock->bb_level) > 0) {
+		maxrecs = xfs_rtrefcountbt_droot_maxrecs(dblocklen, false);
+		fkp = xfs_rtrefcount_droot_key_addr(dblock, 1);
+		tkp = xfs_rtrefcount_key_addr(rblock, 1);
+		fpp = xfs_rtrefcount_droot_ptr_addr(dblock, 1, maxrecs);
+		tpp = xfs_rtrefcount_broot_ptr_addr(mp, rblock, 1, rblocklen);
+		numrecs = be16_to_cpu(dblock->bb_numrecs);
+		memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs);
+		memcpy(tpp, fpp, sizeof(*fpp) * numrecs);
+	} else {
+		frp = xfs_rtrefcount_droot_rec_addr(dblock, 1);
+		trp = xfs_rtrefcount_rec_addr(rblock, 1);
+		numrecs = be16_to_cpu(dblock->bb_numrecs);
+		memcpy(trp, frp, sizeof(*frp) * numrecs);
+	}
+}
+
+/* Load a realtime reference count btree root in from disk. */
+int
+xfs_iformat_rtrefcount(
+	struct xfs_inode	*ip,
+	struct xfs_dinode	*dip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_rtrefcount_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK);
+	struct xfs_btree_block	*broot;
+	unsigned int		numrecs;
+	unsigned int		level;
+	int			dsize;
+
+	/*
+	 * growfs must create the rtrefcount inodes before adding a realtime
+	 * volume to the filesystem, so we cannot use the rtrefcount predicate
+	 * here.
+	 */
+	if (!xfs_has_reflink(ip->i_mount))
+		return -EFSCORRUPTED;
+
+	dsize = XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK);
+	numrecs = be16_to_cpu(dfp->bb_numrecs);
+	level = be16_to_cpu(dfp->bb_level);
+
+	if (level > mp->m_rtrefc_maxlevels ||
+	    xfs_rtrefcount_droot_space_calc(level, numrecs) > dsize)
+		return -EFSCORRUPTED;
+
+	broot = xfs_broot_alloc(xfs_ifork_ptr(ip, XFS_DATA_FORK),
+			xfs_rtrefcount_broot_space_calc(mp, level, numrecs));
+	if (broot)
+		xfs_rtrefcountbt_from_disk(ip, dfp, dsize, broot);
+	return 0;
+}
+
+/*
+ * Convert in-memory form of btree root to on-disk form.
+ */
+void
+xfs_rtrefcountbt_to_disk(
+	struct xfs_mount		*mp,
+	struct xfs_btree_block		*rblock,
+	int				rblocklen,
+	struct xfs_rtrefcount_root	*dblock,
+	int				dblocklen)
+{
+	struct xfs_refcount_key	*fkp;
+	__be64				*fpp;
+	struct xfs_refcount_key	*tkp;
+	__be64				*tpp;
+	struct xfs_refcount_rec	*frp;
+	struct xfs_refcount_rec	*trp;
+	unsigned int			maxrecs;
+	unsigned int			numrecs;
+
+	ASSERT(rblock->bb_magic == cpu_to_be32(XFS_RTREFC_CRC_MAGIC));
+	ASSERT(uuid_equal(&rblock->bb_u.l.bb_uuid, &mp->m_sb.sb_meta_uuid));
+	ASSERT(rblock->bb_u.l.bb_blkno == cpu_to_be64(XFS_BUF_DADDR_NULL));
+	ASSERT(rblock->bb_u.l.bb_leftsib == cpu_to_be64(NULLFSBLOCK));
+	ASSERT(rblock->bb_u.l.bb_rightsib == cpu_to_be64(NULLFSBLOCK));
+
+	dblock->bb_level = rblock->bb_level;
+	dblock->bb_numrecs = rblock->bb_numrecs;
+
+	if (be16_to_cpu(rblock->bb_level) > 0) {
+		maxrecs = xfs_rtrefcountbt_droot_maxrecs(dblocklen, false);
+		fkp = xfs_rtrefcount_key_addr(rblock, 1);
+		tkp = xfs_rtrefcount_droot_key_addr(dblock, 1);
+		fpp = xfs_rtrefcount_broot_ptr_addr(mp, rblock, 1, rblocklen);
+		tpp = xfs_rtrefcount_droot_ptr_addr(dblock, 1, maxrecs);
+		numrecs = be16_to_cpu(rblock->bb_numrecs);
+		memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs);
+		memcpy(tpp, fpp, sizeof(*fpp) * numrecs);
+	} else {
+		frp = xfs_rtrefcount_rec_addr(rblock, 1);
+		trp = xfs_rtrefcount_droot_rec_addr(dblock, 1);
+		numrecs = be16_to_cpu(rblock->bb_numrecs);
+		memcpy(trp, frp, sizeof(*frp) * numrecs);
+	}
+}
+
+/* Flush a realtime reference count btree root out to disk. */
+void
+xfs_iflush_rtrefcount(
+	struct xfs_inode	*ip,
+	struct xfs_dinode	*dip)
+{
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
+	struct xfs_rtrefcount_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK);
+
+	ASSERT(ifp->if_broot != NULL);
+	ASSERT(ifp->if_broot_bytes > 0);
+	ASSERT(xfs_rtrefcount_droot_space(ifp->if_broot) <=
+			xfs_inode_fork_size(ip, XFS_DATA_FORK));
+	xfs_rtrefcountbt_to_disk(ip->i_mount, ifp->if_broot,
+			ifp->if_broot_bytes, dfp,
+			XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK));
+}
diff --git a/libxfs/xfs_rtrefcount_btree.h b/libxfs/xfs_rtrefcount_btree.h
index 3cd44590c9304c..e558a10c4744ad 100644
--- a/libxfs/xfs_rtrefcount_btree.h
+++ b/libxfs/xfs_rtrefcount_btree.h
@@ -25,6 +25,7 @@ void xfs_rtrefcountbt_commit_staged_btree(struct xfs_btree_cur *cur,
 unsigned int xfs_rtrefcountbt_maxrecs(struct xfs_mount *mp,
 		unsigned int blocklen, bool leaf);
 void xfs_rtrefcountbt_compute_maxlevels(struct xfs_mount *mp);
+unsigned int xfs_rtrefcountbt_droot_maxrecs(unsigned int blocklen, bool leaf);
 
 /*
  * Addresses of records, keys, and pointers within an incore rtrefcountbt block.
@@ -71,4 +72,115 @@ xfs_filblks_t xfs_rtrefcountbt_calc_reserves(struct xfs_mount *mp);
 unsigned long long xfs_rtrefcountbt_calc_size(struct xfs_mount *mp,
 		unsigned long long len);
 
+/* Addresses of keys, pointers, and records within an ondisk rtrefcount block. */
+
+static inline struct xfs_refcount_rec *
+xfs_rtrefcount_droot_rec_addr(
+	struct xfs_rtrefcount_root	*block,
+	unsigned int			index)
+{
+	return (struct xfs_refcount_rec *)
+		((char *)(block + 1) +
+		 (index - 1) * sizeof(struct xfs_refcount_rec));
+}
+
+static inline struct xfs_refcount_key *
+xfs_rtrefcount_droot_key_addr(
+	struct xfs_rtrefcount_root	*block,
+	unsigned int			index)
+{
+	return (struct xfs_refcount_key *)
+		((char *)(block + 1) +
+		 (index - 1) * sizeof(struct xfs_refcount_key));
+}
+
+static inline xfs_rtrefcount_ptr_t *
+xfs_rtrefcount_droot_ptr_addr(
+	struct xfs_rtrefcount_root	*block,
+	unsigned int			index,
+	unsigned int			maxrecs)
+{
+	return (xfs_rtrefcount_ptr_t *)
+		((char *)(block + 1) +
+		 maxrecs * sizeof(struct xfs_refcount_key) +
+		 (index - 1) * sizeof(xfs_rtrefcount_ptr_t));
+}
+
+/*
+ * Address of pointers within the incore btree root.
+ *
+ * These are to be used when we know the size of the block and
+ * we don't have a cursor.
+ */
+static inline xfs_rtrefcount_ptr_t *
+xfs_rtrefcount_broot_ptr_addr(
+	struct xfs_mount	*mp,
+	struct xfs_btree_block	*bb,
+	unsigned int		index,
+	unsigned int		block_size)
+{
+	return xfs_rtrefcount_ptr_addr(bb, index,
+			xfs_rtrefcountbt_maxrecs(mp, block_size, false));
+}
+
+/*
+ * Compute the space required for the incore btree root containing the given
+ * number of records.
+ */
+static inline size_t
+xfs_rtrefcount_broot_space_calc(
+	struct xfs_mount	*mp,
+	unsigned int		level,
+	unsigned int		nrecs)
+{
+	size_t			sz = XFS_RTREFCOUNT_BLOCK_LEN;
+
+	if (level > 0)
+		return sz + nrecs * (sizeof(struct xfs_refcount_key) +
+				     sizeof(xfs_rtrefcount_ptr_t));
+	return sz + nrecs * sizeof(struct xfs_refcount_rec);
+}
+
+/*
+ * Compute the space required for the incore btree root given the ondisk
+ * btree root block.
+ */
+static inline size_t
+xfs_rtrefcount_broot_space(struct xfs_mount *mp, struct xfs_rtrefcount_root *bb)
+{
+	return xfs_rtrefcount_broot_space_calc(mp, be16_to_cpu(bb->bb_level),
+			be16_to_cpu(bb->bb_numrecs));
+}
+
+/* Compute the space required for the ondisk root block. */
+static inline size_t
+xfs_rtrefcount_droot_space_calc(
+	unsigned int		level,
+	unsigned int		nrecs)
+{
+	size_t			sz = sizeof(struct xfs_rtrefcount_root);
+
+	if (level > 0)
+		return sz + nrecs * (sizeof(struct xfs_refcount_key) +
+				     sizeof(xfs_rtrefcount_ptr_t));
+	return sz + nrecs * sizeof(struct xfs_refcount_rec);
+}
+
+/*
+ * Compute the space required for the ondisk root block given an incore root
+ * block.
+ */
+static inline size_t
+xfs_rtrefcount_droot_space(struct xfs_btree_block *bb)
+{
+	return xfs_rtrefcount_droot_space_calc(be16_to_cpu(bb->bb_level),
+			be16_to_cpu(bb->bb_numrecs));
+}
+
+int xfs_iformat_rtrefcount(struct xfs_inode *ip, struct xfs_dinode *dip);
+void xfs_rtrefcountbt_to_disk(struct xfs_mount *mp,
+		struct xfs_btree_block *rblock, int rblocklen,
+		struct xfs_rtrefcount_root *dblock, int dblocklen);
+void xfs_iflush_rtrefcount(struct xfs_inode *ip, struct xfs_dinode *dip);
+
 #endif	/* __XFS_RTREFCOUNT_BTREE_H__ */



* [PATCH 40/56] xfs: wire up realtime refcount btree cursors
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (38 preceding siblings ...)
  2025-02-06 22:45   ` [PATCH 39/56] xfs: wire up a new metafile type for the realtime refcount Darrick J. Wong
@ 2025-02-06 22:45   ` Darrick J. Wong
  2025-02-06 22:45   ` [PATCH 41/56] xfs: create routine to allocate and initialize a realtime refcount btree inode Darrick J. Wong
                     ` (15 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:45 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: e5a171729baf61b703069b11fa0d2955890e9b6b

Wire up realtime refcount btree cursors wherever they're needed
throughout the code base.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_btree.h    |    2 -
 libxfs/xfs_refcount.c |  100 ++++++++++++++++++++++++++++++++++++++++++++++++-
 libxfs/xfs_rtgroup.c  |    9 ++++
 libxfs/xfs_rtgroup.h  |    5 ++
 4 files changed, 112 insertions(+), 4 deletions(-)
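
A rough sketch of the continuation pattern that xfs_rtrefcount_finish_one
follows in the diff below: do part of the work, and if blocks remain, move
the intent's start block forward so the operation can be requeued.  All
demo_* names and the "budget" parameter are hypothetical stand-ins for
illustration, not libxfs API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_refcount_intent. */
struct demo_intent {
	uint64_t	startblock;	/* next block still to be processed */
	uint32_t	blockcount;	/* blocks remaining */
};

/*
 * Hypothetical stand-in for xfs_refcount_adjust(): handle at most @budget
 * blocks per call, advancing *bno and shrinking *len to report progress.
 */
static void
demo_adjust(uint64_t *bno, uint32_t *len, uint32_t budget)
{
	uint32_t	done = *len < budget ? *len : budget;

	assert(budget > 0);
	*bno += done;
	*len -= done;
}

/*
 * One "finish" step.  Returns 0 when the intent is complete, 1 when it
 * must be requeued, or -1 if the leftover range would run off the end of
 * the group (the patch uses xfs_verify_rgbext() for that check).
 */
int
demo_finish_one(struct demo_intent *ri, uint32_t budget, uint64_t group_blocks)
{
	uint64_t	bno = ri->startblock;

	demo_adjust(&bno, &ri->blockcount, budget);
	if (ri->blockcount == 0)
		return 0;

	if (bno + ri->blockcount > group_blocks)
		return -1;

	/* Update the intent so the next call resumes where we stopped. */
	ri->startblock = bno;
	return 1;
}
```

A caller would keep invoking demo_finish_one() on the same intent until it
returns 0 or a negative value.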


diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index dbc047b2fb2cf5..355b304696e6c3 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -297,7 +297,7 @@ struct xfs_btree_cur
 		struct {
 			unsigned int	nr_ops;		/* # record updates */
 			unsigned int	shape_changes;	/* # of extent splits */
-		} bc_refc;	/* refcountbt */
+		} bc_refc;	/* refcountbt/rtrefcountbt */
 	};
 
 	/* Must be at the end of the struct! */
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index ef08c26c75b32c..738c3cd4ea5131 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -25,6 +25,7 @@
 #include "xfs_health.h"
 #include "defer_item.h"
 #include "xfs_rtgroup.h"
+#include "xfs_rtrefcount_btree.h"
 
 struct kmem_cache	*xfs_refcount_intent_cache;
 
@@ -1460,6 +1461,32 @@ xfs_refcount_finish_one(
 	return error;
 }
 
+/*
+ * Set up a continuation of a deferred rtrefcount operation by updating the
+ * intent.  Checks to make sure we're not going to run off the end of the
+ * rtgroup.
+ */
+static inline int
+xfs_rtrefcount_continue_op(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_intent	*ri,
+	xfs_agblock_t			new_agbno)
+{
+	struct xfs_mount		*mp = cur->bc_mp;
+	struct xfs_rtgroup		*rtg = to_rtg(ri->ri_group);
+
+	if (XFS_IS_CORRUPT(mp, !xfs_verify_rgbext(rtg, new_agbno,
+					ri->ri_blockcount))) {
+		xfs_btree_mark_sick(cur);
+		return -EFSCORRUPTED;
+	}
+
+	ri->ri_startblock = xfs_rgbno_to_rtb(rtg, new_agbno);
+
+	ASSERT(xfs_verify_rtbext(mp, ri->ri_startblock, ri->ri_blockcount));
+	return 0;
+}
+
 /*
  * Process one of the deferred realtime refcount operations.  We pass back the
  * btree cursor to maintain our lock on the btree between calls.
@@ -1470,8 +1497,77 @@ xfs_rtrefcount_finish_one(
 	struct xfs_refcount_intent	*ri,
 	struct xfs_btree_cur		**pcur)
 {
-	ASSERT(0);
-	return -EFSCORRUPTED;
+	struct xfs_mount		*mp = tp->t_mountp;
+	struct xfs_rtgroup		*rtg = to_rtg(ri->ri_group);
+	struct xfs_btree_cur		*rcur = *pcur;
+	int				error = 0;
+	xfs_rgblock_t			bno;
+	unsigned long			nr_ops = 0;
+	int				shape_changes = 0;
+
+	bno = xfs_rtb_to_rgbno(mp, ri->ri_startblock);
+
+	trace_xfs_refcount_deferred(mp, ri);
+
+	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REFCOUNT_FINISH_ONE))
+		return -EIO;
+
+	/*
+	 * If we haven't gotten a cursor or the cursor's rtgroup doesn't match
+	 * the startblock, get one now.
+	 */
+	if (rcur != NULL && rcur->bc_group != ri->ri_group) {
+		nr_ops = rcur->bc_refc.nr_ops;
+		shape_changes = rcur->bc_refc.shape_changes;
+		xfs_btree_del_cursor(rcur, 0);
+		rcur = NULL;
+		*pcur = NULL;
+	}
+	if (rcur == NULL) {
+		xfs_rtgroup_lock(rtg, XFS_RTGLOCK_REFCOUNT);
+		xfs_rtgroup_trans_join(tp, rtg, XFS_RTGLOCK_REFCOUNT);
+		*pcur = rcur = xfs_rtrefcountbt_init_cursor(tp, rtg);
+
+		rcur->bc_refc.nr_ops = nr_ops;
+		rcur->bc_refc.shape_changes = shape_changes;
+	}
+
+	switch (ri->ri_type) {
+	case XFS_REFCOUNT_INCREASE:
+		error = xfs_refcount_adjust(rcur, &bno, &ri->ri_blockcount,
+				XFS_REFCOUNT_ADJUST_INCREASE);
+		if (error)
+			return error;
+		if (ri->ri_blockcount > 0)
+			error = xfs_rtrefcount_continue_op(rcur, ri, bno);
+		break;
+	case XFS_REFCOUNT_DECREASE:
+		error = xfs_refcount_adjust(rcur, &bno, &ri->ri_blockcount,
+				XFS_REFCOUNT_ADJUST_DECREASE);
+		if (error)
+			return error;
+		if (ri->ri_blockcount > 0)
+			error = xfs_rtrefcount_continue_op(rcur, ri, bno);
+		break;
+	case XFS_REFCOUNT_ALLOC_COW:
+		error = __xfs_refcount_cow_alloc(rcur, bno, ri->ri_blockcount);
+		if (error)
+			return error;
+		ri->ri_blockcount = 0;
+		break;
+	case XFS_REFCOUNT_FREE_COW:
+		error = __xfs_refcount_cow_free(rcur, bno, ri->ri_blockcount);
+		if (error)
+			return error;
+		ri->ri_blockcount = 0;
+		break;
+	default:
+		ASSERT(0);
+		return -EFSCORRUPTED;
+	}
+	if (!error && ri->ri_blockcount > 0)
+		trace_xfs_refcount_finish_one_leftover(mp, ri);
+	return error;
 }
 
 /*
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index f7bacce8500c0c..e1f853dd2c5b3e 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -203,6 +203,9 @@ xfs_rtgroup_lock(
 
 	if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg_rmap(rtg))
 		xfs_ilock(rtg_rmap(rtg), XFS_ILOCK_EXCL);
+
+	if ((rtglock_flags & XFS_RTGLOCK_REFCOUNT) && rtg_refcount(rtg))
+		xfs_ilock(rtg_refcount(rtg), XFS_ILOCK_EXCL);
 }
 
 /* Unlock metadata inodes associated with this rt group. */
@@ -215,6 +218,9 @@ xfs_rtgroup_unlock(
 	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) ||
 	       !(rtglock_flags & XFS_RTGLOCK_BITMAP));
 
+	if ((rtglock_flags & XFS_RTGLOCK_REFCOUNT) && rtg_refcount(rtg))
+		xfs_iunlock(rtg_refcount(rtg), XFS_ILOCK_EXCL);
+
 	if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg_rmap(rtg))
 		xfs_iunlock(rtg_rmap(rtg), XFS_ILOCK_EXCL);
 
@@ -246,6 +252,9 @@ xfs_rtgroup_trans_join(
 
 	if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg_rmap(rtg))
 		xfs_trans_ijoin(tp, rtg_rmap(rtg), XFS_ILOCK_EXCL);
+
+	if ((rtglock_flags & XFS_RTGLOCK_REFCOUNT) && rtg_refcount(rtg))
+		xfs_trans_ijoin(tp, rtg_refcount(rtg), XFS_ILOCK_EXCL);
 }
 
 /* Retrieve rt group geometry. */
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 2663f2d849e295..03f39d4e43fc7f 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -273,10 +273,13 @@ int xfs_update_last_rtgroup_size(struct xfs_mount *mp,
 #define XFS_RTGLOCK_BITMAP_SHARED	(1U << 1)
 /* Lock the rt rmap inode in exclusive mode */
 #define XFS_RTGLOCK_RMAP		(1U << 2)
+/* Lock the rt refcount inode in exclusive mode */
+#define XFS_RTGLOCK_REFCOUNT		(1U << 3)
 
 #define XFS_RTGLOCK_ALL_FLAGS	(XFS_RTGLOCK_BITMAP | \
 				 XFS_RTGLOCK_BITMAP_SHARED | \
-				 XFS_RTGLOCK_RMAP)
+				 XFS_RTGLOCK_RMAP | \
+				 XFS_RTGLOCK_REFCOUNT)
 
 void xfs_rtgroup_lock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags);
 void xfs_rtgroup_unlock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags);



* [PATCH 41/56] xfs: create routine to allocate and initialize a realtime refcount btree inode
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (39 preceding siblings ...)
  2025-02-06 22:45   ` [PATCH 40/56] xfs: wire up realtime refcount btree cursors Darrick J. Wong
@ 2025-02-06 22:45   ` Darrick J. Wong
  2025-02-06 22:46   ` [PATCH 42/56] xfs: update rmap to allow cow staging extents in the rt rmap Darrick J. Wong
                     ` (14 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:45 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 4ee3113aaf3f6a3c24fcf952d8489363f56ab375

Create a library routine to allocate and initialize an empty realtime
refcountbt inode.  We'll use this for growfs, mkfs, and repair.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtgroup.c          |    2 ++
 libxfs/xfs_rtrefcount_btree.c |   28 ++++++++++++++++++++++++++++
 libxfs/xfs_rtrefcount_btree.h |    3 +++
 3 files changed, 33 insertions(+)
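
For readers wondering how big the "empty incore btree root" is: the create
routine asks xfs_rtrefcount_broot_space_calc() for level 0 with 0 records,
which should come out to just the block header.  Below is a sketch of that
calculation.  The DEMO_* sizes are assumed, illustrative numbers only; the
real values come from the ondisk structure definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sizes only; not taken from the libxfs headers. */
#define DEMO_BLOCK_LEN	72	/* assumed incore block header length */
#define DEMO_REC_SIZE	12	/* assumed size of one refcount record */
#define DEMO_KEY_SIZE	4	/* assumed size of one refcount key */
#define DEMO_PTR_SIZE	8	/* assumed size of one block pointer */

/*
 * Space needed for an incore root holding @nrecs entries.  Leaf roots
 * (level 0) store records; node roots store key/pointer pairs.
 */
size_t
demo_broot_space_calc(unsigned int level, unsigned int nrecs)
{
	size_t	sz = DEMO_BLOCK_LEN;

	if (level > 0)
		return sz + nrecs * (DEMO_KEY_SIZE + DEMO_PTR_SIZE);
	return sz + nrecs * DEMO_REC_SIZE;
}
```

With these assumed sizes, demo_broot_space_calc(0, 0) is the header length
alone, which is what a freshly created rtrefcount inode would carry.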


diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index e1f853dd2c5b3e..74701264265ce2 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -31,6 +31,7 @@
 #include "xfs_metafile.h"
 #include "xfs_metadir.h"
 #include "xfs_rtrmap_btree.h"
+#include "xfs_rtrefcount_btree.h"
 
 /* Find the first usable fsblock in this rtgroup. */
 static inline uint32_t
@@ -379,6 +380,7 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
 		.fmt_mask	= 1U << XFS_DINODE_FMT_META_BTREE,
 		/* same comment about growfs and rmap inodes applies here */
 		.enabled	= xfs_has_reflink,
+		.create		= xfs_rtrefcountbt_create,
 	},
 };
 
diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c
index cf9f2ee5314dda..fa6c114d9cbc5a 100644
--- a/libxfs/xfs_rtrefcount_btree.c
+++ b/libxfs/xfs_rtrefcount_btree.c
@@ -719,3 +719,31 @@ xfs_iflush_rtrefcount(
 			ifp->if_broot_bytes, dfp,
 			XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK));
 }
+
+/*
+ * Create a realtime refcount btree inode.
+ */
+int
+xfs_rtrefcountbt_create(
+	struct xfs_rtgroup	*rtg,
+	struct xfs_inode	*ip,
+	struct xfs_trans	*tp,
+	bool			init)
+{
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_btree_block	*broot;
+
+	ifp->if_format = XFS_DINODE_FMT_META_BTREE;
+	ASSERT(ifp->if_broot_bytes == 0);
+	ASSERT(ifp->if_bytes == 0);
+
+	/* Initialize the empty incore btree root. */
+	broot = xfs_broot_realloc(ifp,
+			xfs_rtrefcount_broot_space_calc(mp, 0, 0));
+	if (broot)
+		xfs_btree_init_block(mp, broot, &xfs_rtrefcountbt_ops, 0, 0,
+				ip->i_ino);
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE | XFS_ILOG_DBROOT);
+	return 0;
+}
diff --git a/libxfs/xfs_rtrefcount_btree.h b/libxfs/xfs_rtrefcount_btree.h
index e558a10c4744ad..a99b7a8aec8659 100644
--- a/libxfs/xfs_rtrefcount_btree.h
+++ b/libxfs/xfs_rtrefcount_btree.h
@@ -183,4 +183,7 @@ void xfs_rtrefcountbt_to_disk(struct xfs_mount *mp,
 		struct xfs_rtrefcount_root *dblock, int dblocklen);
 void xfs_iflush_rtrefcount(struct xfs_inode *ip, struct xfs_dinode *dip);
 
+int xfs_rtrefcountbt_create(struct xfs_rtgroup *rtg, struct xfs_inode *ip,
+		struct xfs_trans *tp, bool init);
+
 #endif	/* __XFS_RTREFCOUNT_BTREE_H__ */



* [PATCH 42/56] xfs: update rmap to allow cow staging extents in the rt rmap
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (40 preceding siblings ...)
  2025-02-06 22:45   ` [PATCH 41/56] xfs: create routine to allocate and initialize a realtime refcount btree inode Darrick J. Wong
@ 2025-02-06 22:46   ` Darrick J. Wong
  2025-02-06 22:46   ` [PATCH 43/56] xfs: compute rtrmap btree max levels when reflink enabled Darrick J. Wong
                     ` (13 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:46 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 0bada82331238bd366aaa0566d125c6338b42590

Don't error out on CoW staging extent records when realtime reflink is
enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rmap.c |    7 +++++++
 1 file changed, 7 insertions(+)


diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 3748bfc7a9dfc1..8c0c22a3491df4 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -284,6 +284,13 @@ xfs_rtrmap_check_meta_irec(
 		if (irec->rm_blockcount != mp->m_sb.sb_rextsize)
 			return __this_address;
 		return NULL;
+	case XFS_RMAP_OWN_COW:
+		if (!xfs_has_rtreflink(mp))
+			return __this_address;
+		if (!xfs_verify_rgbext(rtg, irec->rm_startblock,
+					    irec->rm_blockcount))
+			return __this_address;
+		return NULL;
 	default:
 		return __this_address;
 	}



* [PATCH 43/56] xfs: compute rtrmap btree max levels when reflink enabled
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (41 preceding siblings ...)
  2025-02-06 22:46   ` [PATCH 42/56] xfs: update rmap to allow cow staging extents in the rt rmap Darrick J. Wong
@ 2025-02-06 22:46   ` Darrick J. Wong
  2025-02-06 22:46   ` [PATCH 44/56] xfs: allow inodes to have the realtime and reflink flags Darrick J. Wong
                     ` (12 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:46 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: c2694ff678c9b667ab4cb7c0b45d45309c4dd64b

Compute the maximum possible height of the realtime rmap btree when
reflink is enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtrmap_btree.c |   28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)
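
To illustrate why a record-count bound stops being useful here, this is a
simplified sketch of a record-based height estimate, written from my
understanding of what xfs_btree_compute_maxlevels() does.  Treat it as an
approximation under that assumption, not the libxfs implementation; the
demo_* names are made up.

```c
#include <assert.h>

/* Round-up division that cannot overflow for large @x. */
static unsigned long long
demo_howmany(unsigned long long x, unsigned long long y)
{
	return x / y + (x % y != 0);
}

/*
 * Worst-case height of a btree holding @records records when every leaf
 * holds only minrecs[0] records and every node only minrecs[1] pointers.
 * minrecs[1] must be at least 2 or the loop would never terminate.
 */
unsigned int
demo_compute_maxlevels(const unsigned int minrecs[2],
		unsigned long long records)
{
	unsigned long long	level_blocks;
	unsigned int		height = 1;

	level_blocks = demo_howmany(records, minrecs[0]);
	while (level_blocks > 1) {
		level_blocks = demo_howmany(level_blocks, minrecs[1]);
		height++;
	}
	return height;
}
```

With a toy minrecs of 4, going from 2^32 records to nearly 2^64 records
doubles the estimated height, even though no data device could hold that
many btree blocks.  That is the motivation for bounding the height by
device size instead.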


diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c
index 10055110b8cf42..d897d140efa753 100644
--- a/libxfs/xfs_rtrmap_btree.c
+++ b/libxfs/xfs_rtrmap_btree.c
@@ -717,6 +717,7 @@ xfs_rtrmapbt_maxrecs(
 unsigned int
 xfs_rtrmapbt_maxlevels_ondisk(void)
 {
+	unsigned long long	max_dblocks;
 	unsigned int		minrecs[2];
 	unsigned int		blocklen;
 
@@ -725,8 +726,20 @@ xfs_rtrmapbt_maxlevels_ondisk(void)
 	minrecs[0] = xfs_rtrmapbt_block_maxrecs(blocklen, true) / 2;
 	minrecs[1] = xfs_rtrmapbt_block_maxrecs(blocklen, false) / 2;
 
-	/* We need at most one record for every block in an rt group. */
-	return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_RGBLOCKS);
+	/*
+	 * Compute the asymptotic maxlevels for an rtrmapbt on any rtreflink fs.
+	 *
+	 * On a reflink filesystem, each block in an rtgroup can have up to
+	 * 2^32 (per the refcount record format) owners, which means that
+	 * theoretically we could face up to 2^64 rmap records.  However, we're
+	 * likely to run out of blocks in the data device long before that
+	 * happens, which means that we must compute the max height based on
+	 * what the btree will look like if it consumes almost all the blocks
+	 * in the data device due to maximal sharing factor.
+	 */
+	max_dblocks = -1U; /* max ag count */
+	max_dblocks *= XFS_MAX_CRC_AG_BLOCKS;
+	return xfs_btree_space_to_height(minrecs, max_dblocks);
 }
 
 int __init
@@ -765,9 +778,20 @@ xfs_rtrmapbt_compute_maxlevels(
 	 * maximum height is constrained by the size of the data device and
 	 * the height required to store one rmap record for each block in an
 	 * rt group.
+	 *
+	 * On a reflink filesystem, each rt block can have up to 2^32 (per the
+	 * refcount record format) owners, which means that theoretically we
+	 * could face up to 2^64 rmap records.  This makes the computation of
+	 * maxlevels based on record count meaningless, so we only consider the
+	 * size of the data device.
 	 */
 	d_maxlevels = xfs_btree_space_to_height(mp->m_rtrmap_mnr,
 				mp->m_sb.sb_dblocks);
+	if (xfs_has_rtreflink(mp)) {
+		mp->m_rtrmap_maxlevels = d_maxlevels + 1;
+		return;
+	}
+
 	r_maxlevels = xfs_btree_compute_maxlevels(mp->m_rtrmap_mnr,
 				mp->m_groups[XG_TYPE_RTG].blocks);
 



* [PATCH 44/56] xfs: allow inodes to have the realtime and reflink flags
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (42 preceding siblings ...)
  2025-02-06 22:46   ` [PATCH 43/56] xfs: compute rtrmap btree max levels when reflink enabled Darrick J. Wong
@ 2025-02-06 22:46   ` Darrick J. Wong
  2025-02-06 22:46   ` [PATCH 45/56] xfs: recover CoW leftovers in the realtime volume Darrick J. Wong
                     ` (11 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:46 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: c3d3605f9661a2451c437a037d338dc79fb78f37

Now that we can share blocks between realtime files, allow this
combination.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_inode_buf.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
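
The verifier change boils down to one predicate.  A minimal sketch, using
hypothetical flag bits (the real XFS_DIFLAG_REALTIME and
XFS_DIFLAG2_REFLINK values differ) and a plain bool in place of
xfs_has_rtreflink(mp):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only. */
#define DEMO_DIFLAG_REALTIME	(1U << 0)
#define DEMO_DIFLAG2_REFLINK	(1U << 1)

/* Returns true if the inode flag combination should be rejected. */
bool
demo_reflink_rt_conflict(uint16_t flags, uint64_t flags2, bool has_rtreflink)
{
	return (flags2 & DEMO_DIFLAG2_REFLINK) &&
	       (flags & DEMO_DIFLAG_REALTIME) &&
	       !has_rtreflink;
}
```

An inode with both flags set is only flagged as corrupt when the
filesystem lacks rt reflink support; every other combination passes this
particular check.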


diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 0be0625857002c..a7774fc32e2755 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -745,7 +745,8 @@ xfs_dinode_verify(
 		return __this_address;
 
 	/* don't let reflink and realtime mix */
-	if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags & XFS_DIFLAG_REALTIME))
+	if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags & XFS_DIFLAG_REALTIME) &&
+	    !xfs_has_rtreflink(mp))
 		return __this_address;
 
 	/* COW extent size hint validation */



* [PATCH 45/56] xfs: recover CoW leftovers in the realtime volume
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (43 preceding siblings ...)
  2025-02-06 22:46   ` [PATCH 44/56] xfs: allow inodes to have the realtime and reflink flags Darrick J. Wong
@ 2025-02-06 22:46   ` Darrick J. Wong
  2025-02-06 22:47   ` [PATCH 46/56] xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files Darrick J. Wong
                     ` (10 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:46 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 51e232674975ff138d0e892272fdde9bc444c572

Scan the realtime refcount tree at mount time to get rid of leftover
CoW staging extents.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_refcount.c |   47 +++++++++++++++++++++++++++++++++--------------
 libxfs/xfs_refcount.h |    3 +--
 2 files changed, 34 insertions(+), 16 deletions(-)
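
Some background on the BUILD_BUG_ON checks against XFS_REFC_COWFLAG in the
diff below.  As far as I understand the refcount record format, the top
bit of the 32-bit start block marks a CoW staging extent, so group block
numbers have to fit in the remaining 31 bits.  A sketch under that
assumption, with made-up demo_* names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror XFS_REFC_COWFLAG: the top bit of the start block. */
#define DEMO_REFC_COWFLAG	(1U << 31)

/* Pack a group block number and the CoW marker into one 32-bit field. */
uint32_t
demo_encode_startblock(uint32_t gbno, bool is_cow)
{
	return is_cow ? (gbno | DEMO_REFC_COWFLAG) : gbno;
}

/* Does this encoded start block describe a CoW staging extent? */
bool
demo_is_cow(uint32_t encoded)
{
	return (encoded & DEMO_REFC_COWFLAG) != 0;
}

/* Recover the group block number from the encoded field. */
uint32_t
demo_decode_startblock(uint32_t encoded)
{
	return encoded & ~DEMO_REFC_COWFLAG;
}
```

If a group were allowed to have 2^31 or more blocks, an ordinary shared
extent starting at block 2^31 would be indistinguishable from a CoW
staging extent at block 0, hence the size limits.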


diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index 738c3cd4ea5131..1000bab245284f 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -2052,12 +2052,13 @@ xfs_refcount_recover_extent(
 /* Find and remove leftover CoW reservations. */
 int
 xfs_refcount_recover_cow_leftovers(
-	struct xfs_mount		*mp,
-	struct xfs_perag		*pag)
+	struct xfs_group		*xg)
 {
+	struct xfs_mount		*mp = xg->xg_mount;
+	bool				isrt = xg->xg_type == XG_TYPE_RTG;
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*cur;
-	struct xfs_buf			*agbp;
+	struct xfs_buf			*agbp = NULL;
 	struct xfs_refcount_recovery	*rr, *n;
 	struct list_head		debris;
 	union xfs_btree_irec		low = {
@@ -2070,10 +2071,19 @@ xfs_refcount_recover_cow_leftovers(
 	xfs_fsblock_t			fsb;
 	int				error;
 
-	/* reflink filesystems mustn't have AGs larger than 2^31-1 blocks */
+	/* reflink filesystems must not have groups larger than 2^31-1 blocks */
+	BUILD_BUG_ON(XFS_MAX_RGBLOCKS >= XFS_REFC_COWFLAG);
 	BUILD_BUG_ON(XFS_MAX_CRC_AG_BLOCKS >= XFS_REFC_COWFLAG);
-	if (mp->m_sb.sb_agblocks > XFS_MAX_CRC_AG_BLOCKS)
-		return -EOPNOTSUPP;
+
+	if (isrt) {
+		if (!xfs_has_rtgroups(mp))
+			return 0;
+		if (xfs_group_max_blocks(xg) >= XFS_MAX_RGBLOCKS)
+			return -EOPNOTSUPP;
+	} else {
+		if (xfs_group_max_blocks(xg) > XFS_MAX_CRC_AG_BLOCKS)
+			return -EOPNOTSUPP;
+	}
 
 	INIT_LIST_HEAD(&debris);
 
@@ -2091,16 +2101,24 @@ xfs_refcount_recover_cow_leftovers(
 	if (error)
 		return error;
 
-	error = xfs_alloc_read_agf(pag, tp, 0, &agbp);
-	if (error)
-		goto out_trans;
-	cur = xfs_refcountbt_init_cursor(mp, tp, agbp, pag);
+	if (isrt) {
+		xfs_rtgroup_lock(to_rtg(xg), XFS_RTGLOCK_REFCOUNT);
+		cur = xfs_rtrefcountbt_init_cursor(tp, to_rtg(xg));
+	} else {
+		error = xfs_alloc_read_agf(to_perag(xg), tp, 0, &agbp);
+		if (error)
+			goto out_trans;
+		cur = xfs_refcountbt_init_cursor(mp, tp, agbp, to_perag(xg));
+	}
 
 	/* Find all the leftover CoW staging extents. */
 	error = xfs_btree_query_range(cur, &low, &high,
 			xfs_refcount_recover_extent, &debris);
 	xfs_btree_del_cursor(cur, error);
-	xfs_trans_brelse(tp, agbp);
+	if (agbp)
+		xfs_trans_brelse(tp, agbp);
+	else
+		xfs_rtgroup_unlock(to_rtg(xg), XFS_RTGLOCK_REFCOUNT);
 	xfs_trans_cancel(tp);
 	if (error)
 		goto out_free;
@@ -2113,14 +2131,15 @@ xfs_refcount_recover_cow_leftovers(
 			goto out_free;
 
 		/* Free the orphan record */
-		fsb = xfs_agbno_to_fsb(pag, rr->rr_rrec.rc_startblock);
-		xfs_refcount_free_cow_extent(tp, false, fsb,
+		fsb = xfs_gbno_to_fsb(xg, rr->rr_rrec.rc_startblock);
+		xfs_refcount_free_cow_extent(tp, isrt, fsb,
 				rr->rr_rrec.rc_blockcount);
 
 		/* Free the block. */
 		error = xfs_free_extent_later(tp, fsb,
 				rr->rr_rrec.rc_blockcount, NULL,
-				XFS_AG_RESV_NONE, 0);
+				XFS_AG_RESV_NONE,
+				isrt ? XFS_FREE_EXTENT_REALTIME : 0);
 		if (error)
 			goto out_trans;
 
diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h
index be11df25abcc5b..f2e299a716a4ee 100644
--- a/libxfs/xfs_refcount.h
+++ b/libxfs/xfs_refcount.h
@@ -94,8 +94,7 @@ void xfs_refcount_alloc_cow_extent(struct xfs_trans *tp, bool isrt,
 		xfs_fsblock_t fsb, xfs_extlen_t len);
 void xfs_refcount_free_cow_extent(struct xfs_trans *tp, bool isrt,
 		xfs_fsblock_t fsb, xfs_extlen_t len);
-extern int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp,
-		struct xfs_perag *pag);
+int xfs_refcount_recover_cow_leftovers(struct xfs_group *xg);
 
 /*
  * While we're adjusting the refcounts records of an extent, we have



* [PATCH 46/56] xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (44 preceding siblings ...)
  2025-02-06 22:46   ` [PATCH 45/56] xfs: recover CoW leftovers in the realtime volume Darrick J. Wong
@ 2025-02-06 22:47   ` Darrick J. Wong
  2025-02-06 22:47   ` [PATCH 47/56] xfs: apply rt extent alignment constraints to CoW extsize hint Darrick J. Wong
                     ` (9 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:47 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 6853d23badd0f1852d3b711128924e2456d27634

Currently, we (ab)use xfs_get_extsz_hint so that it always returns a
nonzero value for realtime files.  This apparently was done to disable
delayed allocation for realtime files.

However, once we enable realtime reflink, we can also turn on the
alwayscow flag to force CoW writes to realtime files.  In this case, the
logic will incorrectly send the write through the delalloc write path.

Fix this by adjusting the logic slightly.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_bmap.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
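
A side-by-side sketch of the old and new decision logic, for anyone who
wants to see the behavior change without reading the diff.  The struct is
a hypothetical stand-in for the inode and superblock fields involved, and
the final "return the rt extent size" branch is my assumption about the
part of xfs_get_extsz_hint that falls outside the diff context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields the helper reads. */
struct demo_inode {
	bool		always_cow;	/* xfs_is_always_cow_inode() */
	bool		realtime;	/* XFS_IS_REALTIME_INODE() */
	bool		extsize_flag;	/* XFS_DIFLAG_EXTSIZE is set */
	uint32_t	extsize;	/* i_extsize */
	uint32_t	sb_rextsize;	/* rt extent size, in fs blocks */
};

/* Before the patch: alwayscow short-circuits everything to zero. */
uint32_t
demo_extsz_hint_old(const struct demo_inode *ip)
{
	if (ip->always_cow)
		return 0;
	if (ip->extsize_flag && ip->extsize)
		return ip->extsize;
	if (ip->realtime && ip->sb_rextsize > 1)
		return ip->sb_rextsize;
	return 0;
}

/* After the patch: alwayscow only disables the per-inode hint. */
uint32_t
demo_extsz_hint_new(const struct demo_inode *ip)
{
	if (!ip->always_cow && ip->extsize_flag && ip->extsize)
		return ip->extsize;
	if (ip->realtime && ip->sb_rextsize > 1)
		return ip->sb_rextsize;
	return 0;
}
```

The only case where the two differ is a realtime alwayscow file on a
filesystem with rextsize > 1: the old logic returns 0, the new logic
returns the rt extent size.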


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 9cfbdb85c975a5..ed969983cccd1b 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -6494,9 +6494,8 @@ xfs_get_extsz_hint(
 	 * No point in aligning allocations if we need to COW to actually
 	 * write to them.
 	 */
-	if (xfs_is_always_cow_inode(ip))
-		return 0;
-	if ((ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize)
+	if (!xfs_is_always_cow_inode(ip) &&
+	    (ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize)
 		return ip->i_extsize;
 	if (XFS_IS_REALTIME_INODE(ip) &&
 	    ip->i_mount->m_sb.sb_rextsize > 1)



* [PATCH 47/56] xfs: apply rt extent alignment constraints to CoW extsize hint
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (45 preceding siblings ...)
  2025-02-06 22:47   ` [PATCH 46/56] xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files Darrick J. Wong
@ 2025-02-06 22:47   ` Darrick J. Wong
  2025-02-06 22:47   ` [PATCH 48/56] xfs: enable extent size hints for CoW operations Darrick J. Wong
                     ` (8 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:47 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 4de1a7ba4171db681691bd80506d0cf43c5cb46a

The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint.  Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.

Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_inode_buf.c |   25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)
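
The alignment rule can be sketched as below.  This is a simplified
illustration with hypothetical parameter names; it uses 64-bit arithmetic
to sidestep overflow, whereas the validator in the diff works with the
uint32_t byte counts it computes via XFS_FSB_TO_B.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if a CoW extent size hint of @cowextsize fs blocks passes
 * the alignment check.  Realtime files must align to the rt extent size;
 * everything else only needs fs block alignment.
 */
bool
demo_cowextsize_aligned(uint32_t cowextsize, uint32_t blocksize,
		uint32_t rextsize, bool rt_flag)
{
	uint64_t	hint_bytes = (uint64_t)cowextsize * blocksize;
	uint64_t	unit_bytes = blocksize;

	if (rt_flag)
		unit_bytes = (uint64_t)rextsize * blocksize;

	return hint_bytes % unit_bytes == 0;
}
```

For example, with 4096-byte blocks and an rt extent size of 4 blocks, a
hint of 6 blocks is fine on the data device but misaligned on a realtime
file, while a hint of 8 blocks passes in both cases.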


diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index a7774fc32e2755..5ca857ca2dc10e 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -907,11 +907,29 @@ xfs_inode_validate_cowextsize(
 	bool				rt_flag;
 	bool				hint_flag;
 	uint32_t			cowextsize_bytes;
+	uint32_t			blocksize_bytes;
 
 	rt_flag = (flags & XFS_DIFLAG_REALTIME);
 	hint_flag = (flags2 & XFS_DIFLAG2_COWEXTSIZE);
 	cowextsize_bytes = XFS_FSB_TO_B(mp, cowextsize);
 
+	/*
+	 * Similar to extent size hints, a directory can be configured to
+	 * propagate realtime status and a CoW extent size hint to newly
+	 * created files even if there is no realtime device, and the hints on
+	 * disk can become misaligned if the sysadmin changes the rt extent
+	 * size while adding the realtime device.
+	 *
+	 * Therefore, we can only enforce the rextsize alignment check against
+	 * regular realtime files, and rely on callers to decide when alignment
+	 * checks are appropriate, and fix things up as needed.
+	 */
+
+	if (rt_flag)
+		blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
+	else
+		blocksize_bytes = mp->m_sb.sb_blocksize;
+
 	if (hint_flag && !xfs_has_reflink(mp))
 		return __this_address;
 
@@ -925,16 +943,13 @@ xfs_inode_validate_cowextsize(
 	if (mode && !hint_flag && cowextsize != 0)
 		return __this_address;
 
-	if (hint_flag && rt_flag)
-		return __this_address;
-
-	if (cowextsize_bytes % mp->m_sb.sb_blocksize)
+	if (cowextsize_bytes % blocksize_bytes)
 		return __this_address;
 
 	if (cowextsize > XFS_MAX_BMBT_EXTLEN)
 		return __this_address;
 
-	if (cowextsize > mp->m_sb.sb_agblocks / 2)
+	if (!rt_flag && cowextsize > mp->m_sb.sb_agblocks / 2)
 		return __this_address;
 
 	return NULL;



* [PATCH 48/56] xfs: enable extent size hints for CoW operations
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (46 preceding siblings ...)
  2025-02-06 22:47   ` [PATCH 47/56] xfs: apply rt extent alignment constraints to CoW extsize hint Darrick J. Wong
@ 2025-02-06 22:47   ` Darrick J. Wong
  2025-02-06 22:47   ` [PATCH 49/56] xfs: report realtime refcount btree corruption errors to the health system Darrick J. Wong
                     ` (7 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:47 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 8e84e8052bc283ebb37f929eb9fb97483ea7385e

Wire up the copy-on-write extent size hint for realtime files, and
connect it to the rt allocator so that we avoid fragmentation on rt
filesystems.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_bmap.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
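
A sketch of how the hint selection reads after this change.  My reading
is that realtime files consult i_extsize directly so that the rt extent
size fallback inside xfs_get_extsz_hint() does not leak into the CoW
hint.  The struct, the generic_extsz_hint field, and the default of 32
blocks are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed fallback hint, in fs blocks, when nothing else is set. */
#define DEMO_DEFAULT_COWEXTSZ_HINT	32

/* Hypothetical stand-in for the inode fields involved. */
struct demo_inode {
	bool		realtime;
	bool		extsize_flag;		/* XFS_DIFLAG_EXTSIZE */
	bool		cowextsize_flag;	/* XFS_DIFLAG2_COWEXTSIZE */
	uint32_t	extsize;		/* i_extsize */
	uint32_t	cowextsize;		/* i_cowextsize */
	uint32_t	generic_extsz_hint;	/* what xfs_get_extsz_hint()
						 * would return */
};

uint32_t
demo_get_cowextsz_hint(const struct demo_inode *ip)
{
	uint32_t	a = 0;
	uint32_t	b;

	if (ip->cowextsize_flag)
		a = ip->cowextsize;

	if (ip->realtime) {
		/* Only an explicit extent size hint counts for rt files. */
		b = 0;
		if (ip->extsize_flag)
			b = ip->extsize;
	} else {
		b = ip->generic_extsz_hint;
	}

	a = a > b ? a : b;
	if (a == 0)
		return DEMO_DEFAULT_COWEXTSZ_HINT;
	return a;
}
```

So a realtime file with no hints set ends up with the default CoW hint
rather than the rt extent size, and explicit hints still win by taking the
larger of the two.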


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index ed969983cccd1b..1f7326acbf7cb2 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -6518,7 +6518,13 @@ xfs_get_cowextsz_hint(
 	a = 0;
 	if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
 		a = ip->i_cowextsize;
-	b = xfs_get_extsz_hint(ip);
+	if (XFS_IS_REALTIME_INODE(ip)) {
+		b = 0;
+		if (ip->i_diflags & XFS_DIFLAG_EXTSIZE)
+			b = ip->i_extsize;
+	} else {
+		b = xfs_get_extsz_hint(ip);
+	}
 
 	a = max(a, b);
 	if (a == 0)



* [PATCH 49/56] xfs: report realtime refcount btree corruption errors to the health system
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (47 preceding siblings ...)
  2025-02-06 22:47   ` [PATCH 48/56] xfs: enable extent size hints for CoW operations Darrick J. Wong
@ 2025-02-06 22:47   ` Darrick J. Wong
  2025-02-06 22:48   ` [PATCH 50/56] xfs: scrub the realtime refcount btree Darrick J. Wong
                     ` (6 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:47 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 026c8ed8d4580228949f177445c605d475880c93

Whenever we encounter corrupt realtime refcount btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_fs.h               |    1 +
 libxfs/xfs_health.h           |    4 +++-
 libxfs/xfs_rtgroup.c          |    1 +
 libxfs/xfs_rtrefcount_btree.c |   10 ++++++++--
 4 files changed, 13 insertions(+), 3 deletions(-)
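
As a rough illustration of how the new bit participates in the primary
health mask, here is a minimal sketch.  The TOY_ names and the
toy_rtgroup_health structure are hypothetical; bits 1 through 4 mirror
the values in the xfs_health.h hunk, while the value of the superblock
bit is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SICK_RG_SUPER	(1 << 0)  /* assumed value, not shown below */
#define TOY_SICK_RG_BITMAP	(1 << 1)
#define TOY_SICK_RG_SUMMARY	(1 << 2)
#define TOY_SICK_RG_RMAPBT	(1 << 3)
#define TOY_SICK_RG_REFCNTBT	(1 << 4)  /* the bit added by this patch */

#define TOY_SICK_RG_PRIMARY	(TOY_SICK_RG_SUPER | \
				 TOY_SICK_RG_BITMAP | \
				 TOY_SICK_RG_SUMMARY | \
				 TOY_SICK_RG_RMAPBT | \
				 TOY_SICK_RG_REFCNTBT)

/* Hypothetical per-rtgroup health state. */
struct toy_rtgroup_health {
	unsigned int	sick;
};

/* Record that some piece of rtgroup metadata was found to be corrupt. */
static void toy_mark_sick(struct toy_rtgroup_health *h, unsigned int mask)
{
	assert(h);
	h->sick |= mask;
}

/* Does this group have any primary metadata marked sick? */
static bool toy_has_primary_sickness(const struct toy_rtgroup_health *h)
{
	assert(h);
	return (h->sick & TOY_SICK_RG_PRIMARY) != 0;
}
```

Without the extra term in the PRIMARY mask, a group whose only problem
is the refcount btree would not be counted as having primary sickness.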


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index d42d3a5617e314..ea9e58a89d92d3 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -996,6 +996,7 @@ struct xfs_rtgroup_geometry {
 #define XFS_RTGROUP_GEOM_SICK_BITMAP	(1U << 1)  /* rtbitmap */
 #define XFS_RTGROUP_GEOM_SICK_SUMMARY	(1U << 2)  /* rtsummary */
 #define XFS_RTGROUP_GEOM_SICK_RMAPBT	(1U << 3)  /* reverse mappings */
+#define XFS_RTGROUP_GEOM_SICK_REFCNTBT	(1U << 4)  /* reference counts */
 
 /*
  * ioctl commands that are used by Linux filesystems
diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h
index 5c8a0aff6ba6e9..b31000f7190ce5 100644
--- a/libxfs/xfs_health.h
+++ b/libxfs/xfs_health.h
@@ -71,6 +71,7 @@ struct xfs_rtgroup;
 #define XFS_SICK_RG_BITMAP	(1 << 1)  /* rt group bitmap */
 #define XFS_SICK_RG_SUMMARY	(1 << 2)  /* rt groups summary */
 #define XFS_SICK_RG_RMAPBT	(1 << 3)  /* reverse mappings */
+#define XFS_SICK_RG_REFCNTBT	(1 << 4)  /* reference counts */
 
 /* Observable health issues for AG metadata. */
 #define XFS_SICK_AG_SB		(1 << 0)  /* superblock */
@@ -117,7 +118,8 @@ struct xfs_rtgroup;
 #define XFS_SICK_RG_PRIMARY	(XFS_SICK_RG_SUPER | \
 				 XFS_SICK_RG_BITMAP | \
 				 XFS_SICK_RG_SUMMARY | \
-				 XFS_SICK_RG_RMAPBT)
+				 XFS_SICK_RG_RMAPBT | \
+				 XFS_SICK_RG_REFCNTBT)
 
 #define XFS_SICK_AG_PRIMARY	(XFS_SICK_AG_SB | \
 				 XFS_SICK_AG_AGF | \
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 74701264265ce2..ba1a38633a6ead 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -377,6 +377,7 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
 	[XFS_RTGI_REFCOUNT] = {
 		.name		= "refcount",
 		.metafile_type	= XFS_METAFILE_RTREFCOUNT,
+		.sick		= XFS_SICK_RG_REFCNTBT,
 		.fmt_mask	= 1U << XFS_DINODE_FMT_META_BTREE,
 		/* same comment about growfs and rmap inodes applies here */
 		.enabled	= xfs_has_reflink,
diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c
index fa6c114d9cbc5a..a532476407e913 100644
--- a/libxfs/xfs_rtrefcount_btree.c
+++ b/libxfs/xfs_rtrefcount_btree.c
@@ -25,6 +25,7 @@
 #include "xfs_rtgroup.h"
 #include "xfs_rtbitmap.h"
 #include "xfs_metafile.h"
+#include "xfs_health.h"
 
 static struct kmem_cache	*xfs_rtrefcountbt_cur_cache;
 
@@ -372,6 +373,7 @@ const struct xfs_btree_ops xfs_rtrefcountbt_ops = {
 
 	.lru_refs		= XFS_REFC_BTREE_REF,
 	.statoff		= XFS_STATS_CALC_INDEX(xs_rtrefcbt_2),
+	.sick_mask		= XFS_SICK_RG_REFCNTBT,
 
 	.dup_cursor		= xfs_rtrefcountbt_dup_cursor,
 	.alloc_block		= xfs_btree_alloc_metafile_block,
@@ -638,16 +640,20 @@ xfs_iformat_rtrefcount(
 	 * volume to the filesystem, so we cannot use the rtrefcount predicate
 	 * here.
 	 */
-	if (!xfs_has_reflink(ip->i_mount))
+	if (!xfs_has_reflink(ip->i_mount)) {
+		xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE);
 		return -EFSCORRUPTED;
+	}
 
 	dsize = XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK);
 	numrecs = be16_to_cpu(dfp->bb_numrecs);
 	level = be16_to_cpu(dfp->bb_level);
 
 	if (level > mp->m_rtrefc_maxlevels ||
-	    xfs_rtrefcount_droot_space_calc(level, numrecs) > dsize)
+	    xfs_rtrefcount_droot_space_calc(level, numrecs) > dsize) {
+		xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE);
 		return -EFSCORRUPTED;
+	}
 
 	broot = xfs_broot_alloc(xfs_ifork_ptr(ip, XFS_DATA_FORK),
 			xfs_rtrefcount_broot_space_calc(mp, level, numrecs));



* [PATCH 50/56] xfs: scrub the realtime refcount btree
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (48 preceding siblings ...)
  2025-02-06 22:47   ` [PATCH 49/56] xfs: report realtime refcount btree corruption errors to the health system Darrick J. Wong
@ 2025-02-06 22:48   ` Darrick J. Wong
  2025-02-06 22:48   ` [PATCH 51/56] xfs: scrub the metadir path of rt refcount btree files Darrick J. Wong
                     ` (5 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:48 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: c27929670de144ec76a0dab2f3a168cb4897b314

Add code to scrub realtime refcount btrees.  Similar to the refcount
btree checking code for the data device, we walk the rmap btree for each
refcount record to confirm that the reference counts are correct.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_fs.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index ea9e58a89d92d3..a4bd6a39c6ba71 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -738,9 +738,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_METAPATH	29	/* metadata directory tree paths */
 #define XFS_SCRUB_TYPE_RGSUPER	30	/* realtime superblock */
 #define XFS_SCRUB_TYPE_RTRMAPBT	31	/* rtgroup reverse mapping btree */
+#define XFS_SCRUB_TYPE_RTREFCBT	32	/* realtime reference count btree */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	32
+#define XFS_SCRUB_TYPE_NR	33
 
 /*
  * This special type code only applies to the vectored scrub implementation.



* [PATCH 51/56] xfs: scrub the metadir path of rt refcount btree files
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (49 preceding siblings ...)
  2025-02-06 22:48   ` [PATCH 50/56] xfs: scrub the realtime refcount btree Darrick J. Wong
@ 2025-02-06 22:48   ` Darrick J. Wong
  2025-02-06 22:48   ` [PATCH 52/56] xfs: fix the entry condition of exact EOF block allocation optimization Darrick J. Wong
                     ` (4 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:48 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: ca757af07fccf527f91ad49f3b6648e6783b0bc8

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the refcount btree file for each rt group.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_fs.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index a4bd6a39c6ba71..2c3171262b445b 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -832,9 +832,10 @@ struct xfs_scrub_vec_head {
 #define XFS_SCRUB_METAPATH_GRPQUOTA	(6)  /* group quota */
 #define XFS_SCRUB_METAPATH_PRJQUOTA	(7)  /* project quota */
 #define XFS_SCRUB_METAPATH_RTRMAPBT	(8)  /* realtime reverse mapping */
+#define XFS_SCRUB_METAPATH_RTREFCOUNTBT	(9)  /* realtime refcount */
 
 /* Number of metapath sm_ino values */
-#define XFS_SCRUB_METAPATH_NR		(9)
+#define XFS_SCRUB_METAPATH_NR		(10)
 
 /*
  * ioctl limits



* [PATCH 52/56] xfs: fix the entry condition of exact EOF block allocation optimization
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (50 preceding siblings ...)
  2025-02-06 22:48   ` [PATCH 51/56] xfs: scrub the metadir path of rt refcount btree files Darrick J. Wong
@ 2025-02-06 22:48   ` Darrick J. Wong
  2025-02-06 22:48   ` [PATCH 53/56] xfs: mark xfs_dir_isempty static Darrick J. Wong
                     ` (3 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:48 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: david, alexjlzheng, dchinner, cem, hch, linux-xfs

From: Jinliang Zheng <alexjlzheng@gmail.com>

Source kernel commit: 915175b49f65d9edeb81659e82cbb27b621dbc17

When create(), lseek() and write() are called in sequence, the file
offset is nonzero even though the file has no extents yet, so
offset != 0 cannot be used to decide whether the file already has
extents.

Furthermore, when xfs_bmap_adjacent() has not given a better blkno,
it is not necessary to use exact EOF block allocation.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
---
 libxfs/xfs_bmap.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)
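
The create(), lseek(), write() sequence from the commit message can be
reproduced from userspace with a sketch like the one below.  The helper
name and the path passed to it are hypothetical; only standard POSIX
calls are used.  The first write into the new file lands at a nonzero
offset even though the file had no extents before that write.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a fresh file, seek to @off, and write a single byte there.
 * Returns the resulting file size, or -1 on error.  The file is
 * removed again before returning.
 */
static off_t first_write_at_offset(const char *path, off_t off)
{
	struct stat	sb;
	off_t		ret = -1;
	int		fd;

	assert(off >= 0);

	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
	if (fd < 0)
		return -1;

	/* The very first allocation for this file happens at @off. */
	if (lseek(fd, off, SEEK_SET) == off &&
	    write(fd, "x", 1) == 1 &&
	    fstat(fd, &sb) == 0)
		ret = sb.st_size;

	close(fd);
	unlink(path);
	return ret;
}
```

Calling this with an offset of 1MiB yields a file of 1MiB plus one byte
whose only written block sits past a hole, which is the case where a
nonzero offset says nothing about existing extents.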


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 1f7326acbf7cb2..07c553d924237b 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -3557,12 +3557,12 @@ xfs_bmap_btalloc_at_eof(
 	int			error;
 
 	/*
-	 * If there are already extents in the file, try an exact EOF block
-	 * allocation to extend the file as a contiguous extent. If that fails,
-	 * or it's the first allocation in a file, just try for a stripe aligned
-	 * allocation.
+	 * If there are already extents in the file, and xfs_bmap_adjacent() has
+	 * given a better blkno, try an exact EOF block allocation to extend the
+	 * file as a contiguous extent. If that fails, or it's the first
+	 * allocation in a file, just try for a stripe aligned allocation.
 	 */
-	if (ap->offset) {
+	if (ap->eof) {
 		xfs_extlen_t	nextminlen = 0;
 
 		/*
@@ -3730,7 +3730,8 @@ xfs_bmap_btalloc_best_length(
 	int			error;
 
 	ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);
-	xfs_bmap_adjacent(ap);
+	if (!xfs_bmap_adjacent(ap))
+		ap->eof = false;
 
 	/*
 	 * Search for an allocation group with a single extent large enough for



* [PATCH 53/56] xfs: mark xfs_dir_isempty static
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (51 preceding siblings ...)
  2025-02-06 22:48   ` [PATCH 52/56] xfs: fix the entry condition of exact EOF block allocation optimization Darrick J. Wong
@ 2025-02-06 22:48   ` Darrick J. Wong
  2025-02-06 22:49   ` [PATCH 54/56] xfs: remove XFS_ILOG_NONCORE Darrick J. Wong
                     ` (2 subsequent siblings)
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:48 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, cem, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 23ebf63925989adbe4c4277c8e9b04e0a37f6005

Also return bool instead of returning a boolean condition as an int.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
---
 libxfs/xfs_dir2.c |    6 +++---
 libxfs/xfs_dir2.h |    1 -
 2 files changed, 3 insertions(+), 4 deletions(-)


diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 0b026d5f55d0fd..29e64603d4ae82 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -196,7 +196,7 @@ xfs_da_unmount(
 /*
  * Return 1 if directory contains only "." and "..".
  */
-int
+static bool
 xfs_dir_isempty(
 	xfs_inode_t	*dp)
 {
@@ -204,9 +204,9 @@ xfs_dir_isempty(
 
 	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
 	if (dp->i_disk_size == 0)	/* might happen during shutdown. */
-		return 1;
+		return true;
 	if (dp->i_disk_size > xfs_inode_data_fork_size(dp))
-		return 0;
+		return false;
 	sfp = dp->i_df.if_data;
 	return !sfp->count;
 }
diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h
index 576068ed81fac2..a6594a5a941d9f 100644
--- a/libxfs/xfs_dir2.h
+++ b/libxfs/xfs_dir2.h
@@ -58,7 +58,6 @@ extern void xfs_dir_startup(void);
 extern int xfs_da_mount(struct xfs_mount *mp);
 extern void xfs_da_unmount(struct xfs_mount *mp);
 
-extern int xfs_dir_isempty(struct xfs_inode *dp);
 extern int xfs_dir_init(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_inode *pdp);
 extern int xfs_dir_createname(struct xfs_trans *tp, struct xfs_inode *dp,



* [PATCH 54/56] xfs: remove XFS_ILOG_NONCORE
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (52 preceding siblings ...)
  2025-02-06 22:48   ` [PATCH 53/56] xfs: mark xfs_dir_isempty static Darrick J. Wong
@ 2025-02-06 22:49   ` Darrick J. Wong
  2025-02-06 22:49   ` [PATCH 55/56] xfs: constify feature checks Darrick J. Wong
  2025-02-06 22:49   ` [PATCH 56/56] xfs/libxfs: replace kmalloc() and memcpy() with kmemdup() Darrick J. Wong
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:49 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, cem, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 415dee1e06da431f3d314641ceecb9018bb6fa53

XFS_ILOG_NONCORE is not used in the kernel code or xfsprogs, remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
---
 libxfs/xfs_log_format.h |    6 ------
 1 file changed, 6 deletions(-)


diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index ec7157eaba5f68..a472ac2e45d0d8 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -359,12 +359,6 @@ struct xfs_inode_log_format_32 {
  */
 #define XFS_ILOG_IVERSION	0x8000
 
-#define	XFS_ILOG_NONCORE	(XFS_ILOG_DDATA | XFS_ILOG_DEXT | \
-				 XFS_ILOG_DBROOT | XFS_ILOG_DEV | \
-				 XFS_ILOG_ADATA | XFS_ILOG_AEXT | \
-				 XFS_ILOG_ABROOT | XFS_ILOG_DOWNER | \
-				 XFS_ILOG_AOWNER)
-
 #define	XFS_ILOG_DFORK		(XFS_ILOG_DDATA | XFS_ILOG_DEXT | \
 				 XFS_ILOG_DBROOT)
 



* [PATCH 55/56] xfs: constify feature checks
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (53 preceding siblings ...)
  2025-02-06 22:49   ` [PATCH 54/56] xfs: remove XFS_ILOG_NONCORE Darrick J. Wong
@ 2025-02-06 22:49   ` Darrick J. Wong
  2025-02-06 22:49   ` [PATCH 56/56] xfs/libxfs: replace kmalloc() and memcpy() with kmemdup() Darrick J. Wong
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:49 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, cmaiolino, cem, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 183d988ae9e7ada9d7d4333e2289256e74a5ab5b

They will eventually need to be const for zoned growfs, but even now
making such simple helpers as const as possible is a good thing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
---
 include/xfs_mount.h  |   12 ++++++------
 libxfs/xfs_rtgroup.c |    2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)
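
To show why the const qualifier matters for callers, here is a small
self-contained sketch of the same pattern.  All toy_ and TOY_ names are
hypothetical, and the feature bit values are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mount structure holding only the feature mask. */
struct toy_mount {
	uint64_t	m_features;
};

#define TOY_FEAT_REALTIME	(1ULL << 0)	/* made-up bit values */
#define TOY_FEAT_METADIR	(1ULL << 1)

/* Generate a feature predicate that accepts a const mount pointer. */
#define TOY_HAS_FEAT(name, NAME) \
static inline bool toy_has_ ## name (const struct toy_mount *mp) \
{ \
	return mp->m_features & TOY_FEAT_ ## NAME; \
}

TOY_HAS_FEAT(realtime, REALTIME)
TOY_HAS_FEAT(metadir, METADIR)

/* A derived predicate can only be const if its building blocks are. */
static inline bool toy_has_rtsb(const struct toy_mount *mp)
{
	return toy_has_metadir(mp) && toy_has_realtime(mp);
}

/* Callback slot shaped like the ->enabled member in xfs_rtgroup.c. */
struct toy_inode_ops {
	bool		(*enabled)(const struct toy_mount *mp);
};

/* Code that only holds a const mount pointer can still ask. */
static bool toy_ops_enabled(const struct toy_inode_ops *ops,
			    const struct toy_mount *mp)
{
	assert(ops && ops->enabled);
	return ops->enabled(mp);
}
```

The function pointer type has to change together with the helpers,
which is why the patch also touches the ->enabled declaration.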


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index efe7738bcb416a..383cba7d6e3fee 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -207,7 +207,7 @@ typedef struct xfs_mount {
 #define XFS_FEAT_METADIR	(1ULL << 28)	/* metadata directory tree */
 
 #define __XFS_HAS_FEAT(name, NAME) \
-static inline bool xfs_has_ ## name (struct xfs_mount *mp) \
+static inline bool xfs_has_ ## name (const struct xfs_mount *mp) \
 { \
 	return mp->m_features & XFS_FEAT_ ## NAME; \
 }
@@ -253,25 +253,25 @@ __XFS_HAS_FEAT(exchange_range, EXCHANGE_RANGE)
 __XFS_HAS_FEAT(metadir, METADIR)
 
 
-static inline bool xfs_has_rtgroups(struct xfs_mount *mp)
+static inline bool xfs_has_rtgroups(const struct xfs_mount *mp)
 {
 	/* all metadir file systems also allow rtgroups */
 	return xfs_has_metadir(mp);
 }
 
-static inline bool xfs_has_rtsb(struct xfs_mount *mp)
+static inline bool xfs_has_rtsb(const struct xfs_mount *mp)
 {
 	/* all rtgroups filesystems with an rt section have an rtsb */
 	return xfs_has_rtgroups(mp) && xfs_has_realtime(mp);
 }
 
-static inline bool xfs_has_rtrmapbt(struct xfs_mount *mp)
+static inline bool xfs_has_rtrmapbt(const struct xfs_mount *mp)
 {
 	return xfs_has_rtgroups(mp) && xfs_has_realtime(mp) &&
 	       xfs_has_rmapbt(mp);
 }
 
-static inline bool xfs_has_rtreflink(struct xfs_mount *mp)
+static inline bool xfs_has_rtreflink(const struct xfs_mount *mp)
 {
 	return xfs_has_metadir(mp) && xfs_has_realtime(mp) &&
 	       xfs_has_reflink(mp);
@@ -279,7 +279,7 @@ static inline bool xfs_has_rtreflink(struct xfs_mount *mp)
 
 /* Kernel mount features that we don't support */
 #define __XFS_UNSUPP_FEAT(name) \
-static inline bool xfs_has_ ## name (struct xfs_mount *mp) \
+static inline bool xfs_has_ ## name (const struct xfs_mount *mp) \
 { \
 	return false; \
 }
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index ba1a38633a6ead..24fb160b806757 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -335,7 +335,7 @@ struct xfs_rtginode_ops {
 	unsigned int		fmt_mask; /* all valid data fork formats */
 
 	/* Does the fs have this feature? */
-	bool			(*enabled)(struct xfs_mount *mp);
+	bool			(*enabled)(const struct xfs_mount *mp);
 
 	/* Create this rtgroup metadata inode and initialize it. */
 	int			(*create)(struct xfs_rtgroup *rtg,



* [PATCH 56/56] xfs/libxfs: replace kmalloc() and memcpy() with kmemdup()
  2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
                     ` (54 preceding siblings ...)
  2025-02-06 22:49   ` [PATCH 55/56] xfs: constify feature checks Darrick J. Wong
@ 2025-02-06 22:49   ` Darrick J. Wong
  55 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:49 UTC (permalink / raw)
  To: djwong, aalbersh
  Cc: cem, chandanbabu, dchinner, linux-xfs, linux-kernel, mtodorovac69,
	cem, hch, linux-xfs

From: Mirsad Todorovac <mtodorovac69@gmail.com>

Source kernel commit: 9d9b72472631262b35157f1a650f066c0e11c2bb

The source static analysis tool gave the following advice:

./fs/xfs/libxfs/xfs_dir2.c:382:15-22: WARNING opportunity for kmemdup

→ 382         args->value = kmalloc(len,
383                          GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_RETRY_MAYFAIL);
384         if (!args->value)
385                 return -ENOMEM;
386
→ 387         memcpy(args->value, name, len);
388         args->valuelen = len;
389         return -EEXIST;

Replacing kmalloc() + memcpy() with kmemdup() doesn't change semantics.
Original code works without fault, so this is not a bug fix but a proposed improvement.

Link: https://lwn.net/Articles/198928/
Fixes: 94a69db2367ef ("xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS")
Fixes: 384f3ced07efd ("[XFS] Return case-insensitive match for dentry cache")
Fixes: 2451337dd0439 ("xfs: global error sign conversion")
Cc: Carlos Maiolino <cem@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chandan Babu R <chandanbabu@kernel.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: linux-xfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
---
 include/kmem.h    |    9 +++++++++
 libxfs/xfs_dir2.c |    3 +--
 2 files changed, 10 insertions(+), 2 deletions(-)


diff --git a/include/kmem.h b/include/kmem.h
index 16a7957f1acee3..66f8b1fbea8fdf 100644
--- a/include/kmem.h
+++ b/include/kmem.h
@@ -79,4 +79,13 @@ static inline void kfree_rcu_mightsleep(const void *ptr)
 __attribute__((format(printf,2,3)))
 char *kasprintf(gfp_t gfp, const char *fmt, ...);
 
+static inline void *kmemdup(const void *src, size_t len, gfp_t gfp)
+{
+	void *p = kmalloc(len, gfp);
+
+	if (p)
+		memcpy(p, src, len);
+	return p;
+}
+
 #endif
diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 29e64603d4ae82..1285019b674423 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -378,12 +378,11 @@ xfs_dir_cilookup_result(
 					!(args->op_flags & XFS_DA_OP_CILOOKUP))
 		return -EEXIST;
 
-	args->value = kmalloc(len,
+	args->value = kmemdup(name, len,
 			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_RETRY_MAYFAIL);
 	if (!args->value)
 		return -ENOMEM;
 
-	memcpy(args->value, name, len);
 	args->valuelen = len;
 	return -EEXIST;
 }



* [PATCH 01/27] libxfs: compute the rt rmap btree maxlevels during initialization
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
@ 2025-02-06 22:49   ` Darrick J. Wong
  2025-02-07  5:07     ` Christoph Hellwig
  2025-02-06 22:50   ` [PATCH 02/27] libxfs: add a realtime flag to the rmap update log redo items Darrick J. Wong
                     ` (25 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:49 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Compute max rt rmap btree height information when we set up libxfs.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libxfs/init.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)


diff --git a/libxfs/init.c b/libxfs/init.c
index 16291466ac86d3..02a4cfdf38b198 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -21,6 +21,7 @@
 #include "xfs_trans.h"
 #include "xfs_rmap_btree.h"
 #include "xfs_refcount_btree.h"
+#include "xfs_metafile.h"
 #include "libfrog/platform.h"
 #include "libfrog/util.h"
 #include "libxfs/xfile.h"
@@ -598,6 +599,14 @@ xfs_agbtree_compute_maxlevels(
 	mp->m_agbtree_maxlevels = max(levels, mp->m_refc_maxlevels);
 }
 
+/* Compute maximum possible height for realtime btree types for this fs. */
+static inline void
+xfs_rtbtree_compute_maxlevels(
+	struct xfs_mount	*mp)
+{
+	mp->m_rtbtree_maxlevels = mp->m_rtrmap_maxlevels;
+}
+
 /* Compute maximum possible height of all btrees. */
 void
 libxfs_compute_all_maxlevels(
@@ -611,10 +620,11 @@ libxfs_compute_all_maxlevels(
 	igeo->attr_fork_offset = xfs_bmap_compute_attr_offset(mp);
 	xfs_ialloc_setup_geometry(mp);
 	xfs_rmapbt_compute_maxlevels(mp);
+	xfs_rtrmapbt_compute_maxlevels(mp);
 	xfs_refcountbt_compute_maxlevels(mp);
 
 	xfs_agbtree_compute_maxlevels(mp);
-
+	xfs_rtbtree_compute_maxlevels(mp);
 }
 
 /* Mount the metadata files under the metadata directory tree. */



* [PATCH 02/27] libxfs: add a realtime flag to the rmap update log redo items
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
  2025-02-06 22:49   ` [PATCH 01/27] libxfs: compute the rt rmap btree maxlevels during initialization Darrick J. Wong
@ 2025-02-06 22:50   ` Darrick J. Wong
  2025-02-07  5:08     ` Christoph Hellwig
  2025-02-06 22:50   ` [PATCH 03/27] libfrog: enable scrubbing of the realtime rmap Darrick J. Wong
                     ` (24 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:50 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Extend the rmap update (RUI) log items with a new realtime flag that
indicates that the updates apply against the realtime rmapbt.  We'll
wire up the actual rmap code later.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libxfs/defer_item.c |   35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)
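
The dispatch on the realtime flag can be pictured with the toy model
below.  Every toy_ and TOY_ name is hypothetical, and the "rmap" name of
the data section queue is an assumption; only the "rtrmap" name appears
in this patch.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_group_type {
	TOY_XG_TYPE_AG,		/* data section allocation group */
	TOY_XG_TYPE_RTG,	/* realtime group */
};

/* Hypothetical, stripped-down deferred operation type. */
struct toy_defer_op_type {
	const char	*name;
};

static const struct toy_defer_op_type toy_rmap_update_defer_type = {
	.name	= "rmap",	/* assumed name */
};

static const struct toy_defer_op_type toy_rtrmap_update_defer_type = {
	.name	= "rtrmap",
};

/* Hypothetical rmap intent carrying the new realtime flag. */
struct toy_rmap_intent {
	bool				ri_realtime;
	enum toy_group_type		group_type;
	const struct toy_defer_op_type	*queue;
};

/*
 * Pick the group type and the deferred work queue from the flag so
 * that realtime and data section updates never share a queue.
 */
static void toy_rmap_defer_add(struct toy_rmap_intent *ri)
{
	assert(ri);
	ri->group_type = ri->ri_realtime ? TOY_XG_TYPE_RTG : TOY_XG_TYPE_AG;
	ri->queue = ri->ri_realtime ? &toy_rtrmap_update_defer_type :
				      &toy_rmap_update_defer_type;
}
```

Keeping two distinct op types is what lets the two kinds of updates be
finished in separate transactions, as the comment in the hunk explains.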


diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index 9db0e471cba69c..1ca9ce15ecfe3d 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -29,6 +29,7 @@
 #include "xfs_exchmaps.h"
 #include "defer_item.h"
 #include "xfs_group.h"
+#include "xfs_rtgroup.h"
 
 /* Dummy defer item ops, since we don't do logging. */
 
@@ -289,9 +290,18 @@ xfs_rmap_defer_add(
 
 	trace_xfs_rmap_defer(mp, ri);
 
+	/*
+	 * Deferred rmap updates for the realtime and data sections must use
+	 * separate transactions to finish deferred work because updates to
+	 * realtime metadata files can lock AGFs to allocate btree blocks and
+	 * we don't want that mixing with the AGF locks taken to finish data
+	 * section updates.
+	 */
 	ri->ri_group = xfs_group_intent_get(mp, ri->ri_bmap.br_startblock,
-			XG_TYPE_AG);
-	xfs_defer_add(tp, &ri->ri_list, &xfs_rmap_update_defer_type);
+			ri->ri_realtime ? XG_TYPE_RTG : XG_TYPE_AG);
+	xfs_defer_add(tp, &ri->ri_list, ri->ri_realtime ?
+			&xfs_rtrmap_update_defer_type :
+			&xfs_rmap_update_defer_type);
 }
 
 /* Cancel a deferred rmap update. */
@@ -356,6 +366,27 @@ const struct xfs_defer_op_type xfs_rmap_update_defer_type = {
 	.cancel_item	= xfs_rmap_update_cancel_item,
 };
 
+/* Clean up after calling xfs_rtrmap_finish_one. */
+STATIC void
+xfs_rtrmap_finish_one_cleanup(
+	struct xfs_trans	*tp,
+	struct xfs_btree_cur	*rcur,
+	int			error)
+{
+	if (rcur)
+		xfs_btree_del_cursor(rcur, error);
+}
+
+const struct xfs_defer_op_type xfs_rtrmap_update_defer_type = {
+	.name		= "rtrmap",
+	.create_intent	= xfs_rmap_update_create_intent,
+	.abort_intent	= xfs_rmap_update_abort_intent,
+	.create_done	= xfs_rmap_update_create_done,
+	.finish_item	= xfs_rmap_update_finish_item,
+	.finish_cleanup = xfs_rtrmap_finish_one_cleanup,
+	.cancel_item	= xfs_rmap_update_cancel_item,
+};
+
 /* Reference Counting */
 
 static inline struct xfs_refcount_intent *ci_entry(const struct list_head *e)



* [PATCH 03/27] libfrog: enable scrubbing of the realtime rmap
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
  2025-02-06 22:49   ` [PATCH 01/27] libxfs: compute the rt rmap btree maxlevels during initialization Darrick J. Wong
  2025-02-06 22:50   ` [PATCH 02/27] libxfs: add a realtime flag to the rmap update log redo items Darrick J. Wong
@ 2025-02-06 22:50   ` Darrick J. Wong
  2025-02-07  5:08     ` Christoph Hellwig
  2025-02-06 22:50   ` [PATCH 04/27] man: document userspace API changes due to rt rmap Darrick J. Wong
                     ` (23 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:50 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add a new entry so that we can scrub the rtrmapbt and its metadata
directory tree path too.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libfrog/scrub.c |   10 ++++++++++
 scrub/repair.c  |    1 +
 2 files changed, 11 insertions(+)
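
For readers unfamiliar with the table layout, here is a minimal sketch
of a descriptor array indexed by scrub type.  The toy_ and TOY_ names
and the lookup helper are hypothetical; the numeric values follow the
xfs_fs.h definitions from the libxfs resync earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>

enum {
	TOY_SCRUB_TYPE_RTRMAPBT	= 31,
	TOY_SCRUB_TYPE_RTREFCBT	= 32,
	TOY_SCRUB_TYPE_NR	= 33,
};

/* Hypothetical, reduced version of a scrubber descriptor. */
struct toy_scrub_descr {
	const char	*name;
	const char	*descr;
};

/* Slots without a designated initializer stay zeroed. */
static const struct toy_scrub_descr toy_scrubbers[TOY_SCRUB_TYPE_NR] = {
	[TOY_SCRUB_TYPE_RTRMAPBT] = {
		.name	= "rtrmapbt",
		.descr	= "realtime reverse mapping btree",
	},
};

/* Return the descriptor for @type, or NULL if there is no entry. */
static const struct toy_scrub_descr *toy_scrub_lookup(unsigned int type)
{
	if (type >= TOY_SCRUB_TYPE_NR || !toy_scrubbers[type].name)
		return NULL;
	return &toy_scrubbers[type];
}
```

Because the array is sized by the NR constant, a type number that is
defined in the header but has no table entry yet simply resolves to an
empty slot, which callers must be prepared to handle.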


diff --git a/libfrog/scrub.c b/libfrog/scrub.c
index 129f592e108ae1..11d467666d6b5a 100644
--- a/libfrog/scrub.c
+++ b/libfrog/scrub.c
@@ -164,6 +164,11 @@ const struct xfrog_scrub_descr xfrog_scrubbers[XFS_SCRUB_TYPE_NR] = {
 		.descr	= "realtime group superblock",
 		.group	= XFROG_SCRUB_GROUP_RTGROUP,
 	},
+	[XFS_SCRUB_TYPE_RTRMAPBT] = {
+		.name	= "rtrmapbt",
+		.descr	= "realtime reverse mapping btree",
+		.group	= XFROG_SCRUB_GROUP_RTGROUP,
+	},
 };
 
 const struct xfrog_scrub_descr xfrog_metapaths[XFS_SCRUB_METAPATH_NR] = {
@@ -207,6 +212,11 @@ const struct xfrog_scrub_descr xfrog_metapaths[XFS_SCRUB_METAPATH_NR] = {
 		.descr	= "project quota file",
 		.group	= XFROG_SCRUB_GROUP_FS,
 	},
+	[XFS_SCRUB_METAPATH_RTRMAPBT] = {
+		.name	= "rtrmapbt",
+		.descr	= "rtgroup rmap btree",
+		.group	= XFROG_SCRUB_GROUP_RTGROUP,
+	},
 };
 
 /* Invoke the scrub ioctl.  Returns zero or negative error code. */
diff --git a/scrub/repair.c b/scrub/repair.c
index c8cdb98de5457b..e6906cbd37d0b3 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -533,6 +533,7 @@ repair_item_difficulty(
 
 		switch (scrub_type) {
 		case XFS_SCRUB_TYPE_RMAPBT:
+		case XFS_SCRUB_TYPE_RTRMAPBT:
 			ret |= REPAIR_DIFFICULTY_SECONDARY;
 			break;
 		case XFS_SCRUB_TYPE_SB:



* [PATCH 04/27] man: document userspace API changes due to rt rmap
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (2 preceding siblings ...)
  2025-02-06 22:50   ` [PATCH 03/27] libfrog: enable scrubbing of the realtime rmap Darrick J. Wong
@ 2025-02-06 22:50   ` Darrick J. Wong
  2025-02-07  5:08     ` Christoph Hellwig
  2025-02-06 22:51   ` [PATCH 05/27] xfs_db: compute average btree height Darrick J. Wong
                     ` (22 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:50 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Update documentation to describe userspace ABI changes made for realtime
rmap support.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 man/man2/ioctl_xfs_rtgroup_geometry.2 |    3 +++
 man/man2/ioctl_xfs_scrub_metadata.2   |   12 ++++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)


diff --git a/man/man2/ioctl_xfs_rtgroup_geometry.2 b/man/man2/ioctl_xfs_rtgroup_geometry.2
index c4b0de94453558..37e229b4335459 100644
--- a/man/man2/ioctl_xfs_rtgroup_geometry.2
+++ b/man/man2/ioctl_xfs_rtgroup_geometry.2
@@ -73,6 +73,9 @@ .SH DESCRIPTION
 .TP
 .B XFS_RTGROUP_GEOM_SICK_SUMMARY
 Realtime summary for this group.
+.TP
+.B XFS_RTGROUP_GEOM_SICK_RMAPBT
+Reverse mapping btree for this group.
 .RE
 .SH RETURN VALUE
 On error, \-1 is returned, and
diff --git a/man/man2/ioctl_xfs_scrub_metadata.2 b/man/man2/ioctl_xfs_scrub_metadata.2
index 545e3fcbac320e..f06bb98de708a4 100644
--- a/man/man2/ioctl_xfs_scrub_metadata.2
+++ b/man/man2/ioctl_xfs_scrub_metadata.2
@@ -180,11 +180,12 @@ .SH DESCRIPTION
 .PP
 .nf
 .B XFS_SCRUB_TYPE_RTBITMAP
-.fi
-.TP
 .B XFS_SCRUB_TYPE_RTSUM
-Examine the realtime block bitmap and realtime summary inodes for
-corruption.
+.fi
+.TP
+.B XFS_SCRUB_TYPE_RTRMAPBT
+Examine a given realtime allocation group's free space bitmap, summary file,
+or reverse mapping btree, respectively.
 
 .PP
 .nf
@@ -246,6 +247,9 @@ .SH DESCRIPTION
 .TP
 .B XFS_SCRUB_METAPATH_PRJQUOTA
 Project quota file.
+.TP
+.B XFS_SCRUB_METAPATH_RTRMAPBT
+Realtime rmap btree file.
 .RE
 
 The values of



* [PATCH 05/27] xfs_db: compute average btree height
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (3 preceding siblings ...)
  2025-02-06 22:50   ` [PATCH 04/27] man: document userspace API changes due to rt rmap Darrick J. Wong
@ 2025-02-06 22:51   ` Darrick J. Wong
  2025-02-07  5:09     ` Christoph Hellwig
  2025-02-06 22:51   ` [PATCH 06/27] xfs_db: don't abort when bmapping on a non-extents/bmbt fork Darrick J. Wong
                     ` (21 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:51 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Compute the btree height assuming that the blocks are 75% full.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/btheight.c     |   31 +++++++++++++++++++++++++++++++
 man/man8/xfs_db.8 |    6 +++++-
 2 files changed, 36 insertions(+), 1 deletion(-)
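
The effect of assuming 75% full blocks can be worked through with the
sketch below.  It is not the calc_height() routine from db/btheight.c,
which is not shown in this patch; the toy_ helpers are hypothetical and
only model a btree as leaf blocks holding records plus node levels
holding pointers.  The underflow limits (2 and 4) mirror the checks in
the hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Height of a btree holding @nr_records with the given fanout. */
static unsigned int toy_btree_height(uint64_t nr_records,
				     unsigned int leaf_recs,
				     unsigned int node_ptrs)
{
	uint64_t	blocks;
	unsigned int	height = 1;

	assert(leaf_recs >= 1 && node_ptrs >= 2);

	/* Number of leaf blocks, rounded up. */
	blocks = (nr_records + leaf_recs - 1) / leaf_recs;

	/* Add node levels until a single root block remains. */
	while (blocks > 1) {
		blocks = (blocks + node_ptrs - 1) / node_ptrs;
		height++;
	}
	return height;
}

/*
 * Same calculation with every block assumed to be three quarters
 * full.  Returns 0 if the scaled geometry is too small to be useful.
 */
static unsigned int toy_btree_height_avg(uint64_t nr_records,
					 unsigned int leaf_recs,
					 unsigned int node_ptrs)
{
	leaf_recs = leaf_recs * 3 / 4;
	node_ptrs = node_ptrs * 3 / 4;

	if (leaf_recs < 2 || node_ptrs < 4)
		return 0;
	return toy_btree_height(nr_records, leaf_recs, node_ptrs);
}
```

With 1000 records, 100 records per leaf and 10 pointers per node, the
full-block height is 2; at 75% occupancy the fanout drops to 75 and 7,
and the same records need 3 levels.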


diff --git a/db/btheight.c b/db/btheight.c
index 98165b522e4f6f..0456af08b39edf 100644
--- a/db/btheight.c
+++ b/db/btheight.c
@@ -259,6 +259,7 @@ _("%s: pointer size must be less than selected block size (%u bytes).\n"),
 #define REPORT_MAX	(1 << 0)
 #define REPORT_MIN	(1 << 1)
 #define REPORT_ABSMAX	(1 << 2)
+#define REPORT_AVG	(1 << 3)
 
 static void
 report_absmax(const char *tag)
@@ -317,6 +318,34 @@ _("%s: best case per %u-byte block: %u records (leaf) / %u keyptrs (node)\n"),
 		calc_height(nr_records, records_per_block);
 	}
 
+	if (report_what & REPORT_AVG) {
+		records_per_block[0] *= 3;
+		records_per_block[0] /= 4;
+		records_per_block[1] *= 3;
+		records_per_block[1] /= 4;
+
+		if (records_per_block[0] < 2) {
+			fprintf(stderr,
+_("%s: cannot calculate average case scenario due to leaf geometry underflow.\n"),
+				tag);
+			return;
+		}
+
+		if (records_per_block[1] < 4) {
+			fprintf(stderr,
+_("%s: cannot calculate average case scenario due to node geometry underflow.\n"),
+				tag);
+			return;
+		}
+
+		printf(
+_("%s: average case per %u-byte block: %u records (leaf) / %u keyptrs (node)\n"),
+				tag, blocksize, records_per_block[0],
+				records_per_block[1]);
+
+		calc_height(nr_records, records_per_block);
+	}
+
 	if (report_what & REPORT_MIN) {
 		records_per_block[0] /= 2;
 		records_per_block[1] /= 2;
@@ -393,6 +422,8 @@ btheight_f(
 				report_what = REPORT_MAX;
 			else if (!strcmp(optarg, "absmax"))
 				report_what = REPORT_ABSMAX;
+			else if (!strcmp(optarg, "avg"))
+				report_what = REPORT_AVG;
 			else {
 				btheight_help();
 				return 0;
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 06f4464a928596..acee900adbda50 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -519,7 +519,7 @@ .SH COMMANDS
 Dump all keys and pointers in intermediate btree nodes, and all records in leaf btree nodes.
 .RE
 .TP
-.BI "btheight [\-b " blksz "] [\-n " recs "] [\-w " max "|" min "|" absmax "] btree types..."
+.BI "btheight [\-b " blksz "] [\-n " recs "] [\-w " max "|" min "|" absmax "|" avg "] btree types..."
 For a given number of btree records and a btree type, report the number of
 records and blocks for each level of the btree, and the total number of blocks.
 The btree type must be given after the options.
@@ -562,6 +562,10 @@ .SH COMMANDS
 .B \-w min
 shows only the worst case scenario, which is when the btree blocks are
 half full.
+.TP
+.B \-w avg
+shows only the average case scenario, which is when the btree blocks are
+three quarters full.
 .RE
 .TP
 .BI "convert " "type number" " [" "type number" "] ... " type
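
As a rough illustration of the average-case arithmetic in the hunk above (not part of the patch): the per-block capacities are scaled to three quarters with integer math, and the height is the number of times the block count has to be divided down until a single root block remains.  The helper names below are hypothetical and the loop is a simplified stand-in for xfs_db's own calc_height().

```c
#include <stdint.h>

/* Scale a best-case per-block capacity down to 75%, using the same
 * multiply-then-divide integer steps as the REPORT_AVG branch. */
static unsigned int demo_avg_capacity(unsigned int max_per_block)
{
	return max_per_block * 3 / 4;
}

/*
 * Hypothetical helper: number of btree levels needed to index nr_records
 * when each leaf holds leaf_recs records and each node holds node_ptrs
 * key/pointer pairs.  Returns 0 if the geometry cannot form a tree.
 */
static unsigned int demo_btree_height(uint64_t nr_records,
				      unsigned int leaf_recs,
				      unsigned int node_ptrs)
{
	uint64_t	blocks;
	unsigned int	height = 1;

	if (leaf_recs < 1 || node_ptrs < 2)
		return 0;

	/* Leaf level: round up to whole blocks. */
	blocks = (nr_records + leaf_recs - 1) / leaf_recs;

	/* Each node level fans in node_ptrs children per block. */
	while (blocks > 1) {
		blocks = (blocks + node_ptrs - 1) / node_ptrs;
		height++;
	}
	return height;
}
```

With a best case of 100 records per leaf and 50 keyptrs per node, the scaled capacities are 75 and 37; the sketch guards only against geometry that cannot form a tree at all, whereas the patch refuses to report when the scaled leaf capacity drops below 2 or the node capacity below 4.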



* [PATCH 06/27] xfs_db: don't abort when bmapping on a non-extents/bmbt fork
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (4 preceding siblings ...)
  2025-02-06 22:51   ` [PATCH 05/27] xfs_db: compute average btree height Darrick J. Wong
@ 2025-02-06 22:51   ` Darrick J. Wong
  2025-02-07  5:09     ` Christoph Hellwig
  2025-02-06 22:51   ` [PATCH 07/27] xfs_db: display the realtime rmap btree contents Darrick J. Wong
                     ` (20 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:51 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

We're going to introduce new fork formats, so let's fix the problem that
xfs_db's bmap command aborts when the fork format isn't one of the
existing ones.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/bmap.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)


diff --git a/db/bmap.c b/db/bmap.c
index 1c5694c3f7d281..e8dd98d686cb98 100644
--- a/db/bmap.c
+++ b/db/bmap.c
@@ -65,16 +65,18 @@ bmap(
 	fmt = (enum xfs_dinode_fmt)XFS_DFORK_FORMAT(dip, whichfork);
 	typ = whichfork == XFS_DATA_FORK ? TYP_BMAPBTD : TYP_BMAPBTA;
 	ASSERT(typtab[typ].typnm == typ);
-	ASSERT(fmt == XFS_DINODE_FMT_LOCAL || fmt == XFS_DINODE_FMT_EXTENTS ||
-		fmt == XFS_DINODE_FMT_BTREE);
-	if (fmt == XFS_DINODE_FMT_EXTENTS) {
+	switch (fmt) {
+	case XFS_DINODE_FMT_LOCAL:
+		break;
+	case XFS_DINODE_FMT_EXTENTS:
 		nextents = xfs_dfork_nextents(dip, whichfork);
 		xp = (xfs_bmbt_rec_t *)XFS_DFORK_PTR(dip, whichfork);
 		for (ep = xp; ep < &xp[nextents] && n < nex; ep++) {
 			if (!bmap_one_extent(ep, &curoffset, eoffset, &n, bep))
 				break;
 		}
-	} else if (fmt == XFS_DINODE_FMT_BTREE) {
+		break;
+	case XFS_DINODE_FMT_BTREE:
 		push_cur();
 		rblock = (xfs_bmdr_block_t *)XFS_DFORK_PTR(dip, whichfork);
 		fsize = XFS_DFORK_SIZE(dip, mp, whichfork);
@@ -114,6 +116,13 @@ bmap(
 			block = (struct xfs_btree_block *)iocur_top->data;
 		}
 		pop_cur();
+		break;
+	default:
+		dbprintf(
+ _("%s fork format %u does not support indexable blocks\n"),
+				whichfork == XFS_DATA_FORK ? "data" : "attr",
+				fmt);
+		break;
 	}
 	pop_cur();
 	*nexp = n;
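
For readers skimming the diff, the shape of the change can be sketched in isolation (this is an illustration, not code from the patch).  The enum below is a stand-in whose ordering follows the dinode_fmt_name table in db/inode.c; the function name is hypothetical.  The point is that an unrecognised fork format now falls into a default case that reports and returns, instead of tripping an ASSERT.

```c
#include <stdio.h>

/* Stand-ins for the on-disk fork format codes (assumed ordering:
 * dev, local, extents, btree, uuid, meta_btree). */
enum demo_fork_fmt {
	DEMO_FMT_DEV,
	DEMO_FMT_LOCAL,
	DEMO_FMT_EXTENTS,
	DEMO_FMT_BTREE,
	DEMO_FMT_UUID,
	DEMO_FMT_META_BTREE,
};

/*
 * Return 1 if the fork has block mappings that can be walked, 0 otherwise.
 * Unsupported formats print a message rather than aborting the program.
 */
static int demo_fork_has_mappings(unsigned int fmt, const char *forkname)
{
	switch (fmt) {
	case DEMO_FMT_LOCAL:
		/* Inline data: nothing to map and nothing to complain about. */
		return 0;
	case DEMO_FMT_EXTENTS:
	case DEMO_FMT_BTREE:
		return 1;
	default:
		printf("%s fork format %u does not support indexable blocks\n",
				forkname, fmt);
		return 0;
	}
}
```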



* [PATCH 07/27] xfs_db: display the realtime rmap btree contents
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (5 preceding siblings ...)
  2025-02-06 22:51   ` [PATCH 06/27] xfs_db: don't abort when bmapping on a non-extents/bmbt fork Darrick J. Wong
@ 2025-02-06 22:51   ` Darrick J. Wong
  2025-02-07  5:16     ` Christoph Hellwig
  2025-02-06 22:51   ` [PATCH 08/27] xfs_db: support the realtime rmapbt Darrick J. Wong
                     ` (19 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:51 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Implement all the code we need to dump rtrmapbt contents, starting
from the inode root.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/bmroot.c              |  149 ++++++++++++++++++++++++++++++++++++++++++++++
 db/bmroot.h              |    2 +
 db/btblock.c             |  100 +++++++++++++++++++++++++++++++
 db/btblock.h             |    5 ++
 db/field.c               |   11 +++
 db/field.h               |    5 ++
 db/inode.c               |   24 +++++++
 db/type.c                |    5 ++
 db/type.h                |    1 
 libxfs/libxfs_api_defs.h |    5 ++
 man/man8/xfs_db.8        |   60 ++++++++++++++++++-
 11 files changed, 364 insertions(+), 3 deletions(-)


diff --git a/db/bmroot.c b/db/bmroot.c
index 7ef07da181e6ff..19490bd24998c5 100644
--- a/db/bmroot.c
+++ b/db/bmroot.c
@@ -24,6 +24,13 @@ static int	bmrootd_key_offset(void *obj, int startoff, int idx);
 static int	bmrootd_ptr_count(void *obj, int startoff);
 static int	bmrootd_ptr_offset(void *obj, int startoff, int idx);
 
+static int	rtrmaproot_rec_count(void *obj, int startoff);
+static int	rtrmaproot_rec_offset(void *obj, int startoff, int idx);
+static int	rtrmaproot_key_count(void *obj, int startoff);
+static int	rtrmaproot_key_offset(void *obj, int startoff, int idx);
+static int	rtrmaproot_ptr_count(void *obj, int startoff);
+static int	rtrmaproot_ptr_offset(void *obj, int startoff, int idx);
+
 #define	OFF(f)	bitize(offsetof(xfs_bmdr_block_t, bb_ ## f))
 const field_t	bmroota_flds[] = {
 	{ "level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE },
@@ -54,6 +61,20 @@ const field_t	bmrootd_key_flds[] = {
 	{ NULL }
 };
 
+/* realtime rmap btree root */
+const field_t	rtrmaproot_flds[] = {
+	{ "level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE },
+	{ "numrecs", FLDT_UINT16D, OI(OFF(numrecs)), C1, 0, TYP_NONE },
+	{ "recs", FLDT_RTRMAPBTREC, rtrmaproot_rec_offset, rtrmaproot_rec_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "keys", FLDT_RTRMAPBTKEY, rtrmaproot_key_offset, rtrmaproot_key_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "ptrs", FLDT_RTRMAPBTPTR, rtrmaproot_ptr_offset, rtrmaproot_ptr_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_RTRMAPBT },
+	{ NULL }
+};
+#undef OFF
+
 static int
 bmroota_key_count(
 	void			*obj,
@@ -241,3 +262,131 @@ bmrootd_size(
 	dip = obj;
 	return bitize((int)XFS_DFORK_DSIZE(dip, mp));
 }
+
+/* realtime rmap root */
+static int
+rtrmaproot_rec_count(
+	void			*obj,
+	int			startoff)
+{
+	struct xfs_rtrmap_root	*block;
+#ifdef DEBUG
+	struct xfs_dinode	*dip = obj;
+#endif
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	block = (struct xfs_rtrmap_root *)((char *)obj + byteize(startoff));
+	ASSERT((char *)block == XFS_DFORK_DPTR(dip));
+	if (be16_to_cpu(block->bb_level) > 0)
+		return 0;
+	return be16_to_cpu(block->bb_numrecs);
+}
+
+static int
+rtrmaproot_rec_offset(
+	void			*obj,
+	int			startoff,
+	int			idx)
+{
+	struct xfs_rtrmap_root	*block;
+	struct xfs_rmap_rec	*kp;
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	block = (struct xfs_rtrmap_root *)((char *)obj + byteize(startoff));
+	ASSERT(be16_to_cpu(block->bb_level) == 0);
+	kp = xfs_rtrmap_droot_rec_addr(block, idx);
+	return bitize((int)((char *)kp - (char *)block));
+}
+
+static int
+rtrmaproot_key_count(
+	void			*obj,
+	int			startoff)
+{
+	struct xfs_rtrmap_root	*block;
+#ifdef DEBUG
+	struct xfs_dinode	*dip = obj;
+#endif
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	block = (struct xfs_rtrmap_root *)((char *)obj + byteize(startoff));
+	ASSERT((char *)block == XFS_DFORK_DPTR(dip));
+	if (be16_to_cpu(block->bb_level) == 0)
+		return 0;
+	return be16_to_cpu(block->bb_numrecs);
+}
+
+static int
+rtrmaproot_key_offset(
+	void			*obj,
+	int			startoff,
+	int			idx)
+{
+	struct xfs_rtrmap_root	*block;
+	struct xfs_rmap_key	*kp;
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	block = (struct xfs_rtrmap_root *)((char *)obj + byteize(startoff));
+	ASSERT(be16_to_cpu(block->bb_level) > 0);
+	kp = xfs_rtrmap_droot_key_addr(block, idx);
+	return bitize((int)((char *)kp - (char *)block));
+}
+
+static int
+rtrmaproot_ptr_count(
+	void			*obj,
+	int			startoff)
+{
+	struct xfs_rtrmap_root	*block;
+#ifdef DEBUG
+	struct xfs_dinode	*dip = obj;
+#endif
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	block = (struct xfs_rtrmap_root *)((char *)obj + byteize(startoff));
+	ASSERT((char *)block == XFS_DFORK_DPTR(dip));
+	if (be16_to_cpu(block->bb_level) == 0)
+		return 0;
+	return be16_to_cpu(block->bb_numrecs);
+}
+
+static int
+rtrmaproot_ptr_offset(
+	void			*obj,
+	int			startoff,
+	int			idx)
+{
+	struct xfs_rtrmap_root	*block;
+	xfs_rtrmap_ptr_t	*pp;
+	struct xfs_dinode	*dip;
+	int			dmxr;
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	dip = obj;
+	block = (struct xfs_rtrmap_root *)((char *)obj + byteize(startoff));
+	ASSERT(be16_to_cpu(block->bb_level) > 0);
+	dmxr = libxfs_rtrmapbt_droot_maxrecs(XFS_DFORK_DSIZE(dip, mp), false);
+	pp = xfs_rtrmap_droot_ptr_addr(block, idx, dmxr);
+	return bitize((int)((char *)pp - (char *)block));
+}
+
+int
+rtrmaproot_size(
+	void			*obj,
+	int			startoff,
+	int			idx)
+{
+	struct xfs_dinode	*dip;
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	ASSERT(idx == 0);
+	dip = obj;
+	return bitize((int)XFS_DFORK_DSIZE(dip, mp));
+}
diff --git a/db/bmroot.h b/db/bmroot.h
index a1274cf6a94bdf..a2c5cfb18f0bb2 100644
--- a/db/bmroot.h
+++ b/db/bmroot.h
@@ -8,6 +8,8 @@ extern const struct field	bmroota_flds[];
 extern const struct field	bmroota_key_flds[];
 extern const struct field	bmrootd_flds[];
 extern const struct field	bmrootd_key_flds[];
+extern const struct field	rtrmaproot_flds[];
 
 extern int	bmroota_size(void *obj, int startoff, int idx);
 extern int	bmrootd_size(void *obj, int startoff, int idx);
+extern int	rtrmaproot_size(void *obj, int startoff, int idx);
diff --git a/db/btblock.c b/db/btblock.c
index d5be6adb734cef..5cad166278d98b 100644
--- a/db/btblock.c
+++ b/db/btblock.c
@@ -92,6 +92,12 @@ static struct xfs_db_btree {
 		sizeof(struct xfs_rmap_rec),
 		sizeof(__be32),
 	},
+	{	XFS_RTRMAP_CRC_MAGIC,
+		XFS_BTREE_LBLOCK_CRC_LEN,
+		2 * sizeof(struct xfs_rmap_key),
+		sizeof(struct xfs_rmap_rec),
+		sizeof(__be64),
+	},
 	{	XFS_REFC_CRC_MAGIC,
 		XFS_BTREE_SBLOCK_CRC_LEN,
 		sizeof(struct xfs_refcount_key),
@@ -813,6 +819,100 @@ const field_t	rmapbt_rec_flds[] = {
 	{ NULL }
 };
 
+/* realtime RMAP btree blocks */
+const field_t	rtrmapbt_crc_hfld[] = {
+	{ "", FLDT_RTRMAPBT_CRC, OI(0), C1, 0, TYP_NONE },
+	{ NULL }
+};
+
+#define	OFF(f)	bitize(offsetof(struct xfs_btree_block, bb_ ## f))
+const field_t	rtrmapbt_crc_flds[] = {
+	{ "magic", FLDT_UINT32X, OI(OFF(magic)), C1, 0, TYP_NONE },
+	{ "level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE },
+	{ "numrecs", FLDT_UINT16D, OI(OFF(numrecs)), C1, 0, TYP_NONE },
+	{ "leftsib", FLDT_DFSBNO, OI(OFF(u.l.bb_leftsib)), C1, 0, TYP_RTRMAPBT },
+	{ "rightsib", FLDT_DFSBNO, OI(OFF(u.l.bb_rightsib)), C1, 0, TYP_RTRMAPBT },
+	{ "bno", FLDT_DFSBNO, OI(OFF(u.l.bb_blkno)), C1, 0, TYP_RTRMAPBT },
+	{ "lsn", FLDT_UINT64X, OI(OFF(u.l.bb_lsn)), C1, 0, TYP_NONE },
+	{ "uuid", FLDT_UUID, OI(OFF(u.l.bb_uuid)), C1, 0, TYP_NONE },
+	{ "owner", FLDT_INO, OI(OFF(u.l.bb_owner)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(u.l.bb_crc)), C1, 0, TYP_NONE },
+	{ "recs", FLDT_RTRMAPBTREC, btblock_rec_offset, btblock_rec_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "keys", FLDT_RTRMAPBTKEY, btblock_key_offset, btblock_key_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "ptrs", FLDT_RTRMAPBTPTR, btblock_ptr_offset, btblock_key_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_RTRMAPBT },
+	{ NULL }
+};
+#undef OFF
+
+#define	KOFF(f)	bitize(offsetof(struct xfs_rmap_key, rm_ ## f))
+
+#define RTRMAPBK_STARTBLOCK_BITOFF	0
+#define RTRMAPBK_OWNER_BITOFF		(RTRMAPBK_STARTBLOCK_BITOFF + RMAPBT_STARTBLOCK_BITLEN)
+#define RTRMAPBK_ATTRFLAG_BITOFF	(RTRMAPBK_OWNER_BITOFF + RMAPBT_OWNER_BITLEN)
+#define RTRMAPBK_BMBTFLAG_BITOFF	(RTRMAPBK_ATTRFLAG_BITOFF + RMAPBT_ATTRFLAG_BITLEN)
+#define RTRMAPBK_EXNTFLAG_BITOFF	(RTRMAPBK_BMBTFLAG_BITOFF + RMAPBT_BMBTFLAG_BITLEN)
+#define RTRMAPBK_UNUSED_OFFSET_BITOFF	(RTRMAPBK_EXNTFLAG_BITOFF + RMAPBT_EXNTFLAG_BITLEN)
+#define RTRMAPBK_OFFSET_BITOFF		(RTRMAPBK_UNUSED_OFFSET_BITOFF + RMAPBT_UNUSED_OFFSET_BITLEN)
+
+#define HI_KOFF(f)	bitize(sizeof(struct xfs_rmap_key) + offsetof(struct xfs_rmap_key, rm_ ## f))
+
+#define RTRMAPBK_STARTBLOCKHI_BITOFF	(bitize(sizeof(struct xfs_rmap_key)))
+#define RTRMAPBK_OWNERHI_BITOFF		(RTRMAPBK_STARTBLOCKHI_BITOFF + RMAPBT_STARTBLOCK_BITLEN)
+#define RTRMAPBK_ATTRFLAGHI_BITOFF	(RTRMAPBK_OWNERHI_BITOFF + RMAPBT_OWNER_BITLEN)
+#define RTRMAPBK_BMBTFLAGHI_BITOFF	(RTRMAPBK_ATTRFLAGHI_BITOFF + RMAPBT_ATTRFLAG_BITLEN)
+#define RTRMAPBK_EXNTFLAGHI_BITOFF	(RTRMAPBK_BMBTFLAGHI_BITOFF + RMAPBT_BMBTFLAG_BITLEN)
+#define RTRMAPBK_UNUSED_OFFSETHI_BITOFF	(RTRMAPBK_EXNTFLAGHI_BITOFF + RMAPBT_EXNTFLAG_BITLEN)
+#define RTRMAPBK_OFFSETHI_BITOFF	(RTRMAPBK_UNUSED_OFFSETHI_BITOFF + RMAPBT_UNUSED_OFFSET_BITLEN)
+
+const field_t	rtrmapbt_key_flds[] = {
+	{ "startblock", FLDT_RGBLOCK, OI(KOFF(startblock)), C1, 0, TYP_DATA },
+	{ "owner", FLDT_INT64D, OI(KOFF(owner)), C1, 0, TYP_NONE },
+	{ "offset", FLDT_RFILEOFFD, OI(RTRMAPBK_OFFSET_BITOFF), C1, 0, TYP_NONE },
+	{ "attrfork", FLDT_RATTRFORKFLG, OI(RTRMAPBK_ATTRFLAG_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ "bmbtblock", FLDT_RBMBTFLG, OI(RTRMAPBK_BMBTFLAG_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ "startblock_hi", FLDT_RGBLOCK, OI(HI_KOFF(startblock)), C1, 0, TYP_DATA },
+	{ "owner_hi", FLDT_INT64D, OI(HI_KOFF(owner)), C1, 0, TYP_NONE },
+	{ "offset_hi", FLDT_RFILEOFFD, OI(RTRMAPBK_OFFSETHI_BITOFF), C1, 0, TYP_NONE },
+	{ "attrfork_hi", FLDT_RATTRFORKFLG, OI(RTRMAPBK_ATTRFLAGHI_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ "bmbtblock_hi", FLDT_RBMBTFLG, OI(RTRMAPBK_BMBTFLAGHI_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ NULL }
+};
+#undef HI_KOFF
+#undef KOFF
+
+#define	ROFF(f)	bitize(offsetof(struct xfs_rmap_rec, rm_ ## f))
+
+#define RTRMAPBT_STARTBLOCK_BITOFF	0
+#define RTRMAPBT_BLOCKCOUNT_BITOFF	(RTRMAPBT_STARTBLOCK_BITOFF + RMAPBT_STARTBLOCK_BITLEN)
+#define RTRMAPBT_OWNER_BITOFF		(RTRMAPBT_BLOCKCOUNT_BITOFF + RMAPBT_BLOCKCOUNT_BITLEN)
+#define RTRMAPBT_ATTRFLAG_BITOFF	(RTRMAPBT_OWNER_BITOFF + RMAPBT_OWNER_BITLEN)
+#define RTRMAPBT_BMBTFLAG_BITOFF	(RTRMAPBT_ATTRFLAG_BITOFF + RMAPBT_ATTRFLAG_BITLEN)
+#define RTRMAPBT_EXNTFLAG_BITOFF	(RTRMAPBT_BMBTFLAG_BITOFF + RMAPBT_BMBTFLAG_BITLEN)
+#define RTRMAPBT_UNUSED_OFFSET_BITOFF	(RTRMAPBT_EXNTFLAG_BITOFF + RMAPBT_EXNTFLAG_BITLEN)
+#define RTRMAPBT_OFFSET_BITOFF		(RTRMAPBT_UNUSED_OFFSET_BITOFF + RMAPBT_UNUSED_OFFSET_BITLEN)
+
+const field_t	rtrmapbt_rec_flds[] = {
+	{ "startblock", FLDT_RGBLOCK, OI(RTRMAPBT_STARTBLOCK_BITOFF), C1, 0, TYP_DATA },
+	{ "blockcount", FLDT_EXTLEN, OI(RTRMAPBT_BLOCKCOUNT_BITOFF), C1, 0, TYP_NONE },
+	{ "owner", FLDT_INT64D, OI(RTRMAPBT_OWNER_BITOFF), C1, 0, TYP_NONE },
+	{ "offset", FLDT_RFILEOFFD, OI(RTRMAPBT_OFFSET_BITOFF), C1, 0, TYP_NONE },
+	{ "extentflag", FLDT_REXTFLG, OI(RTRMAPBT_EXNTFLAG_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ "attrfork", FLDT_RATTRFORKFLG, OI(RTRMAPBT_ATTRFLAG_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ "bmbtblock", FLDT_RBMBTFLG, OI(RTRMAPBT_BMBTFLAG_BITOFF), C1, 0,
+	  TYP_NONE },
+	{ NULL }
+};
+#undef ROFF
+
 /* refcount btree blocks */
 const field_t	refcbt_crc_hfld[] = {
 	{ "", FLDT_REFCBT_CRC, OI(0), C1, 0, TYP_NONE },
diff --git a/db/btblock.h b/db/btblock.h
index 4168c9e2e15ac4..b4013ea8073ec6 100644
--- a/db/btblock.h
+++ b/db/btblock.h
@@ -53,6 +53,11 @@ extern const struct field	rmapbt_crc_hfld[];
 extern const struct field	rmapbt_key_flds[];
 extern const struct field	rmapbt_rec_flds[];
 
+extern const struct field	rtrmapbt_crc_flds[];
+extern const struct field	rtrmapbt_crc_hfld[];
+extern const struct field	rtrmapbt_key_flds[];
+extern const struct field	rtrmapbt_rec_flds[];
+
 extern const struct field	refcbt_crc_flds[];
 extern const struct field	refcbt_crc_hfld[];
 extern const struct field	refcbt_key_flds[];
diff --git a/db/field.c b/db/field.c
index ca0fe1826f9a80..60c4e16d781f48 100644
--- a/db/field.c
+++ b/db/field.c
@@ -194,6 +194,17 @@ const ftattr_t	ftattrtab[] = {
 	{ FLDT_RMAPBTREC, "rmapbtrec", fp_sarray, (char *)rmapbt_rec_flds,
 	  SI(bitsz(struct xfs_rmap_rec)), 0, NULL, rmapbt_rec_flds },
 
+	{ FLDT_RTRMAPBT_CRC, "rtrmapbt", NULL, (char *)rtrmapbt_crc_flds, btblock_size,
+	  FTARG_SIZE, NULL, rtrmapbt_crc_flds },
+	{ FLDT_RTRMAPBTKEY, "rtrmapbtkey", fp_sarray, (char *)rtrmapbt_key_flds,
+	  SI(bitize(2 * sizeof(struct xfs_rmap_key))), 0, NULL, rtrmapbt_key_flds },
+	{ FLDT_RTRMAPBTPTR, "rtrmapbtptr", fp_num, "%llu",
+	  SI(bitsz(xfs_rtrmap_ptr_t)), 0, fa_dfsbno, NULL },
+	{ FLDT_RTRMAPBTREC, "rtrmapbtrec", fp_sarray, (char *)rtrmapbt_rec_flds,
+	  SI(bitsz(struct xfs_rmap_rec)), 0, NULL, rtrmapbt_rec_flds },
+	{ FLDT_RTRMAPROOT, "rtrmaproot", NULL, (char *)rtrmaproot_flds, rtrmaproot_size,
+	  FTARG_SIZE, NULL, rtrmaproot_flds },
+
 	{ FLDT_REFCBT_CRC, "refcntbt", NULL, (char *)refcbt_crc_flds, btblock_size,
 	  FTARG_SIZE, NULL, refcbt_crc_flds },
 	{ FLDT_REFCBTKEY, "refcntbtkey", fp_sarray, (char *)refcbt_key_flds,
diff --git a/db/field.h b/db/field.h
index 1d7465b4d3e562..67b6cb2a798719 100644
--- a/db/field.h
+++ b/db/field.h
@@ -84,6 +84,11 @@ typedef enum fldt	{
 	FLDT_RMAPBTKEY,
 	FLDT_RMAPBTPTR,
 	FLDT_RMAPBTREC,
+	FLDT_RTRMAPBT_CRC,
+	FLDT_RTRMAPBTKEY,
+	FLDT_RTRMAPBTPTR,
+	FLDT_RTRMAPBTREC,
+	FLDT_RTRMAPROOT,
 	FLDT_REFCBT_CRC,
 	FLDT_REFCBTKEY,
 	FLDT_REFCBTPTR,
diff --git a/db/inode.c b/db/inode.c
index 0a80b8d063603f..45368a3343a17a 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -48,6 +48,7 @@ static int	inode_u_muuid_count(void *obj, int startoff);
 static int	inode_u_sfdir2_count(void *obj, int startoff);
 static int	inode_u_sfdir3_count(void *obj, int startoff);
 static int	inode_u_symlink_count(void *obj, int startoff);
+static int	inode_u_rtrmapbt_count(void *obj, int startoff);
 
 static const cmdinfo_t	inode_cmd =
 	{ "inode", NULL, inode_f, 0, 1, 1, "[inode#]",
@@ -233,6 +234,8 @@ const field_t	inode_u_flds[] = {
 	{ "sfdir3", FLDT_DIR3SF, NULL, inode_u_sfdir3_count, FLD_COUNT, TYP_NONE },
 	{ "symlink", FLDT_CHARNS, NULL, inode_u_symlink_count, FLD_COUNT,
 	  TYP_NONE },
+	{ "rtrmapbt", FLDT_RTRMAPROOT, NULL, inode_u_rtrmapbt_count, FLD_COUNT,
+	  TYP_NONE },
 	{ NULL }
 };
 
@@ -246,7 +249,7 @@ const field_t	inode_a_flds[] = {
 };
 
 static const char	*dinode_fmt_name[] =
-	{ "dev", "local", "extents", "btree", "uuid" };
+	{ "dev", "local", "extents", "btree", "uuid", "meta_btree" };
 static const int	dinode_fmt_name_size =
 	sizeof(dinode_fmt_name) / sizeof(dinode_fmt_name[0]);
 
@@ -299,7 +302,7 @@ fp_dinode_fmt(
 
 static const char	*metatype_name[] =
 	{ "unknown", "dir", "usrquota", "grpquota", "prjquota", "rtbitmap",
-	  "rtsummary"
+	  "rtsummary", "rtrmap"
 	};
 static const int	metatype_name_size = ARRAY_SIZE(metatype_name);
 
@@ -717,6 +720,8 @@ inode_next_type(void)
 				return TYP_RGBITMAP;
 			case XFS_METAFILE_RTSUMMARY:
 				return TYP_RGSUMMARY;
+			case XFS_METAFILE_RTRMAP:
+				return TYP_RTRMAPBT;
 			default:
 				return TYP_DATA;
 			}
@@ -866,6 +871,21 @@ inode_u_sfdir3_count(
 	       xfs_has_ftype(mp);
 }
 
+static int
+inode_u_rtrmapbt_count(
+	void			*obj,
+	int			startoff)
+{
+	struct xfs_dinode	*dip;
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	dip = obj;
+	ASSERT((char *)XFS_DFORK_DPTR(dip) - (char *)dip == byteize(startoff));
+	return dip->di_format == XFS_DINODE_FMT_META_BTREE &&
+	       dip->di_metatype == cpu_to_be16(XFS_METAFILE_RTRMAP);
+}
+
 int
 inode_u_size(
 	void			*obj,
diff --git a/db/type.c b/db/type.c
index 2091b4ac8b139b..1dfc33ffb44b87 100644
--- a/db/type.c
+++ b/db/type.c
@@ -51,6 +51,7 @@ static const typ_t	__typtab[] = {
 	{ TYP_BNOBT, "bnobt", handle_struct, bnobt_hfld, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_CNTBT, "cntbt", handle_struct, cntbt_hfld, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_RMAPBT, NULL },
+	{ TYP_RTRMAPBT, NULL },
 	{ TYP_REFCBT, NULL },
 	{ TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_DIR2, "dir2", handle_struct, dir2_hfld, NULL, TYP_F_NO_CRC_OFF },
@@ -91,6 +92,8 @@ static const typ_t	__typtab_crc[] = {
 		&xfs_cntbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
 	{ TYP_RMAPBT, "rmapbt", handle_struct, rmapbt_crc_hfld,
 		&xfs_rmapbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
+	{ TYP_RTRMAPBT, "rtrmapbt", handle_struct, rtrmapbt_crc_hfld,
+		&xfs_rtrmapbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF },
 	{ TYP_REFCBT, "refcntbt", handle_struct, refcbt_crc_hfld,
 		&xfs_refcountbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
 	{ TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF },
@@ -141,6 +144,8 @@ static const typ_t	__typtab_spcrc[] = {
 		&xfs_cntbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
 	{ TYP_RMAPBT, "rmapbt", handle_struct, rmapbt_crc_hfld,
 		&xfs_rmapbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
+	{ TYP_RTRMAPBT, "rtrmapbt", handle_struct, rtrmapbt_crc_hfld,
+		&xfs_rtrmapbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF },
 	{ TYP_REFCBT, "refcntbt", handle_struct, refcbt_crc_hfld,
 		&xfs_refcountbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
 	{ TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF },
diff --git a/db/type.h b/db/type.h
index e7f0ecc17680bf..c98f3640202e87 100644
--- a/db/type.h
+++ b/db/type.h
@@ -20,6 +20,7 @@ typedef enum typnm
 	TYP_BNOBT,
 	TYP_CNTBT,
 	TYP_RMAPBT,
+	TYP_RTRMAPBT,
 	TYP_REFCBT,
 	TYP_DATA,
 	TYP_DIR2,
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 6b2dc7a30d2547..fcbcaaa11f1025 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -295,6 +295,8 @@
 #define xfs_rtginode_name		libxfs_rtginode_name
 #define xfs_rtsummary_create		libxfs_rtsummary_create
 
+#define xfs_rtginode_load		libxfs_rtginode_load
+#define xfs_rtginode_load_parent	libxfs_rtginode_load_parent
 #define xfs_rtgroup_alloc		libxfs_rtgroup_alloc
 #define xfs_rtgroup_extents		libxfs_rtgroup_extents
 #define xfs_rtgroup_grab		libxfs_rtgroup_grab
@@ -308,6 +310,9 @@
 #define xfs_rtfree_extent		libxfs_rtfree_extent
 #define xfs_rtfree_blocks		libxfs_rtfree_blocks
 #define xfs_update_rtsb			libxfs_update_rtsb
+#define xfs_rtrmapbt_droot_maxrecs	libxfs_rtrmapbt_droot_maxrecs
+#define xfs_rtrmapbt_maxrecs		libxfs_rtrmapbt_maxrecs
+
 #define xfs_sb_from_disk		libxfs_sb_from_disk
 #define xfs_sb_mount_rextsize		libxfs_sb_mount_rextsize
 #define xfs_sb_quota_from_disk		libxfs_sb_quota_from_disk
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index acee900adbda50..ddbabe36b5fc41 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -1302,7 +1302,7 @@ .SH COMMANDS
 .BR agf ", " agfl ", " agi ", " attr ", " bmapbta ", " bmapbtd ,
 .BR bnobt ", " cntbt ", " data ", " dir ", " dir2 ", " dqblk ,
 .BR inobt ", " inode ", " log ", " refcntbt ", " rmapbt ", " rtbitmap ,
-.BR rtsummary ", " sb ", " symlink " and " text .
+.BR rtsummary ", " sb ", " symlink ", " rtrmapbt ", and " text .
 See the TYPES section below for more information on these data types.
 .TP
 .BI "timelimit [" OPTIONS ]
@@ -2450,6 +2450,64 @@ .SH TYPES
 .PD
 .RE
 .TP
+.B rtrmapbt
+There is one reverse mapping Btree for each realtime group.
+The
+.BR startblock " and "
+.B blockcount
+fields are 32 bits wide and record blocks within a realtime group.
+The root of this Btree is the reverse-mapping inode, which is recorded in the
+metadata directory.
+Blocks are linked to sibling left and right blocks at each level, as well as by
+pointers from parent to child blocks.
+Each block has the following fields:
+.RS 1.4i
+.PD 0
+.TP 1.2i
+.B magic
+RTRMAP block magic number, 0x4d415052 ('MAPR').
+.TP
+.B level
+level number of this block, 0 is a leaf.
+.TP
+.B numrecs
+number of data entries in the block.
+.TP
+.B leftsib
+left (logically lower) sibling block, 0 if none.
+.TP
+.B rightsib
+right (logically higher) sibling block, 0 if none.
+.TP
+.B recs
+[leaf blocks only] array of reverse mapping records. Each record contains
+.BR startblock ,
+.BR blockcount ,
+.BR owner ,
+.BR offset ,
+.BR attr_fork ,
+.BR bmbt_block ,
+and
+.BR unwritten .
+.TP
+.B keys
+[non-leaf blocks only] array of double-key records. The first ("low") key
+contains the first value of each block in the level below this one. The second
+("high") key contains the largest key that can be used to identify any record
+in the subtree. Each record contains
+.BR startblock ,
+.BR owner ,
+.BR offset ,
+.BR attr_fork ,
+and
+.BR bmbt_block .
+.TP
+.B ptrs
+[non-leaf blocks only] array of child block pointers. Each pointer is the
+filesystem block number of a block in the next level of the Btree.
+.PD
+.RE
+.TP
 .B rtbitmap
 If the filesystem has a realtime subvolume, then the
 .B rbmino
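
A small sketch of how the low and high halves of a double-key record are addressed, in the spirit of the KOFF/HI_KOFF macros above (illustration only, not part of the patch).  The struct is a stand-in that assumes the packed 4+8+8 byte layout of struct xfs_rmap_key; the macro names are hypothetical, and the packed attribute is a GCC/Clang extension.  Only whole-field offsets are shown; the flag bits packed into the offset word are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the on-disk rmap key (assumed layout: 32-bit startblock,
 * 64-bit owner, 64-bit offset, no padding). */
struct demo_rmap_key {
	uint32_t	startblock;
	uint64_t	owner;
	uint64_t	offset;
} __attribute__((packed));

/* Convert a byte count to a bit count, like xfs_db's bitize(). */
#define DEMO_BITIZE(bytes)	((int)((bytes) * 8))

/* Bit offset of a field within the low key, which starts the record. */
#define DEMO_LO_KOFF(f) \
	DEMO_BITIZE(offsetof(struct demo_rmap_key, f))

/* Bit offset of the same field within the high key, which sits
 * immediately after the low key. */
#define DEMO_HI_KOFF(f) \
	DEMO_BITIZE(sizeof(struct demo_rmap_key) + \
		    offsetof(struct demo_rmap_key, f))

/* Total size in bits of one double-key record. */
#define DEMO_KEYPAIR_BITS	DEMO_BITIZE(2 * sizeof(struct demo_rmap_key))
```

Under those assumptions each key is 160 bits, so startblock_hi begins at bit 160 and owner_hi at bit 192, and the whole record spans 320 bits, which is the size that field.c registers for rtrmapbtkey.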



* [PATCH 08/27] xfs_db: support the realtime rmapbt
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (6 preceding siblings ...)
  2025-02-06 22:51   ` [PATCH 07/27] xfs_db: display the realtime rmap btree contents Darrick J. Wong
@ 2025-02-06 22:51   ` Darrick J. Wong
  2025-02-07  5:17     ` Christoph Hellwig
  2025-02-06 22:52   ` [PATCH 09/27] xfs_db: copy the realtime rmap btree Darrick J. Wong
                     ` (18 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:51 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Wire up various parts of xfs_db for realtime rmap support so that we can
dump the btree contents.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/btblock.c             |    3 ++
 db/btdump.c              |   63 ++++++++++++++++++++++++++++++++++++++++++++++
 db/btheight.c            |    5 ++++
 libxfs/libxfs_api_defs.h |    1 +
 man/man8/xfs_db.8        |    3 +-
 5 files changed, 74 insertions(+), 1 deletion(-)


diff --git a/db/btblock.c b/db/btblock.c
index 5cad166278d98b..70f6c3f6aedde5 100644
--- a/db/btblock.c
+++ b/db/btblock.c
@@ -147,6 +147,9 @@ block_to_bt(
 	case TYP_RMAPBT:
 		magic = crc ? XFS_RMAP_CRC_MAGIC : 0;
 		break;
+	case TYP_RTRMAPBT:
+		magic = crc ? XFS_RTRMAP_CRC_MAGIC : 0;
+		break;
 	case TYP_REFCBT:
 		magic = crc ? XFS_REFC_CRC_MAGIC : 0;
 		break;
diff --git a/db/btdump.c b/db/btdump.c
index 81642cde2b6d8f..55301d25de10cd 100644
--- a/db/btdump.c
+++ b/db/btdump.c
@@ -441,6 +441,66 @@ dump_dabtree(
 	return ret;
 }
 
+static bool
+is_btree_inode(void)
+{
+	struct xfs_dinode	*dip = iocur_top->data;
+
+	return dip->di_format == XFS_DINODE_FMT_META_BTREE;
+}
+
+static int
+dump_btree_inode(
+	bool			dump_node_blocks)
+{
+	char			*prefix;
+	struct xfs_dinode	*dip = iocur_top->data;
+	struct xfs_rtrmap_root	*rtrmap;
+	int			level;
+	int			numrecs;
+	int			ret;
+
+	switch (be16_to_cpu(dip->di_metatype)) {
+	case XFS_METAFILE_RTRMAP:
+		prefix = "u3.rtrmapbt";
+		rtrmap = (struct xfs_rtrmap_root *)XFS_DFORK_DPTR(dip);
+		level = be16_to_cpu(rtrmap->bb_level);
+		numrecs = be16_to_cpu(rtrmap->bb_numrecs);
+		break;
+	default:
+		dbprintf("Unknown metadata inode btree type %u\n",
+				be16_to_cpu(dip->di_metatype));
+		return 0;
+	}
+
+	if (numrecs == 0)
+		return 0;
+	if (level > 0) {
+		if (dump_node_blocks) {
+			ret = eval("print %s.keys", prefix);
+			if (ret)
+				goto err;
+			ret = eval("print %s.ptrs", prefix);
+			if (ret)
+				goto err;
+		}
+		ret = eval("addr %s.ptrs[1]", prefix);
+		if (ret)
+			goto err;
+		ret = dump_btree_long(dump_node_blocks);
+	} else {
+		ret = eval("print %s.recs", prefix);
+	}
+	if (ret)
+		goto err;
+
+	ret = eval("pop");
+	return ret;
+err:
+	eval("pop");
+	return ret;
+}
+
 static int
 btdump_f(
 	int		argc,
@@ -488,8 +548,11 @@ btdump_f(
 		return dump_btree_short(iflag);
 	case TYP_BMAPBTA:
 	case TYP_BMAPBTD:
+	case TYP_RTRMAPBT:
 		return dump_btree_long(iflag);
 	case TYP_INODE:
+		if (is_btree_inode())
+			return dump_btree_inode(iflag);
 		return dump_inode(iflag, aflag);
 	case TYP_ATTR:
 		return dump_dabtree(iflag, crc ? &attr3_print : &attr_print);
diff --git a/db/btheight.c b/db/btheight.c
index 0456af08b39edf..31dff1c924a2e0 100644
--- a/db/btheight.c
+++ b/db/btheight.c
@@ -53,6 +53,11 @@ struct btmap {
 		.maxlevels	= libxfs_rmapbt_maxlevels_ondisk,
 		.maxrecs	= libxfs_rmapbt_maxrecs,
 	},
+	{
+		.tag		= "rtrmapbt",
+		.maxlevels	= libxfs_rtrmapbt_maxlevels_ondisk,
+		.maxrecs	= libxfs_rtrmapbt_maxrecs,
+	},
 };
 
 static void
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index fcbcaaa11f1025..cf88656946ab1b 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -311,6 +311,7 @@
 #define xfs_rtfree_blocks		libxfs_rtfree_blocks
 #define xfs_update_rtsb			libxfs_update_rtsb
 #define xfs_rtrmapbt_droot_maxrecs	libxfs_rtrmapbt_droot_maxrecs
+#define xfs_rtrmapbt_maxlevels_ondisk	libxfs_rtrmapbt_maxlevels_ondisk
 #define xfs_rtrmapbt_maxrecs		libxfs_rtrmapbt_maxrecs
 
 #define xfs_sb_from_disk		libxfs_sb_from_disk
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index ddbabe36b5fc41..83a8734d3c0b47 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -535,8 +535,9 @@ .SH COMMANDS
 .IR finobt ,
 .IR bmapbt ,
 .IR refcountbt ,
+.IR rmapbt ,
 and
-.IR rmapbt .
+.IR rtrmapbt .
 The magic value
 .I all
 can be used to walk through all btree types.
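
The decision that dump_btree_inode() makes from the inode root header can be sketched on its own (illustration only, with hypothetical names).  The struct is a stand-in holding the two big-endian 16-bit header fields as raw bytes: an empty root prints nothing, a level-0 root prints its records directly, and anything taller descends through the first pointer.

```c
#include <stdint.h>

/* Stand-in for the start of an inode-rooted btree: big-endian level
 * followed by big-endian numrecs, kept as raw bytes. */
struct demo_root {
	uint8_t		level_be[2];
	uint8_t		numrecs_be[2];
};

enum demo_dump_action {
	DEMO_DUMP_NOTHING,	/* empty root */
	DEMO_DUMP_LEAF_RECS,	/* root is a leaf: print its records */
	DEMO_DUMP_DESCEND,	/* root is a node: follow ptrs[1] downward */
};

/* Decode a big-endian 16-bit value regardless of host byte order. */
static unsigned int demo_be16(const uint8_t *p)
{
	return ((unsigned int)p[0] << 8) | p[1];
}

static enum demo_dump_action demo_root_action(const struct demo_root *root)
{
	if (demo_be16(root->numrecs_be) == 0)
		return DEMO_DUMP_NOTHING;
	if (demo_be16(root->level_be) > 0)
		return DEMO_DUMP_DESCEND;
	return DEMO_DUMP_LEAF_RECS;
}
```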



* [PATCH 09/27] xfs_db: copy the realtime rmap btree
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (7 preceding siblings ...)
  2025-02-06 22:51   ` [PATCH 08/27] xfs_db: support the realtime rmapbt Darrick J. Wong
@ 2025-02-06 22:52   ` Darrick J. Wong
  2025-02-07  5:17     ` Christoph Hellwig
  2025-02-06 22:52   ` [PATCH 10/27] xfs_db: make fsmap query the realtime reverse mapping tree Darrick J. Wong
                     ` (17 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:52 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Copy the realtime rmapbt when we're metadumping the filesystem.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/metadump.c |  120 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)
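
Both scanfunc_rtrmapbt and process_rtrmap split each 64-bit btree
pointer into an AG number and an AG block number, and skip the pointer
if either half looks implausible.  The following is a minimal sketch of
that check, assuming a simplified geometry in which the AG number sits
above agblklog bits.  struct demo_geom and the demo_* helpers are
hypothetical stand-ins for struct xfs_mount, XFS_FSB_TO_AGNO and
XFS_FSB_TO_AGBNO; the comparisons use ">" only to mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified geometry; not the real struct xfs_mount. */
struct demo_geom {
	uint32_t	agblklog;	/* log2 of the AG size, rounded up */
	uint32_t	agblocks;	/* blocks per AG */
	uint32_t	agcount;	/* number of AGs */
};

/* The AG number is stored in the bits above agblklog. */
static uint32_t demo_fsb_to_agno(const struct demo_geom *g, uint64_t fsb)
{
	return (uint32_t)(fsb >> g->agblklog);
}

/* The block within the AG is stored in the low agblklog bits. */
static uint32_t demo_fsb_to_agbno(const struct demo_geom *g, uint64_t fsb)
{
	return (uint32_t)(fsb & ((1ULL << g->agblklog) - 1));
}

/* Same shape as the patch: reject block 0 and out-of-range halves. */
static bool demo_ptr_plausible(const struct demo_geom *g, uint64_t fsb)
{
	uint32_t	agno;
	uint32_t	agbno;

	assert(g->agblklog < 32);
	agno = demo_fsb_to_agno(g, fsb);
	agbno = demo_fsb_to_agbno(g, fsb);

	return !(agbno == 0 || agbno > g->agblocks || agno > g->agcount);
}
```

A caller in the style of the patch would "continue" past any pointer
for which demo_ptr_plausible() returns false and descend into the rest.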


diff --git a/db/metadump.c b/db/metadump.c
index 4f4b4f8a39a551..aa946746cdfa68 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -589,6 +589,55 @@ copy_rmap_btree(
 	return scan_btree(agno, root, levels, TYP_RMAPBT, agf, scanfunc_rmapbt);
 }
 
+static int
+scanfunc_rtrmapbt(
+	struct xfs_btree_block	*block,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	int			level,
+	typnm_t			btype,
+	void			*arg)
+{
+	xfs_rtrmap_ptr_t	*pp;
+	int			i;
+	int			numrecs;
+
+	if (level == 0)
+		return 1;
+
+	numrecs = be16_to_cpu(block->bb_numrecs);
+	if (numrecs > mp->m_rtrmap_mxr[1]) {
+		if (metadump.show_warnings)
+			print_warning("invalid numrecs (%u) in %s block %u/%u",
+				numrecs, typtab[btype].name, agno, agbno);
+		return 1;
+	}
+
+	pp = xfs_rtrmap_ptr_addr(block, 1, mp->m_rtrmap_mxr[1]);
+	for (i = 0; i < numrecs; i++) {
+		xfs_agnumber_t	pagno;
+		xfs_agblock_t	pbno;
+
+		pagno = XFS_FSB_TO_AGNO(mp, get_unaligned_be64(&pp[i]));
+		pbno = XFS_FSB_TO_AGBNO(mp, get_unaligned_be64(&pp[i]));
+
+		if (pbno == 0 || pbno > mp->m_sb.sb_agblocks ||
+		    pagno > mp->m_sb.sb_agcount) {
+			if (metadump.show_warnings)
+				print_warning("invalid block number (%u/%u) "
+						"in inode %llu %s block %u/%u",
+						pagno, pbno,
+						(unsigned long long)metadump.cur_ino,
+						typtab[btype].name, agno, agbno);
+			continue;
+		}
+		if (!scan_btree(pagno, pbno, level, btype, arg,
+				scanfunc_rtrmapbt))
+			return 0;
+	}
+	return 1;
+}
+
 static int
 scanfunc_refcntbt(
 	struct xfs_btree_block	*block,
@@ -2325,6 +2374,69 @@ process_exinode(
 			nex);
 }
 
+static int
+process_rtrmap(
+	struct xfs_dinode	*dip)
+{
+	int			whichfork = XFS_DATA_FORK;
+	struct xfs_rtrmap_root	*dib =
+		(struct xfs_rtrmap_root *)XFS_DFORK_PTR(dip, whichfork);
+	xfs_rtrmap_ptr_t	*pp;
+	int			level = be16_to_cpu(dib->bb_level);
+	int			nrecs = be16_to_cpu(dib->bb_numrecs);
+	typnm_t			btype = TYP_RTRMAPBT;
+	int			maxrecs;
+	int			i;
+
+	if (level > mp->m_rtrmap_maxlevels) {
+		if (metadump.show_warnings)
+			print_warning("invalid level (%u) in inode %llu %s "
+					"root", level,
+					(unsigned long long)metadump.cur_ino,
+					typtab[btype].name);
+		return 1;
+	}
+
+	if (level == 0)
+		return 1;
+
+	maxrecs = libxfs_rtrmapbt_droot_maxrecs(
+			XFS_DFORK_SIZE(dip, mp, whichfork),
+			false);
+	if (nrecs > maxrecs) {
+		if (metadump.show_warnings)
+			print_warning("invalid numrecs (%u) in inode %llu %s "
+					"root", nrecs,
+					(unsigned long long)metadump.cur_ino,
+					typtab[btype].name);
+		return 1;
+	}
+
+	pp = xfs_rtrmap_droot_ptr_addr(dib, 1, maxrecs);
+	for (i = 0; i < nrecs; i++) {
+		xfs_agnumber_t	ag;
+		xfs_agblock_t	bno;
+
+		ag = XFS_FSB_TO_AGNO(mp, get_unaligned_be64(&pp[i]));
+		bno = XFS_FSB_TO_AGBNO(mp, get_unaligned_be64(&pp[i]));
+
+		if (bno == 0 || bno > mp->m_sb.sb_agblocks ||
+				ag > mp->m_sb.sb_agcount) {
+			if (metadump.show_warnings)
+				print_warning("invalid block number (%u/%u) "
+						"in inode %llu %s root", ag,
+						bno,
+						(unsigned long long)metadump.cur_ino,
+						typtab[btype].name);
+			continue;
+		}
+
+		if (!scan_btree(ag, bno, level, btype, NULL, scanfunc_rtrmapbt))
+			return 0;
+	}
+	return 1;
+}
+
 static int
 process_inode_data(
 	struct xfs_dinode	*dip)
@@ -2366,6 +2478,14 @@ process_inode_data(
 
 	case XFS_DINODE_FMT_BTREE:
 		return process_btinode(dip, XFS_DATA_FORK);
+
+	case XFS_DINODE_FMT_META_BTREE:
+		switch (be16_to_cpu(dip->di_metatype)) {
+		case XFS_METAFILE_RTRMAP:
+			return process_rtrmap(dip);
+		default:
+			return 1;
+		}
 	}
 	return 1;
 }



* [PATCH 10/27] xfs_db: make fsmap query the realtime reverse mapping tree
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (8 preceding siblings ...)
  2025-02-06 22:52   ` [PATCH 09/27] xfs_db: copy the realtime rmap btree Darrick J. Wong
@ 2025-02-06 22:52   ` Darrick J. Wong
  2025-02-07  5:18     ` Christoph Hellwig
  2025-02-06 22:52   ` [PATCH 11/27] xfs_db: add an rgresv command Darrick J. Wong
                     ` (16 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:52 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Extend the 'fsmap' debugger command to support querying the realtime
rmap btree via a new -r argument.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/fsmap.c               |  149 +++++++++++++++++++++++++++++++++++++++++++++-
 libxfs/libxfs_api_defs.h |    3 +
 2 files changed, 148 insertions(+), 4 deletions(-)
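
fsmap_rt walks every rtgroup between the start and end block and
narrows the query keys only in the first and last group.  The following
is a minimal sketch of that clamping, assuming fixed-size groups where
the group number is a plain division.  The real code uses
xfs_rtb_to_rgno and xfs_rtb_to_rgbno; struct demo_range and
demo_group_range are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result: the part of one group that the query covers. */
struct demo_range {
	uint32_t	low;	/* first block within the group */
	uint32_t	high;	/* last block within the group */
};

/*
 * Compute the per-group query range for a request covering the global
 * blocks [start, end].  Only the first group honours the start offset
 * and only the last group honours the end offset; every group in
 * between is queried from block 0 to the maximum key.
 */
static struct demo_range
demo_group_range(uint64_t start, uint64_t end, uint32_t group_blocks,
		 uint32_t group)
{
	struct demo_range	r = { .low = 0, .high = UINT32_MAX };
	uint64_t		start_rg;
	uint64_t		end_rg;

	assert(group_blocks > 0);
	start_rg = start / group_blocks;
	end_rg = end / group_blocks;
	assert(group >= start_rg && group <= end_rg);

	if (group == start_rg)
		r.low = (uint32_t)(start % group_blocks);
	if (group == end_rg)
		r.high = (uint32_t)(end % group_blocks);
	return r;
}
```

With 10-block groups, a request for blocks 5 through 25 touches three
groups: the first from block 5 upward, the second in full, and the
third up to block 5.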


diff --git a/db/fsmap.c b/db/fsmap.c
index a9259c4632185b..ddbe4e6a3dfcfa 100644
--- a/db/fsmap.c
+++ b/db/fsmap.c
@@ -102,6 +102,134 @@ fsmap(
 	}
 }
 
+static int
+fsmap_rt_fn(
+	struct xfs_btree_cur		*cur,
+	const struct xfs_rmap_irec	*rec,
+	void				*priv)
+{
+	struct fsmap_info		*info = priv;
+
+	dbprintf(_("%llu: %u/%u len %u owner %lld offset %llu bmbt %d attrfork %d extflag %d\n"),
+		info->nr, cur->bc_group->xg_gno, rec->rm_startblock,
+		rec->rm_blockcount, rec->rm_owner, rec->rm_offset,
+		!!(rec->rm_flags & XFS_RMAP_BMBT_BLOCK),
+		!!(rec->rm_flags & XFS_RMAP_ATTR_FORK),
+		!!(rec->rm_flags & XFS_RMAP_UNWRITTEN));
+	info->nr++;
+
+	return 0;
+}
+
+static int
+fsmap_rtgroup(
+	struct xfs_rtgroup		*rtg,
+	const struct xfs_rmap_irec	*low,
+	const struct xfs_rmap_irec	*high,
+	struct fsmap_info		*info)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	struct xfs_trans	*tp;
+	struct xfs_btree_cur	*bt_cur;
+	int			error;
+
+	error = -libxfs_trans_alloc_empty(mp, &tp);
+	if (error) {
+		dbprintf(
+ _("Cannot alloc transaction to look up rtgroup %u rmap inode\n"),
+				rtg_rgno(rtg));
+		return error;
+	}
+
+	error = -libxfs_rtginode_load_parent(tp);
+	if (error) {
+		dbprintf(_("Cannot load realtime metadir, error %d\n"),
+			error);
+		goto out_trans;
+	}
+
+	error = -libxfs_rtginode_load(rtg, XFS_RTGI_RMAP, tp);
+	if (error) {
+		dbprintf(_("Cannot load rtgroup %u rmap inode, error %d\n"),
+			rtg_rgno(rtg), error);
+		goto out_rele_dp;
+	}
+
+	bt_cur = libxfs_rtrmapbt_init_cursor(tp, rtg);
+	if (!bt_cur) {
+		dbprintf(_("Not enough memory.\n"));
+		goto out_rele_ip;
+	}
+
+	error = -libxfs_rmap_query_range(bt_cur, low, high, fsmap_rt_fn,
+			info);
+	if (error) {
+		dbprintf(_("Error %d while querying rt fsmap btree.\n"),
+			error);
+		goto out_cur;
+	}
+
+out_cur:
+	libxfs_btree_del_cursor(bt_cur, error);
+out_rele_ip:
+	libxfs_rtginode_irele(&rtg->rtg_inodes[XFS_RTGI_RMAP]);
+out_rele_dp:
+	libxfs_rtginode_irele(&mp->m_rtdirip);
+out_trans:
+	libxfs_trans_cancel(tp);
+	return error;
+}
+
+static void
+fsmap_rt(
+	xfs_fsblock_t		start_fsb,
+	xfs_fsblock_t		end_fsb)
+{
+	struct fsmap_info	info;
+	xfs_daddr_t		eofs;
+	struct xfs_rmap_irec	low;
+	struct xfs_rmap_irec	high;
+	struct xfs_rtgroup	*rtg = NULL;
+	xfs_rgnumber_t		start_rg;
+	xfs_rgnumber_t		end_rg;
+	int			error;
+
+	if (mp->m_sb.sb_rblocks == 0)
+		return;
+
+	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks);
+	if (XFS_FSB_TO_DADDR(mp, end_fsb) >= eofs)
+		end_fsb = XFS_DADDR_TO_FSB(mp, eofs - 1);
+
+	low.rm_startblock = xfs_rtb_to_rgbno(mp, start_fsb);
+	low.rm_owner = 0;
+	low.rm_offset = 0;
+	low.rm_flags = 0;
+	high.rm_startblock = -1U;
+	high.rm_owner = ULLONG_MAX;
+	high.rm_offset = ULLONG_MAX;
+	high.rm_flags = XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK |
+			XFS_RMAP_UNWRITTEN;
+
+	start_rg = xfs_rtb_to_rgno(mp, start_fsb);
+	end_rg = xfs_rtb_to_rgno(mp, end_fsb);
+
+	info.nr = 0;
+	while ((rtg = xfs_rtgroup_next_range(mp, rtg, start_rg, end_rg))) {
+		if (rtg_rgno(rtg) == end_rg)
+			high.rm_startblock = xfs_rtb_to_rgbno(mp, end_fsb);
+
+		error = fsmap_rtgroup(rtg, &low, &high, &info);
+		if (error) {
+			libxfs_rtgroup_put(rtg);
+			return;
+		}
+
+		if (rtg_rgno(rtg) == start_rg)
+			low.rm_startblock = 0;
+	}
+}
+
 static int
 fsmap_f(
 	int			argc,
@@ -111,14 +239,18 @@ fsmap_f(
 	int			c;
 	xfs_fsblock_t		start_fsb = 0;
 	xfs_fsblock_t		end_fsb = NULLFSBLOCK;
+	bool			isrt = false;
 
 	if (!xfs_has_rmapbt(mp)) {
 		dbprintf(_("Filesystem does not support reverse mapping btree.\n"));
 		return 0;
 	}
 
-	while ((c = getopt(argc, argv, "")) != EOF) {
+	while ((c = getopt(argc, argv, "r")) != EOF) {
 		switch (c) {
+		case 'r':
+			isrt = true;
+			break;
 		default:
 			dbprintf(_("Bad option for fsmap command.\n"));
 			return 0;
@@ -141,14 +273,23 @@ fsmap_f(
 		}
 	}
 
-	fsmap(start_fsb, end_fsb);
+	if (argc > optind + 2) {
+		exitcode = 1;
+		dbprintf(_("Too many arguments to fsmap.\n"));
+		return 0;
+	}
+
+	if (isrt)
+		fsmap_rt(start_fsb, end_fsb);
+	else
+		fsmap(start_fsb, end_fsb);
 
 	return 0;
 }
 
 static const cmdinfo_t	fsmap_cmd =
-	{ "fsmap", NULL, fsmap_f, 0, 2, 0,
-	  N_("[start_fsb] [end_fsb]"),
+	{ "fsmap", NULL, fsmap_f, 0, -1, 0,
+	  N_("[-r] [start_fsb] [end_fsb]"),
 	  N_("display reverse mapping(s)"), NULL };
 
 void
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index cf88656946ab1b..3e521cd0c76063 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -295,6 +295,7 @@
 #define xfs_rtginode_name		libxfs_rtginode_name
 #define xfs_rtsummary_create		libxfs_rtsummary_create
 
+#define xfs_rtginode_irele		libxfs_rtginode_irele
 #define xfs_rtginode_load		libxfs_rtginode_load
 #define xfs_rtginode_load_parent	libxfs_rtginode_load_parent
 #define xfs_rtgroup_alloc		libxfs_rtgroup_alloc
@@ -310,8 +311,10 @@
 #define xfs_rtfree_extent		libxfs_rtfree_extent
 #define xfs_rtfree_blocks		libxfs_rtfree_blocks
 #define xfs_update_rtsb			libxfs_update_rtsb
+#define xfs_rtgroup_put			libxfs_rtgroup_put
 #define xfs_rtrmapbt_droot_maxrecs	libxfs_rtrmapbt_droot_maxrecs
 #define xfs_rtrmapbt_maxlevels_ondisk	libxfs_rtrmapbt_maxlevels_ondisk
+#define xfs_rtrmapbt_init_cursor	libxfs_rtrmapbt_init_cursor
 #define xfs_rtrmapbt_maxrecs		libxfs_rtrmapbt_maxrecs
 
 #define xfs_sb_from_disk		libxfs_sb_from_disk



* [PATCH 11/27] xfs_db: add an rgresv command
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (9 preceding siblings ...)
  2025-02-06 22:52   ` [PATCH 10/27] xfs_db: make fsmap query the realtime reverse mapping tree Darrick J. Wong
@ 2025-02-06 22:52   ` Darrick J. Wong
  2025-02-07  5:19     ` Christoph Hellwig
  2025-02-06 22:52   ` [PATCH 12/27] xfs_spaceman: report health status of the realtime rmap btree Darrick J. Wong
                     ` (15 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:52 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a command to dump rtgroup btree space reservations.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/info.c                |  119 ++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/libxfs_api_defs.h |    2 +
 man/man8/xfs_db.8        |    5 ++
 3 files changed, 126 insertions(+)
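
rgresv_f converts each argument with strtol and rejects anything that
is not a number or falls outside the rtgroup count.  The following is a
minimal sketch of that parsing step in isolation.  demo_parse_group is
a hypothetical helper, not part of xfs_db, and like the patch it does
not reject trailing characters after the number.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a group number from a string.  Returns 0 and stores the value
 * in *out on success, or -1 if the string does not start with a number
 * or the number is outside [0, count).
 */
static int demo_parse_group(const char *s, long count, long *out)
{
	char	*end;
	long	val;

	assert(s != NULL && out != NULL);

	/* strtol reports overflow only through errno, so clear it first. */
	errno = 0;
	val = strtol(s, &end, 0);
	if (end == s || errno != 0)
		return -1;

	if (val < 0 || val >= count)
		return -1;

	*out = val;
	return 0;
}
```

Base 0 lets strtol accept decimal, octal ("010") and hexadecimal
("0x2") input, which matches the call in the patch.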


diff --git a/db/info.c b/db/info.c
index 9a86d247839f84..ce6f358420a79d 100644
--- a/db/info.c
+++ b/db/info.c
@@ -151,9 +151,128 @@ static const struct cmdinfo agresv_cmd = {
 	.help =		agresv_help,
 };
 
+static void
+rgresv_help(void)
+{
+	dbprintf(_(
+"\n"
+" Print the size and per-rtgroup reservation information for some realtime allocation groups.\n"
+"\n"
+" Specific realtime allocation group numbers can be provided as command line\n"
+" arguments.  If no arguments are provided, all rtgroups are iterated.\n"
+"\n"
+));
+
+}
+
+static void
+print_rgresv_info(
+	struct xfs_rtgroup	*rtg)
+{
+	struct xfs_trans	*tp;
+	xfs_filblks_t		ask = 0;
+	xfs_filblks_t		used = 0;
+	int			error;
+
+	error = -libxfs_trans_alloc_empty(mp, &tp);
+	if (error) {
+		dbprintf(
+ _("Cannot alloc transaction to look up rtgroup %u rmap inode\n"),
+				rtg_rgno(rtg));
+		return;
+	}
+
+	error = -libxfs_rtginode_load_parent(tp);
+	if (error) {
+		dbprintf(_("Cannot load realtime metadir, error %d\n"),
+			error);
+		goto out_trans;
+	}
+
+	/* rtrmapbt */
+	error = -libxfs_rtginode_load(rtg, XFS_RTGI_RMAP, tp);
+	if (error) {
+		dbprintf(_("Cannot load rtgroup %u rmap inode, error %d\n"),
+			rtg_rgno(rtg), error);
+		goto out_rele_dp;
+	}
+	if (rtg_rmap(rtg))
+		used += rtg_rmap(rtg)->i_nblocks;
+	libxfs_rtginode_irele(&rtg->rtg_inodes[XFS_RTGI_RMAP]);
+
+	ask += libxfs_rtrmapbt_calc_reserves(mp);
+
+	printf(_("rtg %d: dblocks: %llu fdblocks: %llu reserved: %llu used: %llu"),
+			rtg_rgno(rtg),
+			(unsigned long long)mp->m_sb.sb_dblocks,
+			(unsigned long long)mp->m_sb.sb_fdblocks,
+			(unsigned long long)ask,
+			(unsigned long long)used);
+	if (ask - used > mp->m_sb.sb_fdblocks)
+		printf(_(" <not enough space>"));
+	printf("\n");
+out_rele_dp:
+	libxfs_rtginode_irele(&mp->m_rtdirip);
+out_trans:
+	libxfs_trans_cancel(tp);
+}
+
+static int
+rgresv_f(
+	int			argc,
+	char			**argv)
+{
+	struct xfs_rtgroup	*rtg = NULL;
+	int			i;
+
+	if (argc > 1) {
+		for (i = 1; i < argc; i++) {
+			long	a;
+			char	*p;
+
+			errno = 0;
+			a = strtol(argv[i], &p, 0);
+			if (p == argv[i])
+				errno = ERANGE;
+			if (errno) {
+				perror(argv[i]);
+				continue;
+			}
+
+			if (a < 0 || a >= mp->m_sb.sb_rgcount) {
+				fprintf(stderr, "%ld: Not a rtgroup.\n", a);
+				continue;
+			}
+
+			rtg = libxfs_rtgroup_get(mp, a);
+			print_rgresv_info(rtg);
+			libxfs_rtgroup_put(rtg);
+		}
+		return 0;
+	}
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg)))
+		print_rgresv_info(rtg);
+
+	return 0;
+}
+
+static const struct cmdinfo rgresv_cmd = {
+	.name =		"rgresv",
+	.altname =	NULL,
+	.cfunc =	rgresv_f,
+	.argmin =	0,
+	.argmax =	-1,
+	.canpush =	0,
+	.args =		NULL,
+	.oneline =	N_("print rtgroup reservation stats"),
+	.help =		rgresv_help,
+};
+
 void
 info_init(void)
 {
 	add_command(&info_cmd);
 	add_command(&agresv_cmd);
+	add_command(&rgresv_cmd);
 }
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 3e521cd0c76063..9beea4f7ce8535 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -311,7 +311,9 @@
 #define xfs_rtfree_extent		libxfs_rtfree_extent
 #define xfs_rtfree_blocks		libxfs_rtfree_blocks
 #define xfs_update_rtsb			libxfs_update_rtsb
+#define xfs_rtgroup_get			libxfs_rtgroup_get
 #define xfs_rtgroup_put			libxfs_rtgroup_put
+#define xfs_rtrmapbt_calc_reserves	libxfs_rtrmapbt_calc_reserves
 #define xfs_rtrmapbt_droot_maxrecs	libxfs_rtrmapbt_droot_maxrecs
 #define xfs_rtrmapbt_maxlevels_ondisk	libxfs_rtrmapbt_maxlevels_ondisk
 #define xfs_rtrmapbt_init_cursor	libxfs_rtrmapbt_init_cursor
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 83a8734d3c0b47..784aa3cb46c588 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -1131,6 +1131,11 @@ .SH COMMANDS
 Exit
 .BR xfs_db .
 .TP
+.BI "rgresv [" rgno ]
+Displays the per-rtgroup reservation size and per-rtgroup
+reservation usage for a given realtime allocation group.
+If no argument is given, display information for all rtgroups.
+.TP
 .BI "rtblock [" rtbno ]
 Set current address to the location on the realtime device given by
 .IR rtbno .



* [PATCH 12/27] xfs_spaceman: report health status of the realtime rmap btree
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (10 preceding siblings ...)
  2025-02-06 22:52   ` [PATCH 11/27] xfs_db: add an rgresv command Darrick J. Wong
@ 2025-02-06 22:52   ` Darrick J. Wong
  2025-02-07  5:19     ` Christoph Hellwig
  2025-02-06 22:53   ` [PATCH 13/27] xfs_repair: tidy up rmap_diffkeys Darrick J. Wong
                     ` (14 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:52 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add reporting of the rt rmap btree health to spaceman.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 spaceman/health.c |   10 ++++++++++
 1 file changed, 10 insertions(+)
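
The new table entry pairs a sick-flag mask with a has_fn predicate so
that the rtrmapbt status is only considered on filesystems that have a
realtime volume and rmapbt enabled.  The following is a minimal sketch
of how such a table might be consumed, assuming a loop that skips
entries whose predicate says the feature is absent.  All demo_* names
and DEMO_* constants are hypothetical; they are not the spaceman
structures or the real XFS flag values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical geometry and flag values, for illustration only. */
struct demo_geom {
	unsigned int		flags;
	unsigned long long	rtblocks;
};
#define DEMO_GEOM_RMAPBT	(1U << 0)
#define DEMO_SICK_SUMMARY	(1U << 0)
#define DEMO_SICK_RMAPBT	(1U << 1)

struct demo_flag_map {
	unsigned int	mask;
	bool		(*has_fn)(const struct demo_geom *g);
	const char	*descr;
};

static bool demo_has_rtrmapbt(const struct demo_geom *g)
{
	return g->rtblocks > 0 && (g->flags & DEMO_GEOM_RMAPBT);
}

/* A zero mask terminates the table, as in the patch. */
static const struct demo_flag_map demo_flags[] = {
	{ .mask = DEMO_SICK_SUMMARY, .descr = "realtime summary" },
	{ .mask = DEMO_SICK_RMAPBT, .has_fn = demo_has_rtrmapbt,
	  .descr = "realtime reverse mappings btree" },
	{0},
};

/* Count the entries that apply to this geometry and are marked sick. */
static int demo_count_sick(const struct demo_geom *g, unsigned int sick)
{
	const struct demo_flag_map	*f;
	int				n = 0;

	assert(g != NULL);
	for (f = demo_flags; f->mask != 0; f++) {
		/* Entries without a predicate always apply. */
		if (f->has_fn && !f->has_fn(g))
			continue;
		if (sick & f->mask)
			n++;
	}
	return n;
}
```

Keeping the feature test in the table means a new metadata type needs
only one more entry and, if necessary, one more predicate.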


diff --git a/spaceman/health.c b/spaceman/health.c
index 0d2767df424f27..5774d609a28de3 100644
--- a/spaceman/health.c
+++ b/spaceman/health.c
@@ -42,6 +42,11 @@ static bool has_reflink(const struct xfs_fsop_geom *g)
 	return g->flags & XFS_FSOP_GEOM_FLAGS_REFLINK;
 }
 
+static bool has_rtrmapbt(const struct xfs_fsop_geom *g)
+{
+	return g->rtblocks > 0 && (g->flags & XFS_FSOP_GEOM_FLAGS_RMAPBT);
+}
+
 struct flag_map {
 	unsigned int		mask;
 	bool			(*has_fn)(const struct xfs_fsop_geom *g);
@@ -146,6 +151,11 @@ static const struct flag_map rtgroup_flags[] = {
 		.mask = XFS_RTGROUP_GEOM_SICK_SUMMARY,
 		.descr = "realtime summary",
 	},
+	{
+		.mask = XFS_RTGROUP_GEOM_SICK_RMAPBT,
+		.descr = "realtime reverse mappings btree",
+		.has_fn = has_rtrmapbt,
+	},
 	{0},
 };
 



* [PATCH 13/27] xfs_repair: tidy up rmap_diffkeys
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (11 preceding siblings ...)
  2025-02-06 22:52   ` [PATCH 12/27] xfs_spaceman: report health status of the realtime rmap btree Darrick J. Wong
@ 2025-02-06 22:53   ` Darrick J. Wong
  2025-02-07  5:20     ` Christoph Hellwig
  2025-02-06 22:53   ` [PATCH 14/27] xfs_repair: flag suspect long-format btree blocks Darrick J. Wong
                     ` (13 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:53 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Tidy up the comparison code in this function to match the kernel.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 repair/rmap.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
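
The patch replaces a subtraction-based comparison of rm_startblock with
explicit greater-than tests.  One property of the explicit form, shown
here as a sketch and not as the stated motivation for the patch, is
that it never narrows a 64-bit difference to the int return type.  For
32-bit unsigned inputs that difference can be as large as 4294967295,
and converting such a value to int is implementation-defined in C.
struct demo_key and demo_diffkeys are hypothetical simplifications of
the rmap key and of rmap_diffkeys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified key; the real key also carries flags. */
struct demo_key {
	uint32_t	startblock;
	uint64_t	owner;
	uint64_t	offset;
};

/*
 * Three-way comparison using only relational operators.  Fields are
 * compared in order of significance, and the result is always exactly
 * -1, 0 or 1, whatever the magnitude of the inputs.
 */
static int demo_diffkeys(const struct demo_key *k1, const struct demo_key *k2)
{
	assert(k1 != NULL && k2 != NULL);

	if (k1->startblock > k2->startblock)
		return 1;
	else if (k2->startblock > k1->startblock)
		return -1;

	if (k1->owner > k2->owner)
		return 1;
	else if (k2->owner > k1->owner)
		return -1;

	if (k1->offset > k2->offset)
		return 1;
	else if (k2->offset > k1->offset)
		return -1;
	return 0;
}
```

The first test below is the interesting one: the start blocks differ by
UINT32_MAX, and the comparison still reports the correct sign.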


diff --git a/repair/rmap.c b/repair/rmap.c
index bd91c721e20e4e..2065bdc0b190ba 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -1267,7 +1267,6 @@ rmap_diffkeys(
 {
 	__u64			oa;
 	__u64			ob;
-	int64_t			d;
 	struct xfs_rmap_irec	tmp;
 
 	tmp = *kp1;
@@ -1277,9 +1276,10 @@ rmap_diffkeys(
 	tmp.rm_flags &= ~XFS_RMAP_REC_FLAGS;
 	ob = libxfs_rmap_irec_offset_pack(&tmp);
 
-	d = (int64_t)kp1->rm_startblock - kp2->rm_startblock;
-	if (d)
-		return d;
+	if (kp1->rm_startblock > kp2->rm_startblock)
+		return 1;
+	else if (kp2->rm_startblock > kp1->rm_startblock)
+		return -1;
 
 	if (kp1->rm_owner > kp2->rm_owner)
 		return 1;



* [PATCH 14/27] xfs_repair: flag suspect long-format btree blocks
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (12 preceding siblings ...)
  2025-02-06 22:53   ` [PATCH 13/27] xfs_repair: tidy up rmap_diffkeys Darrick J. Wong
@ 2025-02-06 22:53   ` Darrick J. Wong
  2025-02-07  5:21     ` Christoph Hellwig
  2025-02-06 22:53   ` [PATCH 15/27] xfs_repair: use realtime rmap btree data to check block types Darrick J. Wong
                     ` (12 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:53 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Pass a "suspect" counter through scan_lbtree just like we do for
short-format btree blocks, and increment its value when we encounter
blocks with bad CRCs or outright corruption.  This way, repair actually
catches bmbt blocks with bad CRCs or other verifier errors.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 repair/dinode.c |    2 +-
 repair/scan.c   |   15 ++++++++++++---
 repair/scan.h   |    3 +++
 3 files changed, 16 insertions(+), 4 deletions(-)
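
The suspect counter is passed by value down the recursion, so a bad
block makes every scan beneath it, and therefore the scan as a whole,
return nonzero.  The following is a minimal sketch of that propagation
on a toy two-way tree.  struct demo_node and demo_scan are hypothetical
and stand in for the buffer read in scan_lbtree and the walk in
scan_bmapbt; the "bad" flag models a block that failed its CRC or
verifier check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tree node; "bad" models a failed CRC or verifier. */
struct demo_node {
	bool			bad;
	const struct demo_node	*child[2];
};

/*
 * Scan a subtree.  The counter is bumped for a bad node and handed to
 * the children by value, so it only ever describes the path from the
 * root to the current node.  Returns 1 if anything was suspect.
 */
static int demo_scan(const struct demo_node *n, int suspect)
{
	int	i;

	assert(n != NULL);
	if (n->bad)
		suspect++;

	for (i = 0; i < 2; i++) {
		if (n->child[i] && demo_scan(n->child[i], suspect))
			return 1;
	}
	return suspect > 0 ? 1 : 0;
}
```

Callers start the walk with a count of zero, which matches the new
argument that the patch adds to the scan_lbtree call in dinode.c.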


diff --git a/repair/dinode.c b/repair/dinode.c
index 9ab193bc5fe973..4eafb2324909e1 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -923,7 +923,7 @@ _("bad bmap btree ptr 0x%" PRIx64 " in ino %" PRIu64 "\n"),
 
 		if (scan_lbtree(get_unaligned_be64(&pp[i]), level, scan_bmapbt,
 				type, whichfork, lino, tot, nex, blkmapp,
-				&cursor, 1, check_dups, magic,
+				&cursor, 0, 1, check_dups, magic,
 				(void *)zap_metadata, &xfs_bmbt_buf_ops))
 			return(1);
 		/*
diff --git a/repair/scan.c b/repair/scan.c
index 88fbda6b83f61a..cd44a9b14f3a1c 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -136,6 +136,7 @@ scan_lbtree(
 				xfs_extnum_t		*nex,
 				blkmap_t		**blkmapp,
 				bmap_cursor_t		*bm_cursor,
+				int			suspect,
 				int			isroot,
 				int			check_dups,
 				int			*dirty,
@@ -148,6 +149,7 @@ scan_lbtree(
 	xfs_extnum_t	*nex,
 	blkmap_t	**blkmapp,
 	bmap_cursor_t	*bm_cursor,
+	int		suspect,
 	int		isroot,
 	int		check_dups,
 	uint64_t	magic,
@@ -167,6 +169,12 @@ scan_lbtree(
 			XFS_FSB_TO_AGBNO(mp, root));
 		return(1);
 	}
+	if (bp->b_error == -EFSBADCRC || bp->b_error == -EFSCORRUPTED) {
+		do_warn(_("btree block %d/%d is suspect, error %d\n"),
+			XFS_FSB_TO_AGNO(mp, root),
+			XFS_FSB_TO_AGBNO(mp, root), bp->b_error);
+		suspect++;
+	}
 
 	/*
 	 * only check for bad CRC here - caller will determine if there
@@ -182,7 +190,7 @@ scan_lbtree(
 
 	err = (*func)(XFS_BUF_TO_BLOCK(bp), nlevels - 1,
 			type, whichfork, root, ino, tot, nex, blkmapp,
-			bm_cursor, isroot, check_dups, &dirty,
+			bm_cursor, suspect, isroot, check_dups, &dirty,
 			magic, priv);
 
 	ASSERT(dirty == 0 || (dirty && !no_modify));
@@ -209,6 +217,7 @@ scan_bmapbt(
 	xfs_extnum_t		*nex,
 	blkmap_t		**blkmapp,
 	bmap_cursor_t		*bm_cursor,
+	int			suspect,
 	int			isroot,
 	int			check_dups,
 	int			*dirty,
@@ -505,7 +514,7 @@ _("bad bmap btree ptr 0x%llx in ino %" PRIu64 "\n"),
 
 		err = scan_lbtree(be64_to_cpu(pp[i]), level, scan_bmapbt,
 				type, whichfork, ino, tot, nex, blkmapp,
-				bm_cursor, 0, check_dups, magic, priv,
+				bm_cursor, suspect, 0, check_dups, magic, priv,
 				&xfs_bmbt_buf_ops);
 		if (err)
 			return(1);
@@ -573,7 +582,7 @@ _("bad fwd (right) sibling pointer (saw %" PRIu64 " should be NULLFSBLOCK)\n"
 				be64_to_cpu(pkey[numrecs - 1].br_startoff);
 	}
 
-	return(0);
+	return suspect > 0 ? 1 : 0;
 }
 
 static void
diff --git a/repair/scan.h b/repair/scan.h
index 4da788becbef66..aeaf9f1a7f4ba9 100644
--- a/repair/scan.h
+++ b/repair/scan.h
@@ -23,6 +23,7 @@ int scan_lbtree(
 				xfs_extnum_t		*nex,
 				struct blkmap		**blkmapp,
 				bmap_cursor_t		*bm_cursor,
+				int			suspect,
 				int			isroot,
 				int			check_dups,
 				int			*dirty,
@@ -35,6 +36,7 @@ int scan_lbtree(
 	xfs_extnum_t	*nex,
 	struct blkmap	**blkmapp,
 	bmap_cursor_t	*bm_cursor,
+	int		suspect,
 	int		isroot,
 	int		check_dups,
 	uint64_t	magic,
@@ -52,6 +54,7 @@ int scan_bmapbt(
 	xfs_extnum_t		*nex,
 	struct blkmap		**blkmapp,
 	bmap_cursor_t		*bm_cursor,
+	int			suspect,
 	int			isroot,
 	int			check_dups,
 	int			*dirty,



* [PATCH 15/27] xfs_repair: use realtime rmap btree data to check block types
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (13 preceding siblings ...)
  2025-02-06 22:53   ` [PATCH 14/27] xfs_repair: flag suspect long-format btree blocks Darrick J. Wong
@ 2025-02-06 22:53   ` Darrick J. Wong
  2025-02-07  5:22     ` Christoph Hellwig
  2025-02-06 22:53   ` [PATCH 16/27] xfs_repair: create a new set of incore rmap information for rt groups Darrick J. Wong
                     ` (11 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:53 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Use the realtime rmap btree to pre-populate the block type information
so that when repair iterates the primary metadata, we can confirm the
block type.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 repair/dinode.c |  202 ++++++++++++++++++++++++++++
 repair/rmap.c   |    9 +
 repair/rmap.h   |    2 
 repair/rt.h     |    4 +
 repair/scan.c   |  390 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 repair/scan.h   |   34 +++++
 6 files changed, 631 insertions(+), 10 deletions(-)
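
process_rtrmap_reclist rejects records whose start block or length
would place any part of the extent outside the rtgroup.  The following
is a minimal sketch of those two bounds checks on plain integers.
demo_rec_in_bounds is a hypothetical helper, and it computes the end of
the extent in 64 bits so that the sum cannot wrap; that widening is a
choice made for this sketch and is not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the extent [start, start + len) is non-empty and lies
 * entirely inside a group of group_blocks blocks.
 */
static bool demo_rec_in_bounds(uint32_t start, uint32_t len,
			       uint32_t group_blocks)
{
	/* Widened so that start + len cannot overflow 32 bits. */
	uint64_t	end = (uint64_t)start + len;

	assert(group_blocks > 0);

	/* The first block must exist in the group. */
	if (start >= group_blocks)
		return false;

	/* Zero-length records are invalid, and so is a last block past the end. */
	if (len == 0 || end - 1 >= group_blocks)
		return false;

	return true;
}
```

In a 10-block group, a record covering block 9 alone is accepted, while
a two-block record starting at block 9 is rejected.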


diff --git a/repair/dinode.c b/repair/dinode.c
index 4eafb2324909e1..641beab333f793 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -828,6 +828,160 @@ get_agino_buf(
  * first, one utility routine for each type of inode
  */
 
+/*
+ * return 1 if inode should be cleared, 0 otherwise
+ */
+static int
+process_rtrmap(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		ino,
+	struct xfs_dinode	*dip,
+	int			type,
+	int			*dirty,
+	xfs_rfsblock_t		*tot,
+	uint64_t		*nex,
+	blkmap_t		**blkmapp,
+	int			check_dups)
+{
+	struct xfs_rmap_irec	oldkey;
+	struct xfs_rmap_irec	key;
+	struct rmap_priv	priv;
+	struct xfs_rtrmap_root	*dib;
+	xfs_rtrmap_ptr_t	*pp;
+	struct xfs_rmap_key	*kp;
+	struct xfs_rmap_rec	*rp;
+	char			*forkname = get_forkname(XFS_DATA_FORK);
+	xfs_ino_t		lino;
+	xfs_fsblock_t		bno;
+	size_t			droot_sz;
+	int			i;
+	int			level;
+	int			numrecs;
+	int			dmxr;
+	int			suspect = 0;
+	int			error;
+
+	/* We rebuild the rtrmapbt, so no need to process blocks again. */
+	if (check_dups) {
+		*tot = be64_to_cpu(dip->di_nblocks);
+		return 0;
+	}
+
+	lino = XFS_AGINO_TO_INO(mp, agno, ino);
+
+	/* This rmap btree inode must be a metadata inode. */
+	if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))) {
+		do_warn(
+_("rtrmap inode %" PRIu64 " not flagged as metadata\n"),
+			lino);
+		return 1;
+	}
+
+	if (!is_rtrmap_inode(lino)) {
+		do_warn(
+_("could not associate rtrmap inode %" PRIu64 " with any rtgroup\n"),
+			lino);
+		return 1;
+	}
+
+	memset(&priv.high_key, 0xFF, sizeof(priv.high_key));
+	priv.high_key.rm_blockcount = 0;
+	priv.agcnts = NULL;
+	priv.last_rec.rm_owner = XFS_RMAP_OWN_UNKNOWN;
+
+	dib = (struct xfs_rtrmap_root *)XFS_DFORK_PTR(dip, XFS_DATA_FORK);
+	*tot = 0;
+	*nex = 0;
+
+	level = be16_to_cpu(dib->bb_level);
+	numrecs = be16_to_cpu(dib->bb_numrecs);
+
+	if (level > mp->m_rtrmap_maxlevels) {
+		do_warn(
+_("bad level %d in inode %" PRIu64 " rtrmap btree root block\n"),
+			level, lino);
+		return 1;
+	}
+
+	/*
+	 * use rtroot/dfork_dsize since the root block is in the data fork
+	 */
+	droot_sz = xfs_rtrmap_droot_space_calc(level, numrecs);
+	if (droot_sz > XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK)) {
+		do_warn(
+_("computed size of rtrmapbt root (%zu bytes) is greater than space in "
+	  "inode %" PRIu64 " %s fork\n"),
+				droot_sz, lino, forkname);
+		return 1;
+	}
+
+	if (level == 0) {
+		rp = xfs_rtrmap_droot_rec_addr(dib, 1);
+		error = process_rtrmap_reclist(mp, rp, numrecs,
+				&priv.last_rec, NULL, "rtrmapbt root");
+		if (error) {
+			rmap_avoid_check();
+			return 1;
+		}
+		return 0;
+	}
+
+	dmxr = libxfs_rtrmapbt_droot_maxrecs(
+			XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK), false);
+	pp = xfs_rtrmap_droot_ptr_addr(dib, 1, dmxr);
+
+	/* check for in-order keys */
+	for (i = 0; i < numrecs; i++)  {
+		kp = xfs_rtrmap_droot_key_addr(dib, i + 1);
+
+		key.rm_flags = 0;
+		key.rm_startblock = be32_to_cpu(kp->rm_startblock);
+		key.rm_owner = be64_to_cpu(kp->rm_owner);
+		if (libxfs_rmap_irec_offset_unpack(be64_to_cpu(kp->rm_offset),
+				&key)) {
+			/* Look for impossible flags. */
+			do_warn(
+_("invalid flags in key %u of rtrmap root ino %" PRIu64 "\n"),
+				i, lino);
+			suspect++;
+			continue;
+		}
+		if (i == 0) {
+			oldkey = key;
+			continue;
+		}
+		if (rmap_diffkeys(&oldkey, &key) > 0) {
+			do_warn(
+_("out of order key %u in rtrmap root ino %" PRIu64 "\n"),
+				i, lino);
+			suspect++;
+			continue;
+		}
+		oldkey = key;
+	}
+
+	/* probe pointers */
+	for (i = 0; i < numrecs; i++)  {
+		bno = get_unaligned_be64(&pp[i]);
+
+		if (!libxfs_verify_fsbno(mp, bno))  {
+			do_warn(
+_("bad rtrmap btree ptr 0x%" PRIx64 " in ino %" PRIu64 "\n"),
+				bno, lino);
+			return 1;
+		}
+
+		if (scan_lbtree(bno, level, scan_rtrmapbt,
+				type, XFS_DATA_FORK, lino, tot, nex, blkmapp,
+				NULL, 0, 1, check_dups, XFS_RTRMAP_CRC_MAGIC,
+				&priv, &xfs_rtrmapbt_buf_ops))
+			return 1;
+	}
+
+	return suspect ? 1 : 0;
+}
+
 /*
  * return 1 if inode should be cleared, 0 otherwise
  */
@@ -1606,7 +1760,7 @@ static int
 check_dinode_mode_format(
 	struct xfs_dinode	*dinoc)
 {
-	if (dinoc->di_format >= XFS_DINODE_FMT_UUID)
+	if (dinoc->di_format == XFS_DINODE_FMT_UUID)
 		return -1;	/* FMT_UUID is not used */
 
 	switch (dinode_fmt(dinoc)) {
@@ -1621,8 +1775,19 @@ check_dinode_mode_format(
 			dinoc->di_format > XFS_DINODE_FMT_BTREE) ? -1 : 0;
 
 	case S_IFREG:
-		return (dinoc->di_format < XFS_DINODE_FMT_EXTENTS ||
-			dinoc->di_format > XFS_DINODE_FMT_BTREE) ? -1 : 0;
+		switch (dinoc->di_format) {
+		case XFS_DINODE_FMT_EXTENTS:
+		case XFS_DINODE_FMT_BTREE:
+			return 0;
+		case XFS_DINODE_FMT_META_BTREE:
+			switch (be16_to_cpu(dinoc->di_metatype)) {
+			case XFS_METAFILE_RTRMAP:
+				return 0;
+			default:
+				return -1;
+			}
+		}
+		return -1;
 
 	case S_IFLNK:
 		return (dinoc->di_format < XFS_DINODE_FMT_LOCAL ||
@@ -2046,6 +2211,22 @@ process_inode_data_fork(
 			totblocks, nextents, dblkmap, XFS_DATA_FORK,
 			check_dups, zap_metadata);
 		break;
+	case XFS_DINODE_FMT_META_BTREE:
+		switch (be16_to_cpu(dino->di_metatype)) {
+		case XFS_METAFILE_RTRMAP:
+			err = process_rtrmap(mp, agno, ino, dino, type, dirty,
+					totblocks, nextents, dblkmap,
+					check_dups);
+			break;
+		default:
+			do_error(
+ _("unknown meta btree type %d, ino %" PRIu64 " (mode = %d)\n"),
+					be16_to_cpu(dino->di_metatype), lino,
+					be16_to_cpu(dino->di_mode));
+			err = 1;
+			break;
+		}
+		break;
 	case XFS_DINODE_FMT_DEV:
 		err = 0;
 		break;
@@ -2107,6 +2288,21 @@ _("would have tried to rebuild inode %"PRIu64" data fork\n"),
 		case XFS_DINODE_FMT_DEV:
 			err = 0;
 			break;
+		case XFS_DINODE_FMT_META_BTREE:
+			switch (be16_to_cpu(dino->di_metatype)) {
+			case XFS_METAFILE_RTRMAP:
+				err = 0;
+				break;
+			default:
+				do_error(
+ _("unknown meta btree type %d, ino %" PRIu64 " (mode = %d)\n"),
+						be16_to_cpu(dino->di_metatype),
+						lino,
+						be16_to_cpu(dino->di_mode));
+				err = 1;
+				break;
+			}
+			break;
 		default:
 			do_error(_("unknown format %d, ino %" PRIu64 " (mode = %d)\n"),
 				dino->di_format, lino,
diff --git a/repair/rmap.c b/repair/rmap.c
index 2065bdc0b190ba..36e4c9858fba03 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -123,6 +123,15 @@ rmaps_init_ag(
 _("Insufficient memory while allocating realtime reverse mapping btree."));
 }
 
+xfs_rgnumber_t
+rtgroup_for_rtrmap_inode(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino)
+{
+	/* This will be implemented later. */
+	return NULLRGNUMBER;
+}
+
 /*
  * Initialize per-AG reverse map data.
  */
diff --git a/repair/rmap.h b/repair/rmap.h
index 683a51af3fc511..57e5d5b216650e 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -55,4 +55,6 @@ int rmap_init_mem_cursor(struct xfs_mount *mp, struct xfs_trans *tp,
 		xfs_agnumber_t agno, struct xfs_btree_cur **rmcurp);
 int rmap_get_mem_rec(struct xfs_btree_cur *rmcur, struct xfs_rmap_irec *irec);
 
+xfs_rgnumber_t rtgroup_for_rtrmap_inode(struct xfs_mount *mp, xfs_ino_t ino);
+
 #endif /* RMAP_H_ */
diff --git a/repair/rt.h b/repair/rt.h
index 865d950b2bf3c4..13558706e4ec15 100644
--- a/repair/rt.h
+++ b/repair/rt.h
@@ -29,6 +29,10 @@ static inline bool is_rtsummary_inode(xfs_ino_t ino)
 {
 	return is_rtgroup_inode(ino, XFS_RTGI_SUMMARY);
 }
+static inline bool is_rtrmap_inode(xfs_ino_t ino)
+{
+	return is_rtgroup_inode(ino, XFS_RTGI_RMAP);
+}
 
 void mark_rtgroup_inodes_bad(struct xfs_mount *mp, enum xfs_rtg_inodes type);
 bool rtgroup_inodes_were_bad(enum xfs_rtg_inodes type);
diff --git a/repair/scan.c b/repair/scan.c
index cd44a9b14f3a1c..386aaa15f78c33 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -949,13 +949,6 @@ _("unknown block (%d,%d-%d) mismatch on %s tree, state - %d,%" PRIx64 "\n"),
 	}
 }
 
-struct rmap_priv {
-	struct aghdr_cnts	*agcnts;
-	struct xfs_rmap_irec	high_key;
-	struct xfs_rmap_irec	last_rec;
-	xfs_agblock_t		nr_blocks;
-};
-
 static bool
 rmap_in_order(
 	xfs_agblock_t	b,
@@ -1358,6 +1351,389 @@ _("out of order key %u in %s btree block (%u/%u)\n"),
 		rmap_avoid_check();
 }
 
+int
+process_rtrmap_reclist(
+	struct xfs_mount	*mp,
+	struct xfs_rmap_rec	*rp,
+	int			numrecs,
+	struct xfs_rmap_irec	*last_rec,
+	struct xfs_rmap_irec	*high_key,
+	const char		*name)
+{
+	int			suspect = 0;
+	int			i;
+	struct xfs_rmap_irec	oldkey;
+	struct xfs_rmap_irec	key;
+
+	for (i = 0; i < numrecs; i++) {
+		xfs_rgblock_t		b, end;
+		xfs_extlen_t		len;
+		uint64_t		owner, offset;
+
+		b = be32_to_cpu(rp[i].rm_startblock);
+		len = be32_to_cpu(rp[i].rm_blockcount);
+		owner = be64_to_cpu(rp[i].rm_owner);
+		offset = be64_to_cpu(rp[i].rm_offset);
+
+		key.rm_flags = 0;
+		key.rm_startblock = b;
+		key.rm_blockcount = len;
+		key.rm_owner = owner;
+		if (libxfs_rmap_irec_offset_unpack(offset, &key)) {
+			/* Look for impossible flags. */
+			do_warn(
+_("invalid flags in record %u of %s\n"),
+				i, name);
+			suspect++;
+			continue;
+		}
+
+
+		end = key.rm_startblock + key.rm_blockcount;
+
+		/* Make sure startblock & len make sense. */
+		if (b >= mp->m_groups[XG_TYPE_RTG].blocks) {
+			do_warn(
+_("invalid start block %llu in record %u of %s\n"),
+				(unsigned long long)b, i, name);
+			suspect++;
+			continue;
+		}
+		if (len == 0 || end - 1 >= mp->m_groups[XG_TYPE_RTG].blocks) {
+			do_warn(
+_("invalid length %llu in record %u of %s\n"),
+				(unsigned long long)len, i, name);
+			suspect++;
+			continue;
+		}
+
+		/* We only store file data and superblocks in the rtrmap. */
+		if (XFS_RMAP_NON_INODE_OWNER(owner) &&
+		    owner != XFS_RMAP_OWN_FS) {
+			do_warn(
+_("invalid owner %lld in record %u of %s\n"),
+				(long long int)owner, i, name);
+			suspect++;
+			continue;
+		}
+
+		/* Look for impossible record field combinations. */
+		if (key.rm_flags & XFS_RMAP_KEY_FLAGS) {
+			do_warn(
+_("record %d cannot have attr fork/key flags in %s\n"),
+					i, name);
+			suspect++;
+			continue;
+		}
+
+		/* Check for out of order records. */
+		if (i == 0)
+			oldkey = key;
+		else {
+			if (rmap_diffkeys(&oldkey, &key) > 0)
+				do_warn(
+_("out-of-order record %d (%llu %"PRId64" %"PRIu64" %llu) in %s\n"),
+				i, (unsigned long long)b, owner, offset,
+				(unsigned long long)len, name);
+			else
+				oldkey = key;
+		}
+
+		/* Is this mergeable with the previous record? */
+		if (rmaps_are_mergeable(last_rec, &key)) {
+			do_warn(
+_("record %d in %s should be merged with previous record\n"),
+				i, name);
+			last_rec->rm_blockcount += key.rm_blockcount;
+		} else
+			*last_rec = key;
+
+		/* Check that we don't go past the high key. */
+		key.rm_startblock += key.rm_blockcount - 1;
+		key.rm_offset += key.rm_blockcount - 1;
+		key.rm_blockcount = 0;
+		if (high_key && rmap_diffkeys(&key, high_key) > 0) {
+			do_warn(
+_("record %d greater than high key of %s\n"),
+				i, name);
+			suspect++;
+		}
+	}
+
+	return suspect;
+}
+
+int
+scan_rtrmapbt(
+	struct xfs_btree_block	*block,
+	int			level,
+	int			type,
+	int			whichfork,
+	xfs_fsblock_t		fsbno,
+	xfs_ino_t		ino,
+	xfs_rfsblock_t		*tot,
+	uint64_t		*nex,
+	blkmap_t		**blkmapp,
+	bmap_cursor_t		*bm_cursor,
+	int			suspect,
+	int			isroot,
+	int			check_dups,
+	int			*dirty,
+	uint64_t		magic,
+	void			*priv)
+{
+	const char		*name = "rtrmap";
+	char			rootname[256];
+	int			i;
+	xfs_rtrmap_ptr_t	*pp;
+	struct xfs_rmap_rec	*rp;
+	struct rmap_priv	*rmap_priv = priv;
+	int			hdr_errors = 0;
+	int			numrecs;
+	int			state;
+	struct xfs_rmap_key	*kp;
+	struct xfs_rmap_irec	oldkey;
+	struct xfs_rmap_irec	key;
+	xfs_agnumber_t		agno;
+	xfs_agblock_t		agbno;
+	int			error;
+
+	agno = XFS_FSB_TO_AGNO(mp, fsbno);
+	agbno = XFS_FSB_TO_AGBNO(mp, fsbno);
+
+	/* If anything here is bad, just bail. */
+	if (be32_to_cpu(block->bb_magic) != magic) {
+		do_warn(
+_("bad magic # %#x in inode %" PRIu64 " %s block %" PRIu64 "\n"),
+			be32_to_cpu(block->bb_magic), ino, name, fsbno);
+		return 1;
+	}
+	if (be16_to_cpu(block->bb_level) != level) {
+		do_warn(
+_("expected level %d got %d in inode %" PRIu64 ", %s block %" PRIu64 "\n"),
+			level, be16_to_cpu(block->bb_level),
+			ino, name, fsbno);
+		return 1;
+	}
+
+	/* verify owner */
+	if (be64_to_cpu(block->bb_u.l.bb_owner) != ino) {
+		do_warn(
+_("expected owner inode %" PRIu64 ", got %llu, %s block %" PRIu64 "\n"),
+			ino,
+			(unsigned long long)be64_to_cpu(block->bb_u.l.bb_owner),
+			name, fsbno);
+		return 1;
+	}
+	/* verify block number */
+	if (be64_to_cpu(block->bb_u.l.bb_blkno) !=
+	    XFS_FSB_TO_DADDR(mp, fsbno)) {
+		do_warn(
+_("expected block %" PRIu64 ", got %llu, %s block %" PRIu64 "\n"),
+			XFS_FSB_TO_DADDR(mp, fsbno),
+			(unsigned long long)be64_to_cpu(block->bb_u.l.bb_blkno),
+			name, fsbno);
+		return 1;
+	}
+	/* verify uuid */
+	if (platform_uuid_compare(&block->bb_u.l.bb_uuid,
+				  &mp->m_sb.sb_meta_uuid) != 0) {
+		do_warn(
+_("wrong FS UUID, %s block %" PRIu64 "\n"),
+			name, fsbno);
+		return 1;
+	}
+
+	/*
+	 * Check for btree blocks multiply claimed.  We're going to regenerate
+	 * the rtrmap anyway, so mark the blocks as metadata so they get freed.
+	 */
+	state = get_bmap(agno, agbno);
+	if (!(state == XR_E_UNKNOWN || state == XR_E_INUSE1))  {
+		do_warn(
+_("%s btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
+				name, state, agno, agbno, suspect);
+		suspect++;
+		goto out;
+	}
+	set_bmap(agno, agbno, XR_E_METADATA);
+
+	numrecs = be16_to_cpu(block->bb_numrecs);
+
+	/*
+	 * All realtime rmap btree blocks are freed for a fully empty
+	 * filesystem, thus they are counted towards the free data
+	 * block counter.  The root lives in an inode and is thus not
+	 * counted.
+	 */
+	(*tot)++;
+
+	if (level == 0) {
+		if (numrecs > mp->m_rtrmap_mxr[0])  {
+			numrecs = mp->m_rtrmap_mxr[0];
+			hdr_errors++;
+		}
+		if (isroot == 0 && numrecs < mp->m_rtrmap_mnr[0])  {
+			numrecs = mp->m_rtrmap_mnr[0];
+			hdr_errors++;
+		}
+
+		if (hdr_errors) {
+			do_warn(
+_("bad btree nrecs (%u, min=%u, max=%u) in bt%s block %u/%u\n"),
+				be16_to_cpu(block->bb_numrecs),
+				mp->m_rtrmap_mnr[0], mp->m_rtrmap_mxr[0],
+				name, agno, agbno);
+			suspect++;
+		}
+
+		rp = xfs_rtrmap_rec_addr(block, 1);
+		snprintf(rootname, 256, "%s btree block %u/%u", name, agno, agbno);
+		error = process_rtrmap_reclist(mp, rp, numrecs,
+				&rmap_priv->last_rec, &rmap_priv->high_key,
+				rootname);
+		if (error)
+			suspect++;
+		goto out;
+	}
+
+	/*
+	 * interior record
+	 */
+	pp = xfs_rtrmap_ptr_addr(block, 1, mp->m_rtrmap_mxr[1]);
+
+	if (numrecs > mp->m_rtrmap_mxr[1])  {
+		numrecs = mp->m_rtrmap_mxr[1];
+		hdr_errors++;
+	}
+	if (isroot == 0 && numrecs < mp->m_rtrmap_mnr[1])  {
+		numrecs = mp->m_rtrmap_mnr[1];
+		hdr_errors++;
+	}
+
+	/*
+	 * don't pass bogus tree flag down further if this block
+	 * looked ok.  bail out if two levels in a row look bad.
+	 */
+	if (hdr_errors)  {
+		do_warn(
+_("bad btree nrecs (%u, min=%u, max=%u) in bt%s block %u/%u\n"),
+			be16_to_cpu(block->bb_numrecs),
+			mp->m_rtrmap_mnr[1], mp->m_rtrmap_mxr[1],
+			name, agno, agbno);
+		if (suspect)
+			goto out;
+		suspect++;
+	} else if (suspect) {
+		suspect = 0;
+	}
+
+	/* check the node's high keys */
+	for (i = 0; !isroot && i < numrecs; i++) {
+		kp = xfs_rtrmap_high_key_addr(block, i + 1);
+
+		key.rm_flags = 0;
+		key.rm_startblock = be32_to_cpu(kp->rm_startblock);
+		key.rm_owner = be64_to_cpu(kp->rm_owner);
+		if (libxfs_rmap_irec_offset_unpack(be64_to_cpu(kp->rm_offset),
+				&key)) {
+			/* Look for impossible flags. */
+			do_warn(
+_("invalid flags in key %u of %s btree block %u/%u\n"),
+				i, name, agno, agbno);
+			suspect++;
+			continue;
+		}
+		if (rmap_diffkeys(&key, &rmap_priv->high_key) > 0) {
+			do_warn(
+_("key %d greater than high key of block (%u/%u) in %s tree\n"),
+				i, agno, agbno, name);
+			suspect++;
+		}
+	}
+
+	/* check for in-order keys */
+	for (i = 0; i < numrecs; i++)  {
+		kp = xfs_rtrmap_key_addr(block, i + 1);
+
+		key.rm_flags = 0;
+		key.rm_startblock = be32_to_cpu(kp->rm_startblock);
+		key.rm_owner = be64_to_cpu(kp->rm_owner);
+		if (libxfs_rmap_irec_offset_unpack(be64_to_cpu(kp->rm_offset),
+				&key)) {
+			/* Look for impossible flags. */
+			do_warn(
+_("invalid flags in key %u of %s btree block %u/%u\n"),
+				i, name, agno, agbno);
+			suspect++;
+			continue;
+		}
+		if (i == 0) {
+			oldkey = key;
+			continue;
+		}
+		if (rmap_diffkeys(&oldkey, &key) > 0) {
+			do_warn(
+_("out of order key %u in %s btree block (%u/%u)\n"),
+				i, name, agno, agbno);
+			suspect++;
+		}
+		oldkey = key;
+	}
+
+	for (i = 0; i < numrecs; i++)  {
+		xfs_fsblock_t		pbno = be64_to_cpu(pp[i]);
+
+		/*
+		 * XXX - put sibling detection right here.
+		 * we know our sibling chain is good.  So as we go,
+		 * we check the entry before and after each entry.
+		 * If either of the entries references a different block,
+		 * check the sibling pointer.  If there's a sibling
+		 * pointer mismatch, try and extract as much data
+		 * as possible.
+		 */
+		kp = xfs_rtrmap_high_key_addr(block, i + 1);
+		rmap_priv->high_key.rm_flags = 0;
+		rmap_priv->high_key.rm_startblock =
+				be32_to_cpu(kp->rm_startblock);
+		rmap_priv->high_key.rm_owner =
+				be64_to_cpu(kp->rm_owner);
+		if (libxfs_rmap_irec_offset_unpack(be64_to_cpu(kp->rm_offset),
+				&rmap_priv->high_key)) {
+			/* Look for impossible flags. */
+			do_warn(
+_("invalid flags in high key %u of %s btree block %u/%u\n"),
+				i, name, agno, agbno);
+			suspect++;
+			continue;
+		}
+
+		if (!libxfs_verify_fsbno(mp, pbno)) {
+			do_warn(
+_("bad %s btree ptr 0x%llx in ino %" PRIu64 "\n"),
+			       name, (unsigned long long)pbno, ino);
+			return 1;
+		}
+
+		error = scan_lbtree(pbno, level, scan_rtrmapbt,
+				type, whichfork, ino, tot, nex, blkmapp,
+				bm_cursor, suspect, 0, check_dups, magic,
+				rmap_priv, &xfs_rtrmapbt_buf_ops);
+		if (error) {
+			suspect++;
+			goto out;
+		}
+	}
+
+out:
+	if (hdr_errors || suspect) {
+		rmap_avoid_check();
+		return 1;
+	}
+	return 0;
+}
+
 struct refc_priv {
 	struct xfs_refcount_irec	last_rec;
 	xfs_agblock_t			nr_blocks;
diff --git a/repair/scan.h b/repair/scan.h
index aeaf9f1a7f4ba9..a624c882734c77 100644
--- a/repair/scan.h
+++ b/repair/scan.h
@@ -66,4 +66,38 @@ scan_ags(
 	struct xfs_mount	*mp,
 	int			scan_threads);
 
+struct rmap_priv {
+	struct aghdr_cnts	*agcnts;
+	struct xfs_rmap_irec	high_key;
+	struct xfs_rmap_irec	last_rec;
+	xfs_agblock_t		nr_blocks;
+};
+
+int
+process_rtrmap_reclist(
+	struct xfs_mount	*mp,
+	struct xfs_rmap_rec	*rp,
+	int			numrecs,
+	struct xfs_rmap_irec	*last_rec,
+	struct xfs_rmap_irec	*high_key,
+	const char		*name);
+
+int scan_rtrmapbt(
+	struct xfs_btree_block	*block,
+	int			level,
+	int			type,
+	int			whichfork,
+	xfs_fsblock_t		bno,
+	xfs_ino_t		ino,
+	xfs_rfsblock_t		*tot,
+	uint64_t		*nex,
+	struct blkmap		**blkmapp,
+	bmap_cursor_t		*bm_cursor,
+	int			suspect,
+	int			isroot,
+	int			check_dups,
+	int			*dirty,
+	uint64_t		magic,
+	void			*priv);
+
 #endif /* _XR_SCAN_H */
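
The start and length checks that process_rtrmap_reclist() applies to each
record can be illustrated in isolation.  This is only a minimal sketch under
stated assumptions: struct demo_rmap_rec and demo_rmap_rec_in_bounds() are
hypothetical names that do not exist in xfsprogs, and the sketch widens the
sum to 64 bits, which is a choice made here rather than something taken from
the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one decoded rtrmap record. */
struct demo_rmap_rec {
	uint32_t	startblock;	/* first block within the rtgroup */
	uint32_t	blockcount;	/* length of the mapping in blocks */
};

/*
 * Return true if the record lies entirely inside a group that is
 * group_blocks blocks long.  Follows the same order as the patch:
 * start block first, then zero length, then the end of the extent.
 */
static bool
demo_rmap_rec_in_bounds(
	const struct demo_rmap_rec	*rec,
	uint32_t			group_blocks)
{
	uint64_t			end;

	if (rec->startblock >= group_blocks)
		return false;
	if (rec->blockcount == 0)
		return false;

	/* Widen before adding so that a huge length cannot wrap around. */
	end = (uint64_t)rec->startblock + rec->blockcount;
	return end - 1 < group_blocks;
}
```

A caller would count a failing record as suspect and move on to the next one,
which is what the "suspect++; continue;" pattern in the patch does.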



* [PATCH 16/27] xfs_repair: create a new set of incore rmap information for rt groups
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (14 preceding siblings ...)
  2025-02-06 22:53   ` [PATCH 15/27] xfs_repair: use realtime rmap btree data to check block types Darrick J. Wong
@ 2025-02-06 22:53   ` Darrick J. Wong
  2025-02-07  5:22     ` Christoph Hellwig
  2025-02-06 22:54   ` [PATCH 17/27] xfs_repair: refactor realtime inode check Darrick J. Wong
                     ` (10 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:53 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a parallel set of "xfs_ag_rmap" structures to cache information
about reverse mappings for the realtime groups.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
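For readers skimming the diff, the "parallel set" idea boils down to keeping
two arrays of the same anchor type and choosing between them with a flag.  The
following is a rough sketch of that lookup, not the code in this patch: the
demo_* names are hypothetical, and the range check is an addition for the
sake of the example (rmaps_for_group() in the patch indexes the arrays
directly).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-group rmap anchor. */
struct demo_group_rmap {
	unsigned long	nr_records;
};

/* Hypothetical holder for the two parallel arrays. */
struct demo_rmap_tables {
	struct demo_group_rmap	*ag;		/* one entry per AG */
	unsigned int		ag_count;
	struct demo_group_rmap	*rg;		/* one entry per rtgroup */
	unsigned int		rg_count;
};

/* Pick the anchor for a group, or NULL if the index is out of range. */
static struct demo_group_rmap *
demo_rmaps_for_group(
	struct demo_rmap_tables	*t,
	bool			isrt,
	unsigned int		group)
{
	if (isrt)
		return group < t->rg_count ? &t->rg[group] : NULL;
	return group < t->ag_count ? &t->ag[group] : NULL;
}
```

Funnelling every access through one helper is what lets the existing callers
switch from direct array indexing to passing an "isrt" flag without each of
them needing to know about the second array.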
 libxfs/libxfs_api_defs.h |    2 +
 repair/agbtree.c         |    5 +-
 repair/dinode.c          |    2 -
 repair/rmap.c            |  139 +++++++++++++++++++++++++++++++++++++---------
 repair/rmap.h            |    7 +-
 5 files changed, 121 insertions(+), 34 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 9beea4f7ce8535..b62efad757470b 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -318,6 +318,8 @@
 #define xfs_rtrmapbt_maxlevels_ondisk	libxfs_rtrmapbt_maxlevels_ondisk
 #define xfs_rtrmapbt_init_cursor	libxfs_rtrmapbt_init_cursor
 #define xfs_rtrmapbt_maxrecs		libxfs_rtrmapbt_maxrecs
+#define xfs_rtrmapbt_mem_init		libxfs_rtrmapbt_mem_init
+#define xfs_rtrmapbt_mem_cursor		libxfs_rtrmapbt_mem_cursor
 
 #define xfs_sb_from_disk		libxfs_sb_from_disk
 #define xfs_sb_mount_rextsize		libxfs_sb_mount_rextsize
diff --git a/repair/agbtree.c b/repair/agbtree.c
index 42bfeee9eafbcb..2135147fcf9354 100644
--- a/repair/agbtree.c
+++ b/repair/agbtree.c
@@ -646,7 +646,7 @@ init_rmapbt_cursor(
 
 	/* Compute how many blocks we'll need. */
 	error = -libxfs_btree_bload_compute_geometry(btr->cur, &btr->bload,
-			rmap_record_count(sc->mp, agno));
+			rmap_record_count(sc->mp, false, agno));
 	if (error)
 		do_error(
 _("Unable to compute rmap btree geometry, error %d.\n"), error);
@@ -663,7 +663,8 @@ build_rmap_tree(
 {
 	int			error;
 
-	error = rmap_init_mem_cursor(sc->mp, NULL, agno, &btr->rmapbt_cursor);
+	error = rmap_init_mem_cursor(sc->mp, NULL, false, agno,
+			&btr->rmapbt_cursor);
 	if (error)
 		do_error(
 _("Insufficient memory to construct rmap cursor.\n"));
diff --git a/repair/dinode.c b/repair/dinode.c
index 641beab333f793..3f78eb064919be 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -708,7 +708,7 @@ _("illegal state %d in block map %" PRIu64 "\n"),
 			}
 		}
 		if (collect_rmaps && !zap_metadata) /* && !check_dups */
-			rmap_add_rec(mp, ino, whichfork, &irec);
+			rmap_add_rec(mp, ino, whichfork, &irec, isrt);
 		*tot += irec.br_blockcount;
 	}
 	error = 0;
diff --git a/repair/rmap.c b/repair/rmap.c
index 36e4c9858fba03..13e9a06b04f370 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -24,7 +24,7 @@
 # define dbg_printf(f, a...)
 #endif
 
-/* per-AG rmap object anchor */
+/* allocation group (AG or rtgroup) rmap object anchor */
 struct xfs_ag_rmap {
 	/* root of rmap observations btree */
 	struct xfbtree		ar_xfbtree;
@@ -42,9 +42,17 @@ struct xfs_ag_rmap {
 };
 
 static struct xfs_ag_rmap *ag_rmaps;
+static struct xfs_ag_rmap *rg_rmaps;
 bool rmapbt_suspect;
 static bool refcbt_suspect;
 
+static struct xfs_ag_rmap *rmaps_for_group(bool isrt, unsigned int group)
+{
+	if (isrt)
+		return &rg_rmaps[group];
+	return &ag_rmaps[group];
+}
+
 static inline int rmap_compare(const void *a, const void *b)
 {
 	return libxfs_rmap_compare(a, b);
@@ -83,6 +91,44 @@ rmaps_destroy(
 	xmbuf_free(ag_rmap->ar_xmbtp);
 }
 
+/* Initialize the in-memory rmap btree for collecting realtime rmap records. */
+STATIC void
+rmaps_init_rt(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		rgno,
+	struct xfs_ag_rmap	*ag_rmap)
+{
+	char			*descr;
+	unsigned long long	maxbytes;
+	int			error;
+
+	if (!xfs_has_realtime(mp))
+		return;
+
+	/*
+	 * Each rtgroup rmap btree file can consume the entire data device,
+	 * even if the metadata space reservation will be smaller than that.
+	 */
+	maxbytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_dblocks);
+	descr = kasprintf(GFP_KERNEL,
+			"xfs_repair (%s): rtgroup %u rmap records",
+			mp->m_fsname, rgno);
+	error = -xmbuf_alloc(mp, descr, maxbytes, &ag_rmap->ar_xmbtp);
+	kfree(descr);
+	if (error)
+		goto nomem;
+
+	error = -libxfs_rtrmapbt_mem_init(mp, &ag_rmap->ar_xfbtree,
+			ag_rmap->ar_xmbtp, rgno);
+	if (error)
+		goto nomem;
+
+	return;
+nomem:
+	do_error(
+_("Insufficient memory while allocating realtime reverse mapping btree."));
+}
+
 /* Initialize the in-memory rmap btree for collecting per-AG rmap records. */
 STATIC void
 rmaps_init_ag(
@@ -150,6 +196,13 @@ rmaps_init(
 
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)
 		rmaps_init_ag(mp, i, &ag_rmaps[i]);
+
+	rg_rmaps = calloc(mp->m_sb.sb_rgcount, sizeof(struct xfs_ag_rmap));
+	if (!rg_rmaps)
+		do_error(_("couldn't allocate per-rtgroup reverse map roots\n"));
+
+	for (i = 0; i < mp->m_sb.sb_rgcount; i++)
+		rmaps_init_rt(mp, i, &rg_rmaps[i]);
 }
 
 /*
@@ -164,6 +217,11 @@ rmaps_free(
 	if (!rmap_needs_work(mp))
 		return;
 
+	for (i = 0; i < mp->m_sb.sb_rgcount; i++)
+		rmaps_destroy(mp, &rg_rmaps[i]);
+	free(rg_rmaps);
+	rg_rmaps = NULL;
+
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)
 		rmaps_destroy(mp, &ag_rmaps[i]);
 	free(ag_rmaps);
@@ -199,22 +257,32 @@ int
 rmap_init_mem_cursor(
 	struct xfs_mount	*mp,
 	struct xfs_trans	*tp,
+	bool			isrt,
 	xfs_agnumber_t		agno,
 	struct xfs_btree_cur	**rmcurp)
 {
 	struct xfbtree		*xfbt;
-	struct xfs_perag	*pag;
+	struct xfs_perag	*pag = NULL;
+	struct xfs_rtgroup	*rtg = NULL;
 	int			error;
 
-	xfbt = &ag_rmaps[agno].ar_xfbtree;
-	pag = libxfs_perag_get(mp, agno);
-	*rmcurp = libxfs_rmapbt_mem_cursor(pag, tp, xfbt);
+	xfbt = &rmaps_for_group(isrt, agno)->ar_xfbtree;
+	if (isrt) {
+		rtg = libxfs_rtgroup_get(mp, agno);
+		*rmcurp = libxfs_rtrmapbt_mem_cursor(rtg, tp, xfbt);
+	} else {
+		pag = libxfs_perag_get(mp, agno);
+		*rmcurp = libxfs_rmapbt_mem_cursor(pag, tp, xfbt);
+	}
 
 	error = -libxfs_btree_goto_left_edge(*rmcurp);
 	if (error)
 		libxfs_btree_del_cursor(*rmcurp, error);
 
-	libxfs_perag_put(pag);
+	if (pag)
+		libxfs_perag_put(pag);
+	if (rtg)
+		libxfs_rtgroup_put(rtg);
 	return error;
 }
 
@@ -247,6 +315,7 @@ rmap_get_mem_rec(
 static void
 rmap_add_mem_rec(
 	struct xfs_mount	*mp,
+	bool			isrt,
 	xfs_agnumber_t		agno,
 	struct xfs_rmap_irec	*rmap)
 {
@@ -255,12 +324,12 @@ rmap_add_mem_rec(
 	struct xfs_trans	*tp;
 	int			error;
 
-	xfbt = &ag_rmaps[agno].ar_xfbtree;
+	xfbt = &rmaps_for_group(isrt, agno)->ar_xfbtree;
 	error = -libxfs_trans_alloc_empty(mp, &tp);
 	if (error)
 		do_error(_("allocating tx for in-memory rmap update\n"));
 
-	error = rmap_init_mem_cursor(mp, tp, agno, &rmcur);
+	error = rmap_init_mem_cursor(mp, tp, isrt, agno, &rmcur);
 	if (error)
 		do_error(_("reading in-memory rmap btree head\n"));
 
@@ -285,7 +354,8 @@ rmap_add_rec(
 	struct xfs_mount	*mp,
 	xfs_ino_t		ino,
 	int			whichfork,
-	struct xfs_bmbt_irec	*irec)
+	struct xfs_bmbt_irec	*irec,
+	bool			isrt)
 {
 	struct xfs_rmap_irec	rmap;
 	xfs_agnumber_t		agno;
@@ -294,11 +364,17 @@ rmap_add_rec(
 	if (!rmap_needs_work(mp))
 		return;
 
-	agno = XFS_FSB_TO_AGNO(mp, irec->br_startblock);
-	agbno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock);
-	ASSERT(agno != NULLAGNUMBER);
-	ASSERT(agno < mp->m_sb.sb_agcount);
-	ASSERT(agbno + irec->br_blockcount <= mp->m_sb.sb_agblocks);
+	if (isrt) {
+		agno = xfs_rtb_to_rgno(mp, irec->br_startblock);
+		agbno = xfs_rtb_to_rgbno(mp, irec->br_startblock);
+		ASSERT(agbno + irec->br_blockcount <= mp->m_sb.sb_rblocks);
+	} else {
+		agno = XFS_FSB_TO_AGNO(mp, irec->br_startblock);
+		agbno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock);
+		ASSERT(agno != NULLAGNUMBER);
+		ASSERT(agno < mp->m_sb.sb_agcount);
+		ASSERT(agbno + irec->br_blockcount <= mp->m_sb.sb_agblocks);
+	}
 	ASSERT(ino != NULLFSINO);
 	ASSERT(whichfork == XFS_DATA_FORK || whichfork == XFS_ATTR_FORK);
 
@@ -312,7 +388,7 @@ rmap_add_rec(
 	if (irec->br_state == XFS_EXT_UNWRITTEN)
 		rmap.rm_flags |= XFS_RMAP_UNWRITTEN;
 
-	rmap_add_mem_rec(mp, agno, &rmap);
+	rmap_add_mem_rec(mp, isrt, agno, &rmap);
 }
 
 /* add a raw rmap; these will be merged later */
@@ -339,7 +415,7 @@ __rmap_add_raw_rec(
 	rmap.rm_startblock = agbno;
 	rmap.rm_blockcount = len;
 
-	rmap_add_mem_rec(mp, agno, &rmap);
+	rmap_add_mem_rec(mp, false, agno, &rmap);
 }
 
 /*
@@ -408,6 +484,7 @@ rmap_add_agbtree_mapping(
 		.rm_blockcount	= len,
 	};
 	struct xfs_perag	*pag;
+	struct xfs_ag_rmap	*x;
 
 	if (!rmap_needs_work(mp))
 		return 0;
@@ -416,7 +493,8 @@ rmap_add_agbtree_mapping(
 	assert(libxfs_verify_agbext(pag, agbno, len));
 	libxfs_perag_put(pag);
 
-	return slab_add(ag_rmaps[agno].ar_agbtree_rmaps, &rmap);
+	x = rmaps_for_group(false, agno);
+	return slab_add(x->ar_agbtree_rmaps, &rmap);
 }
 
 static int
@@ -532,7 +610,7 @@ rmap_commit_agbtree_mappings(
 	struct xfs_buf		*agflbp = NULL;
 	struct xfs_trans	*tp;
 	__be32			*agfl_bno, *b;
-	struct xfs_ag_rmap	*ag_rmap = &ag_rmaps[agno];
+	struct xfs_ag_rmap	*ag_rmap = rmaps_for_group(false, agno);
 	struct bitmap		*own_ag_bitmap = NULL;
 	int			error = 0;
 
@@ -795,7 +873,7 @@ refcount_emit(
 	int			error;
 	struct xfs_slab		*rlslab;
 
-	rlslab = ag_rmaps[agno].ar_refcount_items;
+	rlslab = rmaps_for_group(false, agno)->ar_refcount_items;
 	ASSERT(nr_rmaps > 0);
 
 	dbg_printf("REFL: agno=%u pblk=%u, len=%u -> refcount=%zu\n",
@@ -928,12 +1006,12 @@ compute_refcounts(
 
 	if (!xfs_has_reflink(mp))
 		return 0;
-	if (!rmaps_has_observations(&ag_rmaps[agno]))
+	if (!rmaps_has_observations(rmaps_for_group(false, agno)))
 		return 0;
 
-	nr_rmaps = rmap_record_count(mp, agno);
+	nr_rmaps = rmap_record_count(mp, false, agno);
 
-	error = rmap_init_mem_cursor(mp, NULL, agno, &rmcur);
+	error = rmap_init_mem_cursor(mp, NULL, false, agno, &rmcur);
 	if (error)
 		return error;
 
@@ -1038,16 +1116,17 @@ count_btree_records(
 uint64_t
 rmap_record_count(
 	struct xfs_mount	*mp,
+	bool			isrt,
 	xfs_agnumber_t		agno)
 {
 	struct xfs_btree_cur	*rmcur;
 	uint64_t		nr = 0;
 	int			error;
 
-	if (!rmaps_has_observations(&ag_rmaps[agno]))
+	if (!rmaps_has_observations(rmaps_for_group(isrt, agno)))
 		return 0;
 
-	error = rmap_init_mem_cursor(mp, NULL, agno, &rmcur);
+	error = rmap_init_mem_cursor(mp, NULL, isrt, agno, &rmcur);
 	if (error)
 		do_error(_("%s while reading in-memory rmap btree\n"),
 				strerror(error));
@@ -1163,7 +1242,7 @@ rmaps_verify_btree(
 	}
 
 	/* Create cursors to rmap structures */
-	error = rmap_init_mem_cursor(mp, NULL, agno, &rm_cur);
+	error = rmap_init_mem_cursor(mp, NULL, false, agno, &rm_cur);
 	if (error) {
 		do_warn(_("Not enough memory to check reverse mappings.\n"));
 		return;
@@ -1483,7 +1562,9 @@ refcount_record_count(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno)
 {
-	return slab_count(ag_rmaps[agno].ar_refcount_items);
+	struct xfs_ag_rmap	*x = rmaps_for_group(false, agno);
+
+	return slab_count(x->ar_refcount_items);
 }
 
 /*
@@ -1494,7 +1575,9 @@ init_refcount_cursor(
 	xfs_agnumber_t		agno,
 	struct xfs_slab_cursor	**cur)
 {
-	return init_slab_cursor(ag_rmaps[agno].ar_refcount_items, NULL, cur);
+	struct xfs_ag_rmap	*x = rmaps_for_group(false, agno);
+
+	return init_slab_cursor(x->ar_refcount_items, NULL, cur);
 }
 
 /*
@@ -1695,7 +1778,7 @@ rmap_store_agflcount(
 	if (!rmap_needs_work(mp))
 		return;
 
-	ag_rmaps[agno].ar_flcount = count;
+	rmaps_for_group(false, agno)->ar_flcount = count;
 }
 
 /* Estimate the size of the ondisk rmapbt from the incore data. */
diff --git a/repair/rmap.h b/repair/rmap.h
index 57e5d5b216650e..23871e6d60e774 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -15,7 +15,7 @@ extern void rmaps_init(struct xfs_mount *);
 extern void rmaps_free(struct xfs_mount *);
 
 void rmap_add_rec(struct xfs_mount *mp, xfs_ino_t ino, int whichfork,
-		struct xfs_bmbt_irec *irec);
+		struct xfs_bmbt_irec *irec, bool realtime);
 void rmap_add_bmbt_rec(struct xfs_mount *mp, xfs_ino_t ino, int whichfork,
 		xfs_fsblock_t fsbno);
 bool rmaps_are_mergeable(struct xfs_rmap_irec *r1, struct xfs_rmap_irec *r2);
@@ -26,7 +26,8 @@ int rmap_add_agbtree_mapping(struct xfs_mount *mp, xfs_agnumber_t agno,
 		xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner);
 int rmap_commit_agbtree_mappings(struct xfs_mount *mp, xfs_agnumber_t agno);
 
-uint64_t rmap_record_count(struct xfs_mount *mp, xfs_agnumber_t agno);
+uint64_t rmap_record_count(struct xfs_mount *mp, bool isrt,
+		xfs_agnumber_t agno);
 extern void rmap_avoid_check(void);
 void rmaps_verify_btree(struct xfs_mount *mp, xfs_agnumber_t agno);
 
@@ -52,7 +53,7 @@ xfs_extlen_t estimate_rmapbt_blocks(struct xfs_perag *pag);
 xfs_extlen_t estimate_refcountbt_blocks(struct xfs_perag *pag);
 
 int rmap_init_mem_cursor(struct xfs_mount *mp, struct xfs_trans *tp,
-		xfs_agnumber_t agno, struct xfs_btree_cur **rmcurp);
+		bool isrt, xfs_agnumber_t agno, struct xfs_btree_cur **rmcurp);
 int rmap_get_mem_rec(struct xfs_btree_cur *rmcur, struct xfs_rmap_irec *irec);
 
 xfs_rgnumber_t rtgroup_for_rtrmap_inode(struct xfs_mount *mp, xfs_ino_t ino);



* [PATCH 17/27] xfs_repair: refactor realtime inode check
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (15 preceding siblings ...)
  2025-02-06 22:53   ` [PATCH 16/27] xfs_repair: create a new set of incore rmap information for rt groups Darrick J. Wong
@ 2025-02-06 22:54   ` Darrick J. Wong
  2025-02-07  5:22     ` Christoph Hellwig
  2025-02-06 22:54   ` [PATCH 18/27] xfs_repair: find and mark the rtrmapbt inodes Darrick J. Wong
                     ` (9 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:54 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Refactor the realtime bitmap and summary checks into a helper function.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 repair/dinode.c |   87 +++++++++++++++++++++++++------------------------------
 1 file changed, 39 insertions(+), 48 deletions(-)


diff --git a/repair/dinode.c b/repair/dinode.c
index 3f78eb064919be..628f02714abc05 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -1798,6 +1798,39 @@ check_dinode_mode_format(
 	return 0;	/* invalid modes are checked elsewhere */
 }
 
+static int
+process_check_rt_inode(
+	struct xfs_mount	*mp,
+	struct xfs_dinode	*dinoc,
+	xfs_ino_t		lino,
+	int			*type,
+	int			*dirty,
+	int			expected_type,
+	const char		*tag)
+{
+	xfs_extnum_t		dnextents = xfs_dfork_data_extents(dinoc);
+
+	if (*type != expected_type) {
+		do_warn(
+_("%s inode %" PRIu64 " has bad type 0x%x, "),
+			tag, lino, dinode_fmt(dinoc));
+		if (!no_modify)  {
+			do_warn(_("resetting to regular file\n"));
+			change_dinode_fmt(dinoc, S_IFREG);
+			*dirty = 1;
+		} else  {
+			do_warn(_("would reset to regular file\n"));
+		}
+	}
+	if (mp->m_sb.sb_rblocks == 0 && dnextents != 0)  {
+		do_warn(
+_("bad # of extents (%" PRIu64 ") for %s inode %" PRIu64 "\n"),
+			dnextents, tag, lino);
+		return 1;
+	}
+	return 0;
+}
+
 /*
  * If inode is a superblock inode, does type check to make sure is it valid.
  * Returns 0 if it's valid, non-zero if it needs to be cleared.
@@ -1811,8 +1844,6 @@ process_check_metadata_inodes(
 	int			*type,
 	int			*dirty)
 {
-	xfs_extnum_t		dnextents;
-
 	if (lino == mp->m_sb.sb_rootino) {
 		if (*type != XR_INO_DIR)  {
 			do_warn(_("root inode %" PRIu64 " has bad type 0x%x\n"),
@@ -1854,52 +1885,12 @@ process_check_metadata_inodes(
 		}
 		return 0;
 	}
-
-	dnextents = xfs_dfork_data_extents(dinoc);
-	if (lino == mp->m_sb.sb_rsumino ||
-	    is_rtsummary_inode(lino)) {
-		if (*type != XR_INO_RTSUM) {
-			do_warn(
-_("realtime summary inode %" PRIu64 " has bad type 0x%x, "),
-				lino, dinode_fmt(dinoc));
-			if (!no_modify)  {
-				do_warn(_("resetting to regular file\n"));
-				change_dinode_fmt(dinoc, S_IFREG);
-				*dirty = 1;
-			} else  {
-				do_warn(_("would reset to regular file\n"));
-			}
-		}
-		if (mp->m_sb.sb_rblocks == 0 && dnextents != 0)  {
-			do_warn(
-_("bad # of extents (%" PRIu64 ") for realtime summary inode %" PRIu64 "\n"),
-				dnextents, lino);
-			return 1;
-		}
-		return 0;
-	}
-	if (lino == mp->m_sb.sb_rbmino ||
-	    is_rtbitmap_inode(lino)) {
-		if (*type != XR_INO_RTBITMAP) {
-			do_warn(
-_("realtime bitmap inode %" PRIu64 " has bad type 0x%x, "),
-				lino, dinode_fmt(dinoc));
-			if (!no_modify)  {
-				do_warn(_("resetting to regular file\n"));
-				change_dinode_fmt(dinoc, S_IFREG);
-				*dirty = 1;
-			} else  {
-				do_warn(_("would reset to regular file\n"));
-			}
-		}
-		if (mp->m_sb.sb_rblocks == 0 && dnextents != 0)  {
-			do_warn(
-_("bad # of extents (%" PRIu64 ") for realtime bitmap inode %" PRIu64 "\n"),
-				dnextents, lino);
-			return 1;
-		}
-		return 0;
-	}
+	if (lino == mp->m_sb.sb_rsumino || is_rtsummary_inode(lino))
+		return process_check_rt_inode(mp, dinoc, lino, type, dirty,
+				XR_INO_RTSUM, _("realtime summary"));
+	if (lino == mp->m_sb.sb_rbmino || is_rtbitmap_inode(lino))
+		return process_check_rt_inode(mp, dinoc, lino, type, dirty,
+				XR_INO_RTBITMAP, _("realtime bitmap"));
 	return 0;
 }
 



* [PATCH 18/27] xfs_repair: find and mark the rtrmapbt inodes
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (16 preceding siblings ...)
  2025-02-06 22:54   ` [PATCH 17/27] xfs_repair: refactor realtime inode check Darrick J. Wong
@ 2025-02-06 22:54   ` Darrick J. Wong
  2025-02-07  5:33     ` Christoph Hellwig
  2025-02-06 22:54   ` [PATCH 19/27] xfs_repair: check existing realtime rmapbt entries against observed rmaps Darrick J. Wong
                     ` (8 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:54 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Make sure that we find the realtime rmapbt inodes and mark them
appropriately, just in case we find a rogue inode claiming to be an
rtrmap, or garbage in the metadata directory tree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
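The metafile_rgnumber() helper added below rebuilds a 32-bit group number
from two 16-bit big-endian inode fields.  A standalone sketch of that
arithmetic follows, assuming nothing beyond plain C: demo_be16_decode() and
demo_combine_halves() are hypothetical names used only for illustration and
are not part of xfsprogs.

```c
#include <stdint.h>

/* Decode a big-endian 16-bit value from two raw bytes. */
static inline uint16_t
demo_be16_decode(
	const uint8_t	*p)
{
	return (uint16_t)(((unsigned int)p[0] << 8) | p[1]);
}

/*
 * Combine two 16-bit halves, already in CPU byte order, into one
 * 32-bit number.  The cast must happen before the shift so that the
 * high half is not shifted as a (possibly signed) int.
 */
static inline uint32_t
demo_combine_halves(
	uint16_t	hi,
	uint16_t	lo)
{
	return ((uint32_t)hi << 16) | lo;
}
```

The explicit parentheses are a readability choice in this sketch; the
expression in the patch relies on the cast binding tighter than the shift,
and the shift binding tighter than the bitwise OR.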
 repair/dino_chunks.c |   13 ++++++++++
 repair/dinode.c      |   65 +++++++++++++++++++++++++++++++++++++++++++++-----
 repair/dir2.c        |    7 +++++
 repair/incore.h      |    1 +
 repair/rmap.c        |   19 ++++++---------
 repair/rmap.h        |    5 ++--
 repair/scan.c        |    8 +++---
 7 files changed, 95 insertions(+), 23 deletions(-)


diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c
index fe106f0b6ab536..8c5387cdf4ea52 100644
--- a/repair/dino_chunks.c
+++ b/repair/dino_chunks.c
@@ -16,6 +16,8 @@
 #include "prefetch.h"
 #include "progress.h"
 #include "rt.h"
+#include "slab.h"
+#include "rmap.h"
 
 /*
  * validates inode block or chunk, returns # of good inodes
@@ -1023,6 +1025,17 @@ process_inode_chunk(
 	_("would clear rtgroup summary inode %" PRIu64 "\n"),
 						ino);
 				}
+			} else if (is_rtrmap_inode(ino)) {
+				rmap_avoid_check(mp);
+				if (!no_modify)  {
+					do_warn(
+	_("cleared rtgroup rmap inode %" PRIu64 "\n"),
+						ino);
+				} else  {
+					do_warn(
+	_("would clear rtgroup rmap inode %" PRIu64 "\n"),
+						ino);
+				}
 			} else if (!no_modify)  {
 				do_warn(_("cleared inode %" PRIu64 "\n"),
 					ino);
diff --git a/repair/dinode.c b/repair/dinode.c
index 628f02714abc05..58691b196bc4cb 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -178,6 +178,9 @@ clear_dinode(
 
 	if (is_rtsummary_inode(ino_num))
 		mark_rtgroup_inodes_bad(mp, XFS_RTGI_SUMMARY);
+
+	if (is_rtrmap_inode(ino_num))
+		rmap_avoid_check(mp);
 }
 
 /*
@@ -823,6 +826,14 @@ get_agino_buf(
 	return bp;
 }
 
+static inline xfs_rgnumber_t
+metafile_rgnumber(
+	const struct xfs_dinode	*dip)
+{
+	return (xfs_rgnumber_t)be16_to_cpu(dip->di_projid_hi) << 16 |
+			       be16_to_cpu(dip->di_projid_lo);
+}
+
 /*
  * higher level inode processing stuff starts here:
  * first, one utility routine for each type of inode
@@ -870,7 +881,10 @@ process_rtrmap(
 
 	lino = XFS_AGINO_TO_INO(mp, agno, ino);
 
-	/* This rmap btree inode must be a metadata inode. */
+	/*
+	 * This rmap btree inode must be a metadata inode reachable via
+	 * /rtgroups/$rgno.rmap in the metadata directory tree.
+	 */
 	if (!(dip->di_flags2 & be64_to_cpu(XFS_DIFLAG2_METADATA))) {
 		do_warn(
 _("rtrmap inode %" PRIu64 " not flagged as metadata\n"),
@@ -878,11 +892,25 @@ _("rtrmap inode %" PRIu64 " not flagged as metadata\n"),
 		return 1;
 	}
 
-	if (!is_rtrmap_inode(lino)) {
-		do_warn(
+	/*
+	 * If this rtrmap file claims to be from an rtgroup that actually
+	 * exists, check that inode discovery actually found it.  Note that
+	 * we can have stray rtrmap files from failed growfsrt operations.
+	 */
+	if (metafile_rgnumber(dip) < mp->m_sb.sb_rgcount) {
+		if (type != XR_INO_RTRMAP) {
+			do_warn(
+_("rtrmap inode %" PRIu64 " was not found in the metadata directory tree\n"),
+				lino);
+			return 1;
+		}
+
+		if (!is_rtrmap_inode(lino)) {
+			do_warn(
 _("could not associate rtrmap inode %" PRIu64 " with any rtgroup\n"),
-			lino);
-		return 1;
+				lino);
+			return 1;
+		}
 	}
 
 	memset(&priv.high_key, 0xFF, sizeof(priv.high_key));
@@ -921,7 +949,7 @@ _("computed size of rtrmapbt root (%zu bytes) is greater than space in "
 		error = process_rtrmap_reclist(mp, rp, numrecs,
 				&priv.last_rec, NULL, "rtrmapbt root");
 		if (error) {
-			rmap_avoid_check();
+			rmap_avoid_check(mp);
 			return 1;
 		}
 		return 0;
@@ -1891,6 +1919,9 @@ process_check_metadata_inodes(
 	if (lino == mp->m_sb.sb_rbmino || is_rtbitmap_inode(lino))
 		return process_check_rt_inode(mp, dinoc, lino, type, dirty,
 				XR_INO_RTBITMAP, _("realtime bitmap"));
+	if (is_rtrmap_inode(lino))
+		return process_check_rt_inode(mp, dinoc, lino, type, dirty,
+				XR_INO_RTRMAP, _("realtime rmap btree"));
 	return 0;
 }
 
@@ -1989,6 +2020,18 @@ _("realtime summary inode %" PRIu64 " has bad size %" PRIu64 " (should be %" PRI
 		}
 		break;
 
+	case XR_INO_RTRMAP:
+		/*
+		 * if we have no rmapbt, any inode claiming
+		 * to be a realtime rmap btree file is bogus
+		 */
+		if (!xfs_has_rmapbt(mp)) {
+			do_warn(
+_("found inode %" PRIu64 " claiming to be a rtrmapbt file, but rmapbt is disabled\n"), lino);
+			return 1;
+		}
+		break;
+
 	default:
 		break;
 	}
@@ -2017,6 +2060,14 @@ _("bad attr fork offset %d in dev inode %" PRIu64 ", should be %d\n"),
 			return 1;
 		}
 		break;
+	case XFS_DINODE_FMT_META_BTREE:
+		if (!xfs_has_metadir(mp) || !xfs_has_parent(mp)) {
+			do_warn(
+_("metadata inode %" PRIu64 " type %d cannot have attr fork\n"),
+				lino, dino->di_format);
+			return 1;
+		}
+		fallthrough;
 	case XFS_DINODE_FMT_LOCAL:
 	case XFS_DINODE_FMT_EXTENTS:
 	case XFS_DINODE_FMT_BTREE:
@@ -3173,6 +3224,8 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 			type = XR_INO_GQUOTA;
 		else if (is_quota_inode(XFS_DQTYPE_PROJ, lino))
 			type = XR_INO_PQUOTA;
+		else if (is_rtrmap_inode(lino))
+			type = XR_INO_RTRMAP;
 		else
 			type = XR_INO_DATA;
 		break;
diff --git a/repair/dir2.c b/repair/dir2.c
index ed615662009957..af00b2d8d6c852 100644
--- a/repair/dir2.c
+++ b/repair/dir2.c
@@ -15,6 +15,8 @@
 #include "da_util.h"
 #include "prefetch.h"
 #include "progress.h"
+#include "slab.h"
+#include "rmap.h"
 #include "rt.h"
 
 /*
@@ -277,6 +279,9 @@ process_sf_dir2(
 		} else if (lino == mp->m_sb.sb_metadirino)  {
 			junkit = 1;
 			junkreason = _("metadata directory root");
+		} else if (is_rtrmap_inode(lino)) {
+			junkit = 1;
+			junkreason = _("realtime rmap");
 		} else if ((irec_p = find_inode_rec(mp,
 					XFS_INO_TO_AGNO(mp, lino),
 					XFS_INO_TO_AGINO(mp, lino))) != NULL) {
@@ -754,6 +759,8 @@ process_dir2_data(
 			clearreason = _("project quota");
 		} else if (ent_ino == mp->m_sb.sb_metadirino)  {
 			clearreason = _("metadata directory root");
+		} else if (is_rtrmap_inode(ent_ino)) {
+			clearreason = _("realtime rmap");
 		} else {
 			irec_p = find_inode_rec(mp,
 						XFS_INO_TO_AGNO(mp, ent_ino),
diff --git a/repair/incore.h b/repair/incore.h
index 61730c330911f7..4add12615e0a04 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -241,6 +241,7 @@ int		count_bcnt_extents(xfs_agnumber_t);
 #define XR_INO_UQUOTA	12		/* user quota inode */
 #define XR_INO_GQUOTA	13		/* group quota inode */
 #define XR_INO_PQUOTA	14		/* project quota inode */
+#define XR_INO_RTRMAP	15		/* realtime rmap */
 
 /* inode allocation tree */
 
diff --git a/repair/rmap.c b/repair/rmap.c
index 13e9a06b04f370..8656c8df3cbc83 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -15,6 +15,7 @@
 #include "libfrog/bitmap.h"
 #include "libfrog/platform.h"
 #include "rcbag.h"
+#include "rt.h"
 
 #undef RMAP_DEBUG
 
@@ -169,15 +170,6 @@ rmaps_init_ag(
 _("Insufficient memory while allocating realtime reverse mapping btree."));
 }
 
-xfs_rgnumber_t
-rtgroup_for_rtrmap_inode(
-	struct xfs_mount	*mp,
-	xfs_ino_t		ino)
-{
-	/* This will be implemented later. */
-	return NULLRGNUMBER;
-}
-
 /*
  * Initialize per-AG reverse map data.
  */
@@ -203,6 +195,8 @@ rmaps_init(
 
 	for (i = 0; i < mp->m_sb.sb_rgcount; i++)
 		rmaps_init_rt(mp, i, &rg_rmaps[i]);
+
+	discover_rtgroup_inodes(mp);
 }
 
 /*
@@ -1142,11 +1136,14 @@ rmap_record_count(
 }
 
 /*
- * Disable the refcount btree check.
+ * Disable the rmap btree check.
  */
 void
-rmap_avoid_check(void)
+rmap_avoid_check(
+	struct xfs_mount	*mp)
 {
+	if (xfs_has_rtgroups(mp))
+		mark_rtgroup_inodes_bad(mp, XFS_RTGI_RMAP);
 	rmapbt_suspect = true;
 }
 
diff --git a/repair/rmap.h b/repair/rmap.h
index 23871e6d60e774..b5c8b4f0bef794 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -28,7 +28,7 @@ int rmap_commit_agbtree_mappings(struct xfs_mount *mp, xfs_agnumber_t agno);
 
 uint64_t rmap_record_count(struct xfs_mount *mp, bool isrt,
 		xfs_agnumber_t agno);
-extern void rmap_avoid_check(void);
+extern void rmap_avoid_check(struct xfs_mount *mp);
 void rmaps_verify_btree(struct xfs_mount *mp, xfs_agnumber_t agno);
 
 extern int64_t rmap_diffkeys(struct xfs_rmap_irec *kp1,
@@ -56,6 +56,7 @@ int rmap_init_mem_cursor(struct xfs_mount *mp, struct xfs_trans *tp,
 		bool isrt, xfs_agnumber_t agno, struct xfs_btree_cur **rmcurp);
 int rmap_get_mem_rec(struct xfs_btree_cur *rmcur, struct xfs_rmap_irec *irec);
 
-xfs_rgnumber_t rtgroup_for_rtrmap_inode(struct xfs_mount *mp, xfs_ino_t ino);
+void populate_rtgroup_rmapbt(struct xfs_rtgroup *rtg,
+		xfs_filblks_t est_fdblocks);
 
 #endif /* RMAP_H_ */
diff --git a/repair/scan.c b/repair/scan.c
index 386aaa15f78c33..7a74f87c5f0c61 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -1348,7 +1348,7 @@ _("out of order key %u in %s btree block (%u/%u)\n"),
 
 out:
 	if (suspect)
-		rmap_avoid_check();
+		rmap_avoid_check(mp);
 }
 
 int
@@ -1728,7 +1728,7 @@ _("bad %s btree ptr 0x%llx in ino %" PRIu64 "\n"),
 
 out:
 	if (hdr_errors || suspect) {
-		rmap_avoid_check();
+		rmap_avoid_check(mp);
 		return 1;
 	}
 	return 0;
@@ -2811,7 +2811,7 @@ validate_agf(
 		if (levels == 0 || levels > mp->m_rmap_maxlevels) {
 			do_warn(_("bad levels %u for rmapbt root, agno %d\n"),
 				levels, agno);
-			rmap_avoid_check();
+			rmap_avoid_check(mp);
 		}
 
 		bno = be32_to_cpu(agf->agf_rmap_root);
@@ -2826,7 +2826,7 @@ validate_agf(
 		} else {
 			do_warn(_("bad agbno %u for rmapbt root, agno %d\n"),
 				bno, agno);
-			rmap_avoid_check();
+			rmap_avoid_check(mp);
 		}
 	}
 



* [PATCH 19/27] xfs_repair: check existing realtime rmapbt entries against observed rmaps
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (17 preceding siblings ...)
  2025-02-06 22:54   ` [PATCH 18/27] xfs_repair: find and mark the rtrmapbt inodes Darrick J. Wong
@ 2025-02-06 22:54   ` Darrick J. Wong
  2025-02-07  5:35     ` Christoph Hellwig
  2025-02-06 22:54   ` [PATCH 20/27] xfs_repair: always check realtime file mappings against incore info Darrick J. Wong
                     ` (7 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:54 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Once we've finished collecting reverse mapping observations from the
metadata scan, check those observations against the realtime rmap btree
(particularly if we're in -n mode) to detect rtrmapbt problems.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
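For readers skimming the diff, the shape of the comparison is roughly: walk every observed record, look it up in the on-disk tree, and complain if it is missing or different. The sketch below is a simplified illustration under stated assumptions: the "btree" is a sorted array, records match only when all fields are equal (the real rmap_is_good() is more lenient), and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, much simplified reverse-mapping record. */
struct demo_rmap {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
};

/* Binary-search a sorted array for a record starting at @startblock. */
static const struct demo_rmap *
demo_lookup(const struct demo_rmap *recs, size_t nr, uint32_t startblock)
{
	size_t	lo = 0, hi = nr;

	while (lo < hi) {
		size_t	mid = lo + (hi - lo) / 2;

		if (recs[mid].startblock == startblock)
			return &recs[mid];
		if (recs[mid].startblock < startblock)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/*
 * Compare each observation against the on-disk records.  Returns the
 * number of missing or mismatched records; the arrays are not modified.
 */
static unsigned int
demo_compare_records(const struct demo_rmap *observed, size_t nr_observed,
		const struct demo_rmap *ondisk, size_t nr_ondisk,
		unsigned int group)
{
	unsigned int	problems = 0;
	size_t		i;

	for (i = 0; i < nr_observed; i++) {
		const struct demo_rmap	*o = &observed[i];
		const struct demo_rmap	*d;

		d = demo_lookup(ondisk, nr_ondisk, o->startblock);
		if (!d) {
			fprintf(stderr, "missing record for (%u/%u)\n",
					group, (unsigned int)o->startblock);
			problems++;
			continue;
		}
		if (d->blockcount != o->blockcount || d->owner != o->owner) {
			fprintf(stderr, "incorrect record for (%u/%u)\n",
					group, (unsigned int)o->startblock);
			problems++;
		}
	}
	return problems;
}
```

Taking the group number as a plain argument is what lets one comparison loop serve both per-AG and per-rtgroup trees, which is the point of factoring out rmap_compare_records() in this patch.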
 repair/phase4.c |   14 +++
 repair/rmap.c   |  226 ++++++++++++++++++++++++++++++++++++++++++-------------
 repair/rmap.h   |    2 
 3 files changed, 188 insertions(+), 54 deletions(-)


diff --git a/repair/phase4.c b/repair/phase4.c
index 728d9ed84cdc7a..29efa58af33178 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -159,6 +159,16 @@ check_rmap_btrees(
 	rmaps_verify_btree(wq->wq_ctx, agno);
 }
 
+static void
+check_rtrmap_btrees(
+	struct workqueue *wq,
+	xfs_agnumber_t	agno,
+	void		*arg)
+{
+	rmap_add_fixed_rtgroup_rec(wq->wq_ctx, agno);
+	rtrmaps_verify_btree(wq->wq_ctx, agno);
+}
+
 static void
 compute_ag_refcounts(
 	struct workqueue*wq,
@@ -211,6 +221,10 @@ process_rmap_data(
 	create_work_queue(&wq, mp, platform_nproc());
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)
 		queue_work(&wq, check_rmap_btrees, i, NULL);
+	if (xfs_has_rtrmapbt(mp)) {
+		for (i = 0; i < mp->m_sb.sb_rgcount; i++)
+			queue_work(&wq, check_rtrmap_btrees, i, NULL);
+	}
 	destroy_work_queue(&wq);
 
 	if (!xfs_has_reflink(mp))
diff --git a/repair/rmap.c b/repair/rmap.c
index 8656c8df3cbc83..a40851b4d0dc69 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -16,6 +16,7 @@
 #include "libfrog/platform.h"
 #include "rcbag.h"
 #include "rt.h"
+#include "prefetch.h"
 
 #undef RMAP_DEBUG
 
@@ -195,8 +196,6 @@ rmaps_init(
 
 	for (i = 0; i < mp->m_sb.sb_rgcount; i++)
 		rmaps_init_rt(mp, i, &rg_rmaps[i]);
-
-	discover_rtgroup_inodes(mp);
 }
 
 /*
@@ -573,6 +572,27 @@ rmap_add_fixed_ag_rec(
 	}
 }
 
+/* Add this realtime group's fixed metadata to the incore data. */
+void
+rmap_add_fixed_rtgroup_rec(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		rgno)
+{
+	struct xfs_rmap_irec	rmap = {
+		.rm_startblock	= 0,
+		.rm_blockcount	= mp->m_sb.sb_rextsize,
+		.rm_owner	= XFS_RMAP_OWN_FS,
+		.rm_offset	= 0,
+		.rm_flags	= 0,
+	};
+
+	if (!rmap_needs_work(mp))
+		return;
+
+	if (xfs_has_rtsb(mp) && rgno == 0)
+		rmap_add_mem_rec(mp, true, rgno, &rmap);
+}
+
 /*
  * Copy the per-AG btree reverse-mapping data into the rmapbt.
  *
@@ -1213,62 +1233,25 @@ rmap_is_good(
 #undef NEXTP
 #undef NEXTL
 
-/*
- * Compare the observed reverse mappings against what's in the ag btree.
- */
-void
-rmaps_verify_btree(
-	struct xfs_mount	*mp,
-	xfs_agnumber_t		agno)
+static int
+rmap_compare_records(
+	struct xfs_btree_cur	*rm_cur,
+	struct xfs_btree_cur	*bt_cur,
+	unsigned int		group)
 {
-	struct xfs_btree_cur	*rm_cur;
 	struct xfs_rmap_irec	rm_rec;
 	struct xfs_rmap_irec	tmp;
-	struct xfs_btree_cur	*bt_cur = NULL;
-	struct xfs_buf		*agbp = NULL;
-	struct xfs_perag	*pag = NULL;
 	int			have;
 	int			error;
 
-	if (!xfs_has_rmapbt(mp))
-		return;
-	if (rmapbt_suspect) {
-		if (no_modify && agno == 0)
-			do_warn(_("would rebuild corrupt rmap btrees.\n"));
-		return;
-	}
-
-	/* Create cursors to rmap structures */
-	error = rmap_init_mem_cursor(mp, NULL, false, agno, &rm_cur);
-	if (error) {
-		do_warn(_("Not enough memory to check reverse mappings.\n"));
-		return;
-	}
-
-	pag = libxfs_perag_get(mp, agno);
-	error = -libxfs_alloc_read_agf(pag, NULL, 0, &agbp);
-	if (error) {
-		do_warn(_("Could not read AGF %u to check rmap btree.\n"),
-				agno);
-		goto err_pag;
-	}
-
-	/* Leave the per-ag data "uninitialized" since we rewrite it later */
-	clear_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
-
-	bt_cur = libxfs_rmapbt_init_cursor(mp, NULL, agbp, pag);
-	if (!bt_cur) {
-		do_warn(_("Not enough memory to check reverse mappings.\n"));
-		goto err_agf;
-	}
-
 	while ((error = rmap_get_mem_rec(rm_cur, &rm_rec)) == 1) {
 		error = rmap_lookup(bt_cur, &rm_rec, &tmp, &have);
 		if (error) {
 			do_warn(
 _("Could not read reverse-mapping record for (%u/%u).\n"),
-					agno, rm_rec.rm_startblock);
-			goto err_cur;
+					group,
+					rm_rec.rm_startblock);
+			return error;
 		}
 
 		/*
@@ -1283,15 +1266,15 @@ _("Could not read reverse-mapping record for (%u/%u).\n"),
 			if (error) {
 				do_warn(
 _("Could not read reverse-mapping record for (%u/%u).\n"),
-						agno, rm_rec.rm_startblock);
-				goto err_cur;
+						group, rm_rec.rm_startblock);
+				return error;
 			}
 		}
 		if (!have) {
 			do_warn(
 _("Missing reverse-mapping record for (%u/%u) %slen %u owner %"PRId64" \
 %s%soff %"PRIu64"\n"),
-				agno, rm_rec.rm_startblock,
+				group, rm_rec.rm_startblock,
 				(rm_rec.rm_flags & XFS_RMAP_UNWRITTEN) ?
 					_("unwritten ") : "",
 				rm_rec.rm_blockcount,
@@ -1304,12 +1287,12 @@ _("Missing reverse-mapping record for (%u/%u) %slen %u owner %"PRId64" \
 			continue;
 		}
 
-		/* Compare each refcount observation against the btree's */
+		/* Compare each rmap observation against the btree's */
 		if (!rmap_is_good(&rm_rec, &tmp)) {
 			do_warn(
 _("Incorrect reverse-mapping: saw (%u/%u) %slen %u owner %"PRId64" %s%soff \
 %"PRIu64"; should be (%u/%u) %slen %u owner %"PRId64" %s%soff %"PRIu64"\n"),
-				agno, tmp.rm_startblock,
+				group, tmp.rm_startblock,
 				(tmp.rm_flags & XFS_RMAP_UNWRITTEN) ?
 					_("unwritten ") : "",
 				tmp.rm_blockcount,
@@ -1319,7 +1302,7 @@ _("Incorrect reverse-mapping: saw (%u/%u) %slen %u owner %"PRId64" %s%soff \
 				(tmp.rm_flags & XFS_RMAP_BMBT_BLOCK) ?
 					_("bmbt ") : "",
 				tmp.rm_offset,
-				agno, rm_rec.rm_startblock,
+				group, rm_rec.rm_startblock,
 				(rm_rec.rm_flags & XFS_RMAP_UNWRITTEN) ?
 					_("unwritten ") : "",
 				rm_rec.rm_blockcount,
@@ -1332,8 +1315,61 @@ _("Incorrect reverse-mapping: saw (%u/%u) %slen %u owner %"PRId64" %s%soff \
 		}
 	}
 
+	return error;
+}
+
+/*
+ * Compare the observed reverse mappings against what's in the ag btree.
+ */
+void
+rmaps_verify_btree(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_btree_cur	*rm_cur;
+	struct xfs_btree_cur	*bt_cur = NULL;
+	struct xfs_buf		*agbp = NULL;
+	struct xfs_perag	*pag = NULL;
+	int			error;
+
+	if (!xfs_has_rmapbt(mp))
+		return;
+	if (rmapbt_suspect) {
+		if (no_modify && agno == 0)
+			do_warn(_("would rebuild corrupt rmap btrees.\n"));
+		return;
+	}
+
+	/* Create cursors to rmap structures */
+	error = rmap_init_mem_cursor(mp, NULL, false, agno, &rm_cur);
+	if (error) {
+		do_warn(_("Not enough memory to check reverse mappings.\n"));
+		return;
+	}
+
+	pag = libxfs_perag_get(mp, agno);
+	error = -libxfs_alloc_read_agf(pag, NULL, 0, &agbp);
+	if (error) {
+		do_warn(_("Could not read AGF %u to check rmap btree.\n"),
+				agno);
+		goto err_pag;
+	}
+
+	/* Leave the per-ag data "uninitialized" since we rewrite it later */
+	clear_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
+
+	bt_cur = libxfs_rmapbt_init_cursor(mp, NULL, agbp, pag);
+	if (!bt_cur) {
+		do_warn(_("Not enough memory to check reverse mappings.\n"));
+		goto err_agf;
+	}
+
+	error = rmap_compare_records(rm_cur, bt_cur, agno);
+	if (error)
+		goto err_cur;
+
 err_cur:
-	libxfs_btree_del_cursor(bt_cur, XFS_BTREE_NOERROR);
+	libxfs_btree_del_cursor(bt_cur, error);
 err_agf:
 	libxfs_buf_relse(agbp);
 err_pag:
@@ -1341,6 +1377,88 @@ _("Incorrect reverse-mapping: saw (%u/%u) %slen %u owner %"PRId64" %s%soff \
 	libxfs_btree_del_cursor(rm_cur, error);
 }
 
+/*
+ * Compare the observed reverse mappings against what's in the rtgroup btree.
+ */
+void
+rtrmaps_verify_btree(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		rgno)
+{
+	struct xfs_btree_cur	*rm_cur;
+	struct xfs_btree_cur	*bt_cur = NULL;
+	struct xfs_rtgroup	*rtg = NULL;
+	struct xfs_inode	*ip = NULL;
+	int			error;
+
+	if (!xfs_has_rmapbt(mp))
+		return;
+	if (rmapbt_suspect) {
+		if (no_modify && rgno == 0)
+			do_warn(_("would rebuild corrupt rmap btrees.\n"));
+		return;
+	}
+
+	/* Create cursors to rmap structures */
+	error = rmap_init_mem_cursor(mp, NULL, true, rgno, &rm_cur);
+	if (error) {
+		do_warn(_("Not enough memory to check reverse mappings.\n"));
+		return;
+	}
+
+	rtg = libxfs_rtgroup_get(mp, rgno);
+	if (!rtg) {
+		do_warn(_("Could not load rtgroup %u.\n"), rgno);
+		goto err_rcur;
+	}
+
+	ip = rtg_rmap(rtg);
+	if (!ip) {
+		do_warn(_("Could not find rtgroup %u rmap inode.\n"), rgno);
+		goto err_rtg;
+	}
+
+	if (ip->i_df.if_format != XFS_DINODE_FMT_META_BTREE) {
+		do_warn(
+_("rtgroup %u rmap inode has wrong format 0x%x, expected 0x%x\n"),
+				rgno, ip->i_df.if_format,
+				XFS_DINODE_FMT_META_BTREE);
+		goto err_rtg;
+	}
+
+	if (ip->i_metatype != XFS_METAFILE_RTRMAP) {
+		do_warn(
+_("rtgroup %u rmap inode has wrong metatype 0x%x, expected 0x%x\n"),
+				rgno, ip->i_metatype,
+				XFS_METAFILE_RTRMAP);
+		goto err_rtg;
+	}
+
+	if (xfs_inode_has_attr_fork(ip) &&
+	    !(xfs_has_metadir(mp) && xfs_has_parent(mp))) {
+		do_warn(
+_("rtgroup %u rmap inode should not have extended attributes\n"), rgno);
+		goto err_rtg;
+	}
+
+	bt_cur = libxfs_rtrmapbt_init_cursor(NULL, rtg);
+	if (!bt_cur) {
+		do_warn(_("Not enough memory to check reverse mappings.\n"));
+		goto err_rtg;
+	}
+
+	error = rmap_compare_records(rm_cur, bt_cur, rgno);
+	if (error)
+		goto err_cur;
+
+err_cur:
+	libxfs_btree_del_cursor(bt_cur, error);
+err_rtg:
+	libxfs_rtgroup_put(rtg);
+err_rcur:
+	libxfs_btree_del_cursor(rm_cur, error);
+}
+
 /*
  * Compare the key fields of two rmap records -- positive if key1 > key2,
  * negative if key1 < key2, and zero if equal.
diff --git a/repair/rmap.h b/repair/rmap.h
index b5c8b4f0bef794..ebda561e59bc8f 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -21,6 +21,7 @@ void rmap_add_bmbt_rec(struct xfs_mount *mp, xfs_ino_t ino, int whichfork,
 bool rmaps_are_mergeable(struct xfs_rmap_irec *r1, struct xfs_rmap_irec *r2);
 
 void rmap_add_fixed_ag_rec(struct xfs_mount *mp, xfs_agnumber_t agno);
+void rmap_add_fixed_rtgroup_rec(struct xfs_mount *mp, xfs_rgnumber_t rgno);
 
 int rmap_add_agbtree_mapping(struct xfs_mount *mp, xfs_agnumber_t agno,
 		xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner);
@@ -30,6 +31,7 @@ uint64_t rmap_record_count(struct xfs_mount *mp, bool isrt,
 		xfs_agnumber_t agno);
 extern void rmap_avoid_check(struct xfs_mount *mp);
 void rmaps_verify_btree(struct xfs_mount *mp, xfs_agnumber_t agno);
+void rtrmaps_verify_btree(struct xfs_mount *mp, xfs_rgnumber_t rgno);
 
 extern int64_t rmap_diffkeys(struct xfs_rmap_irec *kp1,
 		struct xfs_rmap_irec *kp2);



* [PATCH 20/27] xfs_repair: always check realtime file mappings against incore info
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (18 preceding siblings ...)
  2025-02-06 22:54   ` [PATCH 19/27] xfs_repair: check existing realtime rmapbt entries against observed rmaps Darrick J. Wong
@ 2025-02-06 22:54   ` Darrick J. Wong
  2025-02-07  5:35     ` Christoph Hellwig
  2025-02-06 22:55   ` [PATCH 21/27] xfs_repair: rebuild the realtime rmap btree Darrick J. Wong
                     ` (6 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:54 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Curiously, the xfs_repair code that processes data fork mappings of
realtime files doesn't actually compare the mappings against the incore
state map during the !check_dups phase (aka phase 3).  As a result, we
lose the opportunity to clear damaged realtime data forks before we get
to crosslinked file checking in phase 4, where ondisk metadata errors
call do_error and abort repair.

Split the process_rt_rec_state code into two functions: one to check the
mapping, and another to update the incore state.  The first one can be
called to help us decide if we're going to zap the fork, and the second
one updates the incore state if we decide to keep the fork.  We already
do this for regular data files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
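The check/update split described above can be pictured as a two-pass pattern: a read-only pass that decides whether the mapping is acceptable, then a separate pass that records the claim. This is only a minimal sketch with a toy state map; the enum values and demo_* functions are hypothetical and far simpler than the XR_E_* states handled in dinode.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-extent states, loosely modelled on the incore map. */
enum demo_state { DEMO_FREE, DEMO_INUSE, DEMO_METADATA, DEMO_MULT };

/*
 * Pass 1: read-only check.  Returns 0 if every extent in the range can be
 * claimed, 1 if any extent conflicts.  Never modifies the map.
 */
static int
demo_check_mapping(const enum demo_state *map, size_t start, size_t len)
{
	size_t	i;

	for (i = start; i < start + len; i++)
		if (map[i] != DEMO_FREE)
			return 1;
	return 0;
}

/* Pass 2: record the claim once the caller has decided to keep the fork. */
static void
demo_claim_mapping(enum demo_state *map, size_t start, size_t len)
{
	size_t	i;

	for (i = start; i < start + len; i++)
		map[i] = (map[i] == DEMO_FREE) ? DEMO_INUSE : DEMO_MULT;
}

/* Check first; only touch the map if the mapping looked sane. */
static int
demo_process_mapping(enum demo_state *map, size_t start, size_t len)
{
	if (demo_check_mapping(map, start, len))
		return 1;	/* caller may zap the fork; map is untouched */
	demo_claim_mapping(map, start, len);
	return 0;
}
```

Because the check pass has no side effects, a bad mapping can be reported with a warning and the fork cleared, rather than discovering the conflict halfway through an update.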
 repair/dinode.c |   95 +++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 84 insertions(+), 11 deletions(-)


diff --git a/repair/dinode.c b/repair/dinode.c
index 58691b196bc4cb..3995584a364771 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -253,7 +253,7 @@ _("data fork in rt ino %" PRIu64 " claims dup rt extent,"
 	return 0;
 }
 
-static int
+static void
 process_rt_rec_state(
 	struct xfs_mount	*mp,
 	xfs_ino_t		ino,
@@ -297,11 +297,78 @@ _("data fork in rt inode %" PRIu64 " found invalid rt extent %"PRIu64" state %d
 			set_rtbmap(ext, zap_metadata ? XR_E_METADATA :
 						       XR_E_INUSE);
 			break;
+		case XR_E_BAD_STATE:
+			do_error(
+_("bad state in rt extent map %" PRIu64 "\n"),
+				ext);
 		case XR_E_METADATA:
+		case XR_E_FS_MAP:
+		case XR_E_INO:
+		case XR_E_INUSE_FS:
+			break;
+		case XR_E_INUSE:
+		case XR_E_MULT:
+			set_rtbmap(ext, XR_E_MULT);
+			break;
+		case XR_E_FREE1:
+		default:
 			do_error(
+_("illegal state %d in rt extent %" PRIu64 "\n"),
+				state, ext);
+		}
+		b += mp->m_sb.sb_rextsize;
+	} while (b < irec->br_startblock + irec->br_blockcount);
+}
+
+/*
+ * Checks the realtime file's data mapping against in-core extent info, and
+ * complains if there are discrepancies.  Returns 0 if good, 1 if bad.
+ */
+static int
+check_rt_rec_state(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino,
+	struct xfs_bmbt_irec	*irec)
+{
+	xfs_fsblock_t		b = irec->br_startblock;
+	xfs_rtblock_t		ext;
+	int			state;
+
+	do {
+		ext = (xfs_rtblock_t)b / mp->m_sb.sb_rextsize;
+		state = get_rtbmap(ext);
+
+		if ((b % mp->m_sb.sb_rextsize) != 0) {
+			/*
+			 * We are midway through a partially written extent.
+			 * If we don't find the state that gets set in the
+			 * other clause of this loop body, then we have a
+			 * partially *mapped* rt extent and should complain.
+			 */
+			if (state != XR_E_INUSE && state != XR_E_FREE) {
+				do_warn(
+_("data fork in rt inode %" PRIu64 " found invalid rt extent %"PRIu64" state %d at rt block %"PRIu64"\n"),
+					ino, ext, state, b);
+				return 1;
+			}
+
+			b = roundup(b, mp->m_sb.sb_rextsize);
+			continue;
+		}
+
+		/*
+		 * This is the start of an rt extent.  Complain if there are
+		 * conflicting states.  We'll set the state elsewhere.
+		 */
+		switch (state)  {
+		case XR_E_FREE:
+		case XR_E_UNKNOWN:
+			break;
+		case XR_E_METADATA:
+			do_warn(
 _("data fork in rt inode %" PRIu64 " found metadata file block %" PRIu64 " in rt bmap\n"),
 				ino, ext);
-			break;
+			return 1;
 		case XR_E_BAD_STATE:
 			do_error(
 _("bad state in rt extent map %" PRIu64 "\n"),
@@ -309,12 +376,12 @@ _("bad state in rt extent map %" PRIu64 "\n"),
 		case XR_E_FS_MAP:
 		case XR_E_INO:
 		case XR_E_INUSE_FS:
-			do_error(
+			do_warn(
 _("data fork in rt inode %" PRIu64 " found rt metadata extent %" PRIu64 " in rt bmap\n"),
 				ino, ext);
+			return 1;
 		case XR_E_INUSE:
 		case XR_E_MULT:
-			set_rtbmap(ext, XR_E_MULT);
 			do_warn(
 _("data fork in rt inode %" PRIu64 " claims used rt extent %" PRIu64 "\n"),
 				ino, b);
@@ -376,20 +443,26 @@ _("inode %" PRIu64 " - bad rt extent overflows - start %" PRIu64 ", "
 	}
 
 	pthread_mutex_lock(&rt_lock);
-	if (check_dups)
-		bad = process_rt_rec_dups(mp, ino, irec);
-	else
-		bad = process_rt_rec_state(mp, ino, zap_metadata, irec);
-	pthread_mutex_unlock(&rt_lock);
+	bad = check_rt_rec_state(mp, ino, irec);
 	if (bad)
-		return bad;
+		goto out_unlock;
+
+	if (check_dups) {
+		bad = process_rt_rec_dups(mp, ino, irec);
+		if (bad)
+			goto out_unlock;
+	} else {
+		process_rt_rec_state(mp, ino, zap_metadata, irec);
+	}
 
 	/*
 	 * bump up the block counter
 	 */
 	*tot += irec->br_blockcount;
 
-	return 0;
+out_unlock:
+	pthread_mutex_unlock(&rt_lock);
+	return bad;
 }
 
 /*



* [PATCH 21/27] xfs_repair: rebuild the realtime rmap btree
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (19 preceding siblings ...)
  2025-02-06 22:54   ` [PATCH 20/27] xfs_repair: always check realtime file mappings against incore info Darrick J. Wong
@ 2025-02-06 22:55   ` Darrick J. Wong
  2025-02-07  5:54     ` Christoph Hellwig
  2025-02-06 22:55   ` [PATCH 22/27] xfs_repair: check for global free space concerns with default btree slack levels Darrick J. Wong
                     ` (5 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:55 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Rebuild the realtime rmap btree file from the reverse mapping records we
gathered from walking the inodes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
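One detail in the diff worth spelling out is the slack heuristic in bulkload_estimate_inode_slack(): the "less than about 9% free" test is done as a 3/32 comparison with a multiply and a shift instead of a division. The sketch below restates that arithmetic in isolation; the demo_* names are hypothetical, and the block counts are assumed small enough that multiplying by 3 does not overflow 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if free space is below 3/32 (9.375%) of the total block count. */
static int
demo_low_on_space(uint64_t total_blocks, uint64_t free_blocks)
{
	return free_blocks < ((total_blocks * 3) >> 5);
}

/*
 * Pick a slack value.  A negative configured value means "take the bulk
 * loader's default"; it is only overridden (to 2) when space is tight.
 * A non-negative configured value always wins.
 */
static int
demo_pick_slack(int configured, uint64_t total_blocks, uint64_t free_blocks)
{
	if (configured >= 0)
		return configured;
	if (!demo_low_on_space(total_blocks, free_blocks))
		return configured;
	return 2;
}
```

With 3200 total blocks the threshold works out to (3200 * 3) >> 5 = 300 blocks, so 300 free blocks keeps the default slack while 299 switches to the tight setting.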
 libxfs/libxfs_api_defs.h |    8 +
 repair/Makefile          |    1 
 repair/bulkload.c        |   41 +++++++
 repair/bulkload.h        |    2 
 repair/phase6.c          |   21 ++++
 repair/rmap.c            |   26 +++++
 repair/rmap.h            |    1 
 repair/rtrmap_repair.c   |  265 ++++++++++++++++++++++++++++++++++++++++++++++
 repair/xfs_repair.c      |    8 +
 9 files changed, 371 insertions(+), 2 deletions(-)
 create mode 100644 repair/rtrmap_repair.c


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index b62efad757470b..193b1eeaa7537e 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -278,6 +278,7 @@
 #define xfs_rmap_irec_offset_unpack	libxfs_rmap_irec_offset_unpack
 #define xfs_rmap_lookup_le		libxfs_rmap_lookup_le
 #define xfs_rmap_lookup_le_range	libxfs_rmap_lookup_le_range
+#define xfs_rmap_map_extent		libxfs_rmap_map_extent
 #define xfs_rmap_map_raw		libxfs_rmap_map_raw
 #define xfs_rmap_query_all		libxfs_rmap_query_all
 #define xfs_rmap_query_range		libxfs_rmap_query_range
@@ -295,9 +296,12 @@
 #define xfs_rtginode_name		libxfs_rtginode_name
 #define xfs_rtsummary_create		libxfs_rtsummary_create
 
+#define xfs_rtginode_create		libxfs_rtginode_create
 #define xfs_rtginode_irele		libxfs_rtginode_irele
 #define xfs_rtginode_load		libxfs_rtginode_load
 #define xfs_rtginode_load_parent	libxfs_rtginode_load_parent
+#define xfs_rtginode_mkdir_parent	libxfs_rtginode_mkdir_parent
+#define xfs_rtginode_name		libxfs_rtginode_name
 #define xfs_rtgroup_alloc		libxfs_rtgroup_alloc
 #define xfs_rtgroup_extents		libxfs_rtgroup_extents
 #define xfs_rtgroup_grab		libxfs_rtgroup_grab
@@ -314,12 +318,16 @@
 #define xfs_rtgroup_get			libxfs_rtgroup_get
 #define xfs_rtgroup_put			libxfs_rtgroup_put
 #define xfs_rtrmapbt_calc_reserves	libxfs_rtrmapbt_calc_reserves
+#define xfs_rtrmapbt_calc_size		libxfs_rtrmapbt_calc_size
+#define xfs_rtrmapbt_commit_staged_btree	libxfs_rtrmapbt_commit_staged_btree
+#define xfs_rtrmapbt_create		libxfs_rtrmapbt_create
 #define xfs_rtrmapbt_droot_maxrecs	libxfs_rtrmapbt_droot_maxrecs
 #define xfs_rtrmapbt_maxlevels_ondisk	libxfs_rtrmapbt_maxlevels_ondisk
 #define xfs_rtrmapbt_init_cursor	libxfs_rtrmapbt_init_cursor
 #define xfs_rtrmapbt_maxrecs		libxfs_rtrmapbt_maxrecs
 #define xfs_rtrmapbt_mem_init		libxfs_rtrmapbt_mem_init
 #define xfs_rtrmapbt_mem_cursor		libxfs_rtrmapbt_mem_cursor
+#define xfs_rtrmapbt_stage_cursor	libxfs_rtrmapbt_stage_cursor
 
 #define xfs_sb_from_disk		libxfs_sb_from_disk
 #define xfs_sb_mount_rextsize		libxfs_sb_mount_rextsize
diff --git a/repair/Makefile b/repair/Makefile
index a36a95e353a504..6f4ec3b3a9c4dc 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -73,6 +73,7 @@ CFILES = \
 	rcbag.c \
 	rmap.c \
 	rt.c \
+	rtrmap_repair.c \
 	sb.c \
 	scan.c \
 	slab.c \
diff --git a/repair/bulkload.c b/repair/bulkload.c
index aada5bbae579f8..a9e51de0a24c17 100644
--- a/repair/bulkload.c
+++ b/repair/bulkload.c
@@ -361,3 +361,44 @@ bulkload_estimate_ag_slack(
 	if (bload->node_slack < 0)
 		bload->node_slack = 2;
 }
+
+/*
+ * Estimate proper slack values for a btree that's being reloaded.
+ *
+ * Under most circumstances, we'll take whatever default loading value the
+ * btree bulk loading code calculates for us.  However, there are some
+ * exceptions to this rule:
+ *
+ * (1) If someone turned one of the debug knobs.
+ * (2) The FS has less than ~9% space free.
+ *
+ * Note that we actually use 3/32 for the comparison to avoid division.
+ */
+void
+bulkload_estimate_inode_slack(
+	struct xfs_mount	*mp,
+	struct xfs_btree_bload	*bload,
+	unsigned long long	free)
+{
+	/*
+	 * The global values are set to -1 (i.e. take the bload defaults)
+	 * unless someone has set them otherwise, so we just pull the values
+	 * here.
+	 */
+	bload->leaf_slack = bload_leaf_slack;
+	bload->node_slack = bload_node_slack;
+
+	/* No further changes if there's more than 3/32ths space left. */
+	if (free >= ((mp->m_sb.sb_dblocks * 3) >> 5))
+		return;
+
+	/*
+	 * We're low on space; load the btrees as tightly as possible.  Leave
+	 * a couple of open slots in each btree block so that we don't end up
+	 * splitting the btrees like crazy right after mount.
+	 */
+	if (bload->leaf_slack < 0)
+		bload->leaf_slack = 2;
+	if (bload->node_slack < 0)
+		bload->node_slack = 2;
+}
diff --git a/repair/bulkload.h b/repair/bulkload.h
index a88aafaa678a3a..842121b15190e7 100644
--- a/repair/bulkload.h
+++ b/repair/bulkload.h
@@ -78,5 +78,7 @@ void bulkload_cancel(struct bulkload *bkl);
 int bulkload_commit(struct bulkload *bkl);
 void bulkload_estimate_ag_slack(struct repair_ctx *sc,
 		struct xfs_btree_bload *bload, unsigned int free);
+void bulkload_estimate_inode_slack(struct xfs_mount *mp,
+		struct xfs_btree_bload *bload, unsigned long long free);
 
 #endif /* __XFS_REPAIR_BULKLOAD_H__ */
diff --git a/repair/phase6.c b/repair/phase6.c
index 7d2e0554594265..cae9d970481840 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -21,6 +21,8 @@
 #include "repair/pptr.h"
 #include "repair/rt.h"
 #include "repair/quotacheck.h"
+#include "repair/slab.h"
+#include "repair/rmap.h"
 
 static xfs_ino_t		orphanage_ino;
 
@@ -685,6 +687,15 @@ ensure_rtgroup_summary(
 	fill_rtsummary(rtg);
 }
 
+static void
+ensure_rtgroup_rmapbt(
+	struct xfs_rtgroup	*rtg,
+	xfs_filblks_t		est_fdblocks)
+{
+	if (ensure_rtgroup_file(rtg, XFS_RTGI_RMAP))
+		populate_rtgroup_rmapbt(rtg, est_fdblocks);
+}
+
 /* Initialize a root directory. */
 static int
 init_fs_root_dir(
@@ -3365,6 +3376,8 @@ reset_rt_metadir_inodes(
 	struct xfs_mount	*mp)
 {
 	struct xfs_rtgroup	*rtg = NULL;
+	xfs_filblks_t		metadata_blocks = 0;
+	xfs_filblks_t		est_fdblocks = 0;
 	int			error;
 
 	/*
@@ -3386,6 +3399,13 @@ reset_rt_metadir_inodes(
 		mark_ino_metadata(mp, mp->m_rtdirip->i_ino);
 	}
 
+	/* Estimate how much free space will be left after building btrees */
+	while ((rtg = xfs_rtgroup_next(mp, rtg)))
+		metadata_blocks += estimate_rtrmapbt_blocks(rtg);
+
+	if (mp->m_sb.sb_fdblocks > metadata_blocks)
+		est_fdblocks = mp->m_sb.sb_fdblocks - metadata_blocks;
+
 	/*
 	 * This isn't the whole story, but it keeps the message that we've had
 	 * for years and which is expected in xfstests and more.
@@ -3400,6 +3420,7 @@ _("        - resetting contents of realtime bitmap and summary inodes\n"));
 	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
 		ensure_rtgroup_bitmap(rtg);
 		ensure_rtgroup_summary(rtg);
+		ensure_rtgroup_rmapbt(rtg, est_fdblocks);
 	}
 }
 
diff --git a/repair/rmap.c b/repair/rmap.c
index a40851b4d0dc69..85a65048db9afc 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -1940,3 +1940,29 @@ estimate_refcountbt_blocks(
 	return libxfs_refcountbt_calc_size(mp,
 			slab_count(x->ar_refcount_items));
 }
+
+/* Estimate the size of the ondisk rtrmapbt from the incore tree. */
+xfs_filblks_t
+estimate_rtrmapbt_blocks(
+	struct xfs_rtgroup	*rtg)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	struct xfs_ag_rmap	*x;
+	unsigned long long	nr_recs;
+
+	if (!rmap_needs_work(mp) || !xfs_has_rtrmapbt(mp))
+		return 0;
+
+	/*
+	 * Overestimate the amount of space needed by pretending that every
+	 * byte in the incore tree is used to store rtrmapbt records.  This
+	 * means we can use SEEK_DATA/HOLE on the xfile, which is faster than
+	 * walking the entire btree.
+	 */
+	x = &rg_rmaps[rtg_rgno(rtg)];
+	if (!rmaps_has_observations(x))
+		return 0;
+
+	nr_recs = xmbuf_bytes(x->ar_xmbtp) / sizeof(struct xfs_rmap_rec);
+	return libxfs_rtrmapbt_calc_size(mp, nr_recs);
+}
diff --git a/repair/rmap.h b/repair/rmap.h
index ebda561e59bc8f..23859bf6c2ad42 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -60,5 +60,6 @@ int rmap_get_mem_rec(struct xfs_btree_cur *rmcur, struct xfs_rmap_irec *irec);
 
 void populate_rtgroup_rmapbt(struct xfs_rtgroup *rtg,
 		xfs_filblks_t est_fdblocks);
+xfs_filblks_t estimate_rtrmapbt_blocks(struct xfs_rtgroup *rtg);
 
 #endif /* RMAP_H_ */
diff --git a/repair/rtrmap_repair.c b/repair/rtrmap_repair.c
new file mode 100644
index 00000000000000..2b07e8943e591d
--- /dev/null
+++ b/repair/rtrmap_repair.c
@@ -0,0 +1,265 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2019-2025 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include <libxfs.h>
+#include "btree.h"
+#include "err_protos.h"
+#include "libxlog.h"
+#include "incore.h"
+#include "globals.h"
+#include "dinode.h"
+#include "slab.h"
+#include "rmap.h"
+#include "bulkload.h"
+
+/* Ported routines from fs/xfs/scrub/rtrmap_repair.c */
+
+/*
+ * Realtime Reverse Mapping (RTRMAPBT) Repair
+ * ==========================================
+ *
+ * Gather all the rmap records for the inode and fork we're fixing, reset the
+ * incore fork, then recreate the btree.
+ */
+struct xrep_rtrmap {
+	struct xfs_btree_cur	*btree_cursor;
+
+	/* New fork. */
+	struct bulkload		new_fork_info;
+	struct xfs_btree_bload	rtrmap_bload;
+
+	struct repair_ctx	*sc;
+	struct xfs_rtgroup	*rtg;
+
+	/* Estimated free space after building all rt btrees */
+	xfs_filblks_t		est_fdblocks;
+};
+
+/* Retrieve rtrmapbt data for bulk load. */
+STATIC int
+xrep_rtrmap_get_records(
+	struct xfs_btree_cur	*cur,
+	unsigned int		idx,
+	struct xfs_btree_block	*block,
+	unsigned int		nr_wanted,
+	void			*priv)
+{
+	struct xrep_rtrmap	*rr = priv;
+	union xfs_btree_rec	*block_rec;
+	unsigned int		loaded;
+	int			ret;
+
+	for (loaded = 0; loaded < nr_wanted; loaded++, idx++) {
+		ret = rmap_get_mem_rec(rr->btree_cursor, &cur->bc_rec.r);
+		if (ret < 0)
+			return ret;
+		if (ret == 0)
+			do_error(
+ _("ran out of records while rebuilding rt rmap btree\n"));
+
+		block_rec = libxfs_btree_rec_addr(cur, idx, block);
+		cur->bc_ops->init_rec_from_cur(cur, block_rec);
+	}
+
+	return loaded;
+}
+
+/* Feed one of the new btree blocks to the bulk loader. */
+STATIC int
+xrep_rtrmap_claim_block(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr,
+	void			*priv)
+{
+	struct xrep_rtrmap	*rr = priv;
+
+	return bulkload_claim_block(cur, &rr->new_fork_info, ptr);
+}
+
+/* Figure out how much space we need to create the incore btree root block. */
+STATIC size_t
+xrep_rtrmap_iroot_size(
+	struct xfs_btree_cur	*cur,
+	unsigned int		level,
+	unsigned int		nr_this_level,
+	void			*priv)
+{
+	return xfs_rtrmap_broot_space_calc(cur->bc_mp, level, nr_this_level);
+}
+
+/* Reserve new btree blocks and bulk load all the rtrmap records. */
+STATIC int
+xrep_rtrmap_btree_load(
+	struct xrep_rtrmap	*rr,
+	struct xfs_btree_cur	*rtrmap_cur)
+{
+	struct repair_ctx	*sc = rr->sc;
+	int			error;
+
+	rr->rtrmap_bload.get_records = xrep_rtrmap_get_records;
+	rr->rtrmap_bload.claim_block = xrep_rtrmap_claim_block;
+	rr->rtrmap_bload.iroot_size = xrep_rtrmap_iroot_size;
+	bulkload_estimate_inode_slack(sc->mp, &rr->rtrmap_bload,
+			rr->est_fdblocks);
+
+	/* Compute how many blocks we'll need. */
+	error = -libxfs_btree_bload_compute_geometry(rtrmap_cur,
+			&rr->rtrmap_bload,
+			rmap_record_count(sc->mp, true, rtg_rgno(rr->rtg)));
+	if (error)
+		return error;
+
+	/*
+	 * Guess how many blocks we're going to need to rebuild an entire rtrmap
+	 * from the number of extents we found, and pump up our transaction to
+	 * have sufficient block reservation.
+	 */
+	error = -libxfs_trans_reserve_more(sc->tp, rr->rtrmap_bload.nr_blocks,
+			0);
+	if (error)
+		return error;
+
+	/*
+	 * Reserve the space we'll need for the new btree.  Drop the cursor
+	 * while we do this because that can roll the transaction and cursors
+	 * can't handle that.
+	 */
+	error = bulkload_alloc_file_blocks(&rr->new_fork_info,
+			rr->rtrmap_bload.nr_blocks);
+	if (error)
+		return error;
+
+	/* Add all observed rtrmap records. */
+	error = rmap_init_mem_cursor(rr->sc->mp, sc->tp, true,
+			rtg_rgno(rr->rtg), &rr->btree_cursor);
+	if (error)
+		return error;
+	error = -libxfs_btree_bload(rtrmap_cur, &rr->rtrmap_bload, rr);
+	libxfs_btree_del_cursor(rr->btree_cursor, error);
+	return error;
+}
+
+/* Update the inode counters. */
+STATIC int
+xrep_rtrmap_reset_counters(
+	struct xrep_rtrmap	*rr)
+{
+	struct repair_ctx	*sc = rr->sc;
+
+	/*
+	 * Update the inode block counts to reflect the btree we just
+	 * generated.
+	 */
+	sc->ip->i_nblocks = rr->new_fork_info.ifake.if_blocks;
+	libxfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
+
+	/* Quotas don't exist so we're done. */
+	return 0;
+}
+
+/*
+ * Use the collected rmap information to stage a new rt rmap btree.  If this is
+ * successful we'll return with the new btree root information logged to the
+ * repair transaction but not yet committed.
+ */
+static int
+xrep_rtrmap_build_new_tree(
+	struct xrep_rtrmap	*rr)
+{
+	struct xfs_owner_info	oinfo;
+	struct xfs_btree_cur	*cur;
+	struct repair_ctx	*sc = rr->sc;
+	struct xbtree_ifakeroot	*ifake = &rr->new_fork_info.ifake;
+	int			error;
+
+	/*
+	 * Prepare to construct the new fork by initializing the new btree
+	 * structure and creating a fake ifork in the ifakeroot structure.
+	 */
+	libxfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, XFS_DATA_FORK);
+	bulkload_init_inode(&rr->new_fork_info, sc, XFS_DATA_FORK, &oinfo);
+	cur = libxfs_rtrmapbt_init_cursor(NULL, rr->rtg);
+	libxfs_btree_stage_ifakeroot(cur, ifake);
+
+	/*
+	 * Figure out the size and format of the new fork, then fill it with
+	 * all the rtrmap records we've found.  Join the inode to the
+	 * transaction so that we can roll the transaction while holding the
+	 * inode locked.
+	 */
+	libxfs_trans_ijoin(sc->tp, sc->ip, 0);
+	ifake->if_fork->if_format = XFS_DINODE_FMT_META_BTREE;
+	error = xrep_rtrmap_btree_load(rr, cur);
+	if (error)
+		goto err_cur;
+
+	/*
+	 * Install the new fork in the inode.  After this point the old mapping
+	 * data are no longer accessible and the new tree is live.  We delete
+	 * the cursor immediately after committing the staged root because the
+	 * staged fork might be in extents format.
+	 */
+	libxfs_rtrmapbt_commit_staged_btree(cur, sc->tp);
+	libxfs_btree_del_cursor(cur, 0);
+
+	/* Reset the inode counters now that we've changed the fork. */
+	error = xrep_rtrmap_reset_counters(rr);
+	if (error)
+		goto err_newbt;
+
+	/* Dispose of any unused blocks and the accounting information. */
+	error = bulkload_commit(&rr->new_fork_info);
+	if (error)
+		return error;
+
+	return -libxfs_trans_roll_inode(&sc->tp, sc->ip);
+err_cur:
+	if (cur)
+		libxfs_btree_del_cursor(cur, error);
+err_newbt:
+	bulkload_cancel(&rr->new_fork_info);
+	return error;
+}
+
+/* Store the realtime reverse-mappings in the rtrmapbt. */
+void
+populate_rtgroup_rmapbt(
+	struct xfs_rtgroup	*rtg,
+	xfs_filblks_t		est_fdblocks)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	struct xfs_inode	*ip = rtg_rmap(rtg);
+	struct repair_ctx	sc = {
+		.mp		= mp,
+		.ip		= ip,
+	};
+	struct xrep_rtrmap	rr = {
+		.sc		= &sc,
+		.rtg		= rtg,
+		.est_fdblocks	= est_fdblocks,
+	};
+	int			error;
+
+	if (!xfs_has_rtrmapbt(mp))
+		return;
+
+	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0,
+			&sc.tp);
+	if (error)
+		goto out;
+
+	error = xrep_rtrmap_build_new_tree(&rr);
+	if (error) {
+		libxfs_trans_cancel(sc.tp);
+		goto out;
+	}
+
+	error = -libxfs_trans_commit(sc.tp);
+out:
+	if (error)
+		do_error(
+ _("rtgroup %u rmap btree could not be rebuilt, error %d\n"),
+			rtg_rgno(rtg), error);
+}
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 9509f04685c870..eeaaf643468941 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -1385,15 +1385,19 @@ main(int argc, char **argv)
 	rcbagbt_destroy_cur_cache();
 
 	/*
-	 * Done with the block usage maps, toss them...
+	 * Done with the block usage maps, toss them.  Realtime metadata aren't
+	 * rebuilt until phase 6, so we have to keep them around.
 	 */
-	rmaps_free(mp);
+	if (mp->m_sb.sb_rblocks == 0)
+		rmaps_free(mp);
 	free_bmaps(mp);
 
 	if (!bad_ino_btree)  {
 		phase6(mp);
 		phase_end(mp, 6);
 
+		if (mp->m_sb.sb_rblocks != 0)
+			rmaps_free(mp);
 		free_rtgroup_inodes();
 
 		phase7(mp, phase2_threads);



* [PATCH 22/27] xfs_repair: check for global free space concerns with default btree slack levels
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (20 preceding siblings ...)
  2025-02-06 22:55   ` [PATCH 21/27] xfs_repair: rebuild the realtime rmap btree Darrick J. Wong
@ 2025-02-06 22:55   ` Darrick J. Wong
  2025-02-07  5:56     ` Christoph Hellwig
  2025-02-06 22:55   ` [PATCH 23/27] xfs_repair: rebuild the bmap btree for realtime files Darrick J. Wong
                     ` (4 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:55 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Before repair was started, the filesystem might have been nearly full,
and its metadata btree blocks could all have been nearly full as well.
If we then rebuild the btrees with blocks that are only 75% full, that
expansion might be enough to run out of free space.  The solution is to
pack the new blocks completely full if we fear running out of space.

Previously, we only had to check and decide that on a per-AG basis.
However, now that XFS can have filesystems with metadata btrees rooted
in inodes, we have a global free space concern because there might be
enough space in each AG to regenerate the AG btrees at 75%, but that
might not leave enough space to regenerate the inode btrees, even if we
fill those blocks to 100%.

Hence we need to precompute the worst case space usage for all btrees in
the filesystem and compare /that/ against the global free space to
decide if we're going to pack the btrees maximally to conserve space.
That decision can override the per-AG determination.
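
As a rough illustration of the decision described above, here is a
minimal standalone sketch.  It assumes plain integer inputs instead of
XFS structures; the function name packed_btrees_needed() and its
parameters are hypothetical, and only the two comparisons are taken from
the phase5.c hunk in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the global packing decision.  All inputs are
 * block counts; none of these names exist in xfsprogs.
 */
static bool
packed_btrees_needed(
	uint64_t	dblocks,		/* size of the data volume */
	uint64_t	fdblocks,		/* free blocks before rebuilding */
	uint64_t	metadata_blocks)	/* worst case usage of all btrees */
{
	/* The rebuilt btrees would not even fit in the free space. */
	if (metadata_blocks > fdblocks)
		return true;

	/* Fewer than 3/32 (about 9%) of the volume would remain free. */
	fdblocks -= metadata_blocks;
	return fdblocks < ((dblocks * 3) >> 5);
}
```

With a 3200 block volume the threshold works out to 300 blocks, so a
filesystem that would be left with 250 free blocks packs its btrees,
while one left with 900 free blocks keeps the default slack.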

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 repair/globals.c |    6 +++
 repair/globals.h |    2 +
 repair/phase5.c  |  114 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 repair/phase6.c  |   16 +++++---
 4 files changed, 129 insertions(+), 9 deletions(-)


diff --git a/repair/globals.c b/repair/globals.c
index 99291d6afd61b9..143b4a8beb53f4 100644
--- a/repair/globals.c
+++ b/repair/globals.c
@@ -114,6 +114,12 @@ int		thread_count;
 /* If nonzero, simulate failure after this phase. */
 int		fail_after_phase;
 
+/*
+ * Do we think we're going to be so low on disk space that we need to pack
+ * all rebuilt btree blocks completely full to avoid running out of space?
+ */
+bool		need_packed_btrees;
+
 /* quota inode numbers */
 enum quotino_state {
 	QI_STATE_UNKNOWN,
diff --git a/repair/globals.h b/repair/globals.h
index b23a06af6cc81b..8bb9bbaeca4fb0 100644
--- a/repair/globals.h
+++ b/repair/globals.h
@@ -159,6 +159,8 @@ extern int		fail_after_phase;
 
 extern struct libxfs_init x;
 
+extern bool		need_packed_btrees;
+
 void set_quota_inode(xfs_dqtype_t type, xfs_ino_t);
 void lose_quota_inode(xfs_dqtype_t type);
 void clear_quota_inode(xfs_dqtype_t type);
diff --git a/repair/phase5.c b/repair/phase5.c
index ac5f04697b7110..cacaf74dda3a60 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -481,11 +481,14 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 
 	/*
 	 * Estimate the number of free blocks in this AG after rebuilding
-	 * all btrees.
+	 * all btrees, unless we already decided that we need to pack all
+	 * btree blocks maximally.
 	 */
-	total_btblocks = estimate_agbtree_blocks(pag, num_extents);
-	if (num_freeblocks > total_btblocks)
-		est_agfreeblocks = num_freeblocks - total_btblocks;
+	if (!need_packed_btrees) {
+		total_btblocks = estimate_agbtree_blocks(pag, num_extents);
+		if (num_freeblocks > total_btblocks)
+			est_agfreeblocks = num_freeblocks - total_btblocks;
+	}
 
 	init_ino_cursors(&sc, pag, est_agfreeblocks, &sb_icount_ag[agno],
 			&sb_ifree_ag[agno], &btr_ino, &btr_fino);
@@ -632,6 +635,107 @@ check_rtmetadata(
 	check_rtsummary(mp);
 }
 
+/*
+ * Estimate the amount of free space used by the perag metadata without
+ * building the incore tree.  This is only necessary if realtime btrees are
+ * enabled.
+ */
+static xfs_extlen_t
+estimate_agbtree_blocks_early(
+	struct xfs_perag	*pag,
+	unsigned int		*num_freeblocks)
+{
+	struct xfs_mount	*mp = pag_mount(pag);
+	xfs_agblock_t		agbno;
+	xfs_agblock_t		ag_end;
+	xfs_extlen_t		extent_len;
+	xfs_extlen_t		blen;
+	unsigned int		num_extents = 0;
+	int			bstate;
+	bool			in_extent = false;
+
+	/* Find the number of free space extents. */
+	ag_end = libxfs_ag_block_count(mp, pag_agno(pag));
+	for (agbno = 0; agbno < ag_end; agbno += blen) {
+		bstate = get_bmap_ext(pag_agno(pag), agbno, ag_end, &blen,
+				false);
+		if (bstate < XR_E_INUSE)  {
+			if (!in_extent) {
+				/*
+				 * found the start of a free extent
+				 */
+				in_extent = true;
+				num_extents++;
+				extent_len = blen;
+			} else {
+				extent_len += blen;
+			}
+		} else {
+			if (in_extent)  {
+				/*
+				 * free extent ends here
+				 */
+				in_extent = false;
+				*num_freeblocks += extent_len;
+			}
+		}
+	}
+	if (in_extent)
+		*num_freeblocks += extent_len;
+
+	return estimate_agbtree_blocks(pag, num_extents);
+}
+
+/*
+ * Decide if we need to pack every new btree block completely full to conserve
+ * disk space.  Normally we rebuild btree blocks to be 75% full, but we don't
+ * want to start rebuilding AG btrees that way only to discover that there
+ * isn't enough space left in the data volume to rebuild inode-based btrees.
+ */
+static bool
+are_packed_btrees_needed(
+	struct xfs_mount	*mp)
+{
+	struct xfs_perag	*pag = NULL;
+	struct xfs_rtgroup	*rtg = NULL;
+	unsigned long long	metadata_blocks = 0;
+	unsigned long long	fdblocks = 0;
+
+	/*
+	 * If we don't have inode-based metadata, we can let the AG btrees
+	 * pack as needed; there are no global space concerns here.
+	 */
+	if (!xfs_has_rtrmapbt(mp))
+		return false;
+
+	while ((pag = xfs_perag_next(mp, pag))) {
+		unsigned int	ag_fdblocks = 0;
+
+		metadata_blocks += estimate_agbtree_blocks_early(pag,
+								 &ag_fdblocks);
+		fdblocks += ag_fdblocks;
+	}
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg)))
+		metadata_blocks += estimate_rtrmapbt_blocks(rtg);
+
+	/*
+	 * If we think we'll have more metadata blocks than free space, then
+	 * pack the btree blocks.
+	 */
+	if (metadata_blocks > fdblocks)
+		return true;
+
+	/*
+	 * If the amount of free space after building btrees is less than
+	 * 3/32 (about 9%) of the data volume, pack the btree blocks.
+	 */
+	fdblocks -= metadata_blocks;
+	if (fdblocks < ((mp->m_sb.sb_dblocks * 3) >> 5))
+		return true;
+	return false;
+}
+
 void
 phase5(xfs_mount_t *mp)
 {
@@ -683,6 +787,8 @@ phase5(xfs_mount_t *mp)
 	if (error)
 		do_error(_("cannot alloc lost block bitmap\n"));
 
+	need_packed_btrees = are_packed_btrees_needed(mp);
+
 	while ((pag = xfs_perag_next(mp, pag)))
 		phase5_func(mp, pag, lost_blocks);
 
diff --git a/repair/phase6.c b/repair/phase6.c
index cae9d970481840..2ddfd0526767e0 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -3399,12 +3399,18 @@ reset_rt_metadir_inodes(
 		mark_ino_metadata(mp, mp->m_rtdirip->i_ino);
 	}
 
-	/* Estimate how much free space will be left after building btrees */
-	while ((rtg = xfs_rtgroup_next(mp, rtg)))
-		metadata_blocks += estimate_rtrmapbt_blocks(rtg);
+	/*
+	 * Estimate how much free space will be left after building btrees
+	 * unless we already decided that we needed to pack all new blocks
+	 * maximally.
+	 */
+	if (!need_packed_btrees) {
+		while ((rtg = xfs_rtgroup_next(mp, rtg)))
+			metadata_blocks += estimate_rtrmapbt_blocks(rtg);
 
-	if (mp->m_sb.sb_fdblocks > metadata_blocks)
-		est_fdblocks = mp->m_sb.sb_fdblocks - metadata_blocks;
+		if (mp->m_sb.sb_fdblocks > metadata_blocks)
+			est_fdblocks = mp->m_sb.sb_fdblocks - metadata_blocks;
+	}
 
 	/*
 	 * This isn't the whole story, but it keeps the message that we've had



* [PATCH 23/27] xfs_repair: rebuild the bmap btree for realtime files
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (21 preceding siblings ...)
  2025-02-06 22:55   ` [PATCH 22/27] xfs_repair: check for global free space concerns with default btree slack levels Darrick J. Wong
@ 2025-02-06 22:55   ` Darrick J. Wong
  2025-02-07  5:56     ` Christoph Hellwig
  2025-02-06 22:55   ` [PATCH 24/27] xfs_repair: reserve per-AG space while rebuilding rt metadata Darrick J. Wong
                     ` (3 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:55 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Use the realtime rmap btree information to rebuild an inode's data fork
when appropriate.
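
For readers who want the shape of the record filtering without the
libxfs plumbing, here is a simplified sketch.  Every type, flag value
and function name in it is a hypothetical stand-in, and it leaves out
the corruption checks that the real walker performs before counting a
record.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RMAP_ATTR_FORK	(1U << 0)	/* hypothetical flag value */

enum demo_fork { DEMO_DATA_FORK, DEMO_ATTR_FORK };

struct demo_rmap {
	uint64_t	owner;		/* inode that owns the extent */
	uint64_t	blockcount;	/* length of the extent */
	unsigned int	flags;
};

struct walk_state {
	uint64_t	ino;		/* inode being rebuilt */
	enum demo_fork	whichfork;	/* fork being rebuilt */
	uint64_t	nblocks;	/* all blocks owned by the inode */
	uint64_t	nr_mappings;	/* records kept for the new fork */
};

/* Process one reverse mapping record during the scan. */
static void
demo_walk_rmap(
	struct walk_state	*ws,
	const struct demo_rmap	*rec)
{
	bool			is_attr;

	/* Skip extents owned by some other inode. */
	if (rec->owner != ws->ino)
		return;

	/* Count every block the inode owns, whichever fork it is in. */
	ws->nblocks += rec->blockcount;

	/* Only keep mappings for the fork that is being rebuilt. */
	is_attr = (rec->flags & DEMO_RMAP_ATTR_FORK) != 0;
	if ((ws->whichfork == DEMO_ATTR_FORK) != is_attr)
		return;

	ws->nr_mappings++;
}
```

The block count is updated before the fork test so that the inode's
total can be reset later even when only one fork is rebuilt.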

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 repair/bmap_repair.c |  109 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 105 insertions(+), 4 deletions(-)


diff --git a/repair/bmap_repair.c b/repair/bmap_repair.c
index 7e7c2a39f5724b..5d1f639be81ff4 100644
--- a/repair/bmap_repair.c
+++ b/repair/bmap_repair.c
@@ -209,6 +209,101 @@ xrep_bmap_scan_ag(
 	return error;
 }
 
+/* Check for any obvious errors or conflicts in the file mapping. */
+STATIC int
+xrep_bmap_check_rtfork_rmap(
+	struct repair_ctx		*sc,
+	struct xfs_btree_cur		*cur,
+	const struct xfs_rmap_irec	*rec)
+{
+	/* xattr extents are never stored on realtime devices */
+	if (rec->rm_flags & XFS_RMAP_ATTR_FORK)
+		return EFSCORRUPTED;
+
+	/* bmbt blocks are never stored on realtime devices */
+	if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK)
+		return EFSCORRUPTED;
+
+	/* Data extents for non-rt files are never stored on the rt device. */
+	if (!XFS_IS_REALTIME_INODE(sc->ip))
+		return EFSCORRUPTED;
+
+	/* Check the file offsets and physical extents. */
+	if (!xfs_verify_fileext(sc->mp, rec->rm_offset, rec->rm_blockcount))
+		return EFSCORRUPTED;
+
+	/* Check that this fits in the rt volume. */
+	if (!xfs_verify_rgbext(to_rtg(cur->bc_group), rec->rm_startblock,
+				rec->rm_blockcount))
+		return EFSCORRUPTED;
+
+	return 0;
+}
+
+/* Record realtime extents that belong to this inode's fork. */
+STATIC int
+xrep_bmap_walk_rtrmap(
+	struct xfs_btree_cur		*cur,
+	const struct xfs_rmap_irec	*rec,
+	void				*priv)
+{
+	struct xrep_bmap		*rb = priv;
+	int				error = 0;
+
+	/* Skip extents which are not owned by this inode and fork. */
+	if (rec->rm_owner != rb->sc->ip->i_ino)
+		return 0;
+
+	error = xrep_bmap_check_rtfork_rmap(rb->sc, cur, rec);
+	if (error)
+		return error;
+
+	/*
+	 * Record all blocks allocated to this file even if the extent isn't
+	 * for the fork we're rebuilding so that we can reset di_nblocks later.
+	 */
+	rb->nblocks += rec->rm_blockcount;
+
+	/* If this rmap isn't for the fork we want, we're done. */
+	if (rb->whichfork == XFS_DATA_FORK &&
+	    (rec->rm_flags & XFS_RMAP_ATTR_FORK))
+		return 0;
+	if (rb->whichfork == XFS_ATTR_FORK &&
+	    !(rec->rm_flags & XFS_RMAP_ATTR_FORK))
+		return 0;
+
+	return xrep_bmap_from_rmap(rb, rec->rm_offset, rec->rm_startblock,
+			rec->rm_blockcount,
+			rec->rm_flags & XFS_RMAP_UNWRITTEN);
+}
+
+/*
+ * Scan the realtime reverse mappings to build the new extent map.  The rt rmap
+ * inodes must be loaded from disk explicitly here, since we have not yet
+ * validated the metadata directory tree but do not wish to throw away user
+ * data unnecessarily.
+ */
+STATIC int
+xrep_bmap_scan_rt(
+	struct xrep_bmap	*rb,
+	struct xfs_rtgroup	*rtg)
+{
+	struct repair_ctx	*sc = rb->sc;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_inode	*ip = rtg_rmap(rtg);
+	struct xfs_btree_cur	*cur;
+	int			error;
+
+	/* no rtrmapbt, or failed to load the rt rmap inode? */
+	if (!xfs_has_rtrmapbt(mp) || !ip)
+		return ENOENT;
+
+	cur = libxfs_rtrmapbt_init_cursor(sc->tp, rtg);
+	error = -libxfs_rmap_query_all(cur, xrep_bmap_walk_rtrmap, rb);
+	libxfs_btree_del_cursor(cur, error);
+	return error;
+}
+
 /*
  * Collect block mappings for this fork of this inode and decide if we have
  * enough space to rebuild.  Caller is responsible for cleaning up the list if
@@ -219,8 +314,18 @@ xrep_bmap_find_mappings(
 	struct xrep_bmap	*rb)
 {
 	struct xfs_perag	*pag = NULL;
+	struct xfs_rtgroup	*rtg = NULL;
 	int			error;
 
+	/* Iterate the rtrmaps for extents. */
+	while ((rtg = xfs_rtgroup_next(rb->sc->mp, rtg))) {
+		error = xrep_bmap_scan_rt(rb, rtg);
+		if (error) {
+			libxfs_rtgroup_put(rtg);
+			return error;
+		}
+	}
+
 	/* Iterate the rmaps for extents. */
 	while ((pag = xfs_perag_next(rb->sc->mp, pag))) {
 		error = xrep_bmap_scan_ag(rb, pag);
@@ -570,10 +675,6 @@ xrep_bmap_check_inputs(
 		return EINVAL;
 	}
 
-	/* Don't know how to rebuild realtime data forks. */
-	if (XFS_IS_REALTIME_INODE(sc->ip))
-		return EOPNOTSUPP;
-
 	return 0;
 }
 



* [PATCH 24/27] xfs_repair: reserve per-AG space while rebuilding rt metadata
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (22 preceding siblings ...)
  2025-02-06 22:55   ` [PATCH 23/27] xfs_repair: rebuild the bmap btree for realtime files Darrick J. Wong
@ 2025-02-06 22:55   ` Darrick J. Wong
  2025-02-07  5:57     ` Christoph Hellwig
  2025-02-06 22:56   ` [PATCH 25/27] xfs_logprint: report realtime RUIs Darrick J. Wong
                     ` (2 subsequent siblings)
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:55 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Realtime metadata btrees can consume quite a bit of space on a full
filesystem.  Since the metadata are just regular files, we need to
make the per-AG reservations to avoid overfilling any of the AGs while
rebuilding metadata.  This avoids the situation where a filesystem comes
straight from repair and immediately trips over not having enough space
in an AG.
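
The reservation loop in this patch tries every AG and remembers only the
first failure, so that the caller can still release whatever was
reserved.  A minimal sketch of that pattern follows; the callback type,
demo_reserve() and its failure pattern are hypothetical and stand in for
the real per-AG reservation call.

```c
#include <errno.h>

/* Hypothetical per-AG reservation callback; returns 0 or a positive errno. */
typedef int (*demo_resv_fn)(unsigned int agno);

static unsigned int demo_calls;

/* Toy callback: AG 2 is out of space and AG 3 hits an I/O error. */
static int
demo_reserve(
	unsigned int	agno)
{
	demo_calls++;
	if (agno == 2)
		return ENOSPC;
	if (agno == 3)
		return EIO;
	return 0;
}

/* Try to reserve space in every AG, reporting only the first error. */
static int
demo_reserve_all(
	unsigned int	agcount,
	demo_resv_fn	reserve_one)
{
	unsigned int	agno;
	int		error = 0;
	int		err2;

	for (agno = 0; agno < agcount; agno++) {
		err2 = reserve_one(agno);
		if (err2 && !error)
			error = err2;
	}

	return error;
}
```

Continuing past a failed AG means the later AGs still get their
reservations, and the caller sees the earliest error (ENOSPC in this toy
setup) instead of the last one.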

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 include/libxfs.h |    1 +
 repair/phase6.c  |   45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)


diff --git a/include/libxfs.h b/include/libxfs.h
index 79f8e1ff03d3f5..82b34b9d81c3a7 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -101,6 +101,7 @@ struct iomap;
 #include "xfs_rtgroup.h"
 #include "xfs_rtbitmap.h"
 #include "xfs_rtrmap_btree.h"
+#include "xfs_ag_resv.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/repair/phase6.c b/repair/phase6.c
index 2ddfd0526767e0..30ea19fda9fd87 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -3540,10 +3540,41 @@ reset_quota_metadir_inodes(
 	libxfs_irele(dp);
 }
 
+static int
+reserve_ag_blocks(
+	struct xfs_mount	*mp)
+{
+	struct xfs_perag	*pag = NULL;
+	int			error = 0;
+	int			err2;
+
+	mp->m_finobt_nores = false;
+
+	while ((pag = xfs_perag_next(mp, pag))) {
+		err2 = -libxfs_ag_resv_init(pag, NULL);
+		if (err2 && !error)
+			error = err2;
+	}
+
+	return error;
+}
+
+static void
+unreserve_ag_blocks(
+	struct xfs_mount	*mp)
+{
+	struct xfs_perag	*pag = NULL;
+
+	while ((pag = xfs_perag_next(mp, pag)))
+		libxfs_ag_resv_free(pag);
+}
+
 void
 phase6(xfs_mount_t *mp)
 {
 	ino_tree_node_t		*irec;
+	bool			reserve_perag;
+	int			error;
 	int			i;
 
 	parent_ptr_init(mp);
@@ -3588,6 +3619,17 @@ phase6(xfs_mount_t *mp)
 		do_warn(_("would reinitialize metadata root directory\n"));
 	}
 
+	reserve_perag = xfs_has_realtime(mp) && !no_modify;
+	if (reserve_perag) {
+		error = reserve_ag_blocks(mp);
+		if (error) {
+			if (error != ENOSPC)
+				do_warn(
+	_("could not reserve per-AG space to rebuild realtime metadata\n"));
+			reserve_perag = false;
+		}
+	}
+
 	if (xfs_has_rtgroups(mp))
 		reset_rt_metadir_inodes(mp);
 	else
@@ -3596,6 +3638,9 @@ phase6(xfs_mount_t *mp)
 	if (xfs_has_metadir(mp) && xfs_has_quota(mp) && !no_modify)
 		reset_quota_metadir_inodes(mp);
 
+	if (reserve_perag)
+		unreserve_ag_blocks(mp);
+
 	mark_standalone_inodes(mp);
 
 	do_log(_("        - traversing filesystem ...\n"));



* [PATCH 25/27] xfs_logprint: report realtime RUIs
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (23 preceding siblings ...)
  2025-02-06 22:55   ` [PATCH 24/27] xfs_repair: reserve per-AG space while rebuilding rt metadata Darrick J. Wong
@ 2025-02-06 22:56   ` Darrick J. Wong
  2025-02-07  5:58     ` Christoph Hellwig
  2025-02-07  5:58     ` Christoph Hellwig
  2025-02-06 22:56   ` [PATCH 26/27] mkfs: add some rtgroup inode helpers Darrick J. Wong
  2025-02-06 22:56   ` [PATCH 27/27] mkfs: create the realtime rmap inode Darrick J. Wong
  26 siblings, 2 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:56 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Decode the RUI format just enough to report if an RUI targets the
realtime device or not.
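
The decoding amounts to mapping the item type to a label, with a
fallback for types that are not recognized.  Here is a small sketch of
that mapping; the DEMO_LI_* constants are placeholder values chosen for
illustration and are not the ondisk log item type codes.

```c
/* Placeholder type codes; the real XFS_LI_* values differ. */
enum demo_item_type {
	DEMO_LI_RUI	= 1,
	DEMO_LI_RUI_RT	= 2,
};

/* Pick the label to print for an rmap update intent item. */
static const char *
demo_rui_item_name(
	unsigned int	type)
{
	switch (type) {
	case DEMO_LI_RUI:
		return "RUI";
	case DEMO_LI_RUI_RT:
		return "RUI_RT";
	default:
		/* Unknown type; flag it instead of guessing. */
		return "RUI?";
	}
}
```

The "RUI?" fallback matches the default that the patch assigns to
item_name before it switches on the type.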

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 logprint/log_misc.c      |    2 ++
 logprint/log_print_all.c |    8 ++++++++
 logprint/log_redo.c      |   24 +++++++++++++++++++-----
 3 files changed, 29 insertions(+), 5 deletions(-)


diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index 8336f26e093310..aaa9598616a308 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -1014,12 +1014,14 @@ xlog_print_record(
 					be32_to_cpu(op_head->oh_len));
 			break;
 		    }
+		    case XFS_LI_RUI_RT:
 		    case XFS_LI_RUI: {
 			skip = xlog_print_trans_rui(&ptr,
 					be32_to_cpu(op_head->oh_len),
 					continued);
 			break;
 		    }
+		    case XFS_LI_RUD_RT:
 		    case XFS_LI_RUD: {
 			skip = xlog_print_trans_rud(&ptr,
 					be32_to_cpu(op_head->oh_len));
diff --git a/logprint/log_print_all.c b/logprint/log_print_all.c
index ed5f234975ccad..56e765d64f2df4 100644
--- a/logprint/log_print_all.c
+++ b/logprint/log_print_all.c
@@ -422,9 +422,11 @@ xlog_recover_print_logitem(
 	case XFS_LI_ATTRI:
 		xlog_recover_print_attri(item);
 		break;
+	case XFS_LI_RUD_RT:
 	case XFS_LI_RUD:
 		xlog_recover_print_rud(item);
 		break;
+	case XFS_LI_RUI_RT:
 	case XFS_LI_RUI:
 		xlog_recover_print_rui(item);
 		break;
@@ -498,6 +500,12 @@ xlog_recover_print_item(
 	case XFS_LI_RUI:
 		printf("RUI");
 		break;
+	case XFS_LI_RUD_RT:
+		printf("RUD_RT");
+		break;
+	case XFS_LI_RUI_RT:
+		printf("RUI_RT");
+		break;
 	case XFS_LI_CUD:
 		printf("CUD");
 		break;
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index 41e7c94a52dc21..a0cc558499ae0b 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -274,6 +274,7 @@ xlog_print_trans_rui(
 	uint			src_len,
 	int			continued)
 {
+	const char		*item_name = "RUI?";
 	struct xfs_rui_log_format	*src_f, *f = NULL;
 	uint			dst_len;
 	uint			nextents;
@@ -318,8 +319,14 @@ xlog_print_trans_rui(
 		goto error;
 	}
 
-	printf(_("RUI:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
-		f->rui_size, f->rui_nextents, (unsigned long long)f->rui_id);
+	switch (f->rui_type) {
+	case XFS_LI_RUI:	item_name = "RUI"; break;
+	case XFS_LI_RUI_RT:	item_name = "RUI_RT"; break;
+	}
+
+	printf(_("%s:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+			item_name, f->rui_size, f->rui_nextents,
+			(unsigned long long)f->rui_id);
 
 	if (continued) {
 		printf(_("RUI extent data skipped (CONTINUE set, no space)\n"));
@@ -359,6 +366,7 @@ xlog_print_trans_rud(
 	char				**ptr,
 	uint				len)
 {
+	const char			*item_name = "RUD?";
 	struct xfs_rud_log_format	*f;
 	struct xfs_rud_log_format	lbuf;
 
@@ -371,11 +379,17 @@ xlog_print_trans_rud(
 	 */
 	memmove(&lbuf, *ptr, min(core_size, len));
 	f = &lbuf;
+
+	switch (f->rud_type) {
+	case XFS_LI_RUD:	item_name = "RUD"; break;
+	case XFS_LI_RUD_RT:	item_name = "RUD_RT"; break;
+	}
+
 	*ptr += len;
 	if (len >= core_size) {
-		printf(_("RUD:  #regs: %d	                 id: 0x%llx\n"),
-			f->rud_size,
-			(unsigned long long)f->rud_rui_id);
+		printf(_("%s:  #regs: %d	                 id: 0x%llx\n"),
+				item_name, f->rud_size,
+				(unsigned long long)f->rud_rui_id);
 
 		/* don't print extents as they are not used */
 



* [PATCH 26/27] mkfs: add some rtgroup inode helpers
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (24 preceding siblings ...)
  2025-02-06 22:56   ` [PATCH 25/27] xfs_logprint: report realtime RUIs Darrick J. Wong
@ 2025-02-06 22:56   ` Darrick J. Wong
  2025-02-07  5:59     ` Christoph Hellwig
  2025-02-06 22:56   ` [PATCH 27/27] mkfs: create the realtime rmap inode Darrick J. Wong
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:56 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create some simple helpers to reduce the amount of typing whenever we
access rtgroup inodes.  Conversion was done with this spatch and some
minor reformatting:

@@
expression rtg;
@@

- rtg->rtg_inodes[XFS_RTGI_BITMAP]
+ rtg_bitmap(rtg)

@@
expression rtg;
@@

- rtg->rtg_inodes[XFS_RTGI_SUMMARY]
+ rtg_summary(rtg)

and the CLI command:

$ spatch --sp-file /tmp/moo.cocci --dir fs/xfs/ --use-gitgrep --in-place
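
For context, helpers of this kind are usually one-line inline accessors
around the array lookup.  The sketch below shows the general shape with
mock types; the demo_ names are hypothetical and the real definitions of
rtg_bitmap() and rtg_summary() are not part of this patch.

```c
/* Mock types standing in for the rtgroup and inode structures. */
struct demo_inode {
	unsigned long long	ino;
};

enum demo_rtg_inodes {
	DEMO_RTGI_BITMAP,
	DEMO_RTGI_SUMMARY,
	DEMO_RTGI_MAX,
};

struct demo_rtgroup {
	struct demo_inode	*rtg_inodes[DEMO_RTGI_MAX];
};

/* Return the bitmap inode of this rtgroup. */
static inline struct demo_inode *
demo_rtg_bitmap(
	const struct demo_rtgroup	*rtg)
{
	return rtg->rtg_inodes[DEMO_RTGI_BITMAP];
}

/* Return the summary inode of this rtgroup. */
static inline struct demo_inode *
demo_rtg_summary(
	const struct demo_rtgroup	*rtg)
{
	return rtg->rtg_inodes[DEMO_RTGI_SUMMARY];
}
```

Callers then write demo_rtg_bitmap(rtg) instead of indexing the array by
hand, which is the substitution that the semantic patch above performs.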

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 mkfs/proto.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/mkfs/proto.c b/mkfs/proto.c
index 6dd3a2005b15b3..60e5c7d02713d0 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -1057,7 +1057,7 @@ rtfreesp_init(
 		if (error)
 			res_failed(error);
 
-		libxfs_trans_ijoin(tp, rtg->rtg_inodes[XFS_RTGI_BITMAP], 0);
+		libxfs_trans_ijoin(tp, rtg_bitmap(rtg), 0);
 		error = -libxfs_rtfree_extent(tp, rtg, start_rtx, nr);
 		if (error) {
 			fprintf(stderr,



* [PATCH 27/27] mkfs: create the realtime rmap inode
  2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
                     ` (25 preceding siblings ...)
  2025-02-06 22:56   ` [PATCH 26/27] mkfs: add some rtgroup inode helpers Darrick J. Wong
@ 2025-02-06 22:56   ` Darrick J. Wong
  2025-02-07  6:00     ` Christoph Hellwig
  26 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:56 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a realtime rmapbt inode if we format the fs with realtime
and rmap.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libxfs/init.c            |    7 ----
 libxfs/libxfs_api_defs.h |    3 ++
 mkfs/proto.c             |   29 +++++++++++++++
 mkfs/xfs_mkfs.c          |   87 +++++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 114 insertions(+), 12 deletions(-)
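
The mkfs option handling in the xfs_mkfs.c hunk below can be read as a small
decision table for the realtime-device case.  The following is only a sketch
of that table under stated assumptions: struct demo_feat and
demo_resolve_rt_rmapbt() are hypothetical names, the two bool arguments stand
in for the cli_opt_set() checks on M_RMAPBT and M_METADIR, and a -1 return
stands in for the usage() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the mkfs feature flags. */
struct demo_feat {
	bool	metadir;
	bool	rmapbt;
};

/*
 * Resolve rmapbt vs. metadir when a realtime device is present.
 * Returns 0 on success, or -1 if the user explicitly asked for
 * rmapbt while also explicitly setting the metadir option.
 */
static int
demo_resolve_rt_rmapbt(
	struct demo_feat	*feat,
	bool			rmapbt_set,
	bool			metadir_set)
{
	assert(feat != NULL);

	/* Nothing to resolve if metadir is on or rmapbt is off. */
	if (feat->metadir || !feat->rmapbt)
		return 0;

	if (rmapbt_set && metadir_set)
		return -1;		/* both explicit: conflict */

	if (rmapbt_set)
		feat->metadir = true;	/* explicit rmapbt pulls in metadir */
	else
		feat->rmapbt = false;	/* default rmapbt is quietly dropped */
	return 0;
}
```

In other words, an explicit request for rmapbt wins over a defaulted metadir
setting, and a defaulted rmapbt loses to the lack of metadir.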


diff --git a/libxfs/init.c b/libxfs/init.c
index 02a4cfdf38b198..f92805620c33f1 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -312,13 +312,6 @@ rtmount_init(
 		return -1;
 	}
 
-	if (xfs_has_rmapbt(mp)) {
-		fprintf(stderr,
-	_("%s: Reverse mapping btree not compatible with realtime device. Please try a newer xfsprogs.\n"),
-				progname);
-		return -1;
-	}
-
 	if (mp->m_rtdev_targp->bt_bdev == 0 && !xfs_is_debugger(mp)) {
 		fprintf(stderr, _("%s: filesystem has a realtime subvolume\n"),
 			progname);
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 193b1eeaa7537e..df24f36f0d2874 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -224,6 +224,8 @@
 
 #define xfs_metafile_iget		libxfs_metafile_iget
 #define xfs_trans_metafile_iget		libxfs_trans_metafile_iget
+#define xfs_metafile_resv_free		libxfs_metafile_resv_free
+#define xfs_metafile_resv_init		libxfs_metafile_resv_init
 #define xfs_metafile_set_iflag		libxfs_metafile_set_iflag
 #define xfs_metadir_cancel		libxfs_metadir_cancel
 #define xfs_metadir_commit		libxfs_metadir_commit
@@ -324,6 +326,7 @@
 #define xfs_rtrmapbt_droot_maxrecs	libxfs_rtrmapbt_droot_maxrecs
 #define xfs_rtrmapbt_maxlevels_ondisk	libxfs_rtrmapbt_maxlevels_ondisk
 #define xfs_rtrmapbt_init_cursor	libxfs_rtrmapbt_init_cursor
+#define xfs_rtrmapbt_init_rtsb		libxfs_rtrmapbt_init_rtsb
 #define xfs_rtrmapbt_maxrecs		libxfs_rtrmapbt_maxrecs
 #define xfs_rtrmapbt_mem_init		libxfs_rtrmapbt_mem_init
 #define xfs_rtrmapbt_mem_cursor		libxfs_rtrmapbt_mem_cursor
diff --git a/mkfs/proto.c b/mkfs/proto.c
index 60e5c7d02713d0..2c453480271666 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -1095,6 +1095,28 @@ rtinit_nogroups(
 	}
 }
 
+static int
+init_rtrmap_for_rtsb(
+	struct xfs_rtgroup	*rtg)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	struct xfs_trans	*tp;
+	int			error;
+
+	error = -libxfs_trans_alloc_inode(rtg_rmap(rtg),
+			&M_RES(mp)->tr_itruncate, 0, 0, false, &tp);
+	if (error)
+		return error;
+
+	error = -libxfs_rtrmapbt_init_rtsb(mp, rtg, tp);
+	if (error) {
+		libxfs_trans_cancel(tp);
+		return error;
+	}
+
+	return -libxfs_trans_commit(tp);
+}
+
 static void
 rtinit_groups(
 	struct xfs_mount	*mp)
@@ -1115,6 +1137,13 @@ rtinit_groups(
 						error);
 		}
 
+		if (xfs_has_rtsb(mp) && xfs_has_rtrmapbt(mp) &&
+		    rtg_rgno(rtg) == 0) {
+			error = init_rtrmap_for_rtsb(rtg);
+			if (error)
+				fail(_("rtrmap rtsb init failed"), error);
+		}
+
 		rtfreesp_init(rtg);
 	}
 }
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index f5556fcc4040ed..c8042261328171 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2678,12 +2678,18 @@ _("reflink not supported with realtime devices\n"));
 		}
 		cli->sb_feat.reflink = false;
 
-		if (cli->sb_feat.rmapbt && cli_opt_set(&mopts, M_RMAPBT)) {
-			fprintf(stderr,
-_("rmapbt not supported with realtime devices\n"));
-			usage();
+		if (!cli->sb_feat.metadir && cli->sb_feat.rmapbt) {
+			if (cli_opt_set(&mopts, M_RMAPBT) &&
+			    cli_opt_set(&mopts, M_METADIR)) {
+				fprintf(stderr,
+_("rmapbt not supported on realtime devices without metadir feature\n"));
+				usage();
+			} else if (cli_opt_set(&mopts, M_RMAPBT)) {
+				cli->sb_feat.metadir = true;
+			} else {
+				cli->sb_feat.rmapbt = false;
+			}
 		}
-		cli->sb_feat.rmapbt = false;
 	}
 
 	if ((cli->fsx.fsx_xflags & FS_XFLAG_COWEXTSIZE) &&
@@ -5006,6 +5012,74 @@ write_rtsb(
 	libxfs_buf_relse(sb_bp);
 }
 
+static inline void
+prealloc_fail(
+	struct xfs_mount	*mp,
+	int			error,
+	xfs_filblks_t		ask,
+	const char		*tag)
+{
+	if (error == ENOSPC)
+		fprintf(stderr,
+	_("%s: cannot handle expansion of %s; need %llu free blocks, have %llu\n"),
+				progname, tag, (unsigned long long)ask,
+				(unsigned long long)mp->m_sb.sb_fdblocks);
+	else
+		fprintf(stderr,
+	_("%s: error %d while checking free space for %s\n"),
+				progname, error, tag);
+	exit(1);
+}
+
+/*
+ * Make sure there's enough space on the data device to handle realtime
+ * metadata btree expansions.
+ */
+static void
+check_rt_meta_prealloc(
+	struct xfs_mount	*mp)
+{
+	struct xfs_perag	*pag = NULL;
+	struct xfs_rtgroup	*rtg = NULL;
+	xfs_filblks_t		ask;
+	int			error;
+
+	/*
+	 * First create all the per-AG reservations, since they take from the
+	 * free block count.  Each AG should start with enough free space for
+	 * the per-AG reservation.
+	 */
+	mp->m_finobt_nores = false;
+
+	while ((pag = xfs_perag_next(mp, pag))) {
+		error = -libxfs_ag_resv_init(pag, NULL);
+		if (error && error != ENOSPC) {
+			fprintf(stderr,
+	_("%s: error %d while checking AG free space for realtime metadata\n"),
+					progname, error);
+			exit(1);
+		}
+	}
+
+	/* Realtime metadata btree inode */
+	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
+		ask = libxfs_rtrmapbt_calc_reserves(mp);
+		error = -libxfs_metafile_resv_init(rtg_rmap(rtg), ask);
+		if (error)
+			prealloc_fail(mp, error, ask, _("realtime rmap btree"));
+	}
+
+	/* Unreserve the realtime metadata reservations. */
+	while ((rtg = xfs_rtgroup_next(mp, rtg)))
+		libxfs_metafile_resv_free(rtg_rmap(rtg));
+
+	/* Unreserve the per-AG reservations. */
+	while ((pag = xfs_perag_next(mp, pag)))
+		libxfs_ag_resv_free(pag);
+
+	mp->m_finobt_nores = false;
+}
+
 int
 main(
 	int			argc,
@@ -5343,6 +5417,9 @@ main(
 	 */
 	check_root_ino(mp);
 
+	/* Make sure we can handle space preallocations of rt metadata btrees */
+	check_rt_meta_prealloc(mp);
+
 	/*
 	 * Re-write multiple secondary superblocks with rootinode field set
 	 */



* [PATCH 01/22] libxfs: compute the rt refcount btree maxlevels during initialization
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
@ 2025-02-06 22:57   ` Darrick J. Wong
  2025-02-13  4:20     ` Christoph Hellwig
  2025-02-06 22:57   ` [PATCH 02/22] libxfs: add a realtime flag to the refcount update log redo items Darrick J. Wong
                     ` (20 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:57 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Compute max rt refcount btree height information when we set up libxfs.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libxfs/init.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/libxfs/init.c b/libxfs/init.c
index f92805620c33f1..00bd46a6013e28 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -597,7 +597,8 @@ static inline void
 xfs_rtbtree_compute_maxlevels(
 	struct xfs_mount	*mp)
 {
-	mp->m_rtbtree_maxlevels = mp->m_rtrmap_maxlevels;
+	mp->m_rtbtree_maxlevels = max(mp->m_rtrmap_maxlevels,
+				      mp->m_rtrefc_maxlevels);
 }
 
 /* Compute maximum possible height of all btrees. */
@@ -615,6 +616,7 @@ libxfs_compute_all_maxlevels(
 	xfs_rmapbt_compute_maxlevels(mp);
 	xfs_rtrmapbt_compute_maxlevels(mp);
 	xfs_refcountbt_compute_maxlevels(mp);
+	xfs_rtrefcountbt_compute_maxlevels(mp);
 
 	xfs_agbtree_compute_maxlevels(mp);
 	xfs_rtbtree_compute_maxlevels(mp);



* [PATCH 02/22] libxfs: add a realtime flag to the refcount update log redo items
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
  2025-02-06 22:57   ` [PATCH 01/22] libxfs: compute the rt refcount btree maxlevels during initialization Darrick J. Wong
@ 2025-02-06 22:57   ` Darrick J. Wong
  2025-02-13  4:20     ` Christoph Hellwig
  2025-02-06 22:57   ` [PATCH 03/22] libxfs: apply rt extent alignment constraints to CoW extsize hint Darrick J. Wong
                     ` (19 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:57 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Extend the refcount update (CUI) log items with a new realtime flag that
indicates that the updates apply against the realtime refcountbt.  We'll
wire up the actual refcount code later.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libxfs/defer_item.c |   34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)
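
As a rough illustration of the dispatch described above, the sketch below
picks the group type and the deferred-op type from a single realtime flag on
the intent.  All demo_/DEMO_ names are hypothetical stand-ins; the op type
structure here carries only a name, whereas the real xfs_defer_op_type has
the callbacks shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the group and deferred-op types. */
enum demo_group_type {
	DEMO_XG_TYPE_AG,
	DEMO_XG_TYPE_RTG,
};

struct demo_defer_op_type {
	const char	*name;
};

static const struct demo_defer_op_type demo_refcount_type = {
	.name	= "refcount",
};

static const struct demo_defer_op_type demo_rtrefcount_type = {
	.name	= "rtrefcount",
};

struct demo_refcount_intent {
	bool	ri_realtime;
};

/* Realtime updates are tracked against rtgroups, the rest against AGs. */
static enum demo_group_type
demo_intent_group_type(
	const struct demo_refcount_intent	*ri)
{
	assert(ri != NULL);
	return ri->ri_realtime ? DEMO_XG_TYPE_RTG : DEMO_XG_TYPE_AG;
}

/* Use a separate op type so rt and data updates finish separately. */
static const struct demo_defer_op_type *
demo_intent_defer_type(
	const struct demo_refcount_intent	*ri)
{
	assert(ri != NULL);
	return ri->ri_realtime ? &demo_rtrefcount_type : &demo_refcount_type;
}
```

Keeping two op types, rather than branching inside one, is what lets the
deferred-work machinery finish the two kinds of update in separate
transactions.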


diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index 1ca9ce15ecfe3d..6beefa6a439980 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -442,9 +442,18 @@ xfs_refcount_defer_add(
 
 	trace_xfs_refcount_defer(mp, ri);
 
+	/*
+	 * Deferred refcount updates for the realtime and data sections must
+	 * use separate transactions to finish deferred work because updates to
+	 * realtime metadata files can lock AGFs to allocate btree blocks and
+	 * we don't want that mixing with the AGF locks taken to finish data
+	 * section updates.
+	 */
 	ri->ri_group = xfs_group_intent_get(mp, ri->ri_startblock,
-			XG_TYPE_AG);
-	xfs_defer_add(tp, &ri->ri_list, &xfs_refcount_update_defer_type);
+			ri->ri_realtime ? XG_TYPE_RTG : XG_TYPE_AG);
+	xfs_defer_add(tp, &ri->ri_list, ri->ri_realtime ?
+			&xfs_rtrefcount_update_defer_type :
+			&xfs_refcount_update_defer_type);
 }
 
 /* Cancel a deferred refcount update. */
@@ -516,6 +525,27 @@ const struct xfs_defer_op_type xfs_refcount_update_defer_type = {
 	.cancel_item	= xfs_refcount_update_cancel_item,
 };
 
+/* Clean up after calling xfs_rtrefcount_finish_one. */
+STATIC void
+xfs_rtrefcount_finish_one_cleanup(
+	struct xfs_trans	*tp,
+	struct xfs_btree_cur	*rcur,
+	int			error)
+{
+	if (rcur)
+		xfs_btree_del_cursor(rcur, error);
+}
+
+const struct xfs_defer_op_type xfs_rtrefcount_update_defer_type = {
+	.name		= "rtrefcount",
+	.create_intent	= xfs_refcount_update_create_intent,
+	.abort_intent	= xfs_refcount_update_abort_intent,
+	.create_done	= xfs_refcount_update_create_done,
+	.finish_item	= xfs_refcount_update_finish_item,
+	.finish_cleanup = xfs_rtrefcount_finish_one_cleanup,
+	.cancel_item	= xfs_refcount_update_cancel_item,
+};
+
 /* Inode Block Mapping */
 
 static inline struct xfs_bmap_intent *bi_entry(const struct list_head *e)



* [PATCH 03/22] libxfs: apply rt extent alignment constraints to CoW extsize hint
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
  2025-02-06 22:57   ` [PATCH 01/22] libxfs: compute the rt refcount btree maxlevels during initialization Darrick J. Wong
  2025-02-06 22:57   ` [PATCH 02/22] libxfs: add a realtime flag to the refcount update log redo items Darrick J. Wong
@ 2025-02-06 22:57   ` Darrick J. Wong
  2025-02-13  4:21     ` Christoph Hellwig
  2025-02-06 22:57   ` [PATCH 04/22] libfrog: enable scrubbing of the realtime refcount data Darrick J. Wong
                     ` (18 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:57 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint.  Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.

Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libxfs/logitem.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)
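
To make the precommit fixup concrete, here is a minimal sketch of the check,
assuming the alignment test reduces to a plain modulo against the rt extent
size.  The demo_ structure and the DEMO_ flag values are hypothetical and do
not match the on-disk flag bits; the real code uses xfs_extlen_to_rtxmod()
and also marks the inode core for logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, not the real on-disk values. */
#define DEMO_DIFLAG_RTINHERIT	(1U << 0)
#define DEMO_DIFLAG2_COWEXTSIZE	(1U << 0)

struct demo_inode {
	uint16_t	diflags;
	uint64_t	diflags2;
	uint32_t	cowextsize;	/* CoW extent size hint, in blocks */
};

/*
 * Clear a CoW extent size hint that is not a multiple of the rt extent
 * size on an inode with both rtinherit and cowextsize set.  Returns true
 * if the inode was changed and therefore needs its core relogged.
 */
static bool
demo_fix_cowextsize(
	struct demo_inode	*ip,
	uint32_t		rextsize)
{
	assert(ip != NULL && rextsize > 0);

	if (!(ip->diflags & DEMO_DIFLAG_RTINHERIT))
		return false;
	if (!(ip->diflags2 & DEMO_DIFLAG2_COWEXTSIZE))
		return false;
	if (ip->cowextsize % rextsize == 0)
		return false;

	ip->diflags2 &= ~(uint64_t)DEMO_DIFLAG2_COWEXTSIZE;
	ip->cowextsize = 0;
	return true;
}
```

For example, with an rt extent size of 4 blocks a hint of 10 blocks would be
cleared, while a hint of 8 blocks would be left alone.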


diff --git a/libxfs/logitem.c b/libxfs/logitem.c
index 7757259dfc5e42..062d6311b942ae 100644
--- a/libxfs/logitem.c
+++ b/libxfs/logitem.c
@@ -233,6 +233,20 @@ xfs_inode_item_precommit(
 	if (flags & XFS_ILOG_IVERSION)
 		flags = ((flags & ~XFS_ILOG_IVERSION) | XFS_ILOG_CORE);
 
+	/*
+	 * Inode verifiers do not check that the CoW extent size hint is an
+	 * integer multiple of the rt extent size on a directory with both
+	 * rtinherit and cowextsize flags set.  If we're logging a directory
+	 * that is misconfigured in this way, clear the hint.
+	 */
+	if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
+	    (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) &&
+	    xfs_extlen_to_rtxmod(ip->i_mount, ip->i_cowextsize) > 0) {
+		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
+		ip->i_cowextsize = 0;
+		flags |= XFS_ILOG_CORE;
+	}
+
 	if (!iip->ili_item.li_buf) {
 		struct xfs_buf	*bp;
 		int		error;



* [PATCH 04/22] libfrog: enable scrubbing of the realtime refcount data
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (2 preceding siblings ...)
  2025-02-06 22:57   ` [PATCH 03/22] libxfs: apply rt extent alignment constraints to CoW extsize hint Darrick J. Wong
@ 2025-02-06 22:57   ` Darrick J. Wong
  2025-02-13  4:21     ` Christoph Hellwig
  2025-02-06 22:58   ` [PATCH 05/22] man: document userspace API changes due to rt reflink Darrick J. Wong
                     ` (17 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:57 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add a new entry so that we can scrub the rtrefcountbt and its metadata
directory tree path.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libfrog/scrub.c |   10 ++++++++++
 scrub/repair.c  |    1 +
 2 files changed, 11 insertions(+)


diff --git a/libfrog/scrub.c b/libfrog/scrub.c
index 11d467666d6b5a..9ffe4698e864ab 100644
--- a/libfrog/scrub.c
+++ b/libfrog/scrub.c
@@ -169,6 +169,11 @@ const struct xfrog_scrub_descr xfrog_scrubbers[XFS_SCRUB_TYPE_NR] = {
 		.descr	= "realtime reverse mapping btree",
 		.group	= XFROG_SCRUB_GROUP_RTGROUP,
 	},
+	[XFS_SCRUB_TYPE_RTREFCBT] = {
+		.name	= "rtrefcountbt",
+		.descr	= "realtime reference count btree",
+		.group	= XFROG_SCRUB_GROUP_RTGROUP,
+	},
 };
 
 const struct xfrog_scrub_descr xfrog_metapaths[XFS_SCRUB_METAPATH_NR] = {
@@ -217,6 +222,11 @@ const struct xfrog_scrub_descr xfrog_metapaths[XFS_SCRUB_METAPATH_NR] = {
 		.descr	= "rtgroup rmap btree",
 		.group	= XFROG_SCRUB_GROUP_RTGROUP,
 	},
+	[XFS_SCRUB_METAPATH_RTREFCOUNTBT] = {
+		.name	= "rtrefcbt",
+		.descr	= "rtgroup refcount btree",
+		.group	= XFROG_SCRUB_GROUP_RTGROUP,
+	},
 };
 
 /* Invoke the scrub ioctl.  Returns zero or negative error code. */
diff --git a/scrub/repair.c b/scrub/repair.c
index e6906cbd37d0b3..b2c4232fe8cea1 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -548,6 +548,7 @@ repair_item_difficulty(
 		case XFS_SCRUB_TYPE_RTBITMAP:
 		case XFS_SCRUB_TYPE_RTSUM:
 		case XFS_SCRUB_TYPE_RGSUPER:
+		case XFS_SCRUB_TYPE_RTREFCBT:
 			ret |= REPAIR_DIFFICULTY_PRIMARY;
 			break;
 		}



* [PATCH 05/22] man: document userspace API changes due to rt reflink
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (3 preceding siblings ...)
  2025-02-06 22:57   ` [PATCH 04/22] libfrog: enable scrubbing of the realtime refcount data Darrick J. Wong
@ 2025-02-06 22:58   ` Darrick J. Wong
  2025-02-13  4:21     ` Christoph Hellwig
  2025-02-06 22:58   ` [PATCH 06/22] xfs_db: display the realtime refcount btree contents Darrick J. Wong
                     ` (16 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:58 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Update documentation to describe userspace ABI changes made for realtime
reflink support.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 man/man2/ioctl_xfs_scrub_metadata.2 |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)


diff --git a/man/man2/ioctl_xfs_scrub_metadata.2 b/man/man2/ioctl_xfs_scrub_metadata.2
index f06bb98de708a4..c72f1c5c568b01 100644
--- a/man/man2/ioctl_xfs_scrub_metadata.2
+++ b/man/man2/ioctl_xfs_scrub_metadata.2
@@ -181,11 +181,12 @@ .SH DESCRIPTION
 .nf
 .B XFS_SCRUB_TYPE_RTBITMAP
 .B XFS_SCRUB_TYPE_RTSUM
-.fi
-.TP
 .B XFS_SCRUB_TYPE_RTRMAPBT
+.fi
+.TP
+.B XFS_SCRUB_TYPE_RTREFCBT
 Examine a given realtime allocation group's free space bitmap, summary file,
-or reverse mapping btree, respectively.
+reverse mapping btree, or reference count btree, respectively.
 
 .PP
 .nf
@@ -250,6 +251,9 @@ .SH DESCRIPTION
 .TP
 .B XFS_SCRUB_METAPATH_RTRMAPBT
 Realtime rmap btree file.
+.TP
+.B XFS_SCRUB_METAPATH_RTREFCOUNTBT
+Realtime reference count btree file.
 .RE
 
 The values of



* [PATCH 06/22] xfs_db: display the realtime refcount btree contents
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (4 preceding siblings ...)
  2025-02-06 22:58   ` [PATCH 05/22] man: document userspace API changes due to rt reflink Darrick J. Wong
@ 2025-02-06 22:58   ` Darrick J. Wong
  2025-02-13  4:22     ` Christoph Hellwig
  2025-02-06 22:58   ` [PATCH 07/22] xfs_db: support the realtime refcountbt Darrick J. Wong
                     ` (15 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:58 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Implement all the code we need to dump rtrefcountbt contents, starting
from the inode root.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/bmroot.c              |  148 ++++++++++++++++++++++++++++++++++++++++++++++
 db/bmroot.h              |    2 +
 db/btblock.c             |   50 ++++++++++++++++
 db/btblock.h             |    5 ++
 db/field.c               |   15 +++++
 db/field.h               |    6 ++
 db/inode.c               |   22 +++++++
 db/type.c                |    5 ++
 db/type.h                |    1 
 libxfs/libxfs_api_defs.h |    4 +
 man/man8/xfs_db.8        |   48 +++++++++++++++
 11 files changed, 304 insertions(+), 2 deletions(-)
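
The count callbacks added to db/bmroot.c below all follow one rule: a level-0
root holds records, any higher level holds keys and pointers, and both header
fields are big-endian on disk.  A minimal sketch of that rule follows, using
a hypothetical demo_root_hdr byte layout rather than the real
struct xfs_rtrefcount_root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk root header: two big-endian 16-bit fields. */
struct demo_root_hdr {
	uint8_t	level_be[2];
	uint8_t	numrecs_be[2];
};

/* Decode a big-endian 16-bit value from raw bytes. */
static unsigned int
demo_be16(
	const uint8_t	b[2])
{
	return ((unsigned int)b[0] << 8) | b[1];
}

/* Records exist only in a leaf (level 0) root. */
static unsigned int
demo_root_rec_count(
	const struct demo_root_hdr	*hdr)
{
	assert(hdr != NULL);
	if (demo_be16(hdr->level_be) > 0)
		return 0;
	return demo_be16(hdr->numrecs_be);
}

/* Keys (and, equally, pointers) exist only in a node (level > 0) root. */
static unsigned int
demo_root_key_count(
	const struct demo_root_hdr	*hdr)
{
	assert(hdr != NULL);
	if (demo_be16(hdr->level_be) == 0)
		return 0;
	return demo_be16(hdr->numrecs_be);
}
```

The pointer count would use the same test as the key count, since a node
block carries one pointer per key.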


diff --git a/db/bmroot.c b/db/bmroot.c
index 19490bd24998c5..cb334aa45837d1 100644
--- a/db/bmroot.c
+++ b/db/bmroot.c
@@ -31,6 +31,13 @@ static int	rtrmaproot_key_offset(void *obj, int startoff, int idx);
 static int	rtrmaproot_ptr_count(void *obj, int startoff);
 static int	rtrmaproot_ptr_offset(void *obj, int startoff, int idx);
 
+static int	rtrefcroot_rec_count(void *obj, int startoff);
+static int	rtrefcroot_rec_offset(void *obj, int startoff, int idx);
+static int	rtrefcroot_key_count(void *obj, int startoff);
+static int	rtrefcroot_key_offset(void *obj, int startoff, int idx);
+static int	rtrefcroot_ptr_count(void *obj, int startoff);
+static int	rtrefcroot_ptr_offset(void *obj, int startoff, int idx);
+
 #define	OFF(f)	bitize(offsetof(xfs_bmdr_block_t, bb_ ## f))
 const field_t	bmroota_flds[] = {
 	{ "level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE },
@@ -73,6 +80,19 @@ const field_t	rtrmaproot_flds[] = {
 	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_RTRMAPBT },
 	{ NULL }
 };
+
+/* realtime refcount btree root */
+const field_t	rtrefcroot_flds[] = {
+	{ "level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE },
+	{ "numrecs", FLDT_UINT16D, OI(OFF(numrecs)), C1, 0, TYP_NONE },
+	{ "recs", FLDT_RTREFCBTREC, rtrefcroot_rec_offset, rtrefcroot_rec_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "keys", FLDT_RTREFCBTKEY, rtrefcroot_key_offset, rtrefcroot_key_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "ptrs", FLDT_RTREFCBTPTR, rtrefcroot_ptr_offset, rtrefcroot_ptr_count,
+	  FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_RTREFCBT },
+	{ NULL }
+};
 #undef OFF
 
 static int
@@ -390,3 +410,131 @@ rtrmaproot_size(
 	dip = obj;
 	return bitize((int)XFS_DFORK_DSIZE(dip, mp));
 }
+
+/* realtime refcount root */
+static int
+rtrefcroot_rec_count(
+	void				*obj,
+	int				startoff)
+{
+	struct xfs_rtrefcount_root	*block;
+#ifdef DEBUG
+	struct xfs_dinode		*dip = obj;
+#endif
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	block = (struct xfs_rtrefcount_root *)((char *)obj + byteize(startoff));
+	ASSERT((char *)block == XFS_DFORK_DPTR(dip));
+	if (be16_to_cpu(block->bb_level) > 0)
+		return 0;
+	return be16_to_cpu(block->bb_numrecs);
+}
+
+static int
+rtrefcroot_rec_offset(
+	void				*obj,
+	int				startoff,
+	int				idx)
+{
+	struct xfs_rtrefcount_root	*block;
+	struct xfs_refcount_rec		*kp;
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	block = (struct xfs_rtrefcount_root *)((char *)obj + byteize(startoff));
+	ASSERT(be16_to_cpu(block->bb_level) == 0);
+	kp = xfs_rtrefcount_droot_rec_addr(block, idx);
+	return bitize((int)((char *)kp - (char *)block));
+}
+
+static int
+rtrefcroot_key_count(
+	void				*obj,
+	int				startoff)
+{
+	struct xfs_rtrefcount_root	*block;
+#ifdef DEBUG
+	struct xfs_dinode		*dip = obj;
+#endif
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	block = (struct xfs_rtrefcount_root *)((char *)obj + byteize(startoff));
+	ASSERT((char *)block == XFS_DFORK_DPTR(dip));
+	if (be16_to_cpu(block->bb_level) == 0)
+		return 0;
+	return be16_to_cpu(block->bb_numrecs);
+}
+
+static int
+rtrefcroot_key_offset(
+	void				*obj,
+	int				startoff,
+	int				idx)
+{
+	struct xfs_rtrefcount_root	*block;
+	struct xfs_refcount_key		*kp;
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	block = (struct xfs_rtrefcount_root *)((char *)obj + byteize(startoff));
+	ASSERT(be16_to_cpu(block->bb_level) > 0);
+	kp = xfs_rtrefcount_droot_key_addr(block, idx);
+	return bitize((int)((char *)kp - (char *)block));
+}
+
+static int
+rtrefcroot_ptr_count(
+	void				*obj,
+	int				startoff)
+{
+	struct xfs_rtrefcount_root	*block;
+#ifdef DEBUG
+	struct xfs_dinode		*dip = obj;
+#endif
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	block = (struct xfs_rtrefcount_root *)((char *)obj + byteize(startoff));
+	ASSERT((char *)block == XFS_DFORK_DPTR(dip));
+	if (be16_to_cpu(block->bb_level) == 0)
+		return 0;
+	return be16_to_cpu(block->bb_numrecs);
+}
+
+static int
+rtrefcroot_ptr_offset(
+	void				*obj,
+	int				startoff,
+	int				idx)
+{
+	struct xfs_rtrefcount_root	*block;
+	xfs_rtrefcount_ptr_t		*pp;
+	struct xfs_dinode		*dip;
+	int				dmxr;
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	dip = obj;
+	block = (struct xfs_rtrefcount_root *)((char *)obj + byteize(startoff));
+	ASSERT(be16_to_cpu(block->bb_level) > 0);
+	dmxr = libxfs_rtrefcountbt_droot_maxrecs(XFS_DFORK_DSIZE(dip, mp), false);
+	pp = xfs_rtrefcount_droot_ptr_addr(block, idx, dmxr);
+	return bitize((int)((char *)pp - (char *)block));
+}
+
+int
+rtrefcroot_size(
+	void			*obj,
+	int			startoff,
+	int			idx)
+{
+	struct xfs_dinode	*dip;
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	ASSERT(idx == 0);
+	dip = obj;
+	return bitize((int)XFS_DFORK_DSIZE(dip, mp));
+}
diff --git a/db/bmroot.h b/db/bmroot.h
index a2c5cfb18f0bb2..70bc8483cd85bc 100644
--- a/db/bmroot.h
+++ b/db/bmroot.h
@@ -9,7 +9,9 @@ extern const struct field	bmroota_key_flds[];
 extern const struct field	bmrootd_flds[];
 extern const struct field	bmrootd_key_flds[];
 extern const struct field	rtrmaproot_flds[];
+extern const struct field	rtrefcroot_flds[];
 
 extern int	bmroota_size(void *obj, int startoff, int idx);
 extern int	bmrootd_size(void *obj, int startoff, int idx);
 extern int	rtrmaproot_size(void *obj, int startoff, int idx);
+extern int	rtrefcroot_size(void *obj, int startoff, int idx);
diff --git a/db/btblock.c b/db/btblock.c
index 70f6c3f6aedde5..40913a094375aa 100644
--- a/db/btblock.c
+++ b/db/btblock.c
@@ -104,6 +104,12 @@ static struct xfs_db_btree {
 		sizeof(struct xfs_refcount_rec),
 		sizeof(__be32),
 	},
+	{	XFS_RTREFC_CRC_MAGIC,
+		XFS_BTREE_LBLOCK_CRC_LEN,
+		sizeof(struct xfs_refcount_key),
+		sizeof(struct xfs_refcount_rec),
+		sizeof(__be64),
+	},
 	{	0,
 	},
 };
@@ -962,3 +968,47 @@ const field_t	refcbt_rec_flds[] = {
 	{ NULL }
 };
 #undef ROFF
+
+/* realtime refcount btree blocks */
+const field_t	rtrefcbt_crc_hfld[] = {
+	{ "", FLDT_RTREFCBT_CRC, OI(0), C1, 0, TYP_NONE },
+	{ NULL }
+};
+
+#define	OFF(f)	bitize(offsetof(struct xfs_btree_block, bb_ ## f))
+const field_t	rtrefcbt_crc_flds[] = {
+	{ "magic", FLDT_UINT32X, OI(OFF(magic)), C1, 0, TYP_NONE },
+	{ "level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE },
+	{ "numrecs", FLDT_UINT16D, OI(OFF(numrecs)), C1, 0, TYP_NONE },
+	{ "leftsib", FLDT_DFSBNO, OI(OFF(u.l.bb_leftsib)), C1, 0, TYP_RTREFCBT },
+	{ "rightsib", FLDT_DFSBNO, OI(OFF(u.l.bb_rightsib)), C1, 0, TYP_RTREFCBT },
+	{ "bno", FLDT_DFSBNO, OI(OFF(u.l.bb_blkno)), C1, 0, TYP_REFCBT },
+	{ "lsn", FLDT_UINT64X, OI(OFF(u.l.bb_lsn)), C1, 0, TYP_NONE },
+	{ "uuid", FLDT_UUID, OI(OFF(u.l.bb_uuid)), C1, 0, TYP_NONE },
+	{ "owner", FLDT_INO, OI(OFF(u.l.bb_owner)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(u.l.bb_crc)), C1, 0, TYP_NONE },
+	{ "recs", FLDT_RTREFCBTREC, btblock_rec_offset, btblock_rec_count,
+	  FLD_ARRAY | FLD_ABASE1 | FLD_COUNT | FLD_OFFSET, TYP_NONE },
+	{ "keys", FLDT_RTREFCBTKEY, btblock_key_offset, btblock_key_count,
+	  FLD_ARRAY | FLD_ABASE1 | FLD_COUNT | FLD_OFFSET, TYP_NONE },
+	{ "ptrs", FLDT_RTREFCBTPTR, btblock_ptr_offset, btblock_key_count,
+	  FLD_ARRAY | FLD_ABASE1 | FLD_COUNT | FLD_OFFSET, TYP_RTREFCBT },
+	{ NULL }
+};
+#undef OFF
+
+const field_t	rtrefcbt_key_flds[] = {
+	{ "startblock", FLDT_CRGBLOCK, OI(REFCNTBT_STARTBLOCK_BITOFF), C1, 0, TYP_DATA },
+	{ "cowflag", FLDT_CCOWFLG, OI(REFCNTBT_COWFLAG_BITOFF), C1, 0, TYP_DATA },
+	{ NULL }
+};
+
+#define	ROFF(f)	bitize(offsetof(struct xfs_refcount_rec, rc_ ## f))
+const field_t	rtrefcbt_rec_flds[] = {
+	{ "startblock", FLDT_CRGBLOCK, OI(REFCNTBT_STARTBLOCK_BITOFF), C1, 0, TYP_DATA },
+	{ "blockcount", FLDT_EXTLEN, OI(ROFF(blockcount)), C1, 0, TYP_NONE },
+	{ "refcount", FLDT_UINT32D, OI(ROFF(refcount)), C1, 0, TYP_DATA },
+	{ "cowflag", FLDT_CCOWFLG, OI(REFCNTBT_COWFLAG_BITOFF), C1, 0, TYP_DATA },
+	{ NULL }
+};
+#undef ROFF
diff --git a/db/btblock.h b/db/btblock.h
index b4013ea8073ec6..5bbe857a7effd0 100644
--- a/db/btblock.h
+++ b/db/btblock.h
@@ -63,4 +63,9 @@ extern const struct field	refcbt_crc_hfld[];
 extern const struct field	refcbt_key_flds[];
 extern const struct field	refcbt_rec_flds[];
 
+extern const struct field	rtrefcbt_crc_flds[];
+extern const struct field	rtrefcbt_crc_hfld[];
+extern const struct field	rtrefcbt_key_flds[];
+extern const struct field	rtrefcbt_rec_flds[];
+
 extern int	btblock_size(void *obj, int startoff, int idx);
diff --git a/db/field.c b/db/field.c
index 60c4e16d781f48..62c983fc3d4938 100644
--- a/db/field.c
+++ b/db/field.c
@@ -214,6 +214,21 @@ const ftattr_t	ftattrtab[] = {
 	{ FLDT_REFCBTREC, "refcntbtrec", fp_sarray, (char *)refcbt_rec_flds,
 	  SI(bitsz(struct xfs_refcount_rec)), 0, NULL, refcbt_rec_flds },
 
+	{ FLDT_CRGBLOCK, "crgblock", fp_num, "%u", SI(REFCNTBT_AGBLOCK_BITLEN),
+	  FTARG_DONULL, NULL, NULL },
+	{ FLDT_RTREFCBT_CRC, "rtrefcntbt", NULL, (char *)rtrefcbt_crc_flds,
+	  btblock_size, FTARG_SIZE, NULL, rtrefcbt_crc_flds },
+	{ FLDT_RTREFCBTKEY, "rtrefcntbtkey", fp_sarray,
+	  (char *)rtrefcbt_key_flds, SI(bitsz(struct xfs_refcount_key)), 0,
+	  NULL, rtrefcbt_key_flds },
+	{ FLDT_RTREFCBTPTR, "rtrefcntbtptr", fp_num, "%u",
+	  SI(bitsz(xfs_rtrefcount_ptr_t)), 0, fa_dfsbno, NULL },
+	{ FLDT_RTREFCBTREC, "rtrefcntbtrec", fp_sarray,
+	  (char *)rtrefcbt_rec_flds, SI(bitsz(struct xfs_refcount_rec)), 0,
+	  NULL, rtrefcbt_rec_flds },
+	{ FLDT_RTREFCROOT, "rtrefcroot", NULL, (char *)rtrefcroot_flds,
+	  rtrefcroot_size, FTARG_SIZE, NULL, rtrefcroot_flds },
+
 /* CRC field */
 	{ FLDT_CRC, "crc", fp_crc, "%#x (%s)", SI(bitsz(uint32_t)),
 	  0, NULL, NULL },
diff --git a/db/field.h b/db/field.h
index 67b6cb2a798719..27df5a0fcd1bdd 100644
--- a/db/field.h
+++ b/db/field.h
@@ -93,6 +93,12 @@ typedef enum fldt	{
 	FLDT_REFCBTKEY,
 	FLDT_REFCBTPTR,
 	FLDT_REFCBTREC,
+	FLDT_CRGBLOCK,
+	FLDT_RTREFCBT_CRC,
+	FLDT_RTREFCBTKEY,
+	FLDT_RTREFCBTPTR,
+	FLDT_RTREFCBTREC,
+	FLDT_RTREFCROOT,
 
 	/* CRC field type */
 	FLDT_CRC,
diff --git a/db/inode.c b/db/inode.c
index 45368a3343a17a..46f83f18f083ce 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -49,6 +49,7 @@ static int	inode_u_sfdir2_count(void *obj, int startoff);
 static int	inode_u_sfdir3_count(void *obj, int startoff);
 static int	inode_u_symlink_count(void *obj, int startoff);
 static int	inode_u_rtrmapbt_count(void *obj, int startoff);
+static int	inode_u_rtrefcbt_count(void *obj, int startoff);
 
 static const cmdinfo_t	inode_cmd =
 	{ "inode", NULL, inode_f, 0, 1, 1, "[inode#]",
@@ -236,6 +237,8 @@ const field_t	inode_u_flds[] = {
 	  TYP_NONE },
 	{ "rtrmapbt", FLDT_RTRMAPROOT, NULL, inode_u_rtrmapbt_count, FLD_COUNT,
 	  TYP_NONE },
+	{ "rtrefcbt", FLDT_RTREFCROOT, NULL, inode_u_rtrefcbt_count, FLD_COUNT,
+	  TYP_NONE },
 	{ NULL }
 };
 
@@ -302,7 +305,7 @@ fp_dinode_fmt(
 
 static const char	*metatype_name[] =
 	{ "unknown", "dir", "usrquota", "grpquota", "prjquota", "rtbitmap",
-	  "rtsummary", "rtrmap"
+	  "rtsummary", "rtrmap", "rtrefcount"
 	};
 static const int	metatype_name_size = ARRAY_SIZE(metatype_name);
 
@@ -722,6 +725,8 @@ inode_next_type(void)
 				return TYP_RGSUMMARY;
 			case XFS_METAFILE_RTRMAP:
 				return TYP_RTRMAPBT;
+			case XFS_METAFILE_RTREFCOUNT:
+				return TYP_RTREFCBT;
 			default:
 				return TYP_DATA;
 			}
@@ -886,6 +891,21 @@ inode_u_rtrmapbt_count(
 	       dip->di_metatype == cpu_to_be16(XFS_METAFILE_RTRMAP);
 }
 
+static int
+inode_u_rtrefcbt_count(
+	void			*obj,
+	int			startoff)
+{
+	struct xfs_dinode	*dip;
+
+	ASSERT(bitoffs(startoff) == 0);
+	ASSERT(obj == iocur_top->data);
+	dip = obj;
+	ASSERT((char *)XFS_DFORK_DPTR(dip) - (char *)dip == byteize(startoff));
+	return dip->di_format == XFS_DINODE_FMT_META_BTREE &&
+	       dip->di_metatype == cpu_to_be16(XFS_METAFILE_RTREFCOUNT);
+}
+
 int
 inode_u_size(
 	void			*obj,
diff --git a/db/type.c b/db/type.c
index 1dfc33ffb44b87..324f416a49cc4d 100644
--- a/db/type.c
+++ b/db/type.c
@@ -53,6 +53,7 @@ static const typ_t	__typtab[] = {
 	{ TYP_RMAPBT, NULL },
 	{ TYP_RTRMAPBT, NULL },
 	{ TYP_REFCBT, NULL },
+	{ TYP_RTREFCBT, NULL },
 	{ TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_DIR2, "dir2", handle_struct, dir2_hfld, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_DQBLK, "dqblk", handle_struct, dqblk_hfld, NULL, TYP_F_NO_CRC_OFF },
@@ -96,6 +97,8 @@ static const typ_t	__typtab_crc[] = {
 		&xfs_rtrmapbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF },
 	{ TYP_REFCBT, "refcntbt", handle_struct, refcbt_crc_hfld,
 		&xfs_refcountbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
+	{ TYP_RTREFCBT, "rtrefcntbt", handle_struct, rtrefcbt_crc_hfld,
+		&xfs_rtrefcountbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF },
 	{ TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld,
 		&xfs_dir3_db_buf_ops, TYP_F_CRC_FUNC, xfs_dir3_set_crc },
@@ -148,6 +151,8 @@ static const typ_t	__typtab_spcrc[] = {
 		&xfs_rtrmapbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF },
 	{ TYP_REFCBT, "refcntbt", handle_struct, refcbt_crc_hfld,
 		&xfs_refcountbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
+	{ TYP_RTREFCBT, "rtrefcntbt", handle_struct, rtrefcbt_crc_hfld,
+		&xfs_rtrefcountbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF },
 	{ TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_DIR2, "dir3", handle_struct, dir3_hfld,
 		&xfs_dir3_db_buf_ops, TYP_F_CRC_FUNC, xfs_dir3_set_crc },
diff --git a/db/type.h b/db/type.h
index c98f3640202e87..a2488a663dbd79 100644
--- a/db/type.h
+++ b/db/type.h
@@ -22,6 +22,7 @@ typedef enum typnm
 	TYP_RMAPBT,
 	TYP_RTRMAPBT,
 	TYP_REFCBT,
+	TYP_RTREFCBT,
 	TYP_DATA,
 	TYP_DIR2,
 	TYP_DQBLK,
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index df24f36f0d2874..167df04df8fb1b 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -319,6 +319,10 @@
 #define xfs_update_rtsb			libxfs_update_rtsb
 #define xfs_rtgroup_get			libxfs_rtgroup_get
 #define xfs_rtgroup_put			libxfs_rtgroup_put
+
+#define xfs_rtrefcountbt_droot_maxrecs	libxfs_rtrefcountbt_droot_maxrecs
+#define xfs_rtrefcountbt_maxrecs	libxfs_rtrefcountbt_maxrecs
+
 #define xfs_rtrmapbt_calc_reserves	libxfs_rtrmapbt_calc_reserves
 #define xfs_rtrmapbt_calc_size		libxfs_rtrmapbt_calc_size
 #define xfs_rtrmapbt_commit_staged_btree	libxfs_rtrmapbt_commit_staged_btree
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 784aa3cb46c588..8326bddcef8378 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -1308,7 +1308,7 @@ .SH COMMANDS
 .BR agf ", " agfl ", " agi ", " attr ", " bmapbta ", " bmapbtd ,
 .BR bnobt ", " cntbt ", " data ", " dir ", " dir2 ", " dqblk ,
 .BR inobt ", " inode ", " log ", " refcntbt ", " rmapbt ", " rtbitmap ,
-.BR rtsummary ", " sb ", " symlink ", " rtrmapbt ", and " text .
+.BR rtsummary ", " sb ", " symlink ", " rtrmapbt ", " rtrefcntbt ", and " text .
 See the TYPES section below for more information on these data types.
 .TP
 .BI "timelimit [" OPTIONS ]
@@ -2402,6 +2402,52 @@ .SH TYPES
 .PD
 .RE
 .TP
+.B rtrefcntbt
+There is one reference count Btree for each realtime group.  The
+.BR startblock " and "
+.B blockcount
+fields are 32 bits wide and record block counts within a realtime group.
+The root of this Btree is the realtime refcount inode, which is recorded in the
+metadata directory.
+Blocks are linked to sibling left and right blocks at each level, as well as by
+pointers from parent to child blocks.
+Each block has the following fields:
+.RS 1.4i
+.PD 0
+.TP 1.2i
+.B magic
+RTREFC block magic number, 0x52434e54 ('RCNT').
+.TP
+.B level
+level number of this block, 0 is a leaf.
+.TP
+.B numrecs
+number of data entries in the block.
+.TP
+.B leftsib
+left (logically lower) sibling block, 0 if none.
+.TP
+.B rightsib
+right (logically higher) sibling block, 0 if none.
+.TP
+.B recs
+[leaf blocks only] array of reference count records. Each record contains
+.BR startblock ,
+.BR blockcount ,
+and
+.BR refcount .
+.TP
+.B keys
+[non-leaf blocks only] array of key records. These are the first value
+of each block in the level below this one. Each record contains
+.BR startblock .
+.TP
+.B ptrs
+[non-leaf blocks only] array of child block pointers. Each pointer is a
+filesystem block number pointing to the next level in the Btree.
+.PD
+.RE
+.TP
 .B rmapbt
 There is one set of filesystem blocks forming the reverse mapping Btree for
 each allocation group. The root block of this Btree is designated by the
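
As a quick cross-check of the block magic documented in the man page text
above, here is a minimal sketch that unpacks a 32-bit magic into its four
ASCII bytes, most significant byte first, which is the order a big-endian
on-disk field reads in.  magic_to_str() is a hypothetical helper written
for this note; it is not part of xfs_db.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: spell out a 32-bit magic, most significant byte first. */
void magic_to_str(uint32_t magic, char out[5])
{
	out[0] = (char)((magic >> 24) & 0xff);
	out[1] = (char)((magic >> 16) & 0xff);
	out[2] = (char)((magic >> 8) & 0xff);
	out[3] = (char)(magic & 0xff);
	out[4] = '\0';
}
```

Feeding it 0x52434e54 gives the four characters quoted next to the magic
in the man page.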


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 07/22] xfs_db: support the realtime refcountbt
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (5 preceding siblings ...)
  2025-02-06 22:58   ` [PATCH 06/22] xfs_db: display the realtime refcount btree contents Darrick J. Wong
@ 2025-02-06 22:58   ` Darrick J. Wong
  2025-02-13  4:22     ` Christoph Hellwig
  2025-02-06 22:58   ` [PATCH 08/22] xfs_db: copy the realtime refcount btree Darrick J. Wong
                     ` (14 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:58 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Wire up various parts of xfs_db for realtime refcount support so that we
can dump the rt refcount btree contents.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/btblock.c             |    3 +++
 db/btdump.c              |    8 ++++++++
 db/btheight.c            |    5 +++++
 libxfs/libxfs_api_defs.h |    1 +
 man/man8/xfs_db.8        |    1 +
 5 files changed, 18 insertions(+)
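
For readers unfamiliar with btheight: the new table entry hands the
per-block record limits to a height calculation.  The sketch below shows
roughly how a height follows from those limits.  btree_height() is a
hypothetical helper, it assumes every block is completely full, and it
ignores the smaller root that lives in the inode fork, so treat it as an
illustration rather than what xfs_db actually computes.

```c
#include <stdint.h>

/*
 * Hypothetical helper: minimum number of levels needed to index nrecs
 * records when every leaf holds leaf_maxrecs records and every interior
 * block holds node_maxrecs pointers.  Assumes both limits are >= 2.
 */
unsigned int btree_height(uint64_t nrecs, unsigned int leaf_maxrecs,
			  unsigned int node_maxrecs)
{
	uint64_t	blocks;
	unsigned int	height = 1;

	if (nrecs == 0)
		return 0;

	/* Number of leaf blocks, rounding up. */
	blocks = (nrecs + leaf_maxrecs - 1) / leaf_maxrecs;

	/* Add interior levels until one block can point at everything. */
	while (blocks > 1) {
		blocks = (blocks + node_maxrecs - 1) / node_maxrecs;
		height++;
	}
	return height;
}
```

With 100 records per leaf and 50 pointers per node, 5001 records need 51
leaves, then 2 nodes, then 1 root, for a height of 3.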


diff --git a/db/btblock.c b/db/btblock.c
index 40913a094375aa..e9e5d2f86b9f01 100644
--- a/db/btblock.c
+++ b/db/btblock.c
@@ -159,6 +159,9 @@ block_to_bt(
 	case TYP_REFCBT:
 		magic = crc ? XFS_REFC_CRC_MAGIC : 0;
 		break;
+	case TYP_RTREFCBT:
+		magic = crc ? XFS_RTREFC_CRC_MAGIC : 0;
+		break;
 	default:
 		ASSERT(0);
 	}
diff --git a/db/btdump.c b/db/btdump.c
index 55301d25de10cd..29ff28f1639977 100644
--- a/db/btdump.c
+++ b/db/btdump.c
@@ -456,6 +456,7 @@ dump_btree_inode(
 	char			*prefix;
 	struct xfs_dinode	*dip = iocur_top->data;
 	struct xfs_rtrmap_root	*rtrmap;
+	struct xfs_rtrefcount_root *rtrefc;
 	int			level;
 	int			numrecs;
 	int			ret;
@@ -467,6 +468,12 @@ dump_btree_inode(
 		level = be16_to_cpu(rtrmap->bb_level);
 		numrecs = be16_to_cpu(rtrmap->bb_numrecs);
 		break;
+	case XFS_METAFILE_RTREFCOUNT:
+		prefix = "u3.rtrefcbt";
+		rtrefc = (struct xfs_rtrefcount_root *)XFS_DFORK_DPTR(dip);
+		level = be16_to_cpu(rtrefc->bb_level);
+		numrecs = be16_to_cpu(rtrefc->bb_numrecs);
+		break;
 	default:
 		dbprintf("Unknown metadata inode btree type %u\n",
 				be16_to_cpu(dip->di_metatype));
@@ -549,6 +556,7 @@ btdump_f(
 	case TYP_BMAPBTA:
 	case TYP_BMAPBTD:
 	case TYP_RTRMAPBT:
+	case TYP_RTREFCBT:
 		return dump_btree_long(iflag);
 	case TYP_INODE:
 		if (is_btree_inode())
diff --git a/db/btheight.c b/db/btheight.c
index 31dff1c924a2e0..14081c969a922c 100644
--- a/db/btheight.c
+++ b/db/btheight.c
@@ -58,6 +58,11 @@ struct btmap {
 		.maxlevels	= libxfs_rtrmapbt_maxlevels_ondisk,
 		.maxrecs	= libxfs_rtrmapbt_maxrecs,
 	},
+	{
+		.tag		= "rtrefcountbt",
+		.maxlevels	= libxfs_rtrefcountbt_maxlevels_ondisk,
+		.maxrecs	= libxfs_rtrefcountbt_maxrecs,
+	},
 };
 
 static void
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 167df04df8fb1b..87a598f346f86a 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -321,6 +321,7 @@
 #define xfs_rtgroup_put			libxfs_rtgroup_put
 
 #define xfs_rtrefcountbt_droot_maxrecs	libxfs_rtrefcountbt_droot_maxrecs
+#define xfs_rtrefcountbt_maxlevels_ondisk	libxfs_rtrefcountbt_maxlevels_ondisk
 #define xfs_rtrefcountbt_maxrecs	libxfs_rtrefcountbt_maxrecs
 
 #define xfs_rtrmapbt_calc_reserves	libxfs_rtrmapbt_calc_reserves
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 8326bddcef8378..08f38f37ca01cc 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -536,6 +536,7 @@ .SH COMMANDS
 .IR bmapbt ,
 .IR refcountbt ,
 .IR rmapbt ,
+.IR rtrefcountbt ,
 and
 .IR rtrmapbt .
 The magic value


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 08/22] xfs_db: copy the realtime refcount btree
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (6 preceding siblings ...)
  2025-02-06 22:58   ` [PATCH 07/22] xfs_db: support the realtime refcountbt Darrick J. Wong
@ 2025-02-06 22:58   ` Darrick J. Wong
  2025-02-13  4:23     ` Christoph Hellwig
  2025-02-06 22:59   ` [PATCH 09/22] xfs_db: add rtrefcount reservations to the rgresv command Darrick J. Wong
                     ` (13 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:58 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Copy the realtime refcountbt when we're metadumping the filesystem.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/metadump.c |  114 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)
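
Both new functions split each 64-bit child pointer into an AG number and
an AG block number before sanity-checking it.  A minimal standalone
sketch of that split follows.  struct geom and the three helpers are
hypothetical stand-ins for the superblock fields and the XFS_FSB_TO_*
macros; the comparisons are copied from the checks in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock geometry fields used here. */
struct geom {
	uint8_t		agblklog;	/* log2 of the AG size, rounded up */
	uint32_t	agblocks;	/* blocks per AG */
	uint32_t	agcount;	/* number of AGs */
};

/* The AG number lives in the bits above agblklog. */
uint32_t fsb_to_agno(const struct geom *g, uint64_t fsb)
{
	return (uint32_t)(fsb >> g->agblklog);
}

/* The block within the AG lives in the low agblklog bits. */
uint32_t fsb_to_agbno(const struct geom *g, uint64_t fsb)
{
	return (uint32_t)(fsb & ((1ULL << g->agblklog) - 1));
}

/* Same comparisons as the pointer checks in this patch. */
bool ptr_looks_valid(const struct geom *g, uint64_t fsb)
{
	uint32_t	agno = fsb_to_agno(g, fsb);
	uint32_t	agbno = fsb_to_agbno(g, fsb);

	return agbno != 0 && agbno <= g->agblocks && agno <= g->agcount;
}
```

A pointer that fails the check is only warned about and skipped, so the
rest of the tree still gets copied.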


diff --git a/db/metadump.c b/db/metadump.c
index aa946746cdfa68..3e00e6de817752 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -710,6 +710,55 @@ copy_refcount_btree(
 	return scan_btree(agno, root, levels, TYP_REFCBT, agf, scanfunc_refcntbt);
 }
 
+static int
+scanfunc_rtrefcbt(
+	struct xfs_btree_block	*block,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	int			level,
+	typnm_t			btype,
+	void			*arg)
+{
+	xfs_rtrefcount_ptr_t	*pp;
+	int			i;
+	int			numrecs;
+
+	if (level == 0)
+		return 1;
+
+	numrecs = be16_to_cpu(block->bb_numrecs);
+	if (numrecs > mp->m_rtrefc_mxr[1]) {
+		if (metadump.show_warnings)
+			print_warning("invalid numrecs (%u) in %s block %u/%u",
+				numrecs, typtab[btype].name, agno, agbno);
+		return 1;
+	}
+
+	pp = xfs_rtrefcount_ptr_addr(block, 1, mp->m_rtrefc_mxr[1]);
+	for (i = 0; i < numrecs; i++) {
+		xfs_agnumber_t	pagno;
+		xfs_agblock_t	pbno;
+
+		pagno = XFS_FSB_TO_AGNO(mp, get_unaligned_be64(&pp[i]));
+		pbno = XFS_FSB_TO_AGBNO(mp, get_unaligned_be64(&pp[i]));
+
+		if (pbno == 0 || pbno > mp->m_sb.sb_agblocks ||
+		    pagno > mp->m_sb.sb_agcount) {
+			if (metadump.show_warnings)
+				print_warning("invalid block number (%u/%u) "
+						"in inode %llu %s block %u/%u",
+						pagno, pbno,
+						(unsigned long long)metadump.cur_ino,
+						typtab[btype].name, agno, agbno);
+			continue;
+		}
+		if (!scan_btree(pagno, pbno, level, btype, arg,
+				scanfunc_rtrefcbt))
+			return 0;
+	}
+	return 1;
+}
+
 /* filename and extended attribute obfuscation routines */
 
 struct name_ent {
@@ -2437,6 +2486,69 @@ process_rtrmap(
 	return 1;
 }
 
+static int
+process_rtrefc(
+	struct xfs_dinode	*dip)
+{
+	int			whichfork = XFS_DATA_FORK;
+	struct xfs_rtrefcount_root *dib =
+		(struct xfs_rtrefcount_root *)XFS_DFORK_PTR(dip, whichfork);
+	int			level = be16_to_cpu(dib->bb_level);
+	int			nrecs = be16_to_cpu(dib->bb_numrecs);
+	typnm_t			btype = TYP_RTREFCBT;
+	int			maxrecs;
+	xfs_rtrefcount_ptr_t	*pp;
+	int			i;
+
+	if (level > mp->m_rtrefc_maxlevels) {
+		if (metadump.show_warnings)
+			print_warning("invalid level (%u) in inode %llu %s "
+					"root", level,
+					(unsigned long long)metadump.cur_ino,
+					typtab[btype].name);
+		return 1;
+	}
+
+	if (level == 0)
+		return 1;
+
+	maxrecs = libxfs_rtrefcountbt_droot_maxrecs(
+			XFS_DFORK_SIZE(dip, mp, whichfork),
+			false);
+	if (nrecs > maxrecs) {
+		if (metadump.show_warnings)
+			print_warning("invalid numrecs (%u) in inode %llu %s "
+					"root", nrecs,
+					(unsigned long long)metadump.cur_ino,
+					typtab[btype].name);
+		return 1;
+	}
+
+	pp = xfs_rtrefcount_droot_ptr_addr(dib, 1, maxrecs);
+	for (i = 0; i < nrecs; i++) {
+		xfs_agnumber_t	ag;
+		xfs_agblock_t	bno;
+
+		ag = XFS_FSB_TO_AGNO(mp, get_unaligned_be64(&pp[i]));
+		bno = XFS_FSB_TO_AGBNO(mp, get_unaligned_be64(&pp[i]));
+
+		if (bno == 0 || bno > mp->m_sb.sb_agblocks ||
+				ag > mp->m_sb.sb_agcount) {
+			if (metadump.show_warnings)
+				print_warning("invalid block number (%u/%u) "
+						"in inode %llu %s root", ag,
+						bno,
+						(unsigned long long)metadump.cur_ino,
+						typtab[btype].name);
+			continue;
+		}
+
+		if (!scan_btree(ag, bno, level, btype, NULL, scanfunc_rtrefcbt))
+			return 0;
+	}
+	return 1;
+}
+
 static int
 process_inode_data(
 	struct xfs_dinode	*dip)
@@ -2483,6 +2595,8 @@ process_inode_data(
 		switch (be16_to_cpu(dip->di_metatype)) {
 		case XFS_METAFILE_RTRMAP:
 			return process_rtrmap(dip);
+		case XFS_METAFILE_RTREFCOUNT:
+			return process_rtrefc(dip);
 		default:
 			return 1;
 		}


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 09/22] xfs_db: add rtrefcount reservations to the rgresv command
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (7 preceding siblings ...)
  2025-02-06 22:58   ` [PATCH 08/22] xfs_db: copy the realtime refcount btree Darrick J. Wong
@ 2025-02-06 22:59   ` Darrick J. Wong
  2025-02-13  4:23     ` Christoph Hellwig
  2025-02-06 22:59   ` [PATCH 10/22] xfs_spaceman: report health of the realtime refcount btree Darrick J. Wong
                     ` (12 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:59 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Report rt refcount btree reservations in the rgresv subcommand output.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/info.c                |   13 +++++++++++++
 libxfs/libxfs_api_defs.h |    1 +
 2 files changed, 14 insertions(+)


diff --git a/db/info.c b/db/info.c
index ce6f358420a79d..6ad3e23832faa3 100644
--- a/db/info.c
+++ b/db/info.c
@@ -202,6 +202,19 @@ print_rgresv_info(
 
 	ask += libxfs_rtrmapbt_calc_reserves(mp);
 
+	/* rtrefcount */
+	error = -libxfs_rtginode_load(rtg, XFS_RTGI_REFCOUNT, tp);
+	if (error) {
+		dbprintf(_("Cannot load rtgroup %u refcount inode, error %d\n"),
+			rtg_rgno(rtg), error);
+		goto out_rele_dp;
+	}
+	if (rtg_refcount(rtg))
+		used += rtg_refcount(rtg)->i_nblocks;
+	libxfs_rtginode_irele(&rtg->rtg_inodes[XFS_RTGI_REFCOUNT]);
+
+	ask += libxfs_rtrefcountbt_calc_reserves(mp);
+
 	printf(_("rtg %d: dblocks: %llu fdblocks: %llu reserved: %llu used: %llu"),
 			rtg_rgno(rtg),
 			(unsigned long long)mp->m_sb.sb_dblocks,
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 87a598f346f86a..7ce10c408de1a0 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -320,6 +320,7 @@
 #define xfs_rtgroup_get			libxfs_rtgroup_get
 #define xfs_rtgroup_put			libxfs_rtgroup_put
 
+#define xfs_rtrefcountbt_calc_reserves	libxfs_rtrefcountbt_calc_reserves
 #define xfs_rtrefcountbt_droot_maxrecs	libxfs_rtrefcountbt_droot_maxrecs
 #define xfs_rtrefcountbt_maxlevels_ondisk	libxfs_rtrefcountbt_maxlevels_ondisk
 #define xfs_rtrefcountbt_maxrecs	libxfs_rtrefcountbt_maxrecs


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 10/22] xfs_spaceman: report health of the realtime refcount btree
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (8 preceding siblings ...)
  2025-02-06 22:59   ` [PATCH 09/22] xfs_db: add rtrefcount reservations to the rgresv command Darrick J. Wong
@ 2025-02-06 22:59   ` Darrick J. Wong
  2025-02-13  4:24     ` Christoph Hellwig
  2025-02-06 22:59   ` [PATCH 11/22] xfs_repair: allow CoW staging extents in the realtime rmap records Darrick J. Wong
                     ` (11 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:59 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Report the health of the realtime reference count btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 spaceman/health.c |   10 ++++++++++
 1 file changed, 10 insertions(+)
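
The health report is table-driven: each entry pairs a sick-mask with a
description and an optional predicate saying whether the feature exists
on this filesystem.  Here is a cut-down sketch of that pattern.  The
struct layout, the mask values, and report_health() are assumptions made
for illustration and do not match the real ioctl definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down geometry: only what the predicate below needs. */
struct geom {
	unsigned long long	rtblocks;
	unsigned int		flags;
};

#define GEOM_FLAG_REFLINK	(1U << 0)	/* illustrative value */
#define SICK_RMAPBT		(1U << 0)	/* illustrative value */
#define SICK_REFCNTBT		(1U << 1)	/* illustrative value */

struct flag_map {
	unsigned int	mask;
	const char	*descr;
	bool		(*has_fn)(const struct geom *g);
};

bool has_rtreflink(const struct geom *g)
{
	return g->rtblocks > 0 && (g->flags & GEOM_FLAG_REFLINK);
}

const struct flag_map rtgroup_flags[] = {
	{ .mask = SICK_RMAPBT,   .descr = "realtime reverse mappings btree" },
	{ .mask = SICK_REFCNTBT, .descr = "realtime reference count btree",
	  .has_fn = has_rtreflink },
	{ 0 },
};

/* Print one line per applicable structure; return how many are sick. */
int report_health(const struct geom *g, unsigned int sick)
{
	const struct flag_map	*f;
	int			bad = 0;

	for (f = rtgroup_flags; f->mask != 0; f++) {
		if (f->has_fn && !f->has_fn(g))
			continue;	/* feature absent: nothing to report */
		if (sick & f->mask)
			bad++;
		printf("%s: %s\n", f->descr,
		       (sick & f->mask) ? "unhealthy" : "ok");
	}
	return bad;
}
```

The predicate keeps the refcount btree line out of the report on
filesystems that have no realtime volume or no reflink support.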


diff --git a/spaceman/health.c b/spaceman/health.c
index 5774d609a28de3..efe2a7261de5c4 100644
--- a/spaceman/health.c
+++ b/spaceman/health.c
@@ -47,6 +47,11 @@ static bool has_rtrmapbt(const struct xfs_fsop_geom *g)
 	return g->rtblocks > 0 && (g->flags & XFS_FSOP_GEOM_FLAGS_RMAPBT);
 }
 
+static bool has_rtreflink(const struct xfs_fsop_geom *g)
+{
+	return g->rtblocks > 0 && (g->flags & XFS_FSOP_GEOM_FLAGS_REFLINK);
+}
+
 struct flag_map {
 	unsigned int		mask;
 	bool			(*has_fn)(const struct xfs_fsop_geom *g);
@@ -156,6 +161,11 @@ static const struct flag_map rtgroup_flags[] = {
 		.descr = "realtime reverse mappings btree",
 		.has_fn = has_rtrmapbt,
 	},
+	{
+		.mask = XFS_RTGROUP_GEOM_SICK_REFCNTBT,
+		.descr = "realtime reference count btree",
+		.has_fn = has_rtreflink,
+	},
 	{0},
 };
 


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 11/22] xfs_repair: allow CoW staging extents in the realtime rmap records
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (9 preceding siblings ...)
  2025-02-06 22:59   ` [PATCH 10/22] xfs_spaceman: report health of the realtime refcount btree Darrick J. Wong
@ 2025-02-06 22:59   ` Darrick J. Wong
  2025-02-13  4:24     ` Christoph Hellwig
  2025-02-06 22:59   ` [PATCH 12/22] xfs_repair: use realtime refcount btree data to check block types Darrick J. Wong
                     ` (10 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:59 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Don't flag the rt rmap btree as having errors if there are CoW staging
extent records in it and the filesystem supports reflink.  Leftover
staging extents will be reported when we scan the rt refcount btree, in
a future patch.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 repair/scan.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)
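
The owner check after this change boils down to a three-way decision.
The sketch below restates it as a standalone predicate.  The OWN_*
values and NON_INODE_OWNER() are illustrative stand-ins (assuming
special owners have the top bit set); see the libxfs headers for the
real definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative stand-ins for the special rmap owner codes.  The values
 * are assumptions made for this sketch, not copied from libxfs.
 */
#define OWN_FS			((uint64_t)-3)
#define OWN_COW			((uint64_t)-9)
#define NON_INODE_OWNER(o)	(((o) >> 63) != 0)

bool rtrmap_owner_ok(uint64_t owner, bool has_reflink)
{
	if (owner == OWN_COW)
		return has_reflink;	/* staging extents need reflink */
	if (NON_INODE_OWNER(owner) && owner != OWN_FS)
		return false;		/* no other metadata owners allowed */
	return true;			/* file data or superblock */
}
```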


diff --git a/repair/scan.c b/repair/scan.c
index 7a74f87c5f0c61..7980795e3a6f9f 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -1407,9 +1407,20 @@ _("invalid length %llu in record %u of %s\n"),
 			continue;
 		}
 
-		/* We only store file data and superblocks in the rtrmap. */
-		if (XFS_RMAP_NON_INODE_OWNER(owner) &&
-		    owner != XFS_RMAP_OWN_FS) {
+		/*
+		 * We only store file data, COW data, and superblocks in the
+		 * rtrmap.
+		 */
+		if (owner == XFS_RMAP_OWN_COW) {
+			if (!xfs_has_reflink(mp)) {
+				do_warn(
+_("invalid CoW staging extent in record %u of %s\n"),
+						i, name);
+				suspect++;
+				continue;
+			}
+		} else if (XFS_RMAP_NON_INODE_OWNER(owner) &&
+			   owner != XFS_RMAP_OWN_FS) {
 			do_warn(
 _("invalid owner %lld in record %u of %s\n"),
 				(long long int)owner, i, name);


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* [PATCH 12/22] xfs_repair: use realtime refcount btree data to check block types
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (10 preceding siblings ...)
  2025-02-06 22:59   ` [PATCH 11/22] xfs_repair: allow CoW staging extents in the realtime rmap records Darrick J. Wong
@ 2025-02-06 22:59   ` Darrick J. Wong
  2025-02-13  4:24     ` Christoph Hellwig
  2025-02-06 23:00   ` [PATCH 13/22] xfs_repair: find and mark the rtrefcountbt inode Darrick J. Wong
                     ` (9 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 22:59 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Use the realtime refcount btree to pre-populate the block type information
so that when repair iterates the primary metadata, we can confirm the
block type.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    1 
 repair/dinode.c          |  154 +++++++++++++++++++++++
 repair/rt.h              |    4 +
 repair/scan.c            |  314 +++++++++++++++++++++++++++++++++++++++++++++-
 repair/scan.h            |   33 +++++
 5 files changed, 500 insertions(+), 6 deletions(-)
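
Two of the per-record checks in process_rtrefc_reclist() are easier to
follow in isolation: pulling the CoW flag out of the start block, and
deciding whether a record should have been merged with its predecessor.
A minimal sketch follows.  The type and function names are hypothetical,
and REFC_COWFLAG is assumed to be the top bit of the 32-bit start block.

```c
#include <stdbool.h>
#include <stdint.h>

#define REFC_COWFLAG	(1U << 31)	/* assumed: top bit of startblock */

enum refc_domain { DOMAIN_SHARED, DOMAIN_COW };

struct refc_irec {
	enum refc_domain	domain;
	uint32_t		startblock;	/* flag bit removed */
	uint32_t		blockcount;
	uint32_t		refcount;
};

/* Inputs are already in CPU byte order. */
struct refc_irec refc_decode(uint32_t startblock, uint32_t blockcount,
			     uint32_t refcount)
{
	struct refc_irec	irec = {
		.domain = (startblock & REFC_COWFLAG) ?
				DOMAIN_COW : DOMAIN_SHARED,
		.startblock = startblock & ~REFC_COWFLAG,
		.blockcount = blockcount,
		.refcount = refcount,
	};

	return irec;
}

/* Adjacent records with the same domain and refcount should be one record. */
bool refc_should_merge(const struct refc_irec *prev,
		       const struct refc_irec *cur)
{
	return prev->domain == cur->domain &&
	       prev->startblock + prev->blockcount == cur->startblock &&
	       prev->refcount == cur->refcount;
}
```

In the patch the "previous record" state lives in refc_priv->last_rec so
that the comparison also works across leaf block boundaries.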


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 7ce10c408de1a0..0dc6f0350dd0f6 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -394,6 +394,7 @@
 #define xfs_verify_fsbext		libxfs_verify_fsbext
 #define xfs_verify_fsbno		libxfs_verify_fsbno
 #define xfs_verify_ino			libxfs_verify_ino
+#define xfs_verify_rgbno		libxfs_verify_rgbno
 #define xfs_verify_rtbno		libxfs_verify_rtbno
 #define xfs_zero_extent			libxfs_zero_extent
 
diff --git a/repair/dinode.c b/repair/dinode.c
index 3995584a364771..ac5db8b0ea4392 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -308,6 +308,8 @@ _("bad state in rt extent map %" PRIu64 "\n"),
 			break;
 		case XR_E_INUSE:
 		case XR_E_MULT:
+			if (xfs_has_rtreflink(mp))
+				break;
 			set_rtbmap(ext, XR_E_MULT);
 			break;
 		case XR_E_FREE1:
@@ -382,6 +384,8 @@ _("data fork in rt inode %" PRIu64 " found rt metadata extent %" PRIu64 " in rt
 			return 1;
 		case XR_E_INUSE:
 		case XR_E_MULT:
+			if (xfs_has_rtreflink(mp))
+				break;
 			do_warn(
 _("data fork in rt inode %" PRIu64 " claims used rt extent %" PRIu64 "\n"),
 				ino, b);
@@ -1083,6 +1087,149 @@ _("bad rtrmap btree ptr 0x%" PRIx64 " in ino %" PRIu64 "\n"),
 	return suspect ? 1 : 0;
 }
 
+/*
+ * return 1 if inode should be cleared, 0 otherwise
+ */
+static int
+process_rtrefc(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t			agno,
+	xfs_agino_t			ino,
+	struct xfs_dinode		*dip,
+	int				type,
+	int				*dirty,
+	xfs_rfsblock_t			*tot,
+	uint64_t			*nex,
+	blkmap_t			**blkmapp,
+	int				check_dups)
+{
+	struct refc_priv		priv = { .nr_blocks = 0 };
+	struct xfs_rtrefcount_root	*dib;
+	xfs_rtrefcount_ptr_t		*pp;
+	struct xfs_refcount_key		*kp;
+	struct xfs_refcount_rec		*rp;
+	char				*forkname = get_forkname(XFS_DATA_FORK);
+	xfs_rgblock_t			oldkey, key;
+	xfs_ino_t			lino;
+	xfs_fsblock_t			bno;
+	size_t				droot_sz;
+	int				i;
+	int				level;
+	int				numrecs;
+	int				dmxr;
+	int				suspect = 0;
+	int				error;
+
+	/* We rebuild the rtrefcountbt, so no need to process blocks again. */
+	if (check_dups) {
+		*tot = be64_to_cpu(dip->di_nblocks);
+		return 0;
+	}
+
+	lino = XFS_AGINO_TO_INO(mp, agno, ino);
+
+	/*
+	 * This refcount btree inode must be a metadata inode reachable via
+	 * /rtgroups/$rgno.refcount in the metadata directory tree.
+	 */
+	if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))) {
+		do_warn(
+_("rtrefcount inode %" PRIu64 " not flagged as metadata\n"),
+			lino);
+		return 1;
+	}
+
+	if (!is_rtrefcount_inode(lino)) {
+		do_warn(
+_("could not associate refcount inode %" PRIu64 " with any rtgroup\n"),
+			lino);
+		return 1;
+	}
+
+	priv.rgno = metafile_rgnumber(dip);
+
+	dib = (struct xfs_rtrefcount_root *)XFS_DFORK_PTR(dip, XFS_DATA_FORK);
+	*tot = 0;
+	*nex = 0;
+
+	level = be16_to_cpu(dib->bb_level);
+	numrecs = be16_to_cpu(dib->bb_numrecs);
+
+	if (level > mp->m_rtrefc_maxlevels) {
+		do_warn(
+_("bad level %d in inode %" PRIu64 " rtrefcount btree root block\n"),
+			level, lino);
+		return 1;
+	}
+
+	/*
+	 * use rtroot/dfork_dsize since the root block is in the data fork
+	 */
+	droot_sz = xfs_rtrefcount_droot_space_calc(level, numrecs);
+	if (droot_sz > XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK)) {
+		do_warn(
+_("computed size of rtrefcountbt root (%zu bytes) is greater than space in "
+	  "inode %" PRIu64 " %s fork\n"),
+				droot_sz, lino, forkname);
+		return 1;
+	}
+
+	if (level == 0) {
+		rp = xfs_rtrefcount_droot_rec_addr(dib, 1);
+		error = process_rtrefc_reclist(mp, rp, numrecs,
+				&priv, "rtrefcountbt root");
+		if (error) {
+			refcount_avoid_check();
+			return 1;
+		}
+		return 0;
+	}
+
+	dmxr = libxfs_rtrefcountbt_droot_maxrecs(
+			XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK), false);
+	pp = xfs_rtrefcount_droot_ptr_addr(dib, 1, dmxr);
+
+	/* check for in-order keys */
+	for (i = 0; i < numrecs; i++)  {
+		kp = xfs_rtrefcount_droot_key_addr(dib, i + 1);
+
+		key = be32_to_cpu(kp->rc_startblock);
+		if (i == 0) {
+			oldkey = key;
+			continue;
+		}
+		if (key < oldkey) {
+			do_warn(
+_("out of order key %u in rtrefcount root ino %" PRIu64 "\n"),
+				i, lino);
+			suspect++;
+			continue;
+		}
+		oldkey = key;
+	}
+
+	/* probe child pointers */
+	for (i = 0; i < numrecs; i++)  {
+		bno = get_unaligned_be64(&pp[i]);
+
+		if (!libxfs_verify_fsbno(mp, bno))  {
+			do_warn(
+_("bad rtrefcount btree ptr 0x%" PRIx64 " in ino %" PRIu64 "\n"),
+				bno, lino);
+			return 1;
+		}
+
+		if (scan_lbtree(bno, level, scan_rtrefcbt,
+				type, XFS_DATA_FORK, lino, tot, nex, blkmapp,
+				NULL, 0, 1, check_dups, XFS_RTREFC_CRC_MAGIC,
+				&priv, &xfs_rtrefcountbt_buf_ops))
+			return 1;
+	}
+
+	*tot = priv.nr_blocks;
+	return suspect ? 1 : 0;
+}
+
 /*
  * return 1 if inode should be cleared, 0 otherwise
  */
@@ -1883,6 +2030,7 @@ check_dinode_mode_format(
 		case XFS_DINODE_FMT_META_BTREE:
 			switch (be16_to_cpu(dinoc->di_metatype)) {
 			case XFS_METAFILE_RTRMAP:
+			case XFS_METAFILE_RTREFCOUNT:
 				return 0;
 			default:
 				return -1;
@@ -2333,6 +2481,11 @@ process_inode_data_fork(
 					totblocks, nextents, dblkmap,
 					check_dups);
 			break;
+		case XFS_METAFILE_RTREFCOUNT:
+			err = process_rtrefc(mp, agno, ino, dino, type, dirty,
+					totblocks, nextents, dblkmap,
+					check_dups);
+			break;
 		default:
 			do_error(
  _("unknown meta btree type %d, ino %" PRIu64 " (mode = %d)\n"),
@@ -2406,6 +2559,7 @@ _("would have tried to rebuild inode %"PRIu64" data fork\n"),
 		case XFS_DINODE_FMT_META_BTREE:
 			switch (be16_to_cpu(dino->di_metatype)) {
 			case XFS_METAFILE_RTRMAP:
+			case XFS_METAFILE_RTREFCOUNT:
 				err = 0;
 				break;
 			default:
diff --git a/repair/rt.h b/repair/rt.h
index 13558706e4ec15..e4f3d5d9af3188 100644
--- a/repair/rt.h
+++ b/repair/rt.h
@@ -33,6 +33,10 @@ static inline bool is_rtrmap_inode(xfs_ino_t ino)
 {
 	return is_rtgroup_inode(ino, XFS_RTGI_RMAP);
 }
+static inline bool is_rtrefcount_inode(xfs_ino_t ino)
+{
+	return is_rtgroup_inode(ino, XFS_RTGI_REFCOUNT);
+}
 
 void mark_rtgroup_inodes_bad(struct xfs_mount *mp, enum xfs_rtg_inodes type);
 bool rtgroup_inodes_were_bad(enum xfs_rtg_inodes type);
diff --git a/repair/scan.c b/repair/scan.c
index 7980795e3a6f9f..21fa9018800c77 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -1745,12 +1745,6 @@ _("bad %s btree ptr 0x%llx in ino %" PRIu64 "\n"),
 	return 0;
 }
 
-struct refc_priv {
-	struct xfs_refcount_irec	last_rec;
-	xfs_agblock_t			nr_blocks;
-};
-
-
 static void
 scan_refcbt(
 	struct xfs_btree_block	*block,
@@ -1990,6 +1984,314 @@ _("extent (%u/%u) len %u claimed, state is %d\n"),
 	return;
 }
 
+
+int
+process_rtrefc_reclist(
+	struct xfs_mount	*mp,
+	struct xfs_refcount_rec	*rp,
+	int			numrecs,
+	struct refc_priv	*refc_priv,
+	const char		*name)
+{
+	struct xfs_rtgroup	*rtg;
+	xfs_rgnumber_t		rgno = refc_priv->rgno;
+	xfs_rtblock_t		lastblock = 0;
+	int			state;
+	int			suspect = 0;
+	int			i;
+
+	rtg = libxfs_rtgroup_get(mp, rgno);
+	if (!rtg) {
+		if (numrecs) {
+			do_warn(
+_("no rt group 0x%x but %d rtrefcount records\n"),
+					rgno, numrecs);
+			suspect++;
+		}
+
+		return suspect;
+	}
+
+	for (i = 0; i < numrecs; i++) {
+		enum xfs_refc_domain	domain;
+		xfs_rgblock_t		b, rgbno, end;
+		xfs_extlen_t		len;
+		xfs_nlink_t		nr;
+
+		b = rgbno = be32_to_cpu(rp[i].rc_startblock);
+		len = be32_to_cpu(rp[i].rc_blockcount);
+		nr = be32_to_cpu(rp[i].rc_refcount);
+
+		if (b & XFS_REFC_COWFLAG) {
+			domain = XFS_REFC_DOMAIN_COW;
+			rgbno &= ~XFS_REFC_COWFLAG;
+		} else {
+			domain = XFS_REFC_DOMAIN_SHARED;
+		}
+
+		if (domain == XFS_REFC_DOMAIN_COW && nr != 1) {
+			do_warn(
+_("leftover rt CoW extent has incorrect refcount in record %u of %s\n"),
+					i, name);
+			suspect++;
+		}
+		if (nr == 1) {
+			if (domain != XFS_REFC_DOMAIN_COW) {
+				do_warn(
+_("leftover rt CoW extent has invalid startblock in record %u of %s\n"),
+					i, name);
+				suspect++;
+			}
+		}
+		end = rgbno + len;
+
+		if (!libxfs_verify_rgbno(rtg, rgbno)) {
+			do_warn(
+_("invalid start block %llu in record %u of %s\n"),
+					(unsigned long long)b, i, name);
+			suspect++;
+			continue;
+		}
+
+		if (len == 0 || end <= rgbno ||
+		    !libxfs_verify_rgbno(rtg, end - 1)) {
+			do_warn(
+_("invalid length %llu in record %u of %s\n"),
+					(unsigned long long)len, i, name);
+			suspect++;
+			continue;
+		}
+
+		if (nr == 0 || nr > XFS_REFC_REFCOUNT_MAX) {
+			do_warn(
+_("invalid rt reference count %u in record %u of %s\n"),
+					nr, i, name);
+			suspect++;
+			continue;
+		}
+
+		if (nr == 1) {
+			xfs_rgblock_t		b;
+			xfs_extlen_t		blen;
+
+			for (b = rgbno; b < end; b += len) {
+				state = get_bmap_ext(rgno, b, end, &blen, true);
+				blen = min(blen, len);
+
+				switch (state) {
+				case XR_E_UNKNOWN:
+				case XR_E_COW:
+					do_warn(
+_("leftover CoW rtextent (%llu)\n"),
+						(unsigned long long)rgbno);
+					set_bmap_ext(rgno, b, len, XR_E_FREE,
+							true);
+					break;
+				default:
+					do_warn(
+_("rtextent (%llu) claimed, state is %d\n"),
+						(unsigned long long)rgbno, state);
+					break;
+				}
+				suspect++;
+			}
+		}
+
+		if (b && b <= lastblock) {
+			do_warn(_(
+"out-of-order %s btree record %d (%llu %llu) in %s\n"),
+					name, i, (unsigned long long)b,
+					(unsigned long long)len, name);
+			suspect++;
+		} else {
+			lastblock = end - 1;
+		}
+
+		/* Is this record mergeable with the last one? */
+		if (refc_priv->last_rec.rc_domain == domain &&
+		    refc_priv->last_rec.rc_startblock +
+		    refc_priv->last_rec.rc_blockcount == rgbno &&
+		    refc_priv->last_rec.rc_refcount == nr) {
+			do_warn(
+_("record %d of %s tree should be merged with previous record\n"),
+					i, name);
+			suspect++;
+			refc_priv->last_rec.rc_blockcount += len;
+		} else {
+			refc_priv->last_rec.rc_domain = domain;
+			refc_priv->last_rec.rc_startblock = rgbno;
+			refc_priv->last_rec.rc_blockcount = len;
+			refc_priv->last_rec.rc_refcount = nr;
+		}
+
+		/* XXX: probably want to mark the reflinked areas? */
+	}
+
+	libxfs_rtgroup_put(rtg);
+	return suspect;
+}
+
+int
+scan_rtrefcbt(
+	struct xfs_btree_block		*block,
+	int				level,
+	int				type,
+	int				whichfork,
+	xfs_fsblock_t			fsbno,
+	xfs_ino_t			ino,
+	xfs_rfsblock_t			*tot,
+	uint64_t			*nex,
+	struct blkmap			**blkmapp,
+	bmap_cursor_t			*bm_cursor,
+	int				suspect,
+	int				isroot,
+	int				check_dups,
+	int				*dirty,
+	uint64_t			magic,
+	void				*priv)
+{
+	const char			*name = "rtrefcount";
+	char				rootname[256];
+	int				i;
+	xfs_rtrefcount_ptr_t		*pp;
+	struct xfs_refcount_rec	*rp;
+	struct refc_priv		*refc_priv = priv;
+	int				hdr_errors = 0;
+	int				numrecs;
+	int				state;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			agbno;
+	int				error;
+
+	agno = XFS_FSB_TO_AGNO(mp, fsbno);
+	agbno = XFS_FSB_TO_AGBNO(mp, fsbno);
+
+	if (magic != XFS_RTREFC_CRC_MAGIC) {
+		name = "(unknown)";
+		hdr_errors++;
+		suspect++;
+		goto out;
+	}
+
+	if (be32_to_cpu(block->bb_magic) != magic) {
+		do_warn(_("bad magic # %#x in %s btree block %d/%d\n"),
+				be32_to_cpu(block->bb_magic), name, agno,
+				agbno);
+		hdr_errors++;
+		if (suspect)
+			goto out;
+	}
+
+	if (be16_to_cpu(block->bb_level) != level) {
+		do_warn(_("expected level %d got %d in %s btree block %d/%d\n"),
+				level, be16_to_cpu(block->bb_level), name,
+				agno, agbno);
+		hdr_errors++;
+		if (suspect)
+			goto out;
+	}
+
+	refc_priv->nr_blocks++;
+
+	/*
+	 * Check for btree blocks multiply claimed.  We're going to regenerate
+	 * the btree anyway, so mark the blocks as metadata so they get freed.
+	 */
+	state = get_bmap(agno, agbno);
+	if (!(state == XR_E_UNKNOWN || state == XR_E_INUSE1))  {
+		do_warn(
+_("%s btree block claimed (state %d), agno %d, agbno %d, suspect %d\n"),
+				name, state, agno, agbno, suspect);
+		goto out;
+	}
+	set_bmap(agno, agbno, XR_E_METADATA);
+
+	numrecs = be16_to_cpu(block->bb_numrecs);
+	if (level == 0) {
+		if (numrecs > mp->m_rtrefc_mxr[0])  {
+			numrecs = mp->m_rtrefc_mxr[0];
+			hdr_errors++;
+		}
+		if (isroot == 0 && numrecs < mp->m_rtrefc_mnr[0])  {
+			numrecs = mp->m_rtrefc_mnr[0];
+			hdr_errors++;
+		}
+
+		if (hdr_errors) {
+			do_warn(
+	_("bad btree nrecs (%u, min=%u, max=%u) in %s btree block %u/%u\n"),
+					be16_to_cpu(block->bb_numrecs),
+					mp->m_rtrefc_mnr[0],
+					mp->m_rtrefc_mxr[0], name, agno, agbno);
+			suspect++;
+		}
+
+		rp = xfs_rtrefcount_rec_addr(block, 1);
+		snprintf(rootname, 256, "%s btree block %u/%u", name, agno,
+				agbno);
+		error = process_rtrefc_reclist(mp, rp, numrecs, refc_priv,
+				rootname);
+		if (error)
+			suspect++;
+		goto out;
+	}
+
+	/*
+	 * interior record
+	 */
+	pp = xfs_rtrefcount_ptr_addr(block, 1, mp->m_rtrefc_mxr[1]);
+
+	if (numrecs > mp->m_rtrefc_mxr[1])  {
+		numrecs = mp->m_rtrefc_mxr[1];
+		hdr_errors++;
+	}
+	if (isroot == 0 && numrecs < mp->m_rtrefc_mnr[1])  {
+		numrecs = mp->m_rtrefc_mnr[1];
+		hdr_errors++;
+	}
+
+	/*
+	 * don't pass bogus tree flag down further if this block
+	 * looked ok.  bail out if two levels in a row look bad.
+	 */
+	if (hdr_errors)  {
+		do_warn(
+	_("bad btree nrecs (%u, min=%u, max=%u) in %s btree block %u/%u\n"),
+				be16_to_cpu(block->bb_numrecs),
+				mp->m_rtrefc_mnr[1], mp->m_rtrefc_mxr[1], name,
+				agno, agbno);
+		if (suspect)
+			goto out;
+		suspect++;
+	} else if (suspect) {
+		suspect = 0;
+	}
+
+	for (i = 0; i < numrecs; i++)  {
+		xfs_fsblock_t		pbno = be64_to_cpu(pp[i]);
+
+		if (!libxfs_verify_fsbno(mp, pbno)) {
+			do_warn(
+	_("bad btree pointer (%" PRIu64 ") in %sbt block %u/%u\n"),
+					pbno, name, agno, agbno);
+			suspect++;
+			return 0;
+		}
+
+		scan_lbtree(pbno, level, scan_rtrefcbt, type, whichfork, ino,
+				tot, nex, blkmapp, bm_cursor, suspect, 0,
+				check_dups, magic, refc_priv,
+				&xfs_rtrefcountbt_buf_ops);
+	}
+out:
+	if (suspect) {
+		refcount_avoid_check();
+		return 1;
+	}
+
+	return 0;
+}
+
 /*
  * The following helpers are to help process and validate individual on-disk
  * inode btree records. We have two possible inode btrees with slightly
diff --git a/repair/scan.h b/repair/scan.h
index a624c882734c77..1643a2397aeaf5 100644
--- a/repair/scan.h
+++ b/repair/scan.h
@@ -100,4 +100,37 @@ int scan_rtrmapbt(
 	uint64_t		magic,
 	void			*priv);
 
+struct refc_priv {
+	struct xfs_refcount_irec	last_rec;
+	xfs_agblock_t			nr_blocks;
+	xfs_rgnumber_t			rgno;
+};
+
+int
+process_rtrefc_reclist(
+	struct xfs_mount	*mp,
+	struct xfs_refcount_rec	*rp,
+	int			numrecs,
+	struct refc_priv	*refc_priv,
+	const char		*name);
+
+int
+scan_rtrefcbt(
+	struct xfs_btree_block	*block,
+	int			level,
+	int			type,
+	int			whichfork,
+	xfs_fsblock_t		bno,
+	xfs_ino_t		ino,
+	xfs_rfsblock_t		*tot,
+	uint64_t		*nex,
+	struct blkmap		**blkmapp,
+	bmap_cursor_t		*bm_cursor,
+	int			suspect,
+	int			isroot,
+	int			check_dups,
+	int			*dirty,
+	uint64_t		magic,
+	void			*priv);
+
 #endif /* _XR_SCAN_H */



* [PATCH 13/22] xfs_repair: find and mark the rtrefcountbt inode
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (11 preceding siblings ...)
  2025-02-06 22:59   ` [PATCH 12/22] xfs_repair: use realtime refcount btree data to check block types Darrick J. Wong
@ 2025-02-06 23:00   ` Darrick J. Wong
  2025-02-13  4:25     ` Christoph Hellwig
  2025-02-06 23:00   ` [PATCH 14/22] xfs_repair: compute refcount data for the realtime groups Darrick J. Wong
                     ` (8 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:00 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Make sure that we find the realtime refcountbt inode and mark it
appropriately, just in case we find a rogue inode claiming to
be an rtrefcount, or just plain garbage in the superblock field.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 repair/dino_chunks.c |   11 +++++++++++
 repair/dinode.c      |   49 +++++++++++++++++++++++++++++++++++++++++--------
 repair/dir2.c        |    5 +++++
 repair/incore.h      |    1 +
 repair/rmap.c        |    5 ++++-
 repair/rmap.h        |    2 +-
 repair/scan.c        |    8 ++++----
 7 files changed, 67 insertions(+), 14 deletions(-)


diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c
index 8c5387cdf4ea52..250985ec264ead 100644
--- a/repair/dino_chunks.c
+++ b/repair/dino_chunks.c
@@ -1036,6 +1036,17 @@ process_inode_chunk(
 	_("would clear rtgroup rmap inode %" PRIu64 "\n"),
 						ino);
 				}
+			} else if (is_rtrefcount_inode(ino)) {
+				refcount_avoid_check(mp);
+				if (!no_modify)  {
+					do_warn(
+	_("cleared rtgroup refcount inode %" PRIu64 "\n"),
+						ino);
+				} else  {
+					do_warn(
+	_("would clear rtgroup refcount inode %" PRIu64 "\n"),
+						ino);
+				}
 			} else if (!no_modify)  {
 				do_warn(_("cleared inode %" PRIu64 "\n"),
 					ino);
diff --git a/repair/dinode.c b/repair/dinode.c
index ac5db8b0ea4392..3260df94511ed2 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -181,6 +181,9 @@ clear_dinode(
 
 	if (is_rtrmap_inode(ino_num))
 		rmap_avoid_check(mp);
+
+	if (is_rtrefcount_inode(ino_num))
+		refcount_avoid_check(mp);
 }
 
 /*
@@ -1139,14 +1142,27 @@ _("rtrefcount inode %" PRIu64 " not flagged as metadata\n"),
 		return 1;
 	}
 
-	if (!is_rtrefcount_inode(lino)) {
-		do_warn(
-_("could not associate refcount inode %" PRIu64 " with any rtgroup\n"),
-			lino);
-		return 1;
-	}
-
+	/*
+	 * If this rtrefcount file claims to be from an rtgroup that actually
+	 * exists, check that inode discovery actually found it.  Note that
+	 * we can have stray rtrefcount files from failed growfsrt operations.
+	 */
 	priv.rgno = metafile_rgnumber(dip);
+	if (priv.rgno < mp->m_sb.sb_rgcount) {
+		if (type != XR_INO_RTREFC) {
+			do_warn(
+_("rtrefcount inode %" PRIu64 " was not found in the metadata directory tree\n"),
+				lino);
+			return 1;
+		}
+
+		if (!is_rtrefcount_inode(lino)) {
+			do_warn(
+_("could not associate refcount inode %" PRIu64 " with any rtgroup\n"),
+				lino);
+			return 1;
+		}
+	}
 
 	dib = (struct xfs_rtrefcount_root *)XFS_DFORK_PTR(dip, XFS_DATA_FORK);
 	*tot = 0;
@@ -1179,7 +1195,7 @@ _("computed size of rtrefcountbt root (%zu bytes) is greater than space in "
 		error = process_rtrefc_reclist(mp, rp, numrecs,
 				&priv, "rtrefcountbt root");
 		if (error) {
-			refcount_avoid_check();
+			refcount_avoid_check(mp);
 			return 1;
 		}
 		return 0;
@@ -2143,6 +2159,9 @@ process_check_metadata_inodes(
 	if (is_rtrmap_inode(lino))
 		return process_check_rt_inode(mp, dinoc, lino, type, dirty,
 				XR_INO_RTRMAP, _("realtime rmap btree"));
+	if (is_rtrefcount_inode(lino))
+		return process_check_rt_inode(mp, dinoc, lino, type, dirty,
+				XR_INO_RTREFC, _("realtime refcount btree"));
 	return 0;
 }
 
@@ -2253,6 +2272,18 @@ _("found inode %" PRIu64 " claiming to be a rtrmapbt file, but rmapbt is disable
 		}
 		break;
 
+	case XR_INO_RTREFC:
+		/*
+		 * if reflink is disabled, any inode claiming to be
+		 * a realtime refcount btree file is bogus
+		 */
+		if (!xfs_has_reflink(mp)) {
+			do_warn(
+_("found inode %" PRIu64 " claiming to be a rtrefcountbt file, but reflink is disabled\n"), lino);
+			return 1;
+		}
+		break;
+
 	default:
 		break;
 	}
@@ -3453,6 +3484,8 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 			type = XR_INO_PQUOTA;
 		else if (is_rtrmap_inode(lino))
 			type = XR_INO_RTRMAP;
+		else if (is_rtrefcount_inode(lino))
+			type = XR_INO_RTREFC;
 		else
 			type = XR_INO_DATA;
 		break;
diff --git a/repair/dir2.c b/repair/dir2.c
index af00b2d8d6c852..a80160afaea5cf 100644
--- a/repair/dir2.c
+++ b/repair/dir2.c
@@ -282,6 +282,9 @@ process_sf_dir2(
 		} else if (is_rtrmap_inode(lino)) {
 			junkit = 1;
 			junkreason = _("realtime rmap");
+		} else if (is_rtrefcount_inode(lino)) {
+			junkit = 1;
+			junkreason = _("realtime refcount");
 		} else if ((irec_p = find_inode_rec(mp,
 					XFS_INO_TO_AGNO(mp, lino),
 					XFS_INO_TO_AGINO(mp, lino))) != NULL) {
@@ -761,6 +764,8 @@ process_dir2_data(
 			clearreason = _("metadata directory root");
 		} else if (is_rtrmap_inode(ent_ino)) {
 			clearreason = _("realtime rmap");
+		} else if (is_rtrefcount_inode(ent_ino)) {
+			clearreason = _("realtime refcount");
 		} else {
 			irec_p = find_inode_rec(mp,
 						XFS_INO_TO_AGNO(mp, ent_ino),
diff --git a/repair/incore.h b/repair/incore.h
index 4add12615e0a04..57019148f588c3 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -242,6 +242,7 @@ int		count_bcnt_extents(xfs_agnumber_t);
 #define XR_INO_GQUOTA	13		/* group quota inode */
 #define XR_INO_PQUOTA	14		/* project quota inode */
 #define XR_INO_RTRMAP	15		/* realtime rmap */
+#define XR_INO_RTREFC	16		/* realtime refcount */
 
 /* inode allocation tree */
 
diff --git a/repair/rmap.c b/repair/rmap.c
index 85a65048db9afc..e39c74cc7b44f7 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -1699,8 +1699,11 @@ init_refcount_cursor(
  * Disable the refcount btree check.
  */
 void
-refcount_avoid_check(void)
+refcount_avoid_check(
+	struct xfs_mount	*mp)
 {
+	if (xfs_has_rtgroups(mp))
+		mark_rtgroup_inodes_bad(mp, XFS_RTGI_REFCOUNT);
 	refcbt_suspect = true;
 }
 
diff --git a/repair/rmap.h b/repair/rmap.h
index 23859bf6c2ad42..c0984d97322861 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -41,7 +41,7 @@ extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t);
 uint64_t refcount_record_count(struct xfs_mount *mp, xfs_agnumber_t agno);
 extern int init_refcount_cursor(xfs_agnumber_t, struct xfs_slab_cursor **);
-extern void refcount_avoid_check(void);
+extern void refcount_avoid_check(struct xfs_mount *mp);
 void check_refcounts(struct xfs_mount *mp, xfs_agnumber_t agno);
 
 extern void record_inode_reflink_flag(struct xfs_mount *, struct xfs_dinode *,
diff --git a/repair/scan.c b/repair/scan.c
index 21fa9018800c77..86565ebb9f2faf 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -1980,7 +1980,7 @@ _("extent (%u/%u) len %u claimed, state is %d\n"),
 	libxfs_perag_put(pag);
 out:
 	if (suspect)
-		refcount_avoid_check();
+		refcount_avoid_check(mp);
 	return;
 }
 
@@ -2285,7 +2285,7 @@ _("%s btree block claimed (state %d), agno %d, agbno %d, suspect %d\n"),
 	}
 out:
 	if (suspect) {
-		refcount_avoid_check();
+		refcount_avoid_check(mp);
 		return 1;
 	}
 
@@ -3148,7 +3148,7 @@ validate_agf(
 		if (levels == 0 || levels > mp->m_refc_maxlevels) {
 			do_warn(_("bad levels %u for refcountbt root, agno %d\n"),
 				levels, agno);
-			refcount_avoid_check();
+			refcount_avoid_check(mp);
 		}
 
 		bno = be32_to_cpu(agf->agf_refcount_root);
@@ -3166,7 +3166,7 @@ validate_agf(
 		} else {
 			do_warn(_("bad agbno %u for refcntbt root, agno %d\n"),
 				bno, agno);
-			refcount_avoid_check();
+			refcount_avoid_check(mp);
 		}
 	}
 



* [PATCH 14/22] xfs_repair: compute refcount data for the realtime groups
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (12 preceding siblings ...)
  2025-02-06 23:00   ` [PATCH 13/22] xfs_repair: find and mark the rtrefcountbt inode Darrick J. Wong
@ 2025-02-06 23:00   ` Darrick J. Wong
  2025-02-13  4:25     ` Christoph Hellwig
  2025-02-06 23:00   ` [PATCH 15/22] xfs_repair: check existing realtime refcountbt entries against observed refcounts Darrick J. Wong
                     ` (7 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:00 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

At the end of phase 4, compute reference count information for realtime
groups from the realtime rmap information collected, just like we do for
AGs in the data section.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 repair/phase4.c |   21 ++++++++++++++++++++-
 repair/rmap.c   |   17 ++++++++++++-----
 repair/rmap.h   |    2 +-
 3 files changed, 33 insertions(+), 7 deletions(-)
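For readers following along, here is a minimal standalone sketch of how
reference counts can be derived from reverse-mapping observations.  It is
not the code in repair/rmap.c: the real compute_refcounts() walks an
in-memory rmap btree and tracks overlapping mappings in an rcbag, whereas
this sketch substitutes a sorted list of start/end events.  The names
sketch_compute_refcounts, struct extent, struct refc_rec and struct event
are hypothetical.  What it shares with the patch is the emit rule: a
record is produced whenever the number of overlapping mappings changes
and the old count was greater than one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified types; these are not the xfs_repair structures. */
struct extent   { uint32_t start; uint32_t len; };
struct refc_rec { uint32_t start; uint32_t len; uint32_t count; };
struct event    { uint64_t pos; int delta; };

static int cmp_event(const void *a, const void *b)
{
	const struct event *ea = a, *eb = b;

	if (ea->pos != eb->pos)
		return ea->pos < eb->pos ? -1 : 1;
	return 0;
}

/*
 * Emit one record per maximal run of blocks that is mapped by the same
 * number (> 1) of extents.  Returns the number of records written, or -1
 * if memory ran out or @out is too small.
 */
int sketch_compute_refcounts(const struct extent *maps, size_t nr,
		struct refc_rec *out, size_t max_out)
{
	struct event	*ev;
	size_t		i, nev = 0, nout = 0;
	uint64_t	prev = 0;
	int		height = 0;

	if (nr == 0)
		return 0;
	ev = malloc(2 * nr * sizeof(*ev));
	if (!ev)
		return -1;

	/* Each mapping contributes a +1 at its start and a -1 at its end. */
	for (i = 0; i < nr; i++) {
		assert(maps[i].len > 0);
		ev[nev++] = (struct event){ maps[i].start, +1 };
		ev[nev++] = (struct event){
			(uint64_t)maps[i].start + maps[i].len, -1 };
	}
	qsort(ev, nev, sizeof(*ev), cmp_event);

	i = 0;
	while (i < nev) {
		uint64_t	pos = ev[i].pos;
		int		old_height = height;

		/* Apply every event that lands on this block number. */
		while (i < nev && ev[i].pos == pos)
			height += ev[i++].delta;

		/* Same sharing level as before: keep extending the run. */
		if (height == old_height)
			continue;

		if (old_height > 1) {
			if (nout == max_out) {
				free(ev);
				return -1;
			}
			out[nout++] = (struct refc_rec){ (uint32_t)prev,
				(uint32_t)(pos - prev), (uint32_t)old_height };
		}
		prev = pos;
	}

	free(ev);
	return (int)nout;
}
```

With mappings (0, len 10), (5, len 10) and (8, len 4), the sketch yields
three records: (5, len 3, count 2), (8, len 2, count 3) and
(10, len 2, count 2).  Blocks with a single owner produce no record.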


diff --git a/repair/phase4.c b/repair/phase4.c
index 29efa58af33178..4cfad1a6911764 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -177,13 +177,28 @@ compute_ag_refcounts(
 {
 	int		error;
 
-	error = compute_refcounts(wq->wq_ctx, agno);
+	error = compute_refcounts(wq->wq_ctx, false, agno);
 	if (error)
 		do_error(
 _("%s while computing reference count records.\n"),
 			 strerror(error));
 }
 
+static void
+compute_rt_refcounts(
+	struct workqueue	*wq,
+	xfs_agnumber_t	rgno,
+	void		*arg)
+{
+	int		error;
+
+	error = compute_refcounts(wq->wq_ctx, true, rgno);
+	if (error)
+		do_error(
+_("%s while computing realtime reference count records.\n"),
+			 strerror(error));
+}
+
 static void
 process_inode_reflink_flags(
 	struct workqueue	*wq,
@@ -233,6 +248,10 @@ process_rmap_data(
 	create_work_queue(&wq, mp, platform_nproc());
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)
 		queue_work(&wq, compute_ag_refcounts, i, NULL);
+	if (xfs_has_rtreflink(mp)) {
+		for (i = 0; i < mp->m_sb.sb_rgcount; i++)
+			queue_work(&wq, compute_rt_refcounts, i, NULL);
+	}
 	destroy_work_queue(&wq);
 
 	create_work_queue(&wq, mp, platform_nproc());
diff --git a/repair/rmap.c b/repair/rmap.c
index e39c74cc7b44f7..64a9f06d0ee915 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -125,6 +125,11 @@ rmaps_init_rt(
 	if (error)
 		goto nomem;
 
+	error = init_slab(&ag_rmap->ar_refcount_items,
+			  sizeof(struct xfs_refcount_irec));
+	if (error)
+		goto nomem;
+
 	return;
 nomem:
 	do_error(
@@ -878,6 +883,7 @@ mark_reflink_inodes(
 static void
 refcount_emit(
 	struct xfs_mount	*mp,
+	bool			isrt,
 	xfs_agnumber_t		agno,
 	xfs_agblock_t		agbno,
 	xfs_extlen_t		len,
@@ -887,7 +893,7 @@ refcount_emit(
 	int			error;
 	struct xfs_slab		*rlslab;
 
-	rlslab = rmaps_for_group(false, agno)->ar_refcount_items;
+	rlslab = rmaps_for_group(isrt, agno)->ar_refcount_items;
 	ASSERT(nr_rmaps > 0);
 
 	dbg_printf("REFL: agno=%u pblk=%u, len=%u -> refcount=%zu\n",
@@ -1005,6 +1011,7 @@ refcount_push_rmaps_at(
 int
 compute_refcounts(
 	struct xfs_mount	*mp,
+	bool			isrt,
 	xfs_agnumber_t		agno)
 {
 	struct xfs_btree_cur	*rmcur;
@@ -1020,12 +1027,12 @@ compute_refcounts(
 
 	if (!xfs_has_reflink(mp))
 		return 0;
-	if (!rmaps_has_observations(rmaps_for_group(false, agno)))
+	if (!rmaps_has_observations(rmaps_for_group(isrt, agno)))
 		return 0;
 
-	nr_rmaps = rmap_record_count(mp, false, agno);
+	nr_rmaps = rmap_record_count(mp, isrt, agno);
 
-	error = rmap_init_mem_cursor(mp, NULL, false, agno, &rmcur);
+	error = rmap_init_mem_cursor(mp, NULL, isrt, agno, &rmcur);
 	if (error)
 		return error;
 
@@ -1082,7 +1089,7 @@ compute_refcounts(
 			ASSERT(nbno > cbno);
 			if (rcbag_count(rcstack) != old_stack_height) {
 				if (old_stack_height > 1) {
-					refcount_emit(mp, agno, cbno,
+					refcount_emit(mp, isrt, agno, cbno,
 							nbno - cbno,
 							old_stack_height);
 				}
diff --git a/repair/rmap.h b/repair/rmap.h
index c0984d97322861..98f2891692a6f8 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -38,7 +38,7 @@ extern int64_t rmap_diffkeys(struct xfs_rmap_irec *kp1,
 extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 		struct xfs_rmap_irec *key);
 
-extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t);
+int compute_refcounts(struct xfs_mount *mp, bool isrt, xfs_agnumber_t agno);
 uint64_t refcount_record_count(struct xfs_mount *mp, xfs_agnumber_t agno);
 extern int init_refcount_cursor(xfs_agnumber_t, struct xfs_slab_cursor **);
 extern void refcount_avoid_check(struct xfs_mount *mp);



* [PATCH 15/22] xfs_repair: check existing realtime refcountbt entries against observed refcounts
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (13 preceding siblings ...)
  2025-02-06 23:00   ` [PATCH 14/22] xfs_repair: compute refcount data for the realtime groups Darrick J. Wong
@ 2025-02-06 23:00   ` Darrick J. Wong
  2025-02-13  4:26     ` Christoph Hellwig
  2025-02-06 23:00   ` [PATCH 16/22] xfs_repair: reject unwritten shared extents Darrick J. Wong
                     ` (6 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:00 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Once we've finished collecting reverse mapping observations from the
metadata scan, check those observations against the realtime refcount
btree (particularly if we're in -n mode) to detect rtrefcountbt
problems.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    1 
 repair/agbtree.c         |    2 
 repair/phase4.c          |   13 +++
 repair/rmap.c            |  197 +++++++++++++++++++++++++++++++++++-----------
 repair/rmap.h            |    4 +
 5 files changed, 169 insertions(+), 48 deletions(-)
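As an illustration of the comparison this patch factors out into
check_refcount_records(), here is a minimal standalone sketch.  It
assumes the on-disk records are available as a sorted array and uses a
binary search in place of the btree lookup; the real code drives a
libxfs btree cursor and prints warnings instead of counting.  The names
sketch_lookup_le, sketch_check_refcounts and struct refc_rec are
hypothetical.  As in the patch, only the observed records are walked, so
extra records that exist only on disk are not reported by this loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified record; not struct xfs_refcount_irec. */
struct refc_rec { uint32_t start; uint32_t len; uint32_t count; };

/* Return the last on-disk record whose start is <= @target, or NULL. */
const struct refc_rec *sketch_lookup_le(const struct refc_rec *disk,
		size_t nr, uint32_t target)
{
	size_t	lo = 0, hi = nr;

	/* Invariant: disk[0..lo) start <= target, disk[hi..nr) start > target. */
	while (lo < hi) {
		size_t	mid = lo + (hi - lo) / 2;

		if (disk[mid].start <= target)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? &disk[lo - 1] : NULL;
}

/*
 * Count the observed records that are either missing from the on-disk
 * set or present with a different start, length, or reference count.
 */
size_t sketch_check_refcounts(const struct refc_rec *observed, size_t nr_obs,
		const struct refc_rec *disk, size_t nr_disk)
{
	size_t	bad = 0, i;

	assert(observed != NULL || nr_obs == 0);

	for (i = 0; i < nr_obs; i++) {
		const struct refc_rec	*d;

		d = sketch_lookup_le(disk, nr_disk, observed[i].start);
		if (!d) {
			bad++;		/* "Missing reference count record" */
			continue;
		}
		if (d->start != observed[i].start ||
		    d->len != observed[i].len ||
		    d->count != observed[i].count)
			bad++;		/* "Incorrect reference count" */
	}
	return bad;
}
```

A return value of zero corresponds to the case where repair has nothing
to complain about for that group; any other value would correspond to
one or more of the warnings above.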


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 0dc6f0350dd0f6..b5a39856bc1e80 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -322,6 +322,7 @@
 
 #define xfs_rtrefcountbt_calc_reserves	libxfs_rtrefcountbt_calc_reserves
 #define xfs_rtrefcountbt_droot_maxrecs	libxfs_rtrefcountbt_droot_maxrecs
+#define xfs_rtrefcountbt_init_cursor	libxfs_rtrefcountbt_init_cursor
 #define xfs_rtrefcountbt_maxlevels_ondisk	libxfs_rtrefcountbt_maxlevels_ondisk
 #define xfs_rtrefcountbt_maxrecs	libxfs_rtrefcountbt_maxrecs
 
diff --git a/repair/agbtree.c b/repair/agbtree.c
index 2135147fcf9354..01066130767cb6 100644
--- a/repair/agbtree.c
+++ b/repair/agbtree.c
@@ -747,7 +747,7 @@ build_refcount_tree(
 {
 	int			error;
 
-	error = init_refcount_cursor(agno, &btr->slab_cursor);
+	error = init_refcount_cursor(false, agno, &btr->slab_cursor);
 	if (error)
 		do_error(
 _("Insufficient memory to construct refcount cursor.\n"));
diff --git a/repair/phase4.c b/repair/phase4.c
index 4cfad1a6911764..b752b4c871ea83 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -223,6 +223,15 @@ check_refcount_btrees(
 	check_refcounts(wq->wq_ctx, agno);
 }
 
+static void
+check_rt_refcount_btrees(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	check_rtrefcounts(wq->wq_ctx, agno);
+}
+
 static void
 process_rmap_data(
 	struct xfs_mount	*mp)
@@ -259,6 +268,10 @@ process_rmap_data(
 		queue_work(&wq, process_inode_reflink_flags, i, NULL);
 		queue_work(&wq, check_refcount_btrees, i, NULL);
 	}
+	if (xfs_has_rtreflink(mp)) {
+		for (i = 0; i < mp->m_sb.sb_rgcount; i++)
+			queue_work(&wq, check_rt_refcount_btrees, i, NULL);
+	}
 	destroy_work_queue(&wq);
 }
 
diff --git a/repair/rmap.c b/repair/rmap.c
index 64a9f06d0ee915..638e5ea92278cb 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -1694,10 +1694,11 @@ refcount_record_count(
  */
 int
 init_refcount_cursor(
+	bool			isrt,
 	xfs_agnumber_t		agno,
 	struct xfs_slab_cursor	**cur)
 {
-	struct xfs_ag_rmap	*x = rmaps_for_group(false, agno);
+	struct xfs_ag_rmap	*x = rmaps_for_group(isrt, agno);
 
 	return init_slab_cursor(x->ar_refcount_items, NULL, cur);
 }
@@ -1714,56 +1715,18 @@ refcount_avoid_check(
 	refcbt_suspect = true;
 }
 
-/*
- * Compare the observed reference counts against what's in the ag btree.
- */
-void
-check_refcounts(
-	struct xfs_mount		*mp,
+static int
+check_refcount_records(
+	struct xfs_slab_cursor		*rl_cur,
+	struct xfs_btree_cur		*bt_cur,
 	xfs_agnumber_t			agno)
 {
 	struct xfs_refcount_irec	tmp;
-	struct xfs_slab_cursor		*rl_cur;
-	struct xfs_btree_cur		*bt_cur = NULL;
-	struct xfs_buf			*agbp = NULL;
-	struct xfs_perag		*pag = NULL;
 	struct xfs_refcount_irec	*rl_rec;
-	int				have;
 	int				i;
+	int				have;
 	int				error;
 
-	if (!xfs_has_reflink(mp))
-		return;
-	if (refcbt_suspect) {
-		if (no_modify && agno == 0)
-			do_warn(_("would rebuild corrupt refcount btrees.\n"));
-		return;
-	}
-
-	/* Create cursors to refcount structures */
-	error = init_refcount_cursor(agno, &rl_cur);
-	if (error) {
-		do_warn(_("Not enough memory to check refcount data.\n"));
-		return;
-	}
-
-	pag = libxfs_perag_get(mp, agno);
-	error = -libxfs_alloc_read_agf(pag, NULL, 0, &agbp);
-	if (error) {
-		do_warn(_("Could not read AGF %u to check refcount btree.\n"),
-				agno);
-		goto err_pag;
-	}
-
-	/* Leave the per-ag data "uninitialized" since we rewrite it later */
-	clear_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
-
-	bt_cur = libxfs_refcountbt_init_cursor(mp, NULL, agbp, pag);
-	if (!bt_cur) {
-		do_warn(_("Not enough memory to check refcount data.\n"));
-		goto err_agf;
-	}
-
 	rl_rec = pop_slab_cursor(rl_cur);
 	while (rl_rec) {
 		/* Look for a refcount record in the btree */
@@ -1774,7 +1737,7 @@ check_refcounts(
 			do_warn(
 _("Could not read reference count record for (%u/%u).\n"),
 					agno, rl_rec->rc_startblock);
-			goto err_cur;
+			return error;
 		}
 		if (!have) {
 			do_warn(
@@ -1789,7 +1752,7 @@ _("Missing reference count record for (%u/%u) len %u count %u\n"),
 			do_warn(
 _("Could not read reference count record for (%u/%u).\n"),
 					agno, rl_rec->rc_startblock);
-			goto err_cur;
+			return error;
 		}
 		if (!i) {
 			do_warn(
@@ -1819,6 +1782,59 @@ _("Incorrect reference count: saw (%u/%u) len %u nlinks %u; should be (%u/%u) le
 		rl_rec = pop_slab_cursor(rl_cur);
 	}
 
+	return 0;
+}
+
+/*
+ * Compare the observed reference counts against what's in the ag btree.
+ */
+void
+check_refcounts(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t			agno)
+{
+	struct xfs_slab_cursor		*rl_cur;
+	struct xfs_btree_cur		*bt_cur = NULL;
+	struct xfs_buf			*agbp = NULL;
+	struct xfs_perag		*pag = NULL;
+	int				error;
+
+	if (!xfs_has_reflink(mp))
+		return;
+	if (refcbt_suspect) {
+		if (no_modify && agno == 0)
+			do_warn(_("would rebuild corrupt refcount btrees.\n"));
+		return;
+	}
+
+	/* Create cursors to refcount structures */
+	error = init_refcount_cursor(false, agno, &rl_cur);
+	if (error) {
+		do_warn(_("Not enough memory to check refcount data.\n"));
+		return;
+	}
+
+	pag = libxfs_perag_get(mp, agno);
+	error = -libxfs_alloc_read_agf(pag, NULL, 0, &agbp);
+	if (error) {
+		do_warn(_("Could not read AGF %u to check refcount btree.\n"),
+				agno);
+		goto err_pag;
+	}
+
+	/* Leave the per-ag data "uninitialized" since we rewrite it later */
+	clear_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
+
+	bt_cur = libxfs_refcountbt_init_cursor(mp, NULL, agbp, pag);
+	if (!bt_cur) {
+		do_warn(_("Not enough memory to check refcount data.\n"));
+		goto err_agf;
+	}
+
+	error = check_refcount_records(rl_cur, bt_cur, agno);
+	if (error)
+		goto err_cur;
+
 err_cur:
 	libxfs_btree_del_cursor(bt_cur, error);
 err_agf:
@@ -1828,6 +1844,95 @@ _("Incorrect reference count: saw (%u/%u) len %u nlinks %u; should be (%u/%u) le
 	free_slab_cursor(&rl_cur);
 }
 
+/*
+ * Compare the observed reference counts against what's in the ondisk btree.
+ */
+void
+check_rtrefcounts(
+	struct xfs_mount		*mp,
+	xfs_rgnumber_t			rgno)
+{
+	struct xfs_slab_cursor		*rl_cur;
+	struct xfs_btree_cur		*bt_cur = NULL;
+	struct xfs_rtgroup		*rtg = NULL;
+	struct xfs_inode		*ip = NULL;
+	int				error;
+
+	if (!xfs_has_reflink(mp))
+		return;
+	if (refcbt_suspect) {
+		if (no_modify && rgno == 0)
+			do_warn(_("would rebuild corrupt refcount btrees.\n"));
+		return;
+	}
+	if (mp->m_sb.sb_rblocks == 0) {
+		if (rmap_record_count(mp, true, rgno) != 0)
+			do_error(_("realtime refcounts but no rtdev?\n"));
+		return;
+	}
+
+	/* Create cursors to refcount structures */
+	error = init_refcount_cursor(true, rgno, &rl_cur);
+	if (error) {
+		do_warn(_("Not enough memory to check refcount data.\n"));
+		return;
+	}
+
+	rtg = libxfs_rtgroup_get(mp, rgno);
+	if (!rtg) {
+		do_warn(_("Could not load rtgroup %u.\n"), rgno);
+		goto err_rcur;
+	}
+
+	ip = rtg_refcount(rtg);
+	if (!ip) {
+		do_warn(_("Could not find rtgroup %u refcount inode.\n"),
+			rgno);
+		goto err_rtg;
+	}
+
+	if (ip->i_df.if_format != XFS_DINODE_FMT_META_BTREE) {
+		do_warn(
+_("rtgroup %u refcount inode has wrong format 0x%x, expected 0x%x\n"),
+				rgno, ip->i_df.if_format,
+				XFS_DINODE_FMT_META_BTREE);
+		goto err_rtg;
+	}
+
+	if (ip->i_metatype != XFS_METAFILE_RTREFCOUNT) {
+		do_warn(
+_("rtgroup %u refcount inode has wrong metatype 0x%x, expected 0x%x\n"),
+				rgno, ip->i_metatype,
+				XFS_METAFILE_RTREFCOUNT);
+		goto err_rtg;
+	}
+
+	if (xfs_inode_has_attr_fork(ip) &&
+	    !(xfs_has_metadir(mp) && xfs_has_parent(mp))) {
+		do_warn(
+_("rtgroup %u refcount inode should not have extended attributes\n"),
+				rgno);
+		goto err_rtg;
+	}
+
+	bt_cur = libxfs_rtrefcountbt_init_cursor(NULL, rtg);
+	if (!bt_cur) {
+		do_warn(_("Not enough memory to check refcount data.\n"));
+		goto err_rtg;
+	}
+
+	error = check_refcount_records(rl_cur, bt_cur, rgno);
+	if (error)
+		goto err_cur;
+
+err_cur:
+	libxfs_btree_del_cursor(bt_cur, error);
+err_rtg:
+	libxfs_rtgroup_put(rtg);
+err_rcur:
+	free_slab_cursor(&rl_cur);
+}
+
 /*
  * Regenerate the AGFL so that we don't run out of it while rebuilding the
  * rmap btree.  If skip_rmapbt is true, don't update the rmapbt (most probably
diff --git a/repair/rmap.h b/repair/rmap.h
index 98f2891692a6f8..80e82a4ac4c008 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -40,9 +40,11 @@ extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 
 int compute_refcounts(struct xfs_mount *mp, bool isrt, xfs_agnumber_t agno);
 uint64_t refcount_record_count(struct xfs_mount *mp, xfs_agnumber_t agno);
-extern int init_refcount_cursor(xfs_agnumber_t, struct xfs_slab_cursor **);
+int init_refcount_cursor(bool isrt, xfs_agnumber_t agno,
+		struct xfs_slab_cursor **pcur);
 extern void refcount_avoid_check(struct xfs_mount *mp);
 void check_refcounts(struct xfs_mount *mp, xfs_agnumber_t agno);
+void check_rtrefcounts(struct xfs_mount *mp, xfs_rgnumber_t rgno);
 
 extern void record_inode_reflink_flag(struct xfs_mount *, struct xfs_dinode *,
 	xfs_agnumber_t, xfs_agino_t, xfs_ino_t);



* [PATCH 16/22] xfs_repair: reject unwritten shared extents
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (14 preceding siblings ...)
  2025-02-06 23:00   ` [PATCH 15/22] xfs_repair: check existing realtime refcountbt entries against observed refcounts Darrick J. Wong
@ 2025-02-06 23:00   ` Darrick J. Wong
  2025-02-13  4:26     ` Christoph Hellwig
  2025-02-06 23:01   ` [PATCH 17/22] xfs_repair: rebuild the realtime refcount btree Darrick J. Wong
                     ` (5 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:00 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

We don't allow sharing of unwritten extents, which means that repair
should reject an unwritten extent if someone else has already claimed
the space.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 repair/dinode.c |   36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)
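The rule described above can be reduced to a small decision function.
This is only a sketch of the logic applied when an extent lands on space
that is already marked in use (XR_E_INUSE or XR_E_MULT); the enum and
function names are hypothetical, and the real code issues do_warn()
messages and returns or jumps to cleanup rather than returning a verdict.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for XFS_EXT_NORM and XFS_EXT_UNWRITTEN. */
enum sketch_ext_state {
	SKETCH_EXT_NORM,
	SKETCH_EXT_UNWRITTEN,
};

enum sketch_verdict {
	SKETCH_OK_SHARED,		/* legitimate reflinked extent */
	SKETCH_BAD_SHARED_UNWRITTEN,	/* "claims shared unwritten ..." */
	SKETCH_BAD_USED,		/* "claims used ..." */
};

/*
 * Judge an extent whose blocks have already been claimed by another
 * owner.  @reflink_enabled stands for xfs_has_reflink() on data files or
 * xfs_has_rtreflink() on realtime files.
 */
enum sketch_verdict sketch_judge_claimed_extent(bool reflink_enabled,
		enum sketch_ext_state state)
{
	/* Without reflink, nothing may be claimed twice. */
	if (!reflink_enabled)
		return SKETCH_BAD_USED;

	/* Written extents may be shared. */
	if (state == SKETCH_EXT_NORM)
		return SKETCH_OK_SHARED;

	/* Unwritten extents are never shared, so this claim is bogus. */
	return SKETCH_BAD_SHARED_UNWRITTEN;
}
```

Before this patch, the middle check was absent, so an unwritten extent
on already-claimed space was accepted whenever reflink was enabled.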


diff --git a/repair/dinode.c b/repair/dinode.c
index 3260df94511ed2..f49c735d34356b 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -311,7 +311,8 @@ _("bad state in rt extent map %" PRIu64 "\n"),
 			break;
 		case XR_E_INUSE:
 		case XR_E_MULT:
-			if (xfs_has_rtreflink(mp))
+			if (xfs_has_rtreflink(mp) &&
+			    irec->br_state == XFS_EXT_NORM)
 				break;
 			set_rtbmap(ext, XR_E_MULT);
 			break;
@@ -387,8 +388,14 @@ _("data fork in rt inode %" PRIu64 " found rt metadata extent %" PRIu64 " in rt
 			return 1;
 		case XR_E_INUSE:
 		case XR_E_MULT:
-			if (xfs_has_rtreflink(mp))
-				break;
+			if (xfs_has_rtreflink(mp)) {
+				if (irec->br_state == XFS_EXT_NORM)
+					break;
+				do_warn(
+_("data fork in rt inode %" PRIu64 " claims shared unwritten rt extent %" PRIu64 "\n"),
+					ino, b);
+				return 1;
+			}
 			do_warn(
 _("data fork in rt inode %" PRIu64 " claims used rt extent %" PRIu64 "\n"),
 				ino, b);
@@ -472,6 +479,18 @@ _("inode %" PRIu64 " - bad rt extent overflows - start %" PRIu64 ", "
 	return bad;
 }
 
+static inline bool
+is_reflink_type(
+	struct xfs_mount	*mp,
+	int			type)
+{
+	if (type == XR_INO_DATA && xfs_has_reflink(mp))
+		return true;
+	if (type == XR_INO_RTDATA && xfs_has_rtreflink(mp))
+		return true;
+	return false;
+}
+
 /*
  * return 1 if inode should be cleared, 0 otherwise
  * if check_dups should be set to 1, that implies that
@@ -717,9 +736,14 @@ _("%s fork in inode %" PRIu64 " claims metadata block %" PRIu64 "\n"),
 
 			case XR_E_INUSE:
 			case XR_E_MULT:
-				if (type == XR_INO_DATA &&
-				    xfs_has_reflink(mp))
-					break;
+				if (is_reflink_type(mp, type)) {
+					if (irec.br_state == XFS_EXT_NORM)
+						break;
+					do_warn(
+_("%s fork in %s inode %" PRIu64 " claims shared unwritten block %" PRIu64 "\n"),
+						forkname, ftype, ino, b);
+					goto done;
+				}
 				do_warn(
 _("%s fork in %s inode %" PRIu64 " claims used block %" PRIu64 "\n"),
 					forkname, ftype, ino, b);



* [PATCH 17/22] xfs_repair: rebuild the realtime refcount btree
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (15 preceding siblings ...)
  2025-02-06 23:00   ` [PATCH 16/22] xfs_repair: reject unwritten shared extents Darrick J. Wong
@ 2025-02-06 23:01   ` Darrick J. Wong
  2025-02-13  4:27     ` Christoph Hellwig
  2025-02-06 23:01   ` [PATCH 18/22] xfs_repair: allow realtime files to have the reflink flag set Darrick J. Wong
                     ` (4 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:01 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Use the collected reference count information to rebuild the btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h   |    5 +
 repair/Makefile            |    1 
 repair/agbtree.c           |    2 
 repair/phase5.c            |    6 +
 repair/phase6.c            |   14 ++
 repair/rmap.c              |   22 ++++
 repair/rmap.h              |    7 +
 repair/rtrefcount_repair.c |  257 ++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 308 insertions(+), 6 deletions(-)
 create mode 100644 repair/rtrefcount_repair.c


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index b5a39856bc1e80..66cbb34f05a48f 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -320,11 +320,16 @@
 #define xfs_rtgroup_get			libxfs_rtgroup_get
 #define xfs_rtgroup_put			libxfs_rtgroup_put
 
+#define xfs_rtrefcountbt_absolute_maxlevels	libxfs_rtrefcountbt_absolute_maxlevels
 #define xfs_rtrefcountbt_calc_reserves	libxfs_rtrefcountbt_calc_reserves
+#define xfs_rtrefcountbt_calc_size		libxfs_rtrefcountbt_calc_size
+#define xfs_rtrefcountbt_commit_staged_btree	libxfs_rtrefcountbt_commit_staged_btree
+#define xfs_rtrefcountbt_create		libxfs_rtrefcountbt_create
 #define xfs_rtrefcountbt_droot_maxrecs	libxfs_rtrefcountbt_droot_maxrecs
 #define xfs_rtrefcountbt_init_cursor	libxfs_rtrefcountbt_init_cursor
 #define xfs_rtrefcountbt_maxlevels_ondisk	libxfs_rtrefcountbt_maxlevels_ondisk
 #define xfs_rtrefcountbt_maxrecs	libxfs_rtrefcountbt_maxrecs
+#define xfs_rtrefcountbt_stage_cursor	libxfs_rtrefcountbt_stage_cursor
 
 #define xfs_rtrmapbt_calc_reserves	libxfs_rtrmapbt_calc_reserves
 #define xfs_rtrmapbt_calc_size		libxfs_rtrmapbt_calc_size
diff --git a/repair/Makefile b/repair/Makefile
index 6f4ec3b3a9c4dc..ff5b1f5abedac0 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -73,6 +73,7 @@ CFILES = \
 	rcbag.c \
 	rmap.c \
 	rt.c \
+	rtrefcount_repair.c \
 	rtrmap_repair.c \
 	sb.c \
 	scan.c \
diff --git a/repair/agbtree.c b/repair/agbtree.c
index 01066130767cb6..983b645e1a35a3 100644
--- a/repair/agbtree.c
+++ b/repair/agbtree.c
@@ -730,7 +730,7 @@ init_refc_cursor(
 
 	/* Compute how many blocks we'll need. */
 	error = -libxfs_btree_bload_compute_geometry(btr->cur, &btr->bload,
-			refcount_record_count(sc->mp, agno));
+			refcount_record_count(sc->mp, false, agno));
 	if (error)
 		do_error(
 _("Unable to compute refcount btree geometry, error %d.\n"), error);
diff --git a/repair/phase5.c b/repair/phase5.c
index cacaf74dda3a60..4cf28d8ae1a250 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -705,7 +705,7 @@ are_packed_btrees_needed(
 	 * If we don't have inode-based metadata, we can let the AG btrees
 	 * pack as needed; there are no global space concerns here.
 	 */
-	if (!xfs_has_rtrmapbt(mp))
+	if (!xfs_has_rtrmapbt(mp) && !xfs_has_rtreflink(mp))
 		return false;
 
 	while ((pag = xfs_perag_next(mp, pag))) {
@@ -716,8 +716,10 @@ are_packed_btrees_needed(
 		fdblocks += ag_fdblocks;
 	}
 
-	while ((rtg = xfs_rtgroup_next(mp, rtg)))
+	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
 		metadata_blocks += estimate_rtrmapbt_blocks(rtg);
+		metadata_blocks += estimate_rtrefcountbt_blocks(rtg);
+	}
 
 	/*
 	 * If we think we'll have more metadata blocks than free space, then
diff --git a/repair/phase6.c b/repair/phase6.c
index 30ea19fda9fd87..4064a84b24509f 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -696,6 +696,15 @@ ensure_rtgroup_rmapbt(
 		populate_rtgroup_rmapbt(rtg, est_fdblocks);
 }
 
+static void
+ensure_rtgroup_refcountbt(
+	struct xfs_rtgroup	*rtg,
+	xfs_filblks_t		est_fdblocks)
+{
+	if (ensure_rtgroup_file(rtg, XFS_RTGI_REFCOUNT))
+		populate_rtgroup_refcountbt(rtg, est_fdblocks);
+}
+
 /* Initialize a root directory. */
 static int
 init_fs_root_dir(
@@ -3405,8 +3414,10 @@ reset_rt_metadir_inodes(
 	 * maximally.
 	 */
 	if (!need_packed_btrees) {
-		while ((rtg = xfs_rtgroup_next(mp, rtg)))
+		while ((rtg = xfs_rtgroup_next(mp, rtg))) {
 			metadata_blocks += estimate_rtrmapbt_blocks(rtg);
+			metadata_blocks += estimate_rtrefcountbt_blocks(rtg);
+		}
 
 		if (mp->m_sb.sb_fdblocks > metadata_blocks)
 			est_fdblocks = mp->m_sb.sb_fdblocks - metadata_blocks;
@@ -3427,6 +3438,7 @@ _("        - resetting contents of realtime bitmap and summary inodes\n"));
 		ensure_rtgroup_bitmap(rtg);
 		ensure_rtgroup_summary(rtg);
 		ensure_rtgroup_rmapbt(rtg, est_fdblocks);
+		ensure_rtgroup_refcountbt(rtg, est_fdblocks);
 	}
 }
 
diff --git a/repair/rmap.c b/repair/rmap.c
index 638e5ea92278cb..97510dd875911a 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -1682,9 +1682,10 @@ _("Unable to fix reflink flag on inode %"PRIu64".\n"),
 uint64_t
 refcount_record_count(
 	struct xfs_mount	*mp,
+	bool			isrt,
 	xfs_agnumber_t		agno)
 {
-	struct xfs_ag_rmap	*x = rmaps_for_group(false, agno);
+	struct xfs_ag_rmap	*x = rmaps_for_group(isrt, agno);
 
 	return slab_count(x->ar_refcount_items);
 }
@@ -2081,3 +2082,22 @@ estimate_rtrmapbt_blocks(
 	nr_recs = xmbuf_bytes(x->ar_xmbtp) / sizeof(struct xfs_rmap_rec);
 	return libxfs_rtrmapbt_calc_size(mp, nr_recs);
 }
+
+/* Estimate the size of the ondisk rtrefcountbt from the incore data. */
+xfs_filblks_t
+estimate_rtrefcountbt_blocks(
+	struct xfs_rtgroup	*rtg)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	struct xfs_ag_rmap	*x;
+
+	if (!rmap_needs_work(mp) || !xfs_has_rtreflink(mp))
+		return 0;
+
+	x = &rg_rmaps[rtg_rgno(rtg)];
+	if (!x->ar_refcount_items)
+		return 0;
+
+	return libxfs_rtrefcountbt_calc_size(mp,
+			slab_count(x->ar_refcount_items));
+}
diff --git a/repair/rmap.h b/repair/rmap.h
index 80e82a4ac4c008..1f234b8be32e72 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -39,7 +39,8 @@ extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec,
 		struct xfs_rmap_irec *key);
 
 int compute_refcounts(struct xfs_mount *mp, bool isrt, xfs_agnumber_t agno);
-uint64_t refcount_record_count(struct xfs_mount *mp, xfs_agnumber_t agno);
+uint64_t refcount_record_count(struct xfs_mount *mp, bool isrt,
+		xfs_agnumber_t agno);
 int init_refcount_cursor(bool isrt, xfs_agnumber_t agno,
 		struct xfs_slab_cursor **pcur);
 extern void refcount_avoid_check(struct xfs_mount *mp);
@@ -64,4 +65,8 @@ void populate_rtgroup_rmapbt(struct xfs_rtgroup *rtg,
 		xfs_filblks_t est_fdblocks);
 xfs_filblks_t estimate_rtrmapbt_blocks(struct xfs_rtgroup *rtg);
 
+int populate_rtgroup_refcountbt(struct xfs_rtgroup *rtg,
+		xfs_filblks_t est_fdblocks);
+xfs_filblks_t estimate_rtrefcountbt_blocks(struct xfs_rtgroup *rtg);
+
 #endif /* RMAP_H_ */
diff --git a/repair/rtrefcount_repair.c b/repair/rtrefcount_repair.c
new file mode 100644
index 00000000000000..228d080e5a5dcb
--- /dev/null
+++ b/repair/rtrefcount_repair.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2021-2025 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include <libxfs.h>
+#include "btree.h"
+#include "err_protos.h"
+#include "libxlog.h"
+#include "incore.h"
+#include "globals.h"
+#include "dinode.h"
+#include "slab.h"
+#include "rmap.h"
+#include "bulkload.h"
+
+/*
+ * Realtime Reference Count (RTREFCBT) Repair
+ * ==========================================
+ *
+ * Gather all the reference count records for the realtime device, reset the
+ * incore fork, then recreate the btree.
+ */
+struct xrep_rtrefc {
+	/* rtrefcbt slab cursor */
+	struct xfs_slab_cursor	*slab_cursor;
+
+	/* New fork. */
+	struct bulkload		new_fork_info;
+	struct xfs_btree_bload	rtrefc_bload;
+
+	struct repair_ctx	*sc;
+	struct xfs_rtgroup	*rtg;
+
+	/* Estimated free space after building all rt btrees */
+	xfs_filblks_t		est_fdblocks;
+};
+
+/* Retrieve rtrefc data for bulk load. */
+STATIC int
+xrep_rtrefc_get_records(
+	struct xfs_btree_cur		*cur,
+	unsigned int			idx,
+	struct xfs_btree_block		*block,
+	unsigned int			nr_wanted,
+	void				*priv)
+{
+	struct xfs_refcount_irec	*rec;
+	struct xrep_rtrefc		*rc = priv;
+	union xfs_btree_rec		*block_rec;
+	unsigned int			loaded;
+
+	for (loaded = 0; loaded < nr_wanted; loaded++, idx++) {
+		rec = pop_slab_cursor(rc->slab_cursor);
+		memcpy(&cur->bc_rec.rc, rec, sizeof(struct xfs_refcount_irec));
+
+		block_rec = libxfs_btree_rec_addr(cur, idx, block);
+		cur->bc_ops->init_rec_from_cur(cur, block_rec);
+	}
+
+	return loaded;
+}
+
+/* Feed one of the new btree blocks to the bulk loader. */
+STATIC int
+xrep_rtrefc_claim_block(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr,
+	void			*priv)
+{
+	struct xrep_rtrefc	*rr = priv;
+
+	return bulkload_claim_block(cur, &rr->new_fork_info, ptr);
+}
+
+/* Figure out how much space we need to create the incore btree root block. */
+STATIC size_t
+xrep_rtrefc_iroot_size(
+	struct xfs_btree_cur	*cur,
+	unsigned int		level,
+	unsigned int		nr_this_level,
+	void			*priv)
+{
+	return xfs_rtrefcount_broot_space_calc(cur->bc_mp, level,
+			nr_this_level);
+}
+
+/* Reserve new btree blocks and bulk load all the rtrefcount records. */
+STATIC int
+xrep_rtrefc_btree_load(
+	struct xrep_rtrefc	*rr,
+	struct xfs_btree_cur	*rtrmap_cur)
+{
+	struct repair_ctx	*sc = rr->sc;
+	int			error;
+
+	rr->rtrefc_bload.get_records = xrep_rtrefc_get_records;
+	rr->rtrefc_bload.claim_block = xrep_rtrefc_claim_block;
+	rr->rtrefc_bload.iroot_size = xrep_rtrefc_iroot_size;
+	bulkload_estimate_inode_slack(sc->mp, &rr->rtrefc_bload,
+			rr->est_fdblocks);
+
+	/* Compute how many blocks we'll need. */
+	error = -libxfs_btree_bload_compute_geometry(rtrmap_cur,
+			&rr->rtrefc_bload,
+			refcount_record_count(sc->mp, true, rtg_rgno(rr->rtg)));
+	if (error)
+		return error;
+
+	/*
+	 * Guess how many blocks we're going to need to rebuild an entire
+	 * rtrefcountbt from the number of extents we found, and pump up our
+	 * transaction to have sufficient block reservation.
+	 */
+	error = -libxfs_trans_reserve_more(sc->tp, rr->rtrefc_bload.nr_blocks,
+			0);
+	if (error)
+		return error;
+
+	/*
+	 * Reserve the space we'll need for the new btree.  Drop the cursor
+	 * while we do this because that can roll the transaction and cursors
+	 * can't handle that.
+	 */
+	error = bulkload_alloc_file_blocks(&rr->new_fork_info,
+			rr->rtrefc_bload.nr_blocks);
+	if (error)
+		return error;
+
+	/* Add all observed rtrefcount records. */
+	error = init_refcount_cursor(true, rtg_rgno(rr->rtg), &rr->slab_cursor);
+	if (error)
+		return error;
+	error = -libxfs_btree_bload(rtrmap_cur, &rr->rtrefc_bload, rr);
+	free_slab_cursor(&rr->slab_cursor);
+	return error;
+}
+
+/* Update the inode counters. */
+STATIC int
+xrep_rtrefc_reset_counters(
+	struct xrep_rtrefc	*rr)
+{
+	struct repair_ctx	*sc = rr->sc;
+
+	/*
+	 * Update the inode block counts to reflect the btree we just
+	 * generated.
+	 */
+	sc->ip->i_nblocks = rr->new_fork_info.ifake.if_blocks;
+	libxfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
+
+	/* Quotas don't exist so we're done. */
+	return 0;
+}
+
+/*
+ * Use the collected rmap information to stage a new rt refcount btree.  If
+ * this is successful we'll return with the new btree root information logged
+ * to the repair transaction but not yet committed.
+ */
+static int
+xrep_rtrefc_build_new_tree(
+	struct xrep_rtrefc	*rr)
+{
+	struct xfs_owner_info	oinfo;
+	struct xfs_btree_cur	*cur;
+	struct repair_ctx	*sc = rr->sc;
+	struct xbtree_ifakeroot	*ifake = &rr->new_fork_info.ifake;
+	int			error;
+
+	/*
+	 * Prepare to construct the new fork by initializing the new btree
+	 * structure and creating a fake ifork in the ifakeroot structure.
+	 */
+	libxfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, XFS_DATA_FORK);
+	bulkload_init_inode(&rr->new_fork_info, sc, XFS_DATA_FORK, &oinfo);
+	cur = libxfs_rtrefcountbt_init_cursor(NULL, rr->rtg);
+	libxfs_btree_stage_ifakeroot(cur, ifake);
+
+	/*
+	 * Figure out the size and format of the new fork, then fill it with
+	 * all the refcount records we've found.  Join the inode to the
+	 * transaction so that we can roll the transaction while holding the
+	 * inode locked.
+	 */
+	libxfs_trans_ijoin(sc->tp, sc->ip, 0);
+	ifake->if_fork->if_format = XFS_DINODE_FMT_META_BTREE;
+	error = xrep_rtrefc_btree_load(rr, cur);
+	if (error)
+		goto err_cur;
+
+	/*
+	 * Install the new fork in the inode.  After this point the old mapping
+	 * data are no longer accessible and the new tree is live.  We delete
+	 * the cursor immediately after committing the staged root because the
+	 * staged fork might be in extents format.
+	 */
+	libxfs_rtrefcountbt_commit_staged_btree(cur, sc->tp);
+	libxfs_btree_del_cursor(cur, 0);
+
+	/* Reset the inode counters now that we've changed the fork. */
+	error = xrep_rtrefc_reset_counters(rr);
+	if (error)
+		goto err_newbt;
+
+	/* Dispose of any unused blocks and the accounting information. */
+	error = bulkload_commit(&rr->new_fork_info);
+	if (error)
+		return error;
+
+	return -libxfs_trans_roll_inode(&sc->tp, sc->ip);
+err_cur:
+	if (cur)
+		libxfs_btree_del_cursor(cur, error);
+err_newbt:
+	bulkload_cancel(&rr->new_fork_info);
+	return error;
+}
+
+/* Store the realtime reference counts in the rtrefcbt. */
+int
+populate_rtgroup_refcountbt(
+	struct xfs_rtgroup	*rtg,
+	xfs_filblks_t		est_fdblocks)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	struct xfs_inode	*ip = rtg_refcount(rtg);
+	struct repair_ctx	sc = {
+		.mp		= mp,
+		.ip		= ip,
+	};
+	struct xrep_rtrefc	rr = {
+		.sc		= &sc,
+		.rtg		= rtg,
+		.est_fdblocks	= est_fdblocks,
+	};
+	int			error;
+
+	if (!xfs_has_rtreflink(mp))
+		return 0;
+
+	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0,
+			&sc.tp);
+	if (error)
+		return error;
+
+	error = xrep_rtrefc_build_new_tree(&rr);
+	if (error)
+		goto out_cancel;
+
+	return -libxfs_trans_commit(sc.tp);
+
+out_cancel:
+	libxfs_trans_cancel(sc.tp);
+	return error;
+}



* [PATCH 18/22] xfs_repair: allow realtime files to have the reflink flag set
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (16 preceding siblings ...)
  2025-02-06 23:01   ` [PATCH 17/22] xfs_repair: rebuild the realtime refcount btree Darrick J. Wong
@ 2025-02-06 23:01   ` Darrick J. Wong
  2025-02-13  4:27     ` Christoph Hellwig
  2025-02-06 23:01   ` [PATCH 19/22] xfs_repair: validate CoW extent size hint on rtinherit directories Darrick J. Wong
                     ` (3 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:01 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that we allow reflink on the realtime volume, allow that combination
of inode flags if the feature's enabled.  Note that we now allow inodes
to have rtinherit even if there's no realtime volume, since the kernel
has never restricted that.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 repair/dinode.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


diff --git a/repair/dinode.c b/repair/dinode.c
index f49c735d34356b..5a1c8e8cb3ec11 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -3356,7 +3356,8 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 		}
 
 		if ((flags2 & XFS_DIFLAG2_REFLINK) &&
-		    (flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT))) {
+		    !xfs_has_rtreflink(mp) &&
+		    (flags & XFS_DIFLAG_REALTIME)) {
 			if (!uncertain) {
 				do_warn(
 	_("Cannot have a reflinked realtime inode %" PRIu64 "\n"),
@@ -3388,7 +3389,8 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 		}
 
 		if ((flags2 & XFS_DIFLAG2_COWEXTSIZE) &&
-		    (flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT))) {
+		    !xfs_has_rtreflink(mp) &&
+		    (flags & XFS_DIFLAG_REALTIME)) {
 			if (!uncertain) {
 				do_warn(
 	_("Cannot have CoW extent size hint on a realtime inode %" PRIu64 "\n"),



* [PATCH 19/22] xfs_repair: validate CoW extent size hint on rtinherit directories
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (17 preceding siblings ...)
  2025-02-06 23:01   ` [PATCH 18/22] xfs_repair: allow realtime files to have the reflink flag set Darrick J. Wong
@ 2025-02-06 23:01   ` Darrick J. Wong
  2025-02-13  4:27     ` Christoph Hellwig
  2025-02-06 23:01   ` [PATCH 20/22] xfs_logprint: report realtime CUIs Darrick J. Wong
                     ` (2 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:01 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

XFS allows a sysadmin to change the rt extent size when adding a rt
section to a filesystem after formatting.  If there are any directories
with both a cowextsize hint and rtinherit set, the hint could become
misaligned with the new rextsize.  Offer to fix the problem if we're in
modify mode and the verifier didn't trip.  If we're in dry run mode,
we let the kernel fix it.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 repair/dinode.c |   64 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 43 insertions(+), 21 deletions(-)
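
For readers skimming the thread, the misalignment test described above can be
sketched in isolation.  This is only an illustration, not the code from the
patch: the DEMO_* flag bits and the demo_cowextsize_misaligned() helper are
hypothetical stand-ins for the real on-disk flags and for the check inside
validate_cowextsize().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the on-disk inode flag bits. */
#define DEMO_FLAG_RTINHERIT	(1U << 0)
#define DEMO_FLAG2_COWEXTSIZE	(1ULL << 0)

/*
 * Return true if a CoW extent size hint would be inherited by realtime
 * files but is not a multiple of the rt extent size.
 */
static bool
demo_cowextsize_misaligned(
	uint16_t	flags,
	uint64_t	flags2,
	uint32_t	cowextsize,
	uint32_t	rextsize)
{
	if (!(flags2 & DEMO_FLAG2_COWEXTSIZE))
		return false;
	if (!(flags & DEMO_FLAG_RTINHERIT))
		return false;
	if (rextsize == 0)
		return false;	/* sketch only: avoid dividing by zero */
	return cowextsize % rextsize != 0;
}
```

With a rextsize of 4 blocks, a hint of 6 is flagged and a hint of 8 is not;
a hint without rtinherit set is never flagged by this particular test.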


diff --git a/repair/dinode.c b/repair/dinode.c
index 5a1c8e8cb3ec11..8696a838087f1b 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -2961,6 +2961,47 @@ should_have_metadir_iflag(
 	return false;
 }
 
+static void
+validate_cowextsize(
+	struct xfs_mount	*mp,
+	struct xfs_dinode	*dino,
+	xfs_ino_t		lino,
+	int			*dirty)
+{
+	uint16_t		flags = be16_to_cpu(dino->di_flags);
+	uint64_t		flags2 = be64_to_cpu(dino->di_flags2);
+	unsigned int		value = be32_to_cpu(dino->di_cowextsize);
+	bool			misaligned = false;
+	bool			bad;
+
+	/*
+	 * XFS allows a sysadmin to change the rt extent size when adding a
+	 * rt section to a filesystem after formatting.  If there are any
+	 * directories with both a cowextsize hint and rtinherit set, the
+	 * hint could become misaligned with the new rextsize.
+	 */
+	if ((flags2 & XFS_DIFLAG2_COWEXTSIZE) &&
+	    (flags & XFS_DIFLAG_RTINHERIT) &&
+	    value % mp->m_sb.sb_rextsize > 0)
+		misaligned = true;
+
+	/* Complain if the verifier fails. */
+	bad = libxfs_inode_validate_cowextsize(mp, value,
+			be16_to_cpu(dino->di_mode), flags, flags2) != NULL;
+	if (bad || misaligned) {
+		do_warn(
+_("Bad CoW extent size hint %u on inode %" PRIu64 ", "),
+				be32_to_cpu(dino->di_cowextsize), lino);
+		if (!no_modify) {
+			do_warn(_("resetting to zero\n"));
+			dino->di_flags2 &= ~cpu_to_be64(XFS_DIFLAG2_COWEXTSIZE);
+			dino->di_cowextsize = 0;
+			*dirty = 1;
+		} else
+			do_warn(_("would reset to zero\n"));
+	}
+}
+
 /*
  * returns 0 if the inode is ok, 1 if the inode is corrupt
  * check_dups can be set to 1 *only* when called by the
@@ -3544,27 +3585,8 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 
 	validate_extsize(mp, dino, lino, dirty);
 
-	/*
-	 * Only (regular files and directories) with COWEXTSIZE flags
-	 * set can have extsize set.
-	 */
-	if (dino->di_version >= 3 &&
-	    libxfs_inode_validate_cowextsize(mp,
-			be32_to_cpu(dino->di_cowextsize),
-			be16_to_cpu(dino->di_mode),
-			be16_to_cpu(dino->di_flags),
-			be64_to_cpu(dino->di_flags2)) != NULL) {
-		do_warn(
-_("Bad CoW extent size %u on inode %" PRIu64 ", "),
-				be32_to_cpu(dino->di_cowextsize), lino);
-		if (!no_modify)  {
-			do_warn(_("resetting to zero\n"));
-			dino->di_flags2 &= ~cpu_to_be64(XFS_DIFLAG2_COWEXTSIZE);
-			dino->di_cowextsize = 0;
-			*dirty = 1;
-		} else
-			do_warn(_("would reset to zero\n"));
-	}
+	if (dino->di_version >= 3)
+		validate_cowextsize(mp, dino, lino, dirty);
 
 	/* nsec fields cannot be larger than 1 billion */
 	check_nsec("atime", lino, dino, &dino->di_atime, dirty);



* [PATCH 20/22] xfs_logprint: report realtime CUIs
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (18 preceding siblings ...)
  2025-02-06 23:01   ` [PATCH 19/22] xfs_repair: validate CoW extent size hint on rtinherit directories Darrick J. Wong
@ 2025-02-06 23:01   ` Darrick J. Wong
  2025-02-13  4:28     ` Christoph Hellwig
  2025-02-06 23:02   ` [PATCH 21/22] mkfs: validate CoW extent size hint when rtinherit is set Darrick J. Wong
  2025-02-06 23:02   ` [PATCH 22/22] mkfs: enable reflink on the realtime device Darrick J. Wong
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:01 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Decode the CUI format just enough to report whether a CUI targets the
realtime device.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 logprint/log_misc.c      |    2 ++
 logprint/log_print_all.c |    8 ++++++++
 logprint/log_redo.c      |   24 +++++++++++++++++++-----
 3 files changed, 29 insertions(+), 5 deletions(-)
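
The naming pattern used for the log items can be sketched on its own: start
from a "don't know" placeholder and overwrite it only for recognised type
codes.  The DEMO_LI_* values below are made up for illustration (the real
codes live in the XFS log format header), and demo_cui_item_name() is a
hypothetical helper, not a function from logprint.

```c
/* Hypothetical type codes; the real values come from the log format. */
enum demo_item_type {
	DEMO_LI_CUI	= 1,
	DEMO_LI_CUI_RT	= 2,
};

/* Map a CUI type code to a printable name, defaulting to "CUI?". */
static const char *
demo_cui_item_name(
	unsigned int	type)
{
	const char	*item_name = "CUI?";

	switch (type) {
	case DEMO_LI_CUI:	item_name = "CUI"; break;
	case DEMO_LI_CUI_RT:	item_name = "CUI_RT"; break;
	}

	return item_name;
}
```

Keeping the placeholder as the initial value means a corrupt or unknown type
still prints something recognisable instead of a misleading "CUI".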


diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index aaa9598616a308..144695753211aa 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -1027,12 +1027,14 @@ xlog_print_record(
 					be32_to_cpu(op_head->oh_len));
 			break;
 		    }
+		    case XFS_LI_CUI_RT:
 		    case XFS_LI_CUI: {
 			skip = xlog_print_trans_cui(&ptr,
 					be32_to_cpu(op_head->oh_len),
 					continued);
 			break;
 		    }
+		    case XFS_LI_CUD_RT:
 		    case XFS_LI_CUD: {
 			skip = xlog_print_trans_cud(&ptr,
 					be32_to_cpu(op_head->oh_len));
diff --git a/logprint/log_print_all.c b/logprint/log_print_all.c
index 56e765d64f2df4..1498ef97247d3d 100644
--- a/logprint/log_print_all.c
+++ b/logprint/log_print_all.c
@@ -430,9 +430,11 @@ xlog_recover_print_logitem(
 	case XFS_LI_RUI:
 		xlog_recover_print_rui(item);
 		break;
+	case XFS_LI_CUD_RT:
 	case XFS_LI_CUD:
 		xlog_recover_print_cud(item);
 		break;
+	case XFS_LI_CUI_RT:
 	case XFS_LI_CUI:
 		xlog_recover_print_cui(item);
 		break;
@@ -512,6 +514,12 @@ xlog_recover_print_item(
 	case XFS_LI_CUI:
 		printf("CUI");
 		break;
+	case XFS_LI_CUD_RT:
+		printf("CUD_RT");
+		break;
+	case XFS_LI_CUI_RT:
+		printf("CUI_RT");
+		break;
 	case XFS_LI_BUD:
 		printf("BUD");
 		break;
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index a0cc558499ae0b..89d7448342b33d 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -440,6 +440,7 @@ xlog_print_trans_cui(
 	uint			src_len,
 	int			continued)
 {
+	const char		*item_name = "CUI?";
 	struct xfs_cui_log_format	*src_f, *f = NULL;
 	uint			dst_len;
 	uint			nextents;
@@ -480,8 +481,14 @@ xlog_print_trans_cui(
 		goto error;
 	}
 
-	printf(_("CUI:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
-		f->cui_size, f->cui_nextents, (unsigned long long)f->cui_id);
+	switch (f->cui_type) {
+	case XFS_LI_CUI:	item_name = "CUI"; break;
+	case XFS_LI_CUI_RT:	item_name = "CUI_RT"; break;
+	}
+
+	printf(_("%s:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+			item_name, f->cui_size, f->cui_nextents,
+			(unsigned long long)f->cui_id);
 
 	if (continued) {
 		printf(_("CUI extent data skipped (CONTINUE set, no space)\n"));
@@ -520,6 +527,7 @@ xlog_print_trans_cud(
 	char				**ptr,
 	uint				len)
 {
+	const char			*item_name = "CUD?";
 	struct xfs_cud_log_format	*f;
 	struct xfs_cud_log_format	lbuf;
 
@@ -528,11 +536,17 @@ xlog_print_trans_cud(
 
 	memcpy(&lbuf, *ptr, min(core_size, len));
 	f = &lbuf;
+
+	switch (f->cud_type) {
+	case XFS_LI_CUD:	item_name = "CUD"; break;
+	case XFS_LI_CUD_RT:	item_name = "CUD_RT"; break;
+	}
+
 	*ptr += len;
 	if (len >= core_size) {
-		printf(_("CUD:  #regs: %d	                 id: 0x%llx\n"),
-			f->cud_size,
-			(unsigned long long)f->cud_cui_id);
+		printf(_("%s:  #regs: %d	                 id: 0x%llx\n"),
+				item_name, f->cud_size,
+				(unsigned long long)f->cud_cui_id);
 
 		/* don't print extents as they are not used */
 



* [PATCH 21/22] mkfs: validate CoW extent size hint when rtinherit is set
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (19 preceding siblings ...)
  2025-02-06 23:01   ` [PATCH 20/22] xfs_logprint: report realtime CUIs Darrick J. Wong
@ 2025-02-06 23:02   ` Darrick J. Wong
  2025-02-13  4:28     ` Christoph Hellwig
  2025-02-06 23:02   ` [PATCH 22/22] mkfs: enable reflink on the realtime device Darrick J. Wong
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:02 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Extent size hints exist to nudge the behavior of the file data block
allocator towards trying to make aligned allocations.  Therefore, it
doesn't make sense to allow a hint that isn't a multiple of the
fundamental allocation unit for a given file.

This means that if the sysadmin is formatting with rtinherit set on the
root dir, validate_cowextsize_hint needs to check the hint value on a
simulated realtime file to make sure that it's correct.  This hasn't
been necessary in the past since one cannot have a CoW hint without a
reflink filesystem, and we previously didn't allow rt reflink
filesystems.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 mkfs/xfs_mkfs.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
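
A rough sketch of the two-pass idea: check the hint as if it belonged to a
regular file, then, if rtinherit is set, check it again as if it belonged to
a realtime file.  Everything here is an assumption made for illustration:
demo_hint_ok() is a simplified stand-in for the libxfs verifier, and the
exact bound (inclusive cap vs. strictly less than) is whatever the real
verifier enforces.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified stand-in for the verifier: the hint must not exceed the cap
 * and, for realtime files, must be a multiple of the rt extent size.
 */
static bool
demo_hint_ok(
	uint32_t	hint,
	uint32_t	max_hint,
	bool		realtime,
	uint32_t	rextsize)
{
	if (hint > max_hint)
		return false;
	if (realtime && rextsize > 0 && hint % rextsize != 0)
		return false;
	return true;
}

/* Validate a mkfs-time hint, re-checking as a realtime file if needed. */
static bool
demo_mkfs_hint_ok(
	uint32_t	hint,
	uint32_t	max_hint,
	bool		rtinherit,
	uint32_t	rextsize)
{
	/* First pass: treat the hint as belonging to a regular file. */
	if (!demo_hint_ok(hint, max_hint, false, rextsize))
		return false;

	/* Nothing else to check unless realtime files will inherit it. */
	if (!rtinherit)
		return true;

	/* Second pass: pretend the inheriting file is a realtime file. */
	return demo_hint_ok(hint, max_hint, true, rextsize);
}
```

The second pass only ever tightens the result, so a hint that is fine for
the data device can still be rejected once rtinherit is in play.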


diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index c8042261328171..9239109434d748 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2985,6 +2985,26 @@ _("illegal CoW extent size hint %lld, must be less than %u.\n"),
 				min(XFS_MAX_BMBT_EXTLEN, mp->m_sb.sb_agblocks / 2));
 		usage();
 	}
+
+	/*
+	 * If the value is to be passed on to realtime files, revalidate with
+	 * a realtime file so that we know the hint and flag that get passed on
+	 * to realtime files will be correct.
+	 */
+	if (!(cli->fsx.fsx_xflags & FS_XFLAG_RTINHERIT))
+		return;
+
+	fa = libxfs_inode_validate_cowextsize(mp, cli->fsx.fsx_cowextsize,
+			S_IFREG, XFS_DIFLAG_REALTIME, flags2);
+
+	if (fa) {
+		fprintf(stderr,
+_("illegal CoW extent size hint %lld, must be less than %u and a multiple of %u. %p\n"),
+				(long long)cli->fsx.fsx_cowextsize,
+				min(XFS_MAX_BMBT_EXTLEN, mp->m_sb.sb_agblocks / 2),
+				mp->m_sb.sb_rextsize, fa);
+		usage();
+	}
 }
 
 /* Complain if this filesystem is not a supported configuration. */



* [PATCH 22/22] mkfs: enable reflink on the realtime device
  2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
                     ` (20 preceding siblings ...)
  2025-02-06 23:02   ` [PATCH 21/22] mkfs: validate CoW extent size hint when rtinherit is set Darrick J. Wong
@ 2025-02-06 23:02   ` Darrick J. Wong
  2025-02-13  4:28     ` Christoph Hellwig
  21 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:02 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Allow the creation of filesystems with both reflink and realtime volumes
enabled.  For now we don't support a realtime extent size larger than one
filesystem block.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libxfs/init.c            |    4 +--
 libxfs/libxfs_api_defs.h |    1 +
 mkfs/xfs_mkfs.c          |   63 ++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 60 insertions(+), 8 deletions(-)
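
The reflink-vs-metadir resolution is easier to follow stripped of the mkfs
plumbing.  The sketch below assumes a realtime device is present and uses a
hypothetical struct demo_feat plus two "was this given on the command line"
booleans in place of cli->sb_feat and cli_opt_set(); it illustrates the
decision table and is not the mkfs code itself.

```c
#include <stdbool.h>

struct demo_feat {
	bool	reflink;
	bool	metadir;
};

/*
 * Resolve reflink against metadir on a filesystem with a realtime device.
 * Returns 0 on success, or -1 if the user's explicit choices conflict.
 */
static int
demo_resolve_rt_reflink(
	struct demo_feat	*feat,
	bool			reflink_explicit,
	bool			metadir_explicit)
{
	/* Only reflink without metadir needs resolving. */
	if (feat->metadir || !feat->reflink)
		return 0;

	/* The user asked for reflink and for no metadir: refuse. */
	if (reflink_explicit && metadir_explicit)
		return -1;

	if (reflink_explicit)
		feat->metadir = true;	/* honour reflink, pull in metadir */
	else
		feat->reflink = false;	/* reflink was only a default */
	return 0;
}
```

In short: an explicit request wins over a default, and two explicit requests
that contradict each other are an error rather than a silent override.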


diff --git a/libxfs/init.c b/libxfs/init.c
index 00bd46a6013e28..5b45ed3472762c 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -305,9 +305,9 @@ rtmount_init(
 	if (mp->m_sb.sb_rblocks == 0)
 		return 0;
 
-	if (xfs_has_reflink(mp)) {
+	if (xfs_has_reflink(mp) && mp->m_sb.sb_rextsize > 1) {
 		fprintf(stderr,
-	_("%s: Reflink not compatible with realtime device. Please try a newer xfsprogs.\n"),
+	_("%s: Reflink not compatible with realtime extent size > 1. Please try a newer xfsprogs.\n"),
 				progname);
 		return -1;
 	}
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 66cbb34f05a48f..530feef2a47db8 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -323,6 +323,7 @@
 #define xfs_rtrefcountbt_absolute_maxlevels	libxfs_rtrefcountbt_absolute_maxlevels
 #define xfs_rtrefcountbt_calc_reserves	libxfs_rtrefcountbt_calc_reserves
 #define xfs_rtrefcountbt_calc_size		libxfs_rtrefcountbt_calc_size
+#define xfs_rtrefcountbt_calc_reserves	libxfs_rtrefcountbt_calc_reserves
 #define xfs_rtrefcountbt_commit_staged_btree	libxfs_rtrefcountbt_commit_staged_btree
 #define xfs_rtrefcountbt_create		libxfs_rtrefcountbt_create
 #define xfs_rtrefcountbt_droot_maxrecs	libxfs_rtrefcountbt_droot_maxrecs
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 9239109434d748..c794f918573f91 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2671,12 +2671,36 @@ _("inode btree counters not supported without finobt support\n"));
 	}
 
 	if (cli->xi->rt.name) {
-		if (cli->sb_feat.reflink && cli_opt_set(&mopts, M_REFLINK)) {
-			fprintf(stderr,
-_("reflink not supported with realtime devices\n"));
-			usage();
+		if (cli->rtextsize && cli->sb_feat.reflink) {
+			if (cli_opt_set(&mopts, M_REFLINK)) {
+				fprintf(stderr,
+_("reflink not supported on realtime devices with rt extent size specified\n"));
+				usage();
+			}
+			cli->sb_feat.reflink = false;
+		}
+		if (cfg->blocksize < XFS_MIN_RTEXTSIZE && cli->sb_feat.reflink) {
+			if (cli_opt_set(&mopts, M_REFLINK)) {
+				fprintf(stderr,
+_("reflink not supported on realtime devices with blocksize %d < %d\n"),
+						cli->blocksize,
+						XFS_MIN_RTEXTSIZE);
+				usage();
+			}
+			cli->sb_feat.reflink = false;
+		}
+		if (!cli->sb_feat.metadir && cli->sb_feat.reflink) {
+			if (cli_opt_set(&mopts, M_REFLINK) &&
+			    cli_opt_set(&mopts, M_METADIR)) {
+				fprintf(stderr,
+_("reflink not supported on realtime devices without metadir feature\n"));
+				usage();
+			} else if (cli_opt_set(&mopts, M_REFLINK)) {
+				cli->sb_feat.metadir = true;
+			} else {
+				cli->sb_feat.reflink = false;
+			}
 		}
-		cli->sb_feat.reflink = false;
 
 		if (!cli->sb_feat.metadir && cli->sb_feat.rmapbt) {
 			if (cli_opt_set(&mopts, M_RMAPBT) &&
@@ -2874,6 +2898,19 @@ validate_rtextsize(
 			usage();
 		}
 		cfg->rtextblocks = (xfs_extlen_t)(rtextbytes >> cfg->blocklog);
+	} else if (cli->sb_feat.reflink && cli->xi->rt.name) {
+		/*
+		 * reflink doesn't support rt extent size > 1FSB yet, so set
+		 * an extent size of 1FSB.  Make sure we still satisfy the
+		 * minimum rt extent size.
+		 */
+		if (cfg->blocksize < XFS_MIN_RTEXTSIZE) {
+			fprintf(stderr,
+		_("reflink not supported on rt volume with blocksize %d\n"),
+				cfg->blocksize);
+			usage();
+		}
+		cfg->rtextblocks = 1;
 	} else {
 		/*
 		 * If realtime extsize has not been specified by the user,
@@ -2905,6 +2942,12 @@ validate_rtextsize(
 		}
 	}
 	ASSERT(cfg->rtextblocks);
+
+	if (cli->sb_feat.reflink && cfg->rtblocks > 0 && cfg->rtextblocks > 1) {
+		fprintf(stderr,
+_("reflink not supported on realtime with extent sizes > 1\n"));
+		usage();
+	}
 }
 
 /* Validate the incoming extsize hint. */
@@ -5087,11 +5130,19 @@ check_rt_meta_prealloc(
 		error = -libxfs_metafile_resv_init(rtg_rmap(rtg), ask);
 		if (error)
 			prealloc_fail(mp, error, ask, _("realtime rmap btree"));
+
+		ask = libxfs_rtrefcountbt_calc_reserves(mp);
+		error = -libxfs_metafile_resv_init(rtg_refcount(rtg), ask);
+		if (error)
+			prealloc_fail(mp, error, ask,
+					_("realtime refcount btree"));
 	}
 
 	/* Unreserve the realtime metadata reservations. */
-	while ((rtg = xfs_rtgroup_next(mp, rtg)))
+	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
 		libxfs_metafile_resv_free(rtg_rmap(rtg));
+		libxfs_metafile_resv_free(rtg_refcount(rtg));
+	}
 
 	/* Unreserve the per-AG reservations. */
 	while ((pag = xfs_perag_next(mp, pag)))



* [PATCH 1/4] xfs_db: pass const pointers when we're not modifying them
  2025-02-06 22:30 ` [PATCHSET 5/5] xfsprogs: dump fs directory trees Darrick J. Wong
@ 2025-02-06 23:02   ` Darrick J. Wong
  2025-02-13  4:29     ` Christoph Hellwig
  2025-02-13 13:26     ` Andrey Albershteyn
  2025-02-06 23:03   ` [PATCH 2/4] xfs_db: use an empty transaction to try to prevent livelocks in path_navigate Darrick J. Wong
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:02 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Pass a const pointer to path_walk since we don't actually modify the
contents.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/namei.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/db/namei.c b/db/namei.c
index 4eae4e8fd3232c..00610a54af527e 100644
--- a/db/namei.c
+++ b/db/namei.c
@@ -140,10 +140,10 @@ path_navigate(
 static int
 path_walk(
 	xfs_ino_t	rootino,
-	char		*path)
+	const char	*path)
 {
 	struct dirpath	*dirpath;
-	char		*p = path;
+	const char	*p = path;
 	int		error = 0;
 
 	if (*p == '/') {



* [PATCH 2/4] xfs_db: use an empty transaction to try to prevent livelocks in path_navigate
  2025-02-06 22:30 ` [PATCHSET 5/5] xfsprogs: dump fs directory trees Darrick J. Wong
  2025-02-06 23:02   ` [PATCH 1/4] xfs_db: pass const pointers when we're not modifying them Darrick J. Wong
@ 2025-02-06 23:03   ` Darrick J. Wong
  2025-02-13  4:29     ` Christoph Hellwig
  2025-02-13 13:26     ` Andrey Albershteyn
  2025-02-06 23:03   ` [PATCH 3/4] xfs_db: make listdir more generally useful Darrick J. Wong
  2025-02-06 23:03   ` [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems Darrick J. Wong
  3 siblings, 2 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:03 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

A couple of patches from now we're going to reuse the path_walk code in
a new xfs_db subcommand that tries to recover directory trees from
old/damaged filesystems.  Let's pass around an empty transaction to try
to avoid livelocks on malicious/broken metadata.  This is not
completely foolproof, but it's quick enough for most purposes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/namei.c |   23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)
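
The label renames follow the usual stacked-cleanup shape: each failure jumps
to the label that releases exactly what has been acquired so far, and the
labels fall through in reverse order of acquisition.  A minimal sketch, with
plain counters standing in for the transaction and the inode (struct
demo_counts, demo_path_navigate() and the fail_step knob are all
hypothetical):

```c
#include <errno.h>

struct demo_counts {
	int	txns;	/* transactions currently held */
	int	inodes;	/* inodes currently held */
};

/* fail_step selects which step fails; 0 means every step succeeds. */
static int
demo_path_navigate(
	struct demo_counts	*c,
	int			fail_step)
{
	int			have_inode = 0;
	int			error = 0;

	/* Step 1: allocate the empty transaction. */
	if (fail_step == 1)
		return ENOMEM;		/* nothing acquired yet */
	c->txns++;

	/* Step 2: grab the starting inode. */
	if (fail_step == 2) {
		error = EIO;
		goto out_trans;		/* only the transaction is held */
	}
	c->inodes++;
	have_inode = 1;

	/* Step 3: look up a path component. */
	if (fail_step == 3) {
		error = ENOENT;
		goto out_rele;		/* inode and transaction are held */
	}

out_rele:
	if (have_inode)
		c->inodes--;
out_trans:
	c->txns--;
	return error;
}
```

Whichever step fails, both counters end up back at zero, which is the
property the out_rele/out_trans ordering is meant to guarantee.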


diff --git a/db/namei.c b/db/namei.c
index 00610a54af527e..22eae50f219fd0 100644
--- a/db/namei.c
+++ b/db/namei.c
@@ -87,15 +87,20 @@ path_navigate(
 	xfs_ino_t		rootino,
 	struct dirpath		*dirpath)
 {
+	struct xfs_trans	*tp;
 	struct xfs_inode	*dp;
 	xfs_ino_t		ino = rootino;
 	unsigned int		i;
 	int			error;
 
-	error = -libxfs_iget(mp, NULL, ino, 0, &dp);
+	error = -libxfs_trans_alloc_empty(mp, &tp);
 	if (error)
 		return error;
 
+	error = -libxfs_iget(mp, tp, ino, 0, &dp);
+	if (error)
+		goto out_trans;
+
 	for (i = 0; i < dirpath->depth; i++) {
 		struct xfs_name	xname = {
 			.name	= (unsigned char *)dirpath->path[i],
@@ -104,35 +109,37 @@ path_navigate(
 
 		if (!S_ISDIR(VFS_I(dp)->i_mode)) {
 			error = ENOTDIR;
-			goto rele;
+			goto out_rele;
 		}
 
-		error = -libxfs_dir_lookup(NULL, dp, &xname, &ino, NULL);
+		error = -libxfs_dir_lookup(tp, dp, &xname, &ino, NULL);
 		if (error)
-			goto rele;
+			goto out_rele;
 		if (!xfs_verify_ino(mp, ino)) {
 			error = EFSCORRUPTED;
-			goto rele;
+			goto out_rele;
 		}
 
 		libxfs_irele(dp);
 		dp = NULL;
 
-		error = -libxfs_iget(mp, NULL, ino, 0, &dp);
+		error = -libxfs_iget(mp, tp, ino, 0, &dp);
 		switch (error) {
 		case EFSCORRUPTED:
 		case EFSBADCRC:
 		case 0:
 			break;
 		default:
-			return error;
+			goto out_trans;
 		}
 	}
 
 	set_cur_inode(ino);
-rele:
+out_rele:
 	if (dp)
 		libxfs_irele(dp);
+out_trans:
+	libxfs_trans_cancel(tp);
 	return error;
 }
 



* [PATCH 3/4] xfs_db: make listdir more generally useful
  2025-02-06 22:30 ` [PATCHSET 5/5] xfsprogs: dump fs directory trees Darrick J. Wong
  2025-02-06 23:02   ` [PATCH 1/4] xfs_db: pass const pointers when we're not modifying them Darrick J. Wong
  2025-02-06 23:03   ` [PATCH 2/4] xfs_db: use an empty transaction to try to prevent livelocks in path_navigate Darrick J. Wong
@ 2025-02-06 23:03   ` Darrick J. Wong
  2025-02-13  4:30     ` Christoph Hellwig
  2025-02-13 13:26     ` Andrey Albershteyn
  2025-02-06 23:03   ` [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems Darrick J. Wong
  3 siblings, 2 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:03 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Enhance the current directory entry iteration code in xfs_db to be more
generally useful by allowing callers to pass around a transaction, a
callback function, and a private pointer.  This will be used in the next
patch to iterate directories when we want to copy their contents out of
the filesystem into a directory.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/namei.c |   83 +++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 57 insertions(+), 26 deletions(-)
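
Note: the callback-plus-private-pointer shape described above can be shown in
isolation.  This is a minimal sketch with hypothetical, simplified types
(toy_dirent, toy_emit_t, toy_listdir); the dir_emit_t callback in the patch
additionally receives the transaction, the directory inode, the entry offset
and the file type.  The point is only that the iterator stops at the first
non-zero return and hands the caller's private pointer through untouched.

```c
#include <stddef.h>

/* Hypothetical, simplified directory entry for illustration only. */
struct toy_dirent {
	const char		*name;
	unsigned long long	ino;
};

/* Callback type: a non-zero return value stops the iteration. */
typedef int (*toy_emit_t)(const char *name, unsigned long long ino,
		void *private);

/* Call emit() on every entry, propagating the first error seen. */
static int
toy_listdir(
	const struct toy_dirent	*ents,
	size_t			nr,
	toy_emit_t		emit,
	void			*private)
{
	size_t			i;
	int			error;

	for (i = 0; i < nr; i++) {
		error = emit(ents[i].name, ents[i].ino, private);
		if (error)
			return error;
	}
	return 0;
}

/* Example private state: count entries, but give up after a limit. */
struct toy_count {
	unsigned int		seen;
	unsigned int		limit;
};

static int
toy_count_emit(
	const char		*name,
	unsigned long long	ino,
	void			*private)
{
	struct toy_count	*tc = private;

	(void)name;
	(void)ino;
	if (tc->seen >= tc->limit)
		return 1;	/* ask the iterator to stop */
	tc->seen++;
	return 0;
}
```

A printing callback would ignore the private pointer, while a copying
callback would use it to carry its destination state.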


diff --git a/db/namei.c b/db/namei.c
index 22eae50f219fd0..6f277a65ed91ac 100644
--- a/db/namei.c
+++ b/db/namei.c
@@ -266,15 +266,18 @@ get_dstr(
 	return filetype_strings[filetype];
 }
 
-static void
-dir_emit(
-	struct xfs_mount	*mp,
+static int
+print_dirent(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
 	xfs_dir2_dataptr_t	off,
 	char			*name,
 	ssize_t			namelen,
 	xfs_ino_t		ino,
-	uint8_t			dtype)
+	uint8_t			dtype,
+	void			*private)
 {
+	struct xfs_mount	*mp = dp->i_mount;
 	char			*display_name;
 	struct xfs_name		xname = { .name = (unsigned char *)name };
 	const char		*dstr = get_dstr(mp, dtype);
@@ -306,11 +309,14 @@ dir_emit(
 
 	if (display_name != name)
 		free(display_name);
+	return 0;
 }
 
 static int
 list_sfdir(
-	struct xfs_da_args		*args)
+	struct xfs_da_args		*args,
+	dir_emit_t			dir_emit,
+	void				*private)
 {
 	struct xfs_inode		*dp = args->dp;
 	struct xfs_mount		*mp = dp->i_mount;
@@ -321,17 +327,24 @@ list_sfdir(
 	xfs_dir2_dataptr_t		off;
 	unsigned int			i;
 	uint8_t				filetype;
+	int				error;
 
 	/* . and .. entries */
 	off = xfs_dir2_db_off_to_dataptr(geo, geo->datablk,
 			geo->data_entry_offset);
-	dir_emit(args->dp->i_mount, off, ".", -1, dp->i_ino, XFS_DIR3_FT_DIR);
+	error = dir_emit(args->trans, args->dp, off, ".", -1, dp->i_ino,
+			XFS_DIR3_FT_DIR, private);
+	if (error)
+		return error;
 
 	ino = libxfs_dir2_sf_get_parent_ino(sfp);
 	off = xfs_dir2_db_off_to_dataptr(geo, geo->datablk,
 			geo->data_entry_offset +
 			libxfs_dir2_data_entsize(mp, sizeof(".") - 1));
-	dir_emit(args->dp->i_mount, off, "..", -1, ino, XFS_DIR3_FT_DIR);
+	error = dir_emit(args->trans, args->dp, off, "..", -1, ino,
+			XFS_DIR3_FT_DIR, private);
+	if (error)
+		return error;
 
 	/* Walk everything else. */
 	sfep = xfs_dir2_sf_firstentry(sfp);
@@ -341,8 +354,11 @@ list_sfdir(
 		off = xfs_dir2_db_off_to_dataptr(geo, geo->datablk,
 				xfs_dir2_sf_get_offset(sfep));
 
-		dir_emit(args->dp->i_mount, off, (char *)sfep->name,
-				sfep->namelen, ino, filetype);
+		error = dir_emit(args->trans, args->dp, off,
+				(char *)sfep->name, sfep->namelen, ino,
+				filetype, private);
+		if (error)
+			return error;
 		sfep = libxfs_dir2_sf_nextentry(mp, sfp, sfep);
 	}
 
@@ -352,7 +368,9 @@ list_sfdir(
 /* List entries in block format directory. */
 static int
 list_blockdir(
-	struct xfs_da_args	*args)
+	struct xfs_da_args	*args,
+	dir_emit_t		dir_emit,
+	void			*private)
 {
 	struct xfs_inode	*dp = args->dp;
 	struct xfs_mount	*mp = dp->i_mount;
@@ -363,7 +381,7 @@ list_blockdir(
 	unsigned int		end;
 	int			error;
 
-	error = xfs_dir3_block_read(NULL, dp, args->owner, &bp);
+	error = xfs_dir3_block_read(args->trans, dp, args->owner, &bp);
 	if (error)
 		return error;
 
@@ -383,8 +401,11 @@ list_blockdir(
 		diroff = xfs_dir2_db_off_to_dataptr(geo, geo->datablk, offset);
 		offset += libxfs_dir2_data_entsize(mp, dep->namelen);
 		filetype = libxfs_dir2_data_get_ftype(dp->i_mount, dep);
-		dir_emit(mp, diroff, (char *)dep->name, dep->namelen,
-				be64_to_cpu(dep->inumber), filetype);
+		error = dir_emit(args->trans, args->dp, diroff,
+				(char *)dep->name, dep->namelen,
+				be64_to_cpu(dep->inumber), filetype, private);
+		if (error)
+			break;
 	}
 
 	libxfs_trans_brelse(args->trans, bp);
@@ -394,7 +415,9 @@ list_blockdir(
 /* List entries in leaf format directory. */
 static int
 list_leafdir(
-	struct xfs_da_args	*args)
+	struct xfs_da_args	*args,
+	dir_emit_t		dir_emit,
+	void			*private)
 {
 	struct xfs_bmbt_irec	map;
 	struct xfs_iext_cursor	icur;
@@ -408,7 +431,7 @@ list_leafdir(
 	int			error = 0;
 
 	/* Read extent map. */
-	error = -libxfs_iread_extents(NULL, dp, XFS_DATA_FORK);
+	error = -libxfs_iread_extents(args->trans, dp, XFS_DATA_FORK);
 	if (error)
 		return error;
 
@@ -424,7 +447,7 @@ list_leafdir(
 		libxfs_trim_extent(&map, dabno, geo->leafblk - dabno);
 
 		/* Read the directory block of that first mapping. */
-		error = xfs_dir3_data_read(NULL, dp, args->owner,
+		error = xfs_dir3_data_read(args->trans, dp, args->owner,
 				map.br_startoff, 0, &bp);
 		if (error)
 			break;
@@ -449,18 +472,22 @@ list_leafdir(
 			offset += libxfs_dir2_data_entsize(mp, dep->namelen);
 			filetype = libxfs_dir2_data_get_ftype(mp, dep);
 
-			dir_emit(mp, xfs_dir2_byte_to_dataptr(dirboff + offset),
+			error = dir_emit(args->trans, args->dp,
+					xfs_dir2_byte_to_dataptr(dirboff + offset),
 					(char *)dep->name, dep->namelen,
-					be64_to_cpu(dep->inumber), filetype);
+					be64_to_cpu(dep->inumber), filetype,
+					private);
+			if (error)
+				break;
 		}
 
 		dabno += XFS_DADDR_TO_FSB(mp, bp->b_length);
-		libxfs_buf_relse(bp);
+		libxfs_trans_brelse(args->trans, bp);
 		bp = NULL;
 	}
 
 	if (bp)
-		libxfs_buf_relse(bp);
+		libxfs_trans_brelse(args->trans, bp);
 
 	return error;
 }
@@ -468,9 +495,13 @@ list_leafdir(
 /* Read the directory, display contents. */
 static int
 listdir(
-	struct xfs_inode	*dp)
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	dir_emit_t		dir_emit,
+	void			*private)
 {
 	struct xfs_da_args	args = {
+		.trans		= tp,
 		.dp		= dp,
 		.geo		= dp->i_mount->m_dir_geo,
 		.owner		= dp->i_ino,
@@ -479,14 +510,14 @@ listdir(
 
 	switch (libxfs_dir2_format(&args, &error)) {
 	case XFS_DIR2_FMT_SF:
-		return list_sfdir(&args);
+		return list_sfdir(&args, dir_emit, private);
 	case XFS_DIR2_FMT_BLOCK:
-		return list_blockdir(&args);
+		return list_blockdir(&args, dir_emit, private);
 	case XFS_DIR2_FMT_LEAF:
 	case XFS_DIR2_FMT_NODE:
-		return list_leafdir(&args);
+		return list_leafdir(&args, dir_emit, private);
 	default:
-		return error;
+		return EFSCORRUPTED;
 	}
 }
 
@@ -526,7 +557,7 @@ ls_cur(
 	if (tag)
 		dbprintf(_("%s:\n"), tag);
 
-	error = listdir(dp);
+	error = listdir(NULL, dp, print_dirent, NULL);
 	if (error)
 		goto rele;
 



* [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems
  2025-02-06 22:30 ` [PATCHSET 5/5] xfsprogs: dump fs directory trees Darrick J. Wong
                     ` (2 preceding siblings ...)
  2025-02-06 23:03   ` [PATCH 3/4] xfs_db: make listdir more generally useful Darrick J. Wong
@ 2025-02-06 23:03   ` Darrick J. Wong
  2025-02-13 13:25     ` Andrey Albershteyn
                       ` (2 more replies)
  3 siblings, 3 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-06 23:03 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Ahead of deprecating V4 support in the kernel, let's give people a way
to extract their files from a filesystem without needing to mount.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 db/Makefile              |    3 
 db/command.c             |    1 
 db/command.h             |    1 
 db/namei.c               |    5 
 db/namei.h               |   18 +
 db/rdump.c               |  999 ++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/libxfs_api_defs.h |    2 
 man/man8/xfs_db.8        |   26 +
 8 files changed, 1052 insertions(+), 3 deletions(-)
 create mode 100644 db/namei.h
 create mode 100644 db/rdump.c


diff --git a/db/Makefile b/db/Makefile
index 02eeead25b49d0..e36e775eee6021 100644
--- a/db/Makefile
+++ b/db/Makefile
@@ -45,6 +45,7 @@ HFILES = \
 	logformat.h \
 	malloc.h \
 	metadump.h \
+	namei.h \
 	obfuscate.h \
 	output.h \
 	print.h \
@@ -64,7 +65,7 @@ CFILES = $(HFILES:.h=.c) \
 	convert.c \
 	info.c \
 	iunlink.c \
-	namei.c \
+	rdump.c \
 	timelimit.c
 LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh
 
diff --git a/db/command.c b/db/command.c
index 1b46c3fec08a0e..15bdabbcb7d728 100644
--- a/db/command.c
+++ b/db/command.c
@@ -145,4 +145,5 @@ init_commands(void)
 	timelimit_init();
 	iunlink_init();
 	bmapinflate_init();
+	rdump_init();
 }
diff --git a/db/command.h b/db/command.h
index 2c2926afd7b516..21419bbe65bfeb 100644
--- a/db/command.h
+++ b/db/command.h
@@ -36,3 +36,4 @@ extern void		timelimit_init(void);
 extern void		namei_init(void);
 extern void		iunlink_init(void);
 extern void		bmapinflate_init(void);
+extern void		rdump_init(void);
diff --git a/db/namei.c b/db/namei.c
index 6f277a65ed91ac..2586e0591c2357 100644
--- a/db/namei.c
+++ b/db/namei.c
@@ -14,6 +14,7 @@
 #include "fprint.h"
 #include "field.h"
 #include "inode.h"
+#include "namei.h"
 
 /* Path lookup */
 
@@ -144,7 +145,7 @@ path_navigate(
 }
 
 /* Walk a directory path to an inode and set the io cursor to that inode. */
-static int
+int
 path_walk(
 	xfs_ino_t	rootino,
 	const char	*path)
@@ -493,7 +494,7 @@ list_leafdir(
 }
 
 /* Read the directory, display contents. */
-static int
+int
 listdir(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*dp,
diff --git a/db/namei.h b/db/namei.h
new file mode 100644
index 00000000000000..05c384bc9a6c35
--- /dev/null
+++ b/db/namei.h
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2025 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef DB_NAMEI_H_
+#define DB_NAMEI_H_
+
+int path_walk(xfs_ino_t rootino, const char *path);
+
+typedef int (*dir_emit_t)(struct xfs_trans *tp, struct xfs_inode *dp,
+		xfs_dir2_dataptr_t off, char *name, ssize_t namelen,
+		xfs_ino_t ino, uint8_t dtype, void *private);
+
+int listdir(struct xfs_trans *tp, struct xfs_inode *dp, dir_emit_t dir_emit,
+		void *private);
+
+#endif /* DB_NAMEI_H_ */
diff --git a/db/rdump.c b/db/rdump.c
new file mode 100644
index 00000000000000..1e31b40a072bdc
--- /dev/null
+++ b/db/rdump.c
@@ -0,0 +1,999 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2025 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs.h"
+#include "command.h"
+#include "output.h"
+#include "init.h"
+#include "io.h"
+#include "namei.h"
+#include "type.h"
+#include "input.h"
+#include "faddr.h"
+#include "fprint.h"
+#include "field.h"
+#include "inode.h"
+#include "listxattr.h"
+#include <sys/xattr.h>
+#include <linux/xattr.h>
+
+static bool strict_errors;
+
+/* file attributes that we might have lost */
+#define LOST_OWNER		(1U << 0)
+#define LOST_MODE		(1U << 1)
+#define LOST_TIME		(1U << 2)
+#define LOST_SOME_FSXATTR	(1U << 3)
+#define LOST_FSXATTR		(1U << 4)
+#define LOST_XATTR		(1U << 5)
+
+static unsigned int lost_mask;
+
+static void
+rdump_help(void)
+{
+	dbprintf(_(
+"\n"
+" Recover files out of the filesystem into a directory.\n"
+"\n"
+" Options:\n"
+"   -s      -- Fail on errors when reading content from the filesystem.\n"
+"   paths   -- Copy only these paths.  If no paths are given, copy everything.\n"
+"   destdir -- The destination into which files are recovered.\n"
+	));
+}
+
+struct destdir {
+	int	fd;
+	char	*path;
+	char	*sep;
+};
+
+struct pathbuf {
+	size_t	len;
+	char	path[PATH_MAX + 1];
+};
+
+static int rdump_file(struct xfs_trans *tp, xfs_ino_t ino,
+		const struct destdir *destdir, struct pathbuf *pbuf);
+
+static inline unsigned int xflags2getflags(const struct fsxattr *fa)
+{
+	unsigned int	ret = 0;
+
+	if (fa->fsx_xflags & FS_XFLAG_IMMUTABLE)
+		ret |= FS_IMMUTABLE_FL;
+	if (fa->fsx_xflags & FS_XFLAG_APPEND)
+		ret |= FS_APPEND_FL;
+	if (fa->fsx_xflags & FS_XFLAG_SYNC)
+		ret |= FS_SYNC_FL;
+	if (fa->fsx_xflags & FS_XFLAG_NOATIME)
+		ret |= FS_NOATIME_FL;
+	if (fa->fsx_xflags & FS_XFLAG_NODUMP)
+		ret |= FS_NODUMP_FL;
+	if (fa->fsx_xflags & FS_XFLAG_DAX)
+		ret |= FS_DAX_FL;
+	if (fa->fsx_xflags & FS_XFLAG_PROJINHERIT)
+		ret |= FS_PROJINHERIT_FL;
+	return ret;
+}
+
+/* Copy common file attributes to this fd */
+static int
+rdump_fileattrs_fd(
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf,
+	int			fd)
+{
+	struct fsxattr		fsxattr = {
+		.fsx_extsize	= ip->i_extsize,
+		.fsx_projid	= ip->i_projid,
+		.fsx_cowextsize	= ip->i_cowextsize,
+		.fsx_xflags	= xfs_ip2xflags(ip),
+	};
+	int			ret;
+
+	ret = fchmod(fd, VFS_I(ip)->i_mode & ~S_IFMT);
+	if (ret) {
+		if (errno == EPERM)
+			lost_mask |= LOST_MODE;
+		else
+			dbprintf(_("%s%s%s: fchmod %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	ret = fchown(fd, i_uid_read(VFS_I(ip)), i_gid_read(VFS_I(ip)));
+	if (ret) {
+		if (errno == EPERM)
+			lost_mask |= LOST_OWNER;
+		else
+			dbprintf(_("%s%s%s: fchown %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	ret = ioctl(fd, XFS_IOC_FSSETXATTR, &fsxattr);
+	if (ret) {
+		unsigned int	getflags = xflags2getflags(&fsxattr);
+
+		/* try to use setflags if the target is not xfs */
+		if (errno == EOPNOTSUPP || errno == ENOTTY) {
+			lost_mask |= LOST_SOME_FSXATTR;
+			ret = ioctl(fd, FS_IOC_SETFLAGS, &getflags);
+		}
+
+		if (errno == EOPNOTSUPP || errno == EPERM || errno == ENOTTY)
+			lost_mask |= LOST_FSXATTR;
+		else
+			dbprintf(_("%s%s%s: fssetxattr %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	return 0;
+}
+
+/* Copy common file attributes to this path */
+static int
+rdump_fileattrs_path(
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf)
+{
+	int			ret;
+
+	ret = fchmodat(destdir->fd, pbuf->path, VFS_I(ip)->i_mode & ~S_IFMT,
+			AT_SYMLINK_NOFOLLOW);
+	if (ret) {
+		/* fchmodat on a symlink is not supported */
+		if (errno == EPERM || errno == EOPNOTSUPP)
+			lost_mask |= LOST_MODE;
+		else
+			dbprintf(_("%s%s%s: fchmodat %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	ret = fchownat(destdir->fd, pbuf->path, i_uid_read(VFS_I(ip)),
+			i_gid_read(VFS_I(ip)), AT_SYMLINK_NOFOLLOW);
+	if (ret) {
+		if (errno == EPERM)
+			lost_mask |= LOST_OWNER;
+		else
+			dbprintf(_("%s%s%s: fchownat %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	/* XXX cannot copy fsxattrs */
+
+	return 0;
+}
+
+/* Copy access and modification timestamps to this fd. */
+static int
+rdump_timestamps_fd(
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf,
+	int			fd)
+{
+	struct timespec		times[2] = {
+		[0] = {
+			.tv_sec  = inode_get_atime_sec(VFS_I(ip)),
+			.tv_nsec = inode_get_atime_nsec(VFS_I(ip)),
+		},
+		[1] = {
+			.tv_sec  = inode_get_mtime_sec(VFS_I(ip)),
+			.tv_nsec = inode_get_mtime_nsec(VFS_I(ip)),
+		},
+	};
+	int			ret;
+
+	/* XXX cannot copy ctime or btime */
+
+	ret = futimens(fd, times);
+	if (ret) {
+		if (errno == EPERM)
+			lost_mask |= LOST_TIME;
+		else
+			dbprintf(_("%s%s%s: futimens %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	return 0;
+}
+
+/* Copy access and modification timestamps to this path. */
+static int
+rdump_timestamps_path(
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf)
+{
+	struct timespec		times[2] = {
+		[0] = {
+			.tv_sec  = inode_get_atime_sec(VFS_I(ip)),
+			.tv_nsec = inode_get_atime_nsec(VFS_I(ip)),
+		},
+		[1] = {
+			.tv_sec  = inode_get_mtime_sec(VFS_I(ip)),
+			.tv_nsec = inode_get_mtime_nsec(VFS_I(ip)),
+		},
+	};
+	int			ret;
+
+	/* XXX cannot copy ctime or btime */
+
+	ret = utimensat(destdir->fd, pbuf->path, times, AT_SYMLINK_NOFOLLOW);
+	if (ret) {
+		if (errno == EPERM)
+			lost_mask |= LOST_TIME;
+		else
+			dbprintf(_("%s%s%s: utimensat %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	return 0;
+}
+
+struct copyxattr {
+	const struct destdir	*destdir;
+	const struct pathbuf	*pbuf;
+	int			fd;
+	char			name[XATTR_NAME_MAX + 1];
+	char			value[XATTR_SIZE_MAX];
+};
+
+/* Copy one extended attribute */
+static int
+rdump_xattr(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen,
+	void			*priv)
+{
+	const char		*nsp;
+	struct copyxattr	*cx = priv;
+	const size_t		remaining = sizeof(cx->name);
+	ssize_t			added;
+	int			ret;
+
+	if (attr_flags & XFS_ATTR_PARENT)
+		return 0;
+
+	/* Format xattr name */
+	if (attr_flags & XFS_ATTR_ROOT)
+		nsp = XATTR_TRUSTED_PREFIX;
+	else if (attr_flags & XFS_ATTR_SECURE)
+		nsp = XATTR_SECURITY_PREFIX;
+	else
+		nsp = XATTR_USER_PREFIX;
+	added = snprintf(cx->name, remaining, "%s%.*s", nsp, namelen, name);
+	if (added >= remaining) {
+		dbprintf(_("%s%s%s: ran out of space formatting xattr name\n"),
+				cx->destdir->path, cx->destdir->sep,
+				cx->pbuf->path);
+		return strict_errors ? ECANCELED : 0;
+	}
+
+	/* Retrieve xattr value if needed */
+	if (valuelen > 0 && !value) {
+		struct xfs_da_args	args = {
+			.trans		= tp,
+			.dp		= ip,
+			.geo		= mp->m_attr_geo,
+			.owner		= ip->i_ino,
+			.attr_filter	= attr_flags & XFS_ATTR_NSP_ONDISK_MASK,
+			.namelen	= namelen,
+			.name		= name,
+			.value		= cx->value,
+			.valuelen	= valuelen,
+		};
+
+		ret = -libxfs_attr_rmtval_get(&args);
+		if (ret) {
+			dbprintf(_("%s: reading xattr \"%.*s\" value %s\n"),
+					cx->pbuf->path, namelen, name,
+					strerror(ret));
+			return strict_errors ? ECANCELED : 0;
+		}
+	}
+
+	ret = fsetxattr(cx->fd, cx->name, cx->value, valuelen, 0);
+	if (ret) {
+		if (errno == EOPNOTSUPP)
+			lost_mask |= LOST_XATTR;
+		else
+			dbprintf(_("%s%s%s: fsetxattr \"%.*s\" %s\n"),
+					cx->destdir->path, cx->destdir->sep,
+					cx->pbuf->path, namelen, name,
+					strerror(errno));
+		if (strict_errors)
+			return ECANCELED;
+	}
+
+	return 0;
+}
+
+/* Copy extended attributes */
+static int
+rdump_xattrs(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf,
+	int			fd)
+{
+	struct copyxattr	*cx;
+	int			ret;
+
+	cx = calloc(1, sizeof(struct copyxattr));
+	if (!cx) {
+		dbprintf(_("%s%s%s: allocating xattr buffer %s\n"),
+				destdir->path, destdir->sep, pbuf->path,
+				strerror(errno));
+		return 1;
+	}
+	cx->destdir = destdir;
+	cx->pbuf = pbuf;
+	cx->fd = fd;
+
+	ret = xattr_walk(tp, ip, rdump_xattr, cx);
+	if (ret && ret != ECANCELED) {
+		dbprintf(_("%s%s%s: listxattr %s\n"), destdir->path,
+				destdir->sep, pbuf->path, strerror(ret));
+		if (strict_errors)
+			return 1;
+	}
+
+	free(cx);
+	return 0;
+}
+
+struct copydirent {
+	const struct destdir	*destdir;
+	struct pathbuf		*pbuf;
+	int			fd;
+};
+
+/* Copy a directory entry. */
+static int
+rdump_dirent(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	xfs_dir2_dataptr_t	off,
+	char			*name,
+	ssize_t			namelen,
+	xfs_ino_t		ino,
+	uint8_t			dtype,
+	void			*private)
+{
+	struct copydirent	*cd = private;
+	size_t			oldlen = cd->pbuf->len;
+	const size_t		remaining = PATH_MAX + 1 - oldlen;
+	ssize_t			added;
+	int			ret;
+
+	/* Negative length means name is null-terminated */
+	if (namelen < 0)
+		namelen = strlen(name);
+
+	/* Ignore dot and dotdot */
+	if (namelen == 1 && name[0] == '.')
+		return 0;
+	if (namelen == 2 && name[0] == '.' && name[1] == '.')
+		return 0;
+
+	if (namelen > FILENAME_MAX) {
+		dbprintf(_("%s%s%s: %s\n"),
+				cd->destdir->path, cd->destdir->sep,
+				cd->pbuf->path, strerror(ENAMETOOLONG));
+		return strict_errors ? ECANCELED : 0;
+	}
+
+	added = snprintf(&cd->pbuf->path[oldlen], remaining, "%s%.*s",
+			oldlen ? "/" : "", (int)namelen, name);
+	if (added >= remaining) {
+		dbprintf(_("%s%s%s: ran out of space formatting file name\n"),
+				cd->destdir->path, cd->destdir->sep,
+				cd->pbuf->path);
+		return strict_errors ? ECANCELED : 0;
+	}
+
+	cd->pbuf->len += added;
+	ret = rdump_file(tp, ino, cd->destdir, cd->pbuf);
+	cd->pbuf->len = oldlen;
+	cd->pbuf->path[oldlen] = 0;
+	return ret;
+}
+
+/* Copy a directory */
+static int
+rdump_directory(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	const struct destdir	*destdir,
+	struct pathbuf		*pbuf)
+{
+	struct copydirent	*cd;
+	int			ret, ret2;
+
+	cd = calloc(1, sizeof(struct copydirent));
+	if (!cd) {
+		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
+				pbuf->path, strerror(errno));
+		return 1;
+	}
+	cd->destdir = destdir;
+	cd->pbuf = pbuf;
+
+	if (pbuf->len) {
+		/*
+		 * If path is non-empty, we want to create a child somewhere
+		 * underneath the target directory.
+		 */
+		ret = mkdirat(destdir->fd, pbuf->path, 0600);
+		if (ret && errno != EEXIST) {
+			dbprintf(_("%s%s%s: %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+			goto out_cd;
+		}
+
+		cd->fd = openat(destdir->fd, pbuf->path,
+				O_RDONLY | O_DIRECTORY);
+		if (cd->fd < 0) {
+			dbprintf(_("%s%s%s: %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+			ret = 1;
+			goto out_cd;
+		}
+	} else {
+		/*
+		 * If path is empty, then we're copying the children of a
+		 * directory into the target directory.
+		 */
+		cd->fd = destdir->fd;
+	}
+
+	ret = rdump_fileattrs_fd(dp, destdir, pbuf, cd->fd);
+	if (ret && strict_errors)
+		goto out_close;
+
+	if (xfs_inode_has_attr_fork(dp)) {
+		ret = rdump_xattrs(tp, dp, destdir, pbuf, cd->fd);
+		if (ret && strict_errors)
+			goto out_close;
+	}
+
+	ret = listdir(tp, dp, rdump_dirent, cd);
+	if (ret && ret != ECANCELED) {
+		dbprintf(_("%s%s%s: readdir %s\n"), destdir->path,
+				destdir->sep, pbuf->path, strerror(ret));
+		if (strict_errors)
+			goto out_close;
+	}
+
+	ret = rdump_timestamps_fd(dp, destdir, pbuf, cd->fd);
+	if (ret && strict_errors)
+		goto out_close;
+
+	ret = 0;
+
+out_close:
+	if (cd->fd != destdir->fd) {
+		ret2 = close(cd->fd);
+		if (ret2) {
+			if (!ret)
+				ret = ret2;
+			dbprintf(_("%s%s%s: %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		}
+	}
+
+out_cd:
+	free(cd);
+	return ret;
+}
+
+/* Copy file data */
+static int
+rdump_regfile_data(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf,
+	int			fd)
+{
+	struct xfs_bmbt_irec	irec = { };
+	struct xfs_buftarg	*btp;
+	int			nmaps;
+	off_t			pos = 0;
+	const off_t		isize = ip->i_disk_size;
+	int			ret;
+
+	if (XFS_IS_REALTIME_INODE(ip))
+		btp = ip->i_mount->m_rtdev_targp;
+	else
+		btp = ip->i_mount->m_ddev_targp;
+
+	for (;
+	     pos < isize;
+	     pos = XFS_FSB_TO_B(mp, irec.br_startoff + irec.br_blockcount)) {
+		struct xfs_buf	*bp;
+		off_t		buf_pos;
+		off_t		fd_pos;
+		xfs_fileoff_t	off = XFS_B_TO_FSBT(mp, pos);
+		xfs_filblks_t	max_read = XFS_B_TO_FSB(mp, 1048576);
+		xfs_daddr_t	daddr;
+		size_t		count;
+
+		nmaps = 1;
+		ret = -libxfs_bmapi_read(ip, off, max_read, &irec, &nmaps, 0);
+		if (ret) {
+			dbprintf(_("%s: %s\n"), pbuf->path, strerror(ret));
+			if (strict_errors)
+				return 1;
+			continue;
+		}
+		if (!nmaps)
+			break;
+
+		if (!xfs_bmap_is_written_extent(&irec))
+			continue;
+
+		fd_pos = XFS_FSB_TO_B(mp, irec.br_startoff);
+		if (XFS_IS_REALTIME_INODE(ip))
+			daddr = xfs_rtb_to_daddr(mp, irec.br_startblock);
+		else
+			daddr = XFS_FSB_TO_DADDR(mp, irec.br_startblock);
+
+		ret = -libxfs_buf_read_uncached(btp, daddr,
+				XFS_FSB_TO_BB(mp, irec.br_blockcount), 0, &bp,
+				NULL);
+		if (ret) {
+			dbprintf(_("%s: reading pos 0x%llx %s\n"), pbuf->path,
+					fd_pos, strerror(ret));
+			if (strict_errors)
+				return 1;
+			continue;
+		}
+
+		count = XFS_FSB_TO_B(mp, irec.br_blockcount);
+		if (fd_pos + count > isize)
+			count = isize - fd_pos;
+
+		buf_pos = 0;
+		while (count > 0) {
+			ssize_t	written;
+
+			written = pwrite(fd, bp->b_addr + buf_pos, count,
+					fd_pos);
+			if (written < 0) {
+				libxfs_buf_relse(bp);
+				dbprintf(_("%s%s%s: writing pos 0x%llx %s\n"),
+						destdir->path, destdir->sep,
+						pbuf->path, fd_pos,
+						strerror(errno));
+				return 1;
+			}
+			if (!written) {
+				libxfs_buf_relse(bp);
+				dbprintf(
+ _("%s%s%s: wrote zero at pos 0x%llx\n"),
+						destdir->path, destdir->sep,
+						pbuf->path, fd_pos);
+				return 1;
+			}
+
+			fd_pos += written;
+			buf_pos += written;
+			count -= written;
+		}
+
+		libxfs_buf_relse(bp);
+	}
+
+	ret = ftruncate(fd, isize);
+	if (ret) {
+		dbprintf(_("%s%s%s: setting file length 0x%llx %s\n"),
+				destdir->path, destdir->sep, pbuf->path,
+				isize, strerror(errno));
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Copy a regular file */
+static int
+rdump_regfile(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf)
+{
+	int			fd;
+	int			ret, ret2;
+
+	fd = openat(destdir->fd, pbuf->path, O_RDWR | O_CREAT | O_TRUNC, 0600);
+	if (fd < 0) {
+		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
+				pbuf->path, strerror(errno));
+		return 1;
+	}
+
+	ret = rdump_fileattrs_fd(ip, destdir, pbuf, fd);
+	if (ret && strict_errors)
+		goto out_close;
+
+	if (xfs_inode_has_attr_fork(ip)) {
+		ret = rdump_xattrs(tp, ip, destdir, pbuf, fd);
+		if (ret && strict_errors)
+			goto out_close;
+	}
+
+	ret = rdump_regfile_data(tp, ip, destdir, pbuf, fd);
+	if (ret && strict_errors)
+		goto out_close;
+
+	ret = rdump_timestamps_fd(ip, destdir, pbuf, fd);
+	if (ret && strict_errors)
+		goto out_close;
+
+out_close:
+	ret2 = close(fd);
+	if (ret2) {
+		if (!ret)
+			ret = ret2;
+		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
+				pbuf->path, strerror(errno));
+		return 1;
+	}
+
+	return ret;
+}
+
+/* Copy a symlink */
+static int
+rdump_symlink(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf)
+{
+	char			target[XFS_SYMLINK_MAXLEN + 1];
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
+	unsigned int		targetlen = ip->i_disk_size;
+	int			ret;
+
+	if (ifp->if_format == XFS_DINODE_FMT_LOCAL) {
+		memcpy(target, ifp->if_data, targetlen);
+	} else {
+		ret = -libxfs_symlink_remote_read(ip, target);
+		if (ret) {
+			dbprintf(_("%s: %s\n"), pbuf->path, strerror(ret));
+			return strict_errors ? 1 : 0;
+		}
+	}
+	target[targetlen] = 0;
+
+	ret = symlinkat(target, destdir->fd, pbuf->path);
+	if (ret) {
+		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
+				pbuf->path, strerror(errno));
+		return 1;
+	}
+
+	ret = rdump_fileattrs_path(ip, destdir, pbuf);
+	if (ret && strict_errors)
+		goto out;
+
+	ret = rdump_timestamps_path(ip, destdir, pbuf);
+	if (ret && strict_errors)
+		goto out;
+
+	ret = 0;
+out:
+	return ret;
+}
+
+/* Copy a special file */
+static int
+rdump_special(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf)
+{
+	const unsigned int	major = IRIX_DEV_MAJOR(VFS_I(ip)->i_rdev);
+	const unsigned int	minor = IRIX_DEV_MINOR(VFS_I(ip)->i_rdev);
+	int			ret;
+
+	ret = mknodat(destdir->fd, pbuf->path, VFS_I(ip)->i_mode & S_IFMT,
+			makedev(major, minor));
+	if (ret) {
+		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
+				pbuf->path, strerror(errno));
+		return 1;
+	}
+
+	ret = rdump_fileattrs_path(ip, destdir, pbuf);
+	if (ret && strict_errors)
+		goto out;
+
+	ret = rdump_timestamps_path(ip, destdir, pbuf);
+	if (ret && strict_errors)
+		goto out;
+
+	ret = 0;
+out:
+	return ret;
+}
+
+/* Dump some kind of file. */
+static int
+rdump_file(
+	struct xfs_trans	*tp,
+	xfs_ino_t		ino,
+	const struct destdir	*destdir,
+	struct pathbuf		*pbuf)
+{
+	struct xfs_inode	*ip;
+	int			ret;
+
+	ret = -libxfs_iget(mp, tp, ino, 0, &ip);
+	if (ret) {
+		dbprintf(_("%s: %s\n"), pbuf->path, strerror(ret));
+		return strict_errors ? ret : 0;
+	}
+
+	switch(VFS_I(ip)->i_mode & S_IFMT) {
+	case S_IFDIR:
+		ret = rdump_directory(tp, ip, destdir, pbuf);
+		break;
+	case S_IFREG:
+		ret = rdump_regfile(tp, ip, destdir, pbuf);
+		break;
+	case S_IFLNK:
+		ret = rdump_symlink(tp, ip, destdir, pbuf);
+		break;
+	default:
+		ret = rdump_special(tp, ip, destdir, pbuf);
+		break;
+	}
+
+	libxfs_irele(ip);
+	return ret;
+}
+
+/* Copy one path out of the filesystem. */
+static int
+rdump_path(
+	struct xfs_mount	*mp,
+	bool			sole_path,
+	const char		*path,
+	const struct destdir	*destdir)
+{
+	struct xfs_trans	*tp;
+	struct pathbuf		*pbuf;
+	const char		*basename;
+	ssize_t			pathlen = strlen(path);
+	int			ret = 1;
+
+	/* Set up destination path data */
+	if (pathlen > PATH_MAX) {
+		dbprintf(_("%s: %s\n"), path, strerror(ENAMETOOLONG));
+		return 1;
+	}
+
+	pbuf = calloc(1, sizeof(struct pathbuf));
+	if (!pbuf) {
+		dbprintf(_("allocating path buf: %s\n"), strerror(errno));
+		return 1;
+	}
+	basename = strrchr(path, '/');
+	if (basename) {
+		pbuf->len = pathlen - (basename + 1 - path);
+		memcpy(pbuf->path, basename + 1, pbuf->len);
+	} else {
+		pbuf->len = pathlen;
+		memcpy(pbuf->path, path, pbuf->len);
+	}
+	pbuf->path[pbuf->len] = 0;
+
+	/* Dump the inode referenced. */
+	if (pathlen) {
+		ret = path_walk(mp->m_sb.sb_rootino, path);
+		if (ret) {
+			dbprintf(_("%s: %s\n"), path, strerror(ret));
+			goto out_pbuf;
+		}
+
+		if (sole_path) {
+			struct xfs_dinode	*dip = iocur_top->data;
+
+			/*
+			 * If this is the only path to copy out and it's a dir,
+			 * then we can copy the children directly into the
+			 * target.
+			 */
+			if (S_ISDIR(be16_to_cpu(dip->di_mode))) {
+				pbuf->len = 0;
+				pbuf->path[0] = 0;
+			}
+		}
+	} else {
+		set_cur_inode(mp->m_sb.sb_rootino);
+	}
+
+	ret = -libxfs_trans_alloc_empty(mp, &tp);
+	if (ret) {
+		dbprintf(_("allocating state: %s\n"), strerror(ret));
+		goto out_pbuf;
+	}
+
+	ret = rdump_file(tp, iocur_top->ino, destdir, pbuf);
+	libxfs_trans_cancel(tp);
+out_pbuf:
+	free(pbuf);
+	return ret;
+}
+
+static int
+rdump_f(
+	int		argc,
+	char		*argv[])
+{
+	struct destdir	destdir;
+	int		i;
+	int		c;
+	int		ret;
+
+	lost_mask = 0;
+	strict_errors = false;
+	while ((c = getopt(argc, argv, "s")) != -1) {
+		switch (c) {
+		case 's':
+			strict_errors = true;
+			break;
+		default:
+			rdump_help();
+			return 0;
+		}
+	}
+
+	if (argc < optind + 1) {
+		dbprintf(
+ _("Must supply destination directory.\n"));
+		return 0;
+	}
+
+	/* Create and open destination directory */
+	destdir.path = argv[argc - 1];
+	ret = mkdir(destdir.path, 0755);
+	if (ret && errno != EEXIST) {
+		dbprintf(_("%s: %s\n"), destdir.path, strerror(errno));
+		exitcode = 1;
+		return 0;
+	}
+
+	if (destdir.path[0] == 0) {
+		dbprintf(
+ _("Destination dir must be at least one character.\n"));
+		exitcode = 1;
+		return 0;
+	}
+
+	if (destdir.path[strlen(destdir.path) - 1] != '/')
+		destdir.sep = "/";
+	else
+		destdir.sep = "";
+	destdir.fd = open(destdir.path, O_DIRECTORY | O_RDONLY);
+	if (destdir.fd < 0) {
+		dbprintf(_("%s: %s\n"), destdir.path, strerror(errno));
+		exitcode = 1;
+		return 0;
+	}
+
+	if (optind == argc - 1) {
+		/* no dirs given, just do the whole fs */
+		push_cur();
+		ret = rdump_path(mp, false, "", &destdir);
+		pop_cur();
+		if (ret)
+			exitcode = 1;
+		goto out_close;
+	}
+
+	for (i = optind; i < argc - 1; i++) {
+		size_t	len = strlen(argv[i]);
+
+		/* trim trailing slashes */
+		while (len && argv[i][len - 1] == '/')
+			len--;
+		argv[i][len] = 0;
+
+		push_cur();
+		ret = rdump_path(mp, argc == optind + 2, argv[i], &destdir);
+		pop_cur();
+
+		if (ret) {
+			exitcode = 1;
+			if (strict_errors)
+				break;
+		}
+	}
+
+out_close:
+	ret = close(destdir.fd);
+	if (ret) {
+		dbprintf(_("%s: %s\n"), destdir.path, strerror(errno));
+		exitcode = 1;
+	}
+
+	if (lost_mask & LOST_OWNER)
+		dbprintf(_("%s: some uid/gid could not be set\n"),
+				destdir.path);
+	if (lost_mask & LOST_MODE)
+		dbprintf(_("%s: some file modes could not be set\n"),
+				destdir.path);
+	if (lost_mask & LOST_TIME)
+		dbprintf(_("%s: some timestamps could not be set\n"),
+				destdir.path);
+	if (lost_mask & LOST_SOME_FSXATTR)
+		dbprintf(_("%s: some xfs file attr bits could not be set\n"),
+				destdir.path);
+	if (lost_mask & LOST_FSXATTR)
+		dbprintf(_("%s: some xfs file attrs could not be set\n"),
+				destdir.path);
+	if (lost_mask & LOST_XATTR)
+		dbprintf(_("%s: some extended attributes could not be set\n"),
+				destdir.path);
+
+	return 0;
+}
+
+static struct cmdinfo rdump_cmd = {
+	.name		= "rdump",
+	.cfunc		= rdump_f,
+	.argmin		= 0,
+	.argmax		= -1,
+	.canpush	= 0,
+	.args		= "[-s] [paths...] dest_directory",
+	.help		= rdump_help,
+};
+
+void
+rdump_init(void)
+{
+	rdump_cmd.oneline = _("recover files out of a filesystem");
+	add_command(&rdump_cmd);
+}
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 530feef2a47db8..14a67c8c24dd7e 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -47,6 +47,7 @@
 #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
 #define xfs_attr_namecheck		libxfs_attr_namecheck
 #define xfs_attr_removename		libxfs_attr_removename
+#define xfs_attr_rmtval_get		libxfs_attr_rmtval_get
 #define xfs_attr_set			libxfs_attr_set
 #define xfs_attr_sethash		libxfs_attr_sethash
 #define xfs_attr_sf_firstentry		libxfs_attr_sf_firstentry
@@ -353,6 +354,7 @@
 #define xfs_sb_version_to_features	libxfs_sb_version_to_features
 #define xfs_symlink_blocks		libxfs_symlink_blocks
 #define xfs_symlink_hdr_ok		libxfs_symlink_hdr_ok
+#define xfs_symlink_remote_read		libxfs_symlink_remote_read
 #define xfs_symlink_write_target	libxfs_symlink_write_target
 
 #define xfs_trans_add_item		libxfs_trans_add_item
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 08f38f37ca01cc..2a9322560584b0 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -1132,6 +1132,32 @@ .SH COMMANDS
 Exit
 .BR xfs_db .
 .TP
+.BI "rdump [-s] [" "paths..." "] " "destination_dir"
+Recover the files given by the
+.B path
+arguments by copying them out of the filesystem into the directory specified in
+.BR destination_dir .
+
+If the
+.B -s
+option is specified, errors are fatal.
+By default, read errors are ignored in favor of recovering as much data from
+the filesystem as possible.
+
+If zero
+.B paths
+are specified, the entire filesystem is dumped.
+If only one
+.B path
+is specified and it is a directory, the children of that directory will be
+copied directly to the destination.
+If multiple
+.B paths
+are specified, each file is copied into the directory as a new child.
+
+If possible, sparse holes, xfs file attributes, and extended attributes will be
+preserved.
+.TP
 .BI "rgresv [" rgno ]
 Displays the per-rtgroup reservation size, and per-rtgroup
 reservation usage for a given realtime allocation group.
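
The path handling in rdump_path() above boils down to a small naming rule:
an empty path or a sole directory argument is copied straight into the
destination, and anything else is copied under its last path component.
The sketch below restates that rule as a standalone C function.
demo_dest_name() and its is_dir parameter are hypothetical names used only
for illustration; the patch itself reads the inode mode from iocur_top.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: compute the name that a
 * dumped path would get underneath the destination directory.
 *
 * - An empty path means "the whole filesystem", so the name is empty.
 * - A sole directory argument has its children copied directly into the
 *   destination, so the name is empty as well.
 * - Otherwise the name is the last component of the path.
 *
 * Returns 0 on success or -1 if the name does not fit in buf.
 */
static int
demo_dest_name(
	const char	*path,
	bool		sole_path,
	bool		is_dir,
	char		*buf,
	size_t		buflen)
{
	const char	*slash = strrchr(path, '/');
	const char	*name = slash ? slash + 1 : path;
	size_t		len = strlen(name);

	/* A sole directory is flattened into the destination. */
	if (sole_path && is_dir)
		len = 0;

	/* Leave room for the terminating NUL. */
	if (len >= buflen)
		return -1;

	memcpy(buf, name, len);
	buf[len] = '\0';
	return 0;
}
```

Under these assumptions, "a/b/c" as one of several arguments is written
to dest/c, while a sole argument "a/b" that is a directory has its children
written directly into dest.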


^ permalink raw reply related	[flat|nested] 225+ messages in thread

* Re: [PATCH 01/17] libxfs: unmap xmbuf pages to avoid disaster
  2025-02-06 22:30   ` [PATCH 01/17] libxfs: unmap xmbuf pages to avoid disaster Darrick J. Wong
@ 2025-02-07  4:10     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 03/17] man: document new XFS_BULK_IREQ_METADIR flag to bulkstat
  2025-02-06 22:31   ` [PATCH 03/17] man: document new XFS_BULK_IREQ_METADIR flag to bulkstat Darrick J. Wong
@ 2025-02-07  4:10     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 04/17] libfrog: wrap handle construction code
  2025-02-06 22:31   ` [PATCH 04/17] libfrog: wrap handle construction code Darrick J. Wong
@ 2025-02-07  4:34     ` Christoph Hellwig
  2025-02-07  4:49       ` Darrick J. Wong
  2025-02-20 21:36     ` [PATCH v1.1] " Darrick J. Wong
  1 sibling, 1 reply; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

On Thu, Feb 06, 2025 at 02:31:41PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Clean up all the open-coded logic to construct a file handle from a
> fshandle and some bulkstat/parent pointer information.  The new
> functions are stashed in a private header file to avoid leaking the
> details of xfs_handle construction in the public libhandle headers.

So creating a handle from bulkstat is a totally normal thing to do
for xfs-aware applications.  I'd much rather see this in libhandle
than hiding it away.

> +		handle_from_fshandle(&handle, file->fshandle, file->fshandle_len);

Nit: overly long line.
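
For readers unfamiliar with the operation under discussion, here is a rough
sketch of building a file handle from a filesystem handle plus the inode
number and generation that bulkstat reports.  The demo_* structures are
simplified stand-ins written for this note, not the real libhandle or
kernel definitions, and demo_handle_from_ino() is a hypothetical name
rather than the function added by the patch.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins; the real layouts live in the XFS headers. */
struct demo_fsid {
	uint32_t	val[2];
};

struct demo_fid {
	uint16_t	fid_len;	/* length of the rest of the fid */
	uint16_t	fid_pad;
	uint32_t	fid_gen;	/* inode generation */
	uint64_t	fid_ino;	/* inode number */
};

struct demo_handle {
	struct demo_fsid	ha_fsid;
	struct demo_fid		ha_fid;
};

/*
 * Fill out a file handle from a filesystem handle and the inode
 * number/generation pair.  Returns 0 on success, or -1 if the
 * filesystem handle has an unexpected size.
 */
static int
demo_handle_from_ino(
	struct demo_handle	*handle,
	const void		*fshandle,
	size_t			fshandle_len,
	uint64_t		ino,
	uint32_t		gen)
{
	if (fshandle_len != sizeof(handle->ha_fsid))
		return -1;

	memset(handle, 0, sizeof(*handle));
	memcpy(&handle->ha_fsid, fshandle, sizeof(handle->ha_fsid));
	handle->ha_fid.fid_len = sizeof(handle->ha_fid) -
				 sizeof(handle->ha_fid.fid_len);
	handle->ha_fid.fid_ino = ino;
	handle->ha_fid.fid_gen = gen;
	return 0;
}
```

Keeping this in one helper means callers no longer need to know how the
fid length is derived, which is the detail the open-coded copies repeated.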


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 05/17] xfs_scrub: don't report data loss in unlinked inodes twice
  2025-02-06 22:31   ` [PATCH 05/17] xfs_scrub: don't report data loss in unlinked inodes twice Darrick J. Wong
@ 2025-02-07  4:35     ` Christoph Hellwig
  2025-02-07  4:46       ` Darrick J. Wong
  0 siblings, 1 reply; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

On Thu, Feb 06, 2025 at 02:31:57PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> If parent pointers are enabled, report_ioerr_fsmap will report lost file
> data and xattrs for all files, having used the parent pointer ioctls to
> generate the path of the lost file.  For unlinked files, the path lookup
> will fail, but we'll report the inumber of the file that lost data.

Maybe also add this to the code as a comment?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 06/17] xfs_scrub: call bulkstat directly if we're only scanning user files
  2025-02-06 22:32   ` [PATCH 06/17] xfs_scrub: call bulkstat directly if we're only scanning user files Darrick J. Wong
@ 2025-02-07  4:37     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

> +/* Iterate all the user files returned by a bulkstat. */
> +static void
> +scan_user_files(
> +	struct workqueue	*wq,
> +	xfs_agnumber_t		agno,
> +	void			*arg)
> +{
> +	struct xfs_handle	handle;
> +	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;

No need for the cast - wq_ctx is a void pointer.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
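
The remark about the cast relies on a C language rule: a void pointer
converts implicitly to any object pointer type.  A minimal sketch, using
stand-in types invented for this note (demo_ctx and demo_workqueue are not
the xfs_scrub definitions):

```c
#include <stddef.h>

/* Stand-in types for illustration only. */
struct demo_ctx {
	int	nr_threads;
};

struct demo_workqueue {
	void	*wq_ctx;	/* opaque context supplied by the caller */
};

/* Read a field through the opaque context pointer. */
static int
demo_nr_threads(
	struct demo_workqueue	*wq)
{
	/* void * converts implicitly in C, so no cast is required. */
	struct demo_ctx		*ctx = wq->wq_ctx;

	return ctx->nr_threads;
}
```

Note that this holds for C only; C++ would require an explicit cast here.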

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 07/17] xfs_scrub: remove flags argument from scrub_scan_all_inodes
  2025-02-06 22:32   ` [PATCH 07/17] xfs_scrub: remove flags argument from scrub_scan_all_inodes Darrick J. Wong
@ 2025-02-07  4:38     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 08/17] xfs_scrub: selectively re-run bulkstat after re-running inumbers
  2025-02-06 22:32   ` [PATCH 08/17] xfs_scrub: selectively re-run bulkstat after re-running inumbers Darrick J. Wong
@ 2025-02-07  4:39     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 09/17] xfs_scrub: actually iterate all the bulkstat records
  2025-02-06 22:33   ` [PATCH 09/17] xfs_scrub: actually iterate all the bulkstat records Darrick J. Wong
@ 2025-02-07  4:40     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

On Thu, Feb 06, 2025 at 02:33:01PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> In scan_ag_bulkstat, we have a for loop that iterates all the
> xfs_bulkstat records in breq->bulkstat.  The loop condition test should
> test against the array length, not the number of bits set in an
> unrelated data structure.  If ocount > xi_alloccount then we miss some
> inodes; if ocount < xi_alloccount then we've walked off the end of the
> array.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
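
The bug class described in the commit message can be shown in a few lines.
In the sketch below, demo_bulkstat and demo_sum_inos() are hypothetical
stand-ins; the point is only that the loop is bounded by the number of
records actually returned (ocount) and not by a count taken from some other
structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a bulkstat record; only the inode number matters here. */
struct demo_bulkstat {
	uint64_t	bs_ino;
};

/*
 * Sum the inode numbers of the records that were actually filled in.
 * Bounding the loop by ocount avoids both failure modes: skipping
 * valid records and reading past the end of the filled region.
 */
static uint64_t
demo_sum_inos(
	const struct demo_bulkstat	*recs,
	size_t				ocount)
{
	uint64_t			sum = 0;
	size_t				i;

	for (i = 0; i < ocount; i++)
		sum += recs[i].bs_ino;
	return sum;
}
```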


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 10/17] xfs_scrub: don't double-scan inodes during phase 3
  2025-02-06 22:33   ` [PATCH 10/17] xfs_scrub: don't double-scan inodes during phase 3 Darrick J. Wong
@ 2025-02-07  4:41     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 11/17] xfs_scrub: don't (re)set the bulkstat request icount incorrectly
  2025-02-06 22:33   ` [PATCH 11/17] xfs_scrub: don't (re)set the bulkstat request icount incorrectly Darrick J. Wong
@ 2025-02-07  4:42     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 12/17] xfs_scrub: don't complain if bulkstat fails
  2025-02-06 22:33   ` [PATCH 12/17] xfs_scrub: don't complain if bulkstat fails Darrick J. Wong
@ 2025-02-07  4:42     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 13/17] xfs_scrub: return early from bulkstat_for_inumbers if no bulkstat data
  2025-02-06 22:34   ` [PATCH 13/17] xfs_scrub: return early from bulkstat_for_inumbers if no bulkstat data Darrick J. Wong
@ 2025-02-07  4:43     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

On Thu, Feb 06, 2025 at 02:34:04PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> If bulkstat doesn't return an error code or any bulkstat records, we've
> hit the end of the filesystem, so return early.  This can happen if the
> inumbers data came from the very last inobt record in the filesystem and
> every inode in that inobt record is freed immediately after INUMBERS.
> There's no bug here, it's just a minor optimization.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 05/17] xfs_scrub: don't report data loss in unlinked inodes twice
  2025-02-07  4:35     ` Christoph Hellwig
@ 2025-02-07  4:46       ` Darrick J. Wong
  0 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-07  4:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aalbersh, linux-xfs

On Thu, Feb 06, 2025 at 08:35:34PM -0800, Christoph Hellwig wrote:
> On Thu, Feb 06, 2025 at 02:31:57PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > If parent pointers are enabled, report_ioerr_fsmap will report lost file
> > data and xattrs for all files, having used the parent pointer ioctls to
> > generate the path of the lost file.  For unlinked files, the path lookup
> > will fail, but we'll report the inumber of the file that lost data.
> 
> Maybe also add this to the code as a comment?

Will do.

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 14/17] xfs_scrub: don't blow away new inodes in bulkstat_single_step
  2025-02-06 22:34   ` [PATCH 14/17] xfs_scrub: don't blow away new inodes in bulkstat_single_step Darrick J. Wong
@ 2025-02-07  4:46     ` Christoph Hellwig
  2025-02-07  4:50       ` Darrick J. Wong
  0 siblings, 1 reply; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

I think this commit is the best indication the inumbers + bulkstat
game was a really bad idea from the start, and we should have instead
added a bulkstat flag to return records for corrupted inodes.

I'll cook up a patch for that, but in the meantime we'll need
workarounds like this one.

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 15/17] xfs_scrub: hoist the phase3 bulkstat single stepping code
  2025-02-06 22:34   ` [PATCH 15/17] xfs_scrub: hoist the phase3 bulkstat single stepping code Darrick J. Wong
@ 2025-02-07  4:47     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 16/17] xfs_scrub: ignore freed inodes when single-stepping during phase 3
  2025-02-06 22:34   ` [PATCH 16/17] xfs_scrub: ignore freed inodes when single-stepping during phase 3 Darrick J. Wong
@ 2025-02-07  4:47     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 17/17] xfs_scrub: try harder to fill the bulkstat array with bulkstat()
  2025-02-06 22:35   ` [PATCH 17/17] xfs_scrub: try harder to fill the bulkstat array with bulkstat() Darrick J. Wong
@ 2025-02-07  4:48     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  4:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 04/17] libfrog: wrap handle construction code
  2025-02-07  4:34     ` Christoph Hellwig
@ 2025-02-07  4:49       ` Darrick J. Wong
  2025-02-07 17:00         ` Darrick J. Wong
  0 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-07  4:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aalbersh, hch, linux-xfs

On Thu, Feb 06, 2025 at 08:34:36PM -0800, Christoph Hellwig wrote:
> On Thu, Feb 06, 2025 at 02:31:41PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Clean up all the open-coded logic to construct a file handle from a
> > fshandle and some bulkstat/parent pointer information.  The new
> > functions are stashed in a private header file to avoid leaking the
> > details of xfs_handle construction in the public libhandle headers.
> 
> So creating a handle from bulkstat is a totally normal thing to do
> for xfs-aware applications.  I'd much rather see this in libhandle
> than hiding it away.

I was going to protest that even xfsdump doesn't construct its own weird
handle, but Debian codesearch says that Ganesha does it, so I'll move it
to actual libhandle.

> > +		handle_from_fshandle(&handle, file->fshandle, file->fshandle_len);
> 
> Nit: overly long line.

Will fix.

--D

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 14/17] xfs_scrub: don't blow away new inodes in bulkstat_single_step
  2025-02-07  4:46     ` Christoph Hellwig
@ 2025-02-07  4:50       ` Darrick J. Wong
  0 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-07  4:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aalbersh, linux-xfs

On Thu, Feb 06, 2025 at 08:46:34PM -0800, Christoph Hellwig wrote:
> I think this commit is the best indication the inumbers + bulkstat
> game was a really bad idea from the start, and we should have instead
> added a bulkstat flag to return records for corrupted inodes.

Yeah, I wish I'd thought of that sooner. :/

> I'll cook up a patch for that, but in the meantime we'll need
> workarounds like this one.

But I'm looking forward to this.

> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 01/27] libxfs: compute the rt rmap btree maxlevels during initialization
  2025-02-06 22:49   ` [PATCH 01/27] libxfs: compute the rt rmap btree maxlevels during initialization Darrick J. Wong
@ 2025-02-07  5:07     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 02/27] libxfs: add a realtime flag to the rmap update log redo items
  2025-02-06 22:50   ` [PATCH 02/27] libxfs: add a realtime flag to the rmap update log redo items Darrick J. Wong
@ 2025-02-07  5:08     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 03/27] libfrog: enable scrubbing of the realtime rmap
  2025-02-06 22:50   ` [PATCH 03/27] libfrog: enable scrubbing of the realtime rmap Darrick J. Wong
@ 2025-02-07  5:08     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

On Thu, Feb 06, 2025 at 02:50:30PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Add a new entry so that we can scrub the rtrmapbt and its metadata
> directory tree path too.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 04/27] man: document userspace API changes due to rt rmap
  2025-02-06 22:50   ` [PATCH 04/27] man: document userspace API changes due to rt rmap Darrick J. Wong
@ 2025-02-07  5:08     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

On Thu, Feb 06, 2025 at 02:50:46PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Update documentation to describe userspace ABI changes made for realtime
> rmap support.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 05/27] xfs_db: compute average btree height
  2025-02-06 22:51   ` [PATCH 05/27] xfs_db: compute average btree height Darrick J. Wong
@ 2025-02-07  5:09     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 06/27] xfs_db: don't abort when bmapping on a non-extents/bmbt fork
  2025-02-06 22:51   ` [PATCH 06/27] xfs_db: don't abort when bmapping on a non-extents/bmbt fork Darrick J. Wong
@ 2025-02-07  5:09     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 07/27] xfs_db: display the realtime rmap btree contents
  2025-02-06 22:51   ` [PATCH 07/27] xfs_db: display the realtime rmap btree contents Darrick J. Wong
@ 2025-02-07  5:16     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

On Thu, Feb 06, 2025 at 02:51:33PM -0800, Darrick J. Wong wrote:
> +static int
> +rtrmaproot_rec_count(
> +	void			*obj,
> +	int			startoff)
> +{
> +	struct xfs_rtrmap_root	*block;
> +#ifdef DEBUG
> +	struct xfs_dinode	*dip = obj;
> +#endif
> +
> +	ASSERT(bitoffs(startoff) == 0);
> +	ASSERT(obj == iocur_top->data);
> +	block = (struct xfs_rtrmap_root *)((char *)obj + byteize(startoff));
> +	ASSERT((char *)block == XFS_DFORK_DPTR(dip));

Hmm, wouldn't this be cleaner by turning the logic around:

	struct xfs_dinode	*dip = obj;
	struct xfs_rtrmap_root	*block = XFS_DFORK_DPTR(dip);

	...

	ASSERT(block == obj + byteize(startoff));

Same for the other uses of this pattern.

Otherwise looks good (as good as xfs_db can look like anyway..)

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 08/27] xfs_db: support the realtime rmapbt
  2025-02-06 22:51   ` [PATCH 08/27] xfs_db: support the realtime rmapbt Darrick J. Wong
@ 2025-02-07  5:17     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:17 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 09/27] xfs_db: copy the realtime rmap btree
  2025-02-06 22:52   ` [PATCH 09/27] xfs_db: copy the realtime rmap btree Darrick J. Wong
@ 2025-02-07  5:17     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:17 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 10/27] xfs_db: make fsmap query the realtime reverse mapping tree
  2025-02-06 22:52   ` [PATCH 10/27] xfs_db: make fsmap query the realtime reverse mapping tree Darrick J. Wong
@ 2025-02-07  5:18     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 11/27] xfs_db: add an rgresv command
  2025-02-06 22:52   ` [PATCH 11/27] xfs_db: add an rgresv command Darrick J. Wong
@ 2025-02-07  5:19     ` Christoph Hellwig
  2025-02-07  5:24       ` Darrick J. Wong
  0 siblings, 1 reply; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

On Thu, Feb 06, 2025 at 02:52:36PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Create a command to dump rtgroup btree space reservations.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Btw, for rtinherit=1 file systems a command to dump the available
free space on the data device would also be really helpful.  But
I guess that's more a thing for the kernel as an ioctl that works
on the live file system.


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 12/27] xfs_spaceman: report health status of the realtime rmap btree
  2025-02-06 22:52   ` [PATCH 12/27] xfs_spaceman: report health status of the realtime rmap btree Darrick J. Wong
@ 2025-02-07  5:19     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 13/27] xfs_repair: tidy up rmap_diffkeys
  2025-02-06 22:53   ` [PATCH 13/27] xfs_repair: tidy up rmap_diffkeys Darrick J. Wong
@ 2025-02-07  5:20     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

On Thu, Feb 06, 2025 at 02:53:07PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Tidy up the comparison code in this function to match the kernel.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Or do you want to use the fancy cmp_int() helper?
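
For context, a three-way comparison helper of the kind mentioned here is
usually written without subtraction, so that wide unsigned values cannot
produce a wrong sign when truncated to int.  The sketch below assumes that
form; demo_cmp_int and demo_key are names made up for this note and may not
match the actual helper or the rmap key layout.

```c
#include <stdint.h>

/*
 * Three-way comparison: evaluates to -1, 0 or 1.  This is an assumed
 * form of such a helper, named differently to mark it as a stand-in.
 */
#define demo_cmp_int(l, r)	(((l) > (r)) - ((l) < (r)))

/* Hypothetical two-field key, compared field by field. */
struct demo_key {
	uint64_t	owner;
	uint64_t	offset;
};

/* Order keys by owner first, then by offset. */
static int
demo_key_cmp(
	const struct demo_key	*a,
	const struct demo_key	*b)
{
	int			d = demo_cmp_int(a->owner, b->owner);

	if (d)
		return d;
	return demo_cmp_int(a->offset, b->offset);
}
```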


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 14/27] xfs_repair: flag suspect long-format btree blocks
  2025-02-06 22:53   ` [PATCH 14/27] xfs_repair: flag suspect long-format btree blocks Darrick J. Wong
@ 2025-02-07  5:21     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 15/27] xfs_repair: use realtime rmap btree data to check block types
  2025-02-06 22:53   ` [PATCH 15/27] xfs_repair: use realtime rmap btree data to check block types Darrick J. Wong
@ 2025-02-07  5:22     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 16/27] xfs_repair: create a new set of incore rmap information for rt groups
  2025-02-06 22:53   ` [PATCH 16/27] xfs_repair: create a new set of incore rmap information for rt groups Darrick J. Wong
@ 2025-02-07  5:22     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 17/27] xfs_repair: refactor realtime inode check
  2025-02-06 22:54   ` [PATCH 17/27] xfs_repair: refactor realtime inode check Darrick J. Wong
@ 2025-02-07  5:22     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 11/27] xfs_db: add an rgresv command
  2025-02-07  5:19     ` Christoph Hellwig
@ 2025-02-07  5:24       ` Darrick J. Wong
  0 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-07  5:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aalbersh, linux-xfs

On Thu, Feb 06, 2025 at 09:19:35PM -0800, Christoph Hellwig wrote:
> On Thu, Feb 06, 2025 at 02:52:36PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Create a command to dump rtgroup btree space reservations.
> 
> Looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Btw, for rtinherit=1 file systems a command to dump the available
> free space on the data device would also be really helpful.  But
> I guess that's more a thing for the kernel as an ioctl that works
> on the live file system.

We could report the reservations via the existing ag/rg/fs geometry
ioctls.

--D

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 18/27] xfs_repair: find and mark the rtrmapbt inodes
  2025-02-06 22:54   ` [PATCH 18/27] xfs_repair: find and mark the rtrmapbt inodes Darrick J. Wong
@ 2025-02-07  5:33     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

On Thu, Feb 06, 2025 at 02:54:25PM -0800, Darrick J. Wong wrote:
> -extern void rmap_avoid_check(void);
> +extern void rmap_avoid_check(struct xfs_mount *mp);

You might want to drop the extern while you're at it.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 19/27] xfs_repair: check existing realtime rmapbt entries against observed rmaps
  2025-02-06 22:54   ` [PATCH 19/27] xfs_repair: check existing realtime rmapbt entries against observed rmaps Darrick J. Wong
@ 2025-02-07  5:35     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

On Thu, Feb 06, 2025 at 02:54:41PM -0800, Darrick J. Wong wrote:
> +	struct xfs_rmap_irec	rmap = {
> +		.rm_startblock	= 0,
> +		.rm_blockcount	= mp->m_sb.sb_rextsize,
> +		.rm_owner	= XFS_RMAP_OWN_FS,
> +		.rm_offset	= 0,
> +		.rm_flags	= 0,
> +	};

No real need for the zero initializations.

> +
> +	if (!rmap_needs_work(mp))
> +		return;
> +
> +	if (xfs_has_rtsb(mp) && rgno == 0)
> +		rmap_add_mem_rec(mp, true, rgno, &rmap);

And it might be a tad cleaner to move the variable into the branch
scope here.

Nitpicks aside, this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
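
Both nitpicks can be illustrated together.  With a designated initializer,
members that are not named are zero-initialized, so the explicit zeroes are
redundant; and declaring the record inside the branch keeps it scoped to
its only user.  The sketch uses a stand-in structure and a made-up owner
constant (demo_rmap, DEMO_OWN_FS), not the real xfs_rmap_irec.

```c
#include <stdint.h>

#define DEMO_OWN_FS	((uint64_t)-1)	/* made-up owner code */

/* Stand-in for an rmap record. */
struct demo_rmap {
	uint64_t	startblock;
	uint64_t	blockcount;
	uint64_t	owner;
	uint64_t	offset;
	uint32_t	flags;
};

/*
 * Return the first block past the superblock reservation in group 0,
 * or 0 if there is no such reservation.
 */
static uint64_t
demo_reserved_end(
	int		has_rtsb,
	unsigned int	rgno,
	uint64_t	rextsize)
{
	if (has_rtsb && rgno == 0) {
		/* startblock, offset and flags are implicitly zero. */
		struct demo_rmap	rmap = {
			.blockcount	= rextsize,
			.owner		= DEMO_OWN_FS,
		};

		return rmap.startblock + rmap.blockcount;
	}
	return 0;
}
```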

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 20/27] xfs_repair: always check realtime file mappings against incore info
  2025-02-06 22:54   ` [PATCH 20/27] xfs_repair: always check realtime file mappings against incore info Darrick J. Wong
@ 2025-02-07  5:35     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 21/27] xfs_repair: rebuild the realtime rmap btree
  2025-02-06 22:55   ` [PATCH 21/27] xfs_repair: rebuild the realtime rmap btree Darrick J. Wong
@ 2025-02-07  5:54     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:54 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 22/27] xfs_repair: check for global free space concerns with default btree slack levels
  2025-02-06 22:55   ` [PATCH 22/27] xfs_repair: check for global free space concerns with default btree slack levels Darrick J. Wong
@ 2025-02-07  5:56     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 23/27] xfs_repair: rebuild the bmap btree for realtime files
  2025-02-06 22:55   ` [PATCH 23/27] xfs_repair: rebuild the bmap btree for realtime files Darrick J. Wong
@ 2025-02-07  5:56     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 24/27] xfs_repair: reserve per-AG space while rebuilding rt metadata
  2025-02-06 22:55   ` [PATCH 24/27] xfs_repair: reserve per-AG space while rebuilding rt metadata Darrick J. Wong
@ 2025-02-07  5:57     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

On Thu, Feb 06, 2025 at 02:55:59PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Realtime metadata btrees can consume quite a bit of space on a full
> filesystem.  Since the metadata are just regular files, we need to

I think there is an "inodes" or "btrees" missing after "metadata" above.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 25/27] xfs_logprint: report realtime RUIs
  2025-02-06 22:56   ` [PATCH 25/27] xfs_logprint: report realtime RUIs Darrick J. Wong
@ 2025-02-07  5:58     ` Christoph Hellwig
  2025-02-07  5:58     ` Christoph Hellwig
  1 sibling, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 26/27] mkfs: add some rtgroup inode helpers
  2025-02-06 22:56   ` [PATCH 26/27] mkfs: add some rtgroup inode helpers Darrick J. Wong
@ 2025-02-07  5:59     ` Christoph Hellwig
  2025-02-07  6:07       ` Darrick J. Wong
  0 siblings, 1 reply; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  5:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

On Thu, Feb 06, 2025 at 02:56:30PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Create some simple helpers to reduce the amount of typing whenever we
> access rtgroup inodes.  Conversion was done with this spatch and some
> minor reformatting:

Shouldn't this just be folded into the patch adding the line touched?


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 27/27] mkfs: create the realtime rmap inode
  2025-02-06 22:56   ` [PATCH 27/27] mkfs: create the realtime rmap inode Darrick J. Wong
@ 2025-02-07  6:00     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-07  6:00 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 26/27] mkfs: add some rtgroup inode helpers
  2025-02-07  5:59     ` Christoph Hellwig
@ 2025-02-07  6:07       ` Darrick J. Wong
  0 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-07  6:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aalbersh, linux-xfs

On Thu, Feb 06, 2025 at 09:59:46PM -0800, Christoph Hellwig wrote:
> On Thu, Feb 06, 2025 at 02:56:30PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Create some simple helpers to reduce the amount of typing whenever we
> > access rtgroup inodes.  Conversion was done with this spatch and some
> > minor reformatting:
> 
> Shouldn't this just be folded into the patch adding the line touched?

Yeah, I'll go do that.

--D

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 04/17] libfrog: wrap handle construction code
  2025-02-07  4:49       ` Darrick J. Wong
@ 2025-02-07 17:00         ` Darrick J. Wong
  2025-02-13  4:05           ` Christoph Hellwig
  0 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-07 17:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aalbersh, hch, linux-xfs

On Thu, Feb 06, 2025 at 08:49:22PM -0800, Darrick J. Wong wrote:
> On Thu, Feb 06, 2025 at 08:34:36PM -0800, Christoph Hellwig wrote:
> > On Thu, Feb 06, 2025 at 02:31:41PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Clean up all the open-coded logic to construct a file handle from a
> > > fshandle and some bulkstat/parent pointer information.  The new
> > > functions are stashed in a private header file to avoid leaking the
> > > details of xfs_handle construction in the public libhandle headers.
> > 
> > So creating a handle from bulkstat is a totally normal thing to do
> > for xfs-aware applications.  I'd much rather see this in libhandle
> > than hiding it away.
> 
> I was going to protest that even xfsdump doesn't construct its own weird
> handle, but Debian codesearch says that Ganesha does it, so I'll move it
> to actual libhandle.

I tried moving the code to libhandle, but I don't entirely like the
result.  The libhandle functions pass around handles as arbitrary binary
blobs that come from and are sent to the kernel, meaning that the
interface is full of (void *, size_t) tuples.  Putting these new
functions in libhandle breaks that abstraction because now clients know
that they can deal with a struct xfs_handle.

We could fix that leak by changing it to a (void *, size_t) tuple, but
then we'd have to validate the size_t and return -1 having set errno,
which then means that all the client code now has to have error handling
for a case that we're fairly sure can't be true.  This is overkill for
xfsprogs code that knows better, because we can trust ourselves to know
the exact layout of a handle.

So this nice compact code:

	memcpy(&handle.ha_fsid, file->fshandle, file->fshandle_len);
	handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
			sizeof(handle.ha_fid.fid_len);

becomes:

	ret = handle_from_fshandle(&handle, file->fshandle,
			file->fshandle_len);
	if (ret) {
		perror("what?");
		return -1;
	}

Which is much more verbose code, and right now it exists to handle an
exceptional condition that is not possible.  If someone outside of
xfsprogs would like this sort of functionality in libhandle I'm all for
adding it, but with zero demand from external users, I prefer to keep
things simple.
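
As a rough illustration of the trade-off described above, the two
interface shapes might look like the sketch below.  Everything here is an
assumption made for illustration: the demo_* structs only mimic the
general shape of a handle (an fsid followed by a fid), their sizes are
not the real xfs_handle layout, and neither function is an actual
libhandle or libfrog API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; field names echo the snippet quoted above. */
struct demo_fsid {
	uint32_t	val[2];
};

struct demo_fid {
	uint16_t	fid_len;
	uint16_t	fid_pad;
	uint32_t	fid_gen;
	uint64_t	fid_ino;
};

struct demo_handle {
	struct demo_fsid	ha_fsid;
	struct demo_fid		ha_fid;
};

/*
 * Typed variant: the caller knows the exact layout, so there is no
 * error to report and nothing for the caller to check.
 */
static void
demo_handle_from_fshandle(
	struct demo_handle	*handle,
	const void		*fshandle,
	size_t			fshandle_len)
{
	/* Debug-only sanity check; trusted callers always pass an fsid. */
	assert(fshandle_len == sizeof(handle->ha_fsid));

	memset(handle, 0, sizeof(*handle));
	memcpy(&handle->ha_fsid, fshandle, fshandle_len);
	handle->ha_fid.fid_len = sizeof(struct demo_fid) -
			sizeof(handle->ha_fid.fid_len);
}

/*
 * Opaque-blob variant: handles stay (void *, size_t) tuples, so the
 * sizes must be validated and every caller must handle a failure.
 */
static int
demo_handle_from_fshandle_blob(
	void		*hanp,
	size_t		hlen,
	const void	*fshandle,
	size_t		fshandle_len)
{
	struct demo_handle	handle;

	if (hlen != sizeof(handle) ||
	    fshandle_len != sizeof(handle.ha_fsid)) {
		errno = EINVAL;
		return -1;
	}

	demo_handle_from_fshandle(&handle, fshandle, fshandle_len);
	memcpy(hanp, &handle, sizeof(handle));
	return 0;
}
```

The second form keeps the handle opaque, at the cost of an error path
that in-tree callers would never expect to take.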

For now I'm leaving the declarations as taking a pointer to mutable
struct xfs_handle.  This change also causes the libhandle version number
to jump from 1.0.3 to 1.1.0.

--D

> > > +		handle_from_fshandle(&handle, file->fshandle, file->fshandle_len);
> > 
> > Nit: overly long line.
> 
> Will fix.
> 
> --D
> 

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 04/17] libfrog: wrap handle construction code
  2025-02-07 17:00         ` Darrick J. Wong
@ 2025-02-13  4:05           ` Christoph Hellwig
  2025-02-13  4:20             ` Darrick J. Wong
  0 siblings, 1 reply; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, aalbersh, hch, linux-xfs

On Fri, Feb 07, 2025 at 09:00:02AM -0800, Darrick J. Wong wrote:
> Which is much more verbose code, and right now it exists to handle an
> exceptional condition that is not possible.  If someone outside of
> xfsprogs would like this sort of functionality in libhandle I'm all for
> adding it, but with zero demand from external users, I prefer to keep
> things simple.

Ok, let's keep things simple then for now.  If we need an external
version we can still add it later.  And for future readers curious
about this, maybe add this explanation to the commit?


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 04/17] libfrog: wrap handle construction code
  2025-02-13  4:05           ` Christoph Hellwig
@ 2025-02-13  4:20             ` Darrick J. Wong
  0 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-13  4:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aalbersh, hch, linux-xfs

On Wed, Feb 12, 2025 at 08:05:55PM -0800, Christoph Hellwig wrote:
> On Fri, Feb 07, 2025 at 09:00:02AM -0800, Darrick J. Wong wrote:
> > Which is much more verbose code, and right now it exists to handle an
> > exceptional condition that is not possible.  If someone outside of
> > xfsprogs would like this sort of functionality in libhandle I'm all for
> > adding it, but with zero demand from external users, I prefer to keep
> > things simple.
> 
> Ok, let's keep things simple then for now.  If we need an external
> version we can still add it later.  And for future readers curious
> about this, maybe add this explanation to the commit?

Sounds good, I'll amend the commit message and repost.

--D

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 01/22] libxfs: compute the rt refcount btree maxlevels during initialization
  2025-02-06 22:57   ` [PATCH 01/22] libxfs: compute the rt refcount btree maxlevels during initialization Darrick J. Wong
@ 2025-02-13  4:20     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 02/22] libxfs: add a realtime flag to the refcount update log redo items
  2025-02-06 22:57   ` [PATCH 02/22] libxfs: add a realtime flag to the refcount update log redo items Darrick J. Wong
@ 2025-02-13  4:20     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 03/22] libxfs: apply rt extent alignment constraints to CoW extsize hint
  2025-02-06 22:57   ` [PATCH 03/22] libxfs: apply rt extent alignment constraints to CoW extsize hint Darrick J. Wong
@ 2025-02-13  4:21     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 04/22] libfrog: enable scrubbing of the realtime refcount data
  2025-02-06 22:57   ` [PATCH 04/22] libfrog: enable scrubbing of the realtime refcount data Darrick J. Wong
@ 2025-02-13  4:21     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 05/22] man: document userspace API changes due to rt reflink
  2025-02-06 22:58   ` [PATCH 05/22] man: document userspace API changes due to rt reflink Darrick J. Wong
@ 2025-02-13  4:21     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 06/22] xfs_db: display the realtime refcount btree contents
  2025-02-06 22:58   ` [PATCH 06/22] xfs_db: display the realtime refcount btree contents Darrick J. Wong
@ 2025-02-13  4:22     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

Same minor cosmetic comments as for the rmap version, otherwise looks
good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 07/22] xfs_db: support the realtime refcountbt
  2025-02-06 22:58   ` [PATCH 07/22] xfs_db: support the realtime refcountbt Darrick J. Wong
@ 2025-02-13  4:22     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 08/22] xfs_db: copy the realtime refcount btree
  2025-02-06 22:58   ` [PATCH 08/22] xfs_db: copy the realtime refcount btree Darrick J. Wong
@ 2025-02-13  4:23     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 09/22] xfs_db: add rtrefcount reservations to the rgresv command
  2025-02-06 22:59   ` [PATCH 09/22] xfs_db: add rtrefcount reservations to the rgresv command Darrick J. Wong
@ 2025-02-13  4:23     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 10/22] xfs_spaceman: report health of the realtime refcount btree
  2025-02-06 22:59   ` [PATCH 10/22] xfs_spaceman: report health of the realtime refcount btree Darrick J. Wong
@ 2025-02-13  4:24     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 11/22] xfs_repair: allow CoW staging extents in the realtime rmap records
  2025-02-06 22:59   ` [PATCH 11/22] xfs_repair: allow CoW staging extents in the realtime rmap records Darrick J. Wong
@ 2025-02-13  4:24     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 12/22] xfs_repair: use realtime refcount btree data to check block types
  2025-02-06 22:59   ` [PATCH 12/22] xfs_repair: use realtime refcount btree data to check block types Darrick J. Wong
@ 2025-02-13  4:24     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 13/22] xfs_repair: find and mark the rtrefcountbt inode
  2025-02-06 23:00   ` [PATCH 13/22] xfs_repair: find and mark the rtrefcountbt inode Darrick J. Wong
@ 2025-02-13  4:25     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 14/22] xfs_repair: compute refcount data for the realtime groups
  2025-02-06 23:00   ` [PATCH 14/22] xfs_repair: compute refcount data for the realtime groups Darrick J. Wong
@ 2025-02-13  4:25     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 15/22] xfs_repair: check existing realtime refcountbt entries against observed refcounts
  2025-02-06 23:00   ` [PATCH 15/22] xfs_repair: check existing realtime refcountbt entries against observed refcounts Darrick J. Wong
@ 2025-02-13  4:26     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 16/22] xfs_repair: reject unwritten shared extents
  2025-02-06 23:00   ` [PATCH 16/22] xfs_repair: reject unwritten shared extents Darrick J. Wong
@ 2025-02-13  4:26     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 17/22] xfs_repair: rebuild the realtime refcount btree
  2025-02-06 23:01   ` [PATCH 17/22] xfs_repair: rebuild the realtime refcount btree Darrick J. Wong
@ 2025-02-13  4:27     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 18/22] xfs_repair: allow realtime files to have the reflink flag set
  2025-02-06 23:01   ` [PATCH 18/22] xfs_repair: allow realtime files to have the reflink flag set Darrick J. Wong
@ 2025-02-13  4:27     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 19/22] xfs_repair: validate CoW extent size hint on rtinherit directories
  2025-02-06 23:01   ` [PATCH 19/22] xfs_repair: validate CoW extent size hint on rtinherit directories Darrick J. Wong
@ 2025-02-13  4:27     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 20/22] xfs_logprint: report realtime CUIs
  2025-02-06 23:01   ` [PATCH 20/22] xfs_logprint: report realtime CUIs Darrick J. Wong
@ 2025-02-13  4:28     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 21/22] mkfs: validate CoW extent size hint when rtinherit is set
  2025-02-06 23:02   ` [PATCH 21/22] mkfs: validate CoW extent size hint when rtinherit is set Darrick J. Wong
@ 2025-02-13  4:28     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 22/22] mkfs: enable reflink on the realtime device
  2025-02-06 23:02   ` [PATCH 22/22] mkfs: enable reflink on the realtime device Darrick J. Wong
@ 2025-02-13  4:28     ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 1/4] xfs_db: pass const pointers when we're not modifying them
  2025-02-06 23:02   ` [PATCH 1/4] xfs_db: pass const pointers when we're not modifying them Darrick J. Wong
@ 2025-02-13  4:29     ` Christoph Hellwig
  2025-02-13 13:26     ` Andrey Albershteyn
  1 sibling, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 2/4] xfs_db: use an empty transaction to try to prevent livelocks in path_navigate
  2025-02-06 23:03   ` [PATCH 2/4] xfs_db: use an empty transaction to try to prevent livelocks in path_navigate Darrick J. Wong
@ 2025-02-13  4:29     ` Christoph Hellwig
  2025-02-13 13:26     ` Andrey Albershteyn
  1 sibling, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 3/4] xfs_db: make listdir more generally useful
  2025-02-06 23:03   ` [PATCH 3/4] xfs_db: make listdir more generally useful Darrick J. Wong
@ 2025-02-13  4:30     ` Christoph Hellwig
  2025-02-13 13:26     ` Andrey Albershteyn
  1 sibling, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-13  4:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems
  2025-02-06 23:03   ` [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems Darrick J. Wong
@ 2025-02-13 13:25     ` Andrey Albershteyn
  2025-02-18  8:36     ` Christoph Hellwig
  2025-02-18 19:58     ` [PATCH v1.1 " Darrick J. Wong
  2 siblings, 0 replies; 225+ messages in thread
From: Andrey Albershteyn @ 2025-02-13 13:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

On 2025-02-06 15:03:32, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Ahead of deprecating V4 support in the kernel, let's give people a way
> to extract their files from a filesystem without needing to mount.
> 
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
>  db/Makefile              |    3 
>  db/command.c             |    1 
>  db/command.h             |    1 
>  db/namei.c               |    5 
>  db/namei.h               |   18 +
>  db/rdump.c               |  999 ++++++++++++++++++++++++++++++++++++++++++++++

Should rdump show progress? I suppose this will be a rarely used,
one-time backup command, so I can see a log of all the restored files
being quite handy. Or maybe it could be silent by default but take a
flag like -l restore.log. Also, when a huge filesystem produces no
errors, it would be helpful to see that it's still working.

Otherwise, looks good to me
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>

>  libxfs/libxfs_api_defs.h |    2 
>  man/man8/xfs_db.8        |   26 +
>  8 files changed, 1052 insertions(+), 3 deletions(-)
>  create mode 100644 db/namei.h
>  create mode 100644 db/rdump.c
> 
> 
> diff --git a/db/Makefile b/db/Makefile
> index 02eeead25b49d0..e36e775eee6021 100644
> --- a/db/Makefile
> +++ b/db/Makefile
> @@ -45,6 +45,7 @@ HFILES = \
>  	logformat.h \
>  	malloc.h \
>  	metadump.h \
> +	namei.h \
>  	obfuscate.h \
>  	output.h \
>  	print.h \
> @@ -64,7 +65,7 @@ CFILES = $(HFILES:.h=.c) \
>  	convert.c \
>  	info.c \
>  	iunlink.c \
> -	namei.c \
> +	rdump.c \
>  	timelimit.c
>  LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh
>  
> diff --git a/db/command.c b/db/command.c
> index 1b46c3fec08a0e..15bdabbcb7d728 100644
> --- a/db/command.c
> +++ b/db/command.c
> @@ -145,4 +145,5 @@ init_commands(void)
>  	timelimit_init();
>  	iunlink_init();
>  	bmapinflate_init();
> +	rdump_init();
>  }
> diff --git a/db/command.h b/db/command.h
> index 2c2926afd7b516..21419bbe65bfeb 100644
> --- a/db/command.h
> +++ b/db/command.h
> @@ -36,3 +36,4 @@ extern void		timelimit_init(void);
>  extern void		namei_init(void);
>  extern void		iunlink_init(void);
>  extern void		bmapinflate_init(void);
> +extern void		rdump_init(void);
> diff --git a/db/namei.c b/db/namei.c
> index 6f277a65ed91ac..2586e0591c2357 100644
> --- a/db/namei.c
> +++ b/db/namei.c
> @@ -14,6 +14,7 @@
>  #include "fprint.h"
>  #include "field.h"
>  #include "inode.h"
> +#include "namei.h"
>  
>  /* Path lookup */
>  
> @@ -144,7 +145,7 @@ path_navigate(
>  }
>  
>  /* Walk a directory path to an inode and set the io cursor to that inode. */
> -static int
> +int
>  path_walk(
>  	xfs_ino_t	rootino,
>  	const char	*path)
> @@ -493,7 +494,7 @@ list_leafdir(
>  }
>  
>  /* Read the directory, display contents. */
> -static int
> +int
>  listdir(
>  	struct xfs_trans	*tp,
>  	struct xfs_inode	*dp,
> diff --git a/db/namei.h b/db/namei.h
> new file mode 100644
> index 00000000000000..05c384bc9a6c35
> --- /dev/null
> +++ b/db/namei.h
> @@ -0,0 +1,18 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (c) 2025 Oracle.  All Rights Reserved.
> + * Author: Darrick J. Wong <djwong@kernel.org>
> + */
> +#ifndef DB_NAMEI_H_
> +#define DB_NAMEI_H_
> +
> +int path_walk(xfs_ino_t rootino, const char *path);
> +
> +typedef int (*dir_emit_t)(struct xfs_trans *tp, struct xfs_inode *dp,
> +		xfs_dir2_dataptr_t off, char *name, ssize_t namelen,
> +		xfs_ino_t ino, uint8_t dtype, void *private);
> +
> +int listdir(struct xfs_trans *tp, struct xfs_inode *dp, dir_emit_t dir_emit,
> +		void *private);
> +
> +#endif /* DB_NAMEI_H_ */
> diff --git a/db/rdump.c b/db/rdump.c
> new file mode 100644
> index 00000000000000..1e31b40a072bdc
> --- /dev/null
> +++ b/db/rdump.c
> @@ -0,0 +1,999 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2025 Oracle.  All Rights Reserved.
> + * Author: Darrick J. Wong <djwong@kernel.org>
> + */
> +#include "libxfs.h"
> +#include "command.h"
> +#include "output.h"
> +#include "init.h"
> +#include "io.h"
> +#include "namei.h"
> +#include "type.h"
> +#include "input.h"
> +#include "faddr.h"
> +#include "fprint.h"
> +#include "field.h"
> +#include "inode.h"
> +#include "listxattr.h"
> +#include <sys/xattr.h>
> +#include <linux/xattr.h>
> +
> +static bool strict_errors;
> +
> +/* file attributes that we might have lost */
> +#define LOST_OWNER		(1U << 0)
> +#define LOST_MODE		(1U << 1)
> +#define LOST_TIME		(1U << 2)
> +#define LOST_SOME_FSXATTR	(1U << 3)
> +#define LOST_FSXATTR		(1U << 4)
> +#define LOST_XATTR		(1U << 5)
> +
> +static unsigned int lost_mask;
> +
> +static void
> +rdump_help(void)
> +{
> +	dbprintf(_(
> +"\n"
> +" Recover files out of the filesystem into a directory.\n"
> +"\n"
> +" Options:\n"
> +"   -s      -- Fail on errors when reading content from the filesystem.\n"
> +"   paths   -- Copy only these paths.  If no paths are given, copy everything.\n"
> +"   destdir -- The destination into which files are recovered.\n"
> +	));
> +}
> +
> +struct destdir {
> +	int	fd;
> +	char	*path;
> +	char	*sep;
> +};
> +
> +struct pathbuf {
> +	size_t	len;
> +	char	path[PATH_MAX + 1];
> +};
> +
> +static int rdump_file(struct xfs_trans *tp, xfs_ino_t ino,
> +		const struct destdir *destdir, struct pathbuf *pbuf);
> +
> +static inline unsigned int xflags2getflags(const struct fsxattr *fa)
> +{
> +	unsigned int	ret = 0;
> +
> +	if (fa->fsx_xflags & FS_XFLAG_IMMUTABLE)
> +		ret |= FS_IMMUTABLE_FL;
> +	if (fa->fsx_xflags & FS_XFLAG_APPEND)
> +		ret |= FS_APPEND_FL;
> +	if (fa->fsx_xflags & FS_XFLAG_SYNC)
> +		ret |= FS_SYNC_FL;
> +	if (fa->fsx_xflags & FS_XFLAG_NOATIME)
> +		ret |= FS_NOATIME_FL;
> +	if (fa->fsx_xflags & FS_XFLAG_NODUMP)
> +		ret |= FS_NODUMP_FL;
> +	if (fa->fsx_xflags & FS_XFLAG_DAX)
> +		ret |= FS_DAX_FL;
> +	if (fa->fsx_xflags & FS_XFLAG_PROJINHERIT)
> +		ret |= FS_PROJINHERIT_FL;
> +	return ret;
> +}
> +
> +/* Copy common file attributes to this fd */
> +static int
> +rdump_fileattrs_fd(
> +	struct xfs_inode	*ip,
> +	const struct destdir	*destdir,
> +	const struct pathbuf	*pbuf,
> +	int			fd)
> +{
> +	struct fsxattr		fsxattr = {
> +		.fsx_extsize	= ip->i_extsize,
> +		.fsx_projid	= ip->i_projid,
> +		.fsx_cowextsize	= ip->i_cowextsize,
> +		.fsx_xflags	= xfs_ip2xflags(ip),
> +	};
> +	int			ret;
> +
> +	ret = fchmod(fd, VFS_I(ip)->i_mode & ~S_IFMT);
> +	if (ret) {
> +		if (errno == EPERM)
> +			lost_mask |= LOST_MODE;
> +		else
> +			dbprintf(_("%s%s%s: fchmod %s\n"), destdir->path,
> +					destdir->sep, pbuf->path,
> +					strerror(errno));
> +		if (strict_errors)
> +			return 1;
> +	}
> +
> +	ret = fchown(fd, i_uid_read(VFS_I(ip)), i_gid_read(VFS_I(ip)));
> +	if (ret) {
> +		if (errno == EPERM)
> +			lost_mask |= LOST_OWNER;
> +		else
> +			dbprintf(_("%s%s%s: fchown %s\n"), destdir->path,
> +					destdir->sep, pbuf->path,
> +					strerror(errno));
> +		if (strict_errors)
> +			return 1;
> +	}
> +
> +	ret = ioctl(fd, XFS_IOC_FSSETXATTR, &fsxattr);
> +	if (ret) {
> +		unsigned int	getflags = xflags2getflags(&fsxattr);
> +
> +		/* try to use setflags if the target is not xfs */
> +		if (errno == EOPNOTSUPP || errno == ENOTTY) {
> +			lost_mask |= LOST_SOME_FSXATTR;
> +			ret = ioctl(fd, FS_IOC_SETFLAGS, &getflags);
> +		}
> +
> +		if (errno == EOPNOTSUPP || errno == EPERM || errno == ENOTTY)
> +			lost_mask |= LOST_FSXATTR;
> +		else
> +			dbprintf(_("%s%s%s: fssetxattr %s\n"), destdir->path,
> +					destdir->sep, pbuf->path,
> +					strerror(errno));
> +		if (strict_errors)
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Copy common file attributes to this path */
> +static int
> +rdump_fileattrs_path(
> +	struct xfs_inode	*ip,
> +	const struct destdir	*destdir,
> +	const struct pathbuf	*pbuf)
> +{
> +	int			ret;
> +
> +	ret = fchmodat(destdir->fd, pbuf->path, VFS_I(ip)->i_mode & ~S_IFMT,
> +			AT_SYMLINK_NOFOLLOW);
> +	if (ret) {
> +		/* fchmodat on a symlink is not supported */
> +		if (errno == EPERM || errno == EOPNOTSUPP)
> +			lost_mask |= LOST_MODE;
> +		else
> +			dbprintf(_("%s%s%s: fchmodat %s\n"), destdir->path,
> +					destdir->sep, pbuf->path,
> +					strerror(errno));
> +		if (strict_errors)
> +			return 1;
> +	}
> +
> +	ret = fchownat(destdir->fd, pbuf->path, i_uid_read(VFS_I(ip)),
> +			i_gid_read(VFS_I(ip)), AT_SYMLINK_NOFOLLOW);
> +	if (ret) {
> +		if (errno == EPERM)
> +			lost_mask |= LOST_OWNER;
> +		else
> +			dbprintf(_("%s%s%s: fchownat %s\n"), destdir->path,
> +					destdir->sep, pbuf->path,
> +					strerror(errno));
> +		if (strict_errors)
> +			return 1;
> +	}
> +
> +	/* XXX cannot copy fsxattrs */
> +
> +	return 0;
> +}
> +
> +/* Copy access and modification timestamps to this fd. */
> +static int
> +rdump_timestamps_fd(
> +	struct xfs_inode	*ip,
> +	const struct destdir	*destdir,
> +	const struct pathbuf	*pbuf,
> +	int			fd)
> +{
> +	struct timespec		times[2] = {
> +		[0] = {
> +			.tv_sec  = inode_get_atime_sec(VFS_I(ip)),
> +			.tv_nsec = inode_get_atime_nsec(VFS_I(ip)),
> +		},
> +		[1] = {
> +			.tv_sec  = inode_get_mtime_sec(VFS_I(ip)),
> +			.tv_nsec = inode_get_mtime_nsec(VFS_I(ip)),
> +		},
> +	};
> +	int			ret;
> +
> +	/* XXX cannot copy ctime or btime */
> +
> +	ret = futimens(fd, times);
> +	if (ret) {
> +		if (errno == EPERM)
> +			lost_mask |= LOST_TIME;
> +		else
> +			dbprintf(_("%s%s%s: futimens %s\n"), destdir->path,
> +					destdir->sep, pbuf->path,
> +					strerror(errno));
> +		if (strict_errors)
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Copy access and modification timestamps to this path. */
> +static int
> +rdump_timestamps_path(
> +	struct xfs_inode	*ip,
> +	const struct destdir	*destdir,
> +	const struct pathbuf	*pbuf)
> +{
> +	struct timespec		times[2] = {
> +		[0] = {
> +			.tv_sec  = inode_get_atime_sec(VFS_I(ip)),
> +			.tv_nsec = inode_get_atime_nsec(VFS_I(ip)),
> +		},
> +		[1] = {
> +			.tv_sec  = inode_get_mtime_sec(VFS_I(ip)),
> +			.tv_nsec = inode_get_mtime_nsec(VFS_I(ip)),
> +		},
> +	};
> +	int			ret;
> +
> +	/* XXX cannot copy ctime or btime */
> +
> +	ret = utimensat(destdir->fd, pbuf->path, times, AT_SYMLINK_NOFOLLOW);
> +	if (ret) {
> +		if (errno == EPERM)
> +			lost_mask |= LOST_TIME;
> +		else
> +			dbprintf(_("%s%s%s: utimensat %s\n"), destdir->path,
> +					destdir->sep, pbuf->path,
> +					strerror(errno));
> +		if (strict_errors)
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +struct copyxattr {
> +	const struct destdir	*destdir;
> +	const struct pathbuf	*pbuf;
> +	int			fd;
> +	char			name[XATTR_NAME_MAX + 1];
> +	char			value[XATTR_SIZE_MAX];
> +};
> +
> +/* Copy one extended attribute */
> +static int
> +rdump_xattr(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*ip,
> +	unsigned int		attr_flags,
> +	const unsigned char	*name,
> +	unsigned int		namelen,
> +	const void		*value,
> +	unsigned int		valuelen,
> +	void			*priv)
> +{
> +	const char		*nsp;
> +	struct copyxattr	*cx = priv;
> +	const size_t		remaining = sizeof(cx->name);
> +	ssize_t			added;
> +	int			ret;
> +
> +	if (attr_flags & XFS_ATTR_PARENT)
> +		return 0;
> +
> +	/* Format xattr name */
> +	if (attr_flags & XFS_ATTR_ROOT)
> +		nsp = XATTR_TRUSTED_PREFIX;
> +	else if (attr_flags & XFS_ATTR_SECURE)
> +		nsp = XATTR_SECURITY_PREFIX;
> +	else
> +		nsp = XATTR_USER_PREFIX;
> +	/* The XATTR_*_PREFIX strings already end with a dot */
> +	added = snprintf(cx->name, remaining, "%s%.*s", nsp, namelen, name);
> +	if (added >= remaining) {
> +		dbprintf(_("%s%s%s: ran out of space formatting xattr name\n"),
> +				cx->destdir->path, cx->destdir->sep,
> +				cx->pbuf->path);
> +		return strict_errors ? ECANCELED : 0;
> +	}
> +
> +	/* Retrieve xattr value if needed */
> +	if (valuelen > 0 && !value) {
> +		struct xfs_da_args	args = {
> +			.trans		= tp,
> +			.dp		= ip,
> +			.geo		= mp->m_attr_geo,
> +			.owner		= ip->i_ino,
> +			.attr_filter	= attr_flags & XFS_ATTR_NSP_ONDISK_MASK,
> +			.namelen	= namelen,
> +			.name		= name,
> +			.value		= cx->value,
> +			.valuelen	= valuelen,
> +		};
> +
> +		ret = -libxfs_attr_rmtval_get(&args);
> +		if (ret) {
> +			dbprintf(_("%s: reading xattr \"%.*s\" value %s\n"),
> +					cx->pbuf->path, namelen, name,
> +					strerror(ret));
> +			return strict_errors ? ECANCELED : 0;
> +		}
> +	}
> +
> +	ret = fsetxattr(cx->fd, cx->name, cx->value, valuelen, 0);
> +	if (ret) {
> +		if (errno == EOPNOTSUPP)
> +			lost_mask |= LOST_XATTR;
> +		else
> +			dbprintf(_("%s%s%s: fsetxattr \"%.*s\" %s\n"),
> +					cx->destdir->path, cx->destdir->sep,
> +					cx->pbuf->path, namelen, name,
> +					strerror(errno));
> +		if (strict_errors)
> +			return ECANCELED;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Copy extended attributes */
> +static int
> +rdump_xattrs(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*ip,
> +	const struct destdir	*destdir,
> +	const struct pathbuf	*pbuf,
> +	int			fd)
> +{
> +	struct copyxattr	*cx;
> +	int			ret;
> +
> +	cx = calloc(1, sizeof(struct copyxattr));
> +	if (!cx) {
> +		dbprintf(_("%s%s%s: allocating xattr buffer %s\n"),
> +				destdir->path, destdir->sep, pbuf->path,
> +				strerror(errno));
> +		return 1;
> +	}
> +	cx->destdir = destdir;
> +	cx->pbuf = pbuf;
> +	cx->fd = fd;
> +
> +	ret = xattr_walk(tp, ip, rdump_xattr, cx);
> +	if (ret && ret != ECANCELED) {
> +		dbprintf(_("%s%s%s: listxattr %s\n"), destdir->path,
> +				destdir->sep, pbuf->path, strerror(errno));
> +		if (strict_errors) {
> +			free(cx);
> +			return 1;
> +		}
> +	}
> +
> +	free(cx);
> +	return 0;
> +}
> +
> +struct copydirent {
> +	const struct destdir	*destdir;
> +	struct pathbuf		*pbuf;
> +	int			fd;
> +};
> +
> +/* Copy a directory entry. */
> +static int
> +rdump_dirent(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*dp,
> +	xfs_dir2_dataptr_t	off,
> +	char			*name,
> +	ssize_t			namelen,
> +	xfs_ino_t		ino,
> +	uint8_t			dtype,
> +	void			*private)
> +{
> +	struct copydirent	*cd = private;
> +	size_t			oldlen = cd->pbuf->len;
> +	const size_t		remaining = PATH_MAX + 1 - oldlen;
> +	ssize_t			added;
> +	int			ret;
> +
> +	/* Negative length means name is null-terminated */
> +	if (namelen < 0)
> +		namelen = -namelen;
> +
> +	/* Ignore dot and dotdot */
> +	if (namelen == 1 && name[0] == '.')
> +		return 0;
> +	if (namelen == 2 && name[0] == '.' && name[1] == '.')
> +		return 0;
> +
> +	if (namelen > FILENAME_MAX) {
> +		dbprintf(_("%s%s%s: %s\n"),
> +				cd->destdir->path, cd->destdir->sep,
> +				cd->pbuf->path, strerror(ENAMETOOLONG));
> +		return strict_errors ? ECANCELED : 0;
> +	}
> +
> +	added = snprintf(&cd->pbuf->path[oldlen], remaining, "%s%.*s",
> +			oldlen ? "/" : "", (int)namelen, name);
> +	if (added >= remaining) {
> +		dbprintf(_("%s%s%s: ran out of space formatting file name\n"),
> +				cd->destdir->path, cd->destdir->sep,
> +				cd->pbuf->path);
> +		return strict_errors ? ECANCELED : 0;
> +	}
> +
> +	cd->pbuf->len += added;
> +	ret = rdump_file(tp, ino, cd->destdir, cd->pbuf);
> +	cd->pbuf->len = oldlen;
> +	cd->pbuf->path[oldlen] = 0;
> +	return ret;
> +}
> +
> +/* Copy a directory */
> +static int
> +rdump_directory(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*dp,
> +	const struct destdir	*destdir,
> +	struct pathbuf		*pbuf)
> +{
> +	struct copydirent	*cd;
> +	int			ret, ret2;
> +
> +	cd = calloc(1, sizeof(struct copydirent));
> +	if (!cd) {
> +		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
> +				pbuf->path, strerror(errno));
> +		return 1;
> +	}
> +	cd->destdir = destdir;
> +	cd->pbuf = pbuf;
> +
> +	if (pbuf->len) {
> +		/*
> +		 * If path is non-empty, we want to create a child somewhere
> +		 * underneath the target directory.
> +		 */
> +		ret = mkdirat(destdir->fd, pbuf->path, 0600);
> +		if (ret && errno != EEXIST) {
> +			dbprintf(_("%s%s%s: %s\n"), destdir->path,
> +					destdir->sep, pbuf->path,
> +					strerror(errno));
> +			goto out_cd;
> +		}
> +
> +		cd->fd = openat(destdir->fd, pbuf->path,
> +				O_RDONLY | O_DIRECTORY);
> +		if (cd->fd < 0) {
> +			dbprintf(_("%s%s%s: %s\n"), destdir->path,
> +					destdir->sep, pbuf->path,
> +					strerror(errno));
> +			ret = 1;
> +			goto out_cd;
> +		}
> +	} else {
> +		/*
> +		 * If path is empty, then we're copying the children of a
> +		 * directory into the target directory.
> +		 */
> +		cd->fd = destdir->fd;
> +	}
> +
> +	ret = rdump_fileattrs_fd(dp, destdir, pbuf, cd->fd);
> +	if (ret && strict_errors)
> +		goto out_close;
> +
> +	if (xfs_inode_has_attr_fork(dp)) {
> +		ret = rdump_xattrs(tp, dp, destdir, pbuf, cd->fd);
> +		if (ret && strict_errors)
> +			goto out_close;
> +	}
> +
> +	ret = listdir(tp, dp, rdump_dirent, cd);
> +	if (ret && ret != ECANCELED) {
> +		dbprintf(_("%s%s%s: readdir %s\n"), destdir->path,
> +				destdir->sep, pbuf->path, strerror(ret));
> +		if (strict_errors)
> +			goto out_close;
> +	}
> +
> +	ret = rdump_timestamps_fd(dp, destdir, pbuf, cd->fd);
> +	if (ret && strict_errors)
> +		goto out_close;
> +
> +	ret = 0;
> +
> +out_close:
> +	if (cd->fd != destdir->fd) {
> +		ret2 = close(cd->fd);
> +		if (ret2) {
> +			if (!ret)
> +				ret = ret2;
> +			dbprintf(_("%s%s%s: %s\n"), destdir->path,
> +					destdir->sep, pbuf->path,
> +					strerror(errno));
> +		}
> +	}
> +
> +out_cd:
> +	free(cd);
> +	return ret;
> +}
> +
> +/* Copy file data */
> +static int
> +rdump_regfile_data(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*ip,
> +	const struct destdir	*destdir,
> +	const struct pathbuf	*pbuf,
> +	int			fd)
> +{
> +	struct xfs_bmbt_irec	irec = { };
> +	struct xfs_buftarg	*btp;
> +	int			nmaps;
> +	off_t			pos = 0;
> +	const off_t		isize = ip->i_disk_size;
> +	int			ret;
> +
> +	if (XFS_IS_REALTIME_INODE(ip))
> +		btp = ip->i_mount->m_rtdev_targp;
> +	else
> +		btp = ip->i_mount->m_ddev_targp;
> +
> +	for (;
> +	     pos < isize;
> +	     pos = XFS_FSB_TO_B(mp, irec.br_startoff + irec.br_blockcount)) {
> +		struct xfs_buf	*bp;
> +		off_t		buf_pos;
> +		off_t		fd_pos;
> +		xfs_fileoff_t	off = XFS_B_TO_FSBT(mp, pos);
> +		xfs_filblks_t	max_read = XFS_B_TO_FSB(mp, 1048576);
> +		xfs_daddr_t	daddr;
> +		size_t		count;
> +
> +		nmaps = 1;
> +		ret = -libxfs_bmapi_read(ip, off, max_read, &irec, &nmaps, 0);
> +		if (ret) {
> +			dbprintf(_("%s: %s\n"), pbuf->path, strerror(ret));
> +			if (strict_errors)
> +				return 1;
> +			continue;
> +		}
> +		if (!nmaps)
> +			break;
> +
> +		if (!xfs_bmap_is_written_extent(&irec))
> +			continue;
> +
> +		fd_pos = XFS_FSB_TO_B(mp, irec.br_startoff);
> +		if (XFS_IS_REALTIME_INODE(ip))
> +			daddr =  xfs_rtb_to_daddr(mp, irec.br_startblock);
> +		else
> +			daddr = XFS_FSB_TO_DADDR(mp, irec.br_startblock);
> +
> +		ret = -libxfs_buf_read_uncached(btp, daddr,
> +				XFS_FSB_TO_BB(mp, irec.br_blockcount), 0, &bp,
> +				NULL);
> +		if (ret) {
> +			dbprintf(_("%s: reading pos 0x%llx %s\n"), pbuf->path,
> +					fd_pos, strerror(ret));
> +			if (strict_errors)
> +				return 1;
> +			continue;
> +		}
> +
> +		count = XFS_FSB_TO_B(mp, irec.br_blockcount);
> +		if (fd_pos + count > isize)
> +			count = isize - fd_pos;
> +
> +		buf_pos = 0;
> +		while (count > 0) {
> +			ssize_t	written;
> +
> +			written = pwrite(fd, bp->b_addr + buf_pos, count,
> +					fd_pos);
> +			if (written < 0) {
> +				libxfs_buf_relse(bp);
> +				dbprintf(_("%s%s%s: writing pos 0x%llx %s\n"),
> +						destdir->path, destdir->sep,
> +						pbuf->path, fd_pos,
> +						strerror(errno));
> +				return 1;
> +			}
> +			if (!written) {
> +				libxfs_buf_relse(bp);
> +				dbprintf(
> + _("%s%s%s: wrote zero at pos 0x%llx\n"),
> +						destdir->path, destdir->sep,
> +						pbuf->path, fd_pos);
> +				return 1;
> +			}
> +
> +			fd_pos += written;
> +			buf_pos += written;
> +			count -= written;
> +		}
> +
> +		libxfs_buf_relse(bp);
> +	}
> +
> +	ret = ftruncate(fd, isize);
> +	if (ret) {
> +		dbprintf(_("%s%s%s: setting file length 0x%llx %s\n"),
> +				destdir->path, destdir->sep, pbuf->path,
> +				isize, strerror(errno));
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Copy a regular file */
> +static int
> +rdump_regfile(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*ip,
> +	const struct destdir	*destdir,
> +	const struct pathbuf	*pbuf)
> +{
> +	int			fd;
> +	int			ret, ret2;
> +
> +	fd = openat(destdir->fd, pbuf->path, O_RDWR | O_CREAT | O_TRUNC, 0600);
> +	if (fd < 0) {
> +		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
> +				pbuf->path, strerror(errno));
> +		return 1;
> +	}
> +
> +	ret = rdump_fileattrs_fd(ip, destdir, pbuf, fd);
> +	if (ret && strict_errors)
> +		goto out_close;
> +
> +	if (xfs_inode_has_attr_fork(ip)) {
> +		ret = rdump_xattrs(tp, ip, destdir, pbuf, fd);
> +		if (ret && strict_errors)
> +			goto out_close;
> +	}
> +
> +	ret = rdump_regfile_data(tp, ip, destdir, pbuf, fd);
> +	if (ret && strict_errors)
> +		goto out_close;
> +
> +	ret = rdump_timestamps_fd(ip, destdir, pbuf, fd);
> +	if (ret && strict_errors)
> +		goto out_close;
> +
> +out_close:
> +	ret2 = close(fd);
> +	if (ret2) {
> +		if (!ret)
> +			ret = ret2;
> +		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
> +				pbuf->path, strerror(errno));
> +		return 1;
> +	}
> +
> +	return ret;
> +}
> +
> +/* Copy a symlink */
> +static int
> +rdump_symlink(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*ip,
> +	const struct destdir	*destdir,
> +	const struct pathbuf	*pbuf)
> +{
> +	char			target[XFS_SYMLINK_MAXLEN + 1];
> +	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
> +	unsigned int		targetlen = ip->i_disk_size;
> +	int			ret;
> +
> +	if (ifp->if_format == XFS_DINODE_FMT_LOCAL) {
> +		memcpy(target, ifp->if_data, targetlen);
> +	} else {
> +		ret = -libxfs_symlink_remote_read(ip, target);
> +		if (ret) {
> +			dbprintf(_("%s: %s\n"), pbuf->path, strerror(ret));
> +			return strict_errors ? 1 : 0;
> +		}
> +	}
> +	target[targetlen] = 0;
> +
> +	ret = symlinkat(target, destdir->fd, pbuf->path);
> +	if (ret) {
> +		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
> +				pbuf->path, strerror(errno));
> +		return 1;
> +	}
> +
> +	ret = rdump_fileattrs_path(ip, destdir, pbuf);
> +	if (ret && strict_errors)
> +		goto out;
> +
> +	ret = rdump_timestamps_path(ip, destdir, pbuf);
> +	if (ret && strict_errors)
> +		goto out;
> +
> +	ret = 0;
> +out:
> +	return ret;
> +}
> +
> +/* Copy a special file */
> +static int
> +rdump_special(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*ip,
> +	const struct destdir	*destdir,
> +	const struct pathbuf	*pbuf)
> +{
> +	const unsigned int	major = IRIX_DEV_MAJOR(VFS_I(ip)->i_rdev);
> +	const unsigned int	minor = IRIX_DEV_MINOR(VFS_I(ip)->i_rdev);
> +	int			ret;
> +
> +	ret = mknodat(destdir->fd, pbuf->path, VFS_I(ip)->i_mode & S_IFMT,
> +			makedev(major, minor));
> +	if (ret) {
> +		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
> +				pbuf->path, strerror(errno));
> +		return 1;
> +	}
> +
> +	ret = rdump_fileattrs_path(ip, destdir, pbuf);
> +	if (ret && strict_errors)
> +		goto out;
> +
> +	ret = rdump_timestamps_path(ip, destdir, pbuf);
> +	if (ret && strict_errors)
> +		goto out;
> +
> +	ret = 0;
> +out:
> +	return ret;
> +}
> +
> +/* Dump some kind of file. */
> +static int
> +rdump_file(
> +	struct xfs_trans	*tp,
> +	xfs_ino_t		ino,
> +	const struct destdir	*destdir,
> +	struct pathbuf		*pbuf)
> +{
> +	struct xfs_inode	*ip;
> +	int			ret;
> +
> +	ret = -libxfs_iget(mp, tp, ino, 0, &ip);
> +	if (ret) {
> +		dbprintf(_("%s: %s\n"), pbuf->path, strerror(ret));
> +		return strict_errors ? ret : 0;
> +	}
> +
> +	switch(VFS_I(ip)->i_mode & S_IFMT) {
> +	case S_IFDIR:
> +		ret = rdump_directory(tp, ip, destdir, pbuf);
> +		break;
> +	case S_IFREG:
> +		ret = rdump_regfile(tp, ip, destdir, pbuf);
> +		break;
> +	case S_IFLNK:
> +		ret = rdump_symlink(tp, ip, destdir, pbuf);
> +		break;
> +	default:
> +		ret = rdump_special(tp, ip, destdir, pbuf);
> +		break;
> +	}
> +
> +	libxfs_irele(ip);
> +	return ret;
> +}
> +
> +/* Copy one path out of the filesystem. */
> +static int
> +rdump_path(
> +	struct xfs_mount	*mp,
> +	bool			sole_path,
> +	const char		*path,
> +	const struct destdir	*destdir)
> +{
> +	struct xfs_trans	*tp;
> +	struct pathbuf		*pbuf;
> +	const char		*basename;
> +	ssize_t			pathlen = strlen(path);
> +	int			ret = 1;
> +
> +	/* Set up destination path data */
> +	if (pathlen > PATH_MAX) {
> +		dbprintf(_("%s: %s\n"), path, strerror(ENAMETOOLONG));
> +		return 1;
> +	}
> +
> +	pbuf = calloc(1, sizeof(struct pathbuf));
> +	if (!pbuf) {
> +		dbprintf(_("allocating path buf: %s\n"), strerror(errno));
> +		return 1;
> +	}
> +	basename = strrchr(path, '/');
> +	if (basename) {
> +		pbuf->len = pathlen - (basename + 1 - path);
> +		memcpy(pbuf->path, basename + 1, pbuf->len);
> +	} else {
> +		pbuf->len = pathlen;
> +		memcpy(pbuf->path, path, pbuf->len);
> +	}
> +	pbuf->path[pbuf->len] = 0;
> +
> +	/* Dump the inode referenced. */
> +	if (pathlen) {
> +		ret = path_walk(mp->m_sb.sb_rootino, path);
> +		if (ret) {
> +			dbprintf(_("%s: %s\n"), path, strerror(ret));
> +			ret = 1;
> +			goto out_pbuf;
> +		}
> +
> +		if (sole_path) {
> +			struct xfs_dinode	*dip = iocur_top->data;
> +
> +			/*
> +			 * If this is the only path to copy out and it's a dir,
> +			 * then we can copy the children directly into the
> +			 * target.
> +			 */
> +			if (S_ISDIR(be16_to_cpu(dip->di_mode))) {
> +				pbuf->len = 0;
> +				pbuf->path[0] = 0;
> +			}
> +		}
> +	} else {
> +		set_cur_inode(mp->m_sb.sb_rootino);
> +	}
> +
> +	ret = -libxfs_trans_alloc_empty(mp, &tp);
> +	if (ret) {
> +		dbprintf(_("allocating state: %s\n"), strerror(ret));
> +		goto out_pbuf;
> +	}
> +
> +	ret = rdump_file(tp, iocur_top->ino, destdir, pbuf);
> +	libxfs_trans_cancel(tp);
> +out_pbuf:
> +	free(pbuf);
> +	return ret;
> +}
> +
> +static int
> +rdump_f(
> +	int		argc,
> +	char		*argv[])
> +{
> +	struct destdir	destdir;
> +	int		i;
> +	int		c;
> +	int		ret;
> +
> +	lost_mask = 0;
> +	strict_errors = false;
> +	while ((c = getopt(argc, argv, "s")) != -1) {
> +		switch (c) {
> +		case 's':
> +			strict_errors = true;
> +			break;
> +		default:
> +			rdump_help();
> +			return 0;
> +		}
> +	}
> +
> +	if (argc < optind + 1) {
> +		dbprintf(
> + _("Must supply destination directory.\n"));
> +		return 0;
> +	}
> +
> +	/* Create and open destination directory */
> +	destdir.path = argv[argc - 1];
> +	ret = mkdir(destdir.path, 0755);
> +	if (ret && errno != EEXIST) {
> +		dbprintf(_("%s: %s\n"), destdir.path, strerror(errno));
> +		exitcode = 1;
> +		return 0;
> +	}
> +
> +	if (destdir.path[0] == 0) {
> +		dbprintf(
> + _("Destination dir must be at least one character.\n"));
> +		exitcode = 1;
> +		return 0;
> +	}
> +
> +	if (destdir.path[strlen(destdir.path) - 1] != '/')
> +		destdir.sep = "/";
> +	else
> +		destdir.sep = "";
> +	destdir.fd = open(destdir.path, O_DIRECTORY | O_RDONLY);
> +	if (destdir.fd < 0) {
> +		dbprintf(_("%s: %s\n"), destdir.path, strerror(errno));
> +		exitcode = 1;
> +		return 0;
> +	}
> +
> +	if (optind == argc - 1) {
> +		/* no dirs given, just do the whole fs */
> +		push_cur();
> +		ret = rdump_path(mp, false, "", &destdir);
> +		pop_cur();
> +		if (ret)
> +			exitcode = 1;
> +		goto out_close;
> +	}
> +
> +	for (i = optind; i < argc - 1; i++) {
> +		size_t	len = strlen(argv[i]);
> +
> +		/* trim trailing slashes */
> +		while (len && argv[i][len - 1] == '/')
> +			len--;
> +		argv[i][len] = 0;
> +
> +		push_cur();
> +		ret = rdump_path(mp, argc == optind + 2, argv[i], &destdir);
> +		pop_cur();
> +
> +		if (ret) {
> +			exitcode = 1;
> +			if (strict_errors)
> +				break;
> +		}
> +	}
> +
> +out_close:
> +	ret = close(destdir.fd);
> +	if (ret) {
> +		dbprintf(_("%s: %s\n"), destdir.path, strerror(errno));
> +		exitcode = 1;
> +	}
> +
> +	if (lost_mask & LOST_OWNER)
> +		dbprintf(_("%s: some uid/gid could not be set\n"),
> +				destdir.path);
> +	if (lost_mask & LOST_MODE)
> +		dbprintf(_("%s: some file modes could not be set\n"),
> +				destdir.path);
> +	if (lost_mask & LOST_TIME)
> +		dbprintf(_("%s: some timestamps could not be set\n"),
> +				destdir.path);
> +	if (lost_mask & LOST_SOME_FSXATTR)
> +		dbprintf(_("%s: some xfs file attr bits could not be set\n"),
> +				destdir.path);
> +	if (lost_mask & LOST_FSXATTR)
> +		dbprintf(_("%s: some xfs file attrs could not be set\n"),
> +				destdir.path);
> +	if (lost_mask & LOST_XATTR)
> +		dbprintf(_("%s: some extended attributes could not be set\n"),
> +				destdir.path);
> +
> +	return 0;
> +}
> +
> +static struct cmdinfo rdump_cmd = {
> +	.name		= "rdump",
> +	.cfunc		= rdump_f,
> +	.argmin		= 0,
> +	.argmax		= -1,
> +	.canpush	= 0,
> +	.args		= "[-s] [paths...] dest_directory",
> +	.help		= rdump_help,
> +};
> +
> +void
> +rdump_init(void)
> +{
> +	rdump_cmd.oneline = _("recover files out of a filesystem");
> +	add_command(&rdump_cmd);
> +}
> diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
> index 530feef2a47db8..14a67c8c24dd7e 100644
> --- a/libxfs/libxfs_api_defs.h
> +++ b/libxfs/libxfs_api_defs.h
> @@ -47,6 +47,7 @@
>  #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
>  #define xfs_attr_namecheck		libxfs_attr_namecheck
>  #define xfs_attr_removename		libxfs_attr_removename
> +#define xfs_attr_rmtval_get		libxfs_attr_rmtval_get
>  #define xfs_attr_set			libxfs_attr_set
>  #define xfs_attr_sethash		libxfs_attr_sethash
>  #define xfs_attr_sf_firstentry		libxfs_attr_sf_firstentry
> @@ -353,6 +354,7 @@
>  #define xfs_sb_version_to_features	libxfs_sb_version_to_features
>  #define xfs_symlink_blocks		libxfs_symlink_blocks
>  #define xfs_symlink_hdr_ok		libxfs_symlink_hdr_ok
> +#define xfs_symlink_remote_read		libxfs_symlink_remote_read
>  #define xfs_symlink_write_target	libxfs_symlink_write_target
>  
>  #define xfs_trans_add_item		libxfs_trans_add_item
> diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
> index 08f38f37ca01cc..2a9322560584b0 100644
> --- a/man/man8/xfs_db.8
> +++ b/man/man8/xfs_db.8
> @@ -1132,6 +1132,32 @@ .SH COMMANDS
>  Exit
>  .BR xfs_db .
>  .TP
> +.BI "rdump [-s] [" "paths..." "] " "destination_dir"
> +Recover the files given by the
> +.B path
> +arguments by copying them out of the filesystem into the directory specified in
> +.BR destination_dir .
> +
> +If the
> +.B -s
> +option is specified, errors are fatal.
> +By default, read errors are ignored in favor of recovering as much data from
> +the filesystem as possible.
> +
> +If zero
> +.B paths
> +are specified, the entire filesystem is dumped.
> +If only one
> +.B path
> +is specified and it is a directory, the children of that directory will be
> +copied directly to the destination.
> +If multiple
> +.B paths
> +are specified, each file is copied into the directory as a new child.
> +
> +If possible, sparse holes, xfs file attributes, and extended attributes will be
> +preserved.
> +.TP
>  .BI "rgresv [" rgno ]
>  Displays the per-rtgroup reservation size, and per-rtgroup
>  reservation usage for a given realtime allocation group.
> 
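
The path handling described in the man page text above (zero paths, one
directory path, several paths) can be sketched as a small helper.  This
is only an illustration of the naming rule as the patch appears to apply
it in rdump_path(); dest_child_name() and its parameters are
hypothetical names, not part of the patch, and trailing slashes are
assumed to have been trimmed already, as rdump_f() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return the name to create underneath the
 * destination directory.  An empty string means "copy the children
 * straight into the destination".
 */
static const char *
dest_child_name(
	const char	*path,
	bool		sole_path,
	bool		is_dir)
{
	const char	*slash;

	assert(path != NULL);

	/* No path given: dump the whole filesystem into the destination. */
	if (path[0] == '\0')
		return "";

	/* A single directory argument: its children go directly in. */
	if (sole_path && is_dir)
		return "";

	/* Otherwise each source becomes a new child named by its basename. */
	slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}
```

With this rule, "rdump /a/b /dest" on a directory fills /dest with the
children of /a/b, while "rdump /a/b /c /dest" creates /dest/b and
/dest/c.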

-- 
- Andrey



* Re: [PATCH 3/4] xfs_db: make listdir more generally useful
  2025-02-06 23:03   ` [PATCH 3/4] xfs_db: make listdir more generally useful Darrick J. Wong
  2025-02-13  4:30     ` Christoph Hellwig
@ 2025-02-13 13:26     ` Andrey Albershteyn
  1 sibling, 0 replies; 225+ messages in thread
From: Andrey Albershteyn @ 2025-02-13 13:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

On 2025-02-06 15:03:17, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Enhance the current directory entry iteration code in xfs_db to be more
> generally useful by allowing callers to pass around a transaction, a
> callback function, and a private pointer.  This will be used in the next
> patch to iterate directories when we want to copy their contents out of
> the filesystem into a directory.
> 
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

Looks good to me
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
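
For readers following along, the callback-plus-private-pointer shape
that this patch gives listdir() can be sketched in isolation.  This is a
minimal sketch, not the xfs_db code: emit_fn, walk_names(), count_state
and count_emit() are hypothetical stand-ins, and the "directory" is just
an in-memory array of names.  The one property it shares with the patch
is that a nonzero return from the callback stops the walk and is passed
back to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for dir_emit_t: return nonzero to stop the walk. */
typedef int (*emit_fn)(const char *name, size_t namelen, void *priv);

/* Walk an in-memory name table, stopping at the first nonzero return. */
static int
walk_names(
	const char *const	*names,
	size_t			count,
	emit_fn			emit,
	void			*priv)
{
	size_t			i;
	int			error;

	assert(emit != NULL);
	for (i = 0; i < count; i++) {
		error = emit(names[i], strlen(names[i]), priv);
		if (error)
			return error;
	}
	return 0;
}

/* Caller state, handed to the callback through the private pointer. */
struct count_state {
	unsigned int		seen;
	unsigned int		limit;
};

/* Count entries, skipping "." and "..", and cancel once the limit is hit. */
static int
count_emit(
	const char		*name,
	size_t			namelen,
	void			*priv)
{
	struct count_state	*cs = priv;

	if (namelen == 1 && name[0] == '.')
		return 0;
	if (namelen == 2 && name[0] == '.' && name[1] == '.')
		return 0;
	if (cs->seen == cs->limit)
		return -1;
	cs->seen++;
	return 0;
}
```

The private pointer keeps the walker free of caller-specific state, so
the same iteration code can serve both a printing callback and a copying
callback.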

> ---
>  db/namei.c |   83 +++++++++++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 57 insertions(+), 26 deletions(-)
> 
> 
> diff --git a/db/namei.c b/db/namei.c
> index 22eae50f219fd0..6f277a65ed91ac 100644
> --- a/db/namei.c
> +++ b/db/namei.c
> @@ -266,15 +266,18 @@ get_dstr(
>  	return filetype_strings[filetype];
>  }
>  
> -static void
> -dir_emit(
> -	struct xfs_mount	*mp,
> +static int
> +print_dirent(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*dp,
>  	xfs_dir2_dataptr_t	off,
>  	char			*name,
>  	ssize_t			namelen,
>  	xfs_ino_t		ino,
> -	uint8_t			dtype)
> +	uint8_t			dtype,
> +	void			*private)
>  {
> +	struct xfs_mount	*mp = dp->i_mount;
>  	char			*display_name;
>  	struct xfs_name		xname = { .name = (unsigned char *)name };
>  	const char		*dstr = get_dstr(mp, dtype);
> @@ -306,11 +309,14 @@ dir_emit(
>  
>  	if (display_name != name)
>  		free(display_name);
> +	return 0;
>  }
>  
>  static int
>  list_sfdir(
> -	struct xfs_da_args		*args)
> +	struct xfs_da_args		*args,
> +	dir_emit_t			dir_emit,
> +	void				*private)
>  {
>  	struct xfs_inode		*dp = args->dp;
>  	struct xfs_mount		*mp = dp->i_mount;
> @@ -321,17 +327,24 @@ list_sfdir(
>  	xfs_dir2_dataptr_t		off;
>  	unsigned int			i;
>  	uint8_t				filetype;
> +	int				error;
>  
>  	/* . and .. entries */
>  	off = xfs_dir2_db_off_to_dataptr(geo, geo->datablk,
>  			geo->data_entry_offset);
> -	dir_emit(args->dp->i_mount, off, ".", -1, dp->i_ino, XFS_DIR3_FT_DIR);
> +	error = dir_emit(args->trans, args->dp, off, ".", -1, dp->i_ino,
> +			XFS_DIR3_FT_DIR, private);
> +	if (error)
> +		return error;
>  
>  	ino = libxfs_dir2_sf_get_parent_ino(sfp);
>  	off = xfs_dir2_db_off_to_dataptr(geo, geo->datablk,
>  			geo->data_entry_offset +
>  			libxfs_dir2_data_entsize(mp, sizeof(".") - 1));
> -	dir_emit(args->dp->i_mount, off, "..", -1, ino, XFS_DIR3_FT_DIR);
> +	error = dir_emit(args->trans, args->dp, off, "..", -1, ino,
> +			XFS_DIR3_FT_DIR, private);
> +	if (error)
> +		return error;
>  
>  	/* Walk everything else. */
>  	sfep = xfs_dir2_sf_firstentry(sfp);
> @@ -341,8 +354,11 @@ list_sfdir(
>  		off = xfs_dir2_db_off_to_dataptr(geo, geo->datablk,
>  				xfs_dir2_sf_get_offset(sfep));
>  
> -		dir_emit(args->dp->i_mount, off, (char *)sfep->name,
> -				sfep->namelen, ino, filetype);
> +		error = dir_emit(args->trans, args->dp, off,
> +				(char *)sfep->name, sfep->namelen, ino,
> +				filetype, private);
> +		if (error)
> +			return error;
>  		sfep = libxfs_dir2_sf_nextentry(mp, sfp, sfep);
>  	}
>  
> @@ -352,7 +368,9 @@ list_sfdir(
>  /* List entries in block format directory. */
>  static int
>  list_blockdir(
> -	struct xfs_da_args	*args)
> +	struct xfs_da_args	*args,
> +	dir_emit_t		dir_emit,
> +	void			*private)
>  {
>  	struct xfs_inode	*dp = args->dp;
>  	struct xfs_mount	*mp = dp->i_mount;
> @@ -363,7 +381,7 @@ list_blockdir(
>  	unsigned int		end;
>  	int			error;
>  
> -	error = xfs_dir3_block_read(NULL, dp, args->owner, &bp);
> +	error = xfs_dir3_block_read(args->trans, dp, args->owner, &bp);
>  	if (error)
>  		return error;
>  
> @@ -383,8 +401,11 @@ list_blockdir(
>  		diroff = xfs_dir2_db_off_to_dataptr(geo, geo->datablk, offset);
>  		offset += libxfs_dir2_data_entsize(mp, dep->namelen);
>  		filetype = libxfs_dir2_data_get_ftype(dp->i_mount, dep);
> -		dir_emit(mp, diroff, (char *)dep->name, dep->namelen,
> -				be64_to_cpu(dep->inumber), filetype);
> +		error = dir_emit(args->trans, args->dp, diroff,
> +				(char *)dep->name, dep->namelen,
> +				be64_to_cpu(dep->inumber), filetype, private);
> +		if (error)
> +			break;
>  	}
>  
>  	libxfs_trans_brelse(args->trans, bp);
> @@ -394,7 +415,9 @@ list_blockdir(
>  /* List entries in leaf format directory. */
>  static int
>  list_leafdir(
> -	struct xfs_da_args	*args)
> +	struct xfs_da_args	*args,
> +	dir_emit_t		dir_emit,
> +	void			*private)
>  {
>  	struct xfs_bmbt_irec	map;
>  	struct xfs_iext_cursor	icur;
> @@ -408,7 +431,7 @@ list_leafdir(
>  	int			error = 0;
>  
>  	/* Read extent map. */
> -	error = -libxfs_iread_extents(NULL, dp, XFS_DATA_FORK);
> +	error = -libxfs_iread_extents(args->trans, dp, XFS_DATA_FORK);
>  	if (error)
>  		return error;
>  
> @@ -424,7 +447,7 @@ list_leafdir(
>  		libxfs_trim_extent(&map, dabno, geo->leafblk - dabno);
>  
>  		/* Read the directory block of that first mapping. */
> -		error = xfs_dir3_data_read(NULL, dp, args->owner,
> +		error = xfs_dir3_data_read(args->trans, dp, args->owner,
>  				map.br_startoff, 0, &bp);
>  		if (error)
>  			break;
> @@ -449,18 +472,22 @@ list_leafdir(
>  			offset += libxfs_dir2_data_entsize(mp, dep->namelen);
>  			filetype = libxfs_dir2_data_get_ftype(mp, dep);
>  
> -			dir_emit(mp, xfs_dir2_byte_to_dataptr(dirboff + offset),
> +			error = dir_emit(args->trans, args->dp,
> +					xfs_dir2_byte_to_dataptr(dirboff + offset),
>  					(char *)dep->name, dep->namelen,
> -					be64_to_cpu(dep->inumber), filetype);
> +					be64_to_cpu(dep->inumber), filetype,
> +					private);
> +			if (error)
> +				break;
>  		}
>  
>  		dabno += XFS_DADDR_TO_FSB(mp, bp->b_length);
> -		libxfs_buf_relse(bp);
> +		libxfs_trans_brelse(args->trans, bp);
>  		bp = NULL;
>  	}
>  
>  	if (bp)
> -		libxfs_buf_relse(bp);
> +		libxfs_trans_brelse(args->trans, bp);
>  
>  	return error;
>  }
> @@ -468,9 +495,13 @@ list_leafdir(
>  /* Read the directory, display contents. */
>  static int
>  listdir(
> -	struct xfs_inode	*dp)
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*dp,
> +	dir_emit_t		dir_emit,
> +	void			*private)
>  {
>  	struct xfs_da_args	args = {
> +		.trans		= tp,
>  		.dp		= dp,
>  		.geo		= dp->i_mount->m_dir_geo,
>  		.owner		= dp->i_ino,
> @@ -479,14 +510,14 @@ listdir(
>  
>  	switch (libxfs_dir2_format(&args, &error)) {
>  	case XFS_DIR2_FMT_SF:
> -		return list_sfdir(&args);
> +		return list_sfdir(&args, dir_emit, private);
>  	case XFS_DIR2_FMT_BLOCK:
> -		return list_blockdir(&args);
> +		return list_blockdir(&args, dir_emit, private);
>  	case XFS_DIR2_FMT_LEAF:
>  	case XFS_DIR2_FMT_NODE:
> -		return list_leafdir(&args);
> +		return list_leafdir(&args, dir_emit, private);
>  	default:
> -		return error;
> +		return EFSCORRUPTED;
>  	}
>  }
>  
> @@ -526,7 +557,7 @@ ls_cur(
>  	if (tag)
>  		dbprintf(_("%s:\n"), tag);
>  
> -	error = listdir(dp);
> +	error = listdir(NULL, dp, print_dirent, NULL);
>  	if (error)
>  		goto rele;
>  
> 

-- 
- Andrey



* Re: [PATCH 2/4] xfs_db: use an empty transaction to try to prevent livelocks in path_navigate
  2025-02-06 23:03   ` [PATCH 2/4] xfs_db: use an empty transaction to try to prevent livelocks in path_navigate Darrick J. Wong
  2025-02-13  4:29     ` Christoph Hellwig
@ 2025-02-13 13:26     ` Andrey Albershteyn
  1 sibling, 0 replies; 225+ messages in thread
From: Andrey Albershteyn @ 2025-02-13 13:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

On 2025-02-06 15:03:01, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> A couple of patches from now we're going to reuse the path_walk code in
> a new xfs_db subcommand that tries to recover directory trees from
> old/damaged filesystems.  Let's pass around an empty transaction to try
> to avoid livelocks on malicious/broken metadata.  This is not
> completely foolproof, but it's quick enough for most purposes.
> 
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

Looks good to me
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
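
The label reshuffle in this patch (out_rele, then out_trans) follows the
usual rule of releasing resources in the reverse order of acquisition.
A minimal sketch of that unwinding order, with no libxfs calls at all:
unwind_log, note() and walk_with_unwind() are hypothetical names, and
the "transaction" and "inode" are only represented by the letters
recorded when each would be released.

```c
#include <assert.h>
#include <errno.h>

/* Records release order: 'I' = inode released, 'T' = transaction cancelled. */
struct unwind_log {
	char	order[4];
	int	n;
};

static void
note(
	struct unwind_log	*log,
	char			what)
{
	assert(log->n < 3);
	log->order[log->n++] = what;
}

/*
 * fail_at: 0 = success, 1 = transaction allocation fails,
 * 2 = inode grab fails, 3 = failure during the walk itself.
 */
static int
walk_with_unwind(
	int			fail_at,
	struct unwind_log	*log)
{
	int			error = 0;

	if (fail_at == 1)
		return ENOMEM;		/* nothing acquired yet */

	/* the "transaction" is held from here on */
	if (fail_at == 2) {
		error = EIO;
		goto out_trans;		/* only the transaction to undo */
	}

	/* the "inode" is held from here on */
	if (fail_at == 3)
		error = ENOTDIR;

	/* success and walk failures both release the inode first */
	note(log, 'I');
out_trans:
	note(log, 'T');
	return error;
}
```

Every exit taken after the transaction is allocated passes through
out_trans, which is why the patch turns the bare "return error" in the
loop into a goto.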
> ---
>  db/namei.c |   23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/db/namei.c b/db/namei.c
> index 00610a54af527e..22eae50f219fd0 100644
> --- a/db/namei.c
> +++ b/db/namei.c
> @@ -87,15 +87,20 @@ path_navigate(
>  	xfs_ino_t		rootino,
>  	struct dirpath		*dirpath)
>  {
> +	struct xfs_trans	*tp;
>  	struct xfs_inode	*dp;
>  	xfs_ino_t		ino = rootino;
>  	unsigned int		i;
>  	int			error;
>  
> -	error = -libxfs_iget(mp, NULL, ino, 0, &dp);
> +	error = -libxfs_trans_alloc_empty(mp, &tp);
>  	if (error)
>  		return error;
>  
> +	error = -libxfs_iget(mp, tp, ino, 0, &dp);
> +	if (error)
> +		goto out_trans;
> +
>  	for (i = 0; i < dirpath->depth; i++) {
>  		struct xfs_name	xname = {
>  			.name	= (unsigned char *)dirpath->path[i],
> @@ -104,35 +109,37 @@ path_navigate(
>  
>  		if (!S_ISDIR(VFS_I(dp)->i_mode)) {
>  			error = ENOTDIR;
> -			goto rele;
> +			goto out_rele;
>  		}
>  
> -		error = -libxfs_dir_lookup(NULL, dp, &xname, &ino, NULL);
> +		error = -libxfs_dir_lookup(tp, dp, &xname, &ino, NULL);
>  		if (error)
> -			goto rele;
> +			goto out_rele;
>  		if (!xfs_verify_ino(mp, ino)) {
>  			error = EFSCORRUPTED;
> -			goto rele;
> +			goto out_rele;
>  		}
>  
>  		libxfs_irele(dp);
>  		dp = NULL;
>  
> -		error = -libxfs_iget(mp, NULL, ino, 0, &dp);
> +		error = -libxfs_iget(mp, tp, ino, 0, &dp);
>  		switch (error) {
>  		case EFSCORRUPTED:
>  		case EFSBADCRC:
>  		case 0:
>  			break;
>  		default:
> -			return error;
> +			goto out_trans;
>  		}
>  	}
>  
>  	set_cur_inode(ino);
> -rele:
> +out_rele:
>  	if (dp)
>  		libxfs_irele(dp);
> +out_trans:
> +	libxfs_trans_cancel(tp);
>  	return error;
>  }
>  
> 
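
For illustration only (not part of the posted patch): the unwind order the
patch introduces, where the inode is released before the transaction is
cancelled, can be sketched with stand-in resources.  Every name below
(sketch_res, sketch_acquire, sketch_release, sketch_navigate) is
hypothetical and merely mimics the shape of path_navigate; none of it is
libxfs API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a transaction or an inode reference. */
struct sketch_res {
	int	live;	/* number of outstanding references */
};

static struct sketch_res sketch_trans, sketch_inode;

/* Take a reference, or fail with EIO when asked to simulate an error. */
static int
sketch_acquire(struct sketch_res *r, int fail)
{
	if (fail)
		return EIO;
	r->live++;
	return 0;
}

/* Drop a reference; releasing something we do not hold is a bug. */
static void
sketch_release(struct sketch_res *r)
{
	assert(r->live > 0);
	r->live--;
}

/* Acquire in one order, release in the reverse order via goto labels. */
static int
sketch_navigate(int fail_trans, int fail_inode, int fail_lookup)
{
	int	error;

	error = sketch_acquire(&sketch_trans, fail_trans);
	if (error)
		return error;		/* nothing to unwind yet */

	error = sketch_acquire(&sketch_inode, fail_inode);
	if (error)
		goto out_trans;		/* only the transaction is held */

	if (fail_lookup) {
		error = ENOENT;
		goto out_rele;		/* both are held */
	}

	/* success falls through and unwinds the same way */
out_rele:
	sketch_release(&sketch_inode);
out_trans:
	sketch_release(&sketch_trans);
	return error;
}
```

Whichever step fails, both reference counts return to zero, which is the
property the out_rele/out_trans labels in the patch are there to preserve.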

-- 
- Andrey



* Re: [PATCH 1/4] xfs_db: pass const pointers when we're not modifying them
  2025-02-06 23:02   ` [PATCH 1/4] xfs_db: pass const pointers when we're not modifying them Darrick J. Wong
  2025-02-13  4:29     ` Christoph Hellwig
@ 2025-02-13 13:26     ` Andrey Albershteyn
  1 sibling, 0 replies; 225+ messages in thread
From: Andrey Albershteyn @ 2025-02-13 13:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

On 2025-02-06 15:02:45, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Pass a const pointer to path_walk since we don't actually modify the
> contents.
> 
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

Looks good to me
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>

> ---
>  db/namei.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/db/namei.c b/db/namei.c
> index 4eae4e8fd3232c..00610a54af527e 100644
> --- a/db/namei.c
> +++ b/db/namei.c
> @@ -140,10 +140,10 @@ path_navigate(
>  static int
>  path_walk(
>  	xfs_ino_t	rootino,
> -	char		*path)
> +	const char	*path)
>  {
>  	struct dirpath	*dirpath;
> -	char		*p = path;
> +	const char	*p = path;
>  	int		error = 0;
>  
>  	if (*p == '/') {
> 

-- 
- Andrey



* Re: [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems
  2025-02-06 23:03   ` [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems Darrick J. Wong
  2025-02-13 13:25     ` Andrey Albershteyn
@ 2025-02-18  8:36     ` Christoph Hellwig
  2025-02-18 15:54       ` Darrick J. Wong
  2025-02-19 18:04       ` Eric Sandeen
  2025-02-18 19:58     ` [PATCH v1.1 " Darrick J. Wong
  2 siblings, 2 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-18  8:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

On Thu, Feb 06, 2025 at 03:03:32PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Ahead of deprecating V4 support in the kernel, let's give people a way
> to extract their files from a filesystem without needing to mount.

So I've wanted a userspace file access for a while, but if we deprecate
the v4 support in the kernel that will propagate to libxfs quickly,
and this code won't help you with v4 file systems either.  So I don't
think the rationale here seems very good.

>  extern void		bmapinflate_init(void);
> +extern void		rdump_init(void);

No need for the extern.

> +	/* XXX cannot copy fsxattrs */

Should this be fixed first?  Or document in a full sentence comment
explaining why it can't or should not be?

> +		[1] = {
> +			.tv_sec  = inode_get_mtime_sec(VFS_I(ip)),
> +			.tv_nsec = inode_get_mtime_nsec(VFS_I(ip)),
> +		},
> +	};
> +	int			ret;
> +
> +	/* XXX cannot copy ctime or btime */

Same for this and others.

> +	/* Format xattr name */
> +	if (attr_flags & XFS_ATTR_ROOT)
> +		nsp = XATTR_TRUSTED_PREFIX;
> +	else if (attr_flags & XFS_ATTR_SECURE)
> +		nsp = XATTR_SECURITY_PREFIX;
> +	else
> +		nsp = XATTR_USER_PREFIX;

Add a self-contained helper for this?  I'm pretty sure we do this
translation in a few places.
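
For illustration only (not part of the thread's patches): a minimal sketch
of what such a helper might look like.  The SKETCH_ATTR_* flags are
stand-ins for the on-disk namespace flags, the function names are
hypothetical, and the prefix strings are written out literally where real
code would use the XATTR_*_PREFIX macros from <linux/xattr.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for the on-disk namespace flags (assumed values). */
#define SKETCH_ATTR_ROOT	(1U << 1)
#define SKETCH_ATTR_SECURE	(1U << 2)

/* Map namespace flags to the Linux xattr name prefix. */
static const char *
sketch_attr_prefix(unsigned int attr_flags)
{
	if (attr_flags & SKETCH_ATTR_ROOT)
		return "trusted.";
	if (attr_flags & SKETCH_ATTR_SECURE)
		return "security.";
	return "user.";
}

/*
 * Format "<prefix><name>" into buf.  Returns 0 on success or -1 if the
 * result did not fit.  snprintf reports truncation by returning a value
 * greater than or equal to the buffer size.
 */
static int
sketch_format_xattr_name(char *buf, size_t buflen, unsigned int attr_flags,
		const char *name, size_t namelen)
{
	int	added;

	assert(buf != NULL && name != NULL);
	added = snprintf(buf, buflen, "%s%.*s",
			sketch_attr_prefix(attr_flags), (int)namelen, name);
	if (added < 0 || (size_t)added >= buflen)
		return -1;
	return 0;
}
```

Keeping the flag-to-prefix mapping separate from the formatting step would
let other callers reuse the mapping without needing a name buffer.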

> +	if (XFS_IS_REALTIME_INODE(ip))
> +		btp = ip->i_mount->m_rtdev_targp;
> +	else
> +		btp = ip->i_mount->m_ddev_targp;

Should we move xfs_inode_buftarg from kernel code to common code?



* Re: [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems
  2025-02-18  8:36     ` Christoph Hellwig
@ 2025-02-18 15:54       ` Darrick J. Wong
  2025-02-18 18:04         ` Darrick J. Wong
  2025-02-19 18:04       ` Eric Sandeen
  1 sibling, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-18 15:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aalbersh, hch, linux-xfs

On Tue, Feb 18, 2025 at 12:36:33AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 06, 2025 at 03:03:32PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Ahead of deprecating V4 support in the kernel, let's give people a way
> > to extract their files from a filesystem without needing to mount.
> 
> So I've wanted a userspace file access for a while, but if we deprecate
> the v4 support in the kernel that will propagate to libxfs quickly,
> and this code won't help you with v4 file systems either.  So I don't
> think the rationale here seems very good.

We aren't removing V4 support from the kernel until September 2030 and
xfsprogs effectively builds with CONFIG_XFS_SUPPORT_V4=y.  That should
be enough time, right?

> >  extern void		bmapinflate_init(void);
> > +extern void		rdump_init(void);
> 
> No need for the extern.

Ok.

> > +	/* XXX cannot copy fsxattrs */
> 
> Should this be fixed first?  Or document in a full sentence comment
> explaining why it can't or should not be?

	/* XXX cannot copy fsxattrs until setfsxattrat() syscall merge */

> > +		[1] = {
> > +			.tv_sec  = inode_get_mtime_sec(VFS_I(ip)),
> > +			.tv_nsec = inode_get_mtime_nsec(VFS_I(ip)),
> > +		},
> > +	};
> > +	int			ret;
> > +
> > +	/* XXX cannot copy ctime or btime */
> 
> Same for this and others.

Is there a way to set ctime or btime?  I don't know of any.

	/* Cannot set ctime or btime */

> > +	/* Format xattr name */
> > +	if (attr_flags & XFS_ATTR_ROOT)
> > +		nsp = XATTR_TRUSTED_PREFIX;
> > +	else if (attr_flags & XFS_ATTR_SECURE)
> > +		nsp = XATTR_SECURITY_PREFIX;
> > +	else
> > +		nsp = XATTR_USER_PREFIX;
> 
> Add a self-contained helper for this?  I'm pretty sure we do this
> translation in a few places.

Ok.  I think at least scrub phase5 does this.

> > +	if (XFS_IS_REALTIME_INODE(ip))
> > +		btp = ip->i_mount->m_rtdev_targp;
> > +	else
> > +		btp = ip->i_mount->m_ddev_targp;
> 
> Should we move xfs_inode_buftarg from kernel code to common code?

Hmm.  The xfs_inode -> xfs_buftarg translation could be moved to
libxfs/xfs_inode_util.c, yes.  Though that can't happen until 6.15
because we're well past the merge window.  For now I think it's the only
place in xfsprogs where we do that.

--D


* Re: [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems
  2025-02-18 15:54       ` Darrick J. Wong
@ 2025-02-18 18:04         ` Darrick J. Wong
  2025-02-18 19:49           ` Darrick J. Wong
  0 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-18 18:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aalbersh, hch, linux-xfs

On Tue, Feb 18, 2025 at 07:54:09AM -0800, Darrick J. Wong wrote:
> On Tue, Feb 18, 2025 at 12:36:33AM -0800, Christoph Hellwig wrote:
> > On Thu, Feb 06, 2025 at 03:03:32PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Ahead of deprecating V4 support in the kernel, let's give people a way
> > > to extract their files from a filesystem without needing to mount.
> > 
> > So I've wanted a userspace file access for a while, but if we deprecate
> > the v4 support in the kernel that will propagate to libxfs quickly,
> > and this code won't help you with v4 file systems either.  So I don't
> > think the rationale here seems very good.
> 
> We aren't removing V4 support from the kernel until September 2030 and
> xfsprogs effectively builds with CONFIG_XFS_SUPPORT_V4=y.  That should
> be enough time, right?
> 
> > >  extern void		bmapinflate_init(void);
> > > +extern void		rdump_init(void);
> > 
> > No need for the extern.
> 
> Ok.
> 
> > > +	/* XXX cannot copy fsxattrs */
> > 
> > Should this be fixed first?  Or document in a full sentence comment
> > explaining why it can't or should not be?
> 
> 	/* XXX cannot copy fsxattrs until setfsxattrat() syscall merge */
> 
> > > +		[1] = {
> > > +			.tv_sec  = inode_get_mtime_sec(VFS_I(ip)),
> > > +			.tv_nsec = inode_get_mtime_nsec(VFS_I(ip)),
> > > +		},
> > > +	};
> > > +	int			ret;
> > > +
> > > +	/* XXX cannot copy ctime or btime */
> > 
> > Same for this and others.
> 
> Is there a way to set ctime or btime?  I don't know of any.

Now that I'm past all the morning meetings and have had time to research
things a little more deeply/wake up more: no, there's no way to set the
inode change or birth times.

> 	/* Cannot set ctime or btime */

So I'll go with ^^ this comment.
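
For illustration only (not part of the patch): futimens() takes exactly two
timestamps, access and modification, which is why ctime and btime cannot be
carried over.  The helper name sketch_copy_times is hypothetical, and
whole-second timestamps are an assumption made to keep the sketch short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: copy only the access and modification times to fd.
 * There is no slot for ctime or btime in the interface; the change time is
 * updated by the kernel itself as a side effect of the call.
 */
static int
sketch_copy_times(int fd, time_t atime_sec, time_t mtime_sec)
{
	struct timespec	times[2] = {
		[0] = { .tv_sec = atime_sec, .tv_nsec = 0 },	/* atime */
		[1] = { .tv_sec = mtime_sec, .tv_nsec = 0 },	/* mtime */
	};

	assert(fd >= 0);
	if (futimens(fd, times) != 0) {
		perror("futimens");
		return 1;
	}
	return 0;
}
```

A caller would pass the destination file descriptor plus the two values
read from the source inode; anything else about the inode's time history
is left to the destination filesystem.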

> > > +	/* Format xattr name */
> > > +	if (attr_flags & XFS_ATTR_ROOT)
> > > +		nsp = XATTR_TRUSTED_PREFIX;
> > > +	else if (attr_flags & XFS_ATTR_SECURE)
> > > +		nsp = XATTR_SECURITY_PREFIX;
> > > +	else
> > > +		nsp = XATTR_USER_PREFIX;
> > 
> > Add a self-contained helper for this?  I'm pretty sure we do this
> > translation in a few places.

Actually, xfsprogs doesn't:

$ git grep XATTR_SECURITY_PREFIX
db/rdump.c:293:         nsp = XATTR_SECURITY_PREFIX;
mkfs/proto.c:393:       } else if (!strncmp(attrname, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN)) {
mkfs/proto.c:394:               args.name = (unsigned char *)attrname + XATTR_SECURITY_PREFIX_LEN;

The kernel gets a little closer in xfs_xattr.c:

$ git grep XATTR_SECURITY_PREFIX include/ fs/*.c fs/xfs/
fs/xattr.c:128: if (!strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN) ||
fs/xattr.c:228: int issec = !strncmp(name, XATTR_SECURITY_PREFIX,
fs/xattr.c:229:                            XATTR_SECURITY_PREFIX_LEN);
fs/xattr.c:249:                 const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
fs/xattr.c:442: if (!strncmp(name, XATTR_SECURITY_PREFIX,
fs/xattr.c:443:                         XATTR_SECURITY_PREFIX_LEN)) {
fs/xattr.c:444:         const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
fs/xfs/xfs_xattr.c:207: .prefix = XATTR_SECURITY_PREFIX,
fs/xfs/xfs_xattr.c:304:         prefix = XATTR_SECURITY_PREFIX;
fs/xfs/xfs_xattr.c:305:         prefix_len = XATTR_SECURITY_PREFIX_LEN;

But xfs_xattr_put_listent has some custom logic in it:

	if (flags & XFS_ATTR_ROOT) {
#ifdef CONFIG_XFS_POSIX_ACL
		if (namelen == SGI_ACL_FILE_SIZE &&
		    strncmp(name, SGI_ACL_FILE,
			    SGI_ACL_FILE_SIZE) == 0) {
			__xfs_xattr_put_listent(
					context, XATTR_SYSTEM_PREFIX,
					XATTR_SYSTEM_PREFIX_LEN,
					XATTR_POSIX_ACL_ACCESS,
					strlen(XATTR_POSIX_ACL_ACCESS));
		} else if (namelen == SGI_ACL_DEFAULT_SIZE &&
			 strncmp(name, SGI_ACL_DEFAULT,
				 SGI_ACL_DEFAULT_SIZE) == 0) {
			__xfs_xattr_put_listent(
					context, XATTR_SYSTEM_PREFIX,
					XATTR_SYSTEM_PREFIX_LEN,
					XATTR_POSIX_ACL_DEFAULT,
					strlen(XATTR_POSIX_ACL_DEFAULT));
		}
#endif

		/*
		 * Only show root namespace entries if we are actually allowed to
		 * see them.
		 */
		if (!capable(CAP_SYS_ADMIN))
			return;

		prefix = XATTR_TRUSTED_PREFIX;
		prefix_len = XATTR_TRUSTED_PREFIX_LEN;

But I do see that rdump should translate SGI_ACL_FILE to
XATTR_POSIX_ACL_ACCESS so I'll go do that.

--D

> Ok.  I think at least scrub phase5 does this.
> 
> > > +	if (XFS_IS_REALTIME_INODE(ip))
> > > +		btp = ip->i_mount->m_rtdev_targp;
> > > +	else
> > > +		btp = ip->i_mount->m_ddev_targp;
> > 
> > Should we move xfs_inode_buftarg from kernel code to common code?
> 
> Hmm.  The xfs_inode -> xfs_buftarg translation could be moved to
> libxfs/xfs_inode_util.c, yes.  Though that can't happen until 6.15
> because we're well past the merge window.  For now I think it's the only
> place in xfsprogs where we do that.
> 
> --D
> 


* Re: [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems
  2025-02-18 18:04         ` Darrick J. Wong
@ 2025-02-18 19:49           ` Darrick J. Wong
  0 siblings, 0 replies; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-18 19:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aalbersh, hch, linux-xfs

On Tue, Feb 18, 2025 at 10:04:56AM -0800, Darrick J. Wong wrote:
> On Tue, Feb 18, 2025 at 07:54:09AM -0800, Darrick J. Wong wrote:
> > On Tue, Feb 18, 2025 at 12:36:33AM -0800, Christoph Hellwig wrote:
> > > On Thu, Feb 06, 2025 at 03:03:32PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Ahead of deprecating V4 support in the kernel, let's give people a way
> > > > to extract their files from a filesystem without needing to mount.
> > > 
> > > So I've wanted a userspace file access for a while, but if we deprecate
> > > the v4 support in the kernel that will propagate to libxfs quickly,
> > > and this code won't help you with v4 file systems either.  So I don't
> > > think the rationale here seems very good.
> > 
> > We aren't removing V4 support from the kernel until September 2030 and
> > xfsprogs effectively builds with CONFIG_XFS_SUPPORT_V4=y.  That should
> > be enough time, right?
> > 
> > > >  extern void		bmapinflate_init(void);
> > > > +extern void		rdump_init(void);
> > > 
> > > No need for the extern.
> > 
> > Ok.
> > 
> > > > +	/* XXX cannot copy fsxattrs */
> > > 
> > > Should this be fixed first?  Or document in a full sentence comment
> > > explaining why it can't or should not be?
> > 
> > 	/* XXX cannot copy fsxattrs until setfsxattrat() syscall merge */
> > 
> > > > +		[1] = {
> > > > +			.tv_sec  = inode_get_mtime_sec(VFS_I(ip)),
> > > > +			.tv_nsec = inode_get_mtime_nsec(VFS_I(ip)),
> > > > +		},
> > > > +	};
> > > > +	int			ret;
> > > > +
> > > > +	/* XXX cannot copy ctime or btime */
> > > 
> > > Same for this and others.
> > 
> > Is there a way to set ctime or btime?  I don't know of any.
> 
> Now that I'm past all the morning meetings and have had time to research
> things a little more deeply/wake up more: no, there's no way to set the
> inode change or birth times.
> 
> > 	/* Cannot set ctime or btime */
> 
> So I'll go with ^^ this comment.
> 
> > > > +	/* Format xattr name */
> > > > +	if (attr_flags & XFS_ATTR_ROOT)
> > > > +		nsp = XATTR_TRUSTED_PREFIX;
> > > > +	else if (attr_flags & XFS_ATTR_SECURE)
> > > > +		nsp = XATTR_SECURITY_PREFIX;
> > > > +	else
> > > > +		nsp = XATTR_USER_PREFIX;
> > > 
> > > Add a self-contained helper for this?  I'm pretty sure we do this
> > > translation in a few places.
> 
> Actually, xfsprogs doesn't:
> 
> $ git grep XATTR_SECURITY_PREFIX
> db/rdump.c:293:         nsp = XATTR_SECURITY_PREFIX;
> mkfs/proto.c:393:       } else if (!strncmp(attrname, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN)) {
> mkfs/proto.c:394:               args.name = (unsigned char *)attrname + XATTR_SECURITY_PREFIX_LEN;
> 
> The kernel gets a little closer in xfs_xattr.c:
> 
> $ git grep XATTR_SECURITY_PREFIX include/ fs/*.c fs/xfs/
> fs/xattr.c:128: if (!strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN) ||
> fs/xattr.c:228: int issec = !strncmp(name, XATTR_SECURITY_PREFIX,
> fs/xattr.c:229:                            XATTR_SECURITY_PREFIX_LEN);
> fs/xattr.c:249:                 const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
> fs/xattr.c:442: if (!strncmp(name, XATTR_SECURITY_PREFIX,
> fs/xattr.c:443:                         XATTR_SECURITY_PREFIX_LEN)) {
> fs/xattr.c:444:         const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
> fs/xfs/xfs_xattr.c:207: .prefix = XATTR_SECURITY_PREFIX,
> fs/xfs/xfs_xattr.c:304:         prefix = XATTR_SECURITY_PREFIX;
> fs/xfs/xfs_xattr.c:305:         prefix_len = XATTR_SECURITY_PREFIX_LEN;
> 
> But xfs_xattr_put_listent has some custom logic in it:
> 
> 	if (flags & XFS_ATTR_ROOT) {
> #ifdef CONFIG_XFS_POSIX_ACL
> 		if (namelen == SGI_ACL_FILE_SIZE &&
> 		    strncmp(name, SGI_ACL_FILE,
> 			    SGI_ACL_FILE_SIZE) == 0) {
> 			__xfs_xattr_put_listent(
> 					context, XATTR_SYSTEM_PREFIX,
> 					XATTR_SYSTEM_PREFIX_LEN,
> 					XATTR_POSIX_ACL_ACCESS,
> 					strlen(XATTR_POSIX_ACL_ACCESS));
> 		} else if (namelen == SGI_ACL_DEFAULT_SIZE &&
> 			 strncmp(name, SGI_ACL_DEFAULT,
> 				 SGI_ACL_DEFAULT_SIZE) == 0) {
> 			__xfs_xattr_put_listent(
> 					context, XATTR_SYSTEM_PREFIX,
> 					XATTR_SYSTEM_PREFIX_LEN,
> 					XATTR_POSIX_ACL_DEFAULT,
> 					strlen(XATTR_POSIX_ACL_DEFAULT));
> 		}
> #endif
> 
> 		/*
> 		 * Only show root namespace entries if we are actually allowed to
> 		 * see them.
> 		 */
> 		if (!capable(CAP_SYS_ADMIN))
> 			return;
> 
> 		prefix = XATTR_TRUSTED_PREFIX;
> 		prefix_len = XATTR_TRUSTED_PREFIX_LEN;
> 
> But I do see that rdump should translate SGI_ACL_FILE to
> XATTR_POSIX_ACL_ACCESS so I'll go do that.

...actually, no.  The SGI_ACL_{FILE,DEFAULT} xattrs can be copied
verbatim into another XFS filesystem and they continue to work
correctly.  Translating them into the generic
"system.posix_acl_{access,default}" xattrs requires the introduction of
libacl as a hard dependency because the exact format of the blob passed
around by setxattr/getxattr is private to libacl.

I think it's a bridge too far to have xfs_db depend on libacl for the
sake of non-XFS filesystems so I'll add a warning if we think we're
copying the ACL xattr to something that isn't an XFS filesystem.
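
For illustration only (a sketch of the idea described above, not the code
that was posted): the warning needs two facts, whether the xattr name is
one of the XFS ACL names and whether the destination is an XFS filesystem.
All sketch_* names are hypothetical, the magic number is written out
instead of taken from a header, and treating a failed fstatfs() as "not
XFS" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/vfs.h>

#define SKETCH_XFS_SUPER_MAGIC	0x58465342	/* "XFSB" */

/* Is this root-namespace xattr name one of the two XFS ACL names? */
static bool
sketch_is_acl_name(const char *name, size_t namelen)
{
	static const char	file_acl[] = "SGI_ACL_FILE";
	static const char	dflt_acl[] = "SGI_ACL_DEFAULT";

	assert(name != NULL);
	if (namelen == sizeof(file_acl) - 1 &&
	    !memcmp(name, file_acl, namelen))
		return true;
	if (namelen == sizeof(dflt_acl) - 1 &&
	    !memcmp(name, dflt_acl, namelen))
		return true;
	return false;
}

/* Does fd live on an XFS filesystem?  Errors count as "no" here. */
static bool
sketch_dest_is_xfs(int fd)
{
	struct statfs	sfs;

	if (fstatfs(fd, &sfs) != 0)
		return false;
	return sfs.f_type == SKETCH_XFS_SUPER_MAGIC;
}

/* Warn only when an ACL xattr is headed for a non-XFS destination. */
static bool
sketch_acl_may_be_lost(int fd, const char *name, size_t namelen)
{
	return sketch_is_acl_name(name, namelen) && !sketch_dest_is_xfs(fd);
}
```

Checking the name first avoids a statfs call for the common case of xattrs
that are not ACLs at all.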

--D

> --D
> 
> > Ok.  I think at least scrub phase5 does this.
> > 
> > > > +	if (XFS_IS_REALTIME_INODE(ip))
> > > > +		btp = ip->i_mount->m_rtdev_targp;
> > > > +	else
> > > > +		btp = ip->i_mount->m_ddev_targp;
> > > 
> > > Should be move xfs_inode_buftarg from kernel code to common code?
> > 
> > Hmm.  The xfs_inode -> xfs_buftarg translation could be moved to
> > libxfs/xfs_inode_util.c, yes.  Though that can't happen until 6.15
> > because we're well past the merge window.  For now I think it's the only
> > place in xfsprogs where we do that.
> > 
> > --D
> > 
> 


* [PATCH v1.1 4/4] xfs_db: add command to copy directory trees out of filesystems
  2025-02-06 23:03   ` [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems Darrick J. Wong
  2025-02-13 13:25     ` Andrey Albershteyn
  2025-02-18  8:36     ` Christoph Hellwig
@ 2025-02-18 19:58     ` Darrick J. Wong
  2025-02-19  5:27       ` Christoph Hellwig
  2 siblings, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-18 19:58 UTC (permalink / raw)
  To: aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Ahead of deprecating V4 support in the kernel, let's give people a way
to extract their files from a filesystem without needing to mount.  The
libxfs code won't be removed from the kernel until 2030 and xfsprogs
effectively builds with XFS_SUPPORT_V4=y so that'll give us five years
of releases for archaeologists to draw from.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
---
v1.1: tidy up the comments, externs, and xattr name reporting when
things fail, and warn if we lose acls
---
 db/command.h             |    1 
 db/namei.h               |   18 +
 libxfs/libxfs_api_defs.h |    2 
 db/Makefile              |    3 
 db/command.c             |    1 
 db/namei.c               |    5 
 db/rdump.c               | 1054 ++++++++++++++++++++++++++++++++++++++++++++++
 man/man8/xfs_db.8        |   26 +
 8 files changed, 1107 insertions(+), 3 deletions(-)
 create mode 100644 db/namei.h
 create mode 100644 db/rdump.c

diff --git a/db/command.h b/db/command.h
index 2c2926afd7b516..7b1738addff8e1 100644
--- a/db/command.h
+++ b/db/command.h
@@ -36,3 +36,4 @@ extern void		timelimit_init(void);
 extern void		namei_init(void);
 extern void		iunlink_init(void);
 extern void		bmapinflate_init(void);
+void			rdump_init(void);
diff --git a/db/namei.h b/db/namei.h
new file mode 100644
index 00000000000000..05c384bc9a6c35
--- /dev/null
+++ b/db/namei.h
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2025 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef DB_NAMEI_H_
+#define DB_NAMEI_H_
+
+int path_walk(xfs_ino_t rootino, const char *path);
+
+typedef int (*dir_emit_t)(struct xfs_trans *tp, struct xfs_inode *dp,
+		xfs_dir2_dataptr_t off, char *name, ssize_t namelen,
+		xfs_ino_t ino, uint8_t dtype, void *private);
+
+int listdir(struct xfs_trans *tp, struct xfs_inode *dp, dir_emit_t dir_emit,
+		void *private);
+
+#endif /* DB_NAMEI_H_ */
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 530feef2a47db8..14a67c8c24dd7e 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -47,6 +47,7 @@
 #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
 #define xfs_attr_namecheck		libxfs_attr_namecheck
 #define xfs_attr_removename		libxfs_attr_removename
+#define xfs_attr_rmtval_get		libxfs_attr_rmtval_get
 #define xfs_attr_set			libxfs_attr_set
 #define xfs_attr_sethash		libxfs_attr_sethash
 #define xfs_attr_sf_firstentry		libxfs_attr_sf_firstentry
@@ -353,6 +354,7 @@
 #define xfs_sb_version_to_features	libxfs_sb_version_to_features
 #define xfs_symlink_blocks		libxfs_symlink_blocks
 #define xfs_symlink_hdr_ok		libxfs_symlink_hdr_ok
+#define xfs_symlink_remote_read		libxfs_symlink_remote_read
 #define xfs_symlink_write_target	libxfs_symlink_write_target
 
 #define xfs_trans_add_item		libxfs_trans_add_item
diff --git a/db/Makefile b/db/Makefile
index 02eeead25b49d0..e36e775eee6021 100644
--- a/db/Makefile
+++ b/db/Makefile
@@ -45,6 +45,7 @@ HFILES = \
 	logformat.h \
 	malloc.h \
 	metadump.h \
+	namei.h \
 	obfuscate.h \
 	output.h \
 	print.h \
@@ -64,7 +65,7 @@ CFILES = $(HFILES:.h=.c) \
 	convert.c \
 	info.c \
 	iunlink.c \
-	namei.c \
+	rdump.c \
 	timelimit.c
 LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh
 
diff --git a/db/command.c b/db/command.c
index 1b46c3fec08a0e..15bdabbcb7d728 100644
--- a/db/command.c
+++ b/db/command.c
@@ -145,4 +145,5 @@ init_commands(void)
 	timelimit_init();
 	iunlink_init();
 	bmapinflate_init();
+	rdump_init();
 }
diff --git a/db/namei.c b/db/namei.c
index 6f277a65ed91ac..2586e0591c2357 100644
--- a/db/namei.c
+++ b/db/namei.c
@@ -14,6 +14,7 @@
 #include "fprint.h"
 #include "field.h"
 #include "inode.h"
+#include "namei.h"
 
 /* Path lookup */
 
@@ -144,7 +145,7 @@ path_navigate(
 }
 
 /* Walk a directory path to an inode and set the io cursor to that inode. */
-static int
+int
 path_walk(
 	xfs_ino_t	rootino,
 	const char	*path)
@@ -493,7 +494,7 @@ list_leafdir(
 }
 
 /* Read the directory, display contents. */
-static int
+int
 listdir(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*dp,
diff --git a/db/rdump.c b/db/rdump.c
new file mode 100644
index 00000000000000..3138c67b6d56b5
--- /dev/null
+++ b/db/rdump.c
@@ -0,0 +1,1054 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2025 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs.h"
+#include "command.h"
+#include "output.h"
+#include "init.h"
+#include "io.h"
+#include "namei.h"
+#include "type.h"
+#include "input.h"
+#include "faddr.h"
+#include "fprint.h"
+#include "field.h"
+#include "inode.h"
+#include "listxattr.h"
+#include <sys/xattr.h>
+#include <linux/xattr.h>
+
+static bool strict_errors;
+
+/* file attributes that we might have lost */
+#define LOST_OWNER		(1U << 0)
+#define LOST_MODE		(1U << 1)
+#define LOST_TIME		(1U << 2)
+#define LOST_SOME_FSXATTR	(1U << 3)
+#define LOST_FSXATTR		(1U << 4)
+#define LOST_XATTR		(1U << 5)
+#define LOST_ACL		(1U << 6)
+
+static unsigned int lost_mask;
+
+static void
+rdump_help(void)
+{
+	dbprintf(_(
+"\n"
+" Recover files out of the filesystem into a directory.\n"
+"\n"
+" Options:\n"
+"   -s      -- Fail on errors when reading content from the filesystem.\n"
+"   paths   -- Copy only these paths.  If no paths are given, copy everything.\n"
+"   destdir -- The destination into which files are recovered.\n"
+	));
+}
+
+struct destdir {
+	int	fd;
+	char	*path;
+	char	*sep;
+};
+
+struct pathbuf {
+	size_t	len;
+	char	path[PATH_MAX + 1];
+};
+
+static int rdump_file(struct xfs_trans *tp, xfs_ino_t ino,
+		const struct destdir *destdir, struct pathbuf *pbuf);
+
+static inline unsigned int xflags2getflags(const struct fsxattr *fa)
+{
+	unsigned int	ret = 0;
+
+	if (fa->fsx_xflags & FS_XFLAG_IMMUTABLE)
+		ret |= FS_IMMUTABLE_FL;
+	if (fa->fsx_xflags & FS_XFLAG_APPEND)
+		ret |= FS_APPEND_FL;
+	if (fa->fsx_xflags & FS_XFLAG_SYNC)
+		ret |= FS_SYNC_FL;
+	if (fa->fsx_xflags & FS_XFLAG_NOATIME)
+		ret |= FS_NOATIME_FL;
+	if (fa->fsx_xflags & FS_XFLAG_NODUMP)
+		ret |= FS_NODUMP_FL;
+	if (fa->fsx_xflags & FS_XFLAG_DAX)
+		ret |= FS_DAX_FL;
+	if (fa->fsx_xflags & FS_XFLAG_PROJINHERIT)
+		ret |= FS_PROJINHERIT_FL;
+	return ret;
+}
+
+/* Copy common file attributes to this fd */
+static int
+rdump_fileattrs_fd(
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf,
+	int			fd)
+{
+	struct fsxattr		fsxattr = {
+		.fsx_extsize	= ip->i_extsize,
+		.fsx_projid	= ip->i_projid,
+		.fsx_cowextsize	= ip->i_cowextsize,
+		.fsx_xflags	= xfs_ip2xflags(ip),
+	};
+	int			ret;
+
+	ret = fchmod(fd, VFS_I(ip)->i_mode & ~S_IFMT);
+	if (ret) {
+		if (errno == EPERM)
+			lost_mask |= LOST_MODE;
+		else
+			dbprintf(_("%s%s%s: fchmod %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	ret = fchown(fd, i_uid_read(VFS_I(ip)), i_gid_read(VFS_I(ip)));
+	if (ret) {
+		if (errno == EPERM)
+			lost_mask |= LOST_OWNER;
+		else
+			dbprintf(_("%s%s%s: fchown %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	ret = ioctl(fd, XFS_IOC_FSSETXATTR, &fsxattr);
+	if (ret) {
+		unsigned int	getflags = xflags2getflags(&fsxattr);
+
+		/* try to use setflags if the target is not xfs */
+		if (errno == EOPNOTSUPP || errno == ENOTTY) {
+			lost_mask |= LOST_SOME_FSXATTR;
+			ret = ioctl(fd, FS_IOC_SETFLAGS, &getflags);
+		}
+
+		if (errno == EOPNOTSUPP || errno == EPERM || errno == ENOTTY)
+			lost_mask |= LOST_FSXATTR;
+		else
+			dbprintf(_("%s%s%s: fssetxattr %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	return 0;
+}
+
+/* Copy common file attributes to this path */
+static int
+rdump_fileattrs_path(
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf)
+{
+	int			ret;
+
+	ret = fchmodat(destdir->fd, pbuf->path, VFS_I(ip)->i_mode & ~S_IFMT,
+			AT_SYMLINK_NOFOLLOW);
+	if (ret) {
+		/* fchmodat on a symlink is not supported */
+		if (errno == EPERM || errno == EOPNOTSUPP)
+			lost_mask |= LOST_MODE;
+		else
+			dbprintf(_("%s%s%s: fchmodat %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	ret = fchownat(destdir->fd, pbuf->path, i_uid_read(VFS_I(ip)),
+			i_gid_read(VFS_I(ip)), AT_SYMLINK_NOFOLLOW);
+	if (ret) {
+		if (errno == EPERM)
+			lost_mask |= LOST_OWNER;
+		else
+			dbprintf(_("%s%s%s: fchownat %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	/* Cannot copy fsxattrs until setfsxattrat gets merged */
+
+	return 0;
+}
+
+/* Copy access and modification timestamps to this fd. */
+static int
+rdump_timestamps_fd(
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf,
+	int			fd)
+{
+	struct timespec		times[2] = {
+		[0] = {
+			.tv_sec  = inode_get_atime_sec(VFS_I(ip)),
+			.tv_nsec = inode_get_atime_nsec(VFS_I(ip)),
+		},
+		[1] = {
+			.tv_sec  = inode_get_mtime_sec(VFS_I(ip)),
+			.tv_nsec = inode_get_mtime_nsec(VFS_I(ip)),
+		},
+	};
+	int			ret;
+
+	/* Cannot set ctime or btime */
+
+	ret = futimens(fd, times);
+	if (ret) {
+		if (errno == EPERM)
+			lost_mask |= LOST_TIME;
+		else
+			dbprintf(_("%s%s%s: futimens %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	return 0;
+}
+
+/* Copy access and modification timestamps to this path. */
+static int
+rdump_timestamps_path(
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf)
+{
+	struct timespec		times[2] = {
+		[0] = {
+			.tv_sec  = inode_get_atime_sec(VFS_I(ip)),
+			.tv_nsec = inode_get_atime_nsec(VFS_I(ip)),
+		},
+		[1] = {
+			.tv_sec  = inode_get_mtime_sec(VFS_I(ip)),
+			.tv_nsec = inode_get_mtime_nsec(VFS_I(ip)),
+		},
+	};
+	int			ret;
+
+	/* Cannot set ctime or btime */
+
+	ret = utimensat(destdir->fd, pbuf->path, times, AT_SYMLINK_NOFOLLOW);
+	if (ret) {
+		if (errno == EPERM)
+			lost_mask |= LOST_TIME;
+		else
+			dbprintf(_("%s%s%s: utimensat %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	return 0;
+}
+
+struct copyxattr {
+	const struct destdir	*destdir;
+	const struct pathbuf	*pbuf;
+	int			fd;
+	char			name[XATTR_NAME_MAX +
+				     XATTR_SECURITY_PREFIX_LEN + 1];
+	char			value[XATTR_SIZE_MAX];
+};
+
+/*
+ * ACL xattrs can be copied verbatim to another XFS filesystem without issue
+ * because the name of the ondisk xattr is not that of the magic POSIX ACL
+ * xattr.  However, if we're rdumping to another filesystem type, we need to
+ * warn the user that the ACLs most likely won't work because we don't want to
+ * introduce a hard dep on libacl to perform a translation for a foreign fs.
+ */
+static inline bool
+cannot_translate_acl(
+	int			fd,
+	const char		*name,
+	unsigned int		namelen)
+{
+	struct statfs		statfsbuf;
+	int			ret;
+
+	if (namelen == SGI_ACL_FILE_SIZE &&
+	    !strncmp(name, SGI_ACL_FILE, SGI_ACL_FILE_SIZE))
+		return false;
+
+	if (namelen == SGI_ACL_DEFAULT_SIZE &&
+	    !strncmp(name, SGI_ACL_DEFAULT, SGI_ACL_DEFAULT_SIZE))
+		return false;
+
+	ret = fstatfs(fd, &statfsbuf);
+	if (ret)
+		return false;
+
+	return statfsbuf.f_type != XFS_SUPER_MAGIC;
+}
+
+/* Copy one extended attribute */
+static int
+rdump_xattr(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen,
+	void			*priv)
+{
+	struct copyxattr	*cx = priv;
+	const char		*namespace;
+	char			*realname = (char *)name;
+	const size_t		remaining = sizeof(cx->name);
+	ssize_t			added;
+	int			realnamelen = namelen;
+	int			ret;
+
+	switch (attr_flags & XFS_ATTR_NSP_ONDISK_MASK) {
+	case XFS_ATTR_PARENT:
+		return 0;
+	case XFS_ATTR_ROOT:
+		namespace = XATTR_TRUSTED_PREFIX;
+		if (!(lost_mask & LOST_ACL) &&
+		    cannot_translate_acl(cx->fd, (const char *)name, namelen))
+			lost_mask |= LOST_ACL;
+		break;
+	case XFS_ATTR_SECURE:
+		namespace = XATTR_SECURITY_PREFIX;
+		break;
+	case 0: /* user xattrs */
+		namespace = XATTR_USER_PREFIX;
+		break;
+	default:
+		dbprintf(_("%s%s%s: unknown xattr namespace 0x%x\n"),
+				cx->destdir->path, cx->destdir->sep,
+				cx->pbuf->path,
+				attr_flags & XFS_ATTR_NSP_ONDISK_MASK);
+		return strict_errors ? ECANCELED : 0;
+	}
+	added = snprintf(cx->name, remaining, "%s%.*s", namespace,
+			realnamelen, realname);
+	if (added >= remaining) {
+		dbprintf(
+ _("%s%s%s: ran out of space formatting xattr name %s%.*s\n"),
+				cx->destdir->path, cx->destdir->sep,
+				cx->pbuf->path, namespace, realnamelen,
+				realname);
+		return strict_errors ? ECANCELED : 0;
+	}
+
+	/* Retrieve xattr value if needed */
+	if (valuelen > 0 && !value) {
+		struct xfs_da_args	args = {
+			.trans		= tp,
+			.dp		= ip,
+			.geo		= mp->m_attr_geo,
+			.owner		= ip->i_ino,
+			.attr_filter	= attr_flags & XFS_ATTR_NSP_ONDISK_MASK,
+			.namelen	= namelen,
+			.name		= name,
+			.value		= cx->value,
+			.valuelen	= valuelen,
+		};
+
+		ret = -libxfs_attr_rmtval_get(&args);
+		if (ret) {
+			dbprintf(_("%s: reading xattr \"%s\" value %s\n"),
+					cx->pbuf->path, cx->name,
+					strerror(ret));
+			return strict_errors ? ECANCELED : 0;
+		}
+
+		value = cx->value;
+	}
+
+	ret = fsetxattr(cx->fd, cx->name, value, valuelen, 0);
+	if (ret) {
+		if (errno == EOPNOTSUPP)
+			lost_mask |= LOST_XATTR;
+		else
+			dbprintf(_("%s%s%s: fsetxattr \"%s\" %s\n"),
+					cx->destdir->path, cx->destdir->sep,
+					cx->pbuf->path, cx->name,
+					strerror(errno));
+		if (strict_errors)
+			return ECANCELED;
+	}
+
+	return 0;
+}
+
+/* Copy extended attributes */
+static int
+rdump_xattrs(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf,
+	int			fd)
+{
+	struct copyxattr	*cx;
+	int			ret;
+
+	cx = calloc(1, sizeof(struct copyxattr));
+	if (!cx) {
+		dbprintf(_("%s%s%s: allocating xattr buffer %s\n"),
+				destdir->path, destdir->sep, pbuf->path,
+				strerror(errno));
+		return 1;
+	}
+	cx->destdir = destdir;
+	cx->pbuf = pbuf;
+	cx->fd = fd;
+
+	ret = xattr_walk(tp, ip, rdump_xattr, cx);
+	if (ret && ret != ECANCELED) {
+		dbprintf(_("%s%s%s: listxattr %s\n"), destdir->path,
+				destdir->sep, pbuf->path, strerror(errno));
+		if (strict_errors)
+			return 1;
+	}
+
+	free(cx);
+	return 0;
+}
+
+struct copydirent {
+	const struct destdir	*destdir;
+	struct pathbuf		*pbuf;
+	int			fd;
+};
+
+/* Copy a directory entry. */
+static int
+rdump_dirent(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	xfs_dir2_dataptr_t	off,
+	char			*name,
+	ssize_t			namelen,
+	xfs_ino_t		ino,
+	uint8_t			dtype,
+	void			*private)
+{
+	struct copydirent	*cd = private;
+	size_t			oldlen = cd->pbuf->len;
+	const size_t		remaining = PATH_MAX + 1 - oldlen;
+	ssize_t			added;
+	int			ret;
+
+	/* Negative length means name is null-terminated */
+	if (namelen < 0)
+		namelen = -namelen;
+
+	/* Ignore dot and dotdot */
+	if (namelen == 1 && name[0] == '.')
+		return 0;
+	if (namelen == 2 && name[0] == '.' && name[1] == '.')
+		return 0;
+
+	if (namelen > FILENAME_MAX) {
+		dbprintf(_("%s%s%s: %s\n"),
+				cd->destdir->path, cd->destdir->sep,
+				cd->pbuf->path, strerror(ENAMETOOLONG));
+		return strict_errors ? ECANCELED : 0;
+	}
+
+	added = snprintf(&cd->pbuf->path[oldlen], remaining, "%s%.*s",
+			oldlen ? "/" : "", (int)namelen, name);
+	if (added >= remaining) {
+		dbprintf(_("%s%s%s: ran out of space formatting file name\n"),
+				cd->destdir->path, cd->destdir->sep,
+				cd->pbuf->path);
+		return strict_errors ? ECANCELED : 0;
+	}
+
+	cd->pbuf->len += added;
+	ret = rdump_file(tp, ino, cd->destdir, cd->pbuf);
+	cd->pbuf->len = oldlen;
+	cd->pbuf->path[oldlen] = 0;
+	return ret;
+}
+
+/* Copy a directory */
+static int
+rdump_directory(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	const struct destdir	*destdir,
+	struct pathbuf		*pbuf)
+{
+	struct copydirent	*cd;
+	int			ret, ret2;
+
+	cd = calloc(1, sizeof(struct copydirent));
+	if (!cd) {
+		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
+				pbuf->path, strerror(errno));
+		return 1;
+	}
+	cd->destdir = destdir;
+	cd->pbuf = pbuf;
+
+	if (pbuf->len) {
+		/*
+		 * If path is non-empty, we want to create a child somewhere
+		 * underneath the target directory.
+		 */
+		ret = mkdirat(destdir->fd, pbuf->path, 0600);
+		if (ret && errno != EEXIST) {
+			dbprintf(_("%s%s%s: %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+			goto out_cd;
+		}
+
+		cd->fd = openat(destdir->fd, pbuf->path,
+				O_RDONLY | O_DIRECTORY);
+		if (cd->fd < 0) {
+			dbprintf(_("%s%s%s: %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+			ret = 1;
+			goto out_cd;
+		}
+	} else {
+		/*
+		 * If path is empty, then we're copying the children of a
+		 * directory into the target directory.
+		 */
+		cd->fd = destdir->fd;
+	}
+
+	ret = rdump_fileattrs_fd(dp, destdir, pbuf, cd->fd);
+	if (ret && strict_errors)
+		goto out_close;
+
+	if (xfs_inode_has_attr_fork(dp)) {
+		ret = rdump_xattrs(tp, dp, destdir, pbuf, cd->fd);
+		if (ret && strict_errors)
+			goto out_close;
+	}
+
+	ret = listdir(tp, dp, rdump_dirent, cd);
+	if (ret && ret != ECANCELED) {
+		dbprintf(_("%s%s%s: readdir %s\n"), destdir->path,
+				destdir->sep, pbuf->path, strerror(ret));
+		if (strict_errors)
+			goto out_close;
+	}
+
+	ret = rdump_timestamps_fd(dp, destdir, pbuf, cd->fd);
+	if (ret && strict_errors)
+		goto out_close;
+
+	ret = 0;
+
+out_close:
+	if (cd->fd != destdir->fd) {
+		ret2 = close(cd->fd);
+		if (ret2) {
+			if (!ret)
+				ret = ret2;
+			dbprintf(_("%s%s%s: %s\n"), destdir->path,
+					destdir->sep, pbuf->path,
+					strerror(errno));
+		}
+	}
+
+out_cd:
+	free(cd);
+	return ret;
+}
+
+/* Copy file data */
+static int
+rdump_regfile_data(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf,
+	int			fd)
+{
+	struct xfs_bmbt_irec	irec = { };
+	struct xfs_buftarg	*btp;
+	int			nmaps;
+	off_t			pos = 0;
+	const off_t		isize = ip->i_disk_size;
+	int			ret;
+
+	if (XFS_IS_REALTIME_INODE(ip))
+		btp = ip->i_mount->m_rtdev_targp;
+	else
+		btp = ip->i_mount->m_ddev_targp;
+
+	for (;
+	     pos < isize;
+	     pos = XFS_FSB_TO_B(mp, irec.br_startoff + irec.br_blockcount)) {
+		struct xfs_buf	*bp;
+		off_t		buf_pos;
+		off_t		fd_pos;
+		xfs_fileoff_t	off = XFS_B_TO_FSBT(mp, pos);
+		xfs_filblks_t	max_read = XFS_B_TO_FSB(mp, 1048576);
+		xfs_daddr_t	daddr;
+		size_t		count;
+
+		nmaps = 1;
+		ret = -libxfs_bmapi_read(ip, off, max_read, &irec, &nmaps, 0);
+		if (ret) {
+			dbprintf(_("%s: %s\n"), pbuf->path, strerror(ret));
+			if (strict_errors)
+				return 1;
+			continue;
+		}
+		if (!nmaps)
+			break;
+
+		if (!xfs_bmap_is_written_extent(&irec))
+			continue;
+
+		fd_pos = XFS_FSB_TO_B(mp, irec.br_startoff);
+		if (XFS_IS_REALTIME_INODE(ip))
+			daddr =  xfs_rtb_to_daddr(mp, irec.br_startblock);
+		else
+			daddr = XFS_FSB_TO_DADDR(mp, irec.br_startblock);
+
+		ret = -libxfs_buf_read_uncached(btp, daddr,
+				XFS_FSB_TO_BB(mp, irec.br_blockcount), 0, &bp,
+				NULL);
+		if (ret) {
+			dbprintf(_("%s: reading pos 0x%llx %s\n"), pbuf->path,
+					fd_pos, strerror(ret));
+			if (strict_errors)
+				return 1;
+			continue;
+		}
+
+		count = XFS_FSB_TO_B(mp, irec.br_blockcount);
+		if (fd_pos + count > isize)
+			count = isize - fd_pos;
+
+		buf_pos = 0;
+		while (count > 0) {
+			ssize_t	written;
+
+			written = pwrite(fd, bp->b_addr + buf_pos, count,
+					fd_pos);
+			if (written < 0) {
+				libxfs_buf_relse(bp);
+				dbprintf(_("%s%s%s: writing pos 0x%llx %s\n"),
+						destdir->path, destdir->sep,
+						pbuf->path, fd_pos,
+						strerror(errno));
+				return 1;
+			}
+			if (!written) {
+				libxfs_buf_relse(bp);
+				dbprintf(
+ _("%s%s%s: wrote zero at pos 0x%llx\n"),
+						destdir->path, destdir->sep,
+						pbuf->path, fd_pos);
+				return 1;
+			}
+
+			fd_pos += written;
+			buf_pos += written;
+			count -= written;
+		}
+
+		libxfs_buf_relse(bp);
+	}
+
+	ret = ftruncate(fd, isize);
+	if (ret) {
+		dbprintf(_("%s%s%s: setting file length 0x%llx %s\n"),
+				destdir->path, destdir->sep, pbuf->path,
+				isize, strerror(errno));
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Copy a regular file */
+static int
+rdump_regfile(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf)
+{
+	int			fd;
+	int			ret, ret2;
+
+	fd = openat(destdir->fd, pbuf->path, O_RDWR | O_CREAT | O_TRUNC, 0600);
+	if (fd < 0) {
+		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
+				pbuf->path, strerror(errno));
+		return 1;
+	}
+
+	ret = rdump_fileattrs_fd(ip, destdir, pbuf, fd);
+	if (ret && strict_errors)
+		goto out_close;
+
+	if (xfs_inode_has_attr_fork(ip)) {
+		ret = rdump_xattrs(tp, ip, destdir, pbuf, fd);
+		if (ret && strict_errors)
+			goto out_close;
+	}
+
+	ret = rdump_regfile_data(tp, ip, destdir, pbuf, fd);
+	if (ret && strict_errors)
+		goto out_close;
+
+	ret = rdump_timestamps_fd(ip, destdir, pbuf, fd);
+	if (ret && strict_errors)
+		goto out_close;
+
+out_close:
+	ret2 = close(fd);
+	if (ret2) {
+		if (!ret)
+			ret = ret2;
+		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
+				pbuf->path, strerror(errno));
+		return 1;
+	}
+
+	return ret;
+}
+
+/* Copy a symlink */
+static int
+rdump_symlink(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf)
+{
+	char			target[XFS_SYMLINK_MAXLEN + 1];
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
+	unsigned int		targetlen = ip->i_disk_size;
+	int			ret;
+
+	if (ifp->if_format == XFS_DINODE_FMT_LOCAL) {
+		memcpy(target, ifp->if_data, targetlen);
+	} else {
+		ret = -libxfs_symlink_remote_read(ip, target);
+		if (ret) {
+			dbprintf(_("%s: %s\n"), pbuf->path, strerror(ret));
+			return strict_errors ? 1 : 0;
+		}
+	}
+	target[targetlen] = 0;
+
+	ret = symlinkat(target, destdir->fd, pbuf->path);
+	if (ret) {
+		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
+				pbuf->path, strerror(errno));
+		return 1;
+	}
+
+	ret = rdump_fileattrs_path(ip, destdir, pbuf);
+	if (ret && strict_errors)
+		goto out;
+
+	ret = rdump_timestamps_path(ip, destdir, pbuf);
+	if (ret && strict_errors)
+		goto out;
+
+	ret = 0;
+out:
+	return ret;
+}
+
+/* Copy a special file */
+static int
+rdump_special(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	const struct destdir	*destdir,
+	const struct pathbuf	*pbuf)
+{
+	const unsigned int	major = IRIX_DEV_MAJOR(VFS_I(ip)->i_rdev);
+	const unsigned int	minor = IRIX_DEV_MINOR(VFS_I(ip)->i_rdev);
+	int			ret;
+
+	ret = mknodat(destdir->fd, pbuf->path, VFS_I(ip)->i_mode & S_IFMT,
+			makedev(major, minor));
+	if (ret) {
+		dbprintf(_("%s%s%s: %s\n"), destdir->path, destdir->sep,
+				pbuf->path, strerror(errno));
+		return 1;
+	}
+
+	ret = rdump_fileattrs_path(ip, destdir, pbuf);
+	if (ret && strict_errors)
+		goto out;
+
+	ret = rdump_timestamps_path(ip, destdir, pbuf);
+	if (ret && strict_errors)
+		goto out;
+
+	ret = 0;
+out:
+	return ret;
+}
+
+/* Dump some kind of file. */
+static int
+rdump_file(
+	struct xfs_trans	*tp,
+	xfs_ino_t		ino,
+	const struct destdir	*destdir,
+	struct pathbuf		*pbuf)
+{
+	struct xfs_inode	*ip;
+	int			ret;
+
+	ret = -libxfs_iget(mp, tp, ino, 0, &ip);
+	if (ret) {
+		dbprintf(_("%s: %s\n"), pbuf->path, strerror(ret));
+		return strict_errors ? ret : 0;
+	}
+
+	switch(VFS_I(ip)->i_mode & S_IFMT) {
+	case S_IFDIR:
+		ret = rdump_directory(tp, ip, destdir, pbuf);
+		break;
+	case S_IFREG:
+		ret = rdump_regfile(tp, ip, destdir, pbuf);
+		break;
+	case S_IFLNK:
+		ret = rdump_symlink(tp, ip, destdir, pbuf);
+		break;
+	default:
+		ret = rdump_special(tp, ip, destdir, pbuf);
+		break;
+	}
+
+	libxfs_irele(ip);
+	return ret;
+}
+
+/* Copy one path out of the filesystem. */
+static int
+rdump_path(
+	struct xfs_mount	*mp,
+	bool			sole_path,
+	const char		*path,
+	const struct destdir	*destdir)
+{
+	struct xfs_trans	*tp;
+	struct pathbuf		*pbuf;
+	const char		*basename;
+	ssize_t			pathlen = strlen(path);
+	int			ret = 1;
+
+	/* Set up destination path data */
+	if (pathlen > PATH_MAX) {
+		dbprintf(_("%s: %s\n"), path, strerror(ENAMETOOLONG));
+		return 1;
+	}
+
+	pbuf = calloc(1, sizeof(struct pathbuf));
+	if (!pbuf) {
+		dbprintf(_("allocating path buf: %s\n"), strerror(errno));
+		return 1;
+	}
+	basename = strrchr(path, '/');
+	if (basename) {
+		pbuf->len = pathlen - (basename + 1 - path);
+		memcpy(pbuf->path, basename + 1, pbuf->len);
+	} else {
+		pbuf->len = pathlen;
+		memcpy(pbuf->path, path, pbuf->len);
+	}
+	pbuf->path[pbuf->len] = 0;
+
+	/* Dump the inode referenced. */
+	if (pathlen) {
+		ret = path_walk(mp->m_sb.sb_rootino, path);
+		if (ret) {
+			dbprintf(_("%s: %s\n"), path, strerror(ret));
+			goto out_pbuf;
+		}
+
+		if (sole_path) {
+			struct xfs_dinode	*dip = iocur_top->data;
+
+			/*
+			 * If this is the only path to copy out and it's a dir,
+			 * then we can copy the children directly into the
+			 * target.
+			 */
+			if (S_ISDIR(be16_to_cpu(dip->di_mode))) {
+				pbuf->len = 0;
+				pbuf->path[0] = 0;
+			}
+		}
+	} else {
+		set_cur_inode(mp->m_sb.sb_rootino);
+	}
+
+	ret = -libxfs_trans_alloc_empty(mp, &tp);
+	if (ret) {
+		dbprintf(_("allocating state: %s\n"), strerror(ret));
+		goto out_pbuf;
+	}
+
+	ret = rdump_file(tp, iocur_top->ino, destdir, pbuf);
+	libxfs_trans_cancel(tp);
+out_pbuf:
+	free(pbuf);
+	return ret;
+}
+
+static int
+rdump_f(
+	int		argc,
+	char		*argv[])
+{
+	struct destdir	destdir;
+	int		i;
+	int		c;
+	int		ret;
+
+	lost_mask = 0;
+	strict_errors = false;
+	while ((c = getopt(argc, argv, "s")) != -1) {
+		switch (c) {
+		case 's':
+			strict_errors = true;
+			break;
+		default:
+			rdump_help();
+			return 0;
+		}
+	}
+
+	if (argc < optind + 1) {
+		dbprintf(
+ _("Must supply destination directory.\n"));
+		return 0;
+	}
+
+	/* Create and open destination directory */
+	destdir.path = argv[argc - 1];
+	ret = mkdir(destdir.path, 0755);
+	if (ret && errno != EEXIST) {
+		dbprintf(_("%s: %s\n"), destdir.path, strerror(errno));
+		exitcode = 1;
+		return 0;
+	}
+
+	if (destdir.path[0] == 0) {
+		dbprintf(
+ _("Destination dir must be at least one character.\n"));
+		exitcode = 1;
+		return 0;
+	}
+
+	if (destdir.path[strlen(destdir.path) - 1] != '/')
+		destdir.sep = "/";
+	else
+		destdir.sep = "";
+	destdir.fd = open(destdir.path, O_DIRECTORY | O_RDONLY);
+	if (destdir.fd < 0) {
+		dbprintf(_("%s: %s\n"), destdir.path, strerror(errno));
+		exitcode = 1;
+		return 0;
+	}
+
+	if (optind == argc - 1) {
+		/* no dirs given, just do the whole fs */
+		push_cur();
+		ret = rdump_path(mp, false, "", &destdir);
+		pop_cur();
+		if (ret)
+			exitcode = 1;
+		goto out_close;
+	}
+
+	for (i = optind; i < argc - 1; i++) {
+		size_t	len = strlen(argv[i]);
+
+		/* trim trailing slashes */
+		while (len && argv[i][len - 1] == '/')
+			len--;
+		argv[i][len] = 0;
+
+		push_cur();
+		ret = rdump_path(mp, argc == optind + 2, argv[i], &destdir);
+		pop_cur();
+
+		if (ret) {
+			exitcode = 1;
+			if (strict_errors)
+				break;
+		}
+	}
+
+out_close:
+	ret = close(destdir.fd);
+	if (ret) {
+		dbprintf(_("%s: %s\n"), destdir.path, strerror(errno));
+		exitcode = 1;
+	}
+
+	if (lost_mask & LOST_OWNER)
+		dbprintf(_("%s: some uid/gid could not be set\n"),
+				destdir.path);
+	if (lost_mask & LOST_MODE)
+		dbprintf(_("%s: some file modes could not be set\n"),
+				destdir.path);
+	if (lost_mask & LOST_TIME)
+		dbprintf(_("%s: some timestamps could not be set\n"),
+				destdir.path);
+	if (lost_mask & LOST_SOME_FSXATTR)
+		dbprintf(_("%s: some xfs file attr bits could not be set\n"),
+				destdir.path);
+	if (lost_mask & LOST_FSXATTR)
+		dbprintf(_("%s: some xfs file attrs could not be set\n"),
+				destdir.path);
+	if (lost_mask & LOST_XATTR)
+		dbprintf(_("%s: some extended xattrs could not be set\n"),
+				destdir.path);
+	if (lost_mask & LOST_ACL)
+		dbprintf(_("%s: some ACLs could not be translated\n"),
+				destdir.path);
+
+	return 0;
+}
+
+static struct cmdinfo rdump_cmd = {
+	.name		= "rdump",
+	.cfunc		= rdump_f,
+	.argmin		= 0,
+	.argmax		= -1,
+	.canpush	= 0,
+	.args		= "[-s] [paths...] dest_directory",
+	.help		= rdump_help,
+};
+
+void
+rdump_init(void)
+{
+	rdump_cmd.oneline = _("recover files out of a filesystem");
+	add_command(&rdump_cmd);
+}
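The path handling in rdump_dirent() above appends each directory entry name to a shared buffer, recurses, and then cuts the buffer back to its previous length. A minimal standalone sketch of that append step follows. The sketch_* names and the tiny buffer size are hypothetical and are not part of the patch. snprintf() signals truncation whenever its return value is greater than or equal to the size it was given, so that is the condition the sketch tests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, deliberately tiny stand-in for the patch's struct pathbuf. */
#define SKETCH_PATH_MAX	32

struct sketch_pathbuf {
	size_t	len;
	char	path[SKETCH_PATH_MAX + 1];
};

/*
 * Append "/name" (or just "name" if the buffer is empty).  Returns true on
 * success.  On truncation the buffer is restored to its previous contents.
 */
static bool
sketch_append(struct sketch_pathbuf *pbuf, const char *name, int namelen)
{
	const size_t	oldlen = pbuf->len;
	const size_t	remaining = sizeof(pbuf->path) - oldlen;
	int		added;

	added = snprintf(&pbuf->path[oldlen], remaining, "%s%.*s",
			oldlen ? "/" : "", namelen, name);

	/* snprintf returns the untruncated length, not counting the NUL. */
	if (added < 0 || (size_t)added >= remaining) {
		pbuf->path[oldlen] = 0;
		return false;
	}

	pbuf->len += added;
	return true;
}

/* Undo an append by cutting the path back to a saved length. */
static void
sketch_truncate(struct sketch_pathbuf *pbuf, size_t oldlen)
{
	pbuf->len = oldlen;
	pbuf->path[oldlen] = 0;
}
```

A caller would save pbuf->len before sketch_append(), process the child, and then hand the saved length to sketch_truncate(), which mirrors how rdump_dirent() restores the buffer after rdump_file() returns.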
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 08f38f37ca01cc..2a9322560584b0 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -1132,6 +1132,32 @@ .SH COMMANDS
 Exit
 .BR xfs_db .
 .TP
+.BI "rdump [-s] [" "paths..." "] " "destination_dir"
+Recover the files given by the
+.B path
+arguments by copying them out of the filesystem into the directory specified in
+.BR destination_dir .
+
+If the
+.B -s
+option is specified, errors are fatal.
+By default, read errors are ignored in favor of recovering as much data from
+the filesystem as possible.
+
+If zero
+.B paths
+are specified, the entire filesystem is dumped.
+If only one
+.B path
+is specified and it is a directory, the children of that directory will be
+copied directly to the destination.
+If multiple
+.B paths
+are specified, each file is copied into the directory as a new child.
+
+If possible, sparse holes, xfs file attributes, and extended attributes will be
+preserved.
+.TP
 .BI "rgresv [" rgno ]
 Displays the per-rtgroup reservation size, and per-rtgroup
 reservation usage for a given realtime allocation group.
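The man page text above says sparse holes are preserved where possible. In the patch, rdump_regfile_data() achieves this by writing only the written extents at their file offsets with pwrite() and then setting the final length with ftruncate(). The following is a rough sketch of the short-write loop, assuming a POSIX system; sketch_pwrite_all() is a hypothetical helper name and does not appear in the patch.

```c
#define _XOPEN_SOURCE 700	/* for pwrite() under strict -std= modes */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all of buf at file offset pos, continuing after short writes.
 * Returns 0 on success or a positive errno value on failure.
 */
static int
sketch_pwrite_all(int fd, const void *buf, size_t count, off_t pos)
{
	const char	*p = buf;

	while (count > 0) {
		ssize_t	written = pwrite(fd, p, count, pos);

		if (written < 0)
			return errno;
		if (written == 0)
			return EIO;	/* no forward progress */

		p += written;
		pos += written;
		count -= written;
	}

	return 0;
}
```

Because each call targets an explicit offset, a caller can skip over unwritten ranges entirely and finish with ftruncate(fd, size), leaving holes in the destination file on filesystems that support them.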


* Re: [PATCH v1.1 4/4] xfs_db: add command to copy directory trees out of filesystems
  2025-02-18 19:58     ` [PATCH v1.1 " Darrick J. Wong
@ 2025-02-19  5:27       ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-19  5:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems
  2025-02-18  8:36     ` Christoph Hellwig
  2025-02-18 15:54       ` Darrick J. Wong
@ 2025-02-19 18:04       ` Eric Sandeen
  2025-02-19 18:30         ` Darrick J. Wong
  2025-02-20  6:19         ` Christoph Hellwig
  1 sibling, 2 replies; 225+ messages in thread
From: Eric Sandeen @ 2025-02-19 18:04 UTC (permalink / raw)
  To: Christoph Hellwig, Darrick J. Wong; +Cc: aalbersh, hch, linux-xfs

On 2/18/25 12:36 AM, Christoph Hellwig wrote:
> On Thu, Feb 06, 2025 at 03:03:32PM -0800, Darrick J. Wong wrote:
>> From: Darrick J. Wong <djwong@kernel.org>
>>
>> Ahead of deprecating V4 support in the kernel, let's give people a way
>> to extract their files from a filesystem without needing to mount.
> 
> So I've wanted a userspace file access for a while, but if we deprecate
> the v4 support in the kernel that will propagate to libxfs quickly,
> and this code won't help you with v4 file systems either.  So I don't
> think the rationale here seems very good.

I agree. And, I had considered "migrate your V4 fs via an older kernel" to
be a perfectly acceptable answer for the v4 deprecation scenario.

Christoph, can you say more about why you're eager for the functionality?

Thanks,
-Eric


* Re: [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems
  2025-02-19 18:04       ` Eric Sandeen
@ 2025-02-19 18:30         ` Darrick J. Wong
  2025-02-20  6:20           ` Christoph Hellwig
  2025-02-20  6:19         ` Christoph Hellwig
  1 sibling, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-19 18:30 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Christoph Hellwig, aalbersh, hch, linux-xfs

On Wed, Feb 19, 2025 at 12:04:36PM -0600, Eric Sandeen wrote:
> On 2/18/25 12:36 AM, Christoph Hellwig wrote:
> > On Thu, Feb 06, 2025 at 03:03:32PM -0800, Darrick J. Wong wrote:
> >> From: Darrick J. Wong <djwong@kernel.org>
> >>
> >> Ahead of deprecating V4 support in the kernel, let's give people a way
> >> to extract their files from a filesystem without needing to mount.
> > 
> > So I've wanted a userspace file access for a while, but if we deprecate
> > the v4 support in the kernel that will propagate to libxfs quickly,
> > and this code won't help you with v4 file systems either.  So I don't
> > think the rationale here seems very good.
> 
> I agree. And, I had considered "migrate your V4 fs via an older kernel" to
> be a perfectly acceptable answer for the v4 deprecation scenario.

I thought about that some more after you and I chatted, and decided to
go for a userspace rdump because (a) other people asked for it, and (b)
the "use an older kernel" path has its own problems.

Let's say it's 2037 and someone finds an old XFSv4 filesystem and decides
to migrate it.  That means they'll have to boot a kernel from 2030,
format a new (2030-era) V5 filesystem on a secondary device, and copy
everything to it.  If that old kernel doesn't boot for whatever reason
(outdated drivers, mismatched mitigations on the VM host, etc) they're
stuck debugging hardware support.  If they're lucky enough that the 2030
kernel actually boots with a 2037-era userspace, then they've got a
fresh filesystem and they're done.

If instead it turns out that 2037-era userspace (coughsystemdcough)
doesn't work on an old 2030 kernel, then they'll have to go install a
2030-era userspace, boot the 2030 kernel, and migrate the data.  Now
they have a V5 filesystem with 2030 era features.  If they want 2037 era
features, they have to move the device to the new system and migrate a
second time.

Contrast this to the userspace approach -- they'll have to go find an
xfs_db binary + libraries from 2030.  If that doesn't work they can try
to build the 2030 source code.  Then they can create a new filesystem on
their 2037 era machine, mount it, and tell the old xfs_db binary to
rdump everything to the new filesystem.  Now they have an XFS with
2037-era features, and they're done.

Frankly I'm more confident in our future ability to help people
run/build userspace source code from years ago than the kernel, because
there's so much more that can go wrong with the kernel.  We guarantee
that new kernels won't break old binaries; we don't guarantee that new
hardware won't break old kernels.

--D

> Christoph, can you say more about why you're eager for the functionality?
> 
> Thanks,
> -Eric
> 


* Re: [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems
  2025-02-19 18:04       ` Eric Sandeen
  2025-02-19 18:30         ` Darrick J. Wong
@ 2025-02-20  6:19         ` Christoph Hellwig
  1 sibling, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-20  6:19 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Christoph Hellwig, Darrick J. Wong, aalbersh, hch, linux-xfs

On Wed, Feb 19, 2025 at 12:04:36PM -0600, Eric Sandeen wrote:
> Christoph, can you say more about why you're eager for the functionality?

Trying to read data from a file system on a system without real root
privileges, e.g. in a random container or multi-user system.



* Re: [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems
  2025-02-19 18:30         ` Darrick J. Wong
@ 2025-02-20  6:20           ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-20  6:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Eric Sandeen, Christoph Hellwig, aalbersh, hch, linux-xfs

On Wed, Feb 19, 2025 at 10:30:56AM -0800, Darrick J. Wong wrote:
> If instead it turns out that 2037-era userspace (coughsystemdcough)
> doesn't work on an old 2030 kernel, then they'll have to go install a
> 2030-era userspace, boot the 2030 kernel, and migrate the data.  Now
> they have a V5 filesystem with 2030 era features.  If they want 2037 era
> features, they have to move the device to the new system and migrate a
> second time.

You probably wouldn't boot the old kernel on hardware but in a VM.
Or have an old user mode Linux build around :)


* [PATCH v1.1] libfrog: wrap handle construction code
  2025-02-06 22:31   ` [PATCH 04/17] libfrog: wrap handle construction code Darrick J. Wong
  2025-02-07  4:34     ` Christoph Hellwig
@ 2025-02-20 21:36     ` Darrick J. Wong
  2025-02-24 16:58       ` Christoph Hellwig
  1 sibling, 1 reply; 225+ messages in thread
From: Darrick J. Wong @ 2025-02-20 21:36 UTC (permalink / raw)
  To: aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Clean up all the open-coded logic to construct a file handle from a
fshandle and some bulkstat/parent pointer information.  The new
functions are stashed in a private header file to avoid leaking the
details of xfs_handle construction in the public libhandle headers.

I tried moving the code to libhandle, but I don't entirely like the
result.  The libhandle functions pass around handles as arbitrary binary
blobs that come from and are sent to the kernel, meaning that the
interface is full of (void *, size_t) tuples.  Putting these new
functions in libhandle breaks that abstraction because now clients know
that they can deal with a struct xfs_handle.

We could fix that leak by changing it to a (void *, size_t) tuple, but
then we'd have to validate the size_t and return -1 with errno set,
which then means that all the client code now has to have error handling
for a case that we're fairly sure can't be true.  This is overkill for
xfsprogs code that knows better, because we can trust ourselves to know
the exact layout of a handle.

So this nice compact code:

	memcpy(&handle.ha_fsid, file->fshandle, file->fshandle_len);
	handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
				sizeof(handle.ha_fid.fid_len);

becomes:

	ret = handle_from_fshandle(&handle, file->fshandle,
			file->fshandle_len);
	if (ret) {
		perror("what?");
		return -1;
	}

Which is much more verbose code, and right now it exists to handle an
exceptional condition that is not possible.  If someone outside of
xfsprogs would like this sort of functionality in libhandle I'm all for
adding it, but with zero demand from external users, I prefer to keep
things simple.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
v1.1: explain why this isn't in libhandle for now
---
 libfrog/handle_priv.h |   55 +++++++++++++++++++++++++++++++++++++++++++++++++
 io/parent.c           |    9 +++-----
 libfrog/Makefile      |    1 +
 scrub/common.c        |    9 +++-----
 scrub/inodes.c        |   13 ++++--------
 scrub/phase5.c        |   12 ++++-------
 spaceman/health.c     |    9 +++-----
 7 files changed, 73 insertions(+), 35 deletions(-)
 create mode 100644 libfrog/handle_priv.h

diff --git a/libfrog/handle_priv.h b/libfrog/handle_priv.h
new file mode 100644
index 00000000000000..8c3634c40de1c8
--- /dev/null
+++ b/libfrog/handle_priv.h
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2025 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __LIBFROG_HANDLE_PRIV_H__
+#define __LIBFROG_HANDLE_PRIV_H__
+
+/*
+ * Private helpers to construct an xfs_handle without publishing those details
+ * in the public libhandle header files.
+ */
+
+/*
+ * Fills out the fsid part of a handle.  This does not initialize the fid part
+ * of the handle; use either of the two functions below.
+ */
+static inline void
+handle_from_fshandle(
+	struct xfs_handle	*handle,
+	const void		*fshandle,
+	size_t			fshandle_len)
+{
+	ASSERT(fshandle_len == sizeof(xfs_fsid_t));
+
+	memcpy(&handle->ha_fsid, fshandle, sizeof(handle->ha_fsid));
+	handle->ha_fid.fid_len = sizeof(xfs_fid_t) -
+			sizeof(handle->ha_fid.fid_len);
+	handle->ha_fid.fid_pad = 0;
+	handle->ha_fid.fid_ino = 0;
+	handle->ha_fid.fid_gen = 0;
+}
+
+/* Fill out the fid part of a handle from raw components. */
+static inline void
+handle_from_inogen(
+	struct xfs_handle	*handle,
+	uint64_t		ino,
+	uint32_t		gen)
+{
+	handle->ha_fid.fid_ino = ino;
+	handle->ha_fid.fid_gen = gen;
+}
+
+/* Fill out the fid part of a handle. */
+static inline void
+handle_from_bulkstat(
+	struct xfs_handle		*handle,
+	const struct xfs_bulkstat	*bstat)
+{
+	handle->ha_fid.fid_ino = bstat->bs_ino;
+	handle->ha_fid.fid_gen = bstat->bs_gen;
+}
+
+#endif /* __LIBFROG_HANDLE_PRIV_H__ */
diff --git a/io/parent.c b/io/parent.c
index 8db93d98755289..3ba3aef48cb9be 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -11,6 +11,7 @@
 #include "handle.h"
 #include "init.h"
 #include "io.h"
+#include "libfrog/handle_priv.h"
 
 static cmdinfo_t parent_cmd;
 static char *mntpt;
@@ -205,12 +206,8 @@ parent_f(
 			return 0;
 		}
 
-		memcpy(&handle, hanp, sizeof(handle));
-		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-				sizeof(handle.ha_fid.fid_len);
-		handle.ha_fid.fid_pad = 0;
-		handle.ha_fid.fid_ino = ino;
-		handle.ha_fid.fid_gen = gen;
+		handle_from_fshandle(&handle, hanp, hlen);
+		handle_from_inogen(&handle, ino, gen);
 	} else if (optind != argc) {
 		return command_usage(&parent_cmd);
 	}
diff --git a/libfrog/Makefile b/libfrog/Makefile
index 4da427789411a6..fc7e506d96bbad 100644
--- a/libfrog/Makefile
+++ b/libfrog/Makefile
@@ -53,6 +53,7 @@ fsgeom.h \
 fsproperties.h \
 fsprops.h \
 getparents.h \
+handle_priv.h \
 histogram.h \
 logging.h \
 paths.h \
diff --git a/scrub/common.c b/scrub/common.c
index f86546556f46dd..6eb3c026dc5ac9 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -10,6 +10,7 @@
 #include "platform_defs.h"
 #include "libfrog/paths.h"
 #include "libfrog/getparents.h"
+#include "libfrog/handle_priv.h"
 #include "xfs_scrub.h"
 #include "common.h"
 #include "progress.h"
@@ -414,12 +415,8 @@ scrub_render_ino_descr(
 	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_PARENT) {
 		struct xfs_handle handle;
 
-		memcpy(&handle.ha_fsid, ctx->fshandle, sizeof(handle.ha_fsid));
-		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-				sizeof(handle.ha_fid.fid_len);
-		handle.ha_fid.fid_pad = 0;
-		handle.ha_fid.fid_ino = ino;
-		handle.ha_fid.fid_gen = gen;
+		handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
+		handle_from_inogen(&handle, ino, gen);
 
 		ret = handle_to_path(&handle, sizeof(struct xfs_handle), 4096,
 				buf, buflen);
diff --git a/scrub/inodes.c b/scrub/inodes.c
index 3fe759e8f4867d..2b492a634ea3b2 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -19,6 +19,7 @@
 #include "descr.h"
 #include "libfrog/fsgeom.h"
 #include "libfrog/bulkstat.h"
+#include "libfrog/handle_priv.h"
 
 /*
  * Iterate a range of inodes.
@@ -209,7 +210,7 @@ scan_ag_bulkstat(
 	xfs_agnumber_t		agno,
 	void			*arg)
 {
-	struct xfs_handle	handle = { };
+	struct xfs_handle	handle;
 	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
 	struct scan_ichunk	*ichunk = arg;
 	struct xfs_inumbers_req	*ireq = ichunk_to_inumbers(ichunk);
@@ -225,12 +226,7 @@ scan_ag_bulkstat(
 	DEFINE_DESCR(dsc_inumbers, ctx, render_inumbers_from_agno);
 
 	descr_set(&dsc_inumbers, &agno);
-
-	memcpy(&handle.ha_fsid, ctx->fshandle, sizeof(handle.ha_fsid));
-	handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-			sizeof(handle.ha_fid.fid_len);
-	handle.ha_fid.fid_pad = 0;
-
+	handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
 retry:
 	bulkstat_for_inumbers(ctx, &dsc_inumbers, inumbers, breq);
 
@@ -244,8 +240,7 @@ scan_ag_bulkstat(
 			continue;
 
 		descr_set(&dsc_bulkstat, bs);
-		handle.ha_fid.fid_ino = scan_ino;
-		handle.ha_fid.fid_gen = bs->bs_gen;
+		handle_from_bulkstat(&handle, bs);
 		error = si->fn(ctx, &handle, bs, si->arg);
 		switch (error) {
 		case 0:
diff --git a/scrub/phase5.c b/scrub/phase5.c
index 22a22915dbc68d..6460d00f30f4bd 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -18,6 +18,7 @@
 #include "libfrog/bitmap.h"
 #include "libfrog/bulkstat.h"
 #include "libfrog/fakelibattr.h"
+#include "libfrog/handle_priv.h"
 #include "xfs_scrub.h"
 #include "common.h"
 #include "inodes.h"
@@ -474,9 +475,7 @@ retry_deferred_inode(
 	if (error)
 		return error;
 
-	handle->ha_fid.fid_ino = bstat.bs_ino;
-	handle->ha_fid.fid_gen = bstat.bs_gen;
-
+	handle_from_bulkstat(handle, &bstat);
 	return check_inode_names(ncs->ctx, handle, &bstat, ncs);
 }
 
@@ -487,16 +486,13 @@ retry_deferred_inode_range(
 	uint64_t		len,
 	void			*arg)
 {
-	struct xfs_handle	handle = { };
+	struct xfs_handle	handle;
 	struct ncheck_state	*ncs = arg;
 	struct scrub_ctx	*ctx = ncs->ctx;
 	uint64_t		i;
 	int			error;
 
-	memcpy(&handle.ha_fsid, ctx->fshandle, sizeof(handle.ha_fsid));
-	handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-			sizeof(handle.ha_fid.fid_len);
-	handle.ha_fid.fid_pad = 0;
+	handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
 
 	for (i = 0; i < len; i++) {
 		error = retry_deferred_inode(ncs, &handle, ino + i);
diff --git a/spaceman/health.c b/spaceman/health.c
index 4281589324cd44..0d2767df424f27 100644
--- a/spaceman/health.c
+++ b/spaceman/health.c
@@ -14,6 +14,7 @@
 #include "libfrog/bulkstat.h"
 #include "space.h"
 #include "libfrog/getparents.h"
+#include "libfrog/handle_priv.h"
 
 static cmdinfo_t health_cmd;
 static unsigned long long reported;
@@ -317,12 +318,8 @@ report_inode(
 	    (file->xfd.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_PARENT)) {
 		struct xfs_handle handle;
 
-		memcpy(&handle.ha_fsid, file->fshandle, sizeof(handle.ha_fsid));
-		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-				sizeof(handle.ha_fid.fid_len);
-		handle.ha_fid.fid_pad = 0;
-		handle.ha_fid.fid_ino = bs->bs_ino;
-		handle.ha_fid.fid_gen = bs->bs_gen;
+		handle_from_fshandle(&handle, file->fshandle, file->fshandle_len);
+		handle_from_bulkstat(&handle, bs);
 
 		ret = handle_to_path(&handle, sizeof(struct xfs_handle), 0,
 				descr, sizeof(descr) - 1);


* Re: [PATCH v1.1] libfrog: wrap handle construction code
  2025-02-20 21:36     ` [PATCH v1.1] " Darrick J. Wong
@ 2025-02-24 16:58       ` Christoph Hellwig
  0 siblings, 0 replies; 225+ messages in thread
From: Christoph Hellwig @ 2025-02-24 16:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>




Thread overview: 225+ messages
2025-02-06 22:21 [PATCHBOMB] xfsprogs: all my changes for 6.14 Darrick J. Wong
2025-02-06 22:29 ` [PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration Darrick J. Wong
2025-02-06 22:30   ` [PATCH 01/17] libxfs: unmap xmbuf pages to avoid disaster Darrick J. Wong
2025-02-07  4:10     ` Christoph Hellwig
2025-02-06 22:31   ` [PATCH 02/17] libxfs: mark xmbuf_{un,}map_page static Darrick J. Wong
2025-02-06 22:31   ` [PATCH 03/17] man: document new XFS_BULK_IREQ_METADIR flag to bulkstat Darrick J. Wong
2025-02-07  4:10     ` Christoph Hellwig
2025-02-06 22:31   ` [PATCH 04/17] libfrog: wrap handle construction code Darrick J. Wong
2025-02-07  4:34     ` Christoph Hellwig
2025-02-07  4:49       ` Darrick J. Wong
2025-02-07 17:00         ` Darrick J. Wong
2025-02-13  4:05           ` Christoph Hellwig
2025-02-13  4:20             ` Darrick J. Wong
2025-02-20 21:36     ` [PATCH v1.1] " Darrick J. Wong
2025-02-24 16:58       ` Christoph Hellwig
2025-02-06 22:31   ` [PATCH 05/17] xfs_scrub: don't report data loss in unlinked inodes twice Darrick J. Wong
2025-02-07  4:35     ` Christoph Hellwig
2025-02-07  4:46       ` Darrick J. Wong
2025-02-06 22:32   ` [PATCH 06/17] xfs_scrub: call bulkstat directly if we're only scanning user files Darrick J. Wong
2025-02-07  4:37     ` Christoph Hellwig
2025-02-06 22:32   ` [PATCH 07/17] xfs_scrub: remove flags argument from scrub_scan_all_inodes Darrick J. Wong
2025-02-07  4:38     ` Christoph Hellwig
2025-02-06 22:32   ` [PATCH 08/17] xfs_scrub: selectively re-run bulkstat after re-running inumbers Darrick J. Wong
2025-02-07  4:39     ` Christoph Hellwig
2025-02-06 22:33   ` [PATCH 09/17] xfs_scrub: actually iterate all the bulkstat records Darrick J. Wong
2025-02-07  4:40     ` Christoph Hellwig
2025-02-06 22:33   ` [PATCH 10/17] xfs_scrub: don't double-scan inodes during phase 3 Darrick J. Wong
2025-02-07  4:41     ` Christoph Hellwig
2025-02-06 22:33   ` [PATCH 11/17] xfs_scrub: don't (re)set the bulkstat request icount incorrectly Darrick J. Wong
2025-02-07  4:42     ` Christoph Hellwig
2025-02-06 22:33   ` [PATCH 12/17] xfs_scrub: don't complain if bulkstat fails Darrick J. Wong
2025-02-07  4:42     ` Christoph Hellwig
2025-02-06 22:34   ` [PATCH 13/17] xfs_scrub: return early from bulkstat_for_inumbers if no bulkstat data Darrick J. Wong
2025-02-07  4:43     ` Christoph Hellwig
2025-02-06 22:34   ` [PATCH 14/17] xfs_scrub: don't blow away new inodes in bulkstat_single_step Darrick J. Wong
2025-02-07  4:46     ` Christoph Hellwig
2025-02-07  4:50       ` Darrick J. Wong
2025-02-06 22:34   ` [PATCH 15/17] xfs_scrub: hoist the phase3 bulkstat single stepping code Darrick J. Wong
2025-02-07  4:47     ` Christoph Hellwig
2025-02-06 22:34   ` [PATCH 16/17] xfs_scrub: ignore freed inodes when single-stepping during phase 3 Darrick J. Wong
2025-02-07  4:47     ` Christoph Hellwig
2025-02-06 22:35   ` [PATCH 17/17] xfs_scrub: try harder to fill the bulkstat array with bulkstat() Darrick J. Wong
2025-02-07  4:48     ` Christoph Hellwig
2025-02-06 22:29 ` [PATCHSET 2/5] xfsprogs: new libxfs code from kernel 6.14 Darrick J. Wong
2025-02-06 22:35   ` [PATCH 01/56] xfs: tidy up xfs_iroot_realloc Darrick J. Wong
2025-02-06 22:35   ` [PATCH 02/56] xfs: refactor the inode fork memory allocation functions Darrick J. Wong
2025-02-06 22:35   ` [PATCH 03/56] xfs: make xfs_iroot_realloc take the new numrecs instead of deltas Darrick J. Wong
2025-02-06 22:36   ` [PATCH 04/56] xfs: make xfs_iroot_realloc a bmap btree function Darrick J. Wong
2025-02-06 22:36   ` [PATCH 05/56] xfs: tidy up xfs_bmap_broot_realloc a bit Darrick J. Wong
2025-02-06 22:36   ` [PATCH 06/56] xfs: hoist the node iroot update code out of xfs_btree_new_iroot Darrick J. Wong
2025-02-06 22:36   ` [PATCH 07/56] xfs: hoist the node iroot update code out of xfs_btree_kill_iroot Darrick J. Wong
2025-02-06 22:37   ` [PATCH 08/56] xfs: add some rtgroup inode helpers Darrick J. Wong
2025-02-06 22:37   ` [PATCH 09/56] xfs: prepare to reuse the dquot pointer space in struct xfs_inode Darrick J. Wong
2025-02-06 22:37   ` [PATCH 10/56] xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions Darrick J. Wong
2025-02-06 22:37   ` [PATCH 11/56] xfs: support storing records in the inode core root Darrick J. Wong
2025-02-06 22:38   ` [PATCH 12/56] xfs: allow inode-based btrees to reserve space in the data device Darrick J. Wong
2025-02-06 22:38   ` [PATCH 13/56] xfs: introduce realtime rmap btree ondisk definitions Darrick J. Wong
2025-02-06 22:38   ` [PATCH 14/56] xfs: realtime rmap btree transaction reservations Darrick J. Wong
2025-02-06 22:39   ` [PATCH 15/56] xfs: add realtime rmap btree operations Darrick J. Wong
2025-02-06 22:39   ` [PATCH 16/56] xfs: prepare rmap functions to deal with rtrmapbt Darrick J. Wong
2025-02-06 22:39   ` [PATCH 17/56] xfs: add a realtime flag to the rmap update log redo items Darrick J. Wong
2025-02-06 22:39   ` [PATCH 18/56] xfs: pretty print metadata file types in error messages Darrick J. Wong
2025-02-06 22:40   ` [PATCH 19/56] xfs: support file data forks containing metadata btrees Darrick J. Wong
2025-02-06 22:40   ` [PATCH 20/56] xfs: add realtime reverse map inode to metadata directory Darrick J. Wong
2025-02-06 22:40   ` [PATCH 21/56] xfs: add metadata reservations for realtime rmap btrees Darrick J. Wong
2025-02-06 22:40   ` [PATCH 22/56] xfs: wire up a new metafile type for the realtime rmap Darrick J. Wong
2025-02-06 22:41   ` [PATCH 23/56] xfs: wire up rmap map and unmap to the realtime rmapbt Darrick J. Wong
2025-02-06 22:41   ` [PATCH 24/56] xfs: create routine to allocate and initialize a realtime rmap btree inode Darrick J. Wong
2025-02-06 22:41   ` [PATCH 25/56] xfs: report realtime rmap btree corruption errors to the health system Darrick J. Wong
2025-02-06 22:41   ` [PATCH 26/56] xfs: scrub the realtime rmapbt Darrick J. Wong
2025-02-06 22:42   ` [PATCH 27/56] xfs: scrub the metadir path of rt rmap btree files Darrick J. Wong
2025-02-06 22:42   ` [PATCH 28/56] xfs: online repair of realtime bitmaps for a realtime group Darrick J. Wong
2025-02-06 22:42   ` [PATCH 29/56] xfs: online repair of the realtime rmap btree Darrick J. Wong
2025-02-06 22:42   ` [PATCH 30/56] xfs: create a shadow rmap btree during realtime rmap repair Darrick J. Wong
2025-02-06 22:43   ` [PATCH 31/56] xfs: namespace the maximum length/refcount symbols Darrick J. Wong
2025-02-06 22:43   ` [PATCH 32/56] xfs: introduce realtime refcount btree ondisk definitions Darrick J. Wong
2025-02-06 22:43   ` [PATCH 33/56] xfs: realtime refcount btree transaction reservations Darrick J. Wong
2025-02-06 22:43   ` [PATCH 34/56] xfs: add realtime refcount btree operations Darrick J. Wong
2025-02-06 22:44   ` [PATCH 35/56] xfs: prepare refcount functions to deal with rtrefcountbt Darrick J. Wong
2025-02-06 22:44   ` [PATCH 36/56] xfs: add a realtime flag to the refcount update log redo items Darrick J. Wong
2025-02-06 22:44   ` [PATCH 37/56] xfs: add realtime refcount btree inode to metadata directory Darrick J. Wong
2025-02-06 22:45   ` [PATCH 38/56] xfs: add metadata reservations for realtime refcount btree Darrick J. Wong
2025-02-06 22:45   ` [PATCH 39/56] xfs: wire up a new metafile type for the realtime refcount Darrick J. Wong
2025-02-06 22:45   ` [PATCH 40/56] xfs: wire up realtime refcount btree cursors Darrick J. Wong
2025-02-06 22:45   ` [PATCH 41/56] xfs: create routine to allocate and initialize a realtime refcount btree inode Darrick J. Wong
2025-02-06 22:46   ` [PATCH 42/56] xfs: update rmap to allow cow staging extents in the rt rmap Darrick J. Wong
2025-02-06 22:46   ` [PATCH 43/56] xfs: compute rtrmap btree max levels when reflink enabled Darrick J. Wong
2025-02-06 22:46   ` [PATCH 44/56] xfs: allow inodes to have the realtime and reflink flags Darrick J. Wong
2025-02-06 22:46   ` [PATCH 45/56] xfs: recover CoW leftovers in the realtime volume Darrick J. Wong
2025-02-06 22:47   ` [PATCH 46/56] xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files Darrick J. Wong
2025-02-06 22:47   ` [PATCH 47/56] xfs: apply rt extent alignment constraints to CoW extsize hint Darrick J. Wong
2025-02-06 22:47   ` [PATCH 48/56] xfs: enable extent size hints for CoW operations Darrick J. Wong
2025-02-06 22:47   ` [PATCH 49/56] xfs: report realtime refcount btree corruption errors to the health system Darrick J. Wong
2025-02-06 22:48   ` [PATCH 50/56] xfs: scrub the realtime refcount btree Darrick J. Wong
2025-02-06 22:48   ` [PATCH 51/56] xfs: scrub the metadir path of rt refcount btree files Darrick J. Wong
2025-02-06 22:48   ` [PATCH 52/56] xfs: fix the entry condition of exact EOF block allocation optimization Darrick J. Wong
2025-02-06 22:48   ` [PATCH 53/56] xfs: mark xfs_dir_isempty static Darrick J. Wong
2025-02-06 22:49   ` [PATCH 54/56] xfs: remove XFS_ILOG_NONCORE Darrick J. Wong
2025-02-06 22:49   ` [PATCH 55/56] xfs: constify feature checks Darrick J. Wong
2025-02-06 22:49   ` [PATCH 56/56] xfs/libxfs: replace kmalloc() and memcpy() with kmemdup() Darrick J. Wong
2025-02-06 22:30 ` [PATCHSET v6.3 3/5] xfsprogs: realtime reverse-mapping support Darrick J. Wong
2025-02-06 22:49   ` [PATCH 01/27] libxfs: compute the rt rmap btree maxlevels during initialization Darrick J. Wong
2025-02-07  5:07     ` Christoph Hellwig
2025-02-06 22:50   ` [PATCH 02/27] libxfs: add a realtime flag to the rmap update log redo items Darrick J. Wong
2025-02-07  5:08     ` Christoph Hellwig
2025-02-06 22:50   ` [PATCH 03/27] libfrog: enable scrubbing of the realtime rmap Darrick J. Wong
2025-02-07  5:08     ` Christoph Hellwig
2025-02-06 22:50   ` [PATCH 04/27] man: document userspace API changes due to rt rmap Darrick J. Wong
2025-02-07  5:08     ` Christoph Hellwig
2025-02-06 22:51   ` [PATCH 05/27] xfs_db: compute average btree height Darrick J. Wong
2025-02-07  5:09     ` Christoph Hellwig
2025-02-06 22:51   ` [PATCH 06/27] xfs_db: don't abort when bmapping on a non-extents/bmbt fork Darrick J. Wong
2025-02-07  5:09     ` Christoph Hellwig
2025-02-06 22:51   ` [PATCH 07/27] xfs_db: display the realtime rmap btree contents Darrick J. Wong
2025-02-07  5:16     ` Christoph Hellwig
2025-02-06 22:51   ` [PATCH 08/27] xfs_db: support the realtime rmapbt Darrick J. Wong
2025-02-07  5:17     ` Christoph Hellwig
2025-02-06 22:52   ` [PATCH 09/27] xfs_db: copy the realtime rmap btree Darrick J. Wong
2025-02-07  5:17     ` Christoph Hellwig
2025-02-06 22:52   ` [PATCH 10/27] xfs_db: make fsmap query the realtime reverse mapping tree Darrick J. Wong
2025-02-07  5:18     ` Christoph Hellwig
2025-02-06 22:52   ` [PATCH 11/27] xfs_db: add an rgresv command Darrick J. Wong
2025-02-07  5:19     ` Christoph Hellwig
2025-02-07  5:24       ` Darrick J. Wong
2025-02-06 22:52   ` [PATCH 12/27] xfs_spaceman: report health status of the realtime rmap btree Darrick J. Wong
2025-02-07  5:19     ` Christoph Hellwig
2025-02-06 22:53   ` [PATCH 13/27] xfs_repair: tidy up rmap_diffkeys Darrick J. Wong
2025-02-07  5:20     ` Christoph Hellwig
2025-02-06 22:53   ` [PATCH 14/27] xfs_repair: flag suspect long-format btree blocks Darrick J. Wong
2025-02-07  5:21     ` Christoph Hellwig
2025-02-06 22:53   ` [PATCH 15/27] xfs_repair: use realtime rmap btree data to check block types Darrick J. Wong
2025-02-07  5:22     ` Christoph Hellwig
2025-02-06 22:53   ` [PATCH 16/27] xfs_repair: create a new set of incore rmap information for rt groups Darrick J. Wong
2025-02-07  5:22     ` Christoph Hellwig
2025-02-06 22:54   ` [PATCH 17/27] xfs_repair: refactor realtime inode check Darrick J. Wong
2025-02-07  5:22     ` Christoph Hellwig
2025-02-06 22:54   ` [PATCH 18/27] xfs_repair: find and mark the rtrmapbt inodes Darrick J. Wong
2025-02-07  5:33     ` Christoph Hellwig
2025-02-06 22:54   ` [PATCH 19/27] xfs_repair: check existing realtime rmapbt entries against observed rmaps Darrick J. Wong
2025-02-07  5:35     ` Christoph Hellwig
2025-02-06 22:54   ` [PATCH 20/27] xfs_repair: always check realtime file mappings against incore info Darrick J. Wong
2025-02-07  5:35     ` Christoph Hellwig
2025-02-06 22:55   ` [PATCH 21/27] xfs_repair: rebuild the realtime rmap btree Darrick J. Wong
2025-02-07  5:54     ` Christoph Hellwig
2025-02-06 22:55   ` [PATCH 22/27] xfs_repair: check for global free space concerns with default btree slack levels Darrick J. Wong
2025-02-07  5:56     ` Christoph Hellwig
2025-02-06 22:55   ` [PATCH 23/27] xfs_repair: rebuild the bmap btree for realtime files Darrick J. Wong
2025-02-07  5:56     ` Christoph Hellwig
2025-02-06 22:55   ` [PATCH 24/27] xfs_repair: reserve per-AG space while rebuilding rt metadata Darrick J. Wong
2025-02-07  5:57     ` Christoph Hellwig
2025-02-06 22:56   ` [PATCH 25/27] xfs_logprint: report realtime RUIs Darrick J. Wong
2025-02-07  5:58     ` Christoph Hellwig
2025-02-07  5:58     ` Christoph Hellwig
2025-02-06 22:56   ` [PATCH 26/27] mkfs: add some rtgroup inode helpers Darrick J. Wong
2025-02-07  5:59     ` Christoph Hellwig
2025-02-07  6:07       ` Darrick J. Wong
2025-02-06 22:56   ` [PATCH 27/27] mkfs: create the realtime rmap inode Darrick J. Wong
2025-02-07  6:00     ` Christoph Hellwig
2025-02-06 22:30 ` [PATCHSET v6.3 4/5] xfsprogs: reflink on the realtime device Darrick J. Wong
2025-02-06 22:57   ` [PATCH 01/22] libxfs: compute the rt refcount btree maxlevels during initialization Darrick J. Wong
2025-02-13  4:20     ` Christoph Hellwig
2025-02-06 22:57   ` [PATCH 02/22] libxfs: add a realtime flag to the refcount update log redo items Darrick J. Wong
2025-02-13  4:20     ` Christoph Hellwig
2025-02-06 22:57   ` [PATCH 03/22] libxfs: apply rt extent alignment constraints to CoW extsize hint Darrick J. Wong
2025-02-13  4:21     ` Christoph Hellwig
2025-02-06 22:57   ` [PATCH 04/22] libfrog: enable scrubbing of the realtime refcount data Darrick J. Wong
2025-02-13  4:21     ` Christoph Hellwig
2025-02-06 22:58   ` [PATCH 05/22] man: document userspace API changes due to rt reflink Darrick J. Wong
2025-02-13  4:21     ` Christoph Hellwig
2025-02-06 22:58   ` [PATCH 06/22] xfs_db: display the realtime refcount btree contents Darrick J. Wong
2025-02-13  4:22     ` Christoph Hellwig
2025-02-06 22:58   ` [PATCH 07/22] xfs_db: support the realtime refcountbt Darrick J. Wong
2025-02-13  4:22     ` Christoph Hellwig
2025-02-06 22:58   ` [PATCH 08/22] xfs_db: copy the realtime refcount btree Darrick J. Wong
2025-02-13  4:23     ` Christoph Hellwig
2025-02-06 22:59   ` [PATCH 09/22] xfs_db: add rtrefcount reservations to the rgresv command Darrick J. Wong
2025-02-13  4:23     ` Christoph Hellwig
2025-02-06 22:59   ` [PATCH 10/22] xfs_spaceman: report health of the realtime refcount btree Darrick J. Wong
2025-02-13  4:24     ` Christoph Hellwig
2025-02-06 22:59   ` [PATCH 11/22] xfs_repair: allow CoW staging extents in the realtime rmap records Darrick J. Wong
2025-02-13  4:24     ` Christoph Hellwig
2025-02-06 22:59   ` [PATCH 12/22] xfs_repair: use realtime refcount btree data to check block types Darrick J. Wong
2025-02-13  4:24     ` Christoph Hellwig
2025-02-06 23:00   ` [PATCH 13/22] xfs_repair: find and mark the rtrefcountbt inode Darrick J. Wong
2025-02-13  4:25     ` Christoph Hellwig
2025-02-06 23:00   ` [PATCH 14/22] xfs_repair: compute refcount data for the realtime groups Darrick J. Wong
2025-02-13  4:25     ` Christoph Hellwig
2025-02-06 23:00   ` [PATCH 15/22] xfs_repair: check existing realtime refcountbt entries against observed refcounts Darrick J. Wong
2025-02-13  4:26     ` Christoph Hellwig
2025-02-06 23:00   ` [PATCH 16/22] xfs_repair: reject unwritten shared extents Darrick J. Wong
2025-02-13  4:26     ` Christoph Hellwig
2025-02-06 23:01   ` [PATCH 17/22] xfs_repair: rebuild the realtime refcount btree Darrick J. Wong
2025-02-13  4:27     ` Christoph Hellwig
2025-02-06 23:01   ` [PATCH 18/22] xfs_repair: allow realtime files to have the reflink flag set Darrick J. Wong
2025-02-13  4:27     ` Christoph Hellwig
2025-02-06 23:01   ` [PATCH 19/22] xfs_repair: validate CoW extent size hint on rtinherit directories Darrick J. Wong
2025-02-13  4:27     ` Christoph Hellwig
2025-02-06 23:01   ` [PATCH 20/22] xfs_logprint: report realtime CUIs Darrick J. Wong
2025-02-13  4:28     ` Christoph Hellwig
2025-02-06 23:02   ` [PATCH 21/22] mkfs: validate CoW extent size hint when rtinherit is set Darrick J. Wong
2025-02-13  4:28     ` Christoph Hellwig
2025-02-06 23:02   ` [PATCH 22/22] mkfs: enable reflink on the realtime device Darrick J. Wong
2025-02-13  4:28     ` Christoph Hellwig
2025-02-06 22:30 ` [PATCHSET 5/5] xfsprogs: dump fs directory trees Darrick J. Wong
2025-02-06 23:02   ` [PATCH 1/4] xfs_db: pass const pointers when we're not modifying them Darrick J. Wong
2025-02-13  4:29     ` Christoph Hellwig
2025-02-13 13:26     ` Andrey Albershteyn
2025-02-06 23:03   ` [PATCH 2/4] xfs_db: use an empty transaction to try to prevent livelocks in path_navigate Darrick J. Wong
2025-02-13  4:29     ` Christoph Hellwig
2025-02-13 13:26     ` Andrey Albershteyn
2025-02-06 23:03   ` [PATCH 3/4] xfs_db: make listdir more generally useful Darrick J. Wong
2025-02-13  4:30     ` Christoph Hellwig
2025-02-13 13:26     ` Andrey Albershteyn
2025-02-06 23:03   ` [PATCH 4/4] xfs_db: add command to copy directory trees out of filesystems Darrick J. Wong
2025-02-13 13:25     ` Andrey Albershteyn
2025-02-18  8:36     ` Christoph Hellwig
2025-02-18 15:54       ` Darrick J. Wong
2025-02-18 18:04         ` Darrick J. Wong
2025-02-18 19:49           ` Darrick J. Wong
2025-02-19 18:04       ` Eric Sandeen
2025-02-19 18:30         ` Darrick J. Wong
2025-02-20  6:20           ` Christoph Hellwig
2025-02-20  6:19         ` Christoph Hellwig
2025-02-18 19:58     ` [PATCH v1.1 " Darrick J. Wong
2025-02-19  5:27       ` Christoph Hellwig
