linux-xfs.vger.kernel.org archive mirror
* [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10
@ 2024-07-02  0:43 Darrick J. Wong
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
                   ` (15 more replies)
  0 siblings, 16 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:43 UTC
  To: Christoph Hellwig, Carlos Maiolino; +Cc: xfs

Hi everyone,

This is a partial patchbomb of all the changes I would like to merge for
xfsprogs 6.10.  The first patchset adds exchange-range support, and the
11th patchset adds parent pointer support.  Nearly everything else
consists of the userspace fixes (i.e. to xfs_scrub) needed to complete
online repair.

Most of the new xfs_scrub work is hardening of the systemd services;
strengthening of the deceptive Unicode checker to report on the last
few years' worth of Unicode attacks; and better responsiveness during
the phase 8 FITRIM.

I've omitted everything that has already passed review, which cut down
the patch count substantially.  Any patch with an 'R' in column 1 has
been reviewed by someone; a patch with a '-' in column 1 has not been
RVB'd.

[PATCHSET v30.7 01/16] xfsprogs: atomic file updates
- [PATCH 01/12] man: document the exchange-range ioctl
- [PATCH 02/12] man: document XFS_FSOP_GEOM_FLAGS_EXCHRANGE
- [PATCH 03/12] libhandle: add support for bulkstat v5
- [PATCH 04/12] libfrog: add support for exchange range ioctl family
- [PATCH 05/12] xfs_db: advertise exchange-range in the version command
- [PATCH 06/12] xfs_logprint: support dumping exchmaps log items
- [PATCH 07/12] xfs_fsr: convert to bulkstat v5 ioctls
- [PATCH 08/12] xfs_fsr: skip the xattr/forkoff levering with the newer swapext implementations
- [PATCH 09/12] xfs_io: create exchangerange command to test file range exchange ioctl
- [PATCH 10/12] libfrog: advertise exchange-range support
- [PATCH 11/12] xfs_repair: add exchange-range to file systems
- [PATCH 12/12] mkfs: add a formatting option for exchange-range
[PATCHSET v30.7 02/16] xfsprogs: inode-related repair fixes
- [PATCH 1/3] xfs_{db,repair}: add an explicit owner field to xfs_da_args
- [PATCH 2/3] libxfs: port the bumplink function from the kernel
- [PATCH 3/3] mkfs/repair: pin inodes that would otherwise overflow link count
[PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions
- [PATCH 01/13] xfs_scrub: use proper UChar string iterators
- [PATCH 02/13] xfs_scrub: hoist code that removes ignorable characters
- [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points
- [PATCH 04/13] xfs_scrub: avoid potential UAF after freeing a duplicate name entry
- [PATCH 05/13] xfs_scrub: guard against libicu returning negative buffer lengths
- [PATCH 06/13] xfs_scrub: hoist non-rendering character predicate
- [PATCH 07/13] xfs_scrub: store bad flags with the name entry
- [PATCH 08/13] xfs_scrub: rename UNICRASH_ZERO_WIDTH to UNICRASH_INVISIBLE
- [PATCH 09/13] xfs_scrub: type-coerce the UNICRASH_* flags
- [PATCH 10/13] xfs_scrub: reduce size of struct name_entry
- [PATCH 11/13] xfs_scrub: rename struct unicrash.normalizer
- [PATCH 12/13] xfs_scrub: report deceptive file extensions
- [PATCH 13/13] xfs_scrub: dump unicode points
[PATCHSET v30.7 04/16] xfs_scrub: move fstrim to a separate phase
- [PATCH 1/8] xfs_scrub: move FITRIM to phase 8
- [PATCH 2/8] xfs_scrub: ignore phase 8 if the user disabled fstrim
- [PATCH 3/8] xfs_scrub: collapse trim_filesystem
- [PATCH 4/8] xfs_scrub: fix the work estimation for phase 8
- [PATCH 5/8] xfs_scrub: report FITRIM errors properly
- [PATCH 6/8] xfs_scrub: don't call FITRIM after runtime errors
- [PATCH 7/8] xfs_scrub: don't trim the first agbno of each AG for better performance
- [PATCH 8/8] xfs_scrub: improve progress meter for phase 8 fstrimming
[PATCHSET v30.7 05/16] xfs_scrub: use free space histograms to reduce fstrim runtime
- [PATCH 1/7] libfrog: hoist free space histogram code
- [PATCH 2/7] libfrog: print wider columns for free space histogram
- [PATCH 3/7] libfrog: print cdf of free space buckets
- [PATCH 4/7] xfs_scrub: don't close stdout when closing the progress bar
- [PATCH 5/7] xfs_scrub: remove pointless spacemap.c arguments
- [PATCH 6/7] xfs_scrub: collect free space histograms during phase 7
- [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms
[PATCHSET v30.7 06/16] xfs_scrub: tighten security of systemd services
- [PATCH 1/6] xfs_scrub: allow auxiliary pathnames for sandboxing
R [PATCH 2/6] xfs_scrub.service: reduce CPU usage to 60% when possible
R [PATCH 3/6] xfs_scrub: use dynamic users when running as a systemd service
R [PATCH 4/6] xfs_scrub: tighten up the security on the background systemd service
- [PATCH 5/6] xfs_scrub_fail: tighten up the security on the background systemd service
- [PATCH 6/6] xfs_scrub_all: tighten up the security on the background systemd service
[PATCHSET v30.7 07/16] xfs_scrub_all: automatic media scan service
- [PATCH 1/6] xfs_scrub_all: only use the xfs_scrub@ systemd services in service mode
- [PATCH 2/6] xfs_scrub_all: remove journalctl background process
- [PATCH 3/6] xfs_scrub_all: support metadata+media scans of all filesystems
- [PATCH 4/6] xfs_scrub_all: enable periodic file data scrubs automatically
- [PATCH 5/6] xfs_scrub_all: trigger automatic media scans once per month
- [PATCH 6/6] xfs_scrub_all: failure reporting for the xfs_scrub_all job
[PATCHSET v30.7 08/16] xfs_scrub_all: improve systemd handling
- [PATCH 1/5] xfs_scrub_all: encapsulate all the subprocess code in an
- [PATCH 2/5] xfs_scrub_all: encapsulate all the systemctl code in an
- [PATCH 3/5] xfs_scrub_all: add CLI option for easier debugging
- [PATCH 4/5] xfs_scrub_all: convert systemctl calls to dbus
- [PATCH 5/5] xfs_scrub_all: implement retry and backoff for dbus calls
[PATCHSET v30.7 09/16] xfs_scrub: automatic optimization by default
- [PATCH 1/3] xfs_scrub: automatic downgrades to dry-run mode in
- [PATCH 2/3] xfs_scrub: add an optimization-only mode
- [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by
[PATCHSET v13.7 10/16] xfsprogs: improve extended attribute
- [PATCH 1/3] xfs_repair: check free space requirements before allowing
- [PATCH 2/3] xfs_repair: enforce one namespace bit per extended
- [PATCH 3/3] xfs_repair: check for unknown flags in attr entries
[PATCHSET v13.7 11/16] xfsprogs: Parent Pointers
- [PATCH 01/24] libxfs: create attr log item opcodes and formats for
- [PATCH 02/24] xfs_{db,repair}: implement new attr hash value function
- [PATCH 03/24] xfs_logprint: dump new attr log item fields
- [PATCH 04/24] man: document the XFS_IOC_GETPARENTS ioctl
- [PATCH 05/24] libfrog: report parent pointers to userspace
- [PATCH 06/24] libfrog: add parent pointer support code
R [PATCH 07/24] xfs_io: adapt parent command to new parent pointer
R [PATCH 08/24] xfs_io: Add i, n and f flags to parent command
R [PATCH 09/24] xfs_logprint: decode parent pointers in ATTRI items
- [PATCH 10/24] xfs_spaceman: report file paths
- [PATCH 11/24] xfs_scrub: use parent pointers when possible to report
- [PATCH 12/24] xfs_scrub: use parent pointers to report lost file data
- [PATCH 13/24] xfs_db: report parent pointers in version command
R [PATCH 14/24] xfs_db: report parent bit on xattrs
- [PATCH 15/24] xfs_db: report parent pointers embedded in xattrs
- [PATCH 16/24] xfs_db: obfuscate dirent and parent pointer names
- [PATCH 17/24] libxfs: export attr3_leaf_hdr_from_disk via
- [PATCH 18/24] xfs_db: add a parents command to list the parents of a
- [PATCH 19/24] xfs_db: make attr_set and attr_remove handle parent
- [PATCH 20/24] xfs_db: add link and unlink expert commands
- [PATCH 21/24] xfs_db: compute hashes of parent pointers
- [PATCH 22/24] libxfs: create new files with attr forks if necessary
R [PATCH 23/24] mkfs: Add parent pointers during protofile creation
R [PATCH 24/24] mkfs: enable formatting with parent pointers
[PATCHSET v13.7 12/16] xfsprogs: scrubbing for parent pointers
- [PATCH 1/2] xfs: create a blob array data structure
- [PATCH 2/2] man2: update ioctl_xfs_scrub_metadata.2 for parent
[PATCHSET v13.7 13/16] xfsprogs: offline repair for parent pointers
- [PATCH 01/12] xfs_db: remove some boilerplate from xfs_attr_set
- [PATCH 02/12] xfs_db: actually report errors from libxfs_attr_set
- [PATCH 03/12] xfs_repair: junk parent pointer attributes when
- [PATCH 04/12] xfs_repair: add parent pointers when messing with
- [PATCH 05/12] xfs_repair: junk duplicate hashtab entries when
- [PATCH 06/12] xfs_repair: build a parent pointer index
- [PATCH 07/12] xfs_repair: move the global dirent name store to a
- [PATCH 08/12] xfs_repair: deduplicate strings stored in string blob
- [PATCH 09/12] xfs_repair: check parent pointers
- [PATCH 10/12] xfs_repair: dump garbage parent pointer attributes
- [PATCH 11/12] xfs_repair: update ondisk parent pointer records
- [PATCH 12/12] xfs_repair: wipe ondisk parent pointers when there are
[PATCHSET v13.7 14/16] xfsprogs: detect and correct directory tree
- [PATCH 1/5] libfrog: add directory tree structure scrubber to scrub
- [PATCH 2/5] xfs_spaceman: report directory tree corruption in the
- [PATCH 3/5] xfs_scrub: fix erroring out of check_inode_names
- [PATCH 4/5] xfs_scrub: detect and repair directory tree corruptions
- [PATCH 5/5] xfs_scrub: defer phase5 file scans if dirloop fails
[PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls
- [PATCH 01/10] man: document vectored scrub mode
- [PATCH 02/10] libfrog: support vectored scrub
- [PATCH 03/10] xfs_io: support vectored scrub
- [PATCH 04/10] xfs_scrub: split the scrub epilogue code into a
- [PATCH 05/10] xfs_scrub: split the repair epilogue code into a
- [PATCH 06/10] xfs_scrub: convert scrub and repair epilogues to use
- [PATCH 07/10] xfs_scrub: vectorize scrub calls
- [PATCH 08/10] xfs_scrub: vectorize repair calls
- [PATCH 09/10] xfs_scrub: use scrub barriers to reduce kernel calls
- [PATCH 10/10] xfs_scrub: try spot repairs of metadata items to make
[PATCHSET v30.7 16/16] xfs_repair: small remote symlinks are ok
- [PATCH 1/1] xfs_repair: allow symlinks with short remote targets

--D


* [PATCHSET v30.7 01/16] xfsprogs: atomic file updates
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
@ 2024-07-02  0:49 ` Darrick J. Wong
  2024-07-02  0:53   ` [PATCH 01/12] man: document the exchange-range ioctl Darrick J. Wong
                     ` (11 more replies)
  2024-07-02  0:49 ` [PATCHSET v30.7 02/16] xfsprogs: inode-related repair fixes Darrick J. Wong
                   ` (14 subsequent siblings)
  15 siblings, 12 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:49 UTC
  To: djwong, cem; +Cc: linux-xfs, hch

Hi all,

This series creates a new XFS_IOC_EXCHANGE_RANGE ioctl to exchange
ranges of bytes between two files atomically.

This new functionality enables data storage programs to stage and commit
file updates such that reader programs will see either the old contents
or the new contents in their entirety, with no chance of torn writes.  A
successful call completion guarantees that the new contents will be seen
even if the system fails.

The ability to exchange file fork mappings between files in this manner
is critical to supporting online filesystem repair, which is built upon
the strategy of constructing a clean copy of a damaged structure and
committing the new structure into the metadata file atomically.  The
ioctls exist to facilitate testing of the new functionality and to
enable future application program designs.

User programs will be able to update files atomically by opening an
O_TMPFILE, reflinking the source file to it, making whatever updates
they want to make, and exchanging the relevant ranges of the temp file
with the original file.  If the updates are aligned with the file block
size, a new (since v2) flag provides for exchanging only the written
areas.  Note that application software must quiesce writes to the file
while it stages an atomic update.  This will be addressed by a
subsequent series.
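The staging sequence described above can be sketched in C as follows.  This
is a minimal illustration, not the xfs_io exchangerange command:
atomic_pwrite() is a hypothetical helper, and the struct and ioctl
definitions are re-declared locally on the assumption that they match the
6.10 uapi header (check them against your installed <xfs/xfs_fs.h> before
relying on them).  O_TMPFILE and FICLONE are standard Linux interfaces.

```c
#define _GNU_SOURCE		/* for O_TMPFILE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>		/* FICLONE */
#include <linux/types.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* ASSUMPTION: local copy of the exchange-range uapi; verify before use. */
#ifndef XFS_IOC_EXCHANGE_RANGE
struct xfs_exchange_range {
	__s32	file1_fd;
	__u32	pad;		/* must be zero */
	__u64	file1_offset;	/* bytes */
	__u64	file2_offset;	/* bytes */
	__u64	length;		/* bytes; ignored with TO_EOF */
	__u64	flags;
};
#define XFS_EXCHANGE_RANGE_TO_EOF	(1ULL << 0)
#define XFS_EXCHANGE_RANGE_DSYNC	(1ULL << 1)
#define XFS_IOC_EXCHANGE_RANGE	_IOW('X', 129, struct xfs_exchange_range)
#endif

/*
 * Hypothetical helper: overwrite @len bytes at @off in the file open at @fd
 * so that readers see either the old or the new contents, never a mix.
 * The caller must quiesce other writers to @fd while the update is staged.
 * Returns 0 on success, or -1 with errno set.
 */
int atomic_pwrite(int fd, const char *dir, const void *buf, size_t len,
		  off_t off)
{
	struct xfs_exchange_range xchg = {0};
	int tmpfd, saved_errno;
	ssize_t ret;

	assert(buf != NULL || len == 0);

	/* 1. Anonymous temp file on the same filesystem as @fd. */
	tmpfd = open(dir, O_TMPFILE | O_RDWR, 0600);
	if (tmpfd < 0)
		return -1;

	/* 2. Reflink: the temp file shares all of the original's blocks. */
	if (ioctl(tmpfd, FICLONE, fd) < 0)
		goto fail;

	/* 3. Stage the update; only the temp file's mappings change. */
	ret = pwrite(tmpfd, buf, len, off);
	if (ret < 0 || (size_t)ret != len) {
		if (ret >= 0)
			errno = EIO;	/* short write */
		goto fail;
	}

	/* 4. Exchange both files' contents and sizes in one logged step. */
	xchg.file1_fd = tmpfd;
	xchg.flags = XFS_EXCHANGE_RANGE_TO_EOF | XFS_EXCHANGE_RANGE_DSYNC;
	if (ioctl(fd, XFS_IOC_EXCHANGE_RANGE, &xchg) < 0)
		goto fail;

	close(tmpfd);	/* now holds the old contents; freed on close */
	return 0;
fail:
	saved_errno = errno;
	close(tmpfd);
	errno = saved_errno;
	return -1;
}
```

For block-aligned, scattered updates, the written-areas flag mentioned above
would avoid exchanging unchanged mappings; its uapi name is assumed here to
be XFS_EXCHANGE_RANGE_FILE1_WRITTEN and should be confirmed first.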

This mechanism solves the clunkiness of two existing atomic file update
mechanisms: for O_TRUNC + rewrite, this eliminates the brief period
where other programs can see an empty file.  For create tempfile +
rename, the need to copy file attributes and extended attributes for
each file update is eliminated.

However, this method introduces its own awkwardness -- any program
initiating an exchange now needs to have a way to signal to other
programs that the file contents have changed.  For file access mediated
via read and write, fanotify or inotify are probably sufficient.  For
mmapped files, that may not be fast enough.

The reference implementation in XFS creates a new log incompat feature
and log intent items to track high-level progress of swapping ranges of
two files and finish interrupted work if the system goes down.  Sample
code can be found in the corresponding changes to xfs_io to exercise the
use case mentioned above.

Note that this function is /not/ the O_DIRECT atomic untorn file writes
concept that has also been floating around for years.  It is also not
the RWF_ATOMIC patchset that has been shared.  This RFC is constructed
entirely in software, which means that there are no limitations other
than the general filesystem limits.

As a side note, the original motivation behind the kernel functionality
is online repair of file-based metadata.  The atomic file content
exchange is implemented as an atomic exchange of file fork mappings,
which means that we can implement online reconstruction of extended
attributes and directories by building a new one in another inode and
exchanging the contents.

Subsequent patchsets adapt the online filesystem repair code to use
atomic file exchanges.  This enables repair functions to construct a
clean copy of a directory, xattr information, symbolic links, realtime
bitmaps, and realtime summary information in a temporary inode.  If this
completes successfully, the new contents can be committed atomically
into the inode being repaired.  This is essential to avoid making
corruption problems worse if the system goes down in the middle of
running repair.

This series also includes the userspace pieces needed to test the new
functionality, and a sample implementation of atomic file updates.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=atomic-file-updates

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=atomic-file-updates

xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=atomic-file-updates
---
Commits in this patchset:
 * man: document the exchange-range ioctl
 * man: document XFS_FSOP_GEOM_FLAGS_EXCHRANGE
 * libhandle: add support for bulkstat v5
 * libfrog: add support for exchange range ioctl family
 * xfs_db: advertise exchange-range in the version command
 * xfs_logprint: support dumping exchmaps log items
 * xfs_fsr: convert to bulkstat v5 ioctls
 * xfs_fsr: skip the xattr/forkoff levering with the newer swapext implementations
 * xfs_io: create exchangerange command to test file range exchange ioctl
 * libfrog: advertise exchange-range support
 * xfs_repair: add exchange-range to file systems
 * mkfs: add a formatting option for exchange-range
---
 db/sb.c                             |    2 
 fsr/xfs_fsr.c                       |  162 ++++++++++++--------
 include/jdm.h                       |   24 +++
 io/Makefile                         |   48 +++++-
 io/exchrange.c                      |  156 ++++++++++++++++++++
 io/init.c                           |    1 
 io/io.h                             |    1 
 libfrog/Makefile                    |    2 
 libfrog/file_exchange.c             |   52 +++++++
 libfrog/file_exchange.h             |   15 ++
 libfrog/fsgeom.c                    |   49 +++++-
 libfrog/fsgeom.h                    |    1 
 libhandle/jdm.c                     |  117 +++++++++++++++
 logprint/log_misc.c                 |   11 +
 logprint/log_print_all.c            |   12 ++
 logprint/log_redo.c                 |  128 ++++++++++++++++
 logprint/logprint.h                 |    6 +
 man/man2/ioctl_xfs_exchange_range.2 |  278 +++++++++++++++++++++++++++++++++++
 man/man2/ioctl_xfs_fsgeometry.2     |    3 
 man/man8/mkfs.xfs.8.in              |    7 +
 man/man8/xfs_admin.8                |    7 +
 man/man8/xfs_io.8                   |   40 +++++
 mkfs/lts_4.19.conf                  |    1 
 mkfs/lts_5.10.conf                  |    1 
 mkfs/lts_5.15.conf                  |    1 
 mkfs/lts_5.4.conf                   |    1 
 mkfs/lts_6.1.conf                   |    1 
 mkfs/lts_6.6.conf                   |    1 
 mkfs/xfs_mkfs.c                     |   26 +++
 repair/globals.c                    |    1 
 repair/globals.h                    |    1 
 repair/phase2.c                     |   30 ++++
 repair/xfs_repair.c                 |   11 +
 33 files changed, 1111 insertions(+), 86 deletions(-)
 create mode 100644 io/exchrange.c
 create mode 100644 libfrog/file_exchange.c
 create mode 100644 libfrog/file_exchange.h
 create mode 100644 man/man2/ioctl_xfs_exchange_range.2



* [PATCHSET v30.7 02/16] xfsprogs: inode-related repair fixes
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
@ 2024-07-02  0:49 ` Darrick J. Wong
  2024-07-02  0:56   ` [PATCH 1/3] xfs_{db,repair}: add an explicit owner field to xfs_da_args Darrick J. Wong
                     ` (2 more replies)
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
                   ` (13 subsequent siblings)
  15 siblings, 3 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:49 UTC
  To: djwong, cem; +Cc: linux-xfs, hch

Hi all,

While doing QA of the online fsck code, I made a few observations:
first, nobody was checking that the di_onlink field is actually zero;
second, allocating a temporary file for repairs can fail (and thus bring
down the entire fs) if the inode cluster is corrupt; and third, file
link counts do not pin at ~0U to prevent integer overflows.

This scattered patchset fixes those three problems.
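The pinning rule in the third point amounts to a saturating counter.  A
minimal sketch, assuming a hypothetical NLINK_PINNED sentinel of ~0U and
plain integers in place of real inodes; the ported kernel bumplink function
and the repair code may treat the edge cases differently:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the sentinel; the series pins link counts at ~0U. */
#define NLINK_PINNED	(~0U)

/* Saturating increment: once a count reaches the sentinel, it stays there. */
uint32_t bumplink(uint32_t nlink)
{
	return nlink == NLINK_PINNED ? nlink : nlink + 1;
}

/*
 * A pinned count no longer reflects reality, so never decrement it either;
 * otherwise enough unlinks could free an inode that is still referenced.
 */
uint32_t droplink(uint32_t nlink)
{
	assert(nlink > 0);	/* dropping a zero link count is corruption */
	return nlink == NLINK_PINNED ? nlink : nlink - 1;
}
```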

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=inode-repair-improvements

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=inode-repair-improvements
---
Commits in this patchset:
 * xfs_{db,repair}: add an explicit owner field to xfs_da_args
 * libxfs: port the bumplink function from the kernel
 * mkfs/repair: pin inodes that would otherwise overflow link count
---
 db/namei.c          |    1 +
 include/xfs_inode.h |    2 ++
 libxfs/util.c       |   18 ++++++++++++++++++
 mkfs/proto.c        |    4 ++--
 repair/incore_ino.c |    3 ++-
 repair/phase6.c     |   13 ++++++++-----
 6 files changed, 33 insertions(+), 8 deletions(-)



* [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
  2024-07-02  0:49 ` [PATCHSET v30.7 02/16] xfsprogs: inode-related repair fixes Darrick J. Wong
@ 2024-07-02  0:50 ` Darrick J. Wong
  2024-07-02  0:57   ` [PATCH 01/13] xfs_scrub: use proper UChar string iterators Darrick J. Wong
                     ` (12 more replies)
  2024-07-02  0:50 ` [PATCHSET v30.7 04/16] xfs_scrub: move fstrim to a separate phase Darrick J. Wong
                   ` (12 subsequent siblings)
  15 siblings, 13 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:50 UTC
  To: djwong, cem; +Cc: linux-xfs, hch

Hi all,

In early 2023, malware researchers disclosed a phishing attack that was
targeted at people running Linux workstations.  The attack vector
involved the use of filenames containing what looked like a file
extension but instead contained a lookalike for the full stop (".")
and a common extension ("pdf").  Enhance xfs_scrub phase 5 to detect
these types of attacks and warn the system administrator.
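A simplified stand-in for this check, shown only to make the attack pattern
concrete: it flags names that end in a lookalike full stop followed by a
short alphanumeric "extension".  The real xfs_scrub code in scrub/unicrash.c
works through libicu; the function name, the three code points listed, and
the 8-character limit below are illustrative assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* UTF-8 encodings of a few characters that render much like "." */
static const char *const fake_dots[] = {
	"\xE2\x80\xA4",		/* U+2024 ONE DOT LEADER */
	"\xEF\xBC\x8E",		/* U+FF0E FULLWIDTH FULL STOP */
	"\xEF\xB9\x92",		/* U+FE52 SMALL FULL STOP */
};

/*
 * Return true if @name ends in <lookalike dot><1..8 ASCII alnum chars>,
 * e.g. "invoice" U+2024 "pdf", which displays like "invoice.pdf".
 */
bool has_deceptive_extension(const char *name)
{
	assert(name != NULL);

	for (size_t i = 0; i < sizeof(fake_dots) / sizeof(fake_dots[0]); i++) {
		const char *dot = NULL, *p = name;

		/* Find the last occurrence of this lookalike dot. */
		while ((p = strstr(p, fake_dots[i])) != NULL)
			dot = p++;
		if (!dot)
			continue;

		/* Everything after it must look like a short extension. */
		const char *ext = dot + strlen(fake_dots[i]);
		size_t n = strlen(ext);
		if (n == 0 || n > 8)
			continue;

		bool all_alnum = true;
		for (size_t j = 0; j < n; j++)
			if (!isalnum((unsigned char)ext[j]))
				all_alnum = false;
		if (all_alnum)
			return true;
	}
	return false;
}
```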

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-detect-deceptive-extensions

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=scrub-detect-deceptive-extensions
---
Commits in this patchset:
 * xfs_scrub: use proper UChar string iterators
 * xfs_scrub: hoist code that removes ignorable characters
 * xfs_scrub: add a couple of omitted invisible code points
 * xfs_scrub: avoid potential UAF after freeing a duplicate name entry
 * xfs_scrub: guard against libicu returning negative buffer lengths
 * xfs_scrub: hoist non-rendering character predicate
 * xfs_scrub: store bad flags with the name entry
 * xfs_scrub: rename UNICRASH_ZERO_WIDTH to UNICRASH_INVISIBLE
 * xfs_scrub: type-coerce the UNICRASH_* flags
 * xfs_scrub: reduce size of struct name_entry
 * xfs_scrub: rename struct unicrash.normalizer
 * xfs_scrub: report deceptive file extensions
 * xfs_scrub: dump unicode points
---
 scrub/unicrash.c |  530 +++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 424 insertions(+), 106 deletions(-)



* [PATCHSET v30.7 04/16] xfs_scrub: move fstrim to a separate phase
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
                   ` (2 preceding siblings ...)
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
@ 2024-07-02  0:50 ` Darrick J. Wong
  2024-07-02  1:01   ` [PATCH 1/8] xfs_scrub: move FITRIM to phase 8 Darrick J. Wong
                     ` (7 more replies)
  2024-07-02  0:50 ` [PATCHSET v30.7 05/16] xfs_scrub: use free space histograms to reduce fstrim runtime Darrick J. Wong
                   ` (11 subsequent siblings)
  15 siblings, 8 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:50 UTC
  To: djwong, cem; +Cc: linux-xfs, hch

Hi all,

Back when I originally designed xfs_scrub, all filesystem metadata
checks were complete by the end of phase 3, and phase 4 was where all
the metadata repairs occurred.  On the grounds that the filesystem
should be fully consistent by then, I made a call to FITRIM at the end
of phase 4 to discard empty space in the filesystem.

Unfortunately, that's no longer the case -- summary counters, link
counts, and quota counters are not checked until phase 7.  It's not safe
to instruct the storage to unmap "empty" areas if we don't know where
those empty areas are, so we need to create a phase 8 to trim the fs.
While we're at it, make it more obvious that fstrim only gets to run if
there are no unfixed corruptions and no other runtime errors have
occurred.

Finally, reduce the latency impacts on the rest of the system by
breaking up the fstrim work into a loop that targets only 16GB per call.
This enables better progress reporting for interactive runs and
cgroup-based resource constraints for background runs.
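The chunked trim loop can be sketched with the standard FITRIM ioctl and
struct fstrim_range from <linux/fs.h>.  The function names, the use of byte
offsets 0 through fs_bytes as the trim range, and the progress format are
assumptions for illustration.  The actual phase 8 code may differ, and it
also skips trimming after unfixed corruptions or runtime errors, which this
sketch leaves to the caller.

```c
#include <assert.h>
#include <linux/fs.h>		/* FITRIM, struct fstrim_range */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* 16GB per call, as described in the cover letter. */
#define TRIM_CHUNK	(16ULL << 30)

/* Length of the next chunk starting at @start, clamped to @end. */
uint64_t trim_chunk_len(uint64_t start, uint64_t end)
{
	assert(start < end);
	return end - start < TRIM_CHUNK ? end - start : TRIM_CHUNK;
}

/*
 * Hypothetical loop: trim [0, fs_bytes) one chunk at a time so that each
 * FITRIM call is short and progress can be reported between calls.
 * Returns 0 on success or -1 on the first failed call.
 */
int trim_in_chunks(int fd, uint64_t fs_bytes, uint64_t minlen)
{
	for (uint64_t start = 0; start < fs_bytes; ) {
		struct fstrim_range range = {
			.start	= start,
			.len	= trim_chunk_len(start, fs_bytes),
			.minlen	= minlen,
		};
		uint64_t chunk = range.len;

		if (ioctl(fd, FITRIM, &range) < 0) {
			perror("FITRIM");
			return -1;
		}
		/* On return, the kernel stores the bytes trimmed in range.len. */
		start += chunk;
		printf("trimmed %llu bytes, %.0f%% done\n",
		       (unsigned long long)range.len,
		       100.0 * (double)start / (double)fs_bytes);
	}
	return 0;
}
```

FITRIM requires CAP_SYS_ADMIN, so the loop only succeeds when run as root
against a mounted filesystem whose storage supports discard.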

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-fstrim-phase
---
Commits in this patchset:
 * xfs_scrub: move FITRIM to phase 8
 * xfs_scrub: ignore phase 8 if the user disabled fstrim
 * xfs_scrub: collapse trim_filesystem
 * xfs_scrub: fix the work estimation for phase 8
 * xfs_scrub: report FITRIM errors properly
 * xfs_scrub: don't call FITRIM after runtime errors
 * xfs_scrub: don't trim the first agbno of each AG for better performance
 * xfs_scrub: improve progress meter for phase 8 fstrimming
---
 scrub/Makefile    |    1 
 scrub/phase4.c    |   30 +---------
 scrub/phase8.c    |  152 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/vfs.c       |   22 +++++---
 scrub/vfs.h       |    2 -
 scrub/xfs_scrub.c |   11 +++-
 scrub/xfs_scrub.h |    3 +
 7 files changed, 184 insertions(+), 37 deletions(-)
 create mode 100644 scrub/phase8.c



* [PATCHSET v30.7 05/16] xfs_scrub: use free space histograms to reduce fstrim runtime
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
                   ` (3 preceding siblings ...)
  2024-07-02  0:50 ` [PATCHSET v30.7 04/16] xfs_scrub: move fstrim to a separate phase Darrick J. Wong
@ 2024-07-02  0:50 ` Darrick J. Wong
  2024-07-02  1:03   ` [PATCH 1/7] libfrog: hoist free space histogram code Darrick J. Wong
                     ` (6 more replies)
  2024-07-02  0:50 ` [PATCHSET v30.7 06/16] xfs_scrub: tighten security of systemd services Darrick J. Wong
                   ` (10 subsequent siblings)
  15 siblings, 7 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:50 UTC
  To: djwong, cem; +Cc: linux-xfs, hch

Hi all,

This patchset dramatically reduces the runtime of the FITRIM calls made
during phase 8 of xfs_scrub.  It turns out that phase 8 can really get
bogged down if the free space contains a large number of very small
extents.  In these cases, the runtime can increase by an order of
magnitude to free less than 1% of the free space.  This is not worth the
time, since we're spending a lot of time to do very little work.  The
FITRIM ioctl allows us to specify a minimum extent length, so we can use
statistical methods to compute a minlen parameter.

It turns out xfs_db/spaceman already have the code needed to create
histograms of free space extent lengths.  We add the ability to compute
a CDF of the extent lengths, which makes it easy to pick a minimum length
corresponding to 99% of the free space.  In most cases, this results in
dramatic reductions in phase 8 runtime.  Hence, move the histogram code
to libfrog, and wire up xfs_scrub, since phase 7 already walks the
fsmap.
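Picking minlen from the CDF can be sketched as a walk over histogram buckets
from the longest extents downward, stopping once the accumulated free space
reaches the target percentage.  struct freesp_bucket and pick_trim_minlen()
are hypothetical; the libfrog histogram code in this series has its own data
structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bucket: free extents with lengths in [low, next bucket's low). */
struct freesp_bucket {
	uint64_t low;		/* shortest extent length in this bucket (blocks) */
	uint64_t blocks;	/* total free blocks held by extents in this bucket */
};

/*
 * Return the largest bucket boundary such that free extents at least that
 * long still cover @pct percent of all free space.  @b must be sorted by
 * ascending @low.
 */
uint64_t pick_trim_minlen(const struct freesp_bucket *b, size_t nr,
			  unsigned int pct)
{
	uint64_t total = 0, covered = 0;

	assert(nr > 0 && pct <= 100);
	for (size_t i = 0; i < nr; i++)
		total += b[i].blocks;

	/* Walk the CDF from the longest extents down to the shortest. */
	for (size_t i = nr; i-- > 0; ) {
		covered += b[i].blocks;
		if (covered * 100 >= total * pct)
			return b[i].low;
	}
	return b[0].low;
}
```

For example, with buckets {1 block: 1000 free blocks} and {1024 blocks:
99000 free blocks}, a 99% target yields minlen = 1024, so FITRIM skips the
many small extents that hold only 1% of the free space.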

We also add a new -o suboption to xfs_scrub so that people who /do/ want
to examine every free extent can do so.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-fstrim-minlen-freesp-histogram

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=scrub-fstrim-minlen-freesp-histogram
---
Commits in this patchset:
 * libfrog: hoist free space histogram code
 * libfrog: print wider columns for free space histogram
 * libfrog: print cdf of free space buckets
 * xfs_scrub: don't close stdout when closing the progress bar
 * xfs_scrub: remove pointless spacemap.c arguments
 * xfs_scrub: collect free space histograms during phase 7
 * xfs_scrub: tune fstrim minlen parameter based on free space histograms
---
 db/freesp.c          |   83 +++-------------
 libfrog/Makefile     |    2 
 libfrog/histogram.c  |  252 ++++++++++++++++++++++++++++++++++++++++++++++++++
 libfrog/histogram.h  |   54 +++++++++++
 man/man8/xfs_scrub.8 |   16 +++
 scrub/phase7.c       |   47 +++++++++
 scrub/phase8.c       |   75 ++++++++++++++-
 scrub/spacemap.c     |   11 +-
 scrub/vfs.c          |    4 +
 scrub/vfs.h          |    2 
 scrub/xfs_scrub.c    |   45 +++++++++
 scrub/xfs_scrub.h    |   16 +++
 spaceman/freesp.c    |   93 +++++-------------
 13 files changed, 544 insertions(+), 156 deletions(-)
 create mode 100644 libfrog/histogram.c
 create mode 100644 libfrog/histogram.h



* [PATCHSET v30.7 06/16] xfs_scrub: tighten security of systemd services
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
                   ` (4 preceding siblings ...)
  2024-07-02  0:50 ` [PATCHSET v30.7 05/16] xfs_scrub: use free space histograms to reduce fstrim runtime Darrick J. Wong
@ 2024-07-02  0:50 ` Darrick J. Wong
  2024-07-02  1:04   ` [PATCH 1/6] xfs_scrub: allow auxiliary pathnames for sandboxing Darrick J. Wong
                     ` (5 more replies)
  2024-07-02  0:51 ` [PATCHSET v30.7 07/16] xfs_scrub_all: automatic media scan service Darrick J. Wong
                   ` (9 subsequent siblings)
  15 siblings, 6 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:50 UTC
  To: djwong, cem; +Cc: Christoph Hellwig, Helle Vaanzinn, linux-xfs, hch

Hi all,

To reduce the risk of the online fsck service suffering some sort of
catastrophic breach that results in attackers reconfiguring the running
system, I embarked on a security audit of the systemd service files.
The result should be that all elements of the background service
(individual scrub jobs, the scrub_all initiator, and the failure
reporting) run with as few privileges and within as strong a sandbox
as possible.

Granted, this does nothing about the potential for the /kernel/ screwing
up, but at least we could prevent obvious container escapes.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-service-security
---
Commits in this patchset:
 * xfs_scrub: allow auxiliary pathnames for sandboxing
 * xfs_scrub.service: reduce background CPU usage to less than one core if possible
 * xfs_scrub: use dynamic users when running as a systemd service
 * xfs_scrub: tighten up the security on the background systemd service
 * xfs_scrub_fail: tighten up the security on the background systemd service
 * xfs_scrub_all: tighten up the security on the background systemd service
---
 man/man8/xfs_scrub.8             |    9 +++-
 scrub/Makefile                   |    7 ++-
 scrub/phase1.c                   |    4 +-
 scrub/system-xfs_scrub.slice     |   30 ++++++++++++
 scrub/vfs.c                      |    2 -
 scrub/xfs_scrub.c                |   11 +++-
 scrub/xfs_scrub.h                |    5 ++
 scrub/xfs_scrub@.service.in      |   97 ++++++++++++++++++++++++++++++++++----
 scrub/xfs_scrub_all.service.in   |   66 ++++++++++++++++++++++++++
 scrub/xfs_scrub_fail@.service.in |   59 +++++++++++++++++++++++
 10 files changed, 270 insertions(+), 20 deletions(-)
 create mode 100644 scrub/system-xfs_scrub.slice



* [PATCHSET v30.7 07/16] xfs_scrub_all: automatic media scan service
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
                   ` (5 preceding siblings ...)
  2024-07-02  0:50 ` [PATCHSET v30.7 06/16] xfs_scrub: tighten security of systemd services Darrick J. Wong
@ 2024-07-02  0:51 ` Darrick J. Wong
  2024-07-02  1:06   ` [PATCH 1/6] xfs_scrub_all: only use the xfs_scrub@ systemd services in service mode Darrick J. Wong
                     ` (5 more replies)
  2024-07-02  0:51 ` [PATCHSET v30.7 08/16] xfs_scrub_all: improve systemd handling Darrick J. Wong
                   ` (8 subsequent siblings)
  15 siblings, 6 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:51 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

Hi all,

Now that we've completed the online fsck functionality, there are a few
things that could be improved in the automatic service.  Specifically,
we would like to perform a more intensive metadata + media scan once per
month, to give the user confidence that the filesystem isn't losing data
silently.  To accomplish this, enhance xfs_scrub_all to be able to
trigger media scans.  Next, add a duplicate set of systemd services that
start the media scans automatically.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service
---
Commits in this patchset:
 * xfs_scrub_all: only use the xfs_scrub@ systemd services in service mode
 * xfs_scrub_all: remove journalctl background process
 * xfs_scrub_all: support metadata+media scans of all filesystems
 * xfs_scrub_all: enable periodic file data scrubs automatically
 * xfs_scrub_all: trigger automatic media scans once per month
 * xfs_scrub_all: failure reporting for the xfs_scrub_all job
---
 debian/rules                           |    3 +
 include/builddefs.in                   |    3 +
 man/man8/Makefile                      |    7 ++
 man/man8/xfs_scrub_all.8.in            |   20 +++++
 scrub/Makefile                         |   21 +++++-
 scrub/xfs_scrub@.service.in            |    2 -
 scrub/xfs_scrub_all.cron.in            |    2 -
 scrub/xfs_scrub_all.in                 |  122 ++++++++++++++++++++++++++------
 scrub/xfs_scrub_all.service.in         |    9 ++
 scrub/xfs_scrub_all_fail.service.in    |   71 +++++++++++++++++++
 scrub/xfs_scrub_fail.in                |   46 +++++++++---
 scrub/xfs_scrub_fail@.service.in       |    2 -
 scrub/xfs_scrub_media@.service.in      |  100 ++++++++++++++++++++++++++
 scrub/xfs_scrub_media_fail@.service.in |   76 ++++++++++++++++++++
 14 files changed, 439 insertions(+), 45 deletions(-)
 rename man/man8/{xfs_scrub_all.8 => xfs_scrub_all.8.in} (59%)
 create mode 100644 scrub/xfs_scrub_all_fail.service.in
 create mode 100644 scrub/xfs_scrub_media@.service.in
 create mode 100644 scrub/xfs_scrub_media_fail@.service.in



* [PATCHSET v30.7 08/16] xfs_scrub_all: improve systemd handling
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
                   ` (6 preceding siblings ...)
  2024-07-02  0:51 ` [PATCHSET v30.7 07/16] xfs_scrub_all: automatic media scan service Darrick J. Wong
@ 2024-07-02  0:51 ` Darrick J. Wong
  2024-07-02  1:08   ` [PATCH 1/5] xfs_scrub_all: encapsulate all the subprocess code in an object Darrick J. Wong
                     ` (4 more replies)
  2024-07-02  0:51 ` [PATCHSET v30.7 09/16] xfs_scrub: automatic optimization by default Darrick J. Wong
                   ` (7 subsequent siblings)
  15 siblings, 5 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:51 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

Hi all,



If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-all-improve-systemd-handling
---
Commits in this patchset:
 * xfs_scrub_all: encapsulate all the subprocess code in an object
 * xfs_scrub_all: encapsulate all the systemctl code in an object
 * xfs_scrub_all: add CLI option for easier debugging
 * xfs_scrub_all: convert systemctl calls to dbus
 * xfs_scrub_all: implement retry and backoff for dbus calls
---
 debian/control         |    2 
 scrub/xfs_scrub_all.in |  279 ++++++++++++++++++++++++++++++++++++------------
 2 files changed, 209 insertions(+), 72 deletions(-)



* [PATCHSET v30.7 09/16] xfs_scrub: automatic optimization by default
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
                   ` (7 preceding siblings ...)
  2024-07-02  0:51 ` [PATCHSET v30.7 08/16] xfs_scrub_all: improve systemd handling Darrick J. Wong
@ 2024-07-02  0:51 ` Darrick J. Wong
  2024-07-02  1:09   ` [PATCH 1/3] xfs_scrub: automatic downgrades to dry-run mode in service mode Darrick J. Wong
                     ` (2 more replies)
  2024-07-02  0:51 ` [PATCHSET v13.7 10/16] xfsprogs: improve extended attribute validation Darrick J. Wong
                   ` (6 subsequent siblings)
  15 siblings, 3 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:51 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

Hi all,

This final patchset in the online fsck series enables the background
service to optimize filesystems by default.  This is the first step
towards enabling repairs by default.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-optimize-by-default

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=scrub-optimize-by-default
---
Commits in this patchset:
 * xfs_scrub: automatic downgrades to dry-run mode in service mode
 * xfs_scrub: add an optimization-only mode
 * debian: enable xfs_scrub_all systemd timer services by default
---
 debian/rules         |    2 +-
 man/man8/xfs_scrub.8 |    6 +++++-
 scrub/Makefile       |    2 +-
 scrub/phase1.c       |   13 +++++++++++++
 scrub/phase4.c       |    6 ++++++
 scrub/repair.c       |   37 ++++++++++++++++++++++++++++++++++++-
 scrub/repair.h       |    2 ++
 scrub/scrub.c        |    4 ++--
 scrub/xfs_scrub.c    |   21 +++++++++++++++++++--
 scrub/xfs_scrub.h    |    1 +
 10 files changed, 86 insertions(+), 8 deletions(-)



* [PATCHSET v13.7 10/16] xfsprogs: improve extended attribute validation
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
                   ` (8 preceding siblings ...)
  2024-07-02  0:51 ` [PATCHSET v30.7 09/16] xfs_scrub: automatic optimization by default Darrick J. Wong
@ 2024-07-02  0:51 ` Darrick J. Wong
  2024-07-02  1:10   ` [PATCH 1/3] xfs_repair: check free space requirements before allowing upgrades Darrick J. Wong
                     ` (2 more replies)
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                   ` (5 subsequent siblings)
  15 siblings, 3 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:51 UTC (permalink / raw)
  To: djwong, cem; +Cc: Chandan Babu R, Dave Chinner, linux-xfs, hch

Hi all,

Prior to introducing parent pointer extended attributes, let's spend
some time cleaning up the attr code and strengthening the validation
that it performs on attrs coming in from the disk.
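The two checks added below (one namespace bit per attr, no unknown flag
bits) boil down to simple bit tests.  Here's a minimal sketch in C; the
flag values are assumptions modeled on the kernel's ondisk XFS_ATTR_*
bits, and attr_flags_ok() is a hypothetical helper, not the code in
repair/attr_repair.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed ondisk attr entry flag bits, modeled on the kernel's XFS_ATTR_*
 * definitions; verify against xfs_da_format.h before relying on them.
 */
#define ATTR_LOCAL      (1u << 0)
#define ATTR_ROOT       (1u << 1)
#define ATTR_SECURE     (1u << 2)
#define ATTR_PARENT     (1u << 3)
#define ATTR_INCOMPLETE (1u << 7)

#define ATTR_NSP_MASK   (ATTR_ROOT | ATTR_SECURE | ATTR_PARENT)
#define ATTR_KNOWN_MASK (ATTR_LOCAL | ATTR_NSP_MASK | ATTR_INCOMPLETE)

/* Hypothetical helper: does this attr entry's flags byte look sane? */
static bool attr_flags_ok(uint8_t flags)
{
	unsigned int ns = flags & ATTR_NSP_MASK;

	/* Any bit we don't know about means the entry is garbage. */
	if (flags & ~ATTR_KNOWN_MASK)
		return false;

	/* Zero namespace bits is the user namespace; more than one is bad. */
	return (ns & (ns - 1)) == 0;
}
```

The `ns & (ns - 1)` test clears the lowest set bit, so it is zero exactly
when at most one namespace bit is set.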

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=improve-attr-validation

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=improve-attr-validation
---
Commits in this patchset:
 * xfs_repair: check free space requirements before allowing upgrades
 * xfs_repair: enforce one namespace bit per extended attribute
 * xfs_repair: check for unknown flags in attr entries
---
 include/libxfs.h         |    1 
 libxfs/libxfs_api_defs.h |    1 
 repair/attr_repair.c     |   30 ++++++++++
 repair/phase2.c          |  134 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 166 insertions(+)



* [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
                   ` (9 preceding siblings ...)
  2024-07-02  0:51 ` [PATCHSET v13.7 10/16] xfsprogs: improve extended attribute validation Darrick J. Wong
@ 2024-07-02  0:52 ` Darrick J. Wong
  2024-07-02  1:10   ` [PATCH 01/24] libxfs: create attr log item opcodes and formats for parent pointers Darrick J. Wong
                     ` (23 more replies)
  2024-07-02  0:52 ` [PATCHSET v13.7 12/16] xfsprogs: scrubbing for " Darrick J. Wong
                   ` (4 subsequent siblings)
  15 siblings, 24 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:52 UTC (permalink / raw)
  To: djwong, cem
  Cc: Allison Henderson, catherine.hoang, linux-xfs, allison.henderson,
	hch

Hi all,

This is the latest revision of the parent pointer attributes for xfs.
The goal of this patch set is to add a parent pointer attribute to each
inode.  The attribute name contains the dirent name, while the attribute
value contains the parent inode number and generation.  This feature
will enable future optimizations for online scrub, shrink, nfs handles,
verity, or any other feature that could make use of quickly deriving an
inode's path from the mount point.

Directory parent pointers are stored as namespaced extended attributes
of a file.  Because parent pointers are an indivisible tuple of
(dirent_name, parent_ino, parent_gen), we cannot use the usual attr name
lookup functions to find a parent pointer.  This is solvable by
introducing a new lookup mode that checks both the name and the value of
the xattr.
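To see why a name-only lookup is not enough, consider a file hardlinked
under the same name in two directories: it carries two parent pointers
with identical names but different values.  The sketch below uses a
hypothetical in-memory struct pptr_attr (not the ondisk format) to
contrast the two lookup modes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of one parent pointer xattr. */
struct pptr_attr {
	const char *name;	/* attr name: dirent name in the parent */
	uint64_t parent_ino;	/* attr value: parent inode number */
	uint32_t parent_gen;	/* attr value: parent inode generation */
};

/* Plain name lookup: returns the first attr whose name matches. */
static const struct pptr_attr *lookup_by_name(const struct pptr_attr *attrs,
		size_t nr, const char *name)
{
	for (size_t i = 0; i < nr; i++)
		if (strcmp(attrs[i].name, name) == 0)
			return &attrs[i];
	return NULL;
}

/* Name-value lookup: both the name and the value must match. */
static const struct pptr_attr *lookup_by_name_value(
		const struct pptr_attr *attrs, size_t nr, const char *name,
		uint64_t parent_ino, uint32_t parent_gen)
{
	for (size_t i = 0; i < nr; i++)
		if (strcmp(attrs[i].name, name) == 0 &&
		    attrs[i].parent_ino == parent_ino &&
		    attrs[i].parent_gen == parent_gen)
			return &attrs[i];
	return NULL;
}
```

With two "foo" links, lookup_by_name() always stops at the first one,
while lookup_by_name_value() finds the pointer for a specific parent.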

Therefore, introduce this new name-value lookup mode that's gated on the
XFS_ATTR_PARENT namespace.  This requires the introduction of new
opcodes for the extended attribute update log intent items, which
actually means that parent pointers (themselves an INCOMPAT feature) do
not depend on the LOGGED_XATTRS log incompat feature bit.

To reduce collisions on the dirent names of parent pointers, introduce a
new attr hash mode that is the dir2 namehash of the dirent name xor'd
with the parent inode number.
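A sketch of that hash in C follows.  da_hashname() is modeled on the
kernel's xfs_da_hashname() for filesystems without ascii-ci; folding the
64-bit parent inode number into 32 bits by xoring its two halves is an
assumption about how the xor is done, so check the libxfs parent pointer
code before depending on it.

```c
#include <stdint.h>

/* Rotate left; the kernel provides rol32() for this. */
static uint32_t rol32(uint32_t word, unsigned int shift)
{
	return (word << shift) | (word >> ((-shift) & 31));
}

/* The dir2 name hash, modeled on the kernel's xfs_da_hashname(). */
static uint32_t da_hashname(const uint8_t *name, int namelen)
{
	uint32_t hash;

	/* Mix in four bytes at a time as long as we can. */
	for (hash = 0; namelen >= 4; namelen -= 4, name += 4)
		hash = ((uint32_t)name[0] << 21) ^ ((uint32_t)name[1] << 14) ^
		       ((uint32_t)name[2] << 7) ^ name[3] ^ rol32(hash, 7 * 4);

	/* Then the remaining zero to three bytes. */
	switch (namelen) {
	case 3:
		return ((uint32_t)name[0] << 14) ^ ((uint32_t)name[1] << 7) ^
		       name[2] ^ rol32(hash, 7 * 3);
	case 2:
		return ((uint32_t)name[0] << 7) ^ name[1] ^ rol32(hash, 7 * 2);
	case 1:
		return name[0] ^ rol32(hash, 7 * 1);
	default:
		return hash;
	}
}

/* Parent pointer hash: dirent name hash mixed with the parent ino. */
static uint32_t parent_hashval(const uint8_t *name, int namelen,
		uint64_t parent_ino)
{
	return da_hashname(name, namelen) ^
	       (uint32_t)(parent_ino >> 32) ^ (uint32_t)parent_ino;
}
```

Two hardlinks with the same name under different parents now hash
differently unless their folded inode numbers happen to collide.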

At this point, Allison has moved on to other things, so I've taken over
her patchset and pulled it into djwong-dev for merging.

Updates since v12 [djwong]:

Rebase on 6.9-rc and update the online fsck design document.
Redesign the ondisk format to use the name-value lookups to get us back
to the point where the attr is (dirent_name -> parent_ino/gen).

Updates since v11 [djwong]:

Rebase on 6.4-rc and make some tweaks and bugfixes to enable the repair
prototypes.  Merge with djwong-dev and make online repair actually work.

Updates since v10 [djwong]:

Merge in the ondisk format changes to get rid of the diroffset conflicts
with the parent pointer repair code, rebase the entire series with the
attr vlookup changes first, and merge all the other random fixes.

Updates since v9:

Reordered patches 2 and 3 to be 6 and 7

xfs: Add xfs_verify_pptr
   moved parent pointer validators to xfs_parent

xfs: Add parent pointer ioctl
   Extra validation checks for fs id
   added missing release for the inode
   use GFP_KERNEL flags for malloc/realloc
   reworked ioctl to use pptr listent and flex array

NEW
   xfs: don't remove the attr fork when parent pointers are enabled

NEW
   directory lookups should return diroffsets too

NEW
   xfs: move/add parent pointer validators to xfs_parent

Updates since v8:

xfs: parent pointer attribute creation
   Fix xfs_parent_init to release log assist on alloc fail
   Add slab cache for xfs_parent_defer
   Fix xfs_create to release after unlock
   Add xfs_parent_start and xfs_parent_finish wrappers
   removed unused xfs_parent_name_irec and xfs_init_parent_name_irec

xfs: add parent attributes to link
   Start/finish wrapper updates
   Fix xfs_link to disallow reservationless quotas

xfs: add parent attributes to symlink
   Fix xfs_symlink to release after unlock
   Start/finish wrapper updates

xfs: remove parent pointers in unlink
   Start/finish wrapper updates
   Add missing parent free

xfs: Add parent pointers to rename
   Start/finish wrapper updates
   Fix rename to only grab logged xattr once
   Fix xfs_rename to disallow reservationless quotas
   Fix double unlock on dqattach fail
   Move parent frees to out_release_wip

xfs: Add parent pointers to xfs_cross_rename
   Hoist parent pointers into rename

Questions, comments, and feedback appreciated!

Thanks all!
Allison

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=pptrs

xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=pptrs
---
Commits in this patchset:
 * libxfs: create attr log item opcodes and formats for parent pointers
 * xfs_{db,repair}: implement new attr hash value function
 * xfs_logprint: dump new attr log item fields
 * man: document the XFS_IOC_GETPARENTS ioctl
 * libfrog: report parent pointers to userspace
 * libfrog: add parent pointer support code
 * xfs_io: adapt parent command to new parent pointer ioctls
 * xfs_io: Add i, n and f flags to parent command
 * xfs_logprint: decode parent pointers in ATTRI items fully
 * xfs_spaceman: report file paths
 * xfs_scrub: use parent pointers when possible to report file operations
 * xfs_scrub: use parent pointers to report lost file data
 * xfs_db: report parent pointers in version command
 * xfs_db: report parent bit on xattrs
 * xfs_db: report parent pointers embedded in xattrs
 * xfs_db: obfuscate dirent and parent pointer names consistently
 * libxfs: export attr3_leaf_hdr_from_disk via libxfs_api_defs.h
 * xfs_db: add a parents command to list the parents of a file
 * xfs_db: make attr_set and attr_remove handle parent pointers
 * xfs_db: add link and unlink expert commands
 * xfs_db: compute hashes of parent pointers
 * libxfs: create new files with attr forks if necessary
 * mkfs: Add parent pointers during protofile creation
 * mkfs: enable formatting with parent pointers
---
 db/attr.c                       |   33 ++
 db/attrset.c                    |  202 +++++++++--
 db/attrshort.c                  |   27 ++
 db/field.c                      |   10 +
 db/field.h                      |    3 
 db/hash.c                       |   44 ++
 db/metadump.c                   |  322 +++++++++++++++++-
 db/namei.c                      |  701 +++++++++++++++++++++++++++++++++++++++
 db/sb.c                         |    2 
 include/handle.h                |    1 
 include/xfs_inode.h             |    4 
 io/parent.c                     |  541 +++++++++++-------------------
 libfrog/Makefile                |    2 
 libfrog/fsgeom.c                |    6 
 libfrog/getparents.c            |  355 ++++++++++++++++++++
 libfrog/getparents.h            |   42 ++
 libfrog/paths.c                 |  168 +++++++++
 libfrog/paths.h                 |   25 +
 libhandle/handle.c              |    7 
 libxfs/defer_item.c             |   52 +++
 libxfs/init.c                   |    4 
 libxfs/libxfs_api_defs.h        |   19 +
 libxfs/util.c                   |   19 +
 logprint/log_redo.c             |  217 +++++++++++-
 logprint/logprint.h             |    6 
 man/man2/ioctl_xfs_getparents.2 |  212 ++++++++++++
 man/man8/xfs_db.8               |   59 +++
 man/man8/xfs_io.8               |   32 +-
 man/man8/xfs_spaceman.8         |    7 
 mkfs/lts_4.19.conf              |    3 
 mkfs/lts_5.10.conf              |    3 
 mkfs/lts_5.15.conf              |    3 
 mkfs/lts_5.4.conf               |    3 
 mkfs/lts_6.1.conf               |    3 
 mkfs/lts_6.6.conf               |    3 
 mkfs/proto.c                    |   62 +++
 mkfs/xfs_mkfs.c                 |   45 ++-
 repair/attr_repair.c            |   24 +
 scrub/common.c                  |   41 ++
 scrub/phase6.c                  |   75 ++++
 spaceman/Makefile               |   16 +
 spaceman/file.c                 |    7 
 spaceman/health.c               |   53 ++-
 spaceman/space.h                |    3 
 44 files changed, 2942 insertions(+), 524 deletions(-)
 create mode 100644 libfrog/getparents.c
 create mode 100644 libfrog/getparents.h
 create mode 100644 man/man2/ioctl_xfs_getparents.2



* [PATCHSET v13.7 12/16] xfsprogs: scrubbing for parent pointers
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
                   ` (10 preceding siblings ...)
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
@ 2024-07-02  0:52 ` Darrick J. Wong
  2024-07-02  1:17   ` [PATCH 1/2] xfs: create a blob array data structure Darrick J. Wong
  2024-07-02  1:17   ` [PATCH 2/2] man2: update ioctl_xfs_scrub_metadata.2 for parent pointers Darrick J. Wong
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
                   ` (3 subsequent siblings)
  15 siblings, 2 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:52 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

Hi all,

Teach online fsck to use parent pointers to assist in checking
directories, parent pointers, extended attributes, and link counts.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-pptrs

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-pptrs
---
Commits in this patchset:
 * xfs: create a blob array data structure
 * man2: update ioctl_xfs_scrub_metadata.2 for parent pointers
---
 libxfs/Makefile                     |    2 
 libxfs/xfblob.c                     |  147 +++++++++++++++++++++++++++++++++++
 libxfs/xfblob.h                     |   24 ++++++
 libxfs/xfile.c                      |   11 +++
 libxfs/xfile.h                      |    1 
 man/man2/ioctl_xfs_scrub_metadata.2 |   20 ++++-
 6 files changed, 201 insertions(+), 4 deletions(-)
 create mode 100644 libxfs/xfblob.c
 create mode 100644 libxfs/xfblob.h



* [PATCHSET v13.7 13/16] xfsprogs: offline repair for parent pointers
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
                   ` (11 preceding siblings ...)
  2024-07-02  0:52 ` [PATCHSET v13.7 12/16] xfsprogs: scrubbing for " Darrick J. Wong
@ 2024-07-02  0:52 ` Darrick J. Wong
  2024-07-02  1:17   ` [PATCH 01/12] xfs_db: remove some boilerplate from xfs_attr_set Darrick J. Wong
                     ` (11 more replies)
  2024-07-02  0:52 ` [PATCHSET v13.7 14/16] xfsprogs: detect and correct directory tree problems Darrick J. Wong
                   ` (2 subsequent siblings)
  15 siblings, 12 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:52 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

Hi all,

This series implements online checking and repair for directory parent
pointer metadata.  The checking half is fairly straightforward -- for
each outgoing directory link (forward or backward), grab the inode at
the other end, and confirm that there's a corresponding link.  If we
can't grab an inode or lock it, we'll save that link for a slower loop
that cycles all the locks, confirms the continued existence of the link,
and rechecks the link if it's actually still there.
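The two-pass structure described above can be sketched like this.
Locking is modeled with plain booleans, and struct dirlink and both
functions are hypothetical; the real scrub code takes and cycles ILOCKs
instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one directory link from the inode being checked. */
struct dirlink {
	unsigned long other_ino;	/* inode at the other end */
	bool trylock_ok;		/* could we lock it without waiting? */
	bool still_exists;		/* is the link still there after relocking? */
	bool has_reverse;		/* does the other end link back to us? */
};

/*
 * Fast pass: check every link whose far end we could lock right away and
 * stash the rest in @slow.  Returns the number of broken links found.
 */
static int check_links_fast(struct dirlink *links, size_t nr,
		struct dirlink **slow, size_t *nr_slow)
{
	int broken = 0;

	*nr_slow = 0;
	for (size_t i = 0; i < nr; i++) {
		if (!links[i].trylock_ok) {
			slow[(*nr_slow)++] = &links[i];
			continue;
		}
		if (!links[i].has_reverse)
			broken++;
	}
	return broken;
}

/*
 * Slow pass: the real code would cycle all the locks here.  Links that
 * disappeared while we were unlocked are skipped, not reported.
 */
static int check_links_slow(struct dirlink **slow, size_t nr_slow)
{
	int broken = 0;

	for (size_t i = 0; i < nr_slow; i++) {
		if (!slow[i]->still_exists)
			continue;
		if (!slow[i]->has_reverse)
			broken++;
	}
	return broken;
}
```

Deferring the contended links keeps the common case fast and confines
the expensive lock cycling to the few links that actually need it.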

Repairs are a bit more involved -- for directories, we walk the entire
filesystem to rebuild the dirents from parent pointer information.
Parent pointer repairs do the same walk but rebuild the pptrs from the
dirent information, with the added twist that they duplicate all the
xattrs so that the atomic extent swapping code can commit the repairs
atomically.

This introduces an added twist to the xattr repair code -- we use dirent
hooks to detect a colliding update to the pptr data while we're not
holding the ILOCKs; if one is detected, we restart the xattr salvaging
process but this time hold all the ILOCKs until the end of the scan.
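One way to picture the restart logic is a counter that the dirent hook
bumps whenever it sees a colliding update.  This is a hypothetical
sketch: struct pptr_scan, pptr_hook_notify(), and the demo salvage
callbacks stand in for the real hooks and lock handling.

```c
#include <stdbool.h>

/* Hypothetical scan state shared with the dirent update hook. */
struct pptr_scan {
	unsigned long live_updates;	/* bumped by the hook */
	bool hold_all_ilocks;		/* true on the locked retry pass */
};

/* Called by the (hypothetical) dirent hook on a colliding update. */
static void pptr_hook_notify(struct pptr_scan *scan)
{
	scan->live_updates++;
}

/*
 * Run @salvage without holding the ILOCKs.  If the hook fired meanwhile,
 * the first pass's results are stale, so rerun it with every ILOCK held.
 * Returns the number of passes needed.
 */
static int salvage_xattrs(struct pptr_scan *scan,
		void (*salvage)(struct pptr_scan *))
{
	unsigned long before = scan->live_updates;

	scan->hold_all_ilocks = false;
	salvage(scan);
	if (scan->live_updates == before)
		return 1;

	scan->hold_all_ilocks = true;
	salvage(scan);
	return 2;
}

/* Demo salvagers: one never races, the other races only when unlocked. */
static void salvage_quiet(struct pptr_scan *scan)
{
	(void)scan;
}

static void salvage_racy(struct pptr_scan *scan)
{
	if (!scan->hold_all_ilocks)
		pptr_hook_notify(scan);
}
```

The optimistic unlocked pass is what usually runs; the fully locked pass
only costs extra when a real collision was observed.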

For offline repair, the phase6 directory connectivity scan generates an
index of all the expected parent pointers in the filesystem.  Then it
walks each file and compares the parent pointers attached to that file
against the index generated, and resyncs the results as necessary.
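The compare step amounts to a merge of two sorted lists.  This sketch
assumes both the expected index and the file's ondisk parent pointers
are sorted arrays of a hypothetical struct pptr_rec; the actual patches
(repair/pptr.c, repair/strblobs.c) use their own data structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parent pointer record: (child, parent, dirent name). */
struct pptr_rec {
	uint64_t child_ino;
	uint64_t parent_ino;
	const char *name;
};

/* qsort comparator: order by child, then parent, then name. */
static int pptr_cmp(const void *a, const void *b)
{
	const struct pptr_rec *x = a, *y = b;

	if (x->child_ino != y->child_ino)
		return x->child_ino < y->child_ino ? -1 : 1;
	if (x->parent_ino != y->parent_ino)
		return x->parent_ino < y->parent_ino ? -1 : 1;
	return strcmp(x->name, y->name);
}

/*
 * Merge-compare expected records (from dirents) against the records found
 * ondisk.  Both arrays must be sorted with pptr_cmp.  Counts how many
 * records must be added and how many must be removed.
 */
static void pptr_resync(const struct pptr_rec *want, size_t nr_want,
		const struct pptr_rec *have, size_t nr_have,
		size_t *nr_add, size_t *nr_remove)
{
	size_t i = 0, j = 0;

	*nr_add = *nr_remove = 0;
	while (i < nr_want || j < nr_have) {
		int c;

		if (i == nr_want)
			c = 1;		/* only ondisk records left */
		else if (j == nr_have)
			c = -1;		/* only expected records left */
		else
			c = pptr_cmp(&want[i], &have[j]);

		if (c < 0) {
			(*nr_add)++;	/* expected but missing ondisk */
			i++;
		} else if (c > 0) {
			(*nr_remove)++;	/* ondisk but not expected */
			j++;
		} else {
			i++;		/* match, nothing to do */
			j++;
		}
	}
}
```

Sort both sides with `qsort(recs, nr, sizeof(recs[0]), pptr_cmp)` first;
the merge then runs in a single linear pass.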

The last patch teaches xfs_scrub to report pathnames of files that are
being repaired, when possible.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-pptrs

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=repair-pptrs
---
Commits in this patchset:
 * xfs_db: remove some boilerplate from xfs_attr_set
 * xfs_db: actually report errors from libxfs_attr_set
 * xfs_repair: junk parent pointer attributes when filesystem doesn't support them
 * xfs_repair: add parent pointers when messing with /lost+found
 * xfs_repair: junk duplicate hashtab entries when processing sf dirents
 * xfs_repair: build a parent pointer index
 * xfs_repair: move the global dirent name store to a separate object
 * xfs_repair: deduplicate strings stored in string blob
 * xfs_repair: check parent pointers
 * xfs_repair: dump garbage parent pointer attributes
 * xfs_repair: update ondisk parent pointer records
 * xfs_repair: wipe ondisk parent pointers when there are none
---
 db/attrset.c             |   36 +
 libxfs/libxfs_api_defs.h |    6 
 libxfs/xfblob.c          |    9 
 libxfs/xfblob.h          |    2 
 repair/Makefile          |    6 
 repair/attr_repair.c     |   30 +
 repair/listxattr.c       |  271 +++++++++
 repair/listxattr.h       |   15 +
 repair/phase6.c          |  121 ++++
 repair/pptr.c            | 1331 ++++++++++++++++++++++++++++++++++++++++++++++
 repair/pptr.h            |   17 +
 repair/strblobs.c        |  211 +++++++
 repair/strblobs.h        |   24 +
 13 files changed, 2069 insertions(+), 10 deletions(-)
 create mode 100644 repair/listxattr.c
 create mode 100644 repair/listxattr.h
 create mode 100644 repair/pptr.c
 create mode 100644 repair/pptr.h
 create mode 100644 repair/strblobs.c
 create mode 100644 repair/strblobs.h



* [PATCHSET v13.7 14/16] xfsprogs: detect and correct directory tree problems
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
                   ` (12 preceding siblings ...)
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
@ 2024-07-02  0:52 ` Darrick J. Wong
  2024-07-02  1:20   ` [PATCH 1/5] libfrog: add directory tree structure scrubber to scrub library Darrick J. Wong
                     ` (4 more replies)
  2024-07-02  0:53 ` [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls Darrick J. Wong
  2024-07-02  0:53 ` [PATCHSET v30.7 16/16] xfs_repair: small remote symlinks are ok Darrick J. Wong
  15 siblings, 5 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:52 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

Hi all,

Historically, checking the tree-ness of the directory tree structure has
not been complete.  Cycles of subdirectories break the tree properties,
as do subdirectories with multiple parents.  It's easy enough for DFS to
detect problems as long as one of the participants is reachable from the
root, but this technique cannot find unconnected cycles.

Directory parent pointers change that, because we can discover all of
these problems from a simple walk from a subdirectory towards the root.
For each child we start with, if the walk terminates without reaching
the root, we know the path is disconnected and ought to be attached to
the lost and found.  If we find ourselves, we know this is a cycle and
can delete an incoming edge.  If we find multiple paths to the root, we
know to delete an incoming edge.

Even better, once we've finished walking paths, we've identified the
good ones and know which other path(s) to remove.
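Here's a simplified model of that walk.  The parent pointer table is a
hypothetical in-memory array, and for brevity each ancestor's walk
follows only its first parent; the start directory's own parent pointers
are all walked so that multiple paths show up.

```c
#define NR_DIRS   8
#define MAX_PPTRS 2
#define ROOT_DIR  0
#define NO_PARENT (-1)

enum path_result { PATH_OK, PATH_CYCLE, PATH_DISCONNECTED };

/* Hypothetical parent pointer table: pptrs[d][i] is d's i-th parent. */
static int pptrs[NR_DIRS][MAX_PPTRS];

static void pptrs_init(void)
{
	for (int d = 0; d < NR_DIRS; d++)
		for (int i = 0; i < MAX_PPTRS; i++)
			pptrs[d][i] = NO_PARENT;
}

/* Walk from @start through its parent @first toward the root. */
static enum path_result walk_path(int start, int first)
{
	int seen[NR_DIRS] = { 0 };
	int cur = first;

	seen[start] = 1;
	while (cur != NO_PARENT) {
		if (cur == ROOT_DIR)
			return PATH_OK;
		if (seen[cur])
			return PATH_CYCLE;	/* found ourselves or a loop */
		seen[cur] = 1;
		cur = pptrs[cur][0];		/* simplification: first parent */
	}
	return PATH_DISCONNECTED;	/* ran out of parents before the root */
}

/*
 * Count how many of @dir's paths reach the root, loop, or dead-end.
 * A non-root directory with no parent pointers at all gets all zeroes.
 */
static void check_dir(int dir, int counts[3])
{
	counts[PATH_OK] = counts[PATH_CYCLE] = counts[PATH_DISCONNECTED] = 0;
	for (int i = 0; i < MAX_PPTRS; i++)
		if (pptrs[dir][i] != NO_PARENT)
			counts[walk_path(dir, pptrs[dir][i])]++;
}
```

Exactly one PATH_OK means the directory is fine; more than one means
extra incoming edges to delete; a cycle means an incoming edge to
delete; and a directory whose paths all dead-end (or that has no parents
at all) belongs in lost+found.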

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-directory-tree

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-directory-tree

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=scrub-directory-tree
---
Commits in this patchset:
 * libfrog: add directory tree structure scrubber to scrub library
 * xfs_spaceman: report directory tree corruption in the health information
 * xfs_scrub: fix erroring out of check_inode_names
 * xfs_scrub: detect and repair directory tree corruptions
 * xfs_scrub: defer phase5 file scans if dirloop fails
---
 libfrog/scrub.c                     |    5 +
 man/man2/ioctl_xfs_bulkstat.2       |    3 
 man/man2/ioctl_xfs_fsbulkstat.2     |    3 
 man/man2/ioctl_xfs_scrub_metadata.2 |   14 ++
 scrub/phase5.c                      |  271 +++++++++++++++++++++++++++++++++--
 scrub/repair.c                      |   13 ++
 scrub/repair.h                      |    2 
 spaceman/health.c                   |    4 +
 8 files changed, 301 insertions(+), 14 deletions(-)



* [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
                   ` (13 preceding siblings ...)
  2024-07-02  0:52 ` [PATCHSET v13.7 14/16] xfsprogs: detect and correct directory tree problems Darrick J. Wong
@ 2024-07-02  0:53 ` Darrick J. Wong
  2024-07-02  1:22   ` [PATCH 01/10] man: document vectored scrub mode Darrick J. Wong
                     ` (9 more replies)
  2024-07-02  0:53 ` [PATCHSET v30.7 16/16] xfs_repair: small remote symlinks are ok Darrick J. Wong
  15 siblings, 10 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:53 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

Hi all,

Create a vectorized version of the metadata scrub and repair ioctl, and
adapt xfs_scrub to use that.  This is an experiment to measure overhead
and to try refactoring xfs_scrub.
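The scrub barrier idea from this series can be modeled as a single loop
over a request vector.  The real interface is documented in the new
ioctl_xfs_scrubv_metadata.2 page; struct scrub_item, SCRUB_BARRIER,
scrubv(), and fake_scrub() below are hypothetical stand-ins, and the
barrier rule is simplified to "stop if anything earlier failed".

```c
#include <stddef.h>

#define SCRUB_BARRIER (-1)	/* hypothetical barrier marker */

/* Hypothetical vector element: one scrub request or a barrier. */
struct scrub_item {
	int type;	/* metadata type to check, or SCRUB_BARRIER */
	int ret;	/* result of the check; 0 means clean */
};

/*
 * Process a whole vector in one call.  A barrier stops processing if any
 * earlier item failed, so checks that depend on that metadata are skipped.
 * Returns the index where processing stopped (nr if it ran to the end).
 */
static size_t scrubv(struct scrub_item *v, size_t nr,
		int (*scrub_one)(int type))
{
	int failed = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (v[i].type == SCRUB_BARRIER) {
			if (failed)
				break;
			continue;
		}
		v[i].ret = scrub_one(v[i].type);
		if (v[i].ret)
			failed = 1;
	}
	return i;
}

/* Demo checker: pretend metadata type 3 is corrupt. */
static int fake_scrub(int type)
{
	return type == 3 ? -1 : 0;
}
```

Batching the checks this way replaces one kernel call per metadata item
with one call per vector, which is the overhead this experiment sets out
to measure.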

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=vectorized-scrub

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=vectorized-scrub

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=vectorized-scrub
---
Commits in this patchset:
 * man: document vectored scrub mode
 * libfrog: support vectored scrub
 * xfs_io: support vectored scrub
 * xfs_scrub: split the scrub epilogue code into a separate function
 * xfs_scrub: split the repair epilogue code into a separate function
 * xfs_scrub: convert scrub and repair epilogues to use xfs_scrub_vec
 * xfs_scrub: vectorize scrub calls
 * xfs_scrub: vectorize repair calls
 * xfs_scrub: use scrub barriers to reduce kernel calls
 * xfs_scrub: try spot repairs of metadata items to make scrub progress
---
 io/scrub.c                           |  368 ++++++++++++++++++++++++++++++----
 libfrog/fsgeom.h                     |    6 +
 libfrog/scrub.c                      |  137 +++++++++++++
 libfrog/scrub.h                      |   35 +++
 man/man2/ioctl_xfs_scrubv_metadata.2 |  171 ++++++++++++++++
 man/man8/xfs_io.8                    |   51 +++++
 scrub/phase1.c                       |    2 
 scrub/phase2.c                       |   93 +++++++--
 scrub/phase3.c                       |   84 ++++++--
 scrub/repair.c                       |  355 +++++++++++++++++++++------------
 scrub/scrub.c                        |  360 +++++++++++++++++++++++++--------
 scrub/scrub.h                        |   19 ++
 scrub/scrub_private.h                |   55 +++--
 scrub/xfs_scrub.c                    |    1 
 14 files changed, 1431 insertions(+), 306 deletions(-)
 create mode 100644 man/man2/ioctl_xfs_scrubv_metadata.2



* [PATCHSET v30.7 16/16] xfs_repair: small remote symlinks are ok
  2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
                   ` (14 preceding siblings ...)
  2024-07-02  0:53 ` [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls Darrick J. Wong
@ 2024-07-02  0:53 ` Darrick J. Wong
  2024-07-02  1:24   ` [PATCH 1/1] xfs_repair: allow symlinks with short remote targets Darrick J. Wong
  15 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:53 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

Hi all,

Fix some incorrect validation in xfs_repair, which would flag a symlink
whose target is small enough to fit inside the inode but is stored in
extents format anyway.  The kernel has written out filesystems this way
for a long time and used to be ok with reading such things, so repair
must not flag that as corruption.
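The relaxed rule can be sketched as a predicate.  The fork format enum,
the inline_max parameter, and symlink_format_ok() are hypothetical; the
1024-byte limit is assumed to match XFS_SYMLINK_MAXLEN.  This is not the
actual change to repair/dinode.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYMLINK_MAXLEN 1024	/* assumed to match XFS_SYMLINK_MAXLEN */

/* Hypothetical data fork formats for a symlink inode. */
enum fork_format { FMT_LOCAL, FMT_EXTENTS };

/*
 * Is @fmt plausible for a symlink whose target is @target_len bytes long?
 * @inline_max is how many bytes fit in the inode's literal area.
 */
static bool symlink_format_ok(enum fork_format fmt, uint64_t target_len,
		uint64_t inline_max)
{
	if (target_len == 0 || target_len > SYMLINK_MAXLEN)
		return false;

	if (fmt == FMT_LOCAL)
		return target_len <= inline_max;

	/*
	 * Extents format: accept short targets too, since the kernel has
	 * written them this way for years.  Roughly speaking, the stricter
	 * check being relaxed also demanded target_len > inline_max.
	 */
	return fmt == FMT_EXTENTS;
}
```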

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=repair-symlink-fixes
---
Commits in this patchset:
 * xfs_repair: allow symlinks with short remote targets
---
 repair/dinode.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


^ permalink raw reply	[flat|nested] 296+ messages in thread

* [PATCH 01/12] man: document the exchange-range ioctl
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
@ 2024-07-02  0:53   ` Darrick J. Wong
  2024-07-02  5:08     ` Christoph Hellwig
  2024-07-02  0:54   ` [PATCH 02/12] man: document XFS_FSOP_GEOM_FLAGS_EXCHRANGE Darrick J. Wong
                     ` (10 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:53 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Document the new file data exchange ioctl.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man2/ioctl_xfs_exchange_range.2 |  278 +++++++++++++++++++++++++++++++++++
 1 file changed, 278 insertions(+)
 create mode 100644 man/man2/ioctl_xfs_exchange_range.2


diff --git a/man/man2/ioctl_xfs_exchange_range.2 b/man/man2/ioctl_xfs_exchange_range.2
new file mode 100644
index 000000000000..450db9c25c12
--- /dev/null
+++ b/man/man2/ioctl_xfs_exchange_range.2
@@ -0,0 +1,278 @@
+.\" Copyright (c) 2020-2024 Oracle.  All rights reserved.
+.\"
+.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
+.\" This is free documentation; you can redistribute it and/or
+.\" modify it under the terms of the GNU General Public License as
+.\" published by the Free Software Foundation; either version 2 of
+.\" the License, or (at your option) any later version.
+.\"
+.\" The GNU General Public License's references to "object code"
+.\" and "executables" are to be interpreted as the output of any
+.\" document formatting or typesetting system, including
+.\" intermediate and printed output.
+.\"
+.\" This manual is distributed in the hope that it will be useful,
+.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
+.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+.\" GNU General Public License for more details.
+.\"
+.\" You should have received a copy of the GNU General Public
+.\" License along with this manual; if not, see
+.\" <http://www.gnu.org/licenses/>.
+.\" %%%LICENSE_END
+.TH IOCTL-XFS-EXCHANGE-RANGE 2  2024-02-10 "XFS"
+.SH NAME
+ioctl_xfs_exchange_range \- exchange the contents of parts of two files
+.SH SYNOPSIS
+.br
+.B #include <sys/ioctl.h>
+.br
+.B #include <xfs/xfs_fs.h>
+.PP
+.BI "int ioctl(int " file2_fd ", XFS_IOC_EXCHANGE_RANGE, struct xfs_exchange_range *" arg );
+.SH DESCRIPTION
+Given a range of bytes in a first file
+.B file1_fd
+and a second range of bytes in a second file
+.BR file2_fd ,
+this
+.BR ioctl (2)
+exchanges the contents of the two ranges.
+.PP
+Exchanges are atomic with regard to concurrent file operations.
+Implementations must guarantee that readers see either the old contents or the
+new contents in their entirety, even if the system fails.
+.PP
+The system call parameters are conveyed in structures of the following form:
+.PP
+.in +4n
+.EX
+struct xfs_exchange_range {
+    __s32    file1_fd;
+    __u32    pad;
+    __u64    file1_offset;
+    __u64    file2_offset;
+    __u64    length;
+    __u64    flags;
+};
+.EE
+.in
+.PP
+The field
+.I pad
+must be zero.
+.PP
+The fields
+.IR file1_fd ", " file1_offset ", and " length
+define the first range of bytes to be exchanged.
+.PP
+The fields
+.IR file2_fd ", " file2_offset ", and " length
+define the second range of bytes to be exchanged.
+.PP
+Both files must be from the same filesystem mount.
+If the two file descriptors represent the same file, the byte ranges must not
+overlap.
+Most disk-based filesystems require that the starts of both ranges be
+aligned to the file block size.
+If this is the case, the ends of the ranges must also be so aligned unless the
+.B XFS_EXCHANGE_RANGE_TO_EOF
+flag is set.
+
+.PP
+The field
+.I flags
+controls the behavior of the exchange operation.
+.RS 0.4i
+.TP
+.B XFS_EXCHANGE_RANGE_TO_EOF
+Ignore the
+.I length
+parameter.
+All bytes in
+.I file1_fd
+from
+.I file1_offset
+to EOF are moved to
+.IR file2_fd ,
+and file2's size is set to
+.RI ( file2_offset "+(" file1_length - file1_offset )).
+Meanwhile, all bytes in file2 from
+.I file2_offset
+to EOF are moved to file1 and file1's size is set to
+.RI ( file1_offset "+(" file2_length - file2_offset )).
+.TP
+.B XFS_EXCHANGE_RANGE_DSYNC
+Ensure that all modified in-core data in both file ranges and all metadata
+updates pertaining to the exchange operation are flushed to persistent storage
+before the call returns.
+Opening either file descriptor with
+.BR O_SYNC " or " O_DSYNC
+will have the same effect.
+.TP
+.B XFS_EXCHANGE_RANGE_FILE1_WRITTEN
+Only exchange sub-ranges of
+.I file1_fd
+that are known to contain data written by application software.
+Each sub-range may be expanded (both upwards and downwards) to align with the
+file allocation unit.
+For files on the data device, this is one filesystem block.
+For files on the realtime device, this is the realtime extent size.
+This facility can be used to implement fast atomic scatter-gather writes of any
+complexity for software-defined storage targets if all writes are aligned to
+the file allocation unit.
+.TP
+.B XFS_EXCHANGE_RANGE_DRY_RUN
+Check the parameters and the feasibility of the operation, but do not change
+anything.
+.RE
+.PP
+.SH RETURN VALUE
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.PP
+.SH ERRORS
+Error codes can be one of, but are not limited to, the following:
+.TP
+.B EBADF
+.IR file1_fd
+is not open for reading and writing or is open for append-only writes; or
+.IR file2_fd
+is not open for reading and writing or is open for append-only writes.
+.TP
+.B EINVAL
+The parameters are not correct for these files.
+This error can also appear if either file descriptor represents
+a device, FIFO, or socket.
+Disk filesystems generally require the offset and length arguments
+to be aligned to the fundamental block sizes of both files.
+.TP
+.B EIO
+An I/O error occurred.
+.TP
+.B EISDIR
+One of the files is a directory.
+.TP
+.B ENOMEM
+The kernel was unable to allocate sufficient memory to perform the
+operation.
+.TP
+.B ENOSPC
+There is not enough free space in the filesystem to exchange the contents safely.
+.TP
+.B EOPNOTSUPP
+The filesystem does not support exchanging bytes between the two
+files.
+.TP
+.B EPERM
+.IR file1_fd " or " file2_fd
+refers to an immutable file.
+.TP
+.B ETXTBSY
+One of the files is a swap file.
+.TP
+.B EUCLEAN
+The filesystem is corrupt.
+.TP
+.B EXDEV
+.IR file1_fd " and " file2_fd
+are not on the same mounted filesystem.
+.SH CONFORMING TO
+This API is XFS-specific.
+.SH USE CASES
+.PP
+Several use cases are imagined for this system call.
+In all cases, application software must coordinate updates to the file
+because the exchange is performed unconditionally.
+.PP
+The first is a data storage program that wants to commit non-contiguous updates
+to a file atomically and coordinates write access to that file.
+This can be done by creating a temporary file, calling
+.BR FICLONE (2)
+to share the contents, and staging the updates into the temporary file.
+The
+.B XFS_EXCHANGE_RANGE_TO_EOF
+flag is recommended for this purpose.
+The temporary file can be deleted or punched out afterwards.
+.PP
+An example program might look like this:
+.PP
+.in +4n
+.EX
+int fd = open("/some/file", O_RDWR);
+int temp_fd = open("/some", O_TMPFILE | O_RDWR, 0600);
+
+ioctl(temp_fd, FICLONE, fd);
+
+/* append 1MB of records */
+lseek(temp_fd, 0, SEEK_END);
+write(temp_fd, data1, 1000000);
+
+/* update record index */
+pwrite(temp_fd, data1, 600, 98765);
+pwrite(temp_fd, data2, 320, 54321);
+pwrite(temp_fd, data2, 15, 0);
+
+/* commit the entire update */
+struct xfs_exchange_range args = {
+    .file1_fd = temp_fd,
+    .flags = XFS_EXCHANGE_RANGE_TO_EOF,
+};
+
+ioctl(fd, XFS_IOC_EXCHANGE_RANGE, &args);
+.EE
+.in
+.PP
+The second is a software-defined storage host (e.g. a disk jukebox) which
+implements an atomic scatter-gather write command.
+Provided the exported disk's logical block size matches the file's allocation
+unit size, this can be done by creating a temporary file and writing the data
+at the appropriate offsets.
+It is recommended that the temporary file be truncated to the size of the
+regular file before any writes are staged to the temporary file to avoid issues
+with zeroing during EOF extension.
+Use this call with the
+.B XFS_EXCHANGE_RANGE_FILE1_WRITTEN
+flag to exchange only the file allocation units involved in the emulated
+device's write command.
+The temporary file should be truncated or punched out completely before being
+reused to stage another write.
+.PP
+An example program might look like this:
+.PP
+.in +4n
+.EX
+int fd = open("/some/file", O_RDWR);
+int temp_fd = open("/some", O_TMPFILE | O_RDWR, 0600);
+struct stat sb;
+int blksz;
+
+fstat(fd, &sb);
+blksz = sb.st_blksize;
+
+/* land scatter gather writes between 100fsb and 500fsb */
+pwrite(temp_fd, data1, blksz * 2, blksz * 100);
+pwrite(temp_fd, data2, blksz * 20, blksz * 480);
+pwrite(temp_fd, data3, blksz * 7, blksz * 257);
+
+/* commit the entire update */
+struct xfs_exchange_range args = {
+    .file1_fd = temp_fd,
+    .file1_offset = blksz * 100,
+    .file2_offset = blksz * 100,
+    .length       = blksz * 400,
+    .flags        = XFS_EXCHANGE_RANGE_FILE1_WRITTEN |
+                    XFS_EXCHANGE_RANGE_DSYNC,
+};
+
+ioctl(fd, XFS_IOC_EXCHANGE_RANGE, &args);
+.EE
+.in
+.PP
+.SH NOTES
+.PP
+Some filesystems may limit the amount of data or the number of extents that can
+be exchanged in a single call.
+.SH SEE ALSO
+.BR ioctl (2)
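The alignment rules and the XFS_EXCHANGE_RANGE_TO_EOF size arithmetic in
this page can be modelled in plain C.  This is a sketch under stated
assumptions, not kernel code: `struct xr_args`, `xr_check()` and
`xr_to_eof_sizes()` are hypothetical, both files are assumed to share one
allocation unit, and the same-file overlap check covers only
explicit-length requests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of one exchange request. */
struct xr_args {
	uint64_t	file1_offset;
	uint64_t	file1_size;	/* current size of file1 */
	uint64_t	file2_offset;
	uint64_t	file2_size;	/* current size of file2 */
	uint64_t	length;
	bool		same_file;	/* both fds refer to the same file */
	bool		to_eof;		/* XFS_EXCHANGE_RANGE_TO_EOF is set */
};

/*
 * Apply the alignment and overlap rules from the man page.  Returns 0 if
 * the request passes, or -EINVAL.
 */
static int
xr_check(const struct xr_args *a, uint64_t alloc_unit)
{
	assert(alloc_unit > 0);

	/* The starts of both ranges must be aligned. */
	if (a->file1_offset % alloc_unit || a->file2_offset % alloc_unit)
		return -EINVAL;

	/* With aligned starts, an aligned length makes both ends aligned. */
	if (!a->to_eof && a->length % alloc_unit)
		return -EINVAL;

	/* Two ranges in the same file must not overlap. */
	if (a->same_file && !a->to_eof) {
		uint64_t lo = a->file1_offset < a->file2_offset ?
				a->file1_offset : a->file2_offset;
		uint64_t hi = a->file1_offset < a->file2_offset ?
				a->file2_offset : a->file1_offset;

		if (lo + a->length > hi)
			return -EINVAL;
	}
	return 0;
}

/* File sizes after a TO_EOF exchange, using the formulas from the page. */
static void
xr_to_eof_sizes(const struct xr_args *a, uint64_t *new_size1,
		uint64_t *new_size2)
{
	assert(a->file1_offset <= a->file1_size);
	assert(a->file2_offset <= a->file2_size);

	*new_size2 = a->file2_offset + (a->file1_size - a->file1_offset);
	*new_size1 = a->file1_offset + (a->file2_size - a->file2_offset);
}
```

For example, with file1 at 10000 bytes, file2 at 8192 bytes,
file1_offset 4096 and file2_offset 0, a TO_EOF exchange leaves file1 at
12288 bytes and file2 at 5904 bytes.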

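For the scatter-gather case, the range passed with
XFS_EXCHANGE_RANGE_FILE1_WRITTEN has to cover every staged write.  The
hypothetical helper below (`struct sg_write` and `sg_commit_range()` are
invented for illustration) rounds each write out to the allocation unit,
as the page says may happen to each written sub-range, and returns the
smallest aligned range that covers them all.  Fed the three writes from
the second example, it yields an offset of 100 blocks and a length of 400
blocks, matching the values hard-coded there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one write staged into the temporary file. */
struct sg_write {
	uint64_t	offset;
	uint64_t	len;
};

/*
 * Compute the smallest alloc_unit-aligned range covering all staged
 * writes: each start is rounded down and each end is rounded up.
 */
static void
sg_commit_range(const struct sg_write *w, size_t nr, uint64_t alloc_unit,
		uint64_t *start, uint64_t *length)
{
	uint64_t lo = UINT64_MAX, hi = 0;

	assert(nr > 0 && alloc_unit > 0);

	for (size_t i = 0; i < nr; i++) {
		uint64_t s = w[i].offset - (w[i].offset % alloc_unit);
		uint64_t e = w[i].offset + w[i].len;

		e = (e + alloc_unit - 1) / alloc_unit * alloc_unit;
		if (s < lo)
			lo = s;
		if (e > hi)
			hi = e;
	}
	*start = lo;
	*length = hi - lo;
}
```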

^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 02/12] man: document XFS_FSOP_GEOM_FLAGS_EXCHRANGE
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
  2024-07-02  0:53   ` [PATCH 01/12] man: document the exchange-range ioctl Darrick J. Wong
@ 2024-07-02  0:54   ` Darrick J. Wong
  2024-07-02  5:08     ` Christoph Hellwig
  2024-07-02  0:54   ` [PATCH 03/12] libhandle: add support for bulkstat v5 Darrick J. Wong
                     ` (9 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:54 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Document this new feature flag in the fs geometry ioctl.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man2/ioctl_xfs_fsgeometry.2 |    3 +++
 1 file changed, 3 insertions(+)


diff --git a/man/man2/ioctl_xfs_fsgeometry.2 b/man/man2/ioctl_xfs_fsgeometry.2
index f59a6e8a6a20..54fd89390883 100644
--- a/man/man2/ioctl_xfs_fsgeometry.2
+++ b/man/man2/ioctl_xfs_fsgeometry.2
@@ -211,6 +211,9 @@ Filesystem stores reverse mappings of blocks to owners.
 .TP
 .B XFS_FSOP_GEOM_FLAGS_REFLINK
 Filesystem supports sharing blocks between files.
+.TP
+.B XFS_FSOP_GEOM_FLAGS_EXCHRANGE
+Filesystem can exchange file contents atomically via XFS_IOC_EXCHANGE_RANGE.
 .RE
 .SH XFS METADATA HEALTH REPORTING
 .PP


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 03/12] libhandle: add support for bulkstat v5
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
  2024-07-02  0:53   ` [PATCH 01/12] man: document the exchange-range ioctl Darrick J. Wong
  2024-07-02  0:54   ` [PATCH 02/12] man: document XFS_FSOP_GEOM_FLAGS_EXCHRANGE Darrick J. Wong
@ 2024-07-02  0:54   ` Darrick J. Wong
  2024-07-02  5:09     ` Christoph Hellwig
  2024-07-02  0:54   ` [PATCH 04/12] libfrog: add support for exchange range ioctl family Darrick J. Wong
                     ` (8 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:54 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Add support to libhandle for generating file handles with bulkstat v5
structures.  xfs_fsr will need this to be able to interface with the new
exchange-range ioctl, and other client programs will probably want this
over time.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 include/jdm.h   |   24 +++++++++++
 libhandle/jdm.c |  117 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+)


diff --git a/include/jdm.h b/include/jdm.h
index c57fcae7fcac..445737a6b5f8 100644
--- a/include/jdm.h
+++ b/include/jdm.h
@@ -11,6 +11,7 @@ typedef void	jdm_fshandle_t;		/* filesystem handle */
 typedef void	jdm_filehandle_t;	/* filehandle */
 
 struct xfs_bstat;
+struct xfs_bulkstat;
 struct attrlist_cursor;
 struct parent;
 
@@ -23,6 +24,9 @@ jdm_new_filehandle( jdm_filehandle_t **handlep,	/* new filehandle */
 		    jdm_fshandle_t *fshandlep,	/* filesystem filehandle */
 		    struct xfs_bstat *sp);	/* bulkstat info */
 
+extern void jdm_new_filehandle_v5(jdm_filehandle_t **handlep, size_t *hlen,
+		jdm_fshandle_t *fshandlep, struct xfs_bulkstat *sp);
+
 extern void
 jdm_delete_filehandle( jdm_filehandle_t *handlep,/* filehandle to delete */
 		       size_t hlen);		/* filehandle size */
@@ -32,35 +36,55 @@ jdm_open( jdm_fshandle_t *fshandlep,
 	  struct xfs_bstat *sp,
 	  intgen_t oflags);
 
+extern intgen_t jdm_open_v5(jdm_fshandle_t *fshandlep, struct xfs_bulkstat *sp,
+		intgen_t oflags);
+
 extern intgen_t
 jdm_readlink( jdm_fshandle_t *fshandlep,
 	      struct xfs_bstat *sp,
 	      char *bufp,
 	      size_t bufsz);
 
+extern intgen_t jdm_readlink_v5(jdm_fshandle_t *fshandlep,
+		struct xfs_bulkstat *sp, char *bufp, size_t bufsz);
+
 extern intgen_t
 jdm_attr_multi(	jdm_fshandle_t *fshp,
 		struct xfs_bstat *statp,
 		char *bufp, int rtrvcnt, int flags);
 
+extern intgen_t jdm_attr_multi_v5(jdm_fshandle_t *fshp,
+		struct xfs_bulkstat *statp, char *bufp, int rtrvcnt,
+		int flags);
+
 extern intgen_t
 jdm_attr_list(	jdm_fshandle_t *fshp,
 		struct xfs_bstat *statp,
 		char *bufp, size_t bufsz, int flags,
 		struct attrlist_cursor *cursor);
 
+extern intgen_t jdm_attr_list_v5(jdm_fshandle_t *fshp,
+		struct xfs_bulkstat *statp, char *bufp, size_t bufsz,
+		int flags, struct attrlist_cursor *cursor);
+
 extern int
 jdm_parents( jdm_fshandle_t *fshp,
 		struct xfs_bstat *statp,
 		struct parent *bufp, size_t bufsz,
 		unsigned int *count);
 
+extern int jdm_parents_v5(jdm_fshandle_t *fshp, struct xfs_bulkstat *statp,
+		struct parent *bufp, size_t bufsz, unsigned int *count);
+
 extern int
 jdm_parentpaths( jdm_fshandle_t *fshp,
 		struct xfs_bstat *statp,
 		struct parent *bufp, size_t bufsz,
 		unsigned int *count);
 
+extern int jdm_parentpaths_v5(jdm_fshandle_t *fshp, struct xfs_bulkstat *statp,
+		struct parent *bufp, size_t bufsz, unsigned int *count);
+
 /* macro for determining the size of a structure member */
 #define sizeofmember( t, m )	sizeof( ( ( t * )0 )->m )
 
diff --git a/libhandle/jdm.c b/libhandle/jdm.c
index 07b0c60985ee..e21aff2b2c19 100644
--- a/libhandle/jdm.c
+++ b/libhandle/jdm.c
@@ -41,6 +41,19 @@ jdm_fill_filehandle( filehandle_t *handlep,
 	handlep->fh_ino = statp->bs_ino;
 }
 
+static void
+jdm_fill_filehandle_v5(
+	struct filehandle	*handlep,
+	struct fshandle		*fshandlep,
+	struct xfs_bulkstat	*statp)
+{
+	handlep->fh_fshandle = *fshandlep;
+	handlep->fh_sz_following = FILEHANDLE_SZ_FOLLOWING;
+	memset(handlep->fh_pad, 0, FILEHANDLE_SZ_PAD);
+	handlep->fh_gen = statp->bs_gen;
+	handlep->fh_ino = statp->bs_ino;
+}
+
 jdm_fshandle_t *
 jdm_getfshandle( char *mntpnt )
 {
@@ -90,6 +103,22 @@ jdm_new_filehandle( jdm_filehandle_t **handlep,
 		jdm_fill_filehandle(*handlep, (fshandle_t *) fshandlep, statp);
 }
 
+void
+jdm_new_filehandle_v5(
+	jdm_filehandle_t	**handlep,
+	size_t			*hlen,
+	jdm_fshandle_t		*fshandlep,
+	struct xfs_bulkstat	*statp)
+{
+	/* allocate and fill filehandle */
+	*hlen = sizeof(filehandle_t);
+	*handlep = (filehandle_t *) malloc(*hlen);
+	if (!*handlep)
+		return;
+
+	jdm_fill_filehandle_v5(*handlep, (struct fshandle *)fshandlep, statp);
+}
+
 /* ARGSUSED */
 void
 jdm_delete_filehandle( jdm_filehandle_t *handlep, size_t hlen )
@@ -111,6 +140,19 @@ jdm_open( jdm_fshandle_t *fshp, struct xfs_bstat *statp, intgen_t oflags )
 	return fd;
 }
 
+intgen_t
+jdm_open_v5(
+	jdm_fshandle_t		*fshp,
+	struct xfs_bulkstat	*statp,
+	intgen_t		oflags)
+{
+	struct fshandle		*fshandlep = (struct fshandle *)fshp;
+	struct filehandle	filehandle;
+
+	jdm_fill_filehandle_v5(&filehandle, fshandlep, statp);
+	return open_by_fshandle(&filehandle, sizeof(filehandle), oflags);
+}
+
 intgen_t
 jdm_readlink( jdm_fshandle_t *fshp,
 	      struct xfs_bstat *statp,
@@ -128,6 +170,20 @@ jdm_readlink( jdm_fshandle_t *fshp,
 	return rval;
 }
 
+intgen_t
+jdm_readlink_v5(
+	jdm_fshandle_t		*fshp,
+	struct xfs_bulkstat	*statp,
+	char			*bufp,
+	size_t			bufsz)
+{
+	struct fshandle		*fshandlep = (struct fshandle *)fshp;
+	struct filehandle	filehandle;
+
+	jdm_fill_filehandle_v5(&filehandle, fshandlep, statp);
+	return readlink_by_handle(&filehandle, sizeof(filehandle), bufp, bufsz);
+}
+
 int
 jdm_attr_multi(	jdm_fshandle_t *fshp,
 		struct xfs_bstat *statp,
@@ -145,6 +201,22 @@ jdm_attr_multi(	jdm_fshandle_t *fshp,
 	return rval;
 }
 
+int
+jdm_attr_multi_v5(
+	jdm_fshandle_t		*fshp,
+	struct xfs_bulkstat	*statp,
+	char			*bufp,
+	int			rtrvcnt,
+	int			flags)
+{
+	struct fshandle		*fshandlep = (struct fshandle *)fshp;
+	struct filehandle	filehandle;
+
+	jdm_fill_filehandle_v5(&filehandle, fshandlep, statp);
+	return attr_multi_by_handle(&filehandle, sizeof(filehandle), bufp,
+			rtrvcnt, flags);
+}
+
 int
 jdm_attr_list(	jdm_fshandle_t *fshp,
 		struct xfs_bstat *statp,
@@ -166,6 +238,27 @@ jdm_attr_list(	jdm_fshandle_t *fshp,
 	return rval;
 }
 
+int
+jdm_attr_list_v5(
+	jdm_fshandle_t		*fshp,
+	struct xfs_bulkstat	*statp,
+	char			*bufp,
+	size_t			bufsz,
+	int			flags,
+	struct attrlist_cursor	*cursor)
+{
+	struct fshandle		*fshandlep = (struct fshandle *)fshp;
+	struct filehandle	filehandle;
+
+	/* prevent needless EINVAL from the kernel */
+	if (bufsz > XFS_XATTR_LIST_MAX)
+		bufsz = XFS_XATTR_LIST_MAX;
+
+	jdm_fill_filehandle_v5(&filehandle, fshandlep, statp);
+	return attr_list_by_handle(&filehandle, sizeof(filehandle), bufp,
+			bufsz, flags, cursor);
+}
+
 int
 jdm_parents( jdm_fshandle_t *fshp,
 		struct xfs_bstat *statp,
@@ -176,6 +269,18 @@ jdm_parents( jdm_fshandle_t *fshp,
 	return -1;
 }
 
+int
+jdm_parents_v5(
+	jdm_fshandle_t		*fshp,
+	struct xfs_bulkstat	*statp,
+	struct parent		*bufp,
+	size_t			bufsz,
+	unsigned int		*count)
+{
+	errno = EOPNOTSUPP;
+	return -1;
+}
+
 int
 jdm_parentpaths( jdm_fshandle_t *fshp,
 		struct xfs_bstat *statp,
@@ -185,3 +290,15 @@ jdm_parentpaths( jdm_fshandle_t *fshp,
 	errno = EOPNOTSUPP;
 	return -1;
 }
+
+int
+jdm_parentpaths_v5(
+	jdm_fshandle_t		*fshp,
+	struct xfs_bulkstat	*statp,
+	struct parent		*bufp,
+	size_t			bufsz,
+	unsigned int		*count)
+{
+	errno = EOPNOTSUPP;
+	return -1;
+}


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 04/12] libfrog: add support for exchange range ioctl family
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-07-02  0:54   ` [PATCH 03/12] libhandle: add support for bulkstat v5 Darrick J. Wong
@ 2024-07-02  0:54   ` Darrick J. Wong
  2024-07-02  5:09     ` Christoph Hellwig
  2024-07-02  0:54   ` [PATCH 05/12] xfs_db: advertise exchange-range in the version command Darrick J. Wong
                     ` (7 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:54 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Add some library code to support the new XFS_IOC_EXCHANGE_RANGE file
range exchange ioctl.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/Makefile        |    2 ++
 libfrog/file_exchange.c |   52 +++++++++++++++++++++++++++++++++++++++++++++++
 libfrog/file_exchange.h |   15 ++++++++++++++
 3 files changed, 69 insertions(+)
 create mode 100644 libfrog/file_exchange.c
 create mode 100644 libfrog/file_exchange.h


diff --git a/libfrog/Makefile b/libfrog/Makefile
index cafee073fe3c..53e3c3492377 100644
--- a/libfrog/Makefile
+++ b/libfrog/Makefile
@@ -18,6 +18,7 @@ bitmap.c \
 bulkstat.c \
 convert.c \
 crc32.c \
+file_exchange.c \
 fsgeom.c \
 list_sort.c \
 linux.c \
@@ -42,6 +43,7 @@ crc32defs.h \
 crc32table.h \
 dahashselftest.h \
 div64.h \
+file_exchange.h \
 fsgeom.h \
 logging.h \
 paths.h \
diff --git a/libfrog/file_exchange.c b/libfrog/file_exchange.c
new file mode 100644
index 000000000000..29fdc17e598c
--- /dev/null
+++ b/libfrog/file_exchange.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2020-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <unistd.h>
+#include <string.h>
+#include "xfs.h"
+#include "fsgeom.h"
+#include "bulkstat.h"
+#include "libfrog/file_exchange.h"
+
+/* Prepare for a file contents exchange. */
+void
+xfrog_exchangerange_prep(
+	struct xfs_exchange_range	*fxr,
+	off_t				file2_offset,
+	int				file1_fd,
+	off_t				file1_offset,
+	uint64_t			length)
+{
+	memset(fxr, 0, sizeof(*fxr));
+
+	fxr->file1_fd			= file1_fd;
+	fxr->file1_offset		= file1_offset;
+	fxr->length			= length;
+	fxr->file2_offset		= file2_offset;
+}
+
+/*
+ * Execute an exchange-range operation.  Returns 0 for success or a negative
+ * errno.
+ */
+int
+xfrog_exchangerange(
+	int				file2_fd,
+	struct xfs_exchange_range	*fxr,
+	uint64_t			flags)
+{
+	int				ret;
+
+	fxr->flags = flags;
+
+	ret = ioctl(file2_fd, XFS_IOC_EXCHANGE_RANGE, fxr);
+	if (ret)
+		return -errno;
+
+	return 0;
+}
diff --git a/libfrog/file_exchange.h b/libfrog/file_exchange.h
new file mode 100644
index 000000000000..b6f6f9f698a8
--- /dev/null
+++ b/libfrog/file_exchange.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2020-2024 Oracle.  All rights reserved.
+ * All Rights Reserved.
+ */
+#ifndef __LIBFROG_FILE_EXCHANGE_H__
+#define __LIBFROG_FILE_EXCHANGE_H__
+
+void xfrog_exchangerange_prep(struct xfs_exchange_range *fxr,
+		off_t file2_offset, int file1_fd,
+		off_t file1_offset, uint64_t length);
+int xfrog_exchangerange(int file2_fd, struct xfs_exchange_range *fxr,
+		uint64_t flags);
+
+#endif	/* __LIBFROG_FILE_EXCHANGE_H__ */
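To show what the prep helper leaves in the argument structure without
pulling in xfsprogs headers, the sketch below declares a local copy of
the structure using the layout documented in ioctl_xfs_exchange_range(2)
and repeats the same steps.  `struct local_exchange_range` and
`local_exchangerange_prep()` are hypothetical names; real callers would
use `xfrog_exchangerange_prep()` with `struct xfs_exchange_range` from the
XFS headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Local mirror of the documented layout (hypothetical name). */
struct local_exchange_range {
	int32_t		file1_fd;
	uint32_t	pad;		/* must be zero */
	uint64_t	file1_offset;
	uint64_t	file2_offset;
	uint64_t	length;
	uint64_t	flags;
};

static_assert(sizeof(struct local_exchange_range) == 40,
		"layout should match the man page");

/* Same steps as xfrog_exchangerange_prep(): zero everything, then fill. */
static void
local_exchangerange_prep(struct local_exchange_range *fxr,
		off_t file2_offset, int file1_fd, off_t file1_offset,
		uint64_t length)
{
	memset(fxr, 0, sizeof(*fxr));
	fxr->file1_fd = file1_fd;
	fxr->file1_offset = file1_offset;
	fxr->length = length;
	fxr->file2_offset = file2_offset;
}
```

Zeroing the whole structure first is what keeps `pad` at zero and leaves
`flags` clear until `xfrog_exchangerange()` sets it just before issuing
the ioctl.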


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 05/12] xfs_db: advertise exchange-range in the version command
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-07-02  0:54   ` [PATCH 04/12] libfrog: add support for exchange range ioctl family Darrick J. Wong
@ 2024-07-02  0:54   ` Darrick J. Wong
  2024-07-02  5:10     ` Christoph Hellwig
  2024-07-02  0:55   ` [PATCH 06/12] xfs_logprint: support dumping exchmaps log items Darrick J. Wong
                     ` (6 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:54 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Amend the version command to advertise exchange-range support.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/sb.c |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/db/sb.c b/db/sb.c
index b48767f47fe9..c39011634198 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -706,6 +706,8 @@ version_string(
 		strcat(s, ",NEEDSREPAIR");
 	if (xfs_has_large_extent_counts(mp))
 		strcat(s, ",NREXT64");
+	if (xfs_has_exchange_range(mp))
+		strcat(s, ",EXCHANGE");
 	return s;
 }
 


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 06/12] xfs_logprint: support dumping exchmaps log items
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-07-02  0:54   ` [PATCH 05/12] xfs_db: advertise exchange-range in the version command Darrick J. Wong
@ 2024-07-02  0:55   ` Darrick J. Wong
  2024-07-02  5:11     ` Christoph Hellwig
  2024-07-02  0:55   ` [PATCH 07/12] xfs_fsr: convert to bulkstat v5 ioctls Darrick J. Wong
                     ` (5 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:55 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Support dumping exchmaps log items.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 logprint/log_misc.c      |   11 ++++
 logprint/log_print_all.c |   12 ++++
 logprint/log_redo.c      |  128 ++++++++++++++++++++++++++++++++++++++++++++++
 logprint/logprint.h      |    6 ++
 4 files changed, 157 insertions(+)


diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index 9d38113402f4..8e86ac347fa9 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -1052,6 +1052,17 @@ xlog_print_record(
 					be32_to_cpu(op_head->oh_len));
 			break;
 		    }
+		    case XFS_LI_XMI: {
+			skip = xlog_print_trans_xmi(&ptr,
+					be32_to_cpu(op_head->oh_len),
+					continued);
+			break;
+		    }
+		    case XFS_LI_XMD: {
+			skip = xlog_print_trans_xmd(&ptr,
+					be32_to_cpu(op_head->oh_len));
+			break;
+		    }
 		    case XFS_LI_QUOTAOFF: {
 			skip = xlog_print_trans_qoff(&ptr,
 					be32_to_cpu(op_head->oh_len));
diff --git a/logprint/log_print_all.c b/logprint/log_print_all.c
index f436e10917d8..a4a5e41f17fa 100644
--- a/logprint/log_print_all.c
+++ b/logprint/log_print_all.c
@@ -440,6 +440,12 @@ xlog_recover_print_logitem(
 	case XFS_LI_BUI:
 		xlog_recover_print_bui(item);
 		break;
+	case XFS_LI_XMD:
+		xlog_recover_print_xmd(item);
+		break;
+	case XFS_LI_XMI:
+		xlog_recover_print_xmi(item);
+		break;
 	case XFS_LI_DQUOT:
 		xlog_recover_print_dquot(item);
 		break;
@@ -498,6 +504,12 @@ xlog_recover_print_item(
 	case XFS_LI_BUI:
 		printf("BUI");
 		break;
+	case XFS_LI_XMD:
+		printf("XMD");
+		break;
+	case XFS_LI_XMI:
+		printf("XMI");
+		break;
 	case XFS_LI_DQUOT:
 		printf("DQ ");
 		break;
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index edf7e0fbfa90..ca6dadd7551a 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -847,3 +847,131 @@ xlog_recover_print_attrd(
 		f->alfd_size,
 		(unsigned long long)f->alfd_alf_id);
 }
+
+/* Atomic Extent Swapping Items */
+
+static int
+xfs_xmi_copy_format(
+	struct xfs_xmi_log_format *xmi,
+	uint			  len,
+	struct xfs_xmi_log_format *dst_fmt,
+	int			  continued)
+{
+	if (len == sizeof(struct xfs_xmi_log_format) || continued) {
+		memcpy(dst_fmt, xmi, len);
+		return 0;
+	}
+	fprintf(stderr, _("%s: bad size of XMI format: %u; expected %zu\n"),
+		progname, len, sizeof(struct xfs_xmi_log_format));
+	return 1;
+}
+
+int
+xlog_print_trans_xmi(
+	char			**ptr,
+	uint			src_len,
+	int			continued)
+{
+	struct xfs_xmi_log_format *src_f, *f = NULL;
+	int			error = 0;
+
+	src_f = malloc(src_len);
+	if (src_f == NULL) {
+		fprintf(stderr, _("%s: %s: malloc failed\n"),
+			progname, __func__);
+		exit(1);
+	}
+	memcpy(src_f, *ptr, src_len);
+	*ptr += src_len;
+
+	/* convert to native format */
+	if (continued && src_len < sizeof(struct xfs_xmi_log_format)) {
+		printf(_("XMI: Not enough data to decode further\n"));
+		error = 1;
+		goto error;
+	}
+
+	f = malloc(sizeof(struct xfs_xmi_log_format));
+	if (f == NULL) {
+		fprintf(stderr, _("%s: %s: malloc failed\n"),
+			progname, __func__);
+		exit(1);
+	}
+	if (xfs_xmi_copy_format(src_f, src_len, f, continued)) {
+		error = 1;
+		goto error;
+	}
+
+	printf(_("XMI:  #regs: %d	num_extents: 1  id: 0x%llx\n"),
+		f->xmi_size, (unsigned long long)f->xmi_id);
+
+	if (continued) {
+		printf(_("XMI extent data skipped (CONTINUE set, no space)\n"));
+		goto error;
+	}
+
+	printf("(ino1: 0x%llx, igen1: 0x%x, ino2: 0x%llx, igen2: 0x%x, off1: %llu, off2: %llu, len: %llu, flags: 0x%llx)\n",
+		(unsigned long long)f->xmi_inode1,
+		(unsigned int)f->xmi_igen1,
+		(unsigned long long)f->xmi_inode2,
+		(unsigned int)f->xmi_igen2,
+		(unsigned long long)f->xmi_startoff1,
+		(unsigned long long)f->xmi_startoff2,
+		(unsigned long long)f->xmi_blockcount,
+		(unsigned long long)f->xmi_flags);
+error:
+	free(src_f);
+	free(f);
+	return error;
+}
+
+void
+xlog_recover_print_xmi(
+	struct xlog_recover_item	*item)
+{
+	char				*src_f;
+	uint				src_len;
+
+	src_f = item->ri_buf[0].i_addr;
+	src_len = item->ri_buf[0].i_len;
+
+	xlog_print_trans_xmi(&src_f, src_len, 0);
+}
+
+int
+xlog_print_trans_xmd(
+	char				**ptr,
+	uint				len)
+{
+	struct xfs_xmd_log_format	*f;
+	struct xfs_xmd_log_format	lbuf;
+
+	/* size without extents at end */
+	uint core_size = sizeof(struct xfs_xmd_log_format);
+
+	memcpy(&lbuf, *ptr, min(core_size, len));
+	f = &lbuf;
+	*ptr += len;
+	if (len >= core_size) {
+		printf(_("XMD:  #regs: %d	                 id: 0x%llx\n"),
+			f->xmd_size,
+			(unsigned long long)f->xmd_xmi_id);
+
+		/* don't print extents as they are not used */
+
+		return 0;
+	} else {
+		printf(_("XMD: Not enough data to decode further\n"));
+		return 1;
+	}
+}
+
+void
+xlog_recover_print_xmd(
+	struct xlog_recover_item	*item)
+{
+	char				*f;
+
+	f = item->ri_buf[0].i_addr;
+	xlog_print_trans_xmd(&f, sizeof(struct xfs_xmd_log_format));
+}
diff --git a/logprint/logprint.h b/logprint/logprint.h
index b4479c240d94..8867b110ecaf 100644
--- a/logprint/logprint.h
+++ b/logprint/logprint.h
@@ -65,4 +65,10 @@ extern void xlog_recover_print_attri(struct xlog_recover_item *item);
 extern int xlog_print_trans_attrd(char **ptr, uint len);
 extern void xlog_recover_print_attrd(struct xlog_recover_item *item);
 extern void xlog_print_op_header(xlog_op_header_t *op_head, int i, char **ptr);
+
+extern int xlog_print_trans_xmi(char **ptr, uint src_len, int continued);
+extern void xlog_recover_print_xmi(struct xlog_recover_item *item);
+extern int xlog_print_trans_xmd(char **ptr, uint len);
+extern void xlog_recover_print_xmd(struct xlog_recover_item *item);
+
 #endif	/* LOGPRINT_H */
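The size checks that xlog_print_trans_xmi() makes before printing
anything can be restated as a small classifier.  This is a hypothetical
summary for readers, not code from the patch: the enum and
`xmi_classify()` are invented, and `fmt_size` stands in for
`sizeof(struct xfs_xmi_log_format)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of decoding one XMI log region. */
enum xmi_decode {
	XMI_TOO_SHORT,		/* continued region shorter than the format */
	XMI_BAD_SIZE,		/* complete region with an unexpected size */
	XMI_HEADER_ONLY,	/* continued region: header printed, rest skipped */
	XMI_FULL,		/* complete region: everything printed */
};

static enum xmi_decode
xmi_classify(size_t len, size_t fmt_size, bool continued)
{
	assert(fmt_size > 0);

	/* "Not enough data to decode further" */
	if (continued && len < fmt_size)
		return XMI_TOO_SHORT;
	/* xfs_xmi_copy_format() rejects any other size unless continued */
	if (!continued && len != fmt_size)
		return XMI_BAD_SIZE;
	return continued ? XMI_HEADER_ONLY : XMI_FULL;
}
```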


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 07/12] xfs_fsr: convert to bulkstat v5 ioctls
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-07-02  0:55   ` [PATCH 06/12] xfs_logprint: support dumping exchmaps log items Darrick J. Wong
@ 2024-07-02  0:55   ` Darrick J. Wong
  2024-07-02  5:13     ` Christoph Hellwig
  2024-07-02  0:55   ` [PATCH 08/12] xfs_fsr: skip the xattr/forkoff levering with the newer swapext implementations Darrick J. Wong
                     ` (4 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:55 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Now that libhandle can, er, handle bulkstat information coming from the
v5 bulkstat ioctl, port xfs_fsr to use the new interfaces instead of
repeatedly converting things back and forth.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fsr/xfs_fsr.c    |  148 ++++++++++++++++++++++++++++++------------------------
 libfrog/fsgeom.c |   45 ++++++++++++----
 libfrog/fsgeom.h |    1 
 3 files changed, 117 insertions(+), 77 deletions(-)


diff --git a/fsr/xfs_fsr.c b/fsr/xfs_fsr.c
index 06cc0552f9f1..2989e4c989bf 100644
--- a/fsr/xfs_fsr.c
+++ b/fsr/xfs_fsr.c
@@ -65,10 +65,10 @@ static int	pagesize;
 
 void usage(int ret);
 static int  fsrfile(char *fname, xfs_ino_t ino);
-static int  fsrfile_common( char *fname, char *tname, char *mnt,
-                            int fd, struct xfs_bstat *statp);
-static int  packfile(char *fname, char *tname, int fd,
-                     struct xfs_bstat *statp, struct fsxattr *fsxp);
+static int  fsrfile_common(char *fname, char *tname, char *mnt,
+			   struct xfs_fd *file_fd, struct xfs_bulkstat *statp);
+static int  packfile(char *fname, char *tname, struct xfs_fd *file_fd,
+                     struct xfs_bulkstat *statp, struct fsxattr *fsxp);
 static void fsrdir(char *dirname);
 static int  fsrfs(char *mntdir, xfs_ino_t ino, int targetrange);
 static void initallfs(char *mtab);
@@ -80,7 +80,7 @@ int xfs_getrt(int fd, struct statvfs *sfbp);
 char * gettmpname(char *fname);
 char * getparent(char *fname);
 int fsrprintf(const char *fmt, ...);
-int read_fd_bmap(int, struct xfs_bstat *, int *);
+int read_fd_bmap(int, struct xfs_bulkstat *, int *);
 static void tmp_init(char *mnt);
 static char * tmp_next(char *mnt);
 static void tmp_close(char *mnt);
@@ -102,6 +102,26 @@ static int	nfrags = 0;	/* Debug option: Coerse into specific number
 				 * of extents */
 static int	openopts = O_CREAT|O_EXCL|O_RDWR|O_DIRECT;
 
+/*
+ * Open a file on an XFS filesystem from file handle components and fs geometry
+ * data.  Returns zero or a positive errno.
+ */
+static int
+open_handle(
+	struct xfs_fd		*xfd,
+	jdm_fshandle_t		*fshandle,
+	struct xfs_bulkstat	*bulkstat,
+	struct xfs_fsop_geom	*fsgeom,
+	int			flags)
+{
+	xfd->fd = jdm_open_v5(fshandle, bulkstat, flags);
+	if (xfd->fd < 0)
+		return errno;
+
+	xfd_install_geometry(xfd, fsgeom);
+	return 0;
+}
+
 static int
 xfs_swapext(int fd, xfs_swapext_t *sx)
 {
@@ -627,7 +647,6 @@ static int
 fsrfs(char *mntdir, xfs_ino_t startino, int targetrange)
 {
 	struct xfs_fd	fsxfd = XFS_FD_INIT_EMPTY;
-	int	fd;
 	int	count = 0;
 	int	ret;
 	char	fname[64];
@@ -665,10 +684,10 @@ fsrfs(char *mntdir, xfs_ino_t startino, int targetrange)
 	}
 
 	while ((ret = -xfrog_bulkstat(&fsxfd, breq) == 0)) {
-		struct xfs_bstat	bs1;
 		struct xfs_bulkstat	*buf = breq->bulkstat;
 		struct xfs_bulkstat	*p;
 		struct xfs_bulkstat	*endp;
+		struct xfs_fd		file_fd = XFS_FD_INIT_EMPTY;
 		uint32_t		buflenout = breq->hdr.ocount;
 
 		if (buflenout == 0)
@@ -685,15 +704,9 @@ fsrfs(char *mntdir, xfs_ino_t startino, int targetrange)
 			     (p->bs_extents64 < 2))
 				continue;
 
-			ret = -xfrog_bulkstat_v5_to_v1(&fsxfd, &bs1, p);
+			ret = open_handle(&file_fd, fshandlep, p,
+					&fsxfd.fsgeom, O_RDWR | O_DIRECT);
 			if (ret) {
-				fsrprintf(_("bstat conversion error: %s\n"),
-						strerror(ret));
-				continue;
-			}
-
-			fd = jdm_open(fshandlep, &bs1, O_RDWR | O_DIRECT);
-			if (fd < 0) {
 				/* This probably means the file was
 				 * removed while in progress of handling
 				 * it.  Just quietly ignore this file.
@@ -710,11 +723,12 @@ fsrfs(char *mntdir, xfs_ino_t startino, int targetrange)
 			/* Get a tmp file name */
 			tname = tmp_next(mntdir);
 
-			ret = fsrfile_common(fname, tname, mntdir, fd, &bs1);
+			ret = fsrfile_common(fname, tname, mntdir, &file_fd,
+					p);
 
 			leftoffino = p->bs_ino;
 
-			close(fd);
+			xfd_close(&file_fd);
 
 			if (ret == 0) {
 				if (--count <= 0)
@@ -762,9 +776,8 @@ fsrfile(
 {
 	struct xfs_fd		fsxfd = XFS_FD_INIT_EMPTY;
 	struct xfs_bulkstat	bulkstat;
-	struct xfs_bstat	statbuf;
+	struct xfs_fd		file_fd = XFS_FD_INIT_EMPTY;
 	jdm_fshandle_t		*fshandlep;
-	int			fd = -1;
 	int			error = -1;
 	char			*tname;
 
@@ -792,17 +805,12 @@ fsrfile(
 			fname, strerror(error));
 		goto out;
 	}
-	error = -xfrog_bulkstat_v5_to_v1(&fsxfd, &statbuf, &bulkstat);
-	if (error) {
-		fsrprintf(_("bstat conversion error on %s: %s\n"),
-			fname, strerror(error));
-		goto out;
-	}
 
-	fd = jdm_open(fshandlep, &statbuf, O_RDWR|O_DIRECT);
-	if (fd < 0) {
+	error = open_handle(&file_fd, fshandlep, &bulkstat, &fsxfd.fsgeom,
+			O_RDWR | O_DIRECT);
+	if (error) {
 		fsrprintf(_("unable to open handle %s: %s\n"),
-			fname, strerror(errno));
+			fname, strerror(error));
 		goto out;
 	}
 
@@ -810,14 +818,13 @@ fsrfile(
 	memcpy(&fsgeom, &fsxfd.fsgeom, sizeof(fsgeom));
 
 	tname = gettmpname(fname);
-
 	if (tname)
-		error = fsrfile_common(fname, tname, NULL, fd, &statbuf);
+		error = fsrfile_common(fname, tname, NULL, &file_fd,
+				&bulkstat);
 
 out:
 	xfd_close(&fsxfd);
-	if (fd >= 0)
-		close(fd);
+	xfd_close(&file_fd);
 	free(fshandlep);
 
 	return error;
@@ -843,8 +850,8 @@ fsrfile_common(
 	char		*fname,
 	char		*tname,
 	char		*fsname,
-	int		fd,
-	struct xfs_bstat *statp)
+	struct xfs_fd	*file_fd,
+	struct xfs_bulkstat *statp)
 {
 	int		error;
 	struct statvfs  vfss;
@@ -854,7 +861,7 @@ fsrfile_common(
 	if (vflag)
 		fsrprintf("%s\n", fname);
 
-	if (fsync(fd) < 0) {
+	if (fsync(file_fd->fd) < 0) {
 		fsrprintf(_("sync failed: %s: %s\n"), fname, strerror(errno));
 		return -1;
 	}
@@ -878,7 +885,7 @@ fsrfile_common(
 		fl.l_whence = SEEK_SET;
 		fl.l_start = (off_t)0;
 		fl.l_len = 0;
-		if ((fcntl(fd, F_GETLK, &fl)) < 0 ) {
+		if ((fcntl(file_fd->fd, F_GETLK, &fl)) < 0 ) {
 			if (vflag)
 				fsrprintf(_("locking check failed: %s\n"),
 					fname);
@@ -896,7 +903,7 @@ fsrfile_common(
 	/*
 	 * Check if there is room to copy the file.
 	 *
-	 * Note that xfs_bstat.bs_blksize returns the filesystem blocksize,
+	 * Note that xfs_bulkstat.bs_blksize returns the filesystem blocksize,
 	 * not the optimal I/O size as struct stat.
 	 */
 	if (statvfs(fsname ? fsname : fname, &vfss) < 0) {
@@ -913,7 +920,7 @@ fsrfile_common(
 		return 1;
 	}
 
-	if ((ioctl(fd, FS_IOC_FSGETXATTR, &fsx)) < 0) {
+	if ((ioctl(file_fd->fd, FS_IOC_FSGETXATTR, &fsx)) < 0) {
 		fsrprintf(_("failed to get inode attrs: %s\n"), fname);
 		return(-1);
 	}
@@ -929,7 +936,7 @@ fsrfile_common(
 		return(0);
 	}
 	if (fsx.fsx_xflags & FS_XFLAG_REALTIME) {
-		if (xfs_getrt(fd, &vfss) < 0) {
+		if (xfs_getrt(file_fd->fd, &vfss) < 0) {
 			fsrprintf(_("cannot get realtime geometry for: %s\n"),
 				fname);
 			return(-1);
@@ -955,7 +962,7 @@ fsrfile_common(
 	 * file we're defragging, in packfile().
 	 */
 
-	if ((error = packfile(fname, tname, fd, statp, &fsx)))
+	if ((error = packfile(fname, tname, file_fd, statp, &fsx)))
 		return error;
 	return -1; /* no error */
 }
@@ -979,7 +986,7 @@ static int
 fsr_setup_attr_fork(
 	int		fd,
 	int		tfd,
-	struct xfs_bstat *bstatp)
+	struct xfs_bulkstat *bstatp)
 {
 	struct xfs_fd	txfd = XFS_FD_INIT(tfd);
 	struct stat	tstatbuf;
@@ -1161,23 +1168,28 @@ fsr_setup_attr_fork(
  *  1: No change / No Error
  */
 static int
-packfile(char *fname, char *tname, int fd,
-	 struct xfs_bstat *statp, struct fsxattr *fsxp)
+packfile(
+	char			*fname,
+	char			*tname,
+	struct xfs_fd		*file_fd,
+	struct xfs_bulkstat	*statp,
+	struct fsxattr		*fsxp)
 {
-	int 		tfd = -1;
-	int		srval;
-	int		retval = -1;	/* Failure is the default */
-	int		nextents, extent, cur_nextents, new_nextents;
-	unsigned	blksz_dio;
-	unsigned	dio_min;
-	struct dioattr	dio;
-	static xfs_swapext_t   sx;
-	struct xfs_flock64  space;
-	off_t 	cnt, pos;
-	void 		*fbuf = NULL;
-	int 		ct, wc, wc_b4;
-	char		ffname[SMBUFSZ];
-	int		ffd = -1;
+	int			tfd = -1;
+	int			srval;
+	int			retval = -1;	/* Failure is the default */
+	int			nextents, extent, cur_nextents, new_nextents;
+	unsigned		blksz_dio;
+	unsigned		dio_min;
+	struct dioattr		dio;
+	static xfs_swapext_t	sx;
+	struct xfs_flock64	space;
+	off_t			cnt, pos;
+	void			*fbuf = NULL;
+	int			ct, wc, wc_b4;
+	char			ffname[SMBUFSZ];
+	int			ffd = -1;
+	int			error;
 
 	/*
 	 * Work out the extent map - nextents will be set to the
@@ -1185,7 +1197,7 @@ packfile(char *fname, char *tname, int fd,
 	 * into account holes), cur_nextents is the current number
 	 * of extents.
 	 */
-	nextents = read_fd_bmap(fd, statp, &cur_nextents);
+	nextents = read_fd_bmap(file_fd->fd, statp, &cur_nextents);
 
 	if (cur_nextents == 1 || cur_nextents <= nextents) {
 		if (vflag)
@@ -1208,7 +1220,7 @@ packfile(char *fname, char *tname, int fd,
 	unlink(tname);
 
 	/* Setup extended attributes */
-	if (fsr_setup_attr_fork(fd, tfd, statp) != 0) {
+	if (fsr_setup_attr_fork(file_fd->fd, tfd, statp) != 0) {
 		fsrprintf(_("failed to set ATTR fork on tmp: %s:\n"), tname);
 		goto out;
 	}
@@ -1326,7 +1338,7 @@ packfile(char *fname, char *tname, int fd,
 				   tname, strerror(errno));
 				goto out;
 			}
-			if (lseek(fd, outmap[extent].bmv_length, SEEK_CUR) < 0) {
+			if (lseek(file_fd->fd, outmap[extent].bmv_length, SEEK_CUR) < 0) {
 				fsrprintf(_("could not lseek in file: %s : %s\n"),
 				   fname, strerror(errno));
 				goto out;
@@ -1346,7 +1358,7 @@ packfile(char *fname, char *tname, int fd,
 				ct = min(cnt + dio_min - (cnt % dio_min),
 					blksz_dio);
 			}
-			ct = read(fd, fbuf, ct);
+			ct = read(file_fd->fd, fbuf, ct);
 			if (ct == 0) {
 				/* EOF, stop trying to read */
 				extent = nextents;
@@ -1417,9 +1429,15 @@ packfile(char *fname, char *tname, int fd,
 		goto out;
 	}
 
-	sx.sx_stat     = *statp; /* struct copy */
+	error = -xfrog_bulkstat_v5_to_v1(file_fd, &sx.sx_stat, statp);
+	if (error) {
+		fsrprintf(_("bstat conversion error on %s: %s\n"),
+				fname, strerror(error));
+		goto out;
+	}
+
 	sx.sx_version  = XFS_SX_VERSION;
-	sx.sx_fdtarget = fd;
+	sx.sx_fdtarget = file_fd->fd;
 	sx.sx_fdtmp    = tfd;
 	sx.sx_offset   = 0;
 	sx.sx_length   = statp->bs_size;
@@ -1433,7 +1451,7 @@ packfile(char *fname, char *tname, int fd,
         }
 
 	/* Swap the extents */
-	srval = xfs_swapext(fd, &sx);
+	srval = xfs_swapext(file_fd->fd, &sx);
 	if (srval < 0) {
 		if (errno == ENOTSUP) {
 			if (vflag || dflag)
@@ -1529,7 +1547,7 @@ getparent(char *fname)
 #define MAPSIZE	128
 #define	OUTMAP_SIZE_INCREMENT	MAPSIZE
 
-int	read_fd_bmap(int fd, struct xfs_bstat *sin, int *cur_nextents)
+int	read_fd_bmap(int fd, struct xfs_bulkstat *sin, int *cur_nextents)
 {
 	int		i, cnt;
 	struct getbmap	map[MAPSIZE];
diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
index 3e7f0797d8bd..6980d3ffab69 100644
--- a/libfrog/fsgeom.c
+++ b/libfrog/fsgeom.c
@@ -102,29 +102,50 @@ xfrog_geometry(
 	return -errno;
 }
 
-/*
- * Prepare xfs_fd structure for future ioctl operations by computing the xfs
- * geometry for @xfd->fd.  Returns zero or a negative error code.
- */
-int
-xfd_prepare_geometry(
+/* Compute conversion factors of an xfs_fd structure. */
+static void
+xfd_compute_conversion_factors(
 	struct xfs_fd		*xfd)
 {
-	int			ret;
-
-	ret = xfrog_geometry(xfd->fd, &xfd->fsgeom);
-	if (ret)
-		return ret;
-
 	xfd->agblklog = log2_roundup(xfd->fsgeom.agblocks);
 	xfd->blocklog = highbit32(xfd->fsgeom.blocksize);
 	xfd->inodelog = highbit32(xfd->fsgeom.inodesize);
 	xfd->inopblog = xfd->blocklog - xfd->inodelog;
 	xfd->aginolog = xfd->agblklog + xfd->inopblog;
 	xfd->blkbb_log = xfd->blocklog - BBSHIFT;
+}
+
+/*
+ * Prepare xfs_fd structure for future ioctl operations by computing the xfs
+ * geometry for @xfd->fd.  Returns zero or a negative error code.
+ */
+int
+xfd_prepare_geometry(
+	struct xfs_fd		*xfd)
+{
+	int			ret;
+
+	ret = xfrog_geometry(xfd->fd, &xfd->fsgeom);
+	if (ret)
+		return ret;
+
+	xfd_compute_conversion_factors(xfd);
 	return 0;
 }
 
+/*
+ * Prepare xfs_fd structure for future ioctl operations by installing the
+ * supplied xfs geometry and computing its conversion factors.
+ */
+void
+xfd_install_geometry(
+	struct xfs_fd		*xfd,
+	struct xfs_fsop_geom	*fsgeom)
+{
+	memcpy(&xfd->fsgeom, fsgeom, sizeof(*fsgeom));
+	xfd_compute_conversion_factors(xfd);
+}
+
 /* Open a file on an XFS filesystem.  Returns zero or a negative error code. */
 int
 xfd_open(
diff --git a/libfrog/fsgeom.h b/libfrog/fsgeom.h
index ca38324e8533..f327dc7d70d0 100644
--- a/libfrog/fsgeom.h
+++ b/libfrog/fsgeom.h
@@ -55,6 +55,7 @@ struct xfs_fd {
 #define XFS_FD_INIT_EMPTY	XFS_FD_INIT(-1)
 
 int xfd_prepare_geometry(struct xfs_fd *xfd);
+void xfd_install_geometry(struct xfs_fd *xfd, struct xfs_fsop_geom *fsgeom);
 int xfd_open(struct xfs_fd *xfd, const char *pathname, int flags);
 int xfd_close(struct xfs_fd *xfd);
 


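Note the error convention: open_handle() returns a positive errno, unlike xfd_prepare_geometry(), which returns a negative one, so fsrfile() can hand the result straight to strerror(). A minimal sketch of that convention using plain open(2); open_or_errno() is a hypothetical name, not a libfrog helper.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper following the same convention as open_handle():
 * returns 0 and stores the descriptor on success, or a positive errno
 * on failure, leaving *fdp untouched.
 */
static int open_or_errno(const char *path, int flags, int *fdp)
{
	int	fd = open(path, flags);

	if (fd < 0)
		return errno;
	*fdp = fd;
	return 0;
}
```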

* [PATCH 08/12] xfs_fsr: skip the xattr/forkoff levering with the newer swapext implementations
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-07-02  0:55   ` [PATCH 07/12] xfs_fsr: convert to bulkstat v5 ioctls Darrick J. Wong
@ 2024-07-02  0:55   ` Darrick J. Wong
  2024-07-02  5:14     ` Christoph Hellwig
  2024-07-02  0:55   ` [PATCH 09/12] xfs_io: create exchangerange command to test file range exchange ioctl Darrick J. Wong
                     ` (3 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:55 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

The newer swapext implementations in the kernel run at a high enough
level (above the bmap layer) that it's no longer required to manipulate
bs_forkoff by creating garbage xattrs to get the extent tree that we
want.  If we detect the newer algorithms, skip this error-prone step.
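
The decision fsr_setup_attr_fork() now makes can be modelled as below. This is a standalone sketch: the flag values and the enum are hypothetical, and both the "no xattrs" and "skip" outcomes correspond to the real function returning 0.

```c
#include <stdint.h>

/* Hypothetical flag values; the real ones live in the XFS headers. */
#define SKETCH_XFLAG_HASATTR		(1U << 0)
#define SKETCH_GEOM_EXCHANGE_RANGE	(1ULL << 0)

enum attr_fork_action {
	ATTR_FORK_NOTHING,	/* source has no xattrs, nothing to match */
	ATTR_FORK_SKIP,		/* exchange-range swapext, forkoff irrelevant */
	ATTR_FORK_LEVER,	/* old fork swap: coerce the tmp file's forkoff */
};

static enum attr_fork_action
attr_fork_action_sketch(uint32_t bs_xflags, uint64_t geom_flags)
{
	if (!(bs_xflags & SKETCH_XFLAG_HASATTR))
		return ATTR_FORK_NOTHING;
	if (geom_flags & SKETCH_GEOM_EXCHANGE_RANGE)
		return ATTR_FORK_SKIP;
	return ATTR_FORK_LEVER;
}
```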

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fsr/xfs_fsr.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)


diff --git a/fsr/xfs_fsr.c b/fsr/xfs_fsr.c
index 2989e4c989bf..7d0639868258 100644
--- a/fsr/xfs_fsr.c
+++ b/fsr/xfs_fsr.c
@@ -999,6 +999,20 @@ fsr_setup_attr_fork(
 	if (!(bstatp->bs_xflags & FS_XFLAG_HASATTR))
 		return 0;
 
+	/*
+	 * If the filesystem has the ability to perform atomic file mapping
+	 * exchanges, the file extent swap implementation uses a higher level
+	 * algorithm that calls into the bmap code instead of playing games
+	 * with swapping the extent forks.
+	 *
+	 * This new functionality does not require specific values of
+	 * bs_forkoff, unlike the old fork swap code.  Leave the extended
+	 * attributes alone if we know we're not using the old fork swap
+	 * strategy.  This eliminates a major source of runtime errors in fsr.
+	 */
+	if (fsgeom.flags & XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE)
+		return 0;
+
 	/*
 	 * use the old method if we have attr1 or the kernel does not yet
 	 * support passing the fork offset in the bulkstat data.



* [PATCH 09/12] xfs_io: create exchangerange command to test file range exchange ioctl
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-07-02  0:55   ` [PATCH 08/12] xfs_fsr: skip the xattr/forkoff levering with the newer swapext implementations Darrick J. Wong
@ 2024-07-02  0:55   ` Darrick J. Wong
  2024-07-02  5:15     ` Christoph Hellwig
  2024-07-02  0:56   ` [PATCH 10/12] libfrog: advertise exchange-range support Darrick J. Wong
                     ` (2 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:55 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Create a new xfs_io command to make raw calls to the
XFS_IOC_EXCHANGE_RANGE ioctl.
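
The command's flag handling can be summarized as: XFS_EXCHANGE_RANGE_TO_EOF is set by default and cleared when -l supplies an explicit length, while -f, -n and -w add DSYNC, DRY_RUN and FILE1_WRITTEN respectively. A standalone sketch of that folding; the SK_* bit values are hypothetical and do not match the real XFS_EXCHANGE_RANGE_* constants.

```c
#include <stdint.h>

/* Hypothetical bit values standing in for XFS_EXCHANGE_RANGE_* flags. */
#define SK_DSYNC		(1ULL << 0)
#define SK_TO_EOF		(1ULL << 1)
#define SK_DRY_RUN		(1ULL << 2)
#define SK_FILE1_WRITTEN	(1ULL << 3)

/*
 * Fold a string of option letters into exchange flags the way
 * exchangerange_f() does: TO_EOF is the default and an explicit
 * length (-l) turns it off.
 */
static uint64_t exchange_flags_sketch(const char *opts)
{
	uint64_t	flags = SK_TO_EOF;

	for (; *opts; opts++) {
		switch (*opts) {
		case 'f':
			flags |= SK_DSYNC;
			break;
		case 'l':
			flags &= ~SK_TO_EOF;
			break;
		case 'n':
			flags |= SK_DRY_RUN;
			break;
		case 'w':
			flags |= SK_FILE1_WRITTEN;
			break;
		default:	/* -C, -d, -s and -t do not affect flags */
			break;
		}
	}
	return flags;
}
```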

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 io/Makefile       |   48 ++++++++++++++--
 io/exchrange.c    |  156 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 io/init.c         |    1 
 io/io.h           |    1 
 man/man8/xfs_io.8 |   40 ++++++++++++++
 5 files changed, 239 insertions(+), 7 deletions(-)
 create mode 100644 io/exchrange.c


diff --git a/io/Makefile b/io/Makefile
index 17d499de9ab9..3192b813c740 100644
--- a/io/Makefile
+++ b/io/Makefile
@@ -8,13 +8,47 @@ include $(TOPDIR)/include/builddefs
 LTCOMMAND = xfs_io
 LSRCFILES = xfs_bmap.sh xfs_freeze.sh xfs_mkfile.sh
 HFILES = init.h io.h
-CFILES = init.c \
-	attr.c bmap.c bulkstat.c crc32cselftest.c cowextsize.c encrypt.c \
-	file.c freeze.c fsuuid.c fsync.c getrusage.c imap.c inject.c label.c \
-	link.c mmap.c open.c parent.c pread.c prealloc.c pwrite.c reflink.c \
-	resblks.c scrub.c seek.c shutdown.c stat.c swapext.c sync.c \
-	truncate.c utimes.c fadvise.c sendfile.c madvise.c mincore.c fiemap.c \
-	sync_file_range.c readdir.c
+CFILES = \
+	attr.c \
+	bmap.c \
+	bulkstat.c \
+	cowextsize.c \
+	crc32cselftest.c \
+	encrypt.c \
+	exchrange.c \
+	fadvise.c \
+	fiemap.c \
+	file.c \
+	freeze.c \
+	fsuuid.c \
+	fsync.c \
+	getrusage.c \
+	imap.c \
+	init.c \
+	inject.c \
+	label.c \
+	link.c \
+	madvise.c \
+	mincore.c \
+	mmap.c \
+	open.c \
+	parent.c \
+	pread.c \
+	prealloc.c \
+	pwrite.c \
+	readdir.c \
+	reflink.c \
+	resblks.c \
+	scrub.c \
+	seek.c \
+	sendfile.c \
+	shutdown.c \
+	stat.c \
+	swapext.c \
+	sync.c \
+	sync_file_range.c \
+	truncate.c \
+	utimes.c
 
 LLDLIBS = $(LIBXCMD) $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD) $(LIBUUID)
 LTDEPENDENCIES = $(LIBXCMD) $(LIBHANDLE) $(LIBFROG)
diff --git a/io/exchrange.c b/io/exchrange.c
new file mode 100644
index 000000000000..016429280e27
--- /dev/null
+++ b/io/exchrange.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2020-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "command.h"
+#include "input.h"
+#include "init.h"
+#include "io.h"
+#include "libfrog/logging.h"
+#include "libfrog/fsgeom.h"
+#include "libfrog/file_exchange.h"
+#include "libfrog/bulkstat.h"
+
+static void
+exchangerange_help(void)
+{
+	printf(_(
+"\n"
+" Exchange file data between the open file descriptor and the supplied filename.\n"
+" -C   -- Print timing information in a condensed format\n"
+" -d N -- Start exchanging contents at this position in the open file\n"
+" -f   -- Flush changed file data and metadata to disk\n"
+" -l N -- Exchange this many bytes between the two files instead of to EOF\n"
+" -n   -- Dry run; do all the parameter validation but do not change anything.\n"
+" -s N -- Start exchanging contents at this position in the supplied file\n"
+" -t   -- Print timing information\n"
+" -w   -- Only exchange written ranges in the supplied file\n"
+));
+}
+
+static int
+exchangerange_f(
+	int			argc,
+	char			**argv)
+{
+	struct xfs_exchange_range	fxr;
+	struct stat		stat;
+	struct timeval		t1, t2;
+	uint64_t		flags = XFS_EXCHANGE_RANGE_TO_EOF;
+	int64_t			src_offset = 0;
+	int64_t			dest_offset = 0;
+	int64_t			length = -1;
+	size_t			fsblocksize, fssectsize;
+	int			condensed = 0, quiet_flag = 1;
+	int			c;
+	int			fd;
+	int			ret;
+
+	init_cvtnum(&fsblocksize, &fssectsize);
+	while ((c = getopt(argc, argv, "Cd:fl:ns:tw")) != -1) {
+		switch (c) {
+		case 'C':
+			condensed = 1;
+			break;
+		case 'd':
+			dest_offset = cvtnum(fsblocksize, fssectsize, optarg);
+			if (dest_offset < 0) {
+				printf(
+			_("non-numeric open file offset argument -- %s\n"),
+						optarg);
+				return 0;
+			}
+			break;
+		case 'f':
+			flags |= XFS_EXCHANGE_RANGE_DSYNC;
+			break;
+		case 'l':
+			length = cvtnum(fsblocksize, fssectsize, optarg);
+			if (length < 0) {
+				printf(
+			_("non-numeric length argument -- %s\n"),
+						optarg);
+				return 0;
+			}
+			flags &= ~XFS_EXCHANGE_RANGE_TO_EOF;
+			break;
+		case 'n':
+			flags |= XFS_EXCHANGE_RANGE_DRY_RUN;
+			break;
+		case 's':
+			src_offset = cvtnum(fsblocksize, fssectsize, optarg);
+			if (src_offset < 0) {
+				printf(
+			_("non-numeric supplied file offset argument -- %s\n"),
+						optarg);
+				return 0;
+			}
+			break;
+		case 't':
+			quiet_flag = 0;
+			break;
+		case 'w':
+			flags |= XFS_EXCHANGE_RANGE_FILE1_WRITTEN;
+			break;
+		default:
+			exchangerange_help();
+			return 0;
+		}
+	}
+	if (optind != argc - 1) {
+		exchangerange_help();
+		return 0;
+	}
+
+	/* open the donor file */
+	fd = openfile(argv[optind], NULL, 0, 0, NULL);
+	if (fd < 0)
+		return 0;
+
+	ret = fstat(file->fd, &stat);
+	if (ret) {
+		perror("fstat");
+		exitcode = 1;
+		goto out;
+	}
+	if (length < 0)
+		length = stat.st_size;
+	xfrog_exchangerange_prep(&fxr, dest_offset, fd, src_offset, length);
+	gettimeofday(&t1, NULL);
+	ret = xfrog_exchangerange(file->fd, &fxr, flags);
+	if (ret) {
+		xfrog_perror(ret, "exchangerange");
+		exitcode = 1;
+		goto out;
+	}
+	if (quiet_flag)
+		goto out;
+
+	gettimeofday(&t2, NULL);
+	t2 = tsub(t2, t1);
+
+	report_io_times("exchangerange", &t2, dest_offset, length, length, 1,
+			condensed);
+out:
+	close(fd);
+	return 0;
+}
+
+static struct cmdinfo exchangerange_cmd = {
+	.name		= "exchangerange",
+	.cfunc		= exchangerange_f,
+	.argmin		= 1,
+	.argmax		= -1,
+	.flags		= CMD_FLAG_ONESHOT | CMD_NOMAP_OK,
+	.help		= exchangerange_help,
+};
+
+void
+exchangerange_init(void)
+{
+	exchangerange_cmd.args = _("[-Cfntw] [-d dest_offset] [-s src_offset] [-l length] <donorfile>");
+	exchangerange_cmd.oneline = _("Exchange contents between files.");
+
+	add_command(&exchangerange_cmd);
+}
diff --git a/io/init.c b/io/init.c
index 104cd2c12152..37e0f093c526 100644
--- a/io/init.c
+++ b/io/init.c
@@ -88,6 +88,7 @@ init_commands(void)
 	truncate_init();
 	utimes_init();
 	crc32cselftest_init();
+	exchangerange_init();
 }
 
 /*
diff --git a/io/io.h b/io/io.h
index e06dff53f895..fb28bb896d61 100644
--- a/io/io.h
+++ b/io/io.h
@@ -149,3 +149,4 @@ extern void		scrub_init(void);
 extern void		repair_init(void);
 extern void		crc32cselftest_init(void);
 extern void		bulkstat_init(void);
+extern void		exchangerange_init(void);
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 3ce280a75b4a..2a7c67f7cf3a 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -713,6 +713,46 @@ Swaps extent forks between files. The current open file is the target. The donor
 file is specified by path. Note that file data is not copied (file content moves
 with the fork(s)).
 .TP
+.BI "exchangerange [OPTIONS] " donor_file
+Exchanges contents between files.
+The current open file is the target.
+The donor file is specified by path.
+Note that file data is not copied; the file mappings are exchanged.
+Options include:
+.RS 1.0i
+.PD 0
+.TP 0.4i
+.TP
+.B \-C
+Print timing information in a condensed format.
+.TP
+.BI \-d " dest_offset"
+Swap extents with open file beginning at
+.IR dest_offset .
+.TP
+.B \-f
+Flush changed file data and file metadata to disk.
+.TP
+.BI \-l " length"
+Swap up to
+.I length
+bytes of data.
+.TP
+.B \-n
+Perform all the parameter validation checks but don't change anything.
+.TP
+.BI \-s " src_offset"
+Swap extents with donor file beginning at
+.IR src_offset .
+.TP
+.B \-t
+Print timing information.
+.TP
+.B \-w
+Only exchange written ranges in the supplied file.
+.RE
+.PD
+.TP
 .BI "set_encpolicy [ \-c " mode " ] [ \-n " mode " ] [ \-f " flags " ] [ \-s " log2_dusize " ] [ \-v " version " ] [ " keyspec " ]"
 On filesystems that support encryption, assign an encryption policy to the
 current file.


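With -t, the command reports the elapsed time as t2 - t1, which is only meaningful if t1 is sampled with gettimeofday() immediately before the exchange call and t2 right after it. A sketch of the timeval subtraction involved; tv_sub_sketch() is a hypothetical helper rather than xfs_io's own tsub(), and it assumes a is not earlier than b.

```c
#include <sys/time.h>

/* Return a - b, keeping tv_usec in [0, 1000000). Assumes a >= b. */
static struct timeval tv_sub_sketch(struct timeval a, struct timeval b)
{
	a.tv_sec -= b.tv_sec;
	a.tv_usec -= b.tv_usec;
	if (a.tv_usec < 0) {
		/* borrow one second */
		a.tv_sec--;
		a.tv_usec += 1000000;
	}
	return a;
}
```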

* [PATCH 10/12] libfrog: advertise exchange-range support
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-07-02  0:55   ` [PATCH 09/12] xfs_io: create exchangerange command to test file range exchange ioctl Darrick J. Wong
@ 2024-07-02  0:56   ` Darrick J. Wong
  2024-07-02  5:15     ` Christoph Hellwig
  2024-07-02  0:56   ` [PATCH 11/12] xfs_repair: add exchange-range to file systems Darrick J. Wong
  2024-07-02  0:56   ` [PATCH 12/12] mkfs: add a formatting option for exchange-range Darrick J. Wong
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:56 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Report the presence of exchange range for a given filesystem.
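
The new xfs_info line reduces the geometry flag to 0 or 1 and prints it in the same column layout as the other meta-data rows. A sketch of that formatting; the flag bit is hypothetical and the function exists only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

#define SK_GEOM_EXCHANGE_RANGE	(1U << 0)	/* hypothetical bit */

/* Format the exchange= row into buf the way xfs_report_geom() lays it out. */
static int format_exchange_line_sketch(char *buf, size_t len, uint32_t flags)
{
	int	exchangerange = flags & SK_GEOM_EXCHANGE_RANGE ? 1 : 0;

	return snprintf(buf, len, "         =%-22s exchange=%-3u\n", "",
			(unsigned int)exchangerange);
}
```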

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/fsgeom.c |    4 ++++
 1 file changed, 4 insertions(+)


diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
index 6980d3ffab69..71a8e4bb9980 100644
--- a/libfrog/fsgeom.c
+++ b/libfrog/fsgeom.c
@@ -31,6 +31,7 @@ xfs_report_geom(
 	int			bigtime_enabled;
 	int			inobtcount;
 	int			nrext64;
+	int			exchangerange;
 
 	isint = geo->logstart > 0;
 	lazycount = geo->flags & XFS_FSOP_GEOM_FLAGS_LAZYSB ? 1 : 0;
@@ -49,12 +50,14 @@ xfs_report_geom(
 	bigtime_enabled = geo->flags & XFS_FSOP_GEOM_FLAGS_BIGTIME ? 1 : 0;
 	inobtcount = geo->flags & XFS_FSOP_GEOM_FLAGS_INOBTCNT ? 1 : 0;
 	nrext64 = geo->flags & XFS_FSOP_GEOM_FLAGS_NREXT64 ? 1 : 0;
+	exchangerange = geo->flags & XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE ? 1 : 0;
 
 	printf(_(
 "meta-data=%-22s isize=%-6d agcount=%u, agsize=%u blks\n"
 "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
 "         =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u\n"
 "         =%-22s reflink=%-4u bigtime=%u inobtcount=%u nrext64=%u\n"
+"         =%-22s exchange=%-3u\n"
 "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
 "         =%-22s sunit=%-6u swidth=%u blks\n"
 "naming   =version %-14u bsize=%-6u ascii-ci=%d, ftype=%d\n"
@@ -65,6 +68,7 @@ xfs_report_geom(
 		"", geo->sectsize, attrversion, projid32bit,
 		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
 		"", reflink_enabled, bigtime_enabled, inobtcount, nrext64,
+		"", exchangerange,
 		"", geo->blocksize, (unsigned long long)geo->datablocks,
 			geo->imaxpct,
 		"", geo->sunit, geo->swidth,



* [PATCH 11/12] xfs_repair: add exchange-range to file systems
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-07-02  0:56   ` [PATCH 10/12] libfrog: advertise exchange-range support Darrick J. Wong
@ 2024-07-02  0:56   ` Darrick J. Wong
  2024-07-02  5:15     ` Christoph Hellwig
  2024-07-02  0:56   ` [PATCH 12/12] mkfs: add a formatting option for exchange-range Darrick J. Wong
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:56 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Enable upgrading existing filesystems to support the file exchange range
feature.
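
The upgrade is refused if the filesystem already has the feature, is not a V5 (crc) filesystem, or lacks reflink; otherwise both the exchange-range and NEEDSREPAIR bits go into the incompat feature word. A standalone model of that ordering; the bit values, enum and function are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real XFS_SB_FEAT_INCOMPAT_* ones differ. */
#define SK_INCOMPAT_EXCHRANGE		(1U << 0)
#define SK_INCOMPAT_NEEDSREPAIR		(1U << 1)

enum upgrade_result {
	UPGRADE_ALREADY,	/* feature present, nothing to do */
	UPGRADE_NEED_V5,	/* not a V5 (crc) filesystem */
	UPGRADE_NEED_REFLINK,	/* reflink must already be enabled */
	UPGRADE_OK,		/* bits set in *incompat */
};

static enum upgrade_result
add_exchrange_sketch(bool has_crc, bool has_reflink, uint32_t *incompat)
{
	if (*incompat & SK_INCOMPAT_EXCHRANGE)
		return UPGRADE_ALREADY;
	if (!has_crc)
		return UPGRADE_NEED_V5;
	if (!has_reflink)
		return UPGRADE_NEED_REFLINK;
	/* Both feature bits belong in the incompat word. */
	*incompat |= SK_INCOMPAT_EXCHRANGE | SK_INCOMPAT_NEEDSREPAIR;
	return UPGRADE_OK;
}
```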

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man8/xfs_admin.8 |    7 +++++++
 repair/globals.c     |    1 +
 repair/globals.h     |    1 +
 repair/phase2.c      |   30 ++++++++++++++++++++++++++++++
 repair/xfs_repair.c  |   11 +++++++++++
 5 files changed, 50 insertions(+)


diff --git a/man/man8/xfs_admin.8 b/man/man8/xfs_admin.8
index 4794d6774ede..63f8ee90307b 100644
--- a/man/man8/xfs_admin.8
+++ b/man/man8/xfs_admin.8
@@ -156,6 +156,13 @@ data fork extent count will be 2^48 - 1, while the maximum attribute fork
 extent count will be 2^32 - 1. The filesystem cannot be downgraded after this
 feature is enabled. Once enabled, the filesystem will not be mountable by
 older kernels.  This feature was added to Linux 5.19.
+.TP 0.4i
+.B exchange
+Upgrade a filesystem to support atomic file content exchanges through the
+XFS_IOC_EXCHANGE_RANGE ioctl, and to support online repairs of directories,
+extended attributes, symbolic links, and realtime free space metadata.
+The filesystem cannot be downgraded after this feature is enabled.
+Once enabled, the filesystem will not be mountable by older kernels.
 .RE
 .TP
 .BI \-U " uuid"
diff --git a/repair/globals.c b/repair/globals.c
index 24f720c46afb..c0c45df51d56 100644
--- a/repair/globals.c
+++ b/repair/globals.c
@@ -52,6 +52,7 @@ bool	features_changed;	/* did we change superblock feature bits? */
 bool	add_inobtcount;		/* add inode btree counts to AGI */
 bool	add_bigtime;		/* add support for timestamps up to 2486 */
 bool	add_nrext64;
+bool	add_exchrange;		/* add file content exchange support */
 
 /* misc status variables */
 
diff --git a/repair/globals.h b/repair/globals.h
index b83a8ae65942..1eadfdbf9ae4 100644
--- a/repair/globals.h
+++ b/repair/globals.h
@@ -93,6 +93,7 @@ extern bool	features_changed;	/* did we change superblock feature bits? */
 extern bool	add_inobtcount;		/* add inode btree counts to AGI */
 extern bool	add_bigtime;		/* add support for timestamps up to 2486 */
 extern bool	add_nrext64;
+extern bool	add_exchrange;		/* add file content exchange support */
 
 /* misc status variables */
 
diff --git a/repair/phase2.c b/repair/phase2.c
index 06374817964c..83f0c539bb5d 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -182,6 +182,34 @@ set_nrext64(
 	return true;
 }
 
+static bool
+set_exchrange(
+	struct xfs_mount	*mp,
+	struct xfs_sb		*new_sb)
+{
+	if (xfs_has_exchange_range(mp)) {
+		printf(_("Filesystem already supports exchange-range.\n"));
+		exit(0);
+	}
+
+	if (!xfs_has_crc(mp)) {
+		printf(
+	_("File exchange-range feature only supported on V5 filesystems.\n"));
+		exit(0);
+	}
+
+	if (!xfs_has_reflink(mp)) {
+		printf(
+	_("File exchange-range feature cannot be added without reflink.\n"));
+		exit(0);
+	}
+
+	printf(_("Adding file exchange-range support to filesystem.\n"));
+	new_sb->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_EXCHRANGE;
+	new_sb->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
+	return true;
+}
+
 struct check_state {
 	struct xfs_sb		sb;
 	uint64_t		features;
@@ -290,6 +318,8 @@ upgrade_filesystem(
 		dirty |= set_bigtime(mp, &new_sb);
 	if (add_nrext64)
 		dirty |= set_nrext64(mp, &new_sb);
+	if (add_exchrange)
+		dirty |= set_exchrange(mp, &new_sb);
 	if (!dirty)
 		return;
 
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index cf774964381e..39884015300a 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -69,6 +69,7 @@ enum c_opt_nums {
 	CONVERT_INOBTCOUNT,
 	CONVERT_BIGTIME,
 	CONVERT_NREXT64,
+	CONVERT_EXCHRANGE,
 	C_MAX_OPTS,
 };
 
@@ -77,6 +78,7 @@ static char *c_opts[] = {
 	[CONVERT_INOBTCOUNT]	= "inobtcount",
 	[CONVERT_BIGTIME]	= "bigtime",
 	[CONVERT_NREXT64]	= "nrext64",
+	[CONVERT_EXCHRANGE]	= "exchange",
 	[C_MAX_OPTS]		= NULL,
 };
 
@@ -360,6 +362,15 @@ process_args(int argc, char **argv)
 		_("-c nrext64 only supports upgrades\n"));
 					add_nrext64 = true;
 					break;
+				case CONVERT_EXCHRANGE:
+					if (!val)
+						do_abort(
+		_("-c exchange requires a parameter\n"));
+					if (strtol(val, NULL, 0) != 1)
+						do_abort(
+		_("-c exchange only supports upgrades\n"));
+					add_exchrange = true;
+					break;
 				default:
 					unknown('c', val);
 					break;


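As with nrext64, -c exchange only accepts an explicit value of 1, parsed with strtol() in base 0. A hypothetical mirror of that check, returning an enum instead of calling do_abort():

```c
#include <stdlib.h>

enum sk_convert_result {
	SK_CONV_MISSING_VALUE,	/* "-c exchange" with no "=value" */
	SK_CONV_NOT_UPGRADE,	/* any value other than 1 */
	SK_CONV_ADD,		/* "-c exchange=1" */
};

/* Hypothetical model of the CONVERT_EXCHRANGE case in process_args(). */
static enum sk_convert_result parse_exchange_val_sketch(const char *val)
{
	if (!val)
		return SK_CONV_MISSING_VALUE;
	if (strtol(val, NULL, 0) != 1)
		return SK_CONV_NOT_UPGRADE;
	return SK_CONV_ADD;
}
```

So an upgrade would be requested with something like `xfs_repair -c exchange=1` against the unmounted device.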

* [PATCH 12/12] mkfs: add a formatting option for exchange-range
  2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-07-02  0:56   ` [PATCH 11/12] xfs_repair: add exchange-range to file systems Darrick J. Wong
@ 2024-07-02  0:56   ` Darrick J. Wong
  2024-07-02  5:16     ` Christoph Hellwig
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:56 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Allow users to enable the logged file mapping exchange intent items on a
filesystem, which in turn enables XFS_IOC_EXCHANGE_RANGE and online
repair of metadata that lives in files, e.g. directories and xattrs.
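
In mkfs terms, defaultval = 1 is the value taken when -i exchange is given without =value; the feature itself stays off unless requested, as the manpage says. A sketch of how such a boolean suboption might be resolved, with hypothetical names and -1 standing in for mkfs's range error:

```c
#include <stdlib.h>

/* Hypothetical subset of a mkfs suboption's parameters. */
struct subopt_sketch {
	long	minval;
	long	maxval;
	long	defaultval;	/* used when the value is omitted */
};

/* Return the resolved value, or -1 if it is malformed or out of range. */
static long resolve_subopt_sketch(const struct subopt_sketch *sp,
		const char *value)
{
	char	*end;
	long	v;

	if (!value || !*value)
		return sp->defaultval;
	v = strtol(value, &end, 0);
	if (*end != '\0' || v < sp->minval || v > sp->maxval)
		return -1;
	return v;
}
```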

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man8/mkfs.xfs.8.in |    7 +++++++
 mkfs/lts_4.19.conf     |    1 +
 mkfs/lts_5.10.conf     |    1 +
 mkfs/lts_5.15.conf     |    1 +
 mkfs/lts_5.4.conf      |    1 +
 mkfs/lts_6.1.conf      |    1 +
 mkfs/lts_6.6.conf      |    1 +
 mkfs/xfs_mkfs.c        |   26 ++++++++++++++++++++++++--
 8 files changed, 37 insertions(+), 2 deletions(-)


diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index 8060d342c2a4..d5a0783ac5d6 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -670,6 +670,13 @@ If the value is omitted, 1 is assumed.
 This feature will be enabled when possible.
 This feature is only available for filesystems formatted with -m crc=1.
 .TP
+.BI exchange[= value]
+When enabled, application programs can exchange file contents atomically
+via the XFS_IOC_EXCHANGE_RANGE ioctl.
+Online repair uses this functionality to rebuild extended attributes,
+directories, symbolic links, and realtime metadata files.
+This feature is disabled by default.
+This feature is only available for filesystems formatted with -m crc=1.
 .RE
 .PP
 .PD 0
diff --git a/mkfs/lts_4.19.conf b/mkfs/lts_4.19.conf
index 8b2bdd7a3471..92e8eba6ba8f 100644
--- a/mkfs/lts_4.19.conf
+++ b/mkfs/lts_4.19.conf
@@ -12,3 +12,4 @@ rmapbt=0
 [inode]
 sparse=1
 nrext64=0
+exchange=0
diff --git a/mkfs/lts_5.10.conf b/mkfs/lts_5.10.conf
index 40189310af2a..34e7662cd671 100644
--- a/mkfs/lts_5.10.conf
+++ b/mkfs/lts_5.10.conf
@@ -12,3 +12,4 @@ rmapbt=0
 [inode]
 sparse=1
 nrext64=0
+exchange=0
diff --git a/mkfs/lts_5.15.conf b/mkfs/lts_5.15.conf
index aeecc0355673..a36a5c2b7850 100644
--- a/mkfs/lts_5.15.conf
+++ b/mkfs/lts_5.15.conf
@@ -12,3 +12,4 @@ rmapbt=0
 [inode]
 sparse=1
 nrext64=0
+exchange=0
diff --git a/mkfs/lts_5.4.conf b/mkfs/lts_5.4.conf
index 0a40718b8f62..4204d5b8f235 100644
--- a/mkfs/lts_5.4.conf
+++ b/mkfs/lts_5.4.conf
@@ -12,3 +12,4 @@ rmapbt=0
 [inode]
 sparse=1
 nrext64=0
+exchange=0
diff --git a/mkfs/lts_6.1.conf b/mkfs/lts_6.1.conf
index 452abdf82e62..9a90def8f489 100644
--- a/mkfs/lts_6.1.conf
+++ b/mkfs/lts_6.1.conf
@@ -12,3 +12,4 @@ rmapbt=0
 [inode]
 sparse=1
 nrext64=0
+exchange=0
diff --git a/mkfs/lts_6.6.conf b/mkfs/lts_6.6.conf
index 244f8eaf7645..3f7fb651937d 100644
--- a/mkfs/lts_6.6.conf
+++ b/mkfs/lts_6.6.conf
@@ -12,3 +12,4 @@ rmapbt=1
 [inode]
 sparse=1
 nrext64=1
+exchange=0
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 6d2469c3c81f..991ecbdd03ff 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -90,6 +90,7 @@ enum {
 	I_PROJID32BIT,
 	I_SPINODES,
 	I_NREXT64,
+	I_EXCHANGE,
 	I_MAX_OPTS,
 };
 
@@ -469,6 +470,7 @@ static struct opt_params iopts = {
 		[I_PROJID32BIT] = "projid32bit",
 		[I_SPINODES] = "sparse",
 		[I_NREXT64] = "nrext64",
+		[I_EXCHANGE] = "exchange",
 		[I_MAX_OPTS] = NULL,
 	},
 	.subopt_params = {
@@ -523,7 +525,13 @@ static struct opt_params iopts = {
 		  .minval = 0,
 		  .maxval = 1,
 		  .defaultval = 1,
-		}
+		},
+		{ .index = I_EXCHANGE,
+		  .conflicts = { { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 1,
+		},
 	},
 };
 
@@ -889,6 +897,7 @@ struct sb_feat_args {
 	bool	nodalign;
 	bool	nortalign;
 	bool	nrext64;
+	bool	exchrange;		/* XFS_SB_FEAT_INCOMPAT_EXCHRANGE */
 };
 
 struct cli_params {
@@ -1024,7 +1033,8 @@ usage( void )
 			    sectsize=num,concurrency=num]\n\
 /* force overwrite */	[-f]\n\
 /* inode size */	[-i perblock=n|size=num,maxpct=n,attr=0|1|2,\n\
-			    projid32bit=0|1,sparse=0|1,nrext64=0|1]\n\
+			    projid32bit=0|1,sparse=0|1,nrext64=0|1,\n\
+			    exchange=0|1]\n\
 /* no discard */	[-K]\n\
 /* log subvol */	[-l agnum=n,internal,size=num,logdev=xxx,version=n\n\
 			    sunit=value|su=num,sectsize=num,lazy-count=0|1,\n\
@@ -1722,6 +1732,9 @@ inode_opts_parser(
 	case I_NREXT64:
 		cli->sb_feat.nrext64 = getnum(value, opts, subopt);
 		break;
+	case I_EXCHANGE:
+		cli->sb_feat.exchrange = getnum(value, opts, subopt);
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -2365,6 +2378,13 @@ _("64 bit extent count not supported without CRC support\n"));
 			usage();
 		}
 		cli->sb_feat.nrext64 = false;
+
+		if (cli->sb_feat.exchrange && cli_opt_set(&iopts, I_EXCHANGE)) {
+			fprintf(stderr,
+_("exchange-range not supported without CRC support\n"));
+			usage();
+		}
+		cli->sb_feat.exchrange = false;
 	}
 
 	if (!cli->sb_feat.finobt) {
@@ -3498,6 +3518,8 @@ sb_set_features(
 
 	if (fp->nrext64)
 		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_NREXT64;
+	if (fp->exchrange)
+		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_EXCHRANGE;
 }
 
 /*



* [PATCH 1/3] xfs_{db,repair}: add an explicit owner field to xfs_da_args
  2024-07-02  0:49 ` [PATCHSET v30.7 02/16] xfsprogs: inode-related repair fixes Darrick J. Wong
@ 2024-07-02  0:56   ` Darrick J. Wong
  2024-07-02  5:16     ` Christoph Hellwig
  2024-07-02  0:57   ` [PATCH 2/3] libxfs: port the bumplink function from the kernel Darrick J. Wong
  2024-07-02  0:57   ` [PATCH 3/3] mkfs/repair: pin inodes that would otherwise overflow link count Darrick J. Wong
  2 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:56 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Update these two utilities to set the owner field of the da args
structure prior to calling directory and extended attribute functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/namei.c      |    1 +
 repair/phase6.c |    3 +++
 2 files changed, 4 insertions(+)


diff --git a/db/namei.c b/db/namei.c
index 6de0621616a6..41ccaa04b659 100644
--- a/db/namei.c
+++ b/db/namei.c
@@ -447,6 +447,7 @@ listdir(
 	struct xfs_da_args	args = {
 		.dp		= dp,
 		.geo		= dp->i_mount->m_dir_geo,
+		.owner		= dp->i_ino,
 	};
 	int			error;
 
diff --git a/repair/phase6.c b/repair/phase6.c
index 1e985e7db068..92a58db0d236 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1401,6 +1401,7 @@ dir2_kill_block(
 	args.trans = tp;
 	args.whichfork = XFS_DATA_FORK;
 	args.geo = mp->m_dir_geo;
+	args.owner = ip->i_ino;
 	if (da_bno >= mp->m_dir_geo->leafblk && da_bno < mp->m_dir_geo->freeblk)
 		error = -libxfs_da_shrink_inode(&args, da_bno, bp);
 	else
@@ -1505,6 +1506,7 @@ longform_dir2_entry_check_data(
 	struct xfs_da_args	da = {
 		.dp = ip,
 		.geo = mp->m_dir_geo,
+		.owner = ip->i_ino,
 	};
 
 
@@ -2294,6 +2296,7 @@ longform_dir2_entry_check(
 	/* is this a block, leaf, or node directory? */
 	args.dp = ip;
 	args.geo = mp->m_dir_geo;
+	args.owner = ip->i_ino;
 	fmt = libxfs_dir2_format(&args, &error);
 
 	/* check directory "data" blocks (ie. name/inode pairs) */



* [PATCH 2/3] libxfs: port the bumplink function from the kernel
  2024-07-02  0:49 ` [PATCHSET v30.7 02/16] xfsprogs: inode-related repair fixes Darrick J. Wong
  2024-07-02  0:56   ` [PATCH 1/3] xfs_{db,repair}: add an explicit owner field to xfs_da_args Darrick J. Wong
@ 2024-07-02  0:57   ` Darrick J. Wong
  2024-07-02  5:16     ` Christoph Hellwig
  2024-07-02  0:57   ` [PATCH 3/3] mkfs/repair: pin inodes that would otherwise overflow link count Darrick J. Wong
  2 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:57 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Port the xfs_bumplink function from the kernel and use it to replace raw
calls to inc_nlink.  The next patch will need this common function to
prevent integer overflows in the link count.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 include/xfs_inode.h |    2 ++
 libxfs/util.c       |   17 +++++++++++++++++
 mkfs/proto.c        |    4 ++--
 repair/phase6.c     |   10 +++++-----
 4 files changed, 26 insertions(+), 7 deletions(-)


diff --git a/include/xfs_inode.h b/include/xfs_inode.h
index c6e4f84bdc16..45339b42621a 100644
--- a/include/xfs_inode.h
+++ b/include/xfs_inode.h
@@ -359,6 +359,8 @@ extern void	libxfs_trans_ichgtime(struct xfs_trans *,
 				struct xfs_inode *, int);
 extern int	libxfs_iflush_int (struct xfs_inode *, struct xfs_buf *);
 
+void libxfs_bumplink(struct xfs_trans *tp, struct xfs_inode *ip);
+
 /* Inode Cache Interfaces */
 extern int	libxfs_iget(struct xfs_mount *, struct xfs_trans *, xfs_ino_t,
 				uint, struct xfs_inode **);
diff --git a/libxfs/util.c b/libxfs/util.c
index 2656c99a8ea7..dc54e3ee66db 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -240,6 +240,23 @@ xfs_inode_propagate_flags(
 	ip->i_diflags |= di_flags;
 }
 
+/*
+ * Increment the link count on an inode & log the change.
+ */
+void
+libxfs_bumplink(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip)
+{
+	struct inode		*inode = VFS_I(ip);
+
+	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
+
+	inc_nlink(inode);
+
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+}
+
 /*
  * Initialise a newly allocated inode and return the in-core inode to the
  * caller locked exclusively.
diff --git a/mkfs/proto.c b/mkfs/proto.c
index 5125ee44f493..a9a9b704a3ca 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -591,7 +591,7 @@ parseproto(
 				&creds, fsxp, &ip);
 		if (error)
 			fail(_("Inode allocation failed"), error);
-		inc_nlink(VFS_I(ip));		/* account for . */
+		libxfs_bumplink(tp, ip);		/* account for . */
 		if (!pip) {
 			pip = ip;
 			mp->m_sb.sb_rootino = ip->i_ino;
@@ -601,7 +601,7 @@ parseproto(
 			libxfs_trans_ijoin(tp, pip, 0);
 			xname.type = XFS_DIR3_FT_DIR;
 			newdirent(mp, tp, pip, &xname, ip->i_ino);
-			inc_nlink(VFS_I(pip));
+			libxfs_bumplink(tp, pip);
 			libxfs_trans_log_inode(tp, pip, XFS_ILOG_CORE);
 		}
 		newdirectory(mp, tp, ip, pip);
diff --git a/repair/phase6.c b/repair/phase6.c
index 92a58db0d236..47dd9de2741c 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -947,7 +947,7 @@ mk_orphanage(xfs_mount_t *mp)
 		do_error(_("%s inode allocation failed %d\n"),
 			ORPHANAGE, error);
 	}
-	inc_nlink(VFS_I(ip));		/* account for . */
+	libxfs_bumplink(tp, ip);		/* account for . */
 	ino = ip->i_ino;
 
 	irec = find_inode_rec(mp,
@@ -999,7 +999,7 @@ mk_orphanage(xfs_mount_t *mp)
 	 * for .. in the new directory, and update the irec copy of the
 	 * on-disk nlink so we don't fail the link count check later.
 	 */
-	inc_nlink(VFS_I(pip));
+	libxfs_bumplink(tp, pip);
 	irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rootino),
 				  XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rootino));
 	add_inode_ref(irec, 0);
@@ -1094,7 +1094,7 @@ mv_orphanage(
 			if (irec)
 				add_inode_ref(irec, ino_offset);
 			else
-				inc_nlink(VFS_I(orphanage_ip));
+				libxfs_bumplink(tp, orphanage_ip);
 			libxfs_trans_log_inode(tp, orphanage_ip, XFS_ILOG_CORE);
 
 			err = -libxfs_dir_createname(tp, ino_p, &xfs_name_dotdot,
@@ -1103,7 +1103,7 @@ mv_orphanage(
 				do_error(
 	_("creation of .. entry failed (%d)\n"), err);
 
-			inc_nlink(VFS_I(ino_p));
+			libxfs_bumplink(tp, ino_p);
 			libxfs_trans_log_inode(tp, ino_p, XFS_ILOG_CORE);
 			err = -libxfs_trans_commit(tp);
 			if (err)
@@ -1128,7 +1128,7 @@ mv_orphanage(
 			if (irec)
 				add_inode_ref(irec, ino_offset);
 			else
-				inc_nlink(VFS_I(orphanage_ip));
+				libxfs_bumplink(tp, orphanage_ip);
 			libxfs_trans_log_inode(tp, orphanage_ip, XFS_ILOG_CORE);
 
 			/*



* [PATCH 3/3] mkfs/repair: pin inodes that would otherwise overflow link count
  2024-07-02  0:49 ` [PATCHSET v30.7 02/16] xfsprogs: inode-related repair fixes Darrick J. Wong
  2024-07-02  0:56   ` [PATCH 1/3] xfs_{db,repair}: add an explicit owner field to xfs_da_args Darrick J. Wong
  2024-07-02  0:57   ` [PATCH 2/3] libxfs: port the bumplink function from the kernel Darrick J. Wong
@ 2024-07-02  0:57   ` Darrick J. Wong
  2024-07-02  5:17     ` Christoph Hellwig
  2 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:57 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Update the userspace utilities so that an integer overflow of an inode's
link count can no longer leave a file that is still referenced by parent
directories with a link count of zero.  Instead of wrapping, the count
now stops at XFS_NLINK_PINNED.
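
Below is a minimal sketch of the saturating increment that this patch adds to libxfs_bumplink() and to xfs_repair's add_inode_ref(). Once a 32-bit count reaches the pinned value, it stops moving, so it can never wrap to zero. NLINK_PINNED is a local stand-in for XFS_NLINK_PINNED. Defining it as ~0U is an assumption made for this sketch, and bump_nlink() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for XFS_NLINK_PINNED; the ~0U value is an assumption. */
#define NLINK_PINNED	(~0U)

/*
 * Increment a link count unless it is already pinned.  A pinned count
 * never changes again, so it cannot overflow back to zero while
 * directories still point at the inode.
 */
void bump_nlink(uint32_t *nlink)
{
	if (*nlink != NLINK_PINNED)
		(*nlink)++;
}
```

In the repair hunk, only the 32-bit case of add_inode_ref() needs this check. The 16-bit case falls through to it after nlink_grow_16_to_32() widens the array.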

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/util.c       |    3 ++-
 repair/incore_ino.c |    3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)


diff --git a/libxfs/util.c b/libxfs/util.c
index dc54e3ee66db..74eea0fcbe07 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -252,7 +252,8 @@ libxfs_bumplink(
 
 	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
 
-	inc_nlink(inode);
+	if (inode->i_nlink != XFS_NLINK_PINNED)
+		inc_nlink(inode);
 
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 }
diff --git a/repair/incore_ino.c b/repair/incore_ino.c
index 0dd7a2f060fb..b0b41a2cc5c5 100644
--- a/repair/incore_ino.c
+++ b/repair/incore_ino.c
@@ -108,7 +108,8 @@ void add_inode_ref(struct ino_tree_node *irec, int ino_offset)
 		nlink_grow_16_to_32(irec);
 		/*FALLTHRU*/
 	case sizeof(uint32_t):
-		irec->ino_un.ex_data->counted_nlinks.un32[ino_offset]++;
+		if (irec->ino_un.ex_data->counted_nlinks.un32[ino_offset] != XFS_NLINK_PINNED)
+			irec->ino_un.ex_data->counted_nlinks.un32[ino_offset]++;
 		break;
 	default:
 		ASSERT(0);



* [PATCH 01/13] xfs_scrub: use proper UChar string iterators
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
@ 2024-07-02  0:57   ` Darrick J. Wong
  2024-07-02  5:18     ` Christoph Hellwig
  2024-07-02  0:57   ` [PATCH 02/13] xfs_scrub: hoist code that removes ignorable characters Darrick J. Wong
                     ` (11 subsequent siblings)
  12 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:57 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

In code that examines UChar strings, walk them with libicu's string
iterators instead of the open-coded U16_NEXT* macros, which perform no
typechecking.
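
The C sketch below is ICU-free and shows what a bounds-checked UTF-16 cursor buys you. It is not libicu's UCharIterator. The names u16_iter, u16_iter_next() and ITER_DONE are hypothetical, and ITER_DONE plays the role of U_SENTINEL. The point is that a surrogate pair is combined only when both halves lie inside the buffer, which U16_NEXT_UNSAFE does not check.

```c
#include <assert.h>
#include <stdint.h>

#define ITER_DONE	(-1)	/* hypothetical end marker, like U_SENTINEL */

/* Hypothetical bounds-checked cursor over a UTF-16 buffer. */
struct u16_iter {
	const uint16_t	*s;
	int32_t		len;
	int32_t		pos;
};

void u16_iter_init(struct u16_iter *it, const uint16_t *s, int32_t len)
{
	it->s = s;
	it->len = len;
	it->pos = 0;
}

/*
 * Return the next code point, or ITER_DONE at the end of the buffer.
 * A lead surrogate is combined with the following unit only if that unit
 * exists and is a trail surrogate; otherwise it is returned unchanged.
 */
int32_t u16_iter_next(struct u16_iter *it)
{
	uint16_t	c, c2;

	if (it->pos >= it->len)
		return ITER_DONE;
	c = it->s[it->pos++];
	if (c >= 0xD800 && c <= 0xDBFF && it->pos < it->len) {
		c2 = it->s[it->pos];
		if (c2 >= 0xDC00 && c2 <= 0xDFFF) {
			it->pos++;
			return 0x10000 + (((int32_t)c - 0xD800) << 10) +
			       ((int32_t)c2 - 0xDC00);
		}
	}
	return c;
}
```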

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/unicrash.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index dd30164354e3..02a1b94efb4d 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -330,13 +330,12 @@ name_entry_examine(
 	struct name_entry	*entry,
 	unsigned int		*badflags)
 {
+	UCharIterator		uiter;
 	UChar32			uchr;
-	int32_t			i;
 	uint8_t			mask = 0;
 
-	for (i = 0; i < entry->normstrlen;) {
-		U16_NEXT_UNSAFE(entry->normstr, i, uchr);
-
+	uiter_setString(&uiter, entry->normstr, entry->normstrlen);
+	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
 		/* zero width character sequences */
 		switch (uchr) {
 		case 0x200B:	/* zero width space */



* [PATCH 02/13] xfs_scrub: hoist code that removes ignorable characters
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
  2024-07-02  0:57   ` [PATCH 01/13] xfs_scrub: use proper UChar string iterators Darrick J. Wong
@ 2024-07-02  0:57   ` Darrick J. Wong
  2024-07-02  5:21     ` Christoph Hellwig
  2024-07-02  0:58   ` [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points Darrick J. Wong
                     ` (10 subsequent siblings)
  12 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:57 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Hoist the loop that removes "ignorable" code points from the skeleton
string into a separate function and give the UChar cursors names that
are easier to understand.  Convert the code to use the safe versions of
the U16_ accessor functions.
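
You can sketch the in-place removal without libicu. This version works on a UTF-32 array to avoid handling surrogates. It also replaces the patch's loop, which calls memmove() once per removal, with a single pass that uses separate read and write cursors. is_ignorable_sketch() is a hypothetical stand-in for u_isIDIgnorable() and covers only a handful of format characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for u_isIDIgnorable(); far from the full set. */
int is_ignorable_sketch(uint32_t c)
{
	switch (c) {
	case 0x00AD:	/* soft hyphen */
	case 0x200B:	/* zero width space */
	case 0x200C:	/* zero width non-joiner */
	case 0x200D:	/* zero width joiner */
	case 0x2060:	/* word joiner */
	case 0xFEFF:	/* zero width no-break space */
		return 1;
	}
	return 0;
}

/*
 * Drop ignorable code points from a UTF-32 buffer in place and return
 * the new length.  Kept characters are copied down to the write cursor,
 * so each element is visited exactly once.
 */
size_t remove_ignorable_sketch(uint32_t *s, size_t len)
{
	size_t	src, dest = 0;

	for (src = 0; src < len; src++) {
		if (!is_ignorable_sketch(s[src]))
			s[dest++] = s[src];
	}
	return dest;
}
```

Unlike remove_ignorable() in the patch, which also moves the terminating NUL, this sketch only returns the new length.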

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/unicrash.c |   39 ++++++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 13 deletions(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 02a1b94efb4d..96e20114c484 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -145,6 +145,31 @@ is_utf8_locale(void)
 	return answer;
 }
 
+/*
+ * Remove control/formatting characters from this string and return its new
+ * length.  UChar32 is required for U16_NEXT, despite the name.
+ */
+static int32_t
+remove_ignorable(
+	UChar		*ustr,
+	int32_t		ustrlen)
+{
+	UChar32		uchr;
+	int32_t		src, dest;
+
+	for (src = 0, dest = 0; src < ustrlen; dest = src) {
+		U16_NEXT(ustr, src, ustrlen, uchr);
+		if (!u_isIDIgnorable(uchr))
+			continue;
+		memmove(&ustr[dest], &ustr[src],
+				(ustrlen - src + 1) * sizeof(UChar));
+		ustrlen -= (src - dest);
+		src = dest;
+	}
+
+	return dest;
+}
+
 /*
  * Generate normalized form and skeleton of the name.  If this fails, just
  * forget everything and return false; this is an advisory checker.
@@ -160,9 +185,6 @@ name_entry_compute_checknames(
 	int32_t			normstrlen;
 	int32_t			unistrlen;
 	int32_t			skelstrlen;
-	UChar32			uchr;
-	int32_t			i, j;
-
 	UErrorCode		uerr = U_ZERO_ERROR;
 
 	/* Convert bytestr to unistr for normalization */
@@ -206,16 +228,7 @@ name_entry_compute_checknames(
 	if (U_FAILURE(uerr))
 		goto out_skelstr;
 
-	/* Remove control/formatting characters from skeleton. */
-	for (i = 0, j = 0; i < skelstrlen; j = i) {
-		U16_NEXT_UNSAFE(skelstr, i, uchr);
-		if (!u_isIDIgnorable(uchr))
-			continue;
-		memmove(&skelstr[j], &skelstr[i],
-				(skelstrlen - i + 1) * sizeof(UChar));
-		skelstrlen -= (i - j);
-		i = j;
-	}
+	skelstrlen = remove_ignorable(skelstr, skelstrlen);
 
 	entry->skelstr = skelstr;
 	entry->skelstrlen = skelstrlen;



* [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
  2024-07-02  0:57   ` [PATCH 01/13] xfs_scrub: use proper UChar string iterators Darrick J. Wong
  2024-07-02  0:57   ` [PATCH 02/13] xfs_scrub: hoist code that removes ignorable characters Darrick J. Wong
@ 2024-07-02  0:58   ` Darrick J. Wong
  2024-07-02  5:22     ` Christoph Hellwig
  2024-07-02  0:58   ` [PATCH 04/13] xfs_scrub: avoid potential UAF after freeing a duplicate name entry Darrick J. Wong
                     ` (9 subsequent siblings)
  12 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:58 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

I missed a few non-rendering code points in the "zero width"
classification code.  Add them now, and sort the list.

$ wget https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
$ grep -E '(zero width|invisible|joiner|application)' -i UnicodeData.txt
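
Below is a minimal C sketch of a check against the sorted list of code points from this patch. Instead of the patch's switch statement, it runs bsearch() over a sorted array, purely to illustrate one thing a sorted list makes possible. The function names are hypothetical, and the input is assumed to be already decoded to UTF-32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The code points listed in this patch, in ascending order. */
static const uint32_t nonrendering[] = {
	0x034F,	/* combining grapheme joiner */
	0x200B,	/* zero width space */
	0x200C,	/* zero width non-joiner */
	0x200D,	/* zero width joiner */
	0x2060,	/* word joiner */
	0x2061,	/* function application */
	0x2062,	/* invisible times */
	0x2063,	/* invisible separator */
	0x2064,	/* invisible plus */
	0x2D7F,	/* tifinagh consonant joiner */
	0xFEFF,	/* zero width no-break space */
};

static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* Binary search works only because the table is sorted. */
bool is_nonrendering_sketch(uint32_t c)
{
	return bsearch(&c, nonrendering,
		       sizeof(nonrendering) / sizeof(nonrendering[0]),
		       sizeof(nonrendering[0]), cmp_u32) != NULL;
}

/* True if any code point in a UTF-32 name would not be drawn. */
bool name_has_invisible(const uint32_t *s, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (is_nonrendering_sketch(s[i]))
			return true;
	return false;
}
```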

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/unicrash.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 96e20114c484..fc1adb2caab7 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -351,15 +351,17 @@ name_entry_examine(
 	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
 		/* zero width character sequences */
 		switch (uchr) {
+		case 0x034F:	/* combining grapheme joiner */
 		case 0x200B:	/* zero width space */
 		case 0x200C:	/* zero width non-joiner */
 		case 0x200D:	/* zero width joiner */
-		case 0xFEFF:	/* zero width non breaking space */
 		case 0x2060:	/* word joiner */
 		case 0x2061:	/* function application */
 		case 0x2062:	/* invisible times (multiply) */
 		case 0x2063:	/* invisible separator (comma) */
 		case 0x2064:	/* invisible plus (addition) */
+		case 0x2D7F:	/* tifinagh consonant joiner */
+		case 0xFEFF:	/* zero width non breaking space */
 			*badflags |= UNICRASH_ZERO_WIDTH;
 			break;
 		}



* [PATCH 04/13] xfs_scrub: avoid potential UAF after freeing a duplicate name entry
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-07-02  0:58   ` [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points Darrick J. Wong
@ 2024-07-02  0:58   ` Darrick J. Wong
  2024-07-02  5:22     ` Christoph Hellwig
  2024-07-02  0:58   ` [PATCH 05/13] xfs_scrub: guard against libicu returning negative buffer lengths Darrick J. Wong
                     ` (8 subsequent siblings)
  12 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:58 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Change the signature of unicrash_add so that it sets the caller's
@new_entry pointer to NULL if it detects an updated name entry and does
not wish to continue processing.  This avoids a theoretical use-after-free
(UAF) if the unicrash_add caller were to accidentally keep using the pointer.

This isn't an /actual/ UAF because the function formerly set @badflags
to zero, but let's be a little defensive.
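
The defensive pattern is to pass a pointer to the caller's pointer, so that the callee can clear it after freeing the object. Below is a minimal sketch. The names entry_sketch and add_or_discard are hypothetical stand-ins for name_entry and unicrash_add().

```c
#include <assert.h>
#include <stdlib.h>

struct entry_sketch {
	int	value;
};

/*
 * When the new entry turns out to be redundant, free it and clear the
 * caller's pointer, so a later mistake becomes a NULL dereference that
 * fails loudly instead of a silent use-after-free.
 */
void add_or_discard(struct entry_sketch **entryp, int is_duplicate)
{
	if (is_duplicate) {
		free(*entryp);
		*entryp = NULL;
	}
}
```

The caller then tests the pointer before using it, as __unicrash_check_name() does with "if (new_entry && badflags)".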

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/unicrash.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index fc1adb2caab7..5a61d69705bd 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -626,10 +626,11 @@ _("Unicode name \"%s\" in %s could be confused with \"%s\"."),
 static void
 unicrash_add(
 	struct unicrash		*uc,
-	struct name_entry	*new_entry,
+	struct name_entry	**new_entryp,
 	unsigned int		*badflags,
 	struct name_entry	**existing_entry)
 {
+	struct name_entry	*new_entry = *new_entryp;
 	struct name_entry	*entry;
 	size_t			bucket;
 	xfs_dahash_t		hash;
@@ -652,7 +653,7 @@ unicrash_add(
 			entry->ino = new_entry->ino;
 			uc->buckets[bucket] = new_entry->next;
 			name_entry_free(new_entry);
-			*badflags = 0;
+			*new_entryp = NULL;
 			return;
 		}
 
@@ -695,8 +696,8 @@ __unicrash_check_name(
 		return 0;
 
 	name_entry_examine(new_entry, &badflags);
-	unicrash_add(uc, new_entry, &badflags, &dup_entry);
-	if (badflags)
+	unicrash_add(uc, &new_entry, &badflags, &dup_entry);
+	if (new_entry && badflags)
 		unicrash_complain(uc, dsc, namedescr, new_entry, badflags,
 				dup_entry);
 



* [PATCH 05/13] xfs_scrub: guard against libicu returning negative buffer lengths
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-07-02  0:58   ` [PATCH 04/13] xfs_scrub: avoid potential UAF after freeing a duplicate name entry Darrick J. Wong
@ 2024-07-02  0:58   ` Darrick J. Wong
  2024-07-02  5:22     ` Christoph Hellwig
  2024-07-02  0:58   ` [PATCH 06/13] xfs_scrub: hoist non-rendering character predicate Darrick J. Wong
                     ` (7 subsequent siblings)
  12 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:58 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

The libicu functions u_strFromUTF8, unorm2_normalize, and
uspoof_getSkeleton return int32_t values.  Guard against negative return
values, even though the library itself never does this.
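
These calls follow ICU's "preflighting" convention. You call once with a NULL buffer and zero capacity, expect U_BUFFER_OVERFLOW_ERROR plus the required length, and then allocate and call again. A negative length matters because the code passes len + 1 to calloc(), which would convert a negative value to an unexpected size_t. The sketch below reproduces the pattern without libicu. sketch_convert() and its error enum are hypothetical stand-ins with an ICU-like contract; here the converter merely copies a string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum sketch_err { SK_OK = 0, SK_BUFFER_OVERFLOW = 1 };

/*
 * Hypothetical converter with an ICU-like contract: returns the length
 * it needs (or wrote) and reports SK_BUFFER_OVERFLOW if @cap is too small.
 */
int32_t sketch_convert(char *dst, int32_t cap, const char *src,
		       enum sketch_err *err)
{
	int32_t	need = (int32_t)strlen(src);

	if (need > cap) {
		*err = SK_BUFFER_OVERFLOW;
		return need;
	}
	if (need > 0)
		memcpy(dst, src, need);
	*err = SK_OK;
	return need;
}

/*
 * Preflight with a NULL buffer, reject a negative length before doing any
 * arithmetic with it, then allocate (zero-filled, so NUL-terminated) and
 * convert for real.  Returns a malloc'd copy or NULL.
 */
char *convert_alloc(const char *src)
{
	enum sketch_err	err = SK_OK;
	int32_t		len;
	char		*buf;

	len = sketch_convert(NULL, 0, src, &err);
	if (err != SK_BUFFER_OVERFLOW || len < 0)
		return NULL;
	buf = calloc((size_t)len + 1, 1);
	if (!buf)
		return NULL;
	sketch_convert(buf, len, src, &err);
	if (err != SK_OK) {
		free(buf);
		return NULL;
	}
	return buf;
}
```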

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/unicrash.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 5a61d69705bd..1c0597e52f76 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -189,7 +189,7 @@ name_entry_compute_checknames(
 
 	/* Convert bytestr to unistr for normalization */
 	u_strFromUTF8(NULL, 0, &unistrlen, entry->name, entry->namelen, &uerr);
-	if (uerr != U_BUFFER_OVERFLOW_ERROR)
+	if (uerr != U_BUFFER_OVERFLOW_ERROR || unistrlen < 0)
 		return false;
 	uerr = U_ZERO_ERROR;
 	unistr = calloc(unistrlen + 1, sizeof(UChar));
@@ -203,7 +203,7 @@ name_entry_compute_checknames(
 	/* Normalize the string. */
 	normstrlen = unorm2_normalize(uc->normalizer, unistr, unistrlen, NULL,
 			0, &uerr);
-	if (uerr != U_BUFFER_OVERFLOW_ERROR)
+	if (uerr != U_BUFFER_OVERFLOW_ERROR || normstrlen < 0)
 		goto out_unistr;
 	uerr = U_ZERO_ERROR;
 	normstr = calloc(normstrlen + 1, sizeof(UChar));
@@ -217,7 +217,7 @@ name_entry_compute_checknames(
 	/* Compute skeleton. */
 	skelstrlen = uspoof_getSkeleton(uc->spoof, 0, unistr, unistrlen, NULL,
 			0, &uerr);
-	if (uerr != U_BUFFER_OVERFLOW_ERROR)
+	if (uerr != U_BUFFER_OVERFLOW_ERROR || skelstrlen < 0)
 		goto out_normstr;
 	uerr = U_ZERO_ERROR;
 	skelstr = calloc(skelstrlen + 1, sizeof(UChar));



* [PATCH 06/13] xfs_scrub: hoist non-rendering character predicate
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-07-02  0:58   ` [PATCH 05/13] xfs_scrub: guard against libicu returning negative buffer lengths Darrick J. Wong
@ 2024-07-02  0:58   ` Darrick J. Wong
  2024-07-02  5:23     ` Christoph Hellwig
  2024-07-02  0:59   ` [PATCH 07/13] xfs_scrub: store bad flags with the name entry Darrick J. Wong
                     ` (6 subsequent siblings)
  12 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:58 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Hoist this predicate code into its own function; we're going to use it
elsewhere later on.  While we're at it, document how we generated this
list in the first place.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/unicrash.c |   45 ++++++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 15 deletions(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 1c0597e52f76..385e42c6acc9 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -170,6 +170,34 @@ remove_ignorable(
 	return dest;
 }
 
+/*
+ * Certain unicode codepoints are formatting hints that are not themselves
+ * supposed to be rendered by a display system.  These codepoints can be
+ * encoded in file names to try to confuse users.
+ *
+ * Download https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt and
+ * $ grep -E '(zero width|invisible|joiner|application)' -i UnicodeData.txt
+ */
+static inline bool is_nonrendering(UChar32 uchr)
+{
+	switch (uchr) {
+	case 0x034F:	/* combining grapheme joiner */
+	case 0x200B:	/* zero width space */
+	case 0x200C:	/* zero width non-joiner */
+	case 0x200D:	/* zero width joiner */
+	case 0x2060:	/* word joiner */
+	case 0x2061:	/* function application */
+	case 0x2062:	/* invisible times (multiply) */
+	case 0x2063:	/* invisible separator (comma) */
+	case 0x2064:	/* invisible plus (addition) */
+	case 0x2D7F:	/* tifinagh consonant joiner */
+	case 0xFEFF:	/* zero width non breaking space */
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * Generate normalized form and skeleton of the name.  If this fails, just
  * forget everything and return false; this is an advisory checker.
@@ -349,22 +377,9 @@ name_entry_examine(
 
 	uiter_setString(&uiter, entry->normstr, entry->normstrlen);
 	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
-		/* zero width character sequences */
-		switch (uchr) {
-		case 0x034F:	/* combining grapheme joiner */
-		case 0x200B:	/* zero width space */
-		case 0x200C:	/* zero width non-joiner */
-		case 0x200D:	/* zero width joiner */
-		case 0x2060:	/* word joiner */
-		case 0x2061:	/* function application */
-		case 0x2062:	/* invisible times (multiply) */
-		case 0x2063:	/* invisible separator (comma) */
-		case 0x2064:	/* invisible plus (addition) */
-		case 0x2D7F:	/* tifinagh consonant joiner */
-		case 0xFEFF:	/* zero width non breaking space */
+		/* characters are invisible */
+		if (is_nonrendering(uchr))
 			*badflags |= UNICRASH_ZERO_WIDTH;
-			break;
-		}
 
 		/* control characters */
 		if (u_iscntrl(uchr))



* [PATCH 07/13] xfs_scrub: store bad flags with the name entry
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-07-02  0:58   ` [PATCH 06/13] xfs_scrub: hoist non-rendering character predicate Darrick J. Wong
@ 2024-07-02  0:59   ` Darrick J. Wong
  2024-07-02  5:23     ` Christoph Hellwig
  2024-07-02  0:59   ` [PATCH 08/13] xfs_scrub: rename UNICRASH_ZERO_WIDTH to UNICRASH_INVISIBLE Darrick J. Wong
                     ` (5 subsequent siblings)
  12 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:59 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

When scrub checks Unicode names, there are certain properties of the
directory/attribute/label name itself that it can complain about.
Store these in struct name_entry so that the confusable names detector
can pick them up later.

This restructuring enables a subsequent patch to detect suspicious
sequences in the NFC normalized form of the name without needing to hang
on to that NFC form until the end of processing.  IOWs, it's a memory
usage optimization.
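
Here is a rough sketch of the restructuring. The flags are computed once, when the entry is created, and stored alongside it, so the string they came from does not need to be examined again later. sk_direction() is a deliberately crude, hypothetical replacement for u_charDirection(). It treats ASCII letters as left-to-right, Hebrew letters as right-to-left, and U+202D/U+202E as overrides. The SK_* flag names are not the real UNICRASH_* values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, loosely modelled on the UNICRASH_* flags. */
#define SK_BIDI_OVERRIDE	(1U << 0)
#define SK_BIDI_MIXED		(1U << 1)

enum sk_dir { SK_OTHER, SK_LTR, SK_RTL, SK_OVERRIDE };

/* Crude direction classifier; a stand-in for u_charDirection(). */
enum sk_dir sk_direction(uint32_t c)
{
	if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
		return SK_LTR;
	if (c >= 0x05D0 && c <= 0x05EA)		/* Hebrew letters */
		return SK_RTL;
	if (c == 0x202D || c == 0x202E)		/* LRO, RLO */
		return SK_OVERRIDE;
	return SK_OTHER;
}

struct sk_entry {
	unsigned int	badflags;	/* computed once, at creation */
	size_t		len;
	const uint32_t	*name;
};

/* Examine the name once and remember the verdict in the entry. */
void sk_entry_init(struct sk_entry *e, const uint32_t *name, size_t len)
{
	unsigned int	mask = 0;

	e->name = name;
	e->len = len;
	e->badflags = 0;
	for (size_t i = 0; i < len; i++) {
		switch (sk_direction(name[i])) {
		case SK_LTR:
			mask |= 0x1;
			break;
		case SK_RTL:
			mask |= 0x2;
			break;
		case SK_OVERRIDE:
			e->badflags |= SK_BIDI_OVERRIDE;
			break;
		default:
			break;
		}
	}
	/* both directions present: mixed-direction name */
	if (mask == 0x3)
		e->badflags |= SK_BIDI_MIXED;
}
```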

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/unicrash.c |  122 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 64 insertions(+), 58 deletions(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 385e42c6acc9..a770d0d7aae4 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -69,6 +69,9 @@ struct name_entry {
 
 	xfs_ino_t		ino;
 
+	/* Everything that we don't like about this name. */
+	unsigned int		badflags;
+
 	/* Raw dirent name */
 	size_t			namelen;
 	char			name[0];
@@ -274,6 +277,55 @@ name_entry_compute_checknames(
 	return false;
 }
 
+/*
+ * Check a name for suspicious elements that have appeared in filename
+ * spoofing attacks.  This includes names that mix directions or contain
+ * direction override control characters, both of which have appeared in
+ * filename spoofing attacks.
+ */
+static unsigned int
+name_entry_examine(
+	const struct name_entry	*entry)
+{
+	UCharIterator		uiter;
+	UChar32			uchr;
+	uint8_t			mask = 0;
+	unsigned int		ret = 0;
+
+	uiter_setString(&uiter, entry->normstr, entry->normstrlen);
+	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
+		/* characters are invisible */
+		if (is_nonrendering(uchr))
+			ret |= UNICRASH_ZERO_WIDTH;
+
+		/* control characters */
+		if (u_iscntrl(uchr))
+			ret |= UNICRASH_CONTROL_CHAR;
+
+		switch (u_charDirection(uchr)) {
+		case U_LEFT_TO_RIGHT:
+			mask |= 0x01;
+			break;
+		case U_RIGHT_TO_LEFT:
+			mask |= 0x02;
+			break;
+		case U_RIGHT_TO_LEFT_OVERRIDE:
+			ret |= UNICRASH_BIDI_OVERRIDE;
+			break;
+		case U_LEFT_TO_RIGHT_OVERRIDE:
+			ret |= UNICRASH_BIDI_OVERRIDE;
+			break;
+		default:
+			break;
+		}
+	}
+
+	/* mixing left-to-right and right-to-left chars */
+	if (mask == 0x3)
+		ret |= UNICRASH_BIDI_MIXED;
+	return ret;
+}
+
 /* Create a new name entry, returns false if we could not succeed. */
 static bool
 name_entry_create(
@@ -299,6 +351,7 @@ name_entry_create(
 	if (!name_entry_compute_checknames(uc, new_entry))
 		goto out;
 
+	new_entry->badflags = name_entry_examine(new_entry);
 	*entry = new_entry;
 	return true;
 
@@ -360,54 +413,6 @@ name_entry_hash(
 	}
 }
 
-/*
- * Check a name for suspicious elements that have appeared in filename
- * spoofing attacks.  This includes names that mixed directions or contain
- * direction overrides control characters, both of which have appeared in
- * filename spoofing attacks.
- */
-static void
-name_entry_examine(
-	struct name_entry	*entry,
-	unsigned int		*badflags)
-{
-	UCharIterator		uiter;
-	UChar32			uchr;
-	uint8_t			mask = 0;
-
-	uiter_setString(&uiter, entry->normstr, entry->normstrlen);
-	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
-		/* characters are invisible */
-		if (is_nonrendering(uchr))
-			*badflags |= UNICRASH_ZERO_WIDTH;
-
-		/* control characters */
-		if (u_iscntrl(uchr))
-			*badflags |= UNICRASH_CONTROL_CHAR;
-
-		switch (u_charDirection(uchr)) {
-		case U_LEFT_TO_RIGHT:
-			mask |= 0x01;
-			break;
-		case U_RIGHT_TO_LEFT:
-			mask |= 0x02;
-			break;
-		case U_RIGHT_TO_LEFT_OVERRIDE:
-			*badflags |= UNICRASH_BIDI_OVERRIDE;
-			break;
-		case U_LEFT_TO_RIGHT_OVERRIDE:
-			*badflags |= UNICRASH_BIDI_OVERRIDE;
-			break;
-		default:
-			break;
-		}
-	}
-
-	/* mixing left-to-right and right-to-left chars */
-	if (mask == 0x3)
-		*badflags |= UNICRASH_BIDI_MIXED;
-}
-
 /* Initialize the collision detector. */
 static int
 unicrash_init(
@@ -638,17 +643,17 @@ _("Unicode name \"%s\" in %s could be confused with \"%s\"."),
  * must be skeletonized according to Unicode TR39 to detect names that
  * could be visually confused with each other.
  */
-static void
+static unsigned int
 unicrash_add(
 	struct unicrash		*uc,
 	struct name_entry	**new_entryp,
-	unsigned int		*badflags,
 	struct name_entry	**existing_entry)
 {
 	struct name_entry	*new_entry = *new_entryp;
 	struct name_entry	*entry;
 	size_t			bucket;
 	xfs_dahash_t		hash;
+	unsigned int		badflags = new_entry->badflags;
 
 	/* Store name in hashtable. */
 	hash = name_entry_hash(new_entry);
@@ -669,28 +674,30 @@ unicrash_add(
 			uc->buckets[bucket] = new_entry->next;
 			name_entry_free(new_entry);
 			*new_entryp = NULL;
-			return;
+			return 0;
 		}
 
 		/* Same normalization? */
 		if (new_entry->normstrlen == entry->normstrlen &&
 		    !u_strcmp(new_entry->normstr, entry->normstr) &&
 		    (uc->compare_ino ? entry->ino != new_entry->ino : true)) {
-			*badflags |= UNICRASH_NOT_UNIQUE;
+			badflags |= UNICRASH_NOT_UNIQUE;
 			*existing_entry = entry;
-			return;
+			break;
 		}
 
 		/* Confusable? */
 		if (new_entry->skelstrlen == entry->skelstrlen &&
 		    !u_strcmp(new_entry->skelstr, entry->skelstr) &&
 		    (uc->compare_ino ? entry->ino != new_entry->ino : true)) {
-			*badflags |= UNICRASH_CONFUSABLE;
+			badflags |= UNICRASH_CONFUSABLE;
 			*existing_entry = entry;
-			return;
+			break;
 		}
 		entry = entry->next;
 	}
+
+	return badflags;
 }
 
 /* Check a name for unicode normalization problems or collisions. */
@@ -704,14 +711,13 @@ __unicrash_check_name(
 {
 	struct name_entry	*dup_entry = NULL;
 	struct name_entry	*new_entry = NULL;
-	unsigned int		badflags = 0;
+	unsigned int		badflags;
 
 	/* If we can't create entry data, just skip it. */
 	if (!name_entry_create(uc, name, ino, &new_entry))
 		return 0;
 
-	name_entry_examine(new_entry, &badflags);
-	unicrash_add(uc, &new_entry, &badflags, &dup_entry);
+	badflags = unicrash_add(uc, &new_entry, &dup_entry);
 	if (new_entry && badflags)
 		unicrash_complain(uc, dsc, namedescr, new_entry, badflags,
 				dup_entry);


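In the reworked name_entry_examine(), bidi detection keeps a two-bit mask. Bit 0 is set for any left-to-right character and bit 1 for any right-to-left character. The name is flagged as mixed when both bits are set, and the result is returned rather than OR-ed into a caller's variable.

The sketch below shows that idea using only the C standard library. The hypothetical `classify_direction()` stands in for ICU's `u_charDirection()`. It only recognizes ASCII letters and the Hebrew letter block (U+05D0 to U+05EA), which is an assumption made to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value in the style of the UNICRASH_* constants. */
#define SKETCH_BIDI_MIXED	(1U << 2)

enum sketch_dir { DIR_OTHER, DIR_LTR, DIR_RTL };

/* Hypothetical stand-in for u_charDirection(): ASCII letters and Hebrew only. */
static enum sketch_dir classify_direction(uint32_t cp)
{
	if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z'))
		return DIR_LTR;
	if (cp >= 0x05D0 && cp <= 0x05EA)	/* Hebrew letters */
		return DIR_RTL;
	return DIR_OTHER;
}

/* Return the flags for a name instead of writing through a pointer. */
static unsigned int examine_bidi(const uint32_t *cps, size_t len)
{
	uint8_t		mask = 0;
	size_t		i;

	for (i = 0; i < len; i++) {
		switch (classify_direction(cps[i])) {
		case DIR_LTR:
			mask |= 0x01;
			break;
		case DIR_RTL:
			mask |= 0x02;
			break;
		default:
			break;
		}
	}

	/* Both directions seen: left-to-right and right-to-left are mixed. */
	return mask == 0x3 ? SKETCH_BIDI_MIXED : 0;
}
```

Returning the flags keeps the per-name analysis separate from the hash-table collision checks in unicrash_add(). Those checks then only OR in NOT_UNIQUE or CONFUSABLE.
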
^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 08/13] xfs_scrub: rename UNICRASH_ZERO_WIDTH to UNICRASH_INVISIBLE
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-07-02  0:59   ` [PATCH 07/13] xfs_scrub: store bad flags with the name entry Darrick J. Wong
@ 2024-07-02  0:59   ` Darrick J. Wong
  2024-07-02  5:23     ` Christoph Hellwig
  2024-07-02  0:59   ` [PATCH 09/13] xfs_scrub: type-coerce the UNICRASH_* flags Darrick J. Wong
                     ` (4 subsequent siblings)
  12 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:59 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

"Zero width" doesn't fully describe what the flag represents -- it gets
set for any codepoint that doesn't render.  Rename it accordingly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/unicrash.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index a770d0d7aae4..b2baa47ad6ca 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -109,7 +109,7 @@ struct unicrash {
 #define UNICRASH_CONTROL_CHAR	(1 << 3)
 
 /* Invisible characters.  Only a problem if we have collisions. */
-#define UNICRASH_ZERO_WIDTH	(1 << 4)
+#define UNICRASH_INVISIBLE	(1 << 4)
 
 /* Multiple names resolve to the same skeleton string. */
 #define UNICRASH_CONFUSABLE	(1 << 5)
@@ -296,7 +296,7 @@ name_entry_examine(
 	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
 	/* characters are invisible */
 		if (is_nonrendering(uchr))
-			ret |= UNICRASH_ZERO_WIDTH;
+			ret |= UNICRASH_INVISIBLE;
 
 		/* control characters */
 		if (u_iscntrl(uchr))
@@ -580,7 +580,7 @@ _("Unicode name \"%s\" in %s renders identically to \"%s\"."),
 	 * confused with another name as a result, we should complain.
 	 * "moo<zerowidthspace>cow" and "moocow" are misleading.
 	 */
-	if ((badflags & UNICRASH_ZERO_WIDTH) &&
+	if ((badflags & UNICRASH_INVISIBLE) &&
 	    (badflags & UNICRASH_CONFUSABLE)) {
 		str_warn(uc->ctx, descr_render(dsc),
 _("Unicode name \"%s\" in %s could be confused with '%s' due to invisible characters."),


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 09/13] xfs_scrub: type-coerce the UNICRASH_* flags
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-07-02  0:59   ` [PATCH 08/13] xfs_scrub: rename UNICRASH_ZERO_WIDTH to UNICRASH_INVISIBLE Darrick J. Wong
@ 2024-07-02  0:59   ` Darrick J. Wong
  2024-07-02  5:24     ` Christoph Hellwig
  2024-07-02  0:59   ` [PATCH 10/13] xfs_scrub: reduce size of struct name_entry Darrick J. Wong
                     ` (3 subsequent siblings)
  12 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:59 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Promote this type to something that we can type-check.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/unicrash.c |   30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index b2baa47ad6ca..25f562b0a36f 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -4,6 +4,7 @@
  * Author: Darrick J. Wong <djwong@kernel.org>
  */
 #include "xfs.h"
+#include "xfs_arch.h"
 #include <stdint.h>
 #include <stdlib.h>
 #include <dirent.h>
@@ -56,6 +57,8 @@
  * In other words, skel = remove_invisible(nfd(remap_confusables(nfd(name)))).
  */
 
+typedef unsigned int __bitwise	badname_t;
+
 struct name_entry {
 	struct name_entry	*next;
 
@@ -70,7 +73,7 @@ struct name_entry {
 	xfs_ino_t		ino;
 
 	/* Everything that we don't like about this name. */
-	unsigned int		badflags;
+	badname_t		badflags;
 
 	/* Raw dirent name */
 	size_t			namelen;
@@ -93,26 +96,29 @@ struct unicrash {
 
 /* Things to complain about in Unicode naming. */
 
+/* Everything is ok */
+#define UNICRASH_OK		((__force badname_t)0)
+
 /*
  * Multiple names resolve to the same normalized string and therefore render
  * identically.
  */
-#define UNICRASH_NOT_UNIQUE	(1 << 0)
+#define UNICRASH_NOT_UNIQUE	((__force badname_t)(1U << 0))
 
 /* Name contains directional overrides. */
-#define UNICRASH_BIDI_OVERRIDE	(1 << 1)
+#define UNICRASH_BIDI_OVERRIDE	((__force badname_t)(1U << 1))
 
 /* Name mixes left-to-right and right-to-left characters. */
-#define UNICRASH_BIDI_MIXED	(1 << 2)
+#define UNICRASH_BIDI_MIXED	((__force badname_t)(1U << 2))
 
 /* Control characters in name. */
-#define UNICRASH_CONTROL_CHAR	(1 << 3)
+#define UNICRASH_CONTROL_CHAR	((__force badname_t)(1U << 3))
 
 /* Invisible characters.  Only a problem if we have collisions. */
-#define UNICRASH_INVISIBLE	(1 << 4)
+#define UNICRASH_INVISIBLE	((__force badname_t)(1U << 4))
 
 /* Multiple names resolve to the same skeleton string. */
-#define UNICRASH_CONFUSABLE	(1 << 5)
+#define UNICRASH_CONFUSABLE	((__force badname_t)(1U << 5))
 
 /*
  * We only care about validating utf8 collisions if the underlying
@@ -540,7 +546,7 @@ unicrash_complain(
 	struct descr		*dsc,
 	const char		*what,
 	struct name_entry	*entry,
-	unsigned int		badflags,
+	badname_t		badflags,
 	struct name_entry	*dup_entry)
 {
 	char			*bad1 = NULL;
@@ -643,7 +649,7 @@ _("Unicode name \"%s\" in %s could be confused with \"%s\"."),
  * must be skeletonized according to Unicode TR39 to detect names that
  * could be visually confused with each other.
  */
-static unsigned int
+static badname_t
 unicrash_add(
 	struct unicrash		*uc,
 	struct name_entry	**new_entryp,
@@ -653,7 +659,7 @@ unicrash_add(
 	struct name_entry	*entry;
 	size_t			bucket;
 	xfs_dahash_t		hash;
-	unsigned int		badflags = new_entry->badflags;
+	badname_t		badflags = new_entry->badflags;
 
 	/* Store name in hashtable. */
 	hash = name_entry_hash(new_entry);
@@ -711,14 +717,14 @@ __unicrash_check_name(
 {
 	struct name_entry	*dup_entry = NULL;
 	struct name_entry	*new_entry = NULL;
-	unsigned int		badflags;
+	badname_t		badflags;
 
 	/* If we can't create entry data, just skip it. */
 	if (!name_entry_create(uc, name, ino, &new_entry))
 		return 0;
 
 	badflags = unicrash_add(uc, &new_entry, &dup_entry);
-	if (new_entry && badflags)
+	if (new_entry && badflags != UNICRASH_OK)
 		unicrash_complain(uc, dsc, namedescr, new_entry, badflags,
 				dup_entry);
 


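The `__bitwise` and `__force` annotations only mean something to the sparse checker. An ordinary compiler sees nothing, so `badname_t` stays a plain integer at runtime. The patch adds `#include "xfs_arch.h"`, presumably to obtain these annotations.

The macros below are a sketch in the style of the kernel's `__CHECKER__`-guarded definitions, not a copy of that header. They are given `SKETCH_` names so they cannot clash with real definitions.

```c
#include <stdint.h>

/*
 * Sketch of the usual sparse annotations: sparse defines __CHECKER__
 * and understands these attributes; ordinary compilers see nothing.
 */
#ifdef __CHECKER__
# define SKETCH_BITWISE		__attribute__((bitwise))
# define SKETCH_FORCE		__attribute__((force))
#else
# define SKETCH_BITWISE
# define SKETCH_FORCE
#endif

typedef unsigned int SKETCH_BITWISE sketch_badname_t;

/* Constants need a forced cast to enter the restricted type. */
#define SKETCH_OK		((SKETCH_FORCE sketch_badname_t)0)
#define SKETCH_NOT_UNIQUE	((SKETCH_FORCE sketch_badname_t)(1U << 0))
#define SKETCH_CONFUSABLE	((SKETCH_FORCE sketch_badname_t)(1U << 5))

/*
 * OR/AND between values of the same restricted type are fine for sparse;
 * mixing in a bare integer such as "flags & 32" is expected to draw a
 * "restricted type degrades to integer" warning.
 */
static sketch_badname_t add_confusable(sketch_badname_t flags)
{
	return flags | SKETCH_CONFUSABLE;
}

static int is_clean(sketch_badname_t flags)
{
	/* Compare against the typed constant, as the patch does. */
	return flags == SKETCH_OK;
}
```

Running `sparse` over code like this should catch places that treat the flags as ordinary integers. A normal `cc` build compiles it exactly as if the type were `unsigned int`.
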
^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 10/13] xfs_scrub: reduce size of struct name_entry
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-07-02  0:59   ` [PATCH 09/13] xfs_scrub: type-coerce the UNICRASH_* flags Darrick J. Wong
@ 2024-07-02  0:59   ` Darrick J. Wong
  2024-07-02  5:24     ` Christoph Hellwig
  2024-07-02  1:00   ` [PATCH 11/13] xfs_scrub: rename struct unicrash.normalizer Darrick J. Wong
                     ` (2 subsequent siblings)
  12 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  0:59 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

libicu doesn't support processing strings longer than 2GB in length, and
we never feed the unicrash code a name longer than about 300 bytes.
Rearrange the structure to reduce the head structure size from 56 bytes
to 44 bytes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/unicrash.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 25f562b0a36f..dfa798b09b0e 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -57,18 +57,20 @@
  * In other words, skel = remove_invisible(nfd(remap_confusables(nfd(name)))).
  */
 
-typedef unsigned int __bitwise	badname_t;
+typedef uint16_t __bitwise	badname_t;
 
 struct name_entry {
 	struct name_entry	*next;
 
 	/* NFKC normalized name */
 	UChar			*normstr;
-	size_t			normstrlen;
 
 	/* Unicode skeletonized name */
 	UChar			*skelstr;
-	size_t			skelstrlen;
+
+	/* Lengths for normstr and skelstr */
+	int32_t			normstrlen;
+	int32_t			skelstrlen;
 
 	xfs_ino_t		ino;
 
@@ -76,7 +78,7 @@ struct name_entry {
 	badname_t		badflags;
 
 	/* Raw dirent name */
-	size_t			namelen;
+	uint16_t		namelen;
 	char			name[0];
 };
 #define NAME_ENTRY_SZ(nl)	(sizeof(struct name_entry) + 1 + \
@@ -343,6 +345,12 @@ name_entry_create(
 	struct name_entry	*new_entry;
 	size_t			namelen = strlen(name);
 
+	/* should never happen */
+	if (namelen > UINT16_MAX) {
+		ASSERT(namelen <= UINT16_MAX);
+		return false;
+	}
+
 	/* Create new entry */
 	new_entry = calloc(NAME_ENTRY_SZ(namelen), 1);
 	if (!new_entry)


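The saving comes from three changes:

- the three 8-byte `size_t` lengths become narrower types (`int32_t`, which is what ICU's string APIs use for lengths, or `uint16_t`);
- the flags shrink to `uint16_t`;
- the small fields are grouped so they pack together.

The sketch below mirrors the new field order with standard types. It assumes `UChar` is a 16-bit type and `xfs_ino_t` a 64-bit integer. It also repeats the `UINT16_MAX` guard that name_entry_create() gains.

On a typical LP64 target such as x86-64, the name bytes start at offset 44, which appears to be the figure the commit message quotes. `sizeof()` itself rounds up to 48 because of the 8-byte pointer alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed stand-ins: ICU's UChar is 16 bits, xfs_ino_t is 64 bits. */
typedef uint16_t	sketch_uchar;
typedef uint64_t	sketch_ino_t;
typedef uint16_t	sketch_badname_t;

/* Same field order as the reworked struct name_entry. */
struct sketch_name_entry {
	struct sketch_name_entry	*next;
	sketch_uchar			*normstr;
	sketch_uchar			*skelstr;
	int32_t				normstrlen;	/* ICU lengths are int32_t */
	int32_t				skelstrlen;
	sketch_ino_t			ino;
	sketch_badname_t		badflags;
	uint16_t			namelen;
	char				name[];		/* flexible array member */
};

/* Allocate an entry; refuse names whose length would not fit namelen. */
static struct sketch_name_entry *sketch_entry_create(const char *name)
{
	struct sketch_name_entry	*e;
	size_t				namelen = strlen(name);

	if (namelen > UINT16_MAX)
		return NULL;

	e = calloc(1, sizeof(*e) + namelen + 1);
	if (!e)
		return NULL;
	e->namelen = (uint16_t)namelen;
	memcpy(e->name, name, namelen + 1);	/* includes the NUL */
	return e;
}
```
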
^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 11/13] xfs_scrub: rename struct unicrash.normalizer
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-07-02  0:59   ` [PATCH 10/13] xfs_scrub: reduce size of struct name_entry Darrick J. Wong
@ 2024-07-02  1:00   ` Darrick J. Wong
  2024-07-02  5:24     ` Christoph Hellwig
  2024-07-02  1:00   ` [PATCH 12/13] xfs_scrub: report deceptive file extensions Darrick J. Wong
  2024-07-02  1:00   ` [PATCH 13/13] xfs_scrub: dump unicode points Darrick J. Wong
  12 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:00 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

We're about to introduce a second normalizer, so change the name of the
existing one to reflect the algorithm that you'll get if you use it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/unicrash.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index dfa798b09b0e..f6b53276c05d 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -87,7 +87,7 @@ struct name_entry {
 struct unicrash {
 	struct scrub_ctx	*ctx;
 	USpoofChecker		*spoof;
-	const UNormalizer2	*normalizer;
+	const UNormalizer2	*nfkc;
 	bool			compare_ino;
 	bool			is_only_root_writeable;
 	size_t			nr_buckets;
@@ -240,7 +240,7 @@ name_entry_compute_checknames(
 		goto out_unistr;
 
 	/* Normalize the string. */
-	normstrlen = unorm2_normalize(uc->normalizer, unistr, unistrlen, NULL,
+	normstrlen = unorm2_normalize(uc->nfkc, unistr, unistrlen, NULL,
 			0, &uerr);
 	if (uerr != U_BUFFER_OVERFLOW_ERROR || normstrlen < 0)
 		goto out_unistr;
@@ -248,7 +248,7 @@ name_entry_compute_checknames(
 	normstr = calloc(normstrlen + 1, sizeof(UChar));
 	if (!normstr)
 		goto out_unistr;
-	unorm2_normalize(uc->normalizer, unistr, unistrlen, normstr, normstrlen,
+	unorm2_normalize(uc->nfkc, unistr, unistrlen, normstr, normstrlen,
 			&uerr);
 	if (U_FAILURE(uerr))
 		goto out_normstr;
@@ -455,7 +455,7 @@ unicrash_init(
 	p->ctx = ctx;
 	p->nr_buckets = nr_buckets;
 	p->compare_ino = compare_ino;
-	p->normalizer = unorm2_getNFKCInstance(&uerr);
+	p->nfkc = unorm2_getNFKCInstance(&uerr);
 	if (U_FAILURE(uerr))
 		goto out_free;
 	p->spoof = uspoof_open(&uerr);


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 12/13] xfs_scrub: report deceptive file extensions
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-07-02  1:00   ` [PATCH 11/13] xfs_scrub: rename struct unicrash.normalizer Darrick J. Wong
@ 2024-07-02  1:00   ` Darrick J. Wong
  2024-07-02  5:25     ` Christoph Hellwig
  2024-07-02  1:00   ` [PATCH 13/13] xfs_scrub: dump unicode points Darrick J. Wong
  12 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:00 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

In April 2023, ESET revealed that Linux users had been tricked into
opening executables containing malware payloads.  The trickery came in
the form of a malicious zip file containing a filename with the string
"job offer․pdf".  Note that the filename does *not* denote a real pdf
file, since the last four codepoints in the file name are "ONE DOT
LEADER", p, d, and f.  Not period (ok, FULL STOP), p, d, f like you'd
normally expect.

Teach xfs_scrub to look for codepoints that could be confused with a
period followed by alphanumerics.
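
The classifier this patch adds is essentially a small state machine over the code points of a name:

- a period lookalike arms it;
- real periods are ignored;
- any later code point that is neither alphanumeric nor non-rendering disarms it.

The sketch below mirrors that loop on an array of code points using only the C standard library. It assumes ASCII-only alphanumerics as a stand-in for ICU's `u_isalnum()`, and treats only ZERO WIDTH SPACE as non-rendering. It also skips the NFC/NFKC normalization entirely, so it is illustrative rather than equivalent to the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PERIOD	0x2E

/* The code points listed in the patch's is_fullstop_lookalike(). */
static bool sketch_is_fullstop_lookalike(uint32_t cp)
{
	switch (cp) {
	case 0x0701:	/* syriac supralinear full stop */
	case 0x0702:	/* syriac sublinear full stop */
	case 0x2024:	/* one dot leader */
	case 0xA4F8:	/* lisu letter tone mya ti */
	case 0xFE52:	/* small full stop */
	case 0xFF0E:	/* fullwidth full stop */
	case 0xFF61:	/* halfwidth ideographic full stop */
		return true;
	}
	return false;
}

/* Assumption: ASCII alphanumerics only, standing in for u_isalnum(). */
static bool sketch_isalnum(uint32_t cp)
{
	return (cp >= '0' && cp <= '9') || (cp >= 'A' && cp <= 'Z') ||
	       (cp >= 'a' && cp <= 'z');
}

/* Assumption: only ZERO WIDTH SPACE counts as non-rendering here. */
static bool sketch_is_nonrendering(uint32_t cp)
{
	return cp == 0x200B;
}

/* True if a period lookalike is followed only by alnum/period/invisible. */
static bool sketch_phony_extension(const uint32_t *cps, size_t len)
{
	bool	maybe_phony = false;
	size_t	i;

	for (i = 0; i < len; i++) {
		uint32_t cp = cps[i];

		if (sketch_is_fullstop_lookalike(cp))
			maybe_phony = true;		/* arm */
		else if (cp == SKETCH_PERIOD)
			continue;			/* real periods are ignored */
		else if (maybe_phony && !sketch_isalnum(cp) &&
			 !sketch_is_nonrendering(cp))
			maybe_phony = false;		/* disarm */
	}
	return maybe_phony;
}
```

For "job offer\u2024pdf", the ONE DOT LEADER arms the state and "pdf" keeps it armed, so the name is flagged. For "job.pdf", nothing is flagged.
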

Link: https://www.welivesecurity.com/2023/04/20/linux-malware-strengthens-links-lazarus-3cx-supply-chain-attack/
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/unicrash.c |  215 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 214 insertions(+), 1 deletion(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index f6b53276c05d..e895afe32aab 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -88,6 +88,7 @@ struct unicrash {
 	struct scrub_ctx	*ctx;
 	USpoofChecker		*spoof;
 	const UNormalizer2	*nfkc;
+	const UNormalizer2	*nfc;
 	bool			compare_ino;
 	bool			is_only_root_writeable;
 	size_t			nr_buckets;
@@ -122,6 +123,12 @@ struct unicrash {
 /* Multiple names resolve to the same skeleton string. */
 #define UNICRASH_CONFUSABLE	((__force badname_t)(1U << 5))
 
+/* Possible phony file extension. */
+#define UNICRASH_PHONY_EXTENSION ((__force badname_t)(1U << 6))
+
+/* FULL STOP (aka period), 0x2E */
+#define UCHAR_PERIOD		((UChar32)'.')
+
 /*
  * We only care about validating utf8 collisions if the underlying
  * system configuration says we're using utf8.  If the language
@@ -209,6 +216,193 @@ static inline bool is_nonrendering(UChar32 uchr)
 	return false;
 }
 
+/*
+ * Decide if this unicode codepoint looks similar enough to a period (".")
+ * to fool users into thinking that any subsequent alphanumeric sequence is
+ * the file extension.  Most of the fullstop characters do not do this.
+ *
+ * $ grep -i 'full stop' UnicodeData.txt
+ */
+static inline bool is_fullstop_lookalike(UChar32 uchr)
+{
+	switch (uchr) {
+	case 0x0701:	/* syriac supralinear full stop */
+	case 0x0702:	/* syriac sublinear full stop */
+	case 0x2024:	/* one dot leader */
+	case 0xA4F8:	/* lisu letter tone mya ti */
+	case 0xFE52:	/* small full stop */
+	case 0xFF61:	/* halfwidth ideographic full stop */
+	case 0xFF0E:	/* fullwidth full stop */
+		return true;
+	}
+
+	return false;
+}
+
+/* How many UChar do we need to fit a full UChar32 codepoint? */
+#define UCHAR_PER_UCHAR32	2
+
+/* Format this UChar32 into a UChar buffer. */
+static inline int32_t
+uchar32_to_uchar(
+	UChar32		uchr,
+	UChar		*buf)
+{
+	int32_t		i = 0;
+	bool		err = false;
+
+	U16_APPEND(buf, i, UCHAR_PER_UCHAR32, uchr, err);
+	if (err)
+		return 0;
+	return i;
+}
+
+/* Extract a single UChar32 code point from this UChar string. */
+static inline UChar32
+uchar_to_uchar32(
+	UChar		*buf,
+	int32_t		buflen)
+{
+	UChar32		ret;
+	int32_t		i = 0;
+
+	U16_NEXT(buf, i, buflen, ret);
+	return ret;
+}
+
+/*
+ * For characters that are not themselves a full stop (0x2E), let's see if the
+ * compatibility normalization (NFKC) will turn it into a full stop.  If so,
+ * then this could be the start of a phony file extension.
+ */
+static bool
+is_period_lookalike(
+	struct unicrash	*uc,
+	UChar32		uchr)
+{
+	UChar		uchrstr[UCHAR_PER_UCHAR32];
+	UChar		nfkcstr[UCHAR_PER_UCHAR32];
+	int32_t		uchrstrlen, nfkcstrlen;
+	UChar32		nfkc_uchr;
+	UErrorCode	uerr = U_ZERO_ERROR;
+
+	if (uchr == UCHAR_PERIOD)
+		return false;
+
+	uchrstrlen = uchar32_to_uchar(uchr, uchrstr);
+	if (!uchrstrlen)
+		return false;
+
+	/*
+	 * Normalize the UChar string to NFKC form, which does all the
+	 * compatibility transformations.
+	 */
+	nfkcstrlen = unorm2_normalize(uc->nfkc, uchrstr, uchrstrlen, NULL,
+			0, &uerr);
+	if (uerr != U_BUFFER_OVERFLOW_ERROR || nfkcstrlen > UCHAR_PER_UCHAR32)
+		return false;
+
+	uerr = U_ZERO_ERROR;
+	unorm2_normalize(uc->nfkc, uchrstr, uchrstrlen, nfkcstr, nfkcstrlen,
+			&uerr);
+	if (U_FAILURE(uerr))
+		return false;
+
+	nfkc_uchr = uchar_to_uchar32(nfkcstr, nfkcstrlen);
+	return nfkc_uchr == UCHAR_PERIOD;
+}
+
+/*
+ * Detect directory entry names that contain deceptive sequences that look like
+ * file extensions but are not.  This we define as a sequence that begins with
+ * a code point that renders like a period ("full stop" in unicode parlance)
+ * but is not actually a period, followed by any number of alphanumeric code
+ * points or a period, all the way to the end.
+ *
+ * The 3cx attack used a zip file containing an executable file named "job
+ * offer․pdf".  Note that the dot mark in the extension is /not/ a period but
+ * the Unicode codepoint "one dot leader".  The file was also marked executable
+ * inside the zip file, which meant that naïve file explorers could inflate
+ * the file and restore the execute bit.  If a user double-clicked on the file,
+ * the binary would open a decoy pdf while infecting the system.
+ *
+ * For this check, we need to normalize with canonical (and not compatibility)
+ * decomposition, because compatibility mode will turn certain code points
+ * (e.g. one dot leader, 0x2024) into actual periods (0x2e).  The NFC
+ * composition is not needed after this, so we save some memory by keeping this
+ * a separate function from name_entry_examine.
+ */
+static badname_t
+name_entry_phony_extension(
+	struct unicrash	*uc,
+	const UChar	*unistr,
+	int32_t		unistrlen)
+{
+	UCharIterator	uiter;
+	UChar		*nfcstr;
+	int32_t		nfcstrlen;
+	UChar32		uchr;
+	bool		maybe_phony_extension = false;
+	badname_t	ret = UNICRASH_OK;
+	UErrorCode	uerr = U_ZERO_ERROR;
+
+	/* Normalize with NFC. */
+	nfcstrlen = unorm2_normalize(uc->nfc, unistr, unistrlen, NULL,
+			0, &uerr);
+	if (uerr != U_BUFFER_OVERFLOW_ERROR || nfcstrlen < 0)
+		return ret;
+	uerr = U_ZERO_ERROR;
+	nfcstr = calloc(nfcstrlen + 1, sizeof(UChar));
+	if (!nfcstr)
+		return ret;
+	unorm2_normalize(uc->nfc, unistr, unistrlen, nfcstr, nfcstrlen,
+			&uerr);
+	if (U_FAILURE(uerr))
+		goto out_nfcstr;
+
+	/* Examine the NFC normalized string... */
+	uiter_setString(&uiter, nfcstr, nfcstrlen);
+	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
+		/*
+		 * If this *looks* like, but is not, a full stop (0x2E), this
+		 * could be the start of a phony file extension.
+		 */
+		if (is_period_lookalike(uc, uchr)) {
+			maybe_phony_extension = true;
+			continue;
+		}
+
+		if (is_fullstop_lookalike(uchr)) {
+			/*
+			 * The normalizer above should catch most of these
+			 * codepoints that look like periods, but record the
+			 * ones known to have been used in attacks.
+			 */
+			maybe_phony_extension = true;
+		} else if (uchr == UCHAR_PERIOD) {
+			/*
+			 * Due to the propensity of file explorers to obscure
+			 * file extensions in the name of "user friendliness",
+			 * this classifier ignores periods.
+			 */
+		} else {
+			/*
+			 * File extensions (as far as the author knows) tend
+			 * only to use ascii alphanumerics.
+			 */
+			if (maybe_phony_extension &&
+			    !u_isalnum(uchr) && !is_nonrendering(uchr))
+				maybe_phony_extension = false;
+		}
+	}
+	if (maybe_phony_extension)
+		ret |= UNICRASH_PHONY_EXTENSION;
+
+out_nfcstr:
+	free(nfcstr);
+	return ret;
+}
+
 /*
  * Generate normalized form and skeleton of the name.  If this fails, just
  * forget everything and return false; this is an advisory checker.
@@ -269,6 +463,11 @@ name_entry_compute_checknames(
 
 	skelstrlen = remove_ignorable(skelstr, skelstrlen);
 
+	/* Check for deceptive file extensions in directory entry names. */
+	if (entry->ino)
+		entry->badflags |= name_entry_phony_extension(uc, unistr,
+						unistrlen);
+
 	entry->skelstr = skelstr;
 	entry->skelstrlen = skelstrlen;
 	entry->normstr = normstr;
@@ -365,7 +564,7 @@ name_entry_create(
 	if (!name_entry_compute_checknames(uc, new_entry))
 		goto out;
 
-	new_entry->badflags = name_entry_examine(new_entry);
+	new_entry->badflags |= name_entry_examine(new_entry);
 	*entry = new_entry;
 	return true;
 
@@ -456,6 +655,9 @@ unicrash_init(
 	p->nr_buckets = nr_buckets;
 	p->compare_ino = compare_ino;
 	p->nfkc = unorm2_getNFKCInstance(&uerr);
+	if (U_FAILURE(uerr))
+		goto out_free;
+	p->nfc = unorm2_getNFCInstance(&uerr);
 	if (U_FAILURE(uerr))
 		goto out_free;
 	p->spoof = uspoof_open(&uerr);
@@ -602,6 +804,17 @@ _("Unicode name \"%s\" in %s could be confused with '%s' due to invisible charac
 		goto out;
 	}
 
+	/*
+	 * Fake-looking file extensions have tricked Linux users into thinking
+	 * that an executable is actually a pdf.  See Lazarus 3cx attack.
+	 */
+	if (badflags & UNICRASH_PHONY_EXTENSION) {
+		str_warn(uc->ctx, descr_render(dsc),
+_("Unicode name \"%s\" in %s contains a possibly deceptive file extension."),
+				bad1, what);
+		goto out;
+	}
+
 	/*
 	 * Unfiltered control characters can mess up your terminal and render
 	 * invisibly in filechooser UIs.


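`UCHAR_PER_UCHAR32` is 2 because ICU's `UChar` is a UTF-16 code unit, and a code point above U+FFFF needs a surrogate pair. The patch relies on ICU's `U16_APPEND` and `U16_NEXT` macros for the conversion.

The sketch below reimplements the same encoding by hand with `stdint.h` types, purely to show what uchar32_to_uchar() and uchar_to_uchar32() do. Error handling is simplified to rejecting surrogates and out-of-range values.

```c
#include <stdint.h>

#define SKETCH_UCHAR_PER_UCHAR32	2

/* Encode one code point as UTF-16; returns units written, 0 on error. */
static int32_t sketch_cp_to_utf16(uint32_t cp,
				  uint16_t buf[SKETCH_UCHAR_PER_UCHAR32])
{
	if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return 0;		/* not a valid Unicode scalar value */
	if (cp < 0x10000) {
		buf[0] = (uint16_t)cp;	/* BMP: one code unit */
		return 1;
	}
	cp -= 0x10000;
	buf[0] = (uint16_t)(0xD800 | (cp >> 10));	/* high surrogate */
	buf[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));	/* low surrogate */
	return 2;
}

/* Decode the first code point from a UTF-16 buffer of length >= 1. */
static uint32_t sketch_utf16_to_cp(const uint16_t *buf, int32_t len)
{
	if (len >= 2 && buf[0] >= 0xD800 && buf[0] <= 0xDBFF &&
	    buf[1] >= 0xDC00 && buf[1] <= 0xDFFF)
		return 0x10000 + (((uint32_t)(buf[0] - 0xD800) << 10) |
				  (uint32_t)(buf[1] - 0xDC00));
	return buf[0];	/* BMP code point (or an unpaired surrogate) */
}
```

Every code point used by the full-stop checks (for example U+2024) lives in the BMP and takes a single unit. The second slot only matters for supplementary characters such as U+1F600, which encodes as D83D DE00.
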
^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 13/13] xfs_scrub: dump unicode points
  2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
                     ` (11 preceding siblings ...)
  2024-07-02  1:00   ` [PATCH 12/13] xfs_scrub: report deceptive file extensions Darrick J. Wong
@ 2024-07-02  1:00   ` Darrick J. Wong
  2024-07-02  5:25     ` Christoph Hellwig
  12 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:00 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Add some debug functions to make it easier to query unicode character
properties.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/unicrash.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 2 deletions(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index e895afe32aab..119656b0b9d5 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -5,6 +5,7 @@
  */
 #include "xfs.h"
 #include "xfs_arch.h"
+#include "list.h"
 #include <stdint.h>
 #include <stdlib.h>
 #include <dirent.h>
@@ -1001,14 +1002,68 @@ unicrash_check_fs_label(
 			label, 0);
 }
 
+/* Dump a unicode code point and its properties. */
+static inline void dump_uchar32(UChar32 c)
+{
+	UChar		uchrstr[UCHAR_PER_UCHAR32];
+	const char	*descr;
+	char		buf[16];
+	int32_t		uchrstrlen, buflen;
+	UProperty	p;
+	UErrorCode	uerr = U_ZERO_ERROR;
+
+	printf("Unicode point 0x%x:", c);
+
+	/* Convert UChar32 to UTF-16, then to its UTF-8 representation. */
+	uchrstrlen = uchar32_to_uchar(c, uchrstr);
+	if (!uchrstrlen)
+		return;
+
+	u_strToUTF8(buf, sizeof(buf), &buflen, uchrstr, uchrstrlen, &uerr);
+	if (!U_FAILURE(uerr) && buflen > 0) {
+		int32_t	i;
+
+		printf(" \"");
+		for (i = 0; i < buflen; i++)
+			printf("\\x%02x", (unsigned char)buf[i]);
+		printf("\"");
+	}
+	printf("\n");
+
+	for (p = 0; p < UCHAR_BINARY_LIMIT; p++) {
+		int	has;
+
+		descr = u_getPropertyName(p, U_LONG_PROPERTY_NAME);
+		if (!descr)
+			descr = u_getPropertyName(p, U_SHORT_PROPERTY_NAME);
+
+		has = u_hasBinaryProperty(c, p) ? 1 : 0;
+		if (descr) {
+			printf("  %s(%u) = %d\n", descr, p, has);
+		} else {
+			printf("  ?(%u) = %d\n", p, has);
+		}
+	}
+}
+
 /* Load libicu and initialize it. */
 bool
 unicrash_load(void)
 {
-	UErrorCode		uerr = U_ZERO_ERROR;
+	char		*dbgstr;
+	UChar32		uchr;
+	UErrorCode	uerr = U_ZERO_ERROR;
 
 	u_init(&uerr);
-	return U_FAILURE(uerr);
+	if (U_FAILURE(uerr))
+		return true;
+
+	dbgstr = getenv("XFS_SCRUB_DUMP_CHAR");
+	if (dbgstr) {
+		uchr = strtol(dbgstr, NULL, 0);
+		dump_uchar32(uchr);
+	}
+	return false;
 }
 
 /* Unload libicu once we're done with it. */


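dump_uchar32() prints each UTF-8 byte with `\\x%02x`. Because `char` is signed on common targets such as x86, a byte like 0xE2 should be cast to `unsigned char` before it is passed to printf(). Otherwise it is sign-extended and prints as `\\xffffffe2`.

The sketch below hand-encodes a code point to UTF-8 (standing in for ICU's `u_strToUTF8()`) and formats the bytes the same way. It writes into a string so the result can be checked.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encode a code point as UTF-8; returns the byte count, 0 if invalid. */
static int sketch_cp_to_utf8(uint32_t cp, char out[4])
{
	if (cp < 0x80) {
		out[0] = (char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (char)(0xC0 | (cp >> 6));
		out[1] = (char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;	/* surrogates are not encodable */
	if (cp < 0x10000) {
		out[0] = (char)(0xE0 | (cp >> 12));
		out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		out[0] = (char)(0xF0 | (cp >> 18));
		out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}

/* Render the bytes as \xNN escapes; the cast keeps each to two digits. */
static void sketch_hexdump(uint32_t cp, char *dst, size_t dstlen)
{
	char	buf[4];
	int	n = sketch_cp_to_utf8(cp, buf);
	int	i;
	size_t	off = 0;

	dst[0] = '\0';
	for (i = 0; i < n && off + 5 <= dstlen; i++)
		off += (size_t)snprintf(dst + off, dstlen - off, "\\x%02x",
					(unsigned char)buf[i]);
}
```

With the patch applied, running xfs_scrub with `XFS_SCRUB_DUMP_CHAR=0x2024` in the environment should make unicrash_load() dump U+2024. The value is parsed with strtol() in base 0, so a `0x` prefix works.
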
^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 1/8] xfs_scrub: move FITRIM to phase 8
  2024-07-02  0:50 ` [PATCHSET v30.7 04/16] xfs_scrub: move fstrim to a separate phase Darrick J. Wong
@ 2024-07-02  1:01   ` Darrick J. Wong
  2024-07-02  5:27     ` Christoph Hellwig
  2024-07-02  1:01   ` [PATCH 2/8] xfs_scrub: ignore phase 8 if the user disabled fstrim Darrick J. Wong
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:01 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Issuing discards against the filesystem should be the *last* thing that
xfs_scrub does, after everything else has been checked, repaired, and
found to be clean.  If we can't satisfy all those conditions, we have no
business telling the storage to discard itself.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/Makefile    |    1 +
 scrub/phase4.c    |   30 ++----------------------
 scrub/phase8.c    |   66 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.h |    3 ++
 4 files changed, 73 insertions(+), 27 deletions(-)
 create mode 100644 scrub/phase8.c


diff --git a/scrub/Makefile b/scrub/Makefile
index c11c2b5fe84e..8ccc67d01df3 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -68,6 +68,7 @@ phase4.c \
 phase5.c \
 phase6.c \
 phase7.c \
+phase8.c \
 progress.c \
 read_verify.c \
 repair.c \
diff --git a/scrub/phase4.c b/scrub/phase4.c
index 9080d38818f7..451101811c9b 100644
--- a/scrub/phase4.c
+++ b/scrub/phase4.c
@@ -227,16 +227,6 @@ repair_everything(
 	return action_list_process(ctx, ctx->fs_repair_list, XRM_FINAL_WARNING);
 }
 
-/* Trim the unused areas of the filesystem if the caller asked us to. */
-static void
-trim_filesystem(
-	struct scrub_ctx	*ctx)
-{
-	if (want_fstrim)
-		fstrim(ctx);
-	progress_add(1);
-}
-
 /* Fix everything that needs fixing. */
 int
 phase4_func(
@@ -248,7 +238,7 @@ phase4_func(
 
 	if (action_list_empty(ctx->fs_repair_list) &&
 	    action_list_empty(ctx->file_repair_list))
-		goto maybe_trim;
+		return 0;
 
 	/*
 	 * Check the resource usage counters early.  Normally we do this during
@@ -281,20 +271,7 @@ phase4_func(
 	if (ret)
 		return ret;
 
-	ret = repair_everything(ctx);
-	if (ret)
-		return ret;
-
-	/*
-	 * If errors remain on the filesystem, do not trim anything.  We don't
-	 * have any threads running, so it's ok to skip the ctx lock here.
-	 */
-	if (ctx->corruptions_found || ctx->unfixable_errors != 0)
-		return 0;
-
-maybe_trim:
-	trim_filesystem(ctx);
-	return 0;
+	return repair_everything(ctx);
 }
 
 /* Estimate how much work we're going to do. */
@@ -307,10 +284,9 @@ phase4_estimate(
 {
 	unsigned long long	need_fixing;
 
-	/* Everything on the repair list plus FSTRIM. */
+	/* Everything on the repair lists. */
 	need_fixing = action_list_length(ctx->fs_repair_list) +
 		      action_list_length(ctx->file_repair_list);
-	need_fixing++;
 
 	*items = need_fixing;
 	*nr_threads = scrub_nproc(ctx) + 1;
diff --git a/scrub/phase8.c b/scrub/phase8.c
new file mode 100644
index 000000000000..07726b5b8691
--- /dev/null
+++ b/scrub/phase8.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2018-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include <stdint.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/statvfs.h>
+#include "list.h"
+#include "libfrog/paths.h"
+#include "libfrog/workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "progress.h"
+#include "scrub.h"
+#include "repair.h"
+#include "vfs.h"
+#include "atomic.h"
+
+/* Phase 8: Trim filesystem. */
+
+/* Trim the unused areas of the filesystem if the caller asked us to. */
+static void
+trim_filesystem(
+	struct scrub_ctx	*ctx)
+{
+	fstrim(ctx);
+	progress_add(1);
+}
+
+/* Trim the filesystem, if desired. */
+int
+phase8_func(
+	struct scrub_ctx	*ctx)
+{
+	if (action_list_empty(ctx->fs_repair_list) &&
+	    action_list_empty(ctx->file_repair_list))
+		goto maybe_trim;
+
+	/*
+	 * If errors remain on the filesystem, do not trim anything.  We don't
+	 * have any threads running, so it's ok to skip the ctx lock here.
+	 */
+	if (ctx->corruptions_found || ctx->unfixable_errors != 0)
+		return 0;
+
+maybe_trim:
+	trim_filesystem(ctx);
+	return 0;
+}
+
+/* Estimate how much work we're going to do. */
+int
+phase8_estimate(
+	struct scrub_ctx	*ctx,
+	uint64_t		*items,
+	unsigned int		*nr_threads,
+	int			*rshift)
+{
+	*items = 1;
+	*nr_threads = 1;
+	*rshift = 0;
+	return 0;
+}
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index ed86d0093db3..6272a36879e7 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -98,6 +98,7 @@ int phase4_func(struct scrub_ctx *ctx);
 int phase5_func(struct scrub_ctx *ctx);
 int phase6_func(struct scrub_ctx *ctx);
 int phase7_func(struct scrub_ctx *ctx);
+int phase8_func(struct scrub_ctx *ctx);
 
 /* Progress estimator functions */
 unsigned int scrub_estimate_ag_work(struct scrub_ctx *ctx);
@@ -112,5 +113,7 @@ int phase5_estimate(struct scrub_ctx *ctx, uint64_t *items,
 		    unsigned int *nr_threads, int *rshift);
 int phase6_estimate(struct scrub_ctx *ctx, uint64_t *items,
 		    unsigned int *nr_threads, int *rshift);
+int phase8_estimate(struct scrub_ctx *ctx, uint64_t *items,
+		    unsigned int *nr_threads, int *rshift);
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */


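phase8_func() trims in two cases:

- both repair lists are empty when phase 8 runs;
- the lists are not empty, but no corruptions or unfixable errors remain.

The sketch below expresses that decision as a pure function. It uses a hypothetical cut-down context struct in place of struct scrub_ctx, with counters standing in for the real action lists.

```c
#include <stdbool.h>

/* Hypothetical subset of struct scrub_ctx, for illustration only. */
struct sketch_ctx {
	unsigned long long	fs_repairs_queued;
	unsigned long long	file_repairs_queued;
	unsigned long long	corruptions_found;
	unsigned long long	unfixable_errors;
};

/* Mirror of phase8_func()'s control flow, minus the FITRIM call itself. */
static bool sketch_should_trim(const struct sketch_ctx *ctx)
{
	/* Nothing on either repair list: go straight to trimming. */
	if (ctx->fs_repairs_queued == 0 && ctx->file_repairs_queued == 0)
		return true;

	/* Repairs were involved; only trim if the filesystem is now clean. */
	return ctx->corruptions_found == 0 && ctx->unfixable_errors == 0;
}
```

Keeping this as the last phase means the storage is never told to discard anything while known damage remains.
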
^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 2/8] xfs_scrub: ignore phase 8 if the user disabled fstrim
  2024-07-02  0:50 ` [PATCHSET v30.7 04/16] xfs_scrub: move fstrim to a separate phase Darrick J. Wong
  2024-07-02  1:01   ` [PATCH 1/8] xfs_scrub: move FITRIM to phase 8 Darrick J. Wong
@ 2024-07-02  1:01   ` Darrick J. Wong
  2024-07-02  5:27     ` Christoph Hellwig
  2024-07-02  1:01   ` [PATCH 3/8] xfs_scrub: collapse trim_filesystem Darrick J. Wong
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:01 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

If the user told us to skip trimming the filesystem, don't run the phase
at all.
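The patch marks phase 8 as optional with a sentinel function pointer (`FSTRIM_DUMMY_FN`, an integer cast to a pointer) and swaps in the real function only when `want_fstrim` is set. The sketch below shows the same "resolve, then skip unresolved" idea. It is an illustration, not xfs_scrub code: `struct phase`, `count_runnable`, `fstrim_placeholder` and the other names are hypothetical. It also uses the address of a real placeholder function as the sentinel, which is a more portable variant than casting an integer to a function pointer.

```c
#include <stdbool.h>
#include <stddef.h>

struct ctx;				/* opaque stand-in for struct scrub_ctx */
typedef int (*phase_fn)(struct ctx *);

/* Sentinel: a real function whose address marks "optional, not enabled". */
static int fstrim_placeholder(struct ctx *c) { (void)c; return 0; }

/* Hypothetical stand-ins for real phase bodies. */
static int check_phase(struct ctx *c) { (void)c; return 0; }
static int trim_phase(struct ctx *c) { (void)c; return 0; }

struct phase {
	const char	*descr;
	phase_fn	fn;
};

/* Enable the optional phases the user asked for; return how many will run. */
static int count_runnable(struct phase *phases, bool want_fstrim)
{
	int n = 0;

	for (struct phase *p = phases; p->descr != NULL; p++) {
		if (p->fn == fstrim_placeholder && want_fstrim)
			p->fn = trim_phase;
		if (p->fn == fstrim_placeholder)
			continue;	/* trimming disabled: skip the phase */
		n++;
	}
	return n;
}
```

With trimming disabled, the "Trim" entry keeps its sentinel and is not counted. With trimming enabled, it is rewritten to point at the real phase function.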

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/xfs_scrub.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)


diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index e49538ca1cdd..edf58d07bde2 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -249,6 +249,7 @@ struct phase_rusage {
 /* Operations for each phase. */
 #define DATASCAN_DUMMY_FN	((void *)1)
 #define REPAIR_DUMMY_FN		((void *)2)
+#define FSTRIM_DUMMY_FN		((void *)3)
 struct phase_ops {
 	char		*descr;
 	int		(*fn)(struct scrub_ctx *ctx);
@@ -429,6 +430,11 @@ run_scrub_phases(
 			.fn = phase7_func,
 			.must_run = true,
 		},
+		{
+			.descr = _("Trim filesystem storage."),
+			.fn = FSTRIM_DUMMY_FN,
+			.estimate_work = phase8_estimate,
+		},
 		{
 			NULL
 		},
@@ -449,6 +455,8 @@ run_scrub_phases(
 		/* Turn on certain phases if user said to. */
 		if (sp->fn == DATASCAN_DUMMY_FN && scrub_data) {
 			sp->fn = phase6_func;
+		} else if (sp->fn == FSTRIM_DUMMY_FN && want_fstrim) {
+			sp->fn = phase8_func;
 		} else if (sp->fn == REPAIR_DUMMY_FN &&
 			   ctx->mode == SCRUB_MODE_REPAIR) {
 			sp->descr = _("Repair filesystem.");
@@ -458,7 +466,8 @@ run_scrub_phases(
 
 		/* Skip certain phases unless they're turned on. */
 		if (sp->fn == REPAIR_DUMMY_FN ||
-		    sp->fn == DATASCAN_DUMMY_FN)
+		    sp->fn == DATASCAN_DUMMY_FN ||
+		    sp->fn == FSTRIM_DUMMY_FN)
 			continue;
 
 		/* Allow debug users to force a particular phase. */



* [PATCH 3/8] xfs_scrub: collapse trim_filesystem
  2024-07-02  0:50 ` [PATCHSET v30.7 04/16] xfs_scrub: move fstrim to a separate phase Darrick J. Wong
  2024-07-02  1:01   ` [PATCH 1/8] xfs_scrub: move FITRIM to phase 8 Darrick J. Wong
  2024-07-02  1:01   ` [PATCH 2/8] xfs_scrub: ignore phase 8 if the user disabled fstrim Darrick J. Wong
@ 2024-07-02  1:01   ` Darrick J. Wong
  2024-07-02  5:27     ` Christoph Hellwig
  2024-07-02  1:01   ` [PATCH 4/8] xfs_scrub: fix the work estimation for phase 8 Darrick J. Wong
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:01 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Collapse this two-line helper into the main function since it's trivial.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase8.c |   12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)


diff --git a/scrub/phase8.c b/scrub/phase8.c
index 07726b5b8691..e577260a93dd 100644
--- a/scrub/phase8.c
+++ b/scrub/phase8.c
@@ -21,15 +21,6 @@
 
 /* Phase 8: Trim filesystem. */
 
-/* Trim the unused areas of the filesystem if the caller asked us to. */
-static void
-trim_filesystem(
-	struct scrub_ctx	*ctx)
-{
-	fstrim(ctx);
-	progress_add(1);
-}
-
 /* Trim the filesystem, if desired. */
 int
 phase8_func(
@@ -47,7 +38,8 @@ phase8_func(
 		return 0;
 
 maybe_trim:
-	trim_filesystem(ctx);
+	fstrim(ctx);
+	progress_add(1);
 	return 0;
 }
 



* [PATCH 4/8] xfs_scrub: fix the work estimation for phase 8
  2024-07-02  0:50 ` [PATCHSET v30.7 04/16] xfs_scrub: move fstrim to a separate phase Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-07-02  1:01   ` [PATCH 3/8] xfs_scrub: collapse trim_filesystem Darrick J. Wong
@ 2024-07-02  1:01   ` Darrick J. Wong
  2024-07-02  5:27     ` Christoph Hellwig
  2024-07-02  1:02   ` [PATCH 5/8] xfs_scrub: report FITRIM errors properly Darrick J. Wong
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:01 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

If there are latent errors on the filesystem, phase 8 won't do any work,
so it makes no sense to include that work in the estimate for the
progress bar.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase8.c |   36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)


diff --git a/scrub/phase8.c b/scrub/phase8.c
index e577260a93dd..dfe62e8d97be 100644
--- a/scrub/phase8.c
+++ b/scrub/phase8.c
@@ -21,23 +21,35 @@
 
 /* Phase 8: Trim filesystem. */
 
-/* Trim the filesystem, if desired. */
-int
-phase8_func(
+static inline bool
+fstrim_ok(
 	struct scrub_ctx	*ctx)
 {
-	if (action_list_empty(ctx->fs_repair_list) &&
-	    action_list_empty(ctx->file_repair_list))
-		goto maybe_trim;
-
 	/*
 	 * If errors remain on the filesystem, do not trim anything.  We don't
 	 * have any threads running, so it's ok to skip the ctx lock here.
 	 */
-	if (ctx->corruptions_found || ctx->unfixable_errors != 0)
+	if (!action_list_empty(ctx->fs_repair_list))
+		return false;
+	if (!action_list_empty(ctx->file_repair_list))
+		return false;
+
+	if (ctx->corruptions_found != 0)
+		return false;
+	if (ctx->unfixable_errors != 0)
+		return false;
+
+	return true;
+}
+
+/* Trim the filesystem, if desired. */
+int
+phase8_func(
+	struct scrub_ctx	*ctx)
+{
+	if (!fstrim_ok(ctx))
 		return 0;
 
-maybe_trim:
 	fstrim(ctx);
 	progress_add(1);
 	return 0;
@@ -51,7 +63,11 @@ phase8_estimate(
 	unsigned int		*nr_threads,
 	int			*rshift)
 {
-	*items = 1;
+	*items = 0;
+
+	if (fstrim_ok(ctx))
+		*items = 1;
+
 	*nr_threads = 1;
 	*rshift = 0;
 	return 0;



* [PATCH 5/8] xfs_scrub: report FITRIM errors properly
  2024-07-02  0:50 ` [PATCHSET v30.7 04/16] xfs_scrub: move fstrim to a separate phase Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-07-02  1:01   ` [PATCH 4/8] xfs_scrub: fix the work estimation for phase 8 Darrick J. Wong
@ 2024-07-02  1:02   ` Darrick J. Wong
  2024-07-02  5:28     ` Christoph Hellwig
  2024-07-02  1:02   ` [PATCH 6/8] xfs_scrub: don't call FITRIM after runtime errors Darrick J. Wong
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:02 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Move the error reporting for the FITRIM ioctl out of vfs.c and into
phase8.c.  This makes it so that IO errors encountered during trim are
counted as runtime errors instead of being dropped silently.
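The change splits the work in two. The wrapper normalizes the ioctl result, folding both "not supported" errnos into EOPNOTSUPP. The caller then decides policy: unsupported is silent, and anything else counts as an error. The sketch below restates that split without touching a real device. `normalize_trim_result` and `count_if_error` are hypothetical names, and the ioctl return value and errno are passed in as parameters purely so the logic can be exercised in isolation.

```c
#include <errno.h>

/*
 * Fold an ioctl() outcome into an error code.  Both "not supported"
 * variants collapse to EOPNOTSUPP; every other failure is returned as-is.
 */
static int normalize_trim_result(int ioctl_ret, int saved_errno)
{
	if (ioctl_ret == 0)
		return 0;
	if (saved_errno == EOPNOTSUPP || saved_errno == ENOTTY)
		return EOPNOTSUPP;
	return saved_errno;
}

/* Caller policy: ignore "unsupported", count everything else. */
static int count_if_error(int error, unsigned int *runtime_errors)
{
	if (error == EOPNOTSUPP)
		return 0;
	if (error)
		(*runtime_errors)++;
	return error;
}
```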

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase8.c |   12 +++++++++++-
 scrub/vfs.c    |   12 +++++++-----
 scrub/vfs.h    |    2 +-
 3 files changed, 19 insertions(+), 7 deletions(-)


diff --git a/scrub/phase8.c b/scrub/phase8.c
index dfe62e8d97be..288800a76cff 100644
--- a/scrub/phase8.c
+++ b/scrub/phase8.c
@@ -47,10 +47,20 @@ int
 phase8_func(
 	struct scrub_ctx	*ctx)
 {
+	int			error;
+
 	if (!fstrim_ok(ctx))
 		return 0;
 
-	fstrim(ctx);
+	error = fstrim(ctx);
+	if (error == EOPNOTSUPP)
+		return 0;
+
+	if (error) {
+		str_liberror(ctx, error, _("fstrim"));
+		return error;
+	}
+
 	progress_add(1);
 	return 0;
 }
diff --git a/scrub/vfs.c b/scrub/vfs.c
index 9e459d6243fa..bcfd4f42ca8b 100644
--- a/scrub/vfs.c
+++ b/scrub/vfs.c
@@ -296,15 +296,17 @@ struct fstrim_range {
 #endif
 
 /* Call FITRIM to trim all the unused space in a filesystem. */
-void
+int
 fstrim(
 	struct scrub_ctx	*ctx)
 {
 	struct fstrim_range	range = {0};
-	int			error;
 
 	range.len = ULLONG_MAX;
-	error = ioctl(ctx->mnt.fd, FITRIM, &range);
-	if (error && errno != EOPNOTSUPP && errno != ENOTTY)
-		perror(_("fstrim"));
+	if (ioctl(ctx->mnt.fd, FITRIM, &range) == 0)
+		return 0;
+	if (errno == EOPNOTSUPP || errno == ENOTTY)
+		return EOPNOTSUPP;
+
+	return errno;
 }
diff --git a/scrub/vfs.h b/scrub/vfs.h
index 1ac41e5aac07..a8a4d72e290a 100644
--- a/scrub/vfs.h
+++ b/scrub/vfs.h
@@ -24,6 +24,6 @@ typedef int (*scan_fs_tree_dirent_fn)(struct scrub_ctx *, const char *,
 int scan_fs_tree(struct scrub_ctx *ctx, scan_fs_tree_dir_fn dir_fn,
 		scan_fs_tree_dirent_fn dirent_fn, void *arg);
 
-void fstrim(struct scrub_ctx *ctx);
+int fstrim(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_VFS_H_ */



* [PATCH 6/8] xfs_scrub: don't call FITRIM after runtime errors
  2024-07-02  0:50 ` [PATCHSET v30.7 04/16] xfs_scrub: move fstrim to a separate phase Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-07-02  1:02   ` [PATCH 5/8] xfs_scrub: report FITRIM errors properly Darrick J. Wong
@ 2024-07-02  1:02   ` Darrick J. Wong
  2024-07-02  5:28     ` Christoph Hellwig
  2024-07-02  1:02   ` [PATCH 7/8] xfs_scrub: don't trim the first agbno of each AG for better performance Darrick J. Wong
  2024-07-02  1:02   ` [PATCH 8/8] xfs_scrub: improve progress meter for phase 8 fstrimming Darrick J. Wong
  7 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:02 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Don't call FITRIM if there have been runtime errors -- we don't want to
touch anything after any kind of unfixable problem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase8.c |    3 +++
 1 file changed, 3 insertions(+)


diff --git a/scrub/phase8.c b/scrub/phase8.c
index 288800a76cff..75400c968595 100644
--- a/scrub/phase8.c
+++ b/scrub/phase8.c
@@ -39,6 +39,9 @@ fstrim_ok(
 	if (ctx->unfixable_errors != 0)
 		return false;
 
+	if (ctx->runtime_errors != 0)
+		return false;
+
 	return true;
 }
 



* [PATCH 7/8] xfs_scrub: don't trim the first agbno of each AG for better performance
  2024-07-02  0:50 ` [PATCHSET v30.7 04/16] xfs_scrub: move fstrim to a separate phase Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-07-02  1:02   ` [PATCH 6/8] xfs_scrub: don't call FITRIM after runtime errors Darrick J. Wong
@ 2024-07-02  1:02   ` Darrick J. Wong
  2024-07-02  5:30     ` Christoph Hellwig
  2024-07-03  3:52     ` [PATCH v30.7.1 7/8] xfs_scrub: improve responsiveness while trimming the filesystem Darrick J. Wong
  2024-07-02  1:02   ` [PATCH 8/8] xfs_scrub: improve progress meter for phase 8 fstrimming Darrick J. Wong
  7 siblings, 2 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:02 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

On a 10TB filesystem where the free space in each AG is heavily
fragmented, I noticed very long runtimes for a FITRIM call covering the
entire filesystem.  xfs_scrub reports progress information for each
phase of the scrub, but an strace of a whole-filesystem trim:

ioctl(3, FITRIM, {start=0x0, len=10995116277760, minlen=0}) = 0 <686.209839>

shows that scrub is uncommunicative for the entire duration.  Reducing
the size of the FITRIM requests to a single AG at a time lowers the
time for each individual call, but even this isn't quite acceptable,
because the time between progress reports is still very high:

Strace for the first 4x 1TB AGs looks like (2):
ioctl(3, FITRIM, {start=0x0, len=1099511627776, minlen=0}) = 0 <68.352033>
ioctl(3, FITRIM, {start=0x10000000000, len=1099511627776, minlen=0}) = 0 <68.760323>
ioctl(3, FITRIM, {start=0x20000000000, len=1099511627776, minlen=0}) = 0 <67.235226>
ioctl(3, FITRIM, {start=0x30000000000, len=1099511627776, minlen=0}) = 0 <69.465744>

I then had the idea to limit the length parameter of each call to a
smallish amount (~11GB) so that we could report progress relatively
quickly, but much to my surprise, each FITRIM call still took ~68
seconds!

Unfortunately, the by-length fstrim implementation handles this poorly
because it walks the entire free space by length index (cntbt), which is
a very inefficient way to walk a subset of an AG.

Therefore, we created a second implementation that will walk the bnobt
and perform the trims in block number order.  This implementation avoids
the worst problems of the original code, though it lacks the desirable
attribute of freeing the biggest chunks first.

On the other hand, this second implementation makes it much easier to
constrain system call latency and to report fstrim progress to anyone
who's running xfs_scrub.  Skip the first block of each AG to ensure that
we get the sub-AG algorithm.
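The per-AG loop in fstrim_datadev() reduces to simple arithmetic over linear block numbers: each request starts one block past the AG boundary and is clamped to the end of the data device. The sketch below isolates that calculation so the ranges can be checked by hand. `struct fsb_range` and `plan_ag_trims` are hypothetical names, not part of xfsprogs.

```c
#include <stdint.h>

/* One FITRIM request, in filesystem blocks. */
struct fsb_range {
	uint64_t	start;
	uint64_t	count;
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/*
 * Plan one request per AG, each skipping the AG's first block, using the
 * same formula as the patch.  Returns the number of ranges written to out
 * (at most max).
 */
static unsigned int plan_ag_trims(uint64_t datablocks, uint64_t agblocks,
				  struct fsb_range *out, unsigned int max)
{
	unsigned int n = 0;

	for (uint64_t fsbno = 0; fsbno < datablocks && n < max;
	     fsbno += agblocks) {
		out[n].start = fsbno + 1;
		out[n].count = min_u64(datablocks - (fsbno + 1), agblocks - 1);
		n++;
	}
	return n;
}
```

For a toy geometry of 10 blocks in 4-block AGs, this yields the ranges [1,4), [5,8) and [9,10). The last, partial AG is clamped to the end of the device.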

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase8.c |   64 +++++++++++++++++++++++++++++++++++++++++++++++---------
 scrub/vfs.c    |   10 ++++++---
 scrub/vfs.h    |    2 +-
 3 files changed, 62 insertions(+), 14 deletions(-)


diff --git a/scrub/phase8.c b/scrub/phase8.c
index 75400c968595..e5f5619a80b5 100644
--- a/scrub/phase8.c
+++ b/scrub/phase8.c
@@ -45,29 +45,73 @@ fstrim_ok(
 	return true;
 }
 
-/* Trim the filesystem, if desired. */
-int
-phase8_func(
-	struct scrub_ctx	*ctx)
+/* Trim a certain range of the filesystem. */
+static int
+fstrim_fsblocks(
+	struct scrub_ctx	*ctx,
+	uint64_t		start_fsb,
+	uint64_t		fsbcount)
 {
+	uint64_t		start = cvt_off_fsb_to_b(&ctx->mnt, start_fsb);
+	uint64_t		len = cvt_off_fsb_to_b(&ctx->mnt, fsbcount);
 	int			error;
 
-	if (!fstrim_ok(ctx))
-		return 0;
-
-	error = fstrim(ctx);
+	error = fstrim(ctx, start, len);
 	if (error == EOPNOTSUPP)
 		return 0;
-
 	if (error) {
-		str_liberror(ctx, error, _("fstrim"));
+		char		descr[DESCR_BUFSZ];
+
+		snprintf(descr, sizeof(descr) - 1,
+				_("fstrim start 0x%llx len 0x%llx"),
+				(unsigned long long)start,
+				(unsigned long long)len);
+		str_liberror(ctx, error, descr);
 		return error;
 	}
 
+	return 0;
+}
+
+/* Trim each AG on the data device. */
+static int
+fstrim_datadev(
+	struct scrub_ctx	*ctx)
+{
+	struct xfs_fsop_geom	*geo = &ctx->mnt.fsgeom;
+	uint64_t		fsbno;
+	int			error;
+
+	for (fsbno = 0; fsbno < geo->datablocks; fsbno += geo->agblocks) {
+		uint64_t	fsbcount;
+
+		/*
+		 * Skip the first block of each AG to ensure that we get the
+		 * partial-AG discard implementation.  This means that we can
+		 * report trim progress to userspace in units smaller than
+		 * entire AGs.
+		 */
+		fsbcount = min(geo->datablocks - (fsbno + 1), geo->agblocks - 1);
+		error = fstrim_fsblocks(ctx, fsbno + 1, fsbcount);
+		if (error)
+			return error;
+	}
+
 	progress_add(1);
 	return 0;
 }
 
+/* Trim the filesystem, if desired. */
+int
+phase8_func(
+	struct scrub_ctx	*ctx)
+{
+	if (!fstrim_ok(ctx))
+		return 0;
+
+	return fstrim_datadev(ctx);
+}
+
 /* Estimate how much work we're going to do. */
 int
 phase8_estimate(
diff --git a/scrub/vfs.c b/scrub/vfs.c
index bcfd4f42ca8b..cc958ba9438e 100644
--- a/scrub/vfs.c
+++ b/scrub/vfs.c
@@ -298,11 +298,15 @@ struct fstrim_range {
 /* Call FITRIM to trim all the unused space in a filesystem. */
 int
 fstrim(
-	struct scrub_ctx	*ctx)
+	struct scrub_ctx	*ctx,
+	uint64_t		start,
+	uint64_t		len)
 {
-	struct fstrim_range	range = {0};
+	struct fstrim_range	range = {
+		.start		= start,
+		.len		= len,
+	};
 
-	range.len = ULLONG_MAX;
 	if (ioctl(ctx->mnt.fd, FITRIM, &range) == 0)
 		return 0;
 	if (errno == EOPNOTSUPP || errno == ENOTTY)
diff --git a/scrub/vfs.h b/scrub/vfs.h
index a8a4d72e290a..1af8d80d1de6 100644
--- a/scrub/vfs.h
+++ b/scrub/vfs.h
@@ -24,6 +24,6 @@ typedef int (*scan_fs_tree_dirent_fn)(struct scrub_ctx *, const char *,
 int scan_fs_tree(struct scrub_ctx *ctx, scan_fs_tree_dir_fn dir_fn,
 		scan_fs_tree_dirent_fn dirent_fn, void *arg);
 
-int fstrim(struct scrub_ctx *ctx);
+int fstrim(struct scrub_ctx *ctx, uint64_t start, uint64_t len);
 
 #endif /* XFS_SCRUB_VFS_H_ */



* [PATCH 8/8] xfs_scrub: improve progress meter for phase 8 fstrimming
  2024-07-02  0:50 ` [PATCHSET v30.7 04/16] xfs_scrub: move fstrim to a separate phase Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-07-02  1:02   ` [PATCH 7/8] xfs_scrub: don't trim the first agbno of each AG for better performance Darrick J. Wong
@ 2024-07-02  1:02   ` Darrick J. Wong
  7 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:02 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Currently, progress reporting in phase 8 is awful, because we stall at
0% until jumping to 100%.  Since we're now performing sub-AG fstrim
calls to limit the latency impact on the rest of the system, we might
as well limit the FITRIM scan size so that we can report status updates
to the user more regularly.  Doing so also facilitates CPU usage control
during phase 8.
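The loop added to fstrim_fsblocks() caps each call at 11 GiB and advances the progress counter after every run. The sketch below restates that loop against a pluggable callback so the chunking can be exercised without a device. `trim_in_runs`, `trim_fn` and the call-counting callback are hypothetical names for illustration only.

```c
#include <stdint.h>

#define TRIM_MAX_BYTES	(11ULL << 30)	/* 11 GiB, as in the patch */

/* Hypothetical trim callback: returns 0 or an errno value. */
typedef int (*trim_fn)(uint64_t start, uint64_t len, void *priv);

/* Example callback that only counts how many times it was called. */
static int count_calls(uint64_t start, uint64_t len, void *priv)
{
	(void)start;
	(void)len;
	(*(unsigned int *)priv)++;
	return 0;
}

/*
 * Trim [start, start + len) in runs of at most TRIM_MAX_BYTES, adding each
 * finished run to *progress so a meter can advance between calls.
 */
static int trim_in_runs(uint64_t start, uint64_t len, trim_fn fn, void *priv,
			uint64_t *progress)
{
	while (len > 0) {
		uint64_t run = len < TRIM_MAX_BYTES ? len : TRIM_MAX_BYTES;
		int error = fn(start, run, priv);

		if (error)
			return error;
		*progress += run;
		start += run;
		len -= run;
	}
	return 0;
}
```

A 25 GiB range becomes three calls (11 + 11 + 3 GiB). Because the estimate uses `rshift = 30`, the progress counter, which is kept in bytes, is displayed in GiB units.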

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase8.c |   59 ++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 19 deletions(-)


diff --git a/scrub/phase8.c b/scrub/phase8.c
index e5f5619a80b5..ae6d965c75e1 100644
--- a/scrub/phase8.c
+++ b/scrub/phase8.c
@@ -45,6 +45,13 @@ fstrim_ok(
 	return true;
 }
 
+/*
+ * Limit the amount of fstrim scanning that we let the kernel do in a single
+ * call so that we can implement decent progress reporting and CPU resource
+ * control.  Pick a prime number of gigabytes for interest.
+ */
+#define FSTRIM_MAX_BYTES	(11ULL << 30)
+
 /* Trim a certain range of the filesystem. */
 static int
 fstrim_fsblocks(
@@ -56,18 +63,31 @@ fstrim_fsblocks(
 	uint64_t		len = cvt_off_fsb_to_b(&ctx->mnt, fsbcount);
 	int			error;
 
-	error = fstrim(ctx, start, len);
-	if (error == EOPNOTSUPP)
-		return 0;
-	if (error) {
-		char		descr[DESCR_BUFSZ];
-
-		snprintf(descr, sizeof(descr) - 1,
-				_("fstrim start 0x%llx len 0x%llx"),
-				(unsigned long long)start,
-				(unsigned long long)len);
-		str_liberror(ctx, error, descr);
-		return error;
+	while (len > 0) {
+		uint64_t	run;
+
+		run = min(len, FSTRIM_MAX_BYTES);
+
+		error = fstrim(ctx, start, run);
+		if (error == EOPNOTSUPP) {
+			/* Pretend we finished all the work. */
+			progress_add(len);
+			return 0;
+		}
+		if (error) {
+			char		descr[DESCR_BUFSZ];
+
+			snprintf(descr, sizeof(descr) - 1,
+					_("fstrim start 0x%llx run 0x%llx"),
+					(unsigned long long)start,
+					(unsigned long long)run);
+			str_liberror(ctx, error, descr);
+			return error;
+		}
+
+		progress_add(run);
+		len -= run;
+		start += run;
 	}
 
 	return 0;
@@ -91,13 +111,13 @@ fstrim_datadev(
 		 * report trim progress to userspace in units smaller than
 		 * entire AGs.
 		 */
+		progress_add(geo->blocksize);
 		fsbcount = min(geo->datablocks - (fsbno + 1), geo->agblocks - 1);
 		error = fstrim_fsblocks(ctx, fsbno + 1, fsbcount);
 		if (error)
 			return error;
 	}
 
-	progress_add(1);
 	return 0;
 }
 
@@ -120,12 +140,13 @@ phase8_estimate(
 	unsigned int		*nr_threads,
 	int			*rshift)
 {
-	*items = 0;
-
-	if (fstrim_ok(ctx))
-		*items = 1;
-
+	if (fstrim_ok(ctx)) {
+		*items = cvt_off_fsb_to_b(&ctx->mnt,
+				ctx->mnt.fsgeom.datablocks);
+	} else {
+		*items = 0;
+	}
 	*nr_threads = 1;
-	*rshift = 0;
+	*rshift = 30; /* GiB */
 	return 0;
 }



* [PATCH 1/7] libfrog: hoist free space histogram code
  2024-07-02  0:50 ` [PATCHSET v30.7 05/16] xfs_scrub: use free space histograms to reduce fstrim runtime Darrick J. Wong
@ 2024-07-02  1:03   ` Darrick J. Wong
  2024-07-02  5:32     ` Christoph Hellwig
  2024-07-02  1:03   ` [PATCH 2/7] libfrog: print wider columns for free space histogram Darrick J. Wong
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:03 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Combine the two free space histograms in xfs_db and xfs_spaceman into a
single implementation.
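The shared code follows a two-step bucket scheme. First, sort the buckets by lower bound and derive each upper bound from the next bucket's lower bound, with the last bucket running to `maxlen`. Then file each extent into the first bucket whose upper bound can hold it. The standalone sketch below mirrors hist_prepare() and hist_add() in simplified form. `struct bucket`, `buckets_prepare` and `buckets_add` are hypothetical names, not the libfrog API.

```c
#include <stdlib.h>

struct bucket {
	long long	low, high;	/* inclusive size range */
	long long	count, blocks;	/* observations and their total size */
};

/* Order buckets by lower bound without risking subtraction overflow. */
static int bucket_cmp(const void *a, const void *b)
{
	const struct bucket *x = a, *y = b;

	return (x->low > y->low) - (x->low < y->low);
}

/* Sort by lower bound, then derive each upper bound from its neighbor. */
static void buckets_prepare(struct bucket *b, size_t n, long long maxlen)
{
	qsort(b, n, sizeof(*b), bucket_cmp);
	for (size_t i = 0; i < n; i++)
		b[i].high = (i + 1 < n) ? b[i + 1].low - 1 : maxlen;
}

/* Record one extent of len blocks in the first bucket that can hold it. */
static void buckets_add(struct bucket *b, size_t n, long long len)
{
	for (size_t i = 0; i < n; i++) {
		if (b[i].high >= len) {
			b[i].count++;
			b[i].blocks += len;
			return;
		}
	}
}
```

The three-way comparison in bucket_cmp() matches the new histent_cmp(). The old `a->low - b->low` form could overflow once the bounds became `long long`.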

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/freesp.c         |   83 +++++--------------------------
 libfrog/Makefile    |    2 +
 libfrog/histogram.c |  135 +++++++++++++++++++++++++++++++++++++++++++++++++++
 libfrog/histogram.h |   50 +++++++++++++++++++
 spaceman/freesp.c   |   93 +++++++++--------------------------
 5 files changed, 225 insertions(+), 138 deletions(-)
 create mode 100644 libfrog/histogram.c
 create mode 100644 libfrog/histogram.h


diff --git a/db/freesp.c b/db/freesp.c
index 883741e66fee..7a2da002a138 100644
--- a/db/freesp.c
+++ b/db/freesp.c
@@ -12,14 +12,7 @@
 #include "output.h"
 #include "init.h"
 #include "malloc.h"
-
-typedef struct histent
-{
-	int		low;
-	int		high;
-	long long	count;
-	long long	blocks;
-} histent_t;
+#include "libfrog/histogram.h"
 
 static void	addhistent(int h);
 static void	addtohist(xfs_agnumber_t agno, xfs_agblock_t agbno,
@@ -46,13 +39,10 @@ static int		alignment;
 static int		countflag;
 static int		dumpflag;
 static int		equalsize;
-static histent_t	*hist;
-static int		histcount;
+static struct histogram	freesp_hist;
 static int		multsize;
 static int		seen1;
 static int		summaryflag;
-static long long	totblocks;
-static long long	totexts;
 
 static const cmdinfo_t	freesp_cmd =
 	{ "freesp", NULL, freesp_f, 0, -1, 0,
@@ -93,18 +83,13 @@ freesp_f(
 		if (inaglist(agno))
 			scan_ag(agno);
 	}
-	if (histcount)
+	if (hist_buckets(&freesp_hist))
 		printhist();
-	if (summaryflag) {
-		dbprintf(_("total free extents %lld\n"), totexts);
-		dbprintf(_("total free blocks %lld\n"), totblocks);
-		dbprintf(_("average free extent size %g\n"),
-			(double)totblocks / (double)totexts);
-	}
+	if (summaryflag)
+		hist_summarize(&freesp_hist);
 	if (aglist)
 		xfree(aglist);
-	if (hist)
-		xfree(hist);
+	hist_free(&freesp_hist);
 	return 0;
 }
 
@@ -132,10 +117,9 @@ init(
 	int		speced = 0;
 
 	agcount = countflag = dumpflag = equalsize = multsize = optind = 0;
-	histcount = seen1 = summaryflag = 0;
-	totblocks = totexts = 0;
+	seen1 = summaryflag = 0;
 	aglist = NULL;
-	hist = NULL;
+
 	while ((c = getopt(argc, argv, "A:a:bcde:h:m:s")) != EOF) {
 		switch (c) {
 		case 'A':
@@ -163,7 +147,7 @@ init(
 			speced = 1;
 			break;
 		case 'h':
-			if (speced && !histcount)
+			if (speced && hist_buckets(&freesp_hist) == 0)
 				return usage();
 			addhistent(atoi(optarg));
 			speced = 1;
@@ -339,14 +323,7 @@ static void
 addhistent(
 	int	h)
 {
-	hist = xrealloc(hist, (histcount + 1) * sizeof(*hist));
-	if (h == 0)
-		h = 1;
-	hist[histcount].low = h;
-	hist[histcount].count = hist[histcount].blocks = 0;
-	histcount++;
-	if (h == 1)
-		seen1 = 1;
+	hist_add_bucket(&freesp_hist, h);
 }
 
 static void
@@ -355,30 +332,12 @@ addtohist(
 	xfs_agblock_t	agbno,
 	xfs_extlen_t	len)
 {
-	int		i;
-
 	if (alignment && (XFS_AGB_TO_FSB(mp,agno,agbno) % alignment))
 		return;
 
 	if (dumpflag)
 		dbprintf("%8d %8d %8d\n", agno, agbno, len);
-	totexts++;
-	totblocks += len;
-	for (i = 0; i < histcount; i++) {
-		if (hist[i].high >= len) {
-			hist[i].count++;
-			hist[i].blocks += len;
-			break;
-		}
-	}
-}
-
-static int
-hcmp(
-	const void	*a,
-	const void	*b)
-{
-	return ((histent_t *)a)->low - ((histent_t *)b)->low;
+	hist_add(&freesp_hist, len);
 }
 
 static void
@@ -387,6 +346,7 @@ histinit(
 {
 	int	i;
 
+	hist_init(&freesp_hist);
 	if (equalsize) {
 		for (i = 1; i < maxlen; i += equalsize)
 			addhistent(i);
@@ -396,27 +356,12 @@ histinit(
 	} else {
 		if (!seen1)
 			addhistent(1);
-		qsort(hist, histcount, sizeof(*hist), hcmp);
-	}
-	for (i = 0; i < histcount; i++) {
-		if (i < histcount - 1)
-			hist[i].high = hist[i + 1].low - 1;
-		else
-			hist[i].high = maxlen;
 	}
+	hist_prepare(&freesp_hist, maxlen);
 }
 
 static void
 printhist(void)
 {
-	int	i;
-
-	dbprintf("%7s %7s %7s %7s %6s\n",
-		_("from"), _("to"), _("extents"), _("blocks"), _("pct"));
-	for (i = 0; i < histcount; i++) {
-		if (hist[i].count)
-			dbprintf("%7d %7d %7lld %7lld %6.2f\n", hist[i].low,
-				hist[i].high, hist[i].count, hist[i].blocks,
-				hist[i].blocks * 100.0 / totblocks);
-	}
+	hist_print(&freesp_hist);
 }
diff --git a/libfrog/Makefile b/libfrog/Makefile
index 53e3c3492377..acfa228bc8ec 100644
--- a/libfrog/Makefile
+++ b/libfrog/Makefile
@@ -20,6 +20,7 @@ convert.c \
 crc32.c \
 file_exchange.c \
 fsgeom.c \
+histogram.c \
 list_sort.c \
 linux.c \
 logging.c \
@@ -45,6 +46,7 @@ dahashselftest.h \
 div64.h \
 file_exchange.h \
 fsgeom.h \
+histogram.h \
 logging.h \
 paths.h \
 projects.h \
diff --git a/libfrog/histogram.c b/libfrog/histogram.c
new file mode 100644
index 000000000000..553ba3d7c6e8
--- /dev/null
+++ b/libfrog/histogram.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2000-2001,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2012 Red Hat, Inc.
+ * Copyright (c) 2017-2024 Oracle.
+ * All Rights Reserved.
+ */
+#include "xfs.h"
+#include <stdlib.h>
+#include <string.h>
+#include "platform_defs.h"
+#include "libfrog/histogram.h"
+
+/* Create a new bucket with the given low value. */
+int
+hist_add_bucket(
+	struct histogram	*hs,
+	long long		bucket_low)
+{
+	struct histent		*buckets;
+
+	if (hs->nr_buckets == INT_MAX)
+		return EFBIG;
+
+	buckets = realloc(hs->buckets,
+			(hs->nr_buckets + 1) * sizeof(struct histent));
+	if (!buckets)
+		return errno;
+
+	hs->buckets = buckets;
+	hs->buckets[hs->nr_buckets].low = bucket_low;
+	hs->buckets[hs->nr_buckets].count = buckets[hs->nr_buckets].blocks = 0;
+	hs->nr_buckets++;
+	return 0;
+}
+
+/* Add an observation to the histogram. */
+void
+hist_add(
+	struct histogram	*hs,
+	long long		len)
+{
+	unsigned int		i;
+
+	hs->totexts++;
+	hs->totblocks += len;
+	for (i = 0; i < hs->nr_buckets; i++) {
+		if (hs->buckets[i].high >= len) {
+			hs->buckets[i].count++;
+			hs->buckets[i].blocks += len;
+			break;
+		}
+	}
+}
+
+static int
+histent_cmp(
+	const void		*a,
+	const void		*b)
+{
+	const struct histent	*ha = a;
+	const struct histent	*hb = b;
+
+	if (ha->low < hb->low)
+		return -1;
+	if (ha->low > hb->low)
+		return 1;
+	return 0;
+}
+
+/* Prepare a histogram for bucket configuration. */
+void
+hist_init(
+	struct histogram	*hs)
+{
+	memset(hs, 0, sizeof(struct histogram));
+}
+
+/* Prepare a histogram to receive data observations. */
+void
+hist_prepare(
+	struct histogram	*hs,
+	long long		maxlen)
+{
+	unsigned int		i;
+
+	qsort(hs->buckets, hs->nr_buckets, sizeof(struct histent), histent_cmp);
+
+	for (i = 0; i < hs->nr_buckets; i++) {
+		if (i < hs->nr_buckets - 1)
+			hs->buckets[i].high = hs->buckets[i + 1].low - 1;
+		else
+			hs->buckets[i].high = maxlen;
+	}
+}
+
+/* Free all data associated with a histogram. */
+void
+hist_free(
+	struct histogram	*hs)
+{
+	free(hs->buckets);
+	memset(hs, 0, sizeof(struct histogram));
+}
+
+/* Dump a histogram to stdout. */
+void
+hist_print(
+	const struct histogram	*hs)
+{
+	unsigned int		i;
+
+	printf("%7s %7s %7s %7s %6s\n",
+		_("from"), _("to"), _("extents"), _("blocks"), _("pct"));
+	for (i = 0; i < hs->nr_buckets; i++) {
+		if (hs->buckets[i].count == 0)
+			continue;
+
+		printf("%7lld %7lld %7lld %7lld %6.2f\n",
+				hs->buckets[i].low, hs->buckets[i].high,
+				hs->buckets[i].count, hs->buckets[i].blocks,
+				hs->buckets[i].blocks * 100.0 / hs->totblocks);
+	}
+}
+
+/* Summarize the contents of the histogram. */
+void
+hist_summarize(
+	const struct histogram	*hs)
+{
+	printf(_("total free extents %lld\n"), hs->totexts);
+	printf(_("total free blocks %lld\n"), hs->totblocks);
+	printf(_("average free extent size %g\n"),
+			(double)hs->totblocks / (double)hs->totexts);
+}
diff --git a/libfrog/histogram.h b/libfrog/histogram.h
new file mode 100644
index 000000000000..2e2b169a79b0
--- /dev/null
+++ b/libfrog/histogram.h
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2000-2001,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2012 Red Hat, Inc.
+ * Copyright (c) 2017-2024 Oracle.
+ * All Rights Reserved.
+ */
+#ifndef __LIBFROG_HISTOGRAM_H__
+#define __LIBFROG_HISTOGRAM_H__
+
+struct histent
+{
+	/* Low and high size of this bucket */
+	long long	low;
+	long long	high;
+
+	/* Count of observations recorded */
+	long long	count;
+
+	/* Sum of blocks recorded */
+	long long	blocks;
+};
+
+struct histogram {
+	/* Sum of all blocks recorded */
+	long long	totblocks;
+
+	/* Count of all observations recorded */
+	long long	totexts;
+
+	struct histent	*buckets;
+
+	/* Number of buckets */
+	unsigned int	nr_buckets;
+};
+
+int hist_add_bucket(struct histogram *hs, long long bucket_low);
+void hist_add(struct histogram *hs, long long len);
+void hist_init(struct histogram *hs);
+void hist_prepare(struct histogram *hs, long long maxlen);
+void hist_free(struct histogram *hs);
+void hist_print(const struct histogram *hs);
+void hist_summarize(const struct histogram *hs);
+
+static inline unsigned int hist_buckets(const struct histogram *hs)
+{
+	return hs->nr_buckets;
+}
+
+#endif /* __LIBFROG_HISTOGRAM_H__ */
diff --git a/spaceman/freesp.c b/spaceman/freesp.c
index f5177cb4ee5d..97c92b131a89 100644
--- a/spaceman/freesp.c
+++ b/spaceman/freesp.c
@@ -15,76 +15,52 @@
 #include "libfrog/paths.h"
 #include "space.h"
 #include "input.h"
-
-struct histent
-{
-	long long	low;
-	long long	high;
-	long long	count;
-	long long	blocks;
-};
+#include "libfrog/histogram.h"
 
 static int		agcount;
 static xfs_agnumber_t	*aglist;
-static struct histent	*hist;
+static struct histogram	freesp_hist;
 static int		dumpflag;
 static long long	equalsize;
 static long long	multsize;
-static int		histcount;
 static int		seen1;
 static int		summaryflag;
 static int		gflag;
 static bool		rtflag;
-static long long	totblocks;
-static long long	totexts;
 
 static cmdinfo_t freesp_cmd;
 
-static void
+static inline void
 addhistent(
 	long long	h)
 {
-	if (histcount == INT_MAX) {
+	int		error;
+
+	error = hist_add_bucket(&freesp_hist, h);
+	if (error == EFBIG) {
 		printf(_("Too many histogram buckets.\n"));
 		return;
 	}
-	hist = realloc(hist, (histcount + 1) * sizeof(*hist));
+	if (error) {
+		printf("%s\n", strerror(error));
+		return;
+	}
+
 	if (h == 0)
 		h = 1;
-	hist[histcount].low = h;
-	hist[histcount].count = hist[histcount].blocks = 0;
-	histcount++;
 	if (h == 1)
 		seen1 = 1;
 }
 
-static void
+static inline void
 addtohist(
 	xfs_agnumber_t	agno,
 	xfs_agblock_t	agbno,
 	off_t		len)
 {
-	long		i;
-
 	if (dumpflag)
 		printf("%8d %8d %8"PRId64"\n", agno, agbno, len);
-	totexts++;
-	totblocks += len;
-	for (i = 0; i < histcount; i++) {
-		if (hist[i].high >= len) {
-			hist[i].count++;
-			hist[i].blocks += len;
-			break;
-		}
-	}
-}
-
-static int
-hcmp(
-	const void	*a,
-	const void	*b)
-{
-	return ((struct histent *)a)->low - ((struct histent *)b)->low;
+	hist_add(&freesp_hist, len);
 }
 
 static void
@@ -93,6 +69,7 @@ histinit(
 {
 	long long	i;
 
+	hist_init(&freesp_hist);
 	if (equalsize) {
 		for (i = 1; i < maxlen; i += equalsize)
 			addhistent(i);
@@ -102,29 +79,14 @@ histinit(
 	} else {
 		if (!seen1)
 			addhistent(1);
-		qsort(hist, histcount, sizeof(*hist), hcmp);
-	}
-	for (i = 0; i < histcount; i++) {
-		if (i < histcount - 1)
-			hist[i].high = hist[i + 1].low - 1;
-		else
-			hist[i].high = maxlen;
 	}
+	hist_prepare(&freesp_hist, maxlen);
 }
 
-static void
+static inline void
 printhist(void)
 {
-	int	i;
-
-	printf("%7s %7s %7s %7s %6s\n",
-		_("from"), _("to"), _("extents"), _("blocks"), _("pct"));
-	for (i = 0; i < histcount; i++) {
-		if (hist[i].count)
-			printf("%7lld %7lld %7lld %7lld %6.2f\n", hist[i].low,
-				hist[i].high, hist[i].count, hist[i].blocks,
-				hist[i].blocks * 100.0 / totblocks);
-	}
+	hist_print(&freesp_hist);
 }
 
 static int
@@ -255,10 +217,8 @@ init(
 	int			speced = 0;	/* only one of -b -e -h or -m */
 
 	agcount = dumpflag = equalsize = multsize = optind = gflag = 0;
-	histcount = seen1 = summaryflag = 0;
-	totblocks = totexts = 0;
+	seen1 = summaryflag = 0;
 	aglist = NULL;
-	hist = NULL;
 	rtflag = false;
 
 	while ((c = getopt(argc, argv, "a:bde:gh:m:rs")) != EOF) {
@@ -287,7 +247,7 @@ init(
 			gflag++;
 			break;
 		case 'h':
-			if (speced && !histcount)
+			if (speced && hist_buckets(&freesp_hist) == 0)
 				goto many_spec;
 			/* addhistent increments histcount */
 			x = cvt_s64(optarg, 0);
@@ -345,18 +305,13 @@ freesp_f(
 		if (inaglist(agno))
 			scan_ag(agno);
 	}
-	if (histcount && !gflag)
+	if (hist_buckets(&freesp_hist) > 0 && !gflag)
 		printhist();
-	if (summaryflag) {
-		printf(_("total free extents %lld\n"), totexts);
-		printf(_("total free blocks %lld\n"), totblocks);
-		printf(_("average free extent size %g\n"),
-			(double)totblocks / (double)totexts);
-	}
+	if (summaryflag)
+		hist_summarize(&freesp_hist);
 	if (aglist)
 		free(aglist);
-	if (hist)
-		free(hist);
+	hist_free(&freesp_hist);
 	return 0;
 }
 



* [PATCH 2/7] libfrog: print wider columns for free space histogram
  2024-07-02  0:50 ` [PATCHSET v30.7 05/16] xfs_scrub: use free space histograms to reduce fstrim runtime Darrick J. Wong
  2024-07-02  1:03   ` [PATCH 1/7] libfrog: hoist free space histogram code Darrick J. Wong
@ 2024-07-02  1:03   ` Darrick J. Wong
  2024-07-02  5:33     ` Christoph Hellwig
  2024-07-02  1:03   ` [PATCH 3/7] libfrog: print cdf of free space buckets Darrick J. Wong
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:03 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

The values reported here can become very large, so compute each
column's width dynamically instead of hardcoding seven characters.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/histogram.c |   34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)


diff --git a/libfrog/histogram.c b/libfrog/histogram.c
index 553ba3d7c6e8..5053d5eafc26 100644
--- a/libfrog/histogram.c
+++ b/libfrog/histogram.c
@@ -108,17 +108,41 @@ void
 hist_print(
 	const struct histogram	*hs)
 {
+	unsigned int		from_w, to_w, extents_w, blocks_w;
 	unsigned int		i;
 
-	printf("%7s %7s %7s %7s %6s\n",
-		_("from"), _("to"), _("extents"), _("blocks"), _("pct"));
+	from_w = to_w = extents_w = blocks_w = 7;
+	for (i = 0; i < hs->nr_buckets; i++) {
+		char buf[256];
+
+		if (!hs->buckets[i].count)
+			continue;
+
+		snprintf(buf, sizeof(buf) - 1, "%lld", hs->buckets[i].low);
+		from_w = max(from_w, strlen(buf));
+
+		snprintf(buf, sizeof(buf) - 1, "%lld", hs->buckets[i].high);
+		to_w = max(to_w, strlen(buf));
+
+		snprintf(buf, sizeof(buf) - 1, "%lld", hs->buckets[i].count);
+		extents_w = max(extents_w, strlen(buf));
+
+		snprintf(buf, sizeof(buf) - 1, "%lld", hs->buckets[i].blocks);
+		blocks_w = max(blocks_w, strlen(buf));
+	}
+
+	printf("%*s %*s %*s %*s %6s\n",
+		from_w, _("from"), to_w, _("to"), extents_w, _("extents"),
+		blocks_w, _("blocks"), _("pct"));
 	for (i = 0; i < hs->nr_buckets; i++) {
 		if (hs->buckets[i].count == 0)
 			continue;
 
-		printf("%7lld %7lld %7lld %7lld %6.2f\n",
-				hs->buckets[i].low, hs->buckets[i].high,
-				hs->buckets[i].count, hs->buckets[i].blocks,
+		printf("%*lld %*lld %*lld %*lld %6.2f\n",
+				from_w, hs->buckets[i].low,
+				to_w, hs->buckets[i].high,
+				extents_w, hs->buckets[i].count,
+				blocks_w, hs->buckets[i].blocks,
 				hs->buckets[i].blocks * 100.0 / hs->totblocks);
 	}
 }



* [PATCH 3/7] libfrog: print cdf of free space buckets
  2024-07-02  0:50 ` [PATCHSET v30.7 05/16] xfs_scrub: use free space histograms to reduce fstrim runtime Darrick J. Wong
  2024-07-02  1:03   ` [PATCH 1/7] libfrog: hoist free space histogram code Darrick J. Wong
  2024-07-02  1:03   ` [PATCH 2/7] libfrog: print wider columns for free space histogram Darrick J. Wong
@ 2024-07-02  1:03   ` Darrick J. Wong
  2024-07-02  5:33     ` Christoph Hellwig
  2024-07-02  1:03   ` [PATCH 4/7] xfs_scrub: don't close stdout when closing the progress bar Darrick J. Wong
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:03 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Print the cumulative distribution function (CDF) of the free space
buckets, accumulated in reverse (from the longest extents down), as two
new columns: blkcdf and extcdf.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/histogram.c |   63 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 59 insertions(+), 4 deletions(-)


diff --git a/libfrog/histogram.c b/libfrog/histogram.c
index 5053d5eafc26..bed79e35b021 100644
--- a/libfrog/histogram.c
+++ b/libfrog/histogram.c
@@ -103,13 +103,64 @@ hist_free(
 	memset(hs, 0, sizeof(struct histogram));
 }
 
+/*
+ * Compute the CDF of the free space in decreasing order of extent length.
+ * This enables users to determine how much free space is not in the long tail
+ * of small extents, e.g. 98% of the free space extents are larger than 31
+ * blocks.
+ */
+static int
+hist_cdf(
+	const struct histogram	*hs,
+	struct histogram	*cdf)
+{
+	struct histent		*buckets;
+	int			i = hs->nr_buckets - 1;
+
+	ASSERT(cdf->nr_buckets == 0);
+	ASSERT(hs->nr_buckets < INT_MAX);
+
+	if (hs->nr_buckets == 0)
+		return 0;
+
+	buckets = calloc(hs->nr_buckets, sizeof(struct histent));
+	if (!buckets)
+		return errno;
+
+	memset(cdf, 0, sizeof(struct histogram));
+	cdf->buckets = buckets;
+
+	cdf->buckets[i].count = hs->buckets[i].count;
+	cdf->buckets[i].blocks = hs->buckets[i].blocks;
+	i--;
+
+	while (i >= 0) {
+		cdf->buckets[i].count = hs->buckets[i].count +
+				       cdf->buckets[i + 1].count;
+
+		cdf->buckets[i].blocks = hs->buckets[i].blocks +
+					cdf->buckets[i + 1].blocks;
+		i--;
+	}
+
+	return 0;
+}
+
 /* Dump a histogram to stdout. */
 void
 hist_print(
 	const struct histogram	*hs)
 {
+	struct histogram	cdf = { };
 	unsigned int		from_w, to_w, extents_w, blocks_w;
 	unsigned int		i;
+	int			error;
+
+	error = hist_cdf(hs, &cdf);
+	if (error) {
+		printf(_("histogram cdf: %s\n"), strerror(error));
+		return;
+	}
 
 	from_w = to_w = extents_w = blocks_w = 7;
 	for (i = 0; i < hs->nr_buckets; i++) {
@@ -131,20 +182,24 @@ hist_print(
 		blocks_w = max(blocks_w, strlen(buf));
 	}
 
-	printf("%*s %*s %*s %*s %6s\n",
+	printf("%*s %*s %*s %*s %6s %6s %6s\n",
 		from_w, _("from"), to_w, _("to"), extents_w, _("extents"),
-		blocks_w, _("blocks"), _("pct"));
+		blocks_w, _("blocks"), _("pct"), _("blkcdf"), _("extcdf"));
 	for (i = 0; i < hs->nr_buckets; i++) {
 		if (hs->buckets[i].count == 0)
 			continue;
 
-		printf("%*lld %*lld %*lld %*lld %6.2f\n",
+		printf("%*lld %*lld %*lld %*lld %6.2f %6.2f %6.2f\n",
 				from_w, hs->buckets[i].low,
 				to_w, hs->buckets[i].high,
 				extents_w, hs->buckets[i].count,
 				blocks_w, hs->buckets[i].blocks,
-				hs->buckets[i].blocks * 100.0 / hs->totblocks);
+				hs->buckets[i].blocks * 100.0 / hs->totblocks,
+				cdf.buckets[i].blocks * 100.0 / hs->totblocks,
+				cdf.buckets[i].count * 100.0 / hs->totexts);
 	}
+
+	hist_free(&cdf);
 }
 
 /* Summarize the contents of the histogram. */



* [PATCH 4/7] xfs_scrub: don't close stdout when closing the progress bar
  2024-07-02  0:50 ` [PATCHSET v30.7 05/16] xfs_scrub: use free space histograms to reduce fstrim runtime Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-07-02  1:03   ` [PATCH 3/7] libfrog: print cdf of free space buckets Darrick J. Wong
@ 2024-07-02  1:03   ` Darrick J. Wong
  2024-07-02  5:33     ` Christoph Hellwig
  2024-07-02  1:04   ` [PATCH 5/7] xfs_scrub: remove pointless spacemap.c arguments Darrick J. Wong
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:03 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

When we're tearing down the progress bar file stream, check that it's
not an alias of stdout before closing it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/xfs_scrub.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index edf58d07bde2..adf9d13e5090 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -878,7 +878,7 @@ main(
 	if (ctx.runtime_errors)
 		ret |= SCRUB_RET_OPERROR;
 	phase_end(&all_pi, 0);
-	if (progress_fp)
+	if (progress_fp && fileno(progress_fp) != 1)
 		fclose(progress_fp);
 out_unicrash:
 	unicrash_unload();



* [PATCH 5/7] xfs_scrub: remove pointless spacemap.c arguments
  2024-07-02  0:50 ` [PATCHSET v30.7 05/16] xfs_scrub: use free space histograms to reduce fstrim runtime Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-07-02  1:03   ` [PATCH 4/7] xfs_scrub: don't close stdout when closing the progress bar Darrick J. Wong
@ 2024-07-02  1:04   ` Darrick J. Wong
  2024-07-02  5:33     ` Christoph Hellwig
  2024-07-02  1:04   ` [PATCH 6/7] xfs_scrub: collect free space histograms during phase 7 Darrick J. Wong
  2024-07-02  1:04   ` [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms Darrick J. Wong
  6 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:04 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Remove unused parameters from the full-device spacemap scan functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/spacemap.c |   11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)


diff --git a/scrub/spacemap.c b/scrub/spacemap.c
index 9cefe074c6f6..e35756db2eed 100644
--- a/scrub/spacemap.c
+++ b/scrub/spacemap.c
@@ -132,7 +132,6 @@ scan_ag_rmaps(
 static void
 scan_dev_rmaps(
 	struct scrub_ctx	*ctx,
-	int			idx,
 	dev_t			dev,
 	struct scan_blocks	*sbx)
 {
@@ -170,7 +169,7 @@ scan_rt_rmaps(
 {
 	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
 
-	scan_dev_rmaps(ctx, agno, ctx->fsinfo.fs_rtdev, arg);
+	scan_dev_rmaps(ctx, ctx->fsinfo.fs_rtdev, arg);
 }
 
 /* Iterate all the reverse mappings of the log device. */
@@ -182,7 +181,7 @@ scan_log_rmaps(
 {
 	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
 
-	scan_dev_rmaps(ctx, agno, ctx->fsinfo.fs_logdev, arg);
+	scan_dev_rmaps(ctx, ctx->fsinfo.fs_logdev, arg);
 }
 
 /*
@@ -210,8 +209,7 @@ scrub_scan_all_spacemaps(
 		return ret;
 	}
 	if (ctx->fsinfo.fs_rt) {
-		ret = -workqueue_add(&wq, scan_rt_rmaps,
-				ctx->mnt.fsgeom.agcount + 1, &sbx);
+		ret = -workqueue_add(&wq, scan_rt_rmaps, 0, &sbx);
 		if (ret) {
 			sbx.aborted = true;
 			str_liberror(ctx, ret, _("queueing rtdev fsmap work"));
@@ -219,8 +217,7 @@ scrub_scan_all_spacemaps(
 		}
 	}
 	if (ctx->fsinfo.fs_log) {
-		ret = -workqueue_add(&wq, scan_log_rmaps,
-				ctx->mnt.fsgeom.agcount + 2, &sbx);
+		ret = -workqueue_add(&wq, scan_log_rmaps, 0, &sbx);
 		if (ret) {
 			sbx.aborted = true;
 			str_liberror(ctx, ret, _("queueing logdev fsmap work"));



* [PATCH 6/7] xfs_scrub: collect free space histograms during phase 7
  2024-07-02  0:50 ` [PATCHSET v30.7 05/16] xfs_scrub: use free space histograms to reduce fstrim runtime Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-07-02  1:04   ` [PATCH 5/7] xfs_scrub: remove pointless spacemap.c arguments Darrick J. Wong
@ 2024-07-02  1:04   ` Darrick J. Wong
  2024-07-02  5:34     ` Christoph Hellwig
  2024-07-02  1:04   ` [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms Darrick J. Wong
  6 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:04 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Collect a histogram of free space observed during phase 7.  We'll put
this information to use in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/histogram.c |   38 ++++++++++++++++++++++++++++++++++++++
 libfrog/histogram.h |    3 +++
 scrub/phase7.c      |   47 +++++++++++++++++++++++++++++++++++++++++++++--
 scrub/xfs_scrub.c   |    5 +++++
 scrub/xfs_scrub.h   |    4 ++++
 5 files changed, 95 insertions(+), 2 deletions(-)


diff --git a/libfrog/histogram.c b/libfrog/histogram.c
index bed79e35b021..54e2bac0f731 100644
--- a/libfrog/histogram.c
+++ b/libfrog/histogram.c
@@ -212,3 +212,41 @@ hist_summarize(
 	printf(_("average free extent size %g\n"),
 			(double)hs->totblocks / (double)hs->totexts);
 }
+
+/* Add the contents of src to dest; both must have identical buckets. */
+void
+hist_import(
+	struct histogram	*dest,
+	const struct histogram	*src)
+{
+	unsigned int		i;
+
+	ASSERT(dest->nr_buckets == src->nr_buckets);
+
+	dest->totblocks += src->totblocks;
+	dest->totexts += src->totexts;
+
+	for (i = 0; i < dest->nr_buckets; i++) {
+		ASSERT(dest->buckets[i].low == src->buckets[i].low);
+		ASSERT(dest->buckets[i].high == src->buckets[i].high);
+
+		dest->buckets[i].count += src->buckets[i].count;
+		dest->buckets[i].blocks += src->buckets[i].blocks;
+	}
+}
+
+/*
+ * Move the contents of src to dest and reinitialize src.  dest must not
+ * contain any observations or buckets.
+ */
+void
+hist_move(
+	struct histogram	*dest,
+	struct histogram	*src)
+{
+	ASSERT(dest->nr_buckets == 0);
+	ASSERT(dest->totexts == 0);
+
+	memcpy(dest, src, sizeof(struct histogram));
+	hist_init(src);
+}
diff --git a/libfrog/histogram.h b/libfrog/histogram.h
index 2e2b169a79b0..ec788344d4c2 100644
--- a/libfrog/histogram.h
+++ b/libfrog/histogram.h
@@ -47,4 +47,7 @@ static inline unsigned int hist_buckets(const struct histogram *hs)
 	return hs->nr_buckets;
 }
 
+void hist_import(struct histogram *dest, const struct histogram *src);
+void hist_move(struct histogram *dest, struct histogram *src);
+
 #endif /* __LIBFROG_HISTOGRAM_H__ */
diff --git a/scrub/phase7.c b/scrub/phase7.c
index cce5ede00122..475d8f157eec 100644
--- a/scrub/phase7.c
+++ b/scrub/phase7.c
@@ -12,6 +12,7 @@
 #include "libfrog/ptvar.h"
 #include "libfrog/fsgeom.h"
 #include "libfrog/scrub.h"
+#include "libfrog/histogram.h"
 #include "list.h"
 #include "xfs_scrub.h"
 #include "common.h"
@@ -27,8 +28,36 @@ struct summary_counts {
 	unsigned long long	rbytes;		/* rt dev bytes */
 	unsigned long long	next_phys;	/* next phys bytes we see? */
 	unsigned long long	agbytes;	/* freespace bytes */
+
+	/* Free space histogram, in fsb */
+	struct histogram	datadev_hist;
 };
 
+/*
+ * Initialize a free space histogram.  Unsharded realtime volumes can be up to
+ * 2^52 blocks long, so we allocate enough buckets to handle that.
+ */
+static inline void
+init_freesp_hist(
+	struct histogram	*hs)
+{
+	unsigned int		i;
+
+	hist_init(hs);
+	for (i = 0; i < 53; i++)
+		hist_add_bucket(hs, 1ULL << i);
+	hist_prepare(hs, 1ULL << 53);
+}
+
+static void
+summary_count_init(
+	void			*data)
+{
+	struct summary_counts	*counts = data;
+
+	init_freesp_hist(&counts->datadev_hist);
+}
+
 /* Record block usage. */
 static int
 count_block_summary(
@@ -48,8 +77,14 @@ count_block_summary(
 	if (fsmap->fmr_device == ctx->fsinfo.fs_logdev)
 		return 0;
 	if ((fsmap->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
-	    fsmap->fmr_owner == XFS_FMR_OWN_FREE)
+	    fsmap->fmr_owner == XFS_FMR_OWN_FREE) {
+		uint64_t	blocks;
+
+		blocks = cvt_b_to_off_fsbt(&ctx->mnt, fsmap->fmr_length);
+		if (fsmap->fmr_device == ctx->fsinfo.fs_datadev)
+			hist_add(&counts->datadev_hist, blocks);
 		return 0;
+	}
 
 	len = fsmap->fmr_length;
 
@@ -87,6 +122,9 @@ add_summaries(
 	total->dbytes += item->dbytes;
 	total->rbytes += item->rbytes;
 	total->agbytes += item->agbytes;
+
+	hist_import(&total->datadev_hist, &item->datadev_hist);
+	hist_free(&item->datadev_hist);
 	return 0;
 }
 
@@ -118,6 +156,8 @@ phase7_func(
 	int			ip;
 	int			error;
 
+	summary_count_init(&totalcount);
+
 	/* Check and fix the summary metadata. */
 	scrub_item_init_fs(&sri);
 	scrub_item_schedule_group(&sri, XFROG_SCRUB_GROUP_SUMMARY);
@@ -136,7 +176,7 @@ phase7_func(
 	}
 
 	error = -ptvar_alloc(scrub_nproc(ctx), sizeof(struct summary_counts),
-			NULL, &ptvar);
+			summary_count_init, &ptvar);
 	if (error) {
 		str_liberror(ctx, error, _("setting up block counter"));
 		return error;
@@ -153,6 +193,9 @@ phase7_func(
 	}
 	ptvar_free(ptvar);
 
+	/* Preserve free space histograms for phase 8. */
+	hist_move(&ctx->datadev_hist, &totalcount.datadev_hist);
+
 	/* Scan the whole fs. */
 	error = scrub_count_all_inodes(ctx, &counted_inodes);
 	if (error) {
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index adf9d13e5090..2894f6148e10 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -18,6 +18,7 @@
 #include "descr.h"
 #include "unicrash.h"
 #include "progress.h"
+#include "libfrog/histogram.h"
 
 /*
  * XFS Online Metadata Scrub (and Repair)
@@ -669,6 +670,8 @@ main(
 	int			ret = SCRUB_RET_SUCCESS;
 	int			error;
 
+	hist_init(&ctx.datadev_hist);
+
 	fprintf(stdout, "EXPERIMENTAL xfs_scrub program in use! Use at your own risk!\n");
 	fflush(stdout);
 
@@ -883,6 +886,8 @@ main(
 out_unicrash:
 	unicrash_unload();
 
+	hist_free(&ctx.datadev_hist);
+
 	/*
 	 * If we're being run as a service, the return code must fit the LSB
 	 * init script action error guidelines, which is to say that we
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 6272a36879e7..1a28f0cc847e 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -7,6 +7,7 @@
 #define XFS_SCRUB_XFS_SCRUB_H_
 
 #include "libfrog/fsgeom.h"
+#include "libfrog/histogram.h"
 
 extern char *progname;
 
@@ -86,6 +87,9 @@ struct scrub_ctx {
 	unsigned long long	preens;
 	bool			scrub_setup_succeeded;
 	bool			preen_triggers[XFS_SCRUB_TYPE_NR];
+
+	/* Free space histograms, in fsb */
+	struct histogram	datadev_hist;
 };
 
 /* Phase helper functions */



* [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms
  2024-07-02  0:50 ` [PATCHSET v30.7 05/16] xfs_scrub: use free space histograms to reduce fstrim runtime Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-07-02  1:04   ` [PATCH 6/7] xfs_scrub: collect free space histograms during phase 7 Darrick J. Wong
@ 2024-07-02  1:04   ` Darrick J. Wong
  2024-07-02  5:36     ` Christoph Hellwig
  6 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:04 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Currently, phase 8 runs very slowly on filesystems with a lot of small
free space extents.  To reduce the amount of time spent on fstrim
activities during phase 8, we want to balance estimated runtime against
completeness of the trim.  In short, the goal is to reduce runtime by
avoiding small trim requests.

At the start of phase 8, a CDF is computed in decreasing order of extent
length from the histogram buckets created during the fsmap scan in phase
7.  A point corresponding to the fstrim percentage target is chosen from
the CDF and mapped back to a histogram bucket, and free space extents
shorter than that bucket's lower bound are omitted from fstrim.

On my aging /home filesystem, the free space histogram reported by
xfs_spaceman looks like this:

   from      to extents    blocks    pct blkcdf extcdf
      1       1  121953    121953   0.04 100.00 100.00
      2       3  124741    299694   0.09  99.96  81.16
      4       7  113492    593763   0.18  99.87  61.89
      8      15  109215   1179524   0.36  99.69  44.36
     16      31   76972   1695455   0.52  99.33  27.48
     32      63   48655   2219667   0.68  98.82  15.59
     64     127   31398   2876898   0.88  98.14   8.08
    128     255    8014   1447920   0.44  97.27   3.23
    256     511    4142   1501758   0.46  96.82   1.99
    512    1023    2433   1768732   0.54  96.37   1.35
   1024    2047    1795   2648460   0.81  95.83   0.97
   2048    4095    1429   4206103   1.28  95.02   0.69
   4096    8191    1045   6162111   1.88  93.74   0.47
   8192   16383     791   9242745   2.81  91.87   0.31
  16384   32767     473  10883977   3.31  89.06   0.19
  32768   65535     272  12385566   3.77  85.74   0.12
  65536  131071     192  18098739   5.51  81.98   0.07
 131072  262143     108  20675199   6.29  76.47   0.04
 262144  524287      80  29061285   8.84  70.18   0.03
 524288 1048575      39  29002829   8.83  61.33   0.02
1048576 2097151      25  36824985  11.21  52.51   0.01
2097152 4194303      32 101727192  30.95  41.30   0.01
4194304 8388607       7  34007410  10.35  10.35   0.00

From this table, we see that free space extents that are 16 blocks or
longer constitute 99.3% of the free space in the filesystem but only
27.5% of the extents.  If we set the fstrim minlen parameter to 16
blocks, that means that we can trim over 99% of the space in one third
of the time it would take to trim everything.

Add a new -o fstrim_pct= option to xfs_scrub just in case there are
users out there who want a different percentage.  For example, accepting
a 95% trim would net us a speed increase of nearly two orders of
magnitude, ignoring system call overhead.  Setting it to 100% will trim
everything, just like fstrim(8).

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/histogram.c  |    2 +
 libfrog/histogram.h  |    1 +
 man/man8/xfs_scrub.8 |   16 +++++++++++
 scrub/phase8.c       |   75 +++++++++++++++++++++++++++++++++++++++++++++++---
 scrub/vfs.c          |    4 ++-
 scrub/vfs.h          |    2 +
 scrub/xfs_scrub.c    |   38 +++++++++++++++++++++++++
 scrub/xfs_scrub.h    |   12 ++++++++
 8 files changed, 141 insertions(+), 9 deletions(-)


diff --git a/libfrog/histogram.c b/libfrog/histogram.c
index 54e2bac0f731..61ecda16ffef 100644
--- a/libfrog/histogram.c
+++ b/libfrog/histogram.c
@@ -109,7 +109,7 @@ hist_free(
  * of small extents, e.g. 98% of the free space extents are larger than 31
  * blocks.
  */
-static int
+int
 hist_cdf(
 	const struct histogram	*hs,
 	struct histogram	*cdf)
diff --git a/libfrog/histogram.h b/libfrog/histogram.h
index ec788344d4c2..ecac66d240da 100644
--- a/libfrog/histogram.h
+++ b/libfrog/histogram.h
@@ -39,6 +39,7 @@ void hist_add(struct histogram *hs, long long len);
 void hist_init(struct histogram *hs);
 void hist_prepare(struct histogram *hs, long long maxlen);
 void hist_free(struct histogram *hs);
+int hist_cdf(const struct histogram *hs, struct histogram *cdf);
 void hist_print(const struct histogram *hs);
 void hist_summarize(const struct histogram *hs);
 
diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
index 404baba696e1..b9f253e1b079 100644
--- a/man/man8/xfs_scrub.8
+++ b/man/man8/xfs_scrub.8
@@ -100,6 +100,22 @@ The
 supported are:
 .RS 1.0i
 .TP
+.BI fstrim_pct= percentage
+To constrain the amount of time spent on fstrim activities during phase 8,
+this program tries to balance estimated runtime against completeness of the
+trim.
+In short, the program avoids small trim requests to save time.
+
+During phase 7, a log-scale histogram of free space extents is constructed.
+At the start of phase 8, a CDF is computed in decreasing order of extent
+length from the histogram buckets.
+A point corresponding to the fstrim percentage target is chosen from the CDF
+and mapped back to a histogram bucket.
+Free space extents at least as long as the lower bound of that bucket are trimmed.
+Smaller extents are ignored.
+
+By default, the percentage threshold is 99%.
+.TP
 .BI iwarn
 Treat informational messages as warnings.
 This will result in a nonzero return code, and a higher logging level.
diff --git a/scrub/phase8.c b/scrub/phase8.c
index ae6d965c75e1..6bb9317afecc 100644
--- a/scrub/phase8.c
+++ b/scrub/phase8.c
@@ -11,6 +11,7 @@
 #include "list.h"
 #include "libfrog/paths.h"
 #include "libfrog/workqueue.h"
+#include "libfrog/histogram.h"
 #include "xfs_scrub.h"
 #include "common.h"
 #include "progress.h"
@@ -57,10 +58,12 @@ static int
 fstrim_fsblocks(
 	struct scrub_ctx	*ctx,
 	uint64_t		start_fsb,
-	uint64_t		fsbcount)
+	uint64_t		fsbcount,
+	uint64_t		minlen_fsb)
 {
 	uint64_t		start = cvt_off_fsb_to_b(&ctx->mnt, start_fsb);
 	uint64_t		len = cvt_off_fsb_to_b(&ctx->mnt, fsbcount);
+	uint64_t		minlen = cvt_off_fsb_to_b(&ctx->mnt, minlen_fsb);
 	int			error;
 
 	while (len > 0) {
@@ -68,7 +71,7 @@ fstrim_fsblocks(
 
 		run = min(len, FSTRIM_MAX_BYTES);
 
-		error = fstrim(ctx, start, run);
+		error = fstrim(ctx, start, run, minlen);
 		if (error == EOPNOTSUPP) {
 			/* Pretend we finished all the work. */
 			progress_add(len);
@@ -78,9 +81,10 @@ fstrim_fsblocks(
 			char		descr[DESCR_BUFSZ];
 
 			snprintf(descr, sizeof(descr) - 1,
-					_("fstrim start 0x%llx run 0x%llx"),
+					_("fstrim start 0x%llx run 0x%llx minlen 0x%llx"),
 					(unsigned long long)start,
-					(unsigned long long)run);
+					(unsigned long long)run,
+					(unsigned long long)minlen);
 			str_liberror(ctx, error, descr);
 			return error;
 		}
@@ -93,6 +97,64 @@ fstrim_fsblocks(
 	return 0;
 }
 
+/* Compute a suitable minlen parameter for fstrim. */
+static uint64_t
+fstrim_compute_minlen(
+	const struct scrub_ctx	*ctx,
+	const struct histogram	*freesp_hist)
+{
+	struct histogram	cdf;
+	uint64_t		ret = 0;
+	double			blk_threshold = 0;
+	unsigned int		i;
+	unsigned int		ag_max_usable;
+	int			error;
+
+	/*
+	 * The kernel will reject a minlen that's larger than m_ag_max_usable.
+	 * We can't calculate or query that value directly, so we guesstimate
+	 * that it's 95% of the AG size.
+	 */
+	ag_max_usable = ctx->mnt.fsgeom.agblocks * 95 / 100;
+
+	if (freesp_hist->totexts == 0)
+		goto out;
+
+	if (debug > 1)
+		hist_print(freesp_hist);
+
+	/* Insufficient samples to make a meaningful histogram */
+	if (freesp_hist->totexts < freesp_hist->nr_buckets * 10)
+		goto out;
+
+	hist_init(&cdf);
+	error = hist_cdf(freesp_hist, &cdf);
+	if (error)
+		goto out_free;
+
+	blk_threshold = freesp_hist->totblocks * ctx->fstrim_block_pct;
+	for (i = 1; i < freesp_hist->nr_buckets; i++) {
+		if (cdf.buckets[i].blocks < blk_threshold) {
+			ret = freesp_hist->buckets[i - 1].low;
+			break;
+		}
+	}
+
+out_free:
+	hist_free(&cdf);
+out:
+	if (debug > 1)
+		printf(_("fstrim minlen %llu threshold %llu ag_max_usable %u\n"),
+				(unsigned long long)ret,
+				(unsigned long long)blk_threshold,
+				ag_max_usable);
+	if (ret > ag_max_usable)
+		ret = ag_max_usable;
+	if (ret == 1)
+		ret = 0;
+	return ret;
+}
+
 /* Trim each AG on the data device. */
 static int
 fstrim_datadev(
@@ -100,8 +162,11 @@ fstrim_datadev(
 {
 	struct xfs_fsop_geom	*geo = &ctx->mnt.fsgeom;
 	uint64_t		fsbno;
+	uint64_t		minlen_fsb;
 	int			error;
 
+	minlen_fsb = fstrim_compute_minlen(ctx, &ctx->datadev_hist);
+
 	for (fsbno = 0; fsbno < geo->datablocks; fsbno += geo->agblocks) {
 		uint64_t	fsbcount;
 
@@ -113,7 +178,7 @@ fstrim_datadev(
 		 */
 		progress_add(geo->blocksize);
 		fsbcount = min(geo->datablocks - (fsbno + 1), geo->agblocks - 1);
-		error = fstrim_fsblocks(ctx, fsbno + 1, fsbcount);
+		error = fstrim_fsblocks(ctx, fsbno + 1, fsbcount, minlen_fsb);
 		if (error)
 			return error;
 	}
diff --git a/scrub/vfs.c b/scrub/vfs.c
index cc958ba9438e..22c19485a2da 100644
--- a/scrub/vfs.c
+++ b/scrub/vfs.c
@@ -300,11 +300,13 @@ int
 fstrim(
 	struct scrub_ctx	*ctx,
 	uint64_t		start,
-	uint64_t		len)
+	uint64_t		len,
+	uint64_t		minlen)
 {
 	struct fstrim_range	range = {
 		.start		= start,
 		.len		= len,
+		.minlen		= minlen,
 	};
 
 	if (ioctl(ctx->mnt.fd, FITRIM, &range) == 0)
diff --git a/scrub/vfs.h b/scrub/vfs.h
index 1af8d80d1de6..f0cfd53c27be 100644
--- a/scrub/vfs.h
+++ b/scrub/vfs.h
@@ -24,6 +24,6 @@ typedef int (*scan_fs_tree_dirent_fn)(struct scrub_ctx *, const char *,
 int scan_fs_tree(struct scrub_ctx *ctx, scan_fs_tree_dir_fn dir_fn,
 		scan_fs_tree_dirent_fn dirent_fn, void *arg);
 
-int fstrim(struct scrub_ctx *ctx, uint64_t start, uint64_t len);
+int fstrim(struct scrub_ctx *ctx, uint64_t start, uint64_t len, uint64_t minlen);
 
 #endif /* XFS_SCRUB_VFS_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 2894f6148e10..296d814eceeb 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -622,11 +622,13 @@ report_outcome(
  */
 enum o_opt_nums {
 	IWARN = 0,
+	FSTRIM_PCT,
 	O_MAX_OPTS,
 };
 
 static char *o_opts[] = {
 	[IWARN]			= "iwarn",
+	[FSTRIM_PCT]		= "fstrim_pct",
 	[O_MAX_OPTS]		= NULL,
 };
 
@@ -635,8 +637,11 @@ parse_o_opts(
 	struct scrub_ctx	*ctx,
 	char			*p)
 {
+	double			dval;
+
 	while (*p != '\0')  {
 		char		*val;
+		char		*endp;
 
 		switch (getsubopt(&p, o_opts, &val))  {
 		case IWARN:
@@ -647,6 +652,35 @@ parse_o_opts(
 			}
 			info_is_warning = true;
 			break;
+		case FSTRIM_PCT:
+			if (!val) {
+				fprintf(stderr,
+ _("-o fstrim_pct requires a parameter\n"));
+				usage();
+			}
+
+			errno = 0;
+			dval = strtod(val, &endp);
+
+			if (*endp) {
+				fprintf(stderr,
+ _("-o fstrim_pct must be a floating point number\n"));
+				usage();
+			}
+			if (errno) {
+				fprintf(stderr,
+ _("-o fstrim_pct: %s\n"),
+						strerror(errno));
+				usage();
+			}
+			if (dval <= 0 || dval > 100) {
+				fprintf(stderr,
+ _("-o fstrim_pct must be greater than 0 and at most 100\n"));
+				usage();
+			}
+
+			ctx->fstrim_block_pct = dval / 100.0;
+			break;
 		default:
 			usage();
 			break;
@@ -659,7 +693,9 @@ main(
 	int			argc,
 	char			**argv)
 {
-	struct scrub_ctx	ctx = {0};
+	struct scrub_ctx	ctx = {
+		.fstrim_block_pct = FSTRIM_BLOCK_PCT_DEFAULT,
+	};
 	struct phase_rusage	all_pi;
 	char			*mtab = NULL;
 	FILE			*progress_fp = NULL;
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 1a28f0cc847e..7d48f4bad9ce 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -90,8 +90,20 @@ struct scrub_ctx {
 
 	/* Free space histograms, in fsb */
 	struct histogram	datadev_hist;
+
+	/*
+	 * Pick the largest value for fstrim minlen such that we trim at least
+	 * this much space per volume.
+	 */
+	double			fstrim_block_pct;
 };
 
+/*
+ * Trim only enough free space extents (in order of decreasing length) to
+ * ensure that this percentage of the free space is trimmed.
+ */
+#define FSTRIM_BLOCK_PCT_DEFAULT	(99.0 / 100.0)
+
 /* Phase helper functions */
 void xfs_shutdown_fs(struct scrub_ctx *ctx);
 int scrub_cleanup(struct scrub_ctx *ctx);



* [PATCH 1/6] xfs_scrub: allow auxiliary pathnames for sandboxing
  2024-07-02  0:50 ` [PATCHSET v30.7 06/16] xfs_scrub: tighten security of systemd services Darrick J. Wong
@ 2024-07-02  1:04   ` Darrick J. Wong
  2024-07-02  5:37     ` Christoph Hellwig
  2024-07-02  1:05   ` [PATCH 2/6] xfs_scrub.service: reduce background CPU usage to less than one core if possible Darrick J. Wong
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:04 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

In the next patch, we'll tighten up the security on the xfs_scrub
service so that it can't escape its sandbox.  Sandboxing the service
involves making the host filesystem as inaccessible as possible and
bind mounting the filesystem to be scrubbed onto a known location
within the sandbox.  Hence scrub needs the original path for reporting,
plus a new -M argument that tells it which path to actually open.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man8/xfs_scrub.8 |    9 ++++++++-
 scrub/phase1.c       |    4 ++--
 scrub/vfs.c          |    2 +-
 scrub/xfs_scrub.c    |   11 ++++++++---
 scrub/xfs_scrub.h    |    5 ++++-
 5 files changed, 23 insertions(+), 8 deletions(-)


diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
index b9f253e1b079..6154011271e6 100644
--- a/man/man8/xfs_scrub.8
+++ b/man/man8/xfs_scrub.8
@@ -4,7 +4,7 @@ xfs_scrub \- check and repair the contents of a mounted XFS filesystem
 .SH SYNOPSIS
 .B xfs_scrub
 [
-.B \-abCemnTvx
+.B \-abCeMmnTvx
 ]
 .I mount-point
 .br
@@ -79,6 +79,13 @@ behavior.
 .B \-k
 Do not call TRIM on the free space.
 .TP
+.BI \-M " real-mount-point"
+Open this path for issuing scrub system calls to the kernel.
+The positional
+.I mount-point
+parameter will be used for displaying informational messages and logging.
+This parameter exists to enable process sandboxing for service mode.
+.TP
 .BI \-m " file"
 Search this file for mounted filesystems instead of /etc/mtab.
 .TP
diff --git a/scrub/phase1.c b/scrub/phase1.c
index 1b3f6e8eb4f3..516d929d6268 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -146,7 +146,7 @@ phase1_func(
 	 * CAP_SYS_ADMIN, which we probably need to do anything fancy
 	 * with the (XFS driver) kernel.
 	 */
-	error = -xfd_open(&ctx->mnt, ctx->mntpoint,
+	error = -xfd_open(&ctx->mnt, ctx->actual_mntpoint,
 			O_RDONLY | O_NOATIME | O_DIRECTORY);
 	if (error) {
 		if (error == EPERM)
@@ -199,7 +199,7 @@ _("Not an XFS filesystem."));
 		return error;
 	}
 
-	error = path_to_fshandle(ctx->mntpoint, &ctx->fshandle,
+	error = path_to_fshandle(ctx->actual_mntpoint, &ctx->fshandle,
 			&ctx->fshandle_len);
 	if (error) {
 		str_errno(ctx, _("getting fshandle"));
diff --git a/scrub/vfs.c b/scrub/vfs.c
index 22c19485a2da..fca9a4cf3568 100644
--- a/scrub/vfs.c
+++ b/scrub/vfs.c
@@ -249,7 +249,7 @@ scan_fs_tree(
 		goto out_cond;
 	}
 
-	ret = queue_subdir(ctx, &sft, &wq, ctx->mntpoint, true);
+	ret = queue_subdir(ctx, &sft, &wq, ctx->actual_mntpoint, true);
 	if (ret) {
 		str_liberror(ctx, ret, _("queueing directory scan"));
 		goto out_wq;
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 296d814eceeb..d7cef115deea 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -725,7 +725,7 @@ main(
 	pthread_mutex_init(&ctx.lock, NULL);
 	ctx.mode = SCRUB_MODE_REPAIR;
 	ctx.error_action = ERRORS_CONTINUE;
-	while ((c = getopt(argc, argv, "a:bC:de:km:no:TvxV")) != EOF) {
+	while ((c = getopt(argc, argv, "a:bC:de:kM:m:no:TvxV")) != EOF) {
 		switch (c) {
 		case 'a':
 			ctx.max_errors = cvt_u64(optarg, 10);
@@ -769,6 +769,9 @@ main(
 		case 'k':
 			want_fstrim = false;
 			break;
+		case 'M':
+			ctx.actual_mntpoint = optarg;
+			break;
 		case 'm':
 			mtab = optarg;
 			break;
@@ -823,6 +826,8 @@ main(
 		usage();
 
 	ctx.mntpoint = argv[optind];
+	if (!ctx.actual_mntpoint)
+		ctx.actual_mntpoint = ctx.mntpoint;
 
 	stdout_isatty = isatty(STDOUT_FILENO);
 	stderr_isatty = isatty(STDERR_FILENO);
@@ -840,7 +845,7 @@ main(
 		return SCRUB_RET_OPERROR;
 
 	/* Find the mount record for the passed-in argument. */
-	if (stat(argv[optind], &ctx.mnt_sb) < 0) {
+	if (stat(ctx.actual_mntpoint, &ctx.mnt_sb) < 0) {
 		fprintf(stderr,
 			_("%s: could not stat: %s: %s\n"),
 			progname, argv[optind], strerror(errno));
@@ -863,7 +868,7 @@ main(
 	}
 
 	fs_table_initialise(0, NULL, 0, NULL);
-	fsp = fs_table_lookup_mount(ctx.mntpoint);
+	fsp = fs_table_lookup_mount(ctx.actual_mntpoint);
 	if (!fsp) {
 		fprintf(stderr, _("%s: Not a XFS mount point.\n"),
 				ctx.mntpoint);
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 7d48f4bad9ce..b0aa9fcc67b7 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -38,9 +38,12 @@ enum error_action {
 struct scrub_ctx {
 	/* Immutable scrub state. */
 
-	/* Strings we need for presentation */
+	/* Mountpoint we use for presentation */
 	char			*mntpoint;
 
+	/* Actual VFS path to the filesystem */
+	char			*actual_mntpoint;
+
 	/* Mountpoint info */
 	struct stat		mnt_sb;
 	struct statvfs		mnt_sv;



* [PATCH 2/6] xfs_scrub.service: reduce background CPU usage to less than one core if possible
  2024-07-02  0:50 ` [PATCHSET v30.7 06/16] xfs_scrub: tighten security of systemd services Darrick J. Wong
  2024-07-02  1:04   ` [PATCH 1/6] xfs_scrub: allow auxiliary pathnames for sandboxing Darrick J. Wong
@ 2024-07-02  1:05   ` Darrick J. Wong
  2024-07-02  1:05   ` [PATCH 3/6] xfs_scrub: use dynamic users when running as a systemd service Darrick J. Wong
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:05 UTC (permalink / raw)
  To: djwong, cem; +Cc: Christoph Hellwig, linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Currently, the xfs_scrub background service is configured to use -b,
which means that the program runs completely serially.  However, even
using all of one CPU core with idle priority may be enough to cause
thermal throttling and unwanted fan noise on smaller systems (e.g.
laptops) with fast IO systems.

Let's try to avoid this (at least on systemd) by using cgroups to limit
the program's usage to slightly more than half of one CPU and by raising
its nice value so that the scheduler gives it lower priority.  What we
/really/ want is to run steadily on an efficiency core, but there
doesn't seem to be a means to ask the scheduler not to ramp up the CPU
frequency for a particular task.

While we're at it, group the resource limit directives together.
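For reference, a CPUQuota= percentage on a slice ends up as a cgroup v2
cpu.max limit that is shared by every process in the slice.  The Python
sketch below shows the arithmetic, assuming systemd's default 100 ms
quota period is in effect (CPUQuotaPeriodSec= unset).
cpu_quota_to_cpu_max is a hypothetical helper, and systemd's exact
rounding may differ.

```python
# Hypothetical constant: assumes systemd's default 100 ms quota period,
# i.e. CPUQuotaPeriodSec= is not set on the slice.
DEFAULT_PERIOD_US = 100_000


def cpu_quota_to_cpu_max(percent, period_us=DEFAULT_PERIOD_US):
    """Return the cgroup v2 cpu.max string for a CPUQuota= percentage.

    cpu.max is "<quota> <period>" in microseconds: the whole cgroup may
    consume at most <quota> us of CPU time per <period> us, summed over
    all CPUs and all processes in the group.
    """
    if percent <= 0:
        raise ValueError("CPUQuota= must be positive")
    quota_us = round(period_us * percent / 100)
    return f"{quota_us} {period_us}"


# 60% of one core for everything in system-xfs_scrub.slice.
print(cpu_quota_to_cpu_max(60))  # 60000 100000
```

Because the limit sits on the slice rather than on each unit, several
concurrent scrub instances share the same 60% budget.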

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 scrub/Makefile                   |    7 ++++++-
 scrub/system-xfs_scrub.slice     |   30 ++++++++++++++++++++++++++++++
 scrub/xfs_scrub@.service.in      |   12 ++++++++++--
 scrub/xfs_scrub_all.service.in   |    4 ++++
 scrub/xfs_scrub_fail@.service.in |    4 ++++
 5 files changed, 54 insertions(+), 3 deletions(-)
 create mode 100644 scrub/system-xfs_scrub.slice


diff --git a/scrub/Makefile b/scrub/Makefile
index 8ccc67d01df3..2a257e0805cd 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -18,7 +18,12 @@ XFS_SCRUB_FAIL_PROG = xfs_scrub_fail
 XFS_SCRUB_ARGS = -b -n
 ifeq ($(HAVE_SYSTEMD),yes)
 INSTALL_SCRUB += install-systemd
-SYSTEMD_SERVICES = $(scrub_svcname) xfs_scrub_all.service xfs_scrub_all.timer xfs_scrub_fail@.service
+SYSTEMD_SERVICES=\
+	$(scrub_svcname) \
+	xfs_scrub_fail@.service \
+	xfs_scrub_all.service \
+	xfs_scrub_all.timer \
+	system-xfs_scrub.slice
 OPTIONAL_TARGETS += $(SYSTEMD_SERVICES)
 endif
 ifeq ($(HAVE_CROND),yes)
diff --git a/scrub/system-xfs_scrub.slice b/scrub/system-xfs_scrub.slice
new file mode 100644
index 000000000000..95cd4f74526f
--- /dev/null
+++ b/scrub/system-xfs_scrub.slice
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2022-2024 Oracle.  All Rights Reserved.
+# Author: Darrick J. Wong <djwong@kernel.org>
+
+[Unit]
+Description=xfs_scrub background service slice
+Before=slices.target
+
+[Slice]
+
+# If the CPU usage cgroup controller is available, don't use more than 60% of a
+# single core for all background processes.
+CPUQuota=60%
+CPUAccounting=true
+
+[Install]
+# As of systemd 249, the systemd cgroupv2 configuration code will drop resource
+# controllers from the root and system.slice cgroups at startup if it doesn't
+# find any direct dependencies that require a given controller.  Newly
+# activated units with resource control directives are created under the system
+# slice but do not cause a reconfiguration of the slice's resource controllers.
+# Hence we cannot put CPUQuota= into the xfs_scrub service units directly.
+#
+# For the CPUQuota directive to have any effect, we must therefore create an
+# explicit definition file for the slice that systemd creates to contain the
+# xfs_scrub instance units (e.g. xfs_scrub@.service) and we must configure this
+# slice as a dependency of the system slice to establish the direct dependency
+# relation.
+WantedBy=system.slice
diff --git a/scrub/xfs_scrub@.service.in b/scrub/xfs_scrub@.service.in
index 05e5293ee1ac..855fe4de4dcf 100644
--- a/scrub/xfs_scrub@.service.in
+++ b/scrub/xfs_scrub@.service.in
@@ -18,8 +18,16 @@ PrivateTmp=no
 AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
 NoNewPrivileges=yes
 User=nobody
-IOSchedulingClass=idle
-CPUSchedulingPolicy=idle
 Environment=SERVICE_MODE=1
 ExecStart=@sbindir@/xfs_scrub @scrub_args@ %f
 SyslogIdentifier=%N
+
+# Run scrub with minimal CPU and IO priority so that nothing else will starve.
+IOSchedulingClass=idle
+CPUSchedulingPolicy=idle
+CPUAccounting=true
+Nice=19
+
+# Create the service underneath the scrub background service slice so that we
+# can control resource usage.
+Slice=system-xfs_scrub.slice
diff --git a/scrub/xfs_scrub_all.service.in b/scrub/xfs_scrub_all.service.in
index 347cd6e66228..96be90e74ee6 100644
--- a/scrub/xfs_scrub_all.service.in
+++ b/scrub/xfs_scrub_all.service.in
@@ -14,3 +14,7 @@ Type=oneshot
 Environment=SERVICE_MODE=1
 ExecStart=@sbindir@/xfs_scrub_all
 SyslogIdentifier=xfs_scrub_all
+
+# Create the service underneath the scrub background service slice so that we
+# can control resource usage.
+Slice=system-xfs_scrub.slice
diff --git a/scrub/xfs_scrub_fail@.service.in b/scrub/xfs_scrub_fail@.service.in
index 96a2ed5da31e..32012ec35366 100644
--- a/scrub/xfs_scrub_fail@.service.in
+++ b/scrub/xfs_scrub_fail@.service.in
@@ -14,3 +14,7 @@ ExecStart=@pkg_libexec_dir@/xfs_scrub_fail "${EMAIL_ADDR}" %f
 User=mail
 Group=mail
 SupplementaryGroups=systemd-journal
+
+# Create the service underneath the scrub background service slice so that we
+# can control resource usage.
+Slice=system-xfs_scrub.slice



* [PATCH 3/6] xfs_scrub: use dynamic users when running as a systemd service
  2024-07-02  0:50 ` [PATCHSET v30.7 06/16] xfs_scrub: tighten security of systemd services Darrick J. Wong
  2024-07-02  1:04   ` [PATCH 1/6] xfs_scrub: allow auxiliary pathnames for sandboxing Darrick J. Wong
  2024-07-02  1:05   ` [PATCH 2/6] xfs_scrub.service: reduce background CPU usage to less than one core if possible Darrick J. Wong
@ 2024-07-02  1:05   ` Darrick J. Wong
  2024-07-02  1:05   ` [PATCH 4/6] xfs_scrub: tighten up the security on the background " Darrick J. Wong
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:05 UTC (permalink / raw)
  To: djwong, cem; +Cc: Helle Vaanzinn, Christoph Hellwig, linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Five years ago, systemd introduced the DynamicUser directive that
allocates a new unique user/group id, runs a service with those ids, and
deletes them after the service exits.  This is a good replacement for
User=nobody, since it eliminates the threat of nobody-services messing
with each other.

Make this transition ahead of all the other security tightenings that
will land in the next few patches, and add credits for the people who
suggested the change and reviewed it.
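To make the difference concrete: every User=nobody service conventionally
shares UID 65534, whereas the systemd documentation states that
DynamicUser= allocates IDs from the range 61184 to 65519.  The small
Python check below is only a sketch.  is_dynamic_uid is a hypothetical
helper, and the range constants are copied from the systemd
documentation rather than read from the running system.

```python
import os

# Range that the systemd documentation gives for DynamicUser= allocations.
# Copied here as constants for illustration; not queried from systemd.
DYNAMIC_UID_MIN = 61184
DYNAMIC_UID_MAX = 65519
NOBODY_UID = 65534  # conventional UID of the shared "nobody" account


def is_dynamic_uid(uid):
    """Hypothetical helper: True if uid lies in systemd's dynamic range."""
    return DYNAMIC_UID_MIN <= uid <= DYNAMIC_UID_MAX


# Inside a DynamicUser=true unit this should print True; under
# User=nobody it would print False.
print(is_dynamic_uid(os.getuid()))
```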

Link: https://0pointer.net/blog/dynamic-users-with-systemd.html
Suggested-by: Helle Vaanzinn <glitsj16@riseup.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/xfs_scrub@.service.in |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/scrub/xfs_scrub@.service.in b/scrub/xfs_scrub@.service.in
index 855fe4de4dcf..52068add834d 100644
--- a/scrub/xfs_scrub@.service.in
+++ b/scrub/xfs_scrub@.service.in
@@ -17,7 +17,6 @@ ProtectHome=read-only
 PrivateTmp=no
 AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
 NoNewPrivileges=yes
-User=nobody
 Environment=SERVICE_MODE=1
 ExecStart=@sbindir@/xfs_scrub @scrub_args@ %f
 SyslogIdentifier=%N
@@ -31,3 +30,6 @@ Nice=19
 # Create the service underneath the scrub background service slice so that we
 # can control resource usage.
 Slice=system-xfs_scrub.slice
+
+# Dynamically create a user that isn't root
+DynamicUser=true



* [PATCH 4/6] xfs_scrub: tighten up the security on the background systemd service
  2024-07-02  0:50 ` [PATCHSET v30.7 06/16] xfs_scrub: tighten security of systemd services Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-07-02  1:05   ` [PATCH 3/6] xfs_scrub: use dynamic users when running as a systemd service Darrick J. Wong
@ 2024-07-02  1:05   ` Darrick J. Wong
  2024-07-02  1:06   ` [PATCH 5/6] xfs_scrub_fail: " Darrick J. Wong
  2024-07-02  1:06   ` [PATCH 6/6] xfs_scrub_all: " Darrick J. Wong
  5 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:05 UTC (permalink / raw)
  To: djwong, cem; +Cc: Christoph Hellwig, linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Currently, xfs_scrub has to run with some elevated privileges.  Minimize
the risk of xfs_scrub escaping its service container or contaminating
the rest of the system by using systemd's sandboxing controls to
prohibit as much access as possible.

The directives added by this patch were recommended by the command
'systemd-analyze security xfs_scrub@.service' in systemd 249.
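Because the unit bind mounts %f onto /tmp/scrub inside a private /tmp,
the sandboxed xfs_scrub sees the filesystem at a different path from the
one users know, which is why ExecStart passes -M /tmp/scrub/ while
keeping %f for messages.  The Python sketch below maps a host path to
its location inside the sandbox.  host_to_sandbox is a hypothetical
helper; it does not detect mounts nested below %f, which 'norbind'
keeps out of the sandbox, so paths on such mounts would still be mapped
even though they are not visible there.

```python
import posixpath

# Bind target from BindPaths=%f:/tmp/scrub:norbind in the unit file.
SANDBOX_ROOT = "/tmp/scrub"


def host_to_sandbox(host_path, mountpoint, sandbox_root=SANDBOX_ROOT):
    """Hypothetical helper: map a host path under mountpoint into the sandbox.

    Returns None for paths outside the scrubbed filesystem, which the
    sandbox does not expose.  Nested mounts are not detected.
    """
    mnt = posixpath.normpath(mountpoint)
    path = posixpath.normpath(host_path)
    rel = posixpath.relpath(path, mnt)
    if rel == ".":
        return sandbox_root
    if rel == ".." or rel.startswith("../"):
        return None
    return posixpath.join(sandbox_root, rel)


print(host_to_sandbox("/home/djwong/notes.txt", "/home"))
# /tmp/scrub/djwong/notes.txt
```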

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 scrub/xfs_scrub@.service.in |   81 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 73 insertions(+), 8 deletions(-)


diff --git a/scrub/xfs_scrub@.service.in b/scrub/xfs_scrub@.service.in
index 52068add834d..a8dd9052f0e0 100644
--- a/scrub/xfs_scrub@.service.in
+++ b/scrub/xfs_scrub@.service.in
@@ -8,17 +8,21 @@ Description=Online XFS Metadata Check for %f
 OnFailure=xfs_scrub_fail@%i.service
 Documentation=man:xfs_scrub(8)
 
+# Explicitly require the capabilities that this program needs
+ConditionCapability=CAP_SYS_ADMIN
+ConditionCapability=CAP_FOWNER
+ConditionCapability=CAP_DAC_OVERRIDE
+ConditionCapability=CAP_DAC_READ_SEARCH
+ConditionCapability=CAP_SYS_RAWIO
+
+# Must be a mountpoint
+ConditionPathIsMountPoint=%f
+RequiresMountsFor=%f
+
 [Service]
 Type=oneshot
-PrivateNetwork=true
-ProtectSystem=full
-ProtectHome=read-only
-# Disable private /tmp just in case %f is a path under /tmp.
-PrivateTmp=no
-AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
-NoNewPrivileges=yes
 Environment=SERVICE_MODE=1
-ExecStart=@sbindir@/xfs_scrub @scrub_args@ %f
+ExecStart=@sbindir@/xfs_scrub @scrub_args@ -M /tmp/scrub/ %f
 SyslogIdentifier=%N
 
 # Run scrub with minimal CPU and IO priority so that nothing else will starve.
@@ -31,5 +35,66 @@ Nice=19
 # can control resource usage.
 Slice=system-xfs_scrub.slice
 
+# No realtime CPU scheduling
+RestrictRealtime=true
+
 # Dynamically create a user that isn't root
 DynamicUser=true
+
+# Make the entire filesystem readonly and /home inaccessible, then bind mount
+# the filesystem we're supposed to be checking into our private /tmp dir.
+# 'norbind' means that we don't bind anything under that original mount.
+ProtectSystem=strict
+ProtectHome=yes
+PrivateTmp=true
+BindPaths=%f:/tmp/scrub:norbind
+
+# Don't let scrub complain about paths in /etc/projects that have been hidden
+# by our sandboxing.  scrub doesn't care about project ids anyway.
+InaccessiblePaths=-/etc/projects
+
+# No network access
+PrivateNetwork=true
+ProtectHostname=true
+RestrictAddressFamilies=none
+IPAddressDeny=any
+
+# Don't let the program mess with the kernel configuration at all
+ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+ProtectControlGroups=true
+ProtectProc=invisible
+RestrictNamespaces=true
+
+# Hide everything in /proc, even /proc/mounts
+ProcSubset=pid
+
+# Only allow the default Linux personality
+LockPersonality=true
+
+# No memory pages that are both writable and executable
+MemoryDenyWriteExecute=true
+
+# Don't let our mounts leak out to the host
+PrivateMounts=true
+
+# Restrict system calls to the native arch and only enough to get things going
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@privileged
+SystemCallFilter=~@resources
+SystemCallFilter=~@mount
+
+# xfs_scrub needs these privileges to run, and no others
+CapabilityBoundingSet=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
+AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
+NoNewPrivileges=true
+
+# xfs_scrub doesn't create files
+UMask=7777
+
+# No access to hardware /dev files except for block devices
+ProtectClock=true
+DevicePolicy=closed
+DeviceAllow=block-*



* [PATCH 5/6] xfs_scrub_fail: tighten up the security on the background systemd service
  2024-07-02  0:50 ` [PATCHSET v30.7 06/16] xfs_scrub: tighten security of systemd services Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-07-02  1:05   ` [PATCH 4/6] xfs_scrub: tighten up the security on the background " Darrick J. Wong
@ 2024-07-02  1:06   ` Darrick J. Wong
  2024-07-02  5:37     ` Christoph Hellwig
  2024-07-02  1:06   ` [PATCH 6/6] xfs_scrub_all: " Darrick J. Wong
  5 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:06 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Currently, xfs_scrub_fail has to run with enough privileges to access
the journal contents for a given scrub run and to send a report via
email.  Minimize the risk of xfs_scrub_fail escaping its service
container or contaminating the rest of the system by using systemd's
sandboxing controls to prohibit as much access as possible.

The directives added by this patch were recommended by the command
'systemd-analyze security xfs_scrub_fail@.service' in systemd 249.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/xfs_scrub_fail@.service.in |   55 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)


diff --git a/scrub/xfs_scrub_fail@.service.in b/scrub/xfs_scrub_fail@.service.in
index 32012ec35366..2c879afd6d8e 100644
--- a/scrub/xfs_scrub_fail@.service.in
+++ b/scrub/xfs_scrub_fail@.service.in
@@ -18,3 +18,58 @@ SupplementaryGroups=systemd-journal
 # Create the service underneath the scrub background service slice so that we
 # can control resource usage.
 Slice=system-xfs_scrub.slice
+
+# No realtime scheduling
+RestrictRealtime=true
+
+# Make the entire filesystem readonly and /home inaccessible.
+ProtectSystem=full
+ProtectHome=yes
+PrivateTmp=true
+RestrictSUIDSGID=true
+
+# Emailing reports requires network access, but not the ability to change the
+# hostname.
+ProtectHostname=true
+
+# Don't let the program mess with the kernel configuration at all
+ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+ProtectControlGroups=true
+ProtectProc=invisible
+RestrictNamespaces=true
+
+# Can't hide /proc because journalctl needs it to find various pieces of log
+# information
+#ProcSubset=pid
+
+# Only allow the default Linux personality
+LockPersonality=true
+
+# No memory pages that are both writable and executable
+MemoryDenyWriteExecute=true
+
+# Don't let our mounts leak out to the host
+PrivateMounts=true
+
+# Restrict system calls to the native arch and only enough to get things going
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@privileged
+SystemCallFilter=~@resources
+SystemCallFilter=~@mount
+
+# xfs_scrub_fail needs no special capabilities
+CapabilityBoundingSet=
+NoNewPrivileges=true
+
+# Failure reporting shouldn't create world-readable files
+UMask=0077
+
+# Clean up any IPC objects when this unit stops
+RemoveIPC=true
+
+# No access to hardware device files
+PrivateDevices=true
+ProtectClock=true



* [PATCH 6/6] xfs_scrub_all: tighten up the security on the background systemd service
  2024-07-02  0:50 ` [PATCHSET v30.7 06/16] xfs_scrub: tighten security of systemd services Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-07-02  1:06   ` [PATCH 5/6] xfs_scrub_fail: " Darrick J. Wong
@ 2024-07-02  1:06   ` Darrick J. Wong
  2024-07-02  5:37     ` Christoph Hellwig
  5 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:06 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Currently, xfs_scrub_all has to run with enough privileges to find
mounted XFS filesystems and the device associated with that mount and to
start xfs_scrub@<mountpoint> sub-services.  Minimize the risk of
xfs_scrub_all escaping its service container or contaminating the rest
of the system by using systemd's sandboxing controls to prohibit as much
access as possible.

The directives added by this patch were recommended by the command
'systemd-analyze security xfs_scrub_all.service' in systemd 249.
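One way to picture the "find mounted XFS filesystems" step is to filter
mounts(5)-style text by filesystem type.  This is a swapped-in
illustration: the comments in the unit file indicate that xfs_scrub_all
actually relies on lsblk for this.  xfs_mounts is a hypothetical helper.

```python
import re


def _unescape(field):
    # mounts(5) fields encode space, tab, newline and backslash as
    # three-digit octal escapes such as \040.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def xfs_mounts(mounts_text):
    """Hypothetical helper: return {mountpoint: device} for xfs entries."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 3 or fields[2] != "xfs":
            continue
        result[_unescape(fields[1])] = _unescape(fields[0])
    return result


sample = "/dev/sda2 / xfs rw,relatime 0 0\nproc /proc proc rw 0 0\n"
print(xfs_mounts(sample))  # {'/': '/dev/sda2'}
```

On a live system the same function could be fed the contents of
/proc/self/mounts.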

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/xfs_scrub_all.service.in |   62 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)


diff --git a/scrub/xfs_scrub_all.service.in b/scrub/xfs_scrub_all.service.in
index 96be90e74ee6..478cd8d057a2 100644
--- a/scrub/xfs_scrub_all.service.in
+++ b/scrub/xfs_scrub_all.service.in
@@ -18,3 +18,65 @@ SyslogIdentifier=xfs_scrub_all
 # Create the service underneath the scrub background service slice so that we
 # can control resource usage.
 Slice=system-xfs_scrub.slice
+
+# Run scrub_all with minimal CPU and IO priority so that nothing will starve.
+IOSchedulingClass=idle
+CPUSchedulingPolicy=idle
+CPUAccounting=true
+Nice=19
+
+# No realtime scheduling
+RestrictRealtime=true
+
+# No special privileges, but we still have to run as root so that we can
+# contact the service manager to start the sub-units.
+CapabilityBoundingSet=
+NoNewPrivileges=true
+RestrictSUIDSGID=true
+
+# Make the entire filesystem readonly.  We don't want to hide anything because
+# we need to find all mounted XFS filesystems in the host.
+ProtectSystem=strict
+ProtectHome=read-only
+PrivateTmp=false
+
+# No network access except to the systemd control socket
+PrivateNetwork=true
+ProtectHostname=true
+RestrictAddressFamilies=AF_UNIX
+IPAddressDeny=any
+
+# Don't let the program mess with the kernel configuration at all
+ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+ProtectControlGroups=true
+ProtectProc=invisible
+RestrictNamespaces=true
+
+# Hide everything in /proc, even /proc/mounts
+ProcSubset=pid
+
+# Only allow the default personality Linux
+LockPersonality=true
+
+# No memory pages that are both writable and executable
+MemoryDenyWriteExecute=true
+
+# Don't let our mounts leak out to the host
+PrivateMounts=true
+
+# Restrict system calls to the native arch and only enough to get things going
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@privileged
+SystemCallFilter=~@resources
+SystemCallFilter=~@mount
+
+# Media scan stamp file shouldn't be readable by regular users
+UMask=0077
+
+# lsblk ignores mountpoints if it can't find the device files, so we cannot
+# hide them
+#ProtectClock=true
+#PrivateDevices=true



* [PATCH 1/6] xfs_scrub_all: only use the xfs_scrub@ systemd services in service mode
  2024-07-02  0:51 ` [PATCHSET v30.7 07/16] xfs_scrub_all: automatic media scan service Darrick J. Wong
@ 2024-07-02  1:06   ` Darrick J. Wong
  2024-07-02  5:38     ` Christoph Hellwig
  2024-07-02  1:06   ` [PATCH 2/6] xfs_scrub_all: remove journalctl background process Darrick J. Wong
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:06 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Since the per-mount xfs_scrub@.service definition includes a bunch of
resource usage constraints, we no longer want to use those services if
xfs_scrub_all is being run directly by the sysadmin (aka not in service
mode) on the presumption that sysadmins want answers as quickly as
possible.

Therefore, only try to call the systemd service from xfs_scrub_all if
SERVICE_MODE is set in the environment.  If reaching out to systemd
fails and we're in service mode, we still want to run xfs_scrub
directly.  Split the makefile variables as necessary so that we only
pass -b to xfs_scrub in service mode.
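The resulting decision can be summarized with a small Python sketch.
plan_scrub is hypothetical, the binary path is shortened to a bare
xfs_scrub, and the default argument tuples simply echo the
XFS_SCRUB_ARGS and XFS_SCRUB_SERVICE_ARGS values from the Makefile.
Calling it with unitname=None models the fallback to a direct run when
the systemd route is unavailable.

```python
import os


def plan_scrub(mnt, unitname, environ=os.environ,
               scrub_args=("-n",), service_args=("-b",)):
    """Hypothetical helper: decide how to scrub one mount.

    Returns ("systemctl", unitname) when running as a service and a unit
    name is available; otherwise ("exec", argv) for a direct xfs_scrub run.
    """
    in_service = "SERVICE_MODE" in environ
    if unitname is not None and in_service:
        return ("systemctl", unitname)
    cmd = ["xfs_scrub"]
    # Only background (service mode) runs get the serializing -b flag.
    if in_service:
        cmd += list(service_args)
    cmd += list(scrub_args)
    cmd.append(mnt)
    return ("exec", cmd)


print(plan_scrub("/home", "xfs_scrub@home.service", {}))
# ('exec', ['xfs_scrub', '-n', '/home'])
```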

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/Makefile              |    5 ++++-
 scrub/xfs_scrub@.service.in |    2 +-
 scrub/xfs_scrub_all.in      |   11 ++++++++---
 3 files changed, 13 insertions(+), 5 deletions(-)


diff --git a/scrub/Makefile b/scrub/Makefile
index 2a257e0805cd..2050fe28fc75 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -15,7 +15,8 @@ LTCOMMAND = xfs_scrub
 INSTALL_SCRUB = install-scrub
 XFS_SCRUB_ALL_PROG = xfs_scrub_all
 XFS_SCRUB_FAIL_PROG = xfs_scrub_fail
-XFS_SCRUB_ARGS = -b -n
+XFS_SCRUB_ARGS = -n
+XFS_SCRUB_SERVICE_ARGS = -b
 ifeq ($(HAVE_SYSTEMD),yes)
 INSTALL_SCRUB += install-systemd
 SYSTEMD_SERVICES=\
@@ -113,6 +114,7 @@ xfs_scrub_all: xfs_scrub_all.in $(builddefs)
 	$(Q)$(SED) -e "s|@sbindir@|$(PKG_SBIN_DIR)|g" \
 		   -e "s|@scrub_svcname@|$(scrub_svcname)|g" \
 		   -e "s|@pkg_version@|$(PKG_VERSION)|g" \
+		   -e "s|@scrub_service_args@|$(XFS_SCRUB_SERVICE_ARGS)|g" \
 		   -e "s|@scrub_args@|$(XFS_SCRUB_ARGS)|g" < $< > $@
 	$(Q)chmod a+x $@
 
@@ -132,6 +134,7 @@ install: $(INSTALL_SCRUB)
 %.service: %.service.in $(builddefs)
 	@echo "    [SED]    $@"
 	$(Q)$(SED) -e "s|@sbindir@|$(PKG_SBIN_DIR)|g" \
+		   -e "s|@scrub_service_args@|$(XFS_SCRUB_SERVICE_ARGS)|g" \
 		   -e "s|@scrub_args@|$(XFS_SCRUB_ARGS)|g" \
 		   -e "s|@pkg_libexec_dir@|$(PKG_LIBEXEC_DIR)|g" \
 		   < $< > $@
diff --git a/scrub/xfs_scrub@.service.in b/scrub/xfs_scrub@.service.in
index a8dd9052f0e0..5fa5f328200e 100644
--- a/scrub/xfs_scrub@.service.in
+++ b/scrub/xfs_scrub@.service.in
@@ -22,7 +22,7 @@ RequiresMountsFor=%f
 [Service]
 Type=oneshot
 Environment=SERVICE_MODE=1
-ExecStart=@sbindir@/xfs_scrub @scrub_args@ -M /tmp/scrub/ %f
+ExecStart=@sbindir@/xfs_scrub @scrub_service_args@ @scrub_args@ -M /tmp/scrub/ %f
 SyslogIdentifier=%N
 
 # Run scrub with minimal CPU and IO priority so that nothing else will starve.
diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
index d0ab27fd3060..f27251fa5439 100644
--- a/scrub/xfs_scrub_all.in
+++ b/scrub/xfs_scrub_all.in
@@ -162,9 +162,10 @@ def run_scrub(mnt, cond, running_devs, mntdevs, killfuncs):
 		if terminate:
 			return
 
-		# Try it the systemd way
+		# Run per-mount systemd xfs_scrub service only if we ourselves
+		# are running as a systemd service.
 		unitname = path_to_serviceunit(path)
-		if unitname is not None:
+		if unitname is not None and 'SERVICE_MODE' in os.environ:
 			ret = systemctl_start(unitname, killfuncs)
 			if ret == 0 or ret == 1:
 				print("Scrubbing %s done, (err=%d)" % (mnt, ret))
@@ -175,8 +176,12 @@ def run_scrub(mnt, cond, running_devs, mntdevs, killfuncs):
 			if terminate:
 				return
 
-		# Invoke xfs_scrub manually
+		# Invoke xfs_scrub manually if we're running in the foreground.
+		# We also permit this if we're running as a cronjob where
+		# systemd services are unavailable.
 		cmd = ['@sbindir@/xfs_scrub']
+		if 'SERVICE_MODE' in os.environ:
+			cmd += '@scrub_service_args@'.split()
 		cmd += '@scrub_args@'.split()
 		cmd += [mnt]
 		ret = run_killable(cmd, None, killfuncs)



* [PATCH 2/6] xfs_scrub_all: remove journalctl background process
  2024-07-02  0:51 ` [PATCHSET v30.7 07/16] xfs_scrub_all: automatic media scan service Darrick J. Wong
  2024-07-02  1:06   ` [PATCH 1/6] xfs_scrub_all: only use the xfs_scrub@ systemd services in service mode Darrick J. Wong
@ 2024-07-02  1:06   ` Darrick J. Wong
  2024-07-02  5:38     ` Christoph Hellwig
  2024-07-02  1:07   ` [PATCH 3/6] xfs_scrub_all: support metadata+media scans of all filesystems Darrick J. Wong
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:06 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Now that we only start systemd services if we're running in service
mode, there's no need for the background journalctl process that only
ran if we had started systemd services in non-service mode.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/xfs_scrub_all.in |   14 --------------
 1 file changed, 14 deletions(-)


diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
index f27251fa5439..fc7a2e637efa 100644
--- a/scrub/xfs_scrub_all.in
+++ b/scrub/xfs_scrub_all.in
@@ -261,17 +261,6 @@ def main():
 
 	fs = find_mounts()
 
-	# Tail the journal if we ourselves aren't a service...
-	journalthread = None
-	if 'SERVICE_MODE' not in os.environ:
-		try:
-			cmd=['journalctl', '--no-pager', '-q', '-S', 'now', \
-					'-f', '-u', 'xfs_scrub@*', '-o', \
-					'cat']
-			journalthread = subprocess.Popen(cmd)
-		except:
-			pass
-
 	# Schedule scrub jobs...
 	running_devs = set()
 	killfuncs = set()
@@ -308,9 +297,6 @@ def main():
 	while len(killfuncs) > 0:
 		wait_for_termination(cond, killfuncs)
 
-	if journalthread is not None:
-		journalthread.terminate()
-
 	# See the service mode comments in xfs_scrub.c for why we do this.
 	if 'SERVICE_MODE' in os.environ:
 		time.sleep(2)



* [PATCH 3/6] xfs_scrub_all: support metadata+media scans of all filesystems
  2024-07-02  0:51 ` [PATCHSET v30.7 07/16] xfs_scrub_all: automatic media scan service Darrick J. Wong
  2024-07-02  1:06   ` [PATCH 1/6] xfs_scrub_all: only use the xfs_scrub@ systemd services in service mode Darrick J. Wong
  2024-07-02  1:06   ` [PATCH 2/6] xfs_scrub_all: remove journalctl background process Darrick J. Wong
@ 2024-07-02  1:07   ` Darrick J. Wong
  2024-07-02  5:39     ` Christoph Hellwig
  2024-07-02  1:07   ` [PATCH 4/6] xfs_scrub_all: enable periodic file data scrubs automatically Darrick J. Wong
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:07 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Add the necessary systemd services and control bits so that
xfs_scrub_all can kick off a metadata+media scan of a filesystem.
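The service unit name comes from systemd-escape --template ... --path,
which turns the mount path into a unit instance name for either the
xfs_scrub@ or the xfs_scrub_media@ template.  The Python sketch below
approximates those escaping rules for the common cases: slashes become
'-', other characters outside letters, digits, ':', '_' and '.' become
\xNN byte escapes, and a leading '.' is escaped.  It does not model
systemd's handling of '.' or '..' path components, and escape_path and
path_to_unit are hypothetical helpers, not part of xfs_scrub_all.

```python
import string

# Characters systemd leaves unescaped in unit instance names.
_ALLOWED = set(string.ascii_letters + string.digits + ":_.")


def escape_path(path):
    """Hypothetical helper approximating `systemd-escape --path`."""
    # Collapse duplicate slashes and drop leading/trailing ones.
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "-"  # the root directory escapes to a single dash
    joined = "/".join(parts)
    out = []
    for i, ch in enumerate(joined):
        if ch == "/":
            out.append("-")
        elif ch in _ALLOWED and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # Escape every UTF-8 byte of the character as \xNN.
            out.extend("\\x%02x" % b for b in ch.encode("utf-8"))
    return "".join(out)


def path_to_unit(path, scrub_media=False):
    """Hypothetical helper: pick the template and insert the instance name."""
    template = "xfs_scrub_media@.service" if scrub_media else "xfs_scrub@.service"
    prefix, suffix = template.split("@")
    return "%s@%s%s" % (prefix, escape_path(path), suffix)


print(path_to_unit("/mnt/my-data"))  # xfs_scrub@mnt-my\x2ddata.service
```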

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man8/xfs_scrub_all.8               |    5 +-
 scrub/Makefile                         |    4 +
 scrub/xfs_scrub_all.in                 |   23 +++++--
 scrub/xfs_scrub_fail.in                |   13 +++-
 scrub/xfs_scrub_fail@.service.in       |    2 -
 scrub/xfs_scrub_media@.service.in      |  100 ++++++++++++++++++++++++++++++++
 scrub/xfs_scrub_media_fail@.service.in |   76 ++++++++++++++++++++++++
 7 files changed, 210 insertions(+), 13 deletions(-)
 create mode 100644 scrub/xfs_scrub_media@.service.in
 create mode 100644 scrub/xfs_scrub_media_fail@.service.in


diff --git a/man/man8/xfs_scrub_all.8 b/man/man8/xfs_scrub_all.8
index 74548802eda0..86a9b3eced28 100644
--- a/man/man8/xfs_scrub_all.8
+++ b/man/man8/xfs_scrub_all.8
@@ -4,7 +4,7 @@ xfs_scrub_all \- scrub all mounted XFS filesystems
 .SH SYNOPSIS
 .B xfs_scrub_all
 [
-.B \-hV
+.B \-hxV
 ]
 .SH DESCRIPTION
 .B xfs_scrub_all
@@ -21,6 +21,9 @@ the same device simultaneously.
 .B \-h
 Display help.
 .TP
+.B \-x
+Read all file data extents to look for disk errors.
+.TP
 .B \-V
 Prints the version number and exits.
 .SH EXIT CODE
diff --git a/scrub/Makefile b/scrub/Makefile
index 2050fe28fc75..5567ec061e82 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -9,6 +9,7 @@ include $(builddefs)
 SCRUB_PREREQS=$(HAVE_GETFSMAP)
 
 scrub_svcname=xfs_scrub@.service
+scrub_media_svcname=xfs_scrub_media@.service
 
 ifeq ($(SCRUB_PREREQS),yes)
 LTCOMMAND = xfs_scrub
@@ -22,6 +23,8 @@ INSTALL_SCRUB += install-systemd
 SYSTEMD_SERVICES=\
 	$(scrub_svcname) \
 	xfs_scrub_fail@.service \
+	$(scrub_media_svcname) \
+	xfs_scrub_media_fail@.service \
 	xfs_scrub_all.service \
 	xfs_scrub_all.timer \
 	system-xfs_scrub.slice
@@ -113,6 +116,7 @@ xfs_scrub_all: xfs_scrub_all.in $(builddefs)
 	@echo "    [SED]    $@"
 	$(Q)$(SED) -e "s|@sbindir@|$(PKG_SBIN_DIR)|g" \
 		   -e "s|@scrub_svcname@|$(scrub_svcname)|g" \
+		   -e "s|@scrub_media_svcname@|$(scrub_media_svcname)|g" \
 		   -e "s|@pkg_version@|$(PKG_VERSION)|g" \
 		   -e "s|@scrub_service_args@|$(XFS_SCRUB_SERVICE_ARGS)|g" \
 		   -e "s|@scrub_args@|$(XFS_SCRUB_ARGS)|g" < $< > $@
diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
index fc7a2e637efa..afba0dbe8912 100644
--- a/scrub/xfs_scrub_all.in
+++ b/scrub/xfs_scrub_all.in
@@ -19,6 +19,7 @@ from io import TextIOWrapper
 
 retcode = 0
 terminate = False
+scrub_media = False
 
 def DEVNULL():
 	'''Return /dev/null in subprocess writable format.'''
@@ -88,11 +89,15 @@ def run_killable(cmd, stdout, killfuncs):
 # systemd doesn't like unit instance names with slashes in them, so it
 # replaces them with dashes when it invokes the service.  Filesystem paths
 # need a special --path argument so that dashes do not get mangled.
-def path_to_serviceunit(path):
+def path_to_serviceunit(path, scrub_media):
 	'''Convert a pathname into a systemd service unit name.'''
 
-	cmd = ['systemd-escape', '--template', '@scrub_svcname@',
-	       '--path', path]
+	if scrub_media:
+		svcname = '@scrub_media_svcname@'
+	else:
+		svcname = '@scrub_svcname@'
+	cmd = ['systemd-escape', '--template', svcname, '--path', path]
+
 	try:
 		proc = subprocess.Popen(cmd, stdout = subprocess.PIPE)
 		proc.wait()
@@ -153,7 +158,7 @@ def systemctl_start(unitname, killfuncs):
 
 def run_scrub(mnt, cond, running_devs, mntdevs, killfuncs):
 	'''Run a scrub process.'''
-	global retcode, terminate
+	global retcode, terminate, scrub_media
 
 	print("Scrubbing %s..." % mnt)
 	sys.stdout.flush()
@@ -164,7 +169,7 @@ def run_scrub(mnt, cond, running_devs, mntdevs, killfuncs):
 
 		# Run per-mount systemd xfs_scrub service only if we ourselves
 		# are running as a systemd service.
-		unitname = path_to_serviceunit(path)
+		unitname = path_to_serviceunit(path, scrub_media)
 		if unitname is not None and 'SERVICE_MODE' in os.environ:
 			ret = systemctl_start(unitname, killfuncs)
 			if ret == 0 or ret == 1:
@@ -183,6 +188,8 @@ def run_scrub(mnt, cond, running_devs, mntdevs, killfuncs):
 		if 'SERVICE_MODE' in os.environ:
 			cmd += '@scrub_service_args@'.split()
 		cmd += '@scrub_args@'.split()
+		if scrub_media:
+			cmd += '-x'
 		cmd += [mnt]
 		ret = run_killable(cmd, None, killfuncs)
 		if ret >= 0:
@@ -247,18 +254,22 @@ def main():
 		a = (mnt, cond, running_devs, devs, killfuncs)
 		thr = threading.Thread(target = run_scrub, args = a)
 		thr.start()
-	global retcode, terminate
+	global retcode, terminate, scrub_media
 
 	parser = argparse.ArgumentParser( \
 			description = "Scrub all mounted XFS filesystems.")
 	parser.add_argument("-V", help = "Report version and exit.", \
 			action = "store_true")
+	parser.add_argument("-x", help = "Scrub file data after filesystem metadata.", \
+			action = "store_true")
 	args = parser.parse_args()
 
 	if args.V:
 		print("xfs_scrub_all version @pkg_version@")
 		sys.exit(0)
 
+	scrub_media = args.x
+
 	fs = find_mounts()
 
 	# Schedule scrub jobs...
diff --git a/scrub/xfs_scrub_fail.in b/scrub/xfs_scrub_fail.in
index 9437b0327e7c..e420917f699f 100755
--- a/scrub/xfs_scrub_fail.in
+++ b/scrub/xfs_scrub_fail.in
@@ -9,8 +9,11 @@
 
 recipient="$1"
 test -z "${recipient}" && exit 0
-mntpoint="$2"
+service="$2"
+test -z "${service}" && exit 0
+mntpoint="$3"
 test -z "${mntpoint}" && exit 0
+
 hostname="$(hostname -f 2>/dev/null)"
 test -z "${hostname}" && hostname="${HOSTNAME}"
 
@@ -21,16 +24,16 @@ if [ ! -x "${mailer}" ]; then
 fi
 
 # Turn the mountpoint into a properly escaped systemd instance name
-scrub_svc="$(systemd-escape --template "@scrub_svcname@" --path "${mntpoint}")"
+scrub_svc="$(systemd-escape --template "${service}@.service" --path "${mntpoint}")"
 
 (cat << ENDL
 To: $1
-From: <xfs_scrub@${hostname}>
-Subject: xfs_scrub failure on ${mntpoint}
+From: <${service}@${hostname}>
+Subject: ${service} failure on ${mntpoint}
 Content-Transfer-Encoding: 8bit
 Content-Type: text/plain; charset=UTF-8
 
-So sorry, the automatic xfs_scrub of ${mntpoint} on ${hostname} failed.
+So sorry, the automatic ${service} of ${mntpoint} on ${hostname} failed.
 Please do not reply to this mesage.
 
 A log of what happened follows:
diff --git a/scrub/xfs_scrub_fail@.service.in b/scrub/xfs_scrub_fail@.service.in
index 2c879afd6d8e..16077888df33 100644
--- a/scrub/xfs_scrub_fail@.service.in
+++ b/scrub/xfs_scrub_fail@.service.in
@@ -10,7 +10,7 @@ Documentation=man:xfs_scrub(8)
 [Service]
 Type=oneshot
 Environment=EMAIL_ADDR=root
-ExecStart=@pkg_libexec_dir@/xfs_scrub_fail "${EMAIL_ADDR}" %f
+ExecStart=@pkg_libexec_dir@/xfs_scrub_fail "${EMAIL_ADDR}" xfs_scrub %f
 User=mail
 Group=mail
 SupplementaryGroups=systemd-journal
diff --git a/scrub/xfs_scrub_media@.service.in b/scrub/xfs_scrub_media@.service.in
new file mode 100644
index 000000000000..e670748ced51
--- /dev/null
+++ b/scrub/xfs_scrub_media@.service.in
@@ -0,0 +1,100 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2018-2024 Oracle.  All Rights Reserved.
+# Author: Darrick J. Wong <djwong@kernel.org>
+
+[Unit]
+Description=Online XFS Metadata and Media Check for %f
+OnFailure=xfs_scrub_media_fail@%i.service
+Documentation=man:xfs_scrub(8)
+
+# Explicitly require the capabilities that this program needs
+ConditionCapability=CAP_SYS_ADMIN
+ConditionCapability=CAP_FOWNER
+ConditionCapability=CAP_DAC_OVERRIDE
+ConditionCapability=CAP_DAC_READ_SEARCH
+ConditionCapability=CAP_SYS_RAWIO
+
+# Must be a mountpoint
+ConditionPathIsMountPoint=%f
+RequiresMountsFor=%f
+
+[Service]
+Type=oneshot
+Environment=SERVICE_MODE=1
+ExecStart=@sbindir@/xfs_scrub @scrub_service_args@ @scrub_args@ -M /tmp/scrub/ -x %f
+SyslogIdentifier=%N
+
+# Run scrub with minimal CPU and IO priority so that nothing else will starve.
+IOSchedulingClass=idle
+CPUSchedulingPolicy=idle
+CPUAccounting=true
+Nice=19
+
+# Create the service underneath the scrub background service slice so that we
+# can control resource usage.
+Slice=system-xfs_scrub.slice
+
+# No realtime CPU scheduling
+RestrictRealtime=true
+
+# Dynamically create a user that isn't root
+DynamicUser=true
+
+# Make the entire filesystem readonly and /home inaccessible, then bind mount
+# the filesystem we're supposed to be checking into our private /tmp dir.
+# 'norbind' means that we don't bind anything under that original mount.
+ProtectSystem=strict
+ProtectHome=yes
+PrivateTmp=true
+BindPaths=%f:/tmp/scrub:norbind
+
+# Don't let scrub complain about paths in /etc/projects that have been hidden
+# by our sandboxing.  scrub doesn't care about project ids anyway.
+InaccessiblePaths=-/etc/projects
+
+# No network access
+PrivateNetwork=true
+ProtectHostname=true
+RestrictAddressFamilies=none
+IPAddressDeny=any
+
+# Don't let the program mess with the kernel configuration at all
+ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+ProtectControlGroups=true
+ProtectProc=invisible
+RestrictNamespaces=true
+
+# Hide everything in /proc, even /proc/mounts
+ProcSubset=pid
+
+# Only allow the default personality Linux
+LockPersonality=true
+
+# No writable memory pages
+MemoryDenyWriteExecute=true
+
+# Don't let our mounts leak out to the host
+PrivateMounts=true
+
+# Restrict system calls to the native arch and only enough to get things going
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@privileged
+SystemCallFilter=~@resources
+SystemCallFilter=~@mount
+
+# xfs_scrub needs these privileges to run, and no others
+CapabilityBoundingSet=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
+AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
+NoNewPrivileges=true
+
+# xfs_scrub doesn't create files
+UMask=7777
+
+# No access to hardware /dev files except for block devices
+ProtectClock=true
+DevicePolicy=closed
+DeviceAllow=block-*
diff --git a/scrub/xfs_scrub_media_fail@.service.in b/scrub/xfs_scrub_media_fail@.service.in
new file mode 100644
index 000000000000..97c0e090721d
--- /dev/null
+++ b/scrub/xfs_scrub_media_fail@.service.in
@@ -0,0 +1,76 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2018-2024 Oracle.  All Rights Reserved.
+# Author: Darrick J. Wong <djwong@kernel.org>
+
+[Unit]
+Description=Online XFS Metadata and Media Check Failure Reporting for %f
+Documentation=man:xfs_scrub(8)
+
+[Service]
+Type=oneshot
+Environment=EMAIL_ADDR=root
+ExecStart=@pkg_libexec_dir@/xfs_scrub_fail "${EMAIL_ADDR}" xfs_scrub_media %f
+User=mail
+Group=mail
+SupplementaryGroups=systemd-journal
+
+# Create the service underneath the scrub background service slice so that we
+# can control resource usage.
+Slice=system-xfs_scrub.slice
+
+# No realtime scheduling
+RestrictRealtime=true
+
+# Make the entire filesystem readonly and /home inaccessible, then bind mount
+# the filesystem we're supposed to be checking into our private /tmp dir.
+ProtectSystem=full
+ProtectHome=yes
+PrivateTmp=true
+RestrictSUIDSGID=true
+
+# Emailing reports requires network access, but not the ability to change the
+# hostname.
+ProtectHostname=true
+
+# Don't let the program mess with the kernel configuration at all
+ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+ProtectControlGroups=true
+ProtectProc=invisible
+RestrictNamespaces=true
+
+# Can't hide /proc because journalctl needs it to find various pieces of log
+# information
+#ProcSubset=pid
+
+# Only allow the default personality Linux
+LockPersonality=true
+
+# No writable memory pages
+MemoryDenyWriteExecute=true
+
+# Don't let our mounts leak out to the host
+PrivateMounts=true
+
+# Restrict system calls to the native arch and only enough to get things going
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@privileged
+SystemCallFilter=~@resources
+SystemCallFilter=~@mount
+
+# xfs_scrub needs these privileges to run, and no others
+CapabilityBoundingSet=
+NoNewPrivileges=true
+
+# Failure reporting shouldn't create world-readable files
+UMask=0077
+
+# Clean up any IPC objects when this unit stops
+RemoveIPC=true
+
+# No access to hardware device files
+PrivateDevices=true
+ProtectClock=true



* [PATCH 4/6] xfs_scrub_all: enable periodic file data scrubs automatically
  2024-07-02  0:51 ` [PATCHSET v30.7 07/16] xfs_scrub_all: automatic media scan service Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-07-02  1:07   ` [PATCH 3/6] xfs_scrub_all: support metadata+media scans of all filesystems Darrick J. Wong
@ 2024-07-02  1:07   ` Darrick J. Wong
  2024-07-02  5:39     ` Christoph Hellwig
  2024-07-02  1:07   ` [PATCH 5/6] xfs_scrub_all: trigger automatic media scans once per month Darrick J. Wong
  2024-07-02  1:07   ` [PATCH 6/6] xfs_scrub_all: failure reporting for the xfs_scrub_all job Darrick J. Wong
  5 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:07 UTC
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Enhance xfs_scrub_all with the ability to initiate a file data scrub
periodically.  The user must specify the period, and they may optionally
specify the path to a file that will record the last time the file data
was scrubbed.
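
The interval parsing and stamp-file check can be sketched in a few lines of standalone Python. This is a hypothetical version (parse_interval() and media_scan_due() are illustrative names), not the code in the patch: it walks a suffix table in which "mo" is tested before "m", and it reports a scan as due when the stamp file is missing or its mtime plus the interval is still earlier than now.

```python
import os
import time
from datetime import timedelta

# Unit suffixes from the xfs_scrub_all(8) text; "mo" must precede "m".
_UNITS = [
    ('y', timedelta(seconds=31556952)),   # average Gregorian year
    ('q', timedelta(days=90)),
    ('mo', timedelta(days=30)),
    ('w', timedelta(weeks=1)),
    ('d', timedelta(days=1)),
    ('h', timedelta(hours=1)),
    ('m', timedelta(minutes=1)),
    ('s', timedelta(seconds=1)),
]

def parse_interval(text):
    '''Turn "1mo", "2.5d", "90" etc. into a timedelta.'''
    for suffix, unit in _UNITS:
        if text.endswith(suffix):
            return unit * float(text[:-len(suffix)])
    return timedelta(seconds=float(text))  # a bare number means seconds

def media_scan_due(stamp_path, interval, now=None):
    '''True if the stamp file is missing or older than the interval.'''
    if now is None:
        now = time.time()
    try:
        last_run = os.stat(stamp_path).st_mtime
    except FileNotFoundError:
        return True
    return last_run + interval.total_seconds() < now
```

The sketch accepts a floating point number without a suffix, as the man page text describes; the patch itself passes a bare number to int().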

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 debian/rules                   |    3 +-
 include/builddefs.in           |    3 ++
 man/man8/Makefile              |    7 +++-
 man/man8/xfs_scrub_all.8.in    |   15 ++++++++
 scrub/Makefile                 |    3 ++
 scrub/xfs_scrub_all.in         |   76 +++++++++++++++++++++++++++++++++++++++-
 scrub/xfs_scrub_all.service.in |    6 ++-
 7 files changed, 108 insertions(+), 5 deletions(-)
 rename man/man8/{xfs_scrub_all.8 => xfs_scrub_all.8.in} (63%)


diff --git a/debian/rules b/debian/rules
index 0db0ed8e7543..69a79fc67405 100755
--- a/debian/rules
+++ b/debian/rules
@@ -38,7 +38,8 @@ configure_options = \
 	--disable-ubsan \
 	--disable-addrsan \
 	--disable-threadsan \
-	--enable-lto
+	--enable-lto \
+	--localstatedir=/var
 
 options = export DEBUG=-DNDEBUG DISTRIBUTION=debian \
 	  INSTALL_USER=root INSTALL_GROUP=root \
diff --git a/include/builddefs.in b/include/builddefs.in
index 6ac36c149238..734bd95ecb0b 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -57,6 +57,9 @@ PKG_DOC_DIR	= @datadir@/doc/@pkg_name@
 PKG_LOCALE_DIR	= @datadir@/locale
 PKG_DATA_DIR	= @datadir@/@pkg_name@
 MKFS_CFG_DIR	= @datadir@/@pkg_name@/mkfs
+PKG_STATE_DIR	= @localstatedir@/lib/@pkg_name@
+
+XFS_SCRUB_ALL_AUTO_MEDIA_SCAN_STAMP=$(PKG_STATE_DIR)/xfs_scrub_all_media.stamp
 
 CC		= @cc@
 BUILD_CC	= @BUILD_CC@
diff --git a/man/man8/Makefile b/man/man8/Makefile
index 272e45aebc20..5be76ab727a1 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -11,11 +11,12 @@ ifneq ("$(ENABLE_SCRUB)","yes")
   MAN_PAGES = $(filter-out xfs_scrub%,$(shell echo *.$(MAN_SECTION)))
 else
   MAN_PAGES = $(shell echo *.$(MAN_SECTION))
+  MAN_PAGES += xfs_scrub_all.8
 endif
 MAN_PAGES	+= mkfs.xfs.8
 MAN_DEST	= $(PKG_MAN_DIR)/man$(MAN_SECTION)
 LSRCFILES	= $(MAN_PAGES)
-DIRT		= mkfs.xfs.8
+DIRT		= mkfs.xfs.8 xfs_scrub_all.8
 
 default : $(MAN_PAGES)
 
@@ -29,4 +30,8 @@ mkfs.xfs.8: mkfs.xfs.8.in
 	@echo "    [SED]    $@"
 	$(Q)$(SED) -e 's|@mkfs_cfg_dir@|$(MKFS_CFG_DIR)|g' < $^ > $@
 
+xfs_scrub_all.8: xfs_scrub_all.8.in
+	@echo "    [SED]    $@"
+	$(Q)$(SED) -e 's|@stampfile@|$(XFS_SCRUB_ALL_AUTO_MEDIA_SCAN_STAMP)|g' < $^ > $@
+
 install-dev :
diff --git a/man/man8/xfs_scrub_all.8 b/man/man8/xfs_scrub_all.8.in
similarity index 63%
rename from man/man8/xfs_scrub_all.8
rename to man/man8/xfs_scrub_all.8.in
index 86a9b3eced28..0aa87e237165 100644
--- a/man/man8/xfs_scrub_all.8
+++ b/man/man8/xfs_scrub_all.8.in
@@ -18,6 +18,21 @@ operations can be run in parallel so long as no two scrubbers access
 the same device simultaneously.
 .SH OPTIONS
 .TP
+.B \--auto-media-scan-interval
+Automatically enable the file data scan (i.e. the
+.B -x
+flag) if it has not been run in the specified interval.
+The interval must be a floating point number with an optional unit suffix.
+Supported unit suffixes are
+.IR y ", " q ", " mo ", " w ", " d ", " h ", " m ", and " s
+for years, 90-day quarters, 30-day months, weeks, days, hours, minutes, and
+seconds, respectively.
+If no units are specified, the default is seconds.
+.TP
+.B \--auto-media-scan-stamp
+Path to a file that will record the last time the media scan was run.
+Defaults to @stampfile@.
+.TP
 .B \-h
 Display help.
 .TP
diff --git a/scrub/Makefile b/scrub/Makefile
index 5567ec061e82..091a5c94baa1 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -118,6 +118,7 @@ xfs_scrub_all: xfs_scrub_all.in $(builddefs)
 		   -e "s|@scrub_svcname@|$(scrub_svcname)|g" \
 		   -e "s|@scrub_media_svcname@|$(scrub_media_svcname)|g" \
 		   -e "s|@pkg_version@|$(PKG_VERSION)|g" \
+		   -e "s|@stampfile@|$(XFS_SCRUB_ALL_AUTO_MEDIA_SCAN_STAMP)|g" \
 		   -e "s|@scrub_service_args@|$(XFS_SCRUB_SERVICE_ARGS)|g" \
 		   -e "s|@scrub_args@|$(XFS_SCRUB_ARGS)|g" < $< > $@
 	$(Q)chmod a+x $@
@@ -141,6 +142,7 @@ install: $(INSTALL_SCRUB)
 		   -e "s|@scrub_service_args@|$(XFS_SCRUB_SERVICE_ARGS)|g" \
 		   -e "s|@scrub_args@|$(XFS_SCRUB_ARGS)|g" \
 		   -e "s|@pkg_libexec_dir@|$(PKG_LIBEXEC_DIR)|g" \
+		   -e "s|@pkg_state_dir@|$(PKG_STATE_DIR)|g" \
 		   < $< > $@
 
 %.cron: %.cron.in $(builddefs)
@@ -161,6 +163,7 @@ install-scrub: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 	$(INSTALL) -m 755 $(XFS_SCRUB_ALL_PROG) $(PKG_SBIN_DIR)
+	$(INSTALL) -m 755 -d $(PKG_STATE_DIR)
 
 install-udev: $(UDEV_RULES)
 	$(INSTALL) -m 755 -d $(UDEV_RULE_DIR)
diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
index afba0dbe8912..9d5cbd2a6487 100644
--- a/scrub/xfs_scrub_all.in
+++ b/scrub/xfs_scrub_all.in
@@ -16,6 +16,10 @@ import os
 import argparse
 import signal
 from io import TextIOWrapper
+from pathlib import Path
+from datetime import timedelta
+from datetime import datetime
+from datetime import timezone
 
 retcode = 0
 terminate = False
@@ -248,6 +252,65 @@ def wait_for_termination(cond, killfuncs):
 		fn()
 	return True
 
+def scan_interval(string):
+	'''Convert a textual scan interval argument into a time delta.'''
+
+	if string.endswith('y'):
+		year = timedelta(seconds = 31556952)
+		return year * float(string[:-1])
+	if string.endswith('q'):
+		return timedelta(days = 90 * float(string[:-1]))
+	if string.endswith('mo'):
+		return timedelta(days = 30 * float(string[:-2]))
+	if string.endswith('w'):
+		return timedelta(weeks = float(string[:-1]))
+	if string.endswith('d'):
+		return timedelta(days = float(string[:-1]))
+	if string.endswith('h'):
+		return timedelta(hours = float(string[:-1]))
+	if string.endswith('m'):
+		return timedelta(minutes = float(string[:-1]))
+	if string.endswith('s'):
+		return timedelta(seconds = float(string[:-1]))
+	return timedelta(seconds = int(string))
+
+def utcnow():
+	'''Create a representation of the time right now, in UTC.'''
+
+	dt = datetime.utcnow()
+	return dt.replace(tzinfo = timezone.utc)
+
+def enable_automatic_media_scan(args):
+	'''Decide if we enable media scanning automatically.'''
+	already_enabled = args.x
+
+	try:
+		interval = scan_interval(args.auto_media_scan_interval)
+	except Exception as e:
+		raise Exception('%s: Invalid media scan interval.' % \
+				args.auto_media_scan_interval)
+
+	p = Path(args.auto_media_scan_stamp)
+	if already_enabled:
+		res = True
+	else:
+		try:
+			last_run = p.stat().st_mtime
+			now = utcnow().timestamp()
+			res = last_run + interval.total_seconds() < now
+		except FileNotFoundError:
+			res = True
+
+	if res:
+		# Truncate the stamp file to update its mtime
+		with p.open('w') as f:
+			pass
+		if not already_enabled:
+			print('Automatically enabling file data scrub.')
+			sys.stdout.flush()
+
+	return res
+
 def main():
 	'''Find mounts, schedule scrub runs.'''
 	def thr(mnt, devs):
@@ -262,13 +325,24 @@ def main():
 			action = "store_true")
 	parser.add_argument("-x", help = "Scrub file data after filesystem metadata.", \
 			action = "store_true")
+	parser.add_argument("--auto-media-scan-interval", help = "Automatically scrub file data at this interval.", \
+			default = None)
+	parser.add_argument("--auto-media-scan-stamp", help = "Stamp file for automatic file data scrub.", \
+			default = '@stampfile@')
 	args = parser.parse_args()
 
 	if args.V:
 		print("xfs_scrub_all version @pkg_version@")
 		sys.exit(0)
 
-	scrub_media = args.x
+	if args.auto_media_scan_interval is not None:
+		try:
+			scrub_media = enable_automatic_media_scan(args)
+		except Exception as e:
+			print(e)
+			sys.exit(16)
+	else:
+		scrub_media = args.x
 
 	fs = find_mounts()
 
diff --git a/scrub/xfs_scrub_all.service.in b/scrub/xfs_scrub_all.service.in
index 478cd8d057a2..9ecc3af0c1f6 100644
--- a/scrub/xfs_scrub_all.service.in
+++ b/scrub/xfs_scrub_all.service.in
@@ -34,11 +34,13 @@ CapabilityBoundingSet=
 NoNewPrivileges=true
 RestrictSUIDSGID=true
 
-# Make the entire filesystem readonly.  We don't want to hide anything because
-# we need to find all mounted XFS filesystems in the host.
+# Make the entire filesystem readonly except for the media scan stamp file
+# directory.  We don't want to hide anything because we need to find all
+# mounted XFS filesystems in the host.
 ProtectSystem=strict
 ProtectHome=read-only
 PrivateTmp=false
+BindPaths=@pkg_state_dir@
 
 # No network access except to the systemd control socket
 PrivateNetwork=true



* [PATCH 5/6] xfs_scrub_all: trigger automatic media scans once per month
  2024-07-02  0:51 ` [PATCHSET v30.7 07/16] xfs_scrub_all: automatic media scan service Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-07-02  1:07   ` [PATCH 4/6] xfs_scrub_all: enable periodic file data scrubs automatically Darrick J. Wong
@ 2024-07-02  1:07   ` Darrick J. Wong
  2024-07-02  5:39     ` Christoph Hellwig
  2024-07-02  1:07   ` [PATCH 6/6] xfs_scrub_all: failure reporting for the xfs_scrub_all job Darrick J. Wong
  5 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:07 UTC
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Teach the xfs_scrub_all background service to trigger an automatic scan
of all file data once per month.
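
With weekly runs, the effective media scan period ends up a little longer than a month. The Python sketch below simulates it under stated assumptions: one run every 7 days as in the cron entry `10 3 * * 0` (03:10 every Sunday), a scrub that takes no time, and the stamp being rewritten at decision time as enable_automatic_media_scan() does. The systemd timer schedule is not shown in this patch and may differ. simulate() is a hypothetical helper.

```python
DAY = 86400  # seconds

def simulate(weeks, interval_days=30):
    '''Return the days (counted from the first run) on which a weekly
    xfs_scrub_all run would enable the media scan.'''
    interval = interval_days * DAY
    last_run = None            # no stamp file yet, so the first run scans
    scan_days = []
    for week in range(weeks):
        now = week * 7 * DAY
        # Same test as the patch: due if the stamp is missing or
        # stamp mtime + interval < now.
        if last_run is None or last_run + interval < now:
            last_run = now     # the stamp is rewritten when the scan is enabled
            scan_days.append(now // DAY)
    return scan_days

print(simulate(12))  # [0, 35, 70]
```

Because day 28 still falls inside the 30-day window, the scan is enabled on every fifth weekly run, i.e. every 35 days.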

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/Makefile                 |    8 +++++++-
 scrub/xfs_scrub_all.cron.in    |    2 +-
 scrub/xfs_scrub_all.service.in |    2 +-
 3 files changed, 9 insertions(+), 3 deletions(-)


diff --git a/scrub/Makefile b/scrub/Makefile
index 091a5c94baa1..0e09ed127b82 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -108,6 +108,9 @@ CFILES += unicrash.c
 LCFLAGS += -DHAVE_LIBICU $(LIBICU_CFLAGS)
 endif
 
+# Automatically trigger a media scan once per month
+XFS_SCRUB_ALL_AUTO_MEDIA_SCAN_INTERVAL=1mo
+
 LDIRT = $(XFS_SCRUB_ALL_PROG) $(XFS_SCRUB_FAIL_PROG) *.service *.cron
 
 default: depend $(LTCOMMAND) $(XFS_SCRUB_ALL_PROG) $(XFS_SCRUB_FAIL_PROG) $(OPTIONAL_TARGETS)
@@ -143,11 +146,14 @@ install: $(INSTALL_SCRUB)
 		   -e "s|@scrub_args@|$(XFS_SCRUB_ARGS)|g" \
 		   -e "s|@pkg_libexec_dir@|$(PKG_LIBEXEC_DIR)|g" \
 		   -e "s|@pkg_state_dir@|$(PKG_STATE_DIR)|g" \
+		   -e "s|@media_scan_interval@|$(XFS_SCRUB_ALL_AUTO_MEDIA_SCAN_INTERVAL)|g" \
 		   < $< > $@
 
 %.cron: %.cron.in $(builddefs)
 	@echo "    [SED]    $@"
-	$(Q)$(SED) -e "s|@sbindir@|$(PKG_SBIN_DIR)|g" < $< > $@
+	$(Q)$(SED) -e "s|@sbindir@|$(PKG_SBIN_DIR)|g" \
+		   -e "s|@media_scan_interval@|$(XFS_SCRUB_ALL_AUTO_MEDIA_SCAN_INTERVAL)|g" \
+		   < $< > $@
 
 install-systemd: default $(SYSTEMD_SERVICES)
 	$(INSTALL) -m 755 -d $(SYSTEMD_SYSTEM_UNIT_DIR)
diff --git a/scrub/xfs_scrub_all.cron.in b/scrub/xfs_scrub_all.cron.in
index e08a9daf9259..e6633e091c42 100644
--- a/scrub/xfs_scrub_all.cron.in
+++ b/scrub/xfs_scrub_all.cron.in
@@ -3,4 +3,4 @@
 # Copyright (C) 2018-2024 Oracle.  All Rights Reserved.
 # Author: Darrick J. Wong <djwong@kernel.org>
 #
-10 3 * * 0 root test -e /run/systemd/system || @sbindir@/xfs_scrub_all
+10 3 * * 0 root test -e /run/systemd/system || @sbindir@/xfs_scrub_all --auto-media-scan-interval @media_scan_interval@
diff --git a/scrub/xfs_scrub_all.service.in b/scrub/xfs_scrub_all.service.in
index 9ecc3af0c1f6..8ed682989048 100644
--- a/scrub/xfs_scrub_all.service.in
+++ b/scrub/xfs_scrub_all.service.in
@@ -12,7 +12,7 @@ After=paths.target multi-user.target network.target network-online.target system
 [Service]
 Type=oneshot
 Environment=SERVICE_MODE=1
-ExecStart=@sbindir@/xfs_scrub_all
+ExecStart=@sbindir@/xfs_scrub_all --auto-media-scan-interval @media_scan_interval@
 SyslogIdentifier=xfs_scrub_all
 
 # Create the service underneath the scrub background service slice so that we



* [PATCH 6/6] xfs_scrub_all: failure reporting for the xfs_scrub_all job
  2024-07-02  0:51 ` [PATCHSET v30.7 07/16] xfs_scrub_all: automatic media scan service Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-07-02  1:07   ` [PATCH 5/6] xfs_scrub_all: trigger automatic media scans once per month Darrick J. Wong
@ 2024-07-02  1:07   ` Darrick J. Wong
  2024-07-02  5:39     ` Christoph Hellwig
  5 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:07 UTC
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Create a failure reporting service for when xfs_scrub_all fails.  This
shouldn't happen often, but let's report it anyway.
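
The two message shapes that xfs_scrub_fail produces after this change can be summarized with a small Python sketch: a per-mount report when a mount point argument is passed, and a whole-job report otherwise. failure_mail_headers() is a hypothetical name; the sketch omits the MIME headers, the "do not reply" line and the systemctl status log that the shell script appends.

```python
def failure_mail_headers(recipient, service, hostname, mntpoint=None):
    '''Return the header block and opening line of a failure report.'''
    if mntpoint:
        # Per-mount report, e.g. from xfs_scrub_media_fail@.service.
        subject = '%s failure on %s' % (service, mntpoint)
        opening = 'So sorry, the automatic %s of %s on %s failed.' % (
            service, mntpoint, hostname)
    else:
        # Whole-job report from xfs_scrub_all_fail.service.
        subject = '%s failure' % service
        opening = 'So sorry, the automatic %s on %s failed.' % (
            service, hostname)
    return '\n'.join([
        'To: %s' % recipient,
        'From: <%s@%s>' % (service, hostname),
        'Subject: %s' % subject,
        '',                      # blank line ends the headers
        opening,
    ])

print(failure_mail_headers('root', 'xfs_scrub_all', 'host.example'))
```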

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/Makefile                      |    1 
 scrub/xfs_scrub_all.service.in      |    1 
 scrub/xfs_scrub_all_fail.service.in |   71 +++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub_fail.in             |   35 ++++++++++++++---
 4 files changed, 101 insertions(+), 7 deletions(-)
 create mode 100644 scrub/xfs_scrub_all_fail.service.in


diff --git a/scrub/Makefile b/scrub/Makefile
index 0e09ed127b82..7e6882450d54 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -26,6 +26,7 @@ SYSTEMD_SERVICES=\
 	$(scrub_media_svcname) \
 	xfs_scrub_media_fail@.service \
 	xfs_scrub_all.service \
+	xfs_scrub_all_fail.service \
 	xfs_scrub_all.timer \
 	system-xfs_scrub.slice
 OPTIONAL_TARGETS += $(SYSTEMD_SERVICES)
diff --git a/scrub/xfs_scrub_all.service.in b/scrub/xfs_scrub_all.service.in
index 8ed682989048..b86b787d2ee3 100644
--- a/scrub/xfs_scrub_all.service.in
+++ b/scrub/xfs_scrub_all.service.in
@@ -5,6 +5,7 @@
 
 [Unit]
 Description=Online XFS Metadata Check for All Filesystems
+OnFailure=xfs_scrub_all_fail.service
 ConditionACPower=true
 Documentation=man:xfs_scrub_all(8)
 After=paths.target multi-user.target network.target network-online.target systemd-networkd.service NetworkManager.service connman.service
diff --git a/scrub/xfs_scrub_all_fail.service.in b/scrub/xfs_scrub_all_fail.service.in
new file mode 100644
index 000000000000..53479db84771
--- /dev/null
+++ b/scrub/xfs_scrub_all_fail.service.in
@@ -0,0 +1,71 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2018-2024 Oracle.  All Rights Reserved.
+# Author: Darrick J. Wong <djwong@kernel.org>
+
+[Unit]
+Description=Online XFS Metadata Check for All Filesystems Failure Reporting
+Documentation=man:xfs_scrub_all(8)
+
+[Service]
+Type=oneshot
+Environment=EMAIL_ADDR=root
+ExecStart=@pkg_libexec_dir@/xfs_scrub_fail "${EMAIL_ADDR}" xfs_scrub_all
+User=mail
+Group=mail
+SupplementaryGroups=systemd-journal
+
+# No realtime scheduling
+RestrictRealtime=true
+
+# Make the entire filesystem readonly and /home inaccessible.
+ProtectSystem=full
+ProtectHome=yes
+PrivateTmp=true
+RestrictSUIDSGID=true
+
+# Emailing reports requires network access, but not the ability to change the
+# hostname.
+ProtectHostname=true
+
+# Don't let the program mess with the kernel configuration at all
+ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+ProtectControlGroups=true
+ProtectProc=invisible
+RestrictNamespaces=true
+
+# Can't hide /proc because journalctl needs it to find various pieces of log
+# information
+#ProcSubset=pid
+
+# Only allow the default personality Linux
+LockPersonality=true
+
+# No writable memory pages
+MemoryDenyWriteExecute=true
+
+# Don't let our mounts leak out to the host
+PrivateMounts=true
+
+# Restrict system calls to the native arch and only enough to get things going
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@privileged
+SystemCallFilter=~@resources
+SystemCallFilter=~@mount
+
+# xfs_scrub needs these privileges to run, and no others
+CapabilityBoundingSet=
+NoNewPrivileges=true
+
+# Failure reporting shouldn't create world-readable files
+UMask=0077
+
+# Clean up any IPC objects when this unit stops
+RemoveIPC=true
+
+# No access to hardware device files
+PrivateDevices=true
+ProtectClock=true
diff --git a/scrub/xfs_scrub_fail.in b/scrub/xfs_scrub_fail.in
index e420917f699f..089b438f03c0 100755
--- a/scrub/xfs_scrub_fail.in
+++ b/scrub/xfs_scrub_fail.in
@@ -5,14 +5,13 @@
 # Copyright (C) 2018-2024 Oracle.  All Rights Reserved.
 # Author: Darrick J. Wong <djwong@kernel.org>
 
-# Email logs of failed xfs_scrub unit runs
+# Email logs of failed xfs_scrub and xfs_scrub_all unit runs
 
 recipient="$1"
 test -z "${recipient}" && exit 0
 service="$2"
 test -z "${service}" && exit 0
 mntpoint="$3"
-test -z "${mntpoint}" && exit 0
 
 hostname="$(hostname -f 2>/dev/null)"
 test -z "${hostname}" && hostname="${HOSTNAME}"
@@ -23,11 +22,13 @@ if [ ! -x "${mailer}" ]; then
 	exit 1
 fi
 
-# Turn the mountpoint into a properly escaped systemd instance name
-scrub_svc="$(systemd-escape --template "${service}@.service" --path "${mntpoint}")"
+fail_mail_mntpoint() {
+	local scrub_svc
 
-(cat << ENDL
-To: $1
+	# Turn the mountpoint into a properly escaped systemd instance name
+	scrub_svc="$(systemd-escape --template "${service}@.service" --path "${mntpoint}")"
+	cat << ENDL
+To: ${recipient}
 From: <${service}@${hostname}>
 Subject: ${service} failure on ${mntpoint}
 Content-Transfer-Encoding: 8bit
@@ -38,5 +39,25 @@ Please do not reply to this mesage.
 
 A log of what happened follows:
 ENDL
-systemctl status --full --lines 4294967295 "${scrub_svc}") | "${mailer}" -t -i
+	systemctl status --full --lines 4294967295 "${scrub_svc}"
+}
+
+fail_mail() {
+	cat << ENDL
+To: ${recipient}
+From: <${service}@${hostname}>
+Subject: ${service} failure
+
+So sorry, the automatic ${service} on ${hostname} failed.
+
+A log of what happened follows:
+ENDL
+	systemctl status --full --lines 4294967295 "${service}"
+}
+
+if [ -n "${mntpoint}" ]; then
+	fail_mail_mntpoint | "${mailer}" -t -i
+else
+	fail_mail | "${mailer}" -t -i
+fi
 exit "${PIPESTATUS[1]}"



* [PATCH 1/5] xfs_scrub_all: encapsulate all the subprocess code in an object
  2024-07-02  0:51 ` [PATCHSET v30.7 08/16] xfs_scrub_all: improve systemd handling Darrick J. Wong
@ 2024-07-02  1:08   ` Darrick J. Wong
  2024-07-02  5:40     ` Christoph Hellwig
  2024-07-02  1:08   ` [PATCH 2/5] xfs_scrub_all: encapsulate all the systemctl " Darrick J. Wong
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:08 UTC
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Move all the xfs_scrub subprocess handling code to an object so that we
can contain all the details in a single place.  This also simplifies the
background state management.
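
A minimal standalone sketch of the same control-object pattern, assuming only the Python standard library: a base class defines start()/stop(), a subprocess-backed subclass implements them, and the caller registers stop() as a kill function only while the child runs. The class and function names are hypothetical, and /usr/sbin is an assumed sbindir.

```python
import subprocess
import sys

def build_cmdline(mnt, scrub_media, extra_args=(), sbindir='/usr/sbin'):
    '''Assemble an xfs_scrub argv list (sbindir is an assumption).'''
    cmd = [sbindir + '/xfs_scrub']
    cmd.extend(extra_args)
    if scrub_media:
        # append() keeps "-x" a single argv entry; "cmd += '-x'" would
        # extend the list with the characters "-" and "x".
        cmd.append('-x')
    cmd.append(mnt)
    return cmd

class ScrubControl:
    '''start() blocks and returns -1/0/1-style status; stop() aborts.'''
    def start(self):
        raise NotImplementedError
    def stop(self):
        raise NotImplementedError

class SubprocessControl(ScrubControl):
    def __init__(self, cmdline):
        self.cmdline = list(cmdline)
        self.proc = None
    def start(self):
        try:
            self.proc = subprocess.Popen(self.cmdline)
        except OSError:
            return -1            # could not start the program at all
        ret = self.proc.wait()
        self.proc = None
        return ret
    def stop(self):
        proc = self.proc
        if proc is not None:
            proc.terminate()

def run_killable(control, killfuncs):
    '''Make control.stop() reachable from a signal handler while running.'''
    killfuncs.add(control.stop)
    try:
        return control.start()
    finally:
        killfuncs.discard(control.stop)
```

Because start() and stop() are the whole interface, another backend (such as one driving systemctl) can be substituted without changing the caller.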

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/xfs_scrub_all.in |   68 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 54 insertions(+), 14 deletions(-)


diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
index 9d5cbd2a6487..001c49a70128 100644
--- a/scrub/xfs_scrub_all.in
+++ b/scrub/xfs_scrub_all.in
@@ -78,15 +78,62 @@ def remove_killfunc(killfuncs, fn):
 	except:
 		pass
 
-def run_killable(cmd, stdout, killfuncs):
+class scrub_control(object):
+	'''Control object for xfs_scrub.'''
+	def __init__(self):
+		pass
+
+	def start(self):
+		'''Start scrub and wait for it to complete.  Returns -1 if the
+		service was not started, 0 if it succeeded, or 1 if it
+		failed.'''
+		assert False
+
+	def stop(self):
+		'''Stop scrub.'''
+		assert False
+
+class scrub_subprocess(scrub_control):
+	'''Control object for xfs_scrub subprocesses.'''
+	def __init__(self, mnt, scrub_media):
+		cmd = ['@sbindir@/xfs_scrub']
+		if 'SERVICE_MODE' in os.environ:
+			cmd += '@scrub_service_args@'.split()
+		cmd += '@scrub_args@'.split()
+		if scrub_media:
+			cmd += '-x'
+		cmd += [mnt]
+		self.cmdline = cmd
+		self.proc = None
+
+	def start(self):
+		'''Start xfs_scrub and wait for it to complete.  Returns -1 if
+		the service was not started, 0 if it succeeded, or 1 if it
+		failed.'''
+		try:
+			self.proc = subprocess.Popen(self.cmdline)
+			self.proc.wait()
+		except:
+			return -1
+
+		proc = self.proc
+		self.proc = None
+		return proc.returncode
+
+	def stop(self):
+		'''Stop xfs_scrub.'''
+		if self.proc is not None:
+			self.proc.terminate()
+
+def run_subprocess(mnt, scrub_media, killfuncs):
 	'''Run a killable program.  Returns program retcode or -1 if we can't
 	start it.'''
 	try:
-		proc = subprocess.Popen(cmd, stdout = stdout)
-		killfuncs.add(proc.terminate)
-		proc.wait()
-		remove_killfunc(killfuncs, proc.terminate)
-		return proc.returncode
+		p = scrub_subprocess(mnt, scrub_media)
+		killfuncs.add(p.stop)
+		ret = p.start()
+		remove_killfunc(killfuncs, p.stop)
+		return ret
 	except:
 		return -1
 
@@ -188,14 +235,7 @@ def run_scrub(mnt, cond, running_devs, mntdevs, killfuncs):
 		# Invoke xfs_scrub manually if we're running in the foreground.
 		# We also permit this if we're running as a cronjob where
 		# systemd services are unavailable.
-		cmd = ['@sbindir@/xfs_scrub']
-		if 'SERVICE_MODE' in os.environ:
-			cmd += '@scrub_service_args@'.split()
-		cmd += '@scrub_args@'.split()
-		if scrub_media:
-			cmd += '-x'
-		cmd += [mnt]
-		ret = run_killable(cmd, None, killfuncs)
+		ret = run_subprocess(mnt, scrub_media, killfuncs)
 		if ret >= 0:
 			print("Scrubbing %s done, (err=%d)" % (mnt, ret))
 			sys.stdout.flush()
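
The control-object pattern introduced here can be sketched on its own. This is a minimal Python sketch, not the xfs_scrub_all code. `ProcessControl`, `build_cmdline` and `run_killable` are hypothetical names, and `/usr/sbin/xfs_scrub` stands in for the build-time `@sbindir@` substitution. One detail worth noting: in Python, `list += str` extends the list with the string's characters, so `cmd += '-x'` adds `'-'` and `'x'` as two separate arguments, whereas `cmd.append('-x')` adds the single option.

```python
import subprocess


def build_cmdline(mnt, scrub_media, scrub_args=(), prog='/usr/sbin/xfs_scrub'):
    '''Assemble an argv list.  prog and scrub_args are placeholders for the
    build-time @sbindir@ and @scrub_args@ substitutions.'''
    cmd = [prog]
    cmd.extend(scrub_args)
    if scrub_media:
        cmd.append('-x')        # one argument, not the characters '-' and 'x'
    cmd.append(mnt)
    return cmd


class ProcessControl:
    '''Hypothetical stand-in for scrub_subprocess: owns one child process.'''

    def __init__(self, cmdline):
        self.cmdline = list(cmdline)
        self.proc = None

    def start(self):
        '''Run the command and wait for it.  Returns -1 if it could not be
        started, otherwise the child's exit status.'''
        try:
            self.proc = subprocess.Popen(self.cmdline)
            self.proc.wait()
        except OSError:
            return -1
        proc, self.proc = self.proc, None
        return proc.returncode

    def stop(self):
        '''Ask the child to exit; does nothing if no child is running.'''
        proc = self.proc
        if proc is not None:
            proc.terminate()


def run_killable(ctl, killfuncs):
    '''Register ctl.stop so that a signal handler can call it, run the
    command, then unregister the stop function however start() returns.'''
    killfuncs.add(ctl.stop)
    try:
        return ctl.start()
    finally:
        killfuncs.discard(ctl.stop)
```

Registering the stop function before `start()` and removing it in a `finally` clause keeps the kill set accurate even if `start()` raises; bound methods of the same object compare equal, so `discard()` finds the entry that `add()` inserted.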


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 2/5] xfs_scrub_all: encapsulate all the systemctl code in an object
  2024-07-02  0:51 ` [PATCHSET v30.7 08/16] xfs_scrub_all: improve systemd handling Darrick J. Wong
  2024-07-02  1:08   ` [PATCH 1/5] xfs_scrub_all: encapsulate all the subprocess code in an object Darrick J. Wong
@ 2024-07-02  1:08   ` Darrick J. Wong
  2024-07-02  5:41     ` Christoph Hellwig
  2024-07-02  1:08   ` [PATCH 3/5] xfs_scrub_all: add CLI option for easier debugging Darrick J. Wong
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:08 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Move all the systemd service handling code to an object so that we can
contain all the insanity^Wdetails in a single place.  This also makes
the killfuncs handling for services match the handling for background
subprocesses.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/xfs_scrub_all.in |  113 ++++++++++++++++++++++++++----------------------
 1 file changed, 61 insertions(+), 52 deletions(-)


diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
index 001c49a70128..09fedff9d965 100644
--- a/scrub/xfs_scrub_all.in
+++ b/scrub/xfs_scrub_all.in
@@ -149,63 +149,73 @@ def path_to_serviceunit(path, scrub_media):
 		svcname = '@scrub_svcname@'
 	cmd = ['systemd-escape', '--template', svcname, '--path', path]
 
-	try:
-		proc = subprocess.Popen(cmd, stdout = subprocess.PIPE)
-		proc.wait()
-		for line in proc.stdout:
-			return line.decode(sys.stdout.encoding).strip()
-	except:
-		return None
+	proc = subprocess.Popen(cmd, stdout = subprocess.PIPE)
+	proc.wait()
+	for line in proc.stdout:
+		return line.decode(sys.stdout.encoding).strip()
 
-def systemctl_stop(unitname):
-	'''Stop a systemd unit.'''
-	cmd = ['systemctl', 'stop', unitname]
-	x = subprocess.Popen(cmd)
-	x.wait()
+class scrub_service(scrub_control):
+	'''Control object for xfs_scrub systemd service.'''
+	def __init__(self, mnt, scrub_media):
+		self.unitname = path_to_serviceunit(mnt, scrub_media)
 
-def systemctl_start(unitname, killfuncs):
-	'''Start a systemd unit and wait for it to complete.'''
-	stop_fn = None
-	cmd = ['systemctl', 'start', unitname]
-	try:
-		proc = subprocess.Popen(cmd, stdout = DEVNULL())
-		stop_fn = lambda: systemctl_stop(unitname)
-		killfuncs.add(stop_fn)
-		proc.wait()
-		ret = proc.returncode
-	except:
-		if stop_fn is not None:
-			remove_killfunc(killfuncs, stop_fn)
-		return -1
+	def wait(self, interval = 1):
+		'''Wait until the service finishes.'''
 
-	if ret != 1:
-		remove_killfunc(killfuncs, stop_fn)
-		return ret
+		# As of systemd 249, the is-active command returns any of the
+		# following states: active, reloading, inactive, failed,
+		# activating, deactivating, or maintenance.  Apparently these
+		# strings are not localized.
+		while True:
+			try:
+				for l in backtick(['systemctl', 'is-active', self.unitname]):
+					if l == 'failed':
+						return 1
+					if l == 'inactive':
+						return 0
+			except:
+				return -1
 
-	# If systemctl-start returns 1, it's possible that the service failed
-	# or that dbus/systemd restarted and the client program lost its
-	# connection -- according to the systemctl man page, 1 means "unit not
-	# failed".
-	#
-	# Either way, we switch to polling the service status to try to wait
-	# for the service to end.  As of systemd 249, the is-active command
-	# returns any of the following states: active, reloading, inactive,
-	# failed, activating, deactivating, or maintenance.  Apparently these
-	# strings are not localized.
-	while True:
+			time.sleep(interval)
+
+	def start(self):
+		'''Start the service and wait for it to complete.  Returns -1
+		if the service was not started, 0 if it succeeded, or 1 if it
+		failed.'''
+		cmd = ['systemctl', 'start', self.unitname]
 		try:
-			for l in backtick(['systemctl', 'is-active', unitname]):
-				if l == 'failed':
-					remove_killfunc(killfuncs, stop_fn)
-					return 1
-				if l == 'inactive':
-					remove_killfunc(killfuncs, stop_fn)
-					return 0
+			proc = subprocess.Popen(cmd, stdout = DEVNULL())
+			proc.wait()
+			ret = proc.returncode
 		except:
-			remove_killfunc(killfuncs, stop_fn)
 			return -1
 
-		time.sleep(1)
+		if ret != 1:
+			return ret
+
+		# If systemctl-start returns 1, it's possible that the service
+		# failed or that dbus/systemd restarted and the client program
+		# lost its connection -- according to the systemctl man page, 1
+		# means "unit not failed".
+		return self.wait()
+
+	def stop(self):
+		'''Stop the service.'''
+		cmd = ['systemctl', 'stop', self.unitname]
+		x = subprocess.Popen(cmd)
+		x.wait()
+
+def run_service(mnt, scrub_media, killfuncs):
+	'''Run scrub as a service.'''
+	try:
+		svc = scrub_service(mnt, scrub_media)
+	except:
+		return -1
+
+	killfuncs.add(svc.stop)
+	retcode = svc.start()
+	remove_killfunc(killfuncs, svc.stop)
+	return retcode
 
 def run_scrub(mnt, cond, running_devs, mntdevs, killfuncs):
 	'''Run a scrub process.'''
@@ -220,9 +230,8 @@ def run_scrub(mnt, cond, running_devs, mntdevs, killfuncs):
 
 		# Run per-mount systemd xfs_scrub service only if we ourselves
 		# are running as a systemd service.
-		unitname = path_to_serviceunit(path, scrub_media)
-		if unitname is not None and 'SERVICE_MODE' in os.environ:
-			ret = systemctl_start(unitname, killfuncs)
+		if 'SERVICE_MODE' in os.environ:
+			ret = run_service(mnt, scrub_media, killfuncs)
 			if ret == 0 or ret == 1:
 				print("Scrubbing %s done, (err=%d)" % (mnt, ret))
 				sys.stdout.flush()
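
When `systemctl start` returns 1, the code above falls back to polling the unit's state. The loop below is a minimal, testable sketch of that fallback, not the patch itself: `wait_for_unit` and `systemctl_state` are hypothetical helpers, and the state query is passed in as a function so that the loop can run without systemd. `systemctl is-active` prints the state and exits nonzero for any state other than active, so the sketch reads its output and ignores the exit status.

```python
import subprocess
import time


def systemctl_state(unitname):
    '''Return the unit's state string as printed by "systemctl is-active".'''
    out = subprocess.run(['systemctl', 'is-active', unitname],
                         stdout=subprocess.PIPE, universal_newlines=True)
    return out.stdout.strip()


def wait_for_unit(get_state, interval=1.0, sleep=time.sleep):
    '''Poll get_state() until the unit is "failed" (returns 1) or
    "inactive" (returns 0).  Returns -1 if querying the state fails.'''
    while True:
        try:
            state = get_state()
        except Exception:
            return -1
        if state == 'failed':
            return 1
        if state == 'inactive':
            return 0
        # active, activating, deactivating, reloading, maintenance: keep waiting
        sleep(interval)
```

A real caller would pass something like `lambda: systemctl_state(unitname)`; injecting `sleep` lets a test check the polling cadence without actually waiting.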


* [PATCH 3/5] xfs_scrub_all: add CLI option for easier debugging
  2024-07-02  0:51 ` [PATCHSET v30.7 08/16] xfs_scrub_all: improve systemd handling Darrick J. Wong
  2024-07-02  1:08   ` [PATCH 1/5] xfs_scrub_all: encapsulate all the subprocess code in an object Darrick J. Wong
  2024-07-02  1:08   ` [PATCH 2/5] xfs_scrub_all: encapsulate all the systemctl " Darrick J. Wong
@ 2024-07-02  1:08   ` Darrick J. Wong
  2024-07-02  5:42     ` Christoph Hellwig
  2024-07-02  1:08   ` [PATCH 4/5] xfs_scrub_all: convert systemctl calls to dbus Darrick J. Wong
  2024-07-02  1:09   ` [PATCH 5/5] xfs_scrub_all: implement retry and backoff for dbus calls Darrick J. Wong
  4 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:08 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Add a --debug option that prints each xfs_scrub command line and each
systemctl invocation as it is started or stopped, to make it easier to
figure out what exactly the program is doing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/xfs_scrub_all.in |   25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)


diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
index 09fedff9d965..d5d1d13a2552 100644
--- a/scrub/xfs_scrub_all.in
+++ b/scrub/xfs_scrub_all.in
@@ -24,6 +24,7 @@ from datetime import timezone
 retcode = 0
 terminate = False
 scrub_media = False
+debug = False
 
 def DEVNULL():
 	'''Return /dev/null in subprocess writable format.'''
@@ -110,6 +111,11 @@ class scrub_subprocess(scrub_control):
 		'''Start xfs_scrub and wait for it to complete.  Returns -1 if
 		the service was not started, 0 if it succeeded, or 1 if it
 		failed.'''
+		global debug
+
+		if debug:
+			print('run ', ' '.join(self.cmdline))
+
 		try:
 			self.proc = subprocess.Popen(self.cmdline)
 			self.proc.wait()
@@ -122,6 +128,10 @@ class scrub_subprocess(scrub_control):
 
 	def stop(self):
 		'''Stop xfs_scrub.'''
+		global debug
+
+		if debug:
+			print('kill ', ' '.join(self.cmdline))
 		if self.proc is not None:
 			self.proc.terminate()
 
@@ -182,8 +192,12 @@ class scrub_service(scrub_control):
 		'''Start the service and wait for it to complete.  Returns -1
 		if the service was not started, 0 if it succeeded, or 1 if it
 		failed.'''
+		global debug
+
 		cmd = ['systemctl', 'start', self.unitname]
 		try:
+			if debug:
+				print(' '.join(cmd))
 			proc = subprocess.Popen(cmd, stdout = DEVNULL())
 			proc.wait()
 			ret = proc.returncode
@@ -201,7 +215,11 @@ class scrub_service(scrub_control):
 
 	def stop(self):
 		'''Stop the service.'''
+		global debug
+
 		cmd = ['systemctl', 'stop', self.unitname]
+		if debug:
+			print(' '.join(cmd))
 		x = subprocess.Popen(cmd)
 		x.wait()
 
@@ -366,10 +384,12 @@ def main():
 		a = (mnt, cond, running_devs, devs, killfuncs)
 		thr = threading.Thread(target = run_scrub, args = a)
 		thr.start()
-	global retcode, terminate, scrub_media
+	global retcode, terminate, scrub_media, debug
 
 	parser = argparse.ArgumentParser( \
 			description = "Scrub all mounted XFS filesystems.")
+	parser.add_argument("--debug", help = "Enable debugging messages.", \
+			action = "store_true")
 	parser.add_argument("-V", help = "Report version and exit.", \
 			action = "store_true")
 	parser.add_argument("-x", help = "Scrub file data after filesystem metadata.", \
@@ -384,6 +404,9 @@ def main():
 		print("xfs_scrub_all version @pkg_version@")
 		sys.exit(0)
 
+	if args.debug:
+		debug = True
+
 	if args.auto_media_scan_interval is not None:
 		try:
 			scrub_media = enable_automatic_media_scan(args)
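
The option handling in this patch comes down to an argparse `store_true` flag that sets a module-level switch. Below is a minimal sketch with hypothetical names (`make_parser`, `parse`, `debug_line`). It uses `shlex.join()` (Python 3.8+) rather than `' '.join()` so that a mount point containing spaces prints unambiguously; that is an optional refinement, not what the patch does.

```python
import argparse
import shlex

debug = False  # module-level switch, as in xfs_scrub_all


def make_parser():
    '''Build the option parser with the new --debug flag.'''
    parser = argparse.ArgumentParser(
            description='Scrub all mounted XFS filesystems.')
    parser.add_argument('--debug', help='Enable debugging messages.',
                        action='store_true')
    parser.add_argument('-x', help='Scrub file data after filesystem metadata.',
                        action='store_true')
    return parser


def parse(argv):
    '''Parse argv and set the module-level debug switch.'''
    global debug
    args = make_parser().parse_args(argv)
    debug = args.debug
    return args


def debug_line(verb, cmdline):
    '''Format a debug message; shlex.join() quotes arguments with spaces.'''
    return '%s %s' % (verb, shlex.join(cmdline))
```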


* [PATCH 4/5] xfs_scrub_all: convert systemctl calls to dbus
  2024-07-02  0:51 ` [PATCHSET v30.7 08/16] xfs_scrub_all: improve systemd handling Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-07-02  1:08   ` [PATCH 3/5] xfs_scrub_all: add CLI option for easier debugging Darrick J. Wong
@ 2024-07-02  1:08   ` Darrick J. Wong
  2024-07-02  5:42     ` Christoph Hellwig
  2024-07-02  1:09   ` [PATCH 5/5] xfs_scrub_all: implement retry and backoff for dbus calls Darrick J. Wong
  4 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:08 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Convert the systemctl invocations to dbus calls, which decouples us from
the systemctl CLI in favor of direct API calls.  This spares us from some
of the insanity of divining service state from program outputs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 debian/control         |    2 +
 scrub/xfs_scrub_all.in |   96 +++++++++++++++++++++++++++++++-----------------
 2 files changed, 63 insertions(+), 35 deletions(-)


diff --git a/debian/control b/debian/control
index 344466de0161..31773e53a19a 100644
--- a/debian/control
+++ b/debian/control
@@ -8,7 +8,7 @@ Standards-Version: 4.0.0
 Homepage: https://xfs.wiki.kernel.org/
 
 Package: xfsprogs
-Depends: ${shlibs:Depends}, ${misc:Depends}, python3:any
+Depends: ${shlibs:Depends}, ${misc:Depends}, python3-dbus, python3:any
 Provides: fsck-backend
 Suggests: xfsdump, acl, attr, quota
 Breaks: xfsdump (<< 3.0.0)
diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
index d5d1d13a2552..a09566efdcd8 100644
--- a/scrub/xfs_scrub_all.in
+++ b/scrub/xfs_scrub_all.in
@@ -15,6 +15,7 @@ import sys
 import os
 import argparse
 import signal
+import dbus
 from io import TextIOWrapper
 from pathlib import Path
 from datetime import timedelta
@@ -168,25 +169,57 @@ class scrub_service(scrub_control):
 	'''Control object for xfs_scrub systemd service.'''
 	def __init__(self, mnt, scrub_media):
 		self.unitname = path_to_serviceunit(mnt, scrub_media)
+		self.prop = None
+		self.unit = None
+		self.bind()
+
+	def bind(self):
+		'''Bind to the dbus proxy object for this service.'''
+		sysbus = dbus.SystemBus()
+		systemd1 = sysbus.get_object('org.freedesktop.systemd1',
+					    '/org/freedesktop/systemd1')
+		manager = dbus.Interface(systemd1,
+				'org.freedesktop.systemd1.Manager')
+		path = manager.LoadUnit(self.unitname)
+
+		svc_obj = sysbus.get_object('org.freedesktop.systemd1', path)
+		self.prop = dbus.Interface(svc_obj,
+				'org.freedesktop.DBus.Properties')
+		self.unit = dbus.Interface(svc_obj,
+				'org.freedesktop.systemd1.Unit')
+
+	def state(self):
+		'''Retrieve the active state for a systemd service.  As of
+		systemd 249, this is supposed to be one of the following:
+		"active", "reloading", "inactive", "failed", "activating",
+		or "deactivating".  These strings are not localized.'''
+		global debug
+
+		try:
+			return self.prop.Get('org.freedesktop.systemd1.Unit', 'ActiveState')
+		except Exception as e:
+			if debug:
+				print(e, file = sys.stderr)
+			return 'failed'
 
 	def wait(self, interval = 1):
 		'''Wait until the service finishes.'''
+		global debug
 
-		# As of systemd 249, the is-active command returns any of the
-		# following states: active, reloading, inactive, failed,
-		# activating, deactivating, or maintenance.  Apparently these
-		# strings are not localized.
-		while True:
-			try:
-				for l in backtick(['systemctl', 'is-active', self.unitname]):
-					if l == 'failed':
-						return 1
-					if l == 'inactive':
-						return 0
-			except:
-				return -1
-
+		# Use a poll/sleep loop to wait for the service to finish.
+		# Avoid adding a dependency on python3 glib, which is required
+		# to use an event loop to receive a dbus signal.
+		s = self.state()
+		while s not in ['failed', 'inactive']:
+			if debug:
+				print('waiting %s %s' % (self.unitname, s))
 			time.sleep(interval)
+			s = self.state()
+		if debug:
+			print('waited %s %s' % (self.unitname, s))
+		if s == 'failed':
+			return 1
+		return 0
 
 	def start(self):
 		'''Start the service and wait for it to complete.  Returns -1
@@ -194,34 +227,29 @@ class scrub_service(scrub_control):
 		failed.'''
 		global debug
 
-		cmd = ['systemctl', 'start', self.unitname]
+		if debug:
+			print('starting %s' % self.unitname)
+
 		try:
-			if debug:
-				print(' '.join(cmd))
-			proc = subprocess.Popen(cmd, stdout = DEVNULL())
-			proc.wait()
-			ret = proc.returncode
-		except:
+			self.unit.Start('replace')
+			return self.wait()
+		except Exception as e:
+			print(e, file = sys.stderr)
 			return -1
 
-		if ret != 1:
-			return ret
-
-		# If systemctl-start returns 1, it's possible that the service
-		# failed or that dbus/systemd restarted and the client program
-		# lost its connection -- according to the systemctl man page, 1
-		# means "unit not failed".
-		return self.wait()
-
 	def stop(self):
 		'''Stop the service.'''
 		global debug
 
-		cmd = ['systemctl', 'stop', self.unitname]
 		if debug:
-			print(' '.join(cmd))
-		x = subprocess.Popen(cmd)
-		x.wait()
+			print('stopping %s' % self.unitname)
+
+		try:
+			self.unit.Stop('replace')
+			return self.wait()
+		except Exception as e:
+			print(e, file = sys.stderr)
+			return -1
 
 def run_service(mnt, scrub_media, killfuncs):
 	'''Run scrub as a service.'''
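
Querying a unit's state over D-Bus, as this patch does, takes two steps: ask the systemd manager for the unit's object path with `LoadUnit`, then read the unit's `ActiveState` property through `org.freedesktop.DBus.Properties`. The sketch below is a minimal version using python3-dbus (the package this patch adds to debian/control). It passes interface names through dbus-python's `dbus_interface` keyword instead of building `dbus.Interface` wrappers, and `unit_active_state` is a hypothetical helper. `dbus` is imported lazily so that the function can be exercised with a stand-in bus object when no system bus is available.

```python
SYSTEMD_BUS_NAME = 'org.freedesktop.systemd1'
SYSTEMD_OBJ_PATH = '/org/freedesktop/systemd1'
MANAGER_IFACE = 'org.freedesktop.systemd1.Manager'
UNIT_IFACE = 'org.freedesktop.systemd1.Unit'
PROPS_IFACE = 'org.freedesktop.DBus.Properties'


def unit_active_state(unitname, bus=None):
    '''Return the ActiveState of a systemd unit ("active", "inactive",
    "failed", ...).  Connects to the system bus unless a bus is given.'''
    if bus is None:
        import dbus             # python3-dbus
        bus = dbus.SystemBus()
    manager = bus.get_object(SYSTEMD_BUS_NAME, SYSTEMD_OBJ_PATH)
    # LoadUnit returns the unit's object path, loading the unit if needed.
    path = manager.LoadUnit(unitname, dbus_interface=MANAGER_IFACE)
    unit = bus.get_object(SYSTEMD_BUS_NAME, path)
    state = unit.Get(UNIT_IFACE, 'ActiveState', dbus_interface=PROPS_IFACE)
    return str(state)           # dbus.String -> plain str
```

Starting and stopping go through the same unit object: `Unit.Start()` and `Unit.Stop()` take a job mode string, for which the patch passes `'replace'`.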


* [PATCH 5/5] xfs_scrub_all: implement retry and backoff for dbus calls
  2024-07-02  0:51 ` [PATCHSET v30.7 08/16] xfs_scrub_all: improve systemd handling Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-07-02  1:08   ` [PATCH 4/5] xfs_scrub_all: convert systemctl calls to dbus Darrick J. Wong
@ 2024-07-02  1:09   ` Darrick J. Wong
  2024-07-02  5:42     ` Christoph Hellwig
  4 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:09 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Calls to systemd across dbus are remote procedure calls, which means
that they're subject to transitory connection failures (e.g. systemd
re-executing itself).  We don't want to fail at the *first* sign of what
could be temporary trouble, so implement a limited retry with Fibonacci
backoff before we resort to invoking xfs_scrub as a subprocess.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/xfs_scrub_all.in |   43 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 3 deletions(-)


diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
index a09566efdcd8..71726cdf36d5 100644
--- a/scrub/xfs_scrub_all.in
+++ b/scrub/xfs_scrub_all.in
@@ -165,6 +165,22 @@ def path_to_serviceunit(path, scrub_media):
 	for line in proc.stdout:
 		return line.decode(sys.stdout.encoding).strip()
 
+def fibonacci(max_ret):
+	'''Yield fibonacci sequence up to and including max_ret.'''
+	if max_ret < 1:
+		return
+
+	x = 0
+	y = 1
+	yield 1
+
+	z = x + y
+	while z <= max_ret:
+		yield z
+		x = y
+		y = z
+		z = x + y
+
 class scrub_service(scrub_control):
 	'''Control object for xfs_scrub systemd service.'''
 	def __init__(self, mnt, scrub_media):
@@ -188,6 +204,25 @@ class scrub_service(scrub_control):
 		self.unit = dbus.Interface(svc_obj,
 				'org.freedesktop.systemd1.Unit')
 
+	def __dbusrun(self, lambda_fn):
+		'''Call the lambda function to execute something on dbus.  dbus
+		exceptions result in retries with Fibonacci backoff, and the
+		bindings will be rebuilt every time.'''
+		global debug
+
+		fatal_ex = None
+
+		for i in fibonacci(30):
+			try:
+				return lambda_fn()
+			except dbus.exceptions.DBusException as e:
+				if debug:
+					print(e)
+				fatal_ex = e
+				time.sleep(i)
+				self.bind()
+		raise fatal_ex
+
 	def state(self):
 		'''Retrieve the active state for a systemd service.  As of
 		systemd 249, this is supposed to be one of the following:
@@ -195,8 +230,10 @@ class scrub_service(scrub_control):
 		or "deactivating".  These strings are not localized.'''
 		global debug
 
+		l = lambda: self.prop.Get('org.freedesktop.systemd1.Unit',
+				'ActiveState')
 		try:
-			return self.prop.Get('org.freedesktop.systemd1.Unit', 'ActiveState')
+			return self.__dbusrun(l)
 		except Exception as e:
 			if debug:
 				print(e, file = sys.stderr)
@@ -231,7 +268,7 @@ class scrub_service(scrub_control):
 			print('starting %s' % self.unitname)
 
 		try:
-			self.unit.Start('replace')
+			self.__dbusrun(lambda: self.unit.Start('replace'))
 			return self.wait()
 		except Exception as e:
 			print(e, file = sys.stderr)
@@ -245,7 +282,7 @@ class scrub_service(scrub_control):
 			print('stopping %s' % self.unitname)
 
 		try:
-			self.unit.Stop('replace')
+			self.__dbusrun(lambda: self.unit.Stop('replace'))
 			return self.wait()
 		except Exception as e:
 			print(e, file = sys.stderr)
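
For `fibonacci(30)` the generator in this patch yields 1, 1, 2, 3, 5, 8, 13, 21, so a dbus call that keeps failing is attempted eight times, with 54 seconds of sleeps in total, before the last exception is re-raised. The sketch below reproduces that schedule in a generic helper. `fib_delays` and `retry_with_backoff` are hypothetical names, and the sleep function and rebind hook are parameters so the schedule can be checked without waiting.

```python
import time


def fib_delays(max_delay):
    '''Yield 1, 1, 2, 3, 5, ... for as long as the value is <= max_delay.'''
    a, b = 1, 1
    while a <= max_delay:
        yield a
        a, b = b, a + b


def retry_with_backoff(fn, retriable, max_delay=30, sleep=time.sleep,
                       rebind=None):
    '''Call fn(); on a retriable exception, sleep with Fibonacci backoff,
    optionally rebuild connections via rebind(), and try again.  Re-raises
    the last exception once the delay schedule is exhausted.'''
    if max_delay < 1:
        raise ValueError('max_delay must be at least 1')
    last = None
    for delay in fib_delays(max_delay):
        try:
            return fn()
        except retriable as e:
            last = e
            sleep(delay)
            if rebind is not None:
                rebind()
    raise last
```

Fibonacci backoff grows more gently than doubling, so short hiccups such as a systemd re-exec are retried quickly, while a bus that stays down still gives up after under a minute.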


* [PATCH 1/3] xfs_scrub: automatic downgrades to dry-run mode in service mode
  2024-07-02  0:51 ` [PATCHSET v30.7 09/16] xfs_scrub: automatic optimization by default Darrick J. Wong
@ 2024-07-02  1:09   ` Darrick J. Wong
  2024-07-02  5:43     ` Christoph Hellwig
  2024-07-02  1:09   ` [PATCH 2/3] xfs_scrub: add an optimization-only mode Darrick J. Wong
  2024-07-02  1:09   ` [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default Darrick J. Wong
  2 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:09 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

When service mode is enabled, xfs_scrub runs within the context of a
systemd service.  The service description language has no good
construct for adding a '-n' argument only when the filesystem is
readonly, so xfs_scrub, which is only passed a path, must switch itself
to dry-run mode if the fs is mounted readonly or the kernel doesn't
support repairs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase1.c |   13 +++++++++++++
 scrub/repair.c |   33 +++++++++++++++++++++++++++++++++
 scrub/repair.h |    2 ++
 3 files changed, 48 insertions(+)


diff --git a/scrub/phase1.c b/scrub/phase1.c
index 516d929d6268..095c045915a7 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -216,6 +216,19 @@ _("Kernel metadata scrubbing facility is not available."));
 		return ECANCELED;
 	}
 
+	/*
+	 * Normally, callers are required to pass -n if the provided path is a
+	 * readonly filesystem or the kernel wasn't built with online repair
+	 * enabled.  However, systemd services are not scripts and cannot
+	 * determine either of these conditions programmatically.  Change the
+	 * behavior to dry-run mode if either condition is detected.
+	 */
+	if (repair_want_service_downgrade(ctx)) {
+		str_info(ctx, ctx->mntpoint,
+_("Filesystem cannot be repaired in service mode, downgrading to dry-run mode."));
+		ctx->mode = SCRUB_MODE_DRY_RUN;
+	}
+
 	/* Do we need kernel-assisted metadata repair? */
 	if (ctx->mode != SCRUB_MODE_DRY_RUN && !can_repair(ctx)) {
 		str_error(ctx, ctx->mntpoint,
diff --git a/scrub/repair.c b/scrub/repair.c
index 19f5c9052aff..2883f98af4ab 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -45,6 +45,39 @@ static const unsigned int repair_deps[XFS_SCRUB_TYPE_NR] = {
 };
 #undef DEP
 
+/*
+ * Decide if we want an automatic downgrade to dry-run mode.  This is only
+ * for service mode, where we are fed a path and have to figure out if the fs
+ * is repairable or not.
+ */
+bool
+repair_want_service_downgrade(
+	struct scrub_ctx		*ctx)
+{
+	struct xfs_scrub_metadata	meta = {
+		.sm_type		= XFS_SCRUB_TYPE_PROBE,
+		.sm_flags		= XFS_SCRUB_IFLAG_REPAIR,
+	};
+	int				error;
+
+	if (ctx->mode == SCRUB_MODE_DRY_RUN)
+		return false;
+	if (!is_service)
+		return false;
+	if (debug_tweak_on("XFS_SCRUB_NO_KERNEL"))
+		return false;
+
+	error = -xfrog_scrub_metadata(&ctx->mnt, &meta);
+	switch (error) {
+	case EROFS:
+	case ENOTRECOVERABLE:
+	case EOPNOTSUPP:
+		return true;
+	}
+
+	return false;
+}
+
 /* Repair some metadata. */
 static int
 xfs_repair_metadata(
diff --git a/scrub/repair.h b/scrub/repair.h
index a685e90374cb..411a379f6faa 100644
--- a/scrub/repair.h
+++ b/scrub/repair.h
@@ -102,4 +102,6 @@ repair_item_completely(
 	return repair_item(ctx, sri, XRM_FINAL_WARNING | XRM_NOPROGRESS);
 }
 
+bool repair_want_service_downgrade(struct scrub_ctx *ctx);
+
 #endif /* XFS_SCRUB_REPAIR_H_ */
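
The test in `repair_want_service_downgrade()` reduces to a small decision: downgrade only in service mode, only when repairs were requested, and only when a repair-flagged probe fails with `EROFS`, `ENOTRECOVERABLE` or `EOPNOTSUPP`. It is sketched here in Python for illustration. The function name and the `probe_errno` parameter, which stands in for the result of the `XFS_SCRUB_TYPE_PROBE` call, are hypothetical, and the `XFS_SCRUB_NO_KERNEL` debug tweak is left out.

```python
import errno
from enum import Enum


class Mode(Enum):
    DRY_RUN = 'dry-run'
    REPAIR = 'repair'


# Probe failures that mean "this filesystem cannot be repaired here".
DOWNGRADE_ERRNOS = {errno.EROFS, errno.ENOTRECOVERABLE, errno.EOPNOTSUPP}


def want_service_downgrade(mode, is_service, probe_errno):
    '''Return True if a service-mode run should fall back to dry-run.
    probe_errno is 0 if the repair probe succeeded.'''
    if mode is Mode.DRY_RUN or not is_service:
        return False
    return probe_errno in DOWNGRADE_ERRNOS
```

Any other probe error leaves the mode alone, so the existing `can_repair()` check that follows still reports it.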


* [PATCH 2/3] xfs_scrub: add an optimization-only mode
  2024-07-02  0:51 ` [PATCHSET v30.7 09/16] xfs_scrub: automatic optimization by default Darrick J. Wong
  2024-07-02  1:09   ` [PATCH 1/3] xfs_scrub: automatic downgrades to dry-run mode in service mode Darrick J. Wong
@ 2024-07-02  1:09   ` Darrick J. Wong
  2024-07-02  5:43     ` Christoph Hellwig
  2024-07-02  1:09   ` [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default Darrick J. Wong
  2 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:09 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Add a "preen" mode (-p) in which we only optimize filesystem metadata.
If any metadata needs repairs, xfs_scrub reports the corruption and
exits without optimizing anything.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man8/xfs_scrub.8 |    6 +++++-
 scrub/Makefile       |    2 +-
 scrub/phase4.c       |    6 ++++++
 scrub/repair.c       |    4 +++-
 scrub/scrub.c        |    4 ++--
 scrub/xfs_scrub.c    |   21 +++++++++++++++++++--
 scrub/xfs_scrub.h    |    1 +
 7 files changed, 37 insertions(+), 7 deletions(-)


diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
index 6154011271e6..1fd122f2a242 100644
--- a/man/man8/xfs_scrub.8
+++ b/man/man8/xfs_scrub.8
@@ -4,7 +4,7 @@ xfs_scrub \- check and repair the contents of a mounted XFS filesystem
 .SH SYNOPSIS
 .B xfs_scrub
 [
-.B \-abCeMmnTvx
+.B \-abCeMmnpTvx
 ]
 .I mount-point
 .br
@@ -128,6 +128,10 @@ Treat informational messages as warnings.
 This will result in a nonzero return code, and a higher logging level.
 .RE
 .TP
+.B \-p
+Only optimize filesystem metadata.
+If repairs are required, report them and exit.
+.TP
 .BI \-T
 Print timing and memory usage information for each phase.
 .TP
diff --git a/scrub/Makefile b/scrub/Makefile
index 7e6882450d54..885b43e9948d 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -16,7 +16,7 @@ LTCOMMAND = xfs_scrub
 INSTALL_SCRUB = install-scrub
 XFS_SCRUB_ALL_PROG = xfs_scrub_all
 XFS_SCRUB_FAIL_PROG = xfs_scrub_fail
-XFS_SCRUB_ARGS = -n
+XFS_SCRUB_ARGS = -p
 XFS_SCRUB_SERVICE_ARGS = -b
 ifeq ($(HAVE_SYSTEMD),yes)
 INSTALL_SCRUB += install-systemd
diff --git a/scrub/phase4.c b/scrub/phase4.c
index 451101811c9b..88cb53aeac90 100644
--- a/scrub/phase4.c
+++ b/scrub/phase4.c
@@ -240,6 +240,12 @@ phase4_func(
 	    action_list_empty(ctx->file_repair_list))
 		return 0;
 
+	if (ctx->mode == SCRUB_MODE_PREEN && ctx->corruptions_found) {
+		str_info(ctx, ctx->mntpoint,
+ _("Corruptions found; will not optimize.  Re-run without -p.\n"));
+		return 0;
+	}
+
 	/*
 	 * Check the resource usage counters early.  Normally we do this during
 	 * phase 7, but some of the cross-referencing requires fairly accurate
diff --git a/scrub/repair.c b/scrub/repair.c
index 2883f98af4ab..0258210722ba 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -651,7 +651,9 @@ repair_item_class(
 	unsigned int			scrub_type;
 	int				error = 0;
 
-	if (ctx->mode < SCRUB_MODE_REPAIR)
+	if (ctx->mode == SCRUB_MODE_DRY_RUN)
+		return 0;
+	if (ctx->mode == SCRUB_MODE_PREEN && !(repair_mask & SCRUB_ITEM_PREEN))
 		return 0;
 
 	/*
diff --git a/scrub/scrub.c b/scrub/scrub.c
index 2b6b6274e382..1b0609e7418b 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -174,7 +174,7 @@ _("Filesystem is shut down, aborting."));
 	 * repair if desired, otherwise complain.
 	 */
 	if (is_corrupt(&meta) || xref_disagrees(&meta)) {
-		if (ctx->mode < SCRUB_MODE_REPAIR) {
+		if (ctx->mode != SCRUB_MODE_REPAIR) {
 			/* Dry-run mode, so log an error and forget it. */
 			str_corrupt(ctx, descr_render(&dsc),
 _("Repairs are required."));
@@ -192,7 +192,7 @@ _("Repairs are required."));
 	 * otherwise complain.
 	 */
 	if (is_unoptimized(&meta)) {
-		if (ctx->mode != SCRUB_MODE_REPAIR) {
+		if (ctx->mode == SCRUB_MODE_DRY_RUN) {
 			/* Dry-run mode, so log an error and forget it. */
 			if (group != XFROG_SCRUB_GROUP_INODE) {
 				/* AG or FS metadata, always warn. */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index d7cef115deea..bb316f73e02c 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -183,6 +183,7 @@ usage(void)
 	fprintf(stderr, _("  -k           Do not FITRIM the free space.\n"));
 	fprintf(stderr, _("  -m path      Path to /etc/mtab.\n"));
 	fprintf(stderr, _("  -n           Dry run.  Do not modify anything.\n"));
+	fprintf(stderr, _("  -p           Only optimize, do not fix corruptions.\n"));
 	fprintf(stderr, _("  -T           Display timing/usage information.\n"));
 	fprintf(stderr, _("  -v           Verbose output.\n"));
 	fprintf(stderr, _("  -V           Print version.\n"));
@@ -463,6 +464,11 @@ run_scrub_phases(
 			sp->descr = _("Repair filesystem.");
 			sp->fn = phase4_func;
 			sp->must_run = true;
+		} else if (sp->fn == REPAIR_DUMMY_FN &&
+			   ctx->mode == SCRUB_MODE_PREEN) {
+			sp->descr = _("Optimize filesystem.");
+			sp->fn = phase4_func;
+			sp->must_run = true;
 		}
 
 		/* Skip certain phases unless they're turned on. */
@@ -601,7 +607,7 @@ report_outcome(
 	if (ctx->scrub_setup_succeeded && actionable_errors > 0) {
 		char		*msg;
 
-		if (ctx->mode == SCRUB_MODE_DRY_RUN)
+		if (ctx->mode != SCRUB_MODE_REPAIR)
 			msg = _("%s: Re-run xfs_scrub without -n.\n");
 		else
 			msg = _("%s: Unmount and run xfs_repair.\n");
@@ -725,7 +731,7 @@ main(
 	pthread_mutex_init(&ctx.lock, NULL);
 	ctx.mode = SCRUB_MODE_REPAIR;
 	ctx.error_action = ERRORS_CONTINUE;
-	while ((c = getopt(argc, argv, "a:bC:de:kM:m:no:TvxV")) != EOF) {
+	while ((c = getopt(argc, argv, "a:bC:de:kM:m:no:pTvxV")) != EOF) {
 		switch (c) {
 		case 'a':
 			ctx.max_errors = cvt_u64(optarg, 10);
@@ -776,11 +782,22 @@ main(
 			mtab = optarg;
 			break;
 		case 'n':
+			if (ctx.mode != SCRUB_MODE_REPAIR) {
+				fprintf(stderr, _("Cannot use -n with -p.\n"));
+				usage();
+			}
 			ctx.mode = SCRUB_MODE_DRY_RUN;
 			break;
 		case 'o':
 			parse_o_opts(&ctx, optarg);
 			break;
+		case 'p':
+			if (ctx.mode != SCRUB_MODE_REPAIR) {
+				fprintf(stderr, _("Cannot use -p with -n.\n"));
+				usage();
+			}
+			ctx.mode = SCRUB_MODE_PREEN;
+			break;
 		case 'T':
 			display_rusage = true;
 			break;
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index b0aa9fcc67b7..4d9a028921b5 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -27,6 +27,7 @@ extern bool			info_is_warning;
 
 enum scrub_mode {
 	SCRUB_MODE_DRY_RUN,
+	SCRUB_MODE_PREEN,
 	SCRUB_MODE_REPAIR,
 };
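
Taken together, the scrub.c, repair.c and phase4.c changes give each mode a different policy for what it fixes. Below is a minimal Python sketch of that per-item policy, for illustration only. `ScrubMode` and `item_action` are hypothetical names, and the real code queues repairs and runs them in phase 4 rather than deciding item by item.

```python
from enum import IntEnum


class ScrubMode(IntEnum):
    # Same order as enum scrub_mode: PREEN sits between DRY_RUN and REPAIR.
    DRY_RUN = 0
    PREEN = 1
    REPAIR = 2


def item_action(mode, corrupt, unoptimized, corruptions_found=False):
    '''Return what happens to one metadata item: "repair", "optimize",
    "report", "skip" or "none".'''
    if corrupt:
        # Only full repair mode fixes corruption; -n and -p just report it.
        return 'repair' if mode == ScrubMode.REPAIR else 'report'
    if unoptimized:
        if mode == ScrubMode.DRY_RUN:
            return 'report'
        if mode == ScrubMode.PREEN and corruptions_found:
            # Phase 4 declines to optimize anything once corruption is seen.
            return 'skip'
        return 'optimize'
    return 'none'
```

Because PREEN sits between DRY_RUN and REPAIR, an ordered test such as `mode < SCRUB_MODE_REPAIR` now covers two modes, which is presumably why the patch rewrites several of those tests as explicit equality checks.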
 


* [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-02  0:51 ` [PATCHSET v30.7 09/16] xfs_scrub: automatic optimization by default Darrick J. Wong
  2024-07-02  1:09   ` [PATCH 1/3] xfs_scrub: automatic downgrades to dry-run mode in service mode Darrick J. Wong
  2024-07-02  1:09   ` [PATCH 2/3] xfs_scrub: add an optimization-only mode Darrick J. Wong
@ 2024-07-02  1:09   ` Darrick J. Wong
  2024-07-02  5:44     ` Christoph Hellwig
  2 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:09 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Now that we're finished building online fsck, enable the periodic
background scrub service by default.  This involves the postinst script
starting the resource management slice and the timer.

No other sub-services need to be enabled or unmasked explicitly.  They
also shouldn't be started or restarted because that might interrupt
background operation unnecessarily.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 debian/rules |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/debian/rules b/debian/rules
index 69a79fc67405..c3fbcd26232e 100755
--- a/debian/rules
+++ b/debian/rules
@@ -114,7 +114,7 @@ binary-arch: checkroot built
 	dh_compress
 	dh_fixperms
 	dh_makeshlibs
-	dh_installsystemd -p xfsprogs --no-enable --no-start --no-restart-after-upgrade --no-stop-on-upgrade
+	dh_installsystemd -p xfsprogs --no-restart-after-upgrade --no-stop-on-upgrade system-xfs_scrub.slice xfs_scrub_all.timer
 	dh_installdeb
 	dh_shlibdeps
 	dh_gencontrol


* [PATCH 1/3] xfs_repair: check free space requirements before allowing upgrades
  2024-07-02  0:51 ` [PATCHSET v13.7 10/16] xfsprogs: improve extended attribute validation Darrick J. Wong
@ 2024-07-02  1:10   ` Darrick J. Wong
  2024-07-02  5:44     ` Christoph Hellwig
  2024-07-02  1:10   ` [PATCH 2/3] xfs_repair: enforce one namespace bit per extended attribute Darrick J. Wong
  2024-07-02  1:10   ` [PATCH 3/3] xfs_repair: check for unknown flags in attr entries Darrick J. Wong
  2 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:10 UTC (permalink / raw)
  To: djwong, cem; +Cc: Chandan Babu R, Dave Chinner, linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Currently, the V5 feature upgrades permitted by xfs_repair do not affect
filesystem space usage, so we haven't needed to verify the geometry.

However, this will change once we start to allow the sysadmin to add new
metadata indexes to existing filesystems.  Add all the infrastructure we
need to ensure that there's enough space for metadata space reservations
and per-AG reservations the next time the filesystem is mounted.
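
The per-AG test in `check_free_space()` below works in filesystem blocks. `GIGABYTES(count, blog)` shifts by `30 - blog`, so with 4096-byte blocks (`blog` = 12), 10 GiB is 10 << 18 = 2,621,440 blocks. The Python sketch mirrors the thresholds stated in that function's comments: at least 10% free is fine, under 5% is refused, and in between more than 10 GiB must be free. The names are hypothetical. As written, the patch tests `avail < total / 5`, which is a 20% threshold rather than 5%. Whenever less than 10% is free that test is true, so the 10 GiB allowance is never reached, which matches the trailer's "Refuse to upgrade if any part of the fs has < 10% free".

```python
def gigabytes_to_blocks(count, blocklog):
    '''Python equivalent of GIGABYTES(): count GiB in 2**blocklog-byte blocks.'''
    return count << (30 - blocklog)


def enough_free_space(avail, total, blocklog):
    '''Apply the 10% / 5% / 10 GiB policy described in the comments.'''
    if avail >= total // 10:            # at least 10% free: fine
        return True
    if avail < total // 20:             # under 5% free: refuse
        return False
    # Between 5% and 10% free: accept only if more than 10 GiB remains.
    return avail > gigabytes_to_blocks(10, blocklog)
```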

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
[david: Recompute transaction reservation values; Exit with error if upgrade fails]
Signed-off-by: Dave Chinner <david@fromorbit.com>
[djwong: Refuse to upgrade if any part of the fs has < 10% free]
---
 include/libxfs.h |    1 
 repair/phase2.c  |  134 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 135 insertions(+)


diff --git a/include/libxfs.h b/include/libxfs.h
index e760a46d8950..bb00b71b17e1 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -91,6 +91,7 @@ struct iomap;
 #include "libxfs/buf_mem.h"
 #include "xfs_btree_mem.h"
 #include "xfs_parent.h"
+#include "xfs_ag_resv.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/repair/phase2.c b/repair/phase2.c
index 83f0c539bb5d..3418da523a67 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -249,6 +249,137 @@ install_new_state(
 	libxfs_trans_init(mp);
 }
 
+#define GIGABYTES(count, blog)     ((uint64_t)(count) << (30 - (blog)))
+static inline bool
+check_free_space(
+	struct xfs_mount	*mp,
+	unsigned long long	avail,
+	unsigned long long	total)
+{
+	/* Ok if there's more than 10% free. */
+	if (avail >= total / 10)
+		return true;
+
+	/* Not ok if there's less than 5% free. */
+	if (avail < total / 20)
+		return false;
+
+	/* Let it slide if there's at least 10GB free. */
+	return avail > GIGABYTES(10, mp->m_sb.sb_blocklog);
+}
+
+static void
+check_fs_free_space(
+	struct xfs_mount		*mp,
+	const struct check_state	*old,
+	struct xfs_sb			*new_sb)
+{
+	struct xfs_perag		*pag;
+	xfs_agnumber_t			agno;
+	int				error;
+
+	/* Make sure we have enough space for per-AG reservations. */
+	for_each_perag(mp, agno, pag) {
+		struct xfs_trans	*tp;
+		struct xfs_agf		*agf;
+		struct xfs_buf		*agi_bp, *agf_bp;
+		unsigned int		avail, agblocks;
+
+		/* Put back the old super so that we can read AG headers. */
+		restore_old_state(mp, old);
+
+		/*
+		 * Create a dummy transaction so that we can load the AGI and
+		 * AGF buffers in memory with the old fs geometry and pin them
+		 * there while we try to make a per-AG reservation with the new
+		 * geometry.
+		 */
+		error = -libxfs_trans_alloc_empty(mp, &tp);
+		if (error)
+			do_error(
+	_("Cannot reserve resources for upgrade check, err=%d.\n"),
+					error);
+
+		error = -libxfs_ialloc_read_agi(pag, tp, 0, &agi_bp);
+		if (error)
+			do_error(
+	_("Cannot read AGI %u for upgrade check, err=%d.\n"),
+					pag->pag_agno, error);
+
+		error = -libxfs_alloc_read_agf(pag, tp, 0, &agf_bp);
+		if (error)
+			do_error(
+	_("Cannot read AGF %u for upgrade check, err=%d.\n"),
+					pag->pag_agno, error);
+		agf = agf_bp->b_addr;
+		agblocks = be32_to_cpu(agf->agf_length);
+
+		/*
+		 * Install the new superblock and try to make a per-AG space
+		 * reservation with the new geometry.  We pinned the AG header
+		 * buffers to the transaction, so we shouldn't hit any
+		 * corruption errors on account of the new geometry.
+		 */
+		install_new_state(mp, new_sb);
+
+		error = -libxfs_ag_resv_init(pag, tp);
+		if (error == ENOSPC) {
+			printf(
+	_("Not enough free space would remain in AG %u for metadata.\n"),
+					pag->pag_agno);
+			exit(1);
+		}
+		if (error)
+			do_error(
+	_("Error %d while checking AG %u space reservation.\n"),
+					error, pag->pag_agno);
+
+		/*
+		 * Would the post-upgrade filesystem have enough free space in
+		 * this AG after making per-AG reservations?
+		 */
+		avail = pag->pagf_freeblks + pag->pagf_flcount;
+		avail -= pag->pag_meta_resv.ar_reserved;
+		avail -= pag->pag_rmapbt_resv.ar_asked;
+
+		if (!check_free_space(mp, avail, agblocks)) {
+			printf(
+	_("AG %u will be low on space after upgrade.\n"),
+					pag->pag_agno);
+			exit(1);
+		}
+		libxfs_trans_cancel(tp);
+	}
+
+	/*
+	 * Would the post-upgrade filesystem have enough free space on the data
+	 * device after making per-AG reservations?
+	 */
+	if (!check_free_space(mp, mp->m_sb.sb_fdblocks, mp->m_sb.sb_dblocks)) {
+		printf(_("Filesystem will be low on space after upgrade.\n"));
+		exit(1);
+	}
+
+	/*
+	 * Release the per-AG reservations and mark the per-AG structure as
+	 * uninitialized so that we don't trip over stale cached counters
+	 * after the upgrade.
+	 */
+	for_each_perag(mp, agno, pag) {
+		libxfs_ag_resv_free(pag);
+		clear_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
+		clear_bit(XFS_AGSTATE_AGI_INIT, &pag->pag_opstate);
+	}
+}
+
+static bool
+need_check_fs_free_space(
+	struct xfs_mount		*mp,
+	const struct check_state	*old)
+{
+	return false;
+}
+
 /*
  * Make sure we can actually upgrade this (v5) filesystem without running afoul
  * of root inode or log size requirements that would prevent us from mounting
@@ -291,6 +422,9 @@ install_new_geometry(
 		exit(1);
 	}
 
+	if (need_check_fs_free_space(mp, &old))
+		check_fs_free_space(mp, &old, new_sb);
+
 	/*
 	 * Restore the old state to get everything back to a clean state,
 	 * upgrade the featureset one more time, and recompute the btree max



* [PATCH 2/3] xfs_repair: enforce one namespace bit per extended attribute
  2024-07-02  0:51 ` [PATCHSET v13.7 10/16] xfsprogs: improve extended attribute validation Darrick J. Wong
  2024-07-02  1:10   ` [PATCH 1/3] xfs_repair: check free space requirements before allowing upgrades Darrick J. Wong
@ 2024-07-02  1:10   ` Darrick J. Wong
  2024-07-02  5:44     ` Christoph Hellwig
  2024-07-02  1:10   ` [PATCH 3/3] xfs_repair: check for unknown flags in attr entries Darrick J. Wong
  2 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:10 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Enforce that all extended attributes have at most one namespace bit.
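
One way to express the "at most one namespace bit" rule is to mask off the namespace bits and check that clearing the lowest set bit leaves zero. The DEMO_ATTR_* values below are assumed stand-ins for the XFS_ATTR_* on-disk bits, and demo_check_namespace() is a hypothetical helper, not the libxfs_attr_check_namespace() implementation.

```c
#include <stdbool.h>

/* Assumed namespace bit values; see xfs_da_format.h for the real ones. */
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_PARENT	(1u << 3)
#define DEMO_ATTR_NS_MASK	(DEMO_ATTR_ROOT | DEMO_ATTR_SECURE | \
				 DEMO_ATTR_PARENT)

/* True if at most one namespace bit is set (none means "user"). */
static bool demo_check_namespace(unsigned int flags)
{
	unsigned int ns = flags & DEMO_ATTR_NS_MASK;

	/* Clearing the lowest set bit leaves zero iff <= 1 bit was set. */
	return (ns & (ns - 1)) == 0;
}
```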

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    1 +
 repair/attr_repair.c     |   15 +++++++++++++++
 2 files changed, 16 insertions(+)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index cc670d93a9f2..2d858580abfe 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -36,6 +36,7 @@
 
 #define xfs_ascii_ci_hashname		libxfs_ascii_ci_hashname
 
+#define xfs_attr_check_namespace	libxfs_attr_check_namespace
 #define xfs_attr_get			libxfs_attr_get
 #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
 #define xfs_attr_namecheck		libxfs_attr_namecheck
diff --git a/repair/attr_repair.c b/repair/attr_repair.c
index 0f2f7a284bdd..a756a40db9b0 100644
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -291,6 +291,13 @@ process_shortform_attr(
 			}
 		}
 
+		if (!libxfs_attr_check_namespace(currententry->flags)) {
+			do_warn(
+	_("multiple namespaces for shortform attribute %d in inode %" PRIu64 "\n"),
+				i, ino);
+			junkit = 1;
+		}
+
 		/* namecheck checks for null chars in attr names. */
 		if (!libxfs_attr_namecheck(currententry->flags,
 					   currententry->nameval,
@@ -641,6 +648,14 @@ process_leaf_attr_block(
 			break;
 		}
 
+		if (!libxfs_attr_check_namespace(entry->flags)) {
+			do_warn(
+	_("multiple namespaces for attribute entry %d in attr block %u, inode %" PRIu64 "\n"),
+				i, da_bno, ino);
+			clearit = 1;
+			break;
+		}
+
 		if (entry->flags & XFS_ATTR_INCOMPLETE) {
 			/* we are inconsistent state. get rid of us */
 			do_warn(



* [PATCH 3/3] xfs_repair: check for unknown flags in attr entries
  2024-07-02  0:51 ` [PATCHSET v13.7 10/16] xfsprogs: improve extended attribute validation Darrick J. Wong
  2024-07-02  1:10   ` [PATCH 1/3] xfs_repair: check free space requirements before allowing upgrades Darrick J. Wong
  2024-07-02  1:10   ` [PATCH 2/3] xfs_repair: enforce one namespace bit per extended attribute Darrick J. Wong
@ 2024-07-02  1:10   ` Darrick J. Wong
  2024-07-02  5:45     ` Christoph Hellwig
  2 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:10 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Explicitly check for unknown bits being set in the shortform and leaf
attr entries.
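
The unknown-flag test is a plain mask check. The sketch below assumes an on-disk mask built from the local, namespace, and incomplete bits; the DEMO_ATTR_* values and demo_unknown_flags() are illustrative only, so consult XFS_ATTR_ONDISK_MASK in xfs_da_format.h for the real definition.

```c
/* Assumed bit layout; the real mask is XFS_ATTR_ONDISK_MASK. */
#define DEMO_ATTR_LOCAL		(1u << 0)
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_PARENT	(1u << 3)
#define DEMO_ATTR_INCOMPLETE	(1u << 7)
#define DEMO_ATTR_ONDISK_MASK	(DEMO_ATTR_LOCAL | DEMO_ATTR_ROOT | \
				 DEMO_ATTR_SECURE | DEMO_ATTR_PARENT | \
				 DEMO_ATTR_INCOMPLETE)

/* Return the bits that no known on-disk flag accounts for (0 if clean). */
static unsigned int demo_unknown_flags(unsigned int flags)
{
	return flags & ~DEMO_ATTR_ONDISK_MASK;
}
```

A nonzero result is what would make repair warn and junk (shortform) or clear (leaf) the entry.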

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/attr_repair.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)


diff --git a/repair/attr_repair.c b/repair/attr_repair.c
index a756a40db9b0..37b5852b885e 100644
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -291,6 +291,13 @@ process_shortform_attr(
 			}
 		}
 
+		if (currententry->flags & ~XFS_ATTR_ONDISK_MASK) {
+			do_warn(
+	_("unknown flags 0x%x in shortform attribute %d in inode %" PRIu64 "\n"),
+				currententry->flags, i, ino);
+			junkit = 1;
+		}
+
 		if (!libxfs_attr_check_namespace(currententry->flags)) {
 			do_warn(
 	_("multiple namespaces for shortform attribute %d in inode %" PRIu64 "\n"),
@@ -648,6 +655,14 @@ process_leaf_attr_block(
 			break;
 		}
 
+		if (entry->flags & ~XFS_ATTR_ONDISK_MASK) {
+			do_warn(
+	_("unknown flags 0x%x in attribute entry #%d in attr block %u, inode %" PRIu64 "\n"),
+				entry->flags, i, da_bno, ino);
+			clearit = 1;
+			break;
+		}
+
 		if (!libxfs_attr_check_namespace(entry->flags)) {
 			do_warn(
 	_("multiple namespaces for attribute entry %d in attr block %u, inode %" PRIu64 "\n"),



* [PATCH 01/24] libxfs: create attr log item opcodes and formats for parent pointers
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
@ 2024-07-02  1:10   ` Darrick J. Wong
  2024-07-02  6:24     ` Christoph Hellwig
  2024-07-02  1:11   ` [PATCH 02/24] xfs_{db,repair}: implement new attr hash value function Darrick J. Wong
                     ` (22 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:10 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Update xfs_attr_defer_add to use the pptr-specific opcodes if it's
reading or writing parent pointers.
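
The opcode selection in the new xfs_attr_defer_add() reduces to a choice keyed on the high-level operation and the parent pointer bit, after which the regular and parent pointer flavors of each operation share one initial state. The enums and functions below are hypothetical stand-ins for XFS_ATTR_DEFER_*, XFS_ATTRI_OP_FLAGS_* and the xfs_attr_init_*_state() helpers; their numeric values are not meant to match the log format.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the libxfs constants. */
enum defer_op { DEFER_SET, DEFER_REPLACE, DEFER_REMOVE };
enum log_op {
	LOG_NONE = 0,
	LOG_SET, LOG_REMOVE, LOG_REPLACE,
	LOG_PPTR_SET, LOG_PPTR_REMOVE, LOG_PPTR_REPLACE,
};
enum init_state { STATE_ADD, STATE_REPLACE, STATE_REMOVE, STATE_INVALID };

/* Pick the log opcode from the high-level op and the parent pointer bit. */
static enum log_op pick_log_op(enum defer_op op, bool is_pptr)
{
	switch (op) {
	case DEFER_SET:
		return is_pptr ? LOG_PPTR_SET : LOG_SET;
	case DEFER_REPLACE:
		return is_pptr ? LOG_PPTR_REPLACE : LOG_REPLACE;
	case DEFER_REMOVE:
		return is_pptr ? LOG_PPTR_REMOVE : LOG_REMOVE;
	}
	return LOG_NONE;	/* unknown op; the patch ASSERTs here */
}

/* Both flavors of an operation start from the same state. */
static enum init_state initial_state(enum log_op lop)
{
	switch (lop) {
	case LOG_SET:
	case LOG_PPTR_SET:
		return STATE_ADD;
	case LOG_REPLACE:
	case LOG_PPTR_REPLACE:
		return STATE_REPLACE;
	case LOG_REMOVE:
	case LOG_PPTR_REMOVE:
		return STATE_REMOVE;
	default:
		return STATE_INVALID;
	}
}
```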

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/defer_item.c |   52 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 6 deletions(-)


diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index 9955e189dfe1..8cdf57eac900 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -676,21 +676,61 @@ xfs_attr_defer_add(
 	enum xfs_attr_defer_op	op)
 {
 	struct xfs_attr_intent	*new;
+	unsigned int		log_op = 0;
+	bool			is_pptr = args->attr_filter & XFS_ATTR_PARENT;
 
-	new = kmem_cache_zalloc(xfs_attr_intent_cache, GFP_NOFS | __GFP_NOFAIL);
+	if (is_pptr) {
+		ASSERT(xfs_has_parent(args->dp->i_mount));
+		ASSERT((args->attr_filter & ~XFS_ATTR_PARENT) == 0);
+		ASSERT(args->op_flags & XFS_DA_OP_LOGGED);
+		ASSERT(args->valuelen == sizeof(struct xfs_parent_rec));
+	}
+
+	new = kmem_cache_zalloc(xfs_attr_intent_cache,
+			GFP_NOFS | __GFP_NOFAIL);
 	new->xattri_da_args = args;
 
+	/* Compute log operation from the higher level op and namespace. */
 	switch (op) {
 	case XFS_ATTR_DEFER_SET:
-		new->xattri_op_flags = XFS_ATTRI_OP_FLAGS_SET;
-		new->xattri_dela_state = xfs_attr_init_add_state(args);
+		if (is_pptr)
+			log_op = XFS_ATTRI_OP_FLAGS_PPTR_SET;
+		else
+			log_op = XFS_ATTRI_OP_FLAGS_SET;
 		break;
 	case XFS_ATTR_DEFER_REPLACE:
-		new->xattri_op_flags = XFS_ATTRI_OP_FLAGS_REPLACE;
-		new->xattri_dela_state = xfs_attr_init_replace_state(args);
+		if (is_pptr)
+			log_op = XFS_ATTRI_OP_FLAGS_PPTR_REPLACE;
+		else
+			log_op = XFS_ATTRI_OP_FLAGS_REPLACE;
 		break;
 	case XFS_ATTR_DEFER_REMOVE:
-		new->xattri_op_flags = XFS_ATTRI_OP_FLAGS_REMOVE;
+		if (is_pptr)
+			log_op = XFS_ATTRI_OP_FLAGS_PPTR_REMOVE;
+		else
+			log_op = XFS_ATTRI_OP_FLAGS_REMOVE;
+		break;
+	default:
+		ASSERT(0);
+		break;
+	}
+	new->xattri_op_flags = log_op;
+
+	/* Set up initial attr operation state. */
+	switch (log_op) {
+	case XFS_ATTRI_OP_FLAGS_PPTR_SET:
+	case XFS_ATTRI_OP_FLAGS_SET:
+		new->xattri_dela_state = xfs_attr_init_add_state(args);
+		break;
+	case XFS_ATTRI_OP_FLAGS_PPTR_REPLACE:
+		ASSERT(args->new_valuelen == args->valuelen);
+		new->xattri_dela_state = xfs_attr_init_replace_state(args);
+		break;
+	case XFS_ATTRI_OP_FLAGS_REPLACE:
+		new->xattri_dela_state = xfs_attr_init_replace_state(args);
+		break;
+	case XFS_ATTRI_OP_FLAGS_PPTR_REMOVE:
+	case XFS_ATTRI_OP_FLAGS_REMOVE:
 		new->xattri_dela_state = xfs_attr_init_remove_state(args);
 		break;
 	}



* [PATCH 02/24] xfs_{db,repair}: implement new attr hash value function
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
  2024-07-02  1:10   ` [PATCH 01/24] libxfs: create attr log item opcodes and formats for parent pointers Darrick J. Wong
@ 2024-07-02  1:11   ` Darrick J. Wong
  2024-07-02  6:24     ` Christoph Hellwig
  2024-07-02  1:11   ` [PATCH 03/24] xfs_logprint: dump new attr log item fields Darrick J. Wong
                     ` (21 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:11 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Port existing utilities to use libxfs_attr_hashname instead of calling
libxfs_da_hashname directly.
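
For reference, libxfs_da_hashname() is a rotating name hash that folds each byte in at a multiple of 7 bits. The sketch below is written from memory of that function in libxfs/xfs_da_btree.c and is meant as an illustration, not a drop-in replacement; demo_da_hashname() and rol32() are local names. Routing the attr call sites through libxfs_attr_hashname() and libxfs_attr_hashval() means they no longer hard-code this hash, which matters once an entry's hash can depend on its flags and value, as the new libxfs_attr_hashval() signature suggests.

```c
#include <stdint.h>

static inline uint32_t rol32(uint32_t x, unsigned int s)
{
	return (x << s) | (x >> (32 - s));	/* s is always 7..28 here */
}

static uint32_t demo_da_hashname(const uint8_t *name, int namelen)
{
	uint32_t hash;

	/* Mix in four bytes at a time, rotating the running hash by 28. */
	for (hash = 0; namelen >= 4; namelen -= 4, name += 4)
		hash = ((uint32_t)name[0] << 21) ^ ((uint32_t)name[1] << 14) ^
		       ((uint32_t)name[2] << 7) ^ name[3] ^ rol32(hash, 7 * 4);

	/* Fold in any 1-3 trailing bytes. */
	switch (namelen) {
	case 3:
		return ((uint32_t)name[0] << 14) ^ ((uint32_t)name[1] << 7) ^
		       name[2] ^ rol32(hash, 7 * 3);
	case 2:
		return ((uint32_t)name[0] << 7) ^ name[1] ^ rol32(hash, 7 * 2);
	case 1:
		return name[0] ^ rol32(hash, 7 * 1);
	default:
		return hash;
	}
}
```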

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/hash.c                |    8 ++++----
 db/metadump.c            |   10 +++++-----
 libxfs/libxfs_api_defs.h |    2 ++
 repair/attr_repair.c     |   18 ++++++++++++------
 4 files changed, 23 insertions(+), 15 deletions(-)


diff --git a/db/hash.c b/db/hash.c
index 9b3fdea6c413..1500a6032fd7 100644
--- a/db/hash.c
+++ b/db/hash.c
@@ -73,7 +73,7 @@ hash_f(
 		if (use_dir2_hash)
 			hashval = libxfs_dir2_hashname(mp, &xname);
 		else
-			hashval = libxfs_da_hashname(xname.name, xname.len);
+			hashval = libxfs_attr_hashname(xname.name, xname.len);
 		dbprintf("0x%x\n", hashval);
 	}
 
@@ -306,7 +306,7 @@ collide_xattrs(
 	unsigned long		i;
 	int			error = 0;
 
-	old_hash = libxfs_da_hashname((uint8_t *)name, namelen);
+	old_hash = libxfs_attr_hashname((uint8_t *)name, namelen);
 
 	if (fd >= 0) {
 		/*
@@ -331,8 +331,8 @@ collide_xattrs(
 		snprintf(xattrname, MAXNAMELEN + 5, "user.%s", name);
 		obfuscate_name(old_hash, namelen, (uint8_t *)xattrname + 5,
 				false);
-		ASSERT(old_hash == libxfs_da_hashname((uint8_t *)xattrname + 5,
-				namelen));
+		ASSERT(old_hash == libxfs_attr_hashname(
+					(uint8_t *)xattrname + 5, namelen));
 
 		if (fd >= 0) {
 			error = fsetxattr(fd, xattrname, "1", 1, 0);
diff --git a/db/metadump.c b/db/metadump.c
index 9457e02e8288..c1bf5d002751 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -835,7 +835,7 @@ dirattr_hashname(
 		return libxfs_dir2_hashname(mp, &xname);
 	}
 
-	return libxfs_da_hashname(name, namelen);
+	return libxfs_attr_hashname(name, namelen);
 }
 
 static void
@@ -982,9 +982,9 @@ obfuscate_path_components(
 		if (!slash) {
 			/* last (or single) component */
 			namelen = strnlen((char *)comp, len);
-			hash = libxfs_da_hashname(comp, namelen);
+			hash = dirattr_hashname(true, comp, namelen);
 			obfuscate_name(hash, namelen, comp, false);
-			ASSERT(hash == libxfs_da_hashname(comp, namelen));
+			ASSERT(hash == dirattr_hashname(true, comp, namelen));
 			break;
 		}
 		namelen = slash - (char *)comp;
@@ -994,9 +994,9 @@ obfuscate_path_components(
 			len--;
 			continue;
 		}
-		hash = libxfs_da_hashname(comp, namelen);
+		hash = dirattr_hashname(true, comp, namelen);
 		obfuscate_name(hash, namelen, comp, false);
-		ASSERT(hash == libxfs_da_hashname(comp, namelen));
+		ASSERT(hash == dirattr_hashname(true, comp, namelen));
 		comp += namelen + 1;
 		len -= namelen + 1;
 	}
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 2d858580abfe..c36a6ac81a7b 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -38,6 +38,8 @@
 
 #define xfs_attr_check_namespace	libxfs_attr_check_namespace
 #define xfs_attr_get			libxfs_attr_get
+#define xfs_attr_hashname		libxfs_attr_hashname
+#define xfs_attr_hashval		libxfs_attr_hashval
 #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
 #define xfs_attr_namecheck		libxfs_attr_namecheck
 #define xfs_attr_set			libxfs_attr_set
diff --git a/repair/attr_repair.c b/repair/attr_repair.c
index 37b5852b885e..8321c9b679b2 100644
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -485,6 +485,7 @@ process_leaf_attr_local(
 	xfs_ino_t		ino)
 {
 	xfs_attr_leaf_name_local_t *local;
+	xfs_dahash_t		computed;
 
 	local = xfs_attr3_leaf_name_local(leaf, i);
 	if (local->namelen == 0 ||
@@ -504,9 +505,12 @@ process_leaf_attr_local(
 	 * ordering anyway in case both the name value and the
 	 * hashvalue were wrong but matched. Unlikely, however.
 	 */
-	if (be32_to_cpu(entry->hashval) != libxfs_da_hashname(
-				&local->nameval[0], local->namelen) ||
-				be32_to_cpu(entry->hashval) < last_hashval) {
+	computed = libxfs_attr_hashval(mp, entry->flags, local->nameval,
+				       local->namelen,
+				       local->nameval + local->namelen,
+				       be16_to_cpu(local->valuelen));
+	if (be32_to_cpu(entry->hashval) != computed ||
+	    be32_to_cpu(entry->hashval) < last_hashval) {
 		do_warn(
 	_("bad hashvalue for attribute entry %d in attr block %u, inode %" PRIu64 "\n"),
 			i, da_bno, ino);
@@ -540,15 +544,17 @@ process_leaf_attr_remote(
 {
 	xfs_attr_leaf_name_remote_t *remotep;
 	char*			value;
+	xfs_dahash_t		computed;
 
 	remotep = xfs_attr3_leaf_name_remote(leaf, i);
 
+	computed = libxfs_attr_hashval(mp, entry->flags, remotep->name,
+				       remotep->namelen, NULL,
+				       be32_to_cpu(remotep->valuelen));
 	if (remotep->namelen == 0 ||
 	    !libxfs_attr_namecheck(entry->flags, remotep->name,
 				   remotep->namelen) ||
-	    be32_to_cpu(entry->hashval) !=
-			libxfs_da_hashname((unsigned char *)&remotep->name[0],
-					   remotep->namelen) ||
+	    be32_to_cpu(entry->hashval) != computed ||
 	    be32_to_cpu(entry->hashval) < last_hashval ||
 	    be32_to_cpu(remotep->valueblk) == 0) {
 		do_warn(



* [PATCH 03/24] xfs_logprint: dump new attr log item fields
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
  2024-07-02  1:10   ` [PATCH 01/24] libxfs: create attr log item opcodes and formats for parent pointers Darrick J. Wong
  2024-07-02  1:11   ` [PATCH 02/24] xfs_{db,repair}: implement new attr hash value function Darrick J. Wong
@ 2024-07-02  1:11   ` Darrick J. Wong
  2024-07-02  6:25     ` Christoph Hellwig
  2024-07-02  1:11   ` [PATCH 04/24] man: document the XFS_IOC_GETPARENTS ioctl Darrick J. Wong
                     ` (20 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:11 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Dump the new extended attribute log item fields.  This was split out
from the previous patch to make libxfs resyncing easier.  This code
needs more cleaning, which we'll do in the next few patches before
moving on to the parent pointer code.
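
The new logprint code first decides how many name and value buffers follow the attri format header, because a parent pointer replace carries an old and a new name plus an old and a new value of equal length. A sketch of that selection, using a hypothetical struct demo_attri_fmt that flattens the fields logprint reads (the real xfs_attri_log_format layout differs) and DEMO_* opcode values that are placeholders:

```c
#include <stdint.h>

/* Placeholders for XFS_ATTRI_OP_FLAGS_TYPE_MASK and friends. */
#define DEMO_OP_TYPE_MASK	0xffu
#define DEMO_OP_SET		1u
#define DEMO_OP_PPTR_REPLACE	6u

/* Hypothetical, flattened view of the attri format fields. */
struct demo_attri_fmt {
	uint32_t	op_flags;
	uint32_t	name_len;	/* every op except PPTR_REPLACE */
	uint16_t	old_name_len;	/* PPTR_REPLACE only */
	uint16_t	new_name_len;	/* PPTR_REPLACE only */
	uint32_t	value_len;
};

/* Lengths of the optional buffers that follow the format header. */
struct demo_lens {
	unsigned int	name, new_name, value, new_value;
};

static struct demo_lens demo_region_lens(const struct demo_attri_fmt *f)
{
	struct demo_lens l = { 0, 0, 0, 0 };

	if ((f->op_flags & DEMO_OP_TYPE_MASK) == DEMO_OP_PPTR_REPLACE) {
		l.name = f->old_name_len;
		l.new_name = f->new_name_len;
		/* old and new parent pointer values have the same length */
		l.value = f->value_len;
		l.new_value = f->value_len;
	} else {
		l.name = f->name_len;
		l.value = f->value_len;
	}
	return l;
}

/* One log region is printed for each non-empty buffer. */
static int demo_nregions(const struct demo_lens *l)
{
	return (l->name > 0) + (l->new_name > 0) + (l->value > 0) +
	       (l->new_value > 0);
}
```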

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 logprint/log_redo.c |  140 ++++++++++++++++++++++++++++++++++++++++++---------
 logprint/logprint.h |    6 +-
 2 files changed, 119 insertions(+), 27 deletions(-)


diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index ca6dadd7551a..1d55164a90ff 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -674,6 +674,12 @@ xfs_attri_copy_log_format(
 	return 1;
 }
 
+static inline unsigned int
+xfs_attr_log_item_op(const struct xfs_attri_log_format *attrp)
+{
+	return attrp->alfi_op_flags & XFS_ATTRI_OP_FLAGS_TYPE_MASK;
+}
+
 int
 xlog_print_trans_attri(
 	char				**ptr,
@@ -683,6 +689,10 @@ xlog_print_trans_attri(
 	struct xfs_attri_log_format	*src_f = NULL;
 	xlog_op_header_t		*head = NULL;
 	uint				dst_len;
+	unsigned int			name_len = 0;
+	unsigned int			new_name_len = 0;
+	unsigned int			value_len = 0;
+	unsigned int			new_value_len = 0;
 	int				error = 0;
 
 	dst_len = sizeof(struct xfs_attri_log_format);
@@ -705,27 +715,71 @@ xlog_print_trans_attri(
 	memmove((char*)src_f, *ptr, src_len);
 	*ptr += src_len;
 
-	printf(_("ATTRI:  #regs: %d	name_len: %d, value_len: %d  id: 0x%llx\n"),
-		src_f->alfi_size, src_f->alfi_name_len, src_f->alfi_value_len,
-				(unsigned long long)src_f->alfi_id);
+	if (xfs_attr_log_item_op(src_f) == XFS_ATTRI_OP_FLAGS_PPTR_REPLACE) {
+		name_len      = src_f->alfi_old_name_len;
+		new_name_len  = src_f->alfi_new_name_len;
+		value_len     = src_f->alfi_value_len;
+		new_value_len = src_f->alfi_value_len;
+	} else {
+		name_len      = src_f->alfi_name_len;
+		value_len     = src_f->alfi_value_len;
+	}
+
+	printf(_("ATTRI:  #regs: %d	f: 0x%x, ino: 0x%llx, igen: 0x%x, attr_filter: 0x%x, name_len: %u, new_name_len: %u, value_len: %u, new_value_len: %u  id: 0x%llx\n"),
+			src_f->alfi_size,
+			src_f->alfi_op_flags,
+			(unsigned long long)src_f->alfi_ino,
+			(unsigned int)src_f->alfi_igen,
+			src_f->alfi_attr_filter,
+			name_len,
+			new_name_len,
+			value_len,
+			new_value_len,
+			(unsigned long long)src_f->alfi_id);
+
+	if (name_len > 0) {
+		printf(_("\n"));
+		(*i)++;
+		head = (xlog_op_header_t *)*ptr;
+		xlog_print_op_header(head, *i, ptr);
+		error = xlog_print_trans_attri_name(ptr,
+				be32_to_cpu(head->oh_len), "name");
+		if (error)
+			goto error;
+	}
 
-	if (src_f->alfi_name_len > 0) {
+	if (new_name_len > 0) {
 		printf(_("\n"));
 		(*i)++;
 		head = (xlog_op_header_t *)*ptr;
 		xlog_print_op_header(head, *i, ptr);
-		error = xlog_print_trans_attri_name(ptr, be32_to_cpu(head->oh_len));
+		error = xlog_print_trans_attri_name(ptr,
+				be32_to_cpu(head->oh_len), "newname");
 		if (error)
 			goto error;
 	}
 
-	if (src_f->alfi_value_len > 0) {
+	if (value_len > 0) {
 		printf(_("\n"));
 		(*i)++;
 		head = (xlog_op_header_t *)*ptr;
 		xlog_print_op_header(head, *i, ptr);
-		error = xlog_print_trans_attri_value(ptr, be32_to_cpu(head->oh_len),
-				src_f->alfi_value_len);
+		error = xlog_print_trans_attri_value(ptr,
+				be32_to_cpu(head->oh_len), value_len, "value");
+		if (error)
+			goto error;
+	}
+
+	if (new_value_len > 0) {
+		printf(_("\n"));
+		(*i)++;
+		head = (xlog_op_header_t *)*ptr;
+		xlog_print_op_header(head, *i, ptr);
+		error = xlog_print_trans_attri_value(ptr,
+				be32_to_cpu(head->oh_len), new_value_len,
+				"newvalue");
+		if (error)
+			goto error;
 	}
 error:
 	free(src_f);
@@ -736,31 +790,33 @@ xlog_print_trans_attri(
 int
 xlog_print_trans_attri_name(
 	char				**ptr,
-	uint				src_len)
+	uint				src_len,
+	const char			*tag)
 {
-	printf(_("ATTRI:  name len:%u\n"), src_len);
+	printf(_("ATTRI:  %s len:%u\n"), tag, src_len);
 	print_or_dump(*ptr, src_len);
 
 	*ptr += src_len;
 
 	return 0;
-}	/* xlog_print_trans_attri */
+}
 
 int
 xlog_print_trans_attri_value(
 	char				**ptr,
 	uint				src_len,
-	int				value_len)
+	int				value_len,
+	const char			*tag)
 {
 	int len = min(value_len, src_len);
 
-	printf(_("ATTRI:  value len:%u\n"), value_len);
+	printf(_("ATTRI:  %s len:%u\n"), tag, value_len);
 	print_or_dump(*ptr, len);
 
 	*ptr += src_len;
 
 	return 0;
-}	/* xlog_print_trans_attri_value */
+}
 
 void
 xlog_recover_print_attri(
@@ -768,7 +824,10 @@ xlog_recover_print_attri(
 {
 	struct xfs_attri_log_format	*f, *src_f = NULL;
 	uint				src_len, dst_len;
-
+	unsigned int			name_len = 0;
+	unsigned int			new_name_len = 0;
+	unsigned int			value_len = 0;
+	unsigned int			new_value_len = 0;
 	int				region = 0;
 
 	src_f = (struct xfs_attri_log_format *)item->ri_buf[0].i_addr;
@@ -788,24 +847,55 @@ xlog_recover_print_attri(
 	if (xfs_attri_copy_log_format((char*)src_f, src_len, f))
 		goto out;
 
-	printf(_("ATTRI:  #regs: %d	name_len: %d, value_len: %d  id: 0x%llx\n"),
-		f->alfi_size, f->alfi_name_len, f->alfi_value_len, (unsigned long long)f->alfi_id);
+	if (xfs_attr_log_item_op(f) == XFS_ATTRI_OP_FLAGS_PPTR_REPLACE) {
+		name_len      = f->alfi_old_name_len;
+		new_name_len  = f->alfi_new_name_len;
+		value_len     = f->alfi_value_len;
+		new_value_len = f->alfi_value_len;
+	} else {
+		name_len      = f->alfi_name_len;
+		value_len     = f->alfi_value_len;
+	}
+
+	printf(_("ATTRI:  #regs: %d	f: 0x%x, ino: 0x%llx, igen: 0x%x, attr_filter: 0x%x, name_len: %u, new_name_len: %u, value_len: %u, new_value_len: %u  id: 0x%llx\n"),
+			f->alfi_size,
+			f->alfi_op_flags,
+			(unsigned long long)f->alfi_ino,
+			(unsigned int)f->alfi_igen,
+			f->alfi_attr_filter,
+			name_len,
+			new_name_len,
+			value_len,
+			new_value_len,
+			(unsigned long long)f->alfi_id);
 
-	if (f->alfi_name_len > 0) {
+	if (name_len > 0) {
 		region++;
-		printf(_("ATTRI:  name len:%u\n"), f->alfi_name_len);
+		printf(_("ATTRI:  name len:%u\n"), name_len);
 		print_or_dump((char *)item->ri_buf[region].i_addr,
-			       f->alfi_name_len);
+			       name_len);
 	}
 
-	if (f->alfi_value_len > 0) {
-		int len = f->alfi_value_len;
+	if (new_name_len > 0) {
+		region++;
+		printf(_("ATTRI:  newname len:%u\n"), new_name_len);
+		print_or_dump((char *)item->ri_buf[region].i_addr,
+			       new_name_len);
+	}
+
+	if (value_len > 0) {
+		int	len = min(MAX_ATTR_VAL_PRINT, value_len);
+
+		region++;
+		printf(_("ATTRI:  value len:%u\n"), value_len);
+		print_or_dump((char *)item->ri_buf[region].i_addr, len);
+	}
 
-		if (len > MAX_ATTR_VAL_PRINT)
-			len = MAX_ATTR_VAL_PRINT;
+	if (new_value_len > 0) {
+		int	len = min(MAX_ATTR_VAL_PRINT, new_value_len);
 
 		region++;
-		printf(_("ATTRI:  value len:%u\n"), f->alfi_value_len);
+		printf(_("ATTRI:  newvalue len:%u\n"), new_value_len);
 		print_or_dump((char *)item->ri_buf[region].i_addr, len);
 	}
 
diff --git a/logprint/logprint.h b/logprint/logprint.h
index 8867b110ecaf..9715cdb9e27b 100644
--- a/logprint/logprint.h
+++ b/logprint/logprint.h
@@ -59,8 +59,10 @@ extern void xlog_recover_print_bud(struct xlog_recover_item *item);
 #define MAX_ATTR_VAL_PRINT	128
 
 extern int xlog_print_trans_attri(char **ptr, uint src_len, int *i);
-extern int xlog_print_trans_attri_name(char **ptr, uint src_len);
-extern int xlog_print_trans_attri_value(char **ptr, uint src_len, int value_len);
+extern int xlog_print_trans_attri_name(char **ptr, uint src_len,
+		const char *tag);
+extern int xlog_print_trans_attri_value(char **ptr, uint src_len, int value_len,
+		const char *tag);
 extern void xlog_recover_print_attri(struct xlog_recover_item *item);
 extern int xlog_print_trans_attrd(char **ptr, uint len);
 extern void xlog_recover_print_attrd(struct xlog_recover_item *item);



* [PATCH 04/24] man: document the XFS_IOC_GETPARENTS ioctl
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-07-02  1:11   ` [PATCH 03/24] xfs_logprint: dump new attr log item fields Darrick J. Wong
@ 2024-07-02  1:11   ` Darrick J. Wong
  2024-07-02  6:25     ` Christoph Hellwig
  2024-07-02  1:11   ` [PATCH 05/24] libfrog: report parent pointers to userspace Darrick J. Wong
                     ` (19 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:11 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Document how this new ioctl works.
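
The records returned in gp_buffer are variable-length, so callers step from one to the next by gpr_reclen rather than by a fixed stride. Below is a minimal sketch of that walk over a simplified, hypothetical layout (a 16-bit record length followed by a NUL-terminated name; the real struct xfs_getparents_rec has a file handle in front). It uses memcpy() to read the length so it does not depend on buffer alignment. Real programs should use the xfs_getparents_first_rec() and xfs_getparents_next_rec() helpers, as the sample program in the man page does; demo_append(), demo_count() and demo_name() are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: uint16_t reclen, then a NUL-terminated name. */
#define DEMO_HDR_SIZE	sizeof(uint16_t)

/* Append one record at *off; returns 0, or -1 if it does not fit. */
static int demo_append(unsigned char *buf, size_t bufsize, size_t *off,
		       const char *name)
{
	size_t namesz = strlen(name) + 1;
	size_t len = DEMO_HDR_SIZE + namesz;
	uint16_t reclen;

	if (*off + len > bufsize || len > UINT16_MAX)
		return -1;
	reclen = (uint16_t)len;
	memcpy(buf + *off, &reclen, sizeof(reclen));
	memcpy(buf + *off + DEMO_HDR_SIZE, name, namesz);
	*off += len;
	return 0;
}

/* Count records in the first 'used' bytes by hopping from reclen to reclen. */
static size_t demo_count(const unsigned char *buf, size_t used)
{
	size_t off = 0, n = 0;

	while (off + DEMO_HDR_SIZE <= used) {
		uint16_t reclen;

		memcpy(&reclen, buf + off, sizeof(reclen));
		if (reclen < DEMO_HDR_SIZE)	/* malformed: would not advance */
			break;
		n++;
		off += reclen;
	}
	return n;
}

/* Return the name in the idx-th record, or NULL if there is none. */
static const char *demo_name(const unsigned char *buf, size_t used,
			     size_t idx)
{
	size_t off = 0;

	while (off + DEMO_HDR_SIZE <= used) {
		uint16_t reclen;

		memcpy(&reclen, buf + off, sizeof(reclen));
		if (reclen < DEMO_HDR_SIZE)
			return NULL;
		if (idx-- == 0)
			return (const char *)(buf + off + DEMO_HDR_SIZE);
		off += reclen;
	}
	return NULL;
}
```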

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man2/ioctl_xfs_getparents.2 |  212 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 212 insertions(+)
 create mode 100644 man/man2/ioctl_xfs_getparents.2


diff --git a/man/man2/ioctl_xfs_getparents.2 b/man/man2/ioctl_xfs_getparents.2
new file mode 100644
index 000000000000..5bb9b96a0c95
--- /dev/null
+++ b/man/man2/ioctl_xfs_getparents.2
@@ -0,0 +1,212 @@
+.\" Copyright (c) 2019-2024 Oracle.  All rights reserved.
+.\"
+.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
+.\" SPDX-License-Identifier: GPL-2.0-or-later
+.\" %%%LICENSE_END
+.TH IOCTL-XFS-GETPARENTS 2 2024-04-09 "XFS"
+.SH NAME
+ioctl_xfs_getparents \- query XFS directory parent information
+.SH SYNOPSIS
+.br
+.B #include <xfs/xfs_fs.h>
+.PP
+.BI "int ioctl(int " fd ", XFS_IOC_GETPARENTS, struct xfs_getparents *" arg );
+.PP
+.BI "int ioctl(int " fd ", XFS_IOC_GETPARENTS_BY_HANDLE, struct xfs_getparents_by_handle *" arg );
+.SH DESCRIPTION
+This command is used to retrieve the directory parent pointers of either the
+currently opened file or a file handle.
+Parent pointers point upwards in the directory tree from a child file towards
+its parent directories.
+Each entry in a parent directory must have a corresponding parent pointer in
+the child.
+
+Calling programs should allocate a large memory buffer and initialize a header
+of the following form:
+.PP
+.in +4n
+.nf
+struct xfs_getparents {
+	struct xfs_attrlist_cursor  gp_cursor;
+	__u16                       gp_iflags;
+	__u16                       gp_oflags;
+	__u32                       gp_bufsize;
+	__u64                       __pad;
+	__u64                       gp_buffer;
+};
+
+struct xfs_getparents_by_handle {
+	struct xfs_handle           gph_handle;
+	struct xfs_getparents       gph_request;
+};
+.fi
+.in
+
+.PP
+The field
+.I gp_cursor
+tracks the progress of iterating through the parent pointers.
+Calling programs must initialize this to zero before the first system call
+and must not touch it after that.
+
+.PP
+The field
+.I gp_iflags
+controls the behavior of the query operation.
+There are no input flags currently defined; this field must be zero.
+
+.PP
+The field
+.I gp_oflags
+contains information about the query itself.
+Possible output flags are:
+.RS 0.4i
+.TP
+.B XFS_GETPARENTS_OFLAG_ROOT
+The file queried was the root directory.
+.TP
+.B XFS_GETPARENTS_OFLAG_DONE
+There are no more parent pointers to query.
+.RE
+
+.PP
+The field
+.I __pad
+must be zero.
+
+.PP
+The field
+.I gp_bufsize
+should be set to the size of the buffer, in bytes.
+
+.PP
+The field
+.I gp_buffer
+should point to an output buffer for the parent pointer records.
+
+Parent pointer records are returned in the following form:
+.PP
+.in +4n
+.nf
+
+struct xfs_getparents_rec {
+	struct xfs_handle           gpr_parent;
+	__u16                       gpr_reclen;
+	char                        gpr_name[];
+};
+.fi
+.in
+
+.PP
+The field
+.I gpr_parent
+is a file handle that can be used to open the parent directory.
+
+.PP
+The field
+.I gpr_reclen
+will be set to the number of bytes used by this parent record.
+
+.PP
+The array
+.I gpr_name
+will be set to a NULL-terminated byte sequence representing the filename
+stored in the parent pointer.
+If the name is a zero-length string, the file queried has no parents.
+
+.SH SAMPLE PROGRAM
+Calling programs should allocate a large memory buffer, initialize the header
+structure to zeroes, point gp_buffer at the buffer, set gp_bufsize to its size,
+and call the ioctl.
+The XFS_GETPARENTS_OFLAG_DONE flag will be set in gp_oflags when there are no
+more parent pointers to be read.
+The code below is an example of XFS_IOC_GETPARENTS usage:
+
+.nf
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <xfs/linux.h>
+#include <xfs/xfs.h>
+#include <xfs/xfs_types.h>
+#include <xfs/xfs_fs.h>
+
+int main(void) {
+	struct xfs_getparents gp = { };
+	struct xfs_getparents_rec *gpr;
+	int fd;
+
+	gp.gp_buffer = (uintptr_t)malloc(65536);
+	if (!gp.gp_buffer) {
+		perror("malloc");
+		return 1;
+	}
+	gp.gp_bufsize = 65536;
+
+	fd = open("/mnt/test/foo.txt", O_RDONLY);
+	if (fd == -1)
+		return errno;
+
+	do {
+		/* Each call resumes from gp_cursor and refills the buffer. */
+		if (ioctl(fd, XFS_IOC_GETPARENTS, &gp) == -1)
+			return errno;
+
+		for (gpr = xfs_getparents_first_rec(&gp);
+		     gpr != NULL;
+		     gpr = xfs_getparents_next_rec(&gp, gpr)) {
+			if (gpr->gpr_name[0] == 0)
+				break;
+
+			printf("inode		= %llu\\n",
+					(unsigned long long)gpr->gpr_parent.ha_fid.fid_ino);
+			printf("generation	= %u\\n",
+					gpr->gpr_parent.ha_fid.fid_gen);
+			printf("name		= \\"%s\\"\\n\\n",
+					gpr->gpr_name);
+		}
+	} while (!(gp.gp_oflags & XFS_GETPARENTS_OFLAG_DONE));
+
+	return 0;
+}
+.fi
+
+.SH RETURN VALUE
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.PP
+.SH ERRORS
+Error codes can be one of, but are not limited to, the following:
+.TP
+.B EFSBADCRC
+Metadata checksum validation failed while performing the query.
+.TP
+.B EFSCORRUPTED
+Metadata corruption was encountered while performing the query.
+.TP
+.B EINVAL
+One or more of the arguments specified is invalid.
+.TP
+.B EMSGSIZE
+The record buffer was not large enough to store even a single record.
+.TP
+.B ENOMEM
+Not enough memory to retrieve parent pointers.
+.TP
+.B EOPNOTSUPP
+The filesystem does not support parent pointers.
+.TP
+.B ESHUTDOWN
+Filesystem is shut down due to previous errors.
+.TP
+.B EIO
+An I/O error was encountered while performing the query.
+.SH CONFORMING TO
+This API is specific to the XFS filesystem on the Linux kernel.
+.SH SEE ALSO
+.BR ioctl (2)


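The record format documented above is consumed by stepping gpr_reclen bytes at a time, stopping either at the end of the buffer or at a record with an empty name; the sample program does this through xfs_getparents_first_rec() and xfs_getparents_next_rec(). The sketch below shows that walk in isolation. It assumes a hypothetical struct demo_rec in place of struct xfs_getparents_rec (plain ino/gen fields instead of a struct xfs_handle, so it builds without the XFS headers) and assumes 8-byte record padding; neither assumption describes the kernel's actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for struct xfs_getparents_rec: a small fixed
 * header, a record length, then a NUL-terminated name.
 */
struct demo_rec {
	uint64_t	ino;
	uint32_t	gen;
	uint16_t	reclen;		/* bytes used by this whole record */
	char		name[];
};

/* Assumed padding: round each record up to 8 bytes so headers stay aligned. */
static size_t demo_rec_size(const char *name)
{
	size_t sz = offsetof(struct demo_rec, name) + strlen(name) + 1;

	return (sz + 7) & ~(size_t)7;
}

/* Pack one record at @off; returns the next offset, or 0 if it won't fit. */
static size_t demo_pack(void *buf, size_t bufsize, size_t off,
			uint64_t ino, uint32_t gen, const char *name)
{
	size_t sz = demo_rec_size(name);
	struct demo_rec *rec = (struct demo_rec *)((char *)buf + off);

	if (off + sz > bufsize)
		return 0;
	memset(rec, 0, sz);
	rec->ino = ino;
	rec->gen = gen;
	rec->reclen = (uint16_t)sz;
	strcpy(rec->name, name);
	return off + sz;
}

/* Step by reclen and return NULL once we run off the end of the buffer. */
static struct demo_rec *demo_next(void *buf, size_t bufsize,
				  struct demo_rec *rec)
{
	char *next = (char *)rec + rec->reclen;

	if (next >= (char *)buf + bufsize)
		return NULL;
	return (struct demo_rec *)next;
}

/* Print every record; an empty name ends the list.  Returns the count. */
static int demo_walk(void *buf, size_t bufsize)
{
	struct demo_rec *rec;
	int count = 0;

	for (rec = buf; rec != NULL; rec = demo_next(buf, bufsize, rec)) {
		if (rec->name[0] == 0)
			break;
		/* A zero reclen would make demo_next() spin forever. */
		assert(rec->reclen >= offsetof(struct demo_rec, name) + 1);
		printf("ino %llu gen %u name \"%s\"\n",
		       (unsigned long long)rec->ino, (unsigned int)rec->gen,
		       rec->name);
		count++;
	}
	return count;
}
```

Zero-filling the buffer before packing matters here: the unused tail then reads as a record with an empty name, which ends the walk before a zero reclen could be followed.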

* [PATCH 05/24] libfrog: report parent pointers to userspace
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-07-02  1:11   ` [PATCH 04/24] man: document the XFS_IOC_GETPARENTS ioctl Darrick J. Wong
@ 2024-07-02  1:11   ` Darrick J. Wong
  2024-07-02  6:25     ` Christoph Hellwig
  2024-07-02  1:12   ` [PATCH 06/24] libfrog: add parent pointer support code Darrick J. Wong
                     ` (18 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:11 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Report whether parent pointers are enabled when printing the filesystem
geometry to userspace.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/fsgeom.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
index 71a8e4bb9980..597c38b11402 100644
--- a/libfrog/fsgeom.c
+++ b/libfrog/fsgeom.c
@@ -32,6 +32,7 @@ xfs_report_geom(
 	int			inobtcount;
 	int			nrext64;
 	int			exchangerange;
+	int			parent;
 
 	isint = geo->logstart > 0;
 	lazycount = geo->flags & XFS_FSOP_GEOM_FLAGS_LAZYSB ? 1 : 0;
@@ -51,6 +52,7 @@ xfs_report_geom(
 	inobtcount = geo->flags & XFS_FSOP_GEOM_FLAGS_INOBTCNT ? 1 : 0;
 	nrext64 = geo->flags & XFS_FSOP_GEOM_FLAGS_NREXT64 ? 1 : 0;
 	exchangerange = geo->flags & XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE ? 1 : 0;
+	parent = geo->flags & XFS_FSOP_GEOM_FLAGS_PARENT ? 1 : 0;
 
 	printf(_(
 "meta-data=%-22s isize=%-6d agcount=%u, agsize=%u blks\n"
@@ -60,7 +62,7 @@ xfs_report_geom(
 "         =%-22s exchange=%-3u\n"
 "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
 "         =%-22s sunit=%-6u swidth=%u blks\n"
-"naming   =version %-14u bsize=%-6u ascii-ci=%d, ftype=%d\n"
+"naming   =version %-14u bsize=%-6u ascii-ci=%d, ftype=%d, parent=%d\n"
 "log      =%-22s bsize=%-6d blocks=%u, version=%d\n"
 "         =%-22s sectsz=%-5u sunit=%d blks, lazy-count=%d\n"
 "realtime =%-22s extsz=%-6d blocks=%lld, rtextents=%lld\n"),
@@ -72,7 +74,7 @@ xfs_report_geom(
 		"", geo->blocksize, (unsigned long long)geo->datablocks,
 			geo->imaxpct,
 		"", geo->sunit, geo->swidth,
-		dirversion, geo->dirblocksize, cimode, ftype_enabled,
+		dirversion, geo->dirblocksize, cimode, ftype_enabled, parent,
 		isint ? _("internal log") : logname ? logname : _("external"),
 			geo->blocksize, geo->logblocks, logversion,
 		"", geo->logsectsize, geo->logsunit / geo->blocksize, lazycount,



* [PATCH 06/24] libfrog: add parent pointer support code
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-07-02  1:11   ` [PATCH 05/24] libfrog: report parent pointers to userspace Darrick J. Wong
@ 2024-07-02  1:12   ` Darrick J. Wong
  2024-07-02  6:26     ` Christoph Hellwig
  2024-07-02  1:12   ` [PATCH 07/24] xfs_io: adapt parent command to new parent pointer ioctls Darrick J. Wong
                     ` (17 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:12 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Add some support code to libfrog so that client programs can walk file
descriptors and handles upwards through the directory tree and obtain a
reasonable file path from a file descriptor or handle.  This code will be
used in xfsprogs utilities.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 include/handle.h     |    1 
 libfrog/Makefile     |    2 
 libfrog/getparents.c |  355 ++++++++++++++++++++++++++++++++++++++++++++++++++
 libfrog/getparents.h |   42 ++++++
 libfrog/paths.c      |  168 ++++++++++++++++++++++++
 libfrog/paths.h      |   25 ++++
 libhandle/handle.c   |    7 +
 7 files changed, 597 insertions(+), 3 deletions(-)
 create mode 100644 libfrog/getparents.c
 create mode 100644 libfrog/getparents.h


diff --git a/include/handle.h b/include/handle.h
index 34246f3854de..ba06500516cf 100644
--- a/include/handle.h
+++ b/include/handle.h
@@ -17,6 +17,7 @@ struct parent;
 extern int  path_to_handle (char *__path, void **__hanp, size_t *__hlen);
 extern int  path_to_fshandle (char *__path, void **__fshanp, size_t *__fshlen);
 extern int  fd_to_handle (int fd, void **hanp, size_t *hlen);
+extern int  handle_to_fsfd(void *, char **);
 extern int  handle_to_fshandle (void *__hanp, size_t __hlen, void **__fshanp,
 				size_t *__fshlen);
 extern void free_handle (void *__hanp, size_t __hlen);
diff --git a/libfrog/Makefile b/libfrog/Makefile
index acfa228bc8ec..0b5b23893a13 100644
--- a/libfrog/Makefile
+++ b/libfrog/Makefile
@@ -20,6 +20,7 @@ convert.c \
 crc32.c \
 file_exchange.c \
 fsgeom.c \
+getparents.c \
 histogram.c \
 list_sort.c \
 linux.c \
@@ -46,6 +47,7 @@ dahashselftest.h \
 div64.h \
 file_exchange.h \
 fsgeom.h \
+getparents.h \
 histogram.h \
 logging.h \
 paths.h \
diff --git a/libfrog/getparents.c b/libfrog/getparents.c
new file mode 100644
index 000000000000..9118b0ff32db
--- /dev/null
+++ b/libfrog/getparents.c
@@ -0,0 +1,355 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2017-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "platform_defs.h"
+#include "xfs.h"
+#include "xfs_arch.h"
+#include "list.h"
+#include "paths.h"
+#include "handle.h"
+#include "libfrog/getparents.h"
+
+/* Allocate a buffer for the xfs_getparents_rec array. */
+static void *
+alloc_records(
+	struct xfs_getparents	*gp,
+	size_t			bufsize)
+{
+	void			*buf;
+
+	if (bufsize >= UINT32_MAX) {
+		errno = ENOMEM;
+		return NULL;
+	} else if (!bufsize) {
+		bufsize = XFS_XATTR_LIST_MAX;
+	}
+
+	buf = malloc(bufsize);
+	if (!buf)
+		return NULL;
+
+	gp->gp_buffer = (uintptr_t)buf;
+	gp->gp_bufsize = bufsize;
+	return buf;
+}
+
+/* Copy a file handle. */
+static inline void
+copy_handle(
+	struct xfs_handle	*dest,
+	const struct xfs_handle	*src)
+{
+	memcpy(dest, src, sizeof(struct xfs_handle));
+}
+
+/* Initiate a callback for each parent pointer. */
+static int
+walk_parent_records(
+	struct xfs_getparents	*gp,
+	walk_parent_fn		fn,
+	void			*arg)
+{
+	struct xfs_getparents_rec *gpr;
+	int			ret;
+
+	if (gp->gp_oflags & XFS_GETPARENTS_OFLAG_ROOT) {
+		struct parent_rec	rec = {
+			.p_flags	= PARENTREC_FILE_IS_ROOT,
+		};
+
+		return fn(&rec, arg);
+	}
+
+	for (gpr = xfs_getparents_first_rec(gp);
+	     gpr != NULL;
+	     gpr = xfs_getparents_next_rec(gp, gpr)) {
+		struct parent_rec	rec = { };
+
+		if (gpr->gpr_name[0] == 0)
+			break;
+
+		copy_handle(&rec.p_handle, &gpr->gpr_parent);
+		rec.p_name = gpr->gpr_name;
+
+		ret = fn(&rec, arg);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+/* Walk all parent pointers of this fd.  Returns 0 or positive errno. */
+int
+fd_walk_parents(
+	int			fd,
+	size_t			bufsize,
+	walk_parent_fn		fn,
+	void			*arg)
+{
+	struct xfs_getparents	gp = { };
+	void			*buf;
+	int			ret;
+
+	buf = alloc_records(&gp, bufsize);
+	if (!buf)
+		return errno;
+
+	while ((ret = ioctl(fd, XFS_IOC_GETPARENTS, &gp)) == 0) {
+		ret = walk_parent_records(&gp, fn, arg);
+		if (ret)
+			goto out_buf;
+		if (gp.gp_oflags & XFS_GETPARENTS_OFLAG_DONE)
+			break;
+	}
+	if (ret)
+		ret = errno;
+
+out_buf:
+	free(buf);
+	return ret;
+}
+
+/* Walk all parent pointers of this handle.  Returns 0 or positive errno. */
+int
+handle_walk_parents(
+	const void		*hanp,
+	size_t			hlen,
+	size_t			bufsize,
+	walk_parent_fn		fn,
+	void			*arg)
+{
+	struct xfs_getparents_by_handle	gph = { };
+	void			*buf;
+	char			*mntpt;
+	int			fd;
+	int			ret;
+
+	if (hlen != sizeof(struct xfs_handle))
+		return EINVAL;
+
+	/*
+	 * This function doesn't modify the handle, but we don't want to have
+	 * to bump the libhandle major version just to change that.
+	 */
+	fd = handle_to_fsfd((void *)hanp, &mntpt);
+	if (fd < 0)
+		return errno;
+
+	buf = alloc_records(&gph.gph_request, bufsize);
+	if (!buf)
+		return errno;
+
+	copy_handle(&gph.gph_handle, hanp);
+	while ((ret = ioctl(fd, XFS_IOC_GETPARENTS_BY_HANDLE, &gph)) == 0) {
+		ret = walk_parent_records(&gph.gph_request, fn, arg);
+		if (ret)
+			goto out_buf;
+		if (gph.gph_request.gp_oflags & XFS_GETPARENTS_OFLAG_DONE)
+			break;
+	}
+	if (ret)
+		ret = errno;
+
+out_buf:
+	free(buf);
+	return ret;
+}
+
+struct walk_ppaths_info {
+	/* Callback */
+	walk_path_fn		fn;
+	void			*arg;
+
+	/* Mountpoint of this filesystem. */
+	char			*mntpt;
+
+	/* Path that we're constructing. */
+	struct path_list	*path;
+
+	size_t			ioctl_bufsize;
+};
+
+/*
+ * Recursively walk upwards through the directory tree, changing out the path
+ * components as needed.  Call the callback when we have a complete path.
+ */
+static int
+find_parent_component(
+	const struct parent_rec	*rec,
+	void			*arg)
+{
+	struct walk_ppaths_info	*wpi = arg;
+	struct path_component	*pc;
+	int			ret;
+
+	if (rec->p_flags & PARENTREC_FILE_IS_ROOT)
+		return wpi->fn(wpi->mntpt, wpi->path, wpi->arg);
+
+	/*
+	 * If we detect a directory tree cycle, give up.  We never made any
+	 * guarantees about concurrent tree updates.
+	 */
+	if (path_will_loop(wpi->path, rec->p_handle.ha_fid.fid_ino))
+		return 0;
+
+	pc = path_component_init(rec->p_name, rec->p_handle.ha_fid.fid_ino);
+	if (!pc)
+		return errno;
+	path_list_add_parent_component(wpi->path, pc);
+
+	ret = handle_walk_parents(&rec->p_handle, sizeof(rec->p_handle),
+			wpi->ioctl_bufsize, find_parent_component, wpi);
+
+	path_list_del_component(wpi->path, pc);
+	path_component_free(pc);
+	return ret;
+}
+
+/*
+ * Call the given function on all known paths from the vfs root to the inode
+ * described in the handle.  Returns 0 for success or positive errno.
+ */
+int
+handle_walk_paths(
+	const void		*hanp,
+	size_t			hlen,
+	size_t			ioctl_bufsize,
+	walk_path_fn		fn,
+	void			*arg)
+{
+	struct walk_ppaths_info	wpi = {
+		.ioctl_bufsize	= ioctl_bufsize,
+	};
+	int			ret;
+
+	/*
+	 * This function doesn't modify the handle, but we don't want to have
+	 * to bump the libhandle major version just to change that.
+	 */
+	ret = handle_to_fsfd((void *)hanp, &wpi.mntpt);
+	if (ret < 0)
+		return errno;
+
+	wpi.path = path_list_init();
+	if (!wpi.path)
+		return errno;
+	wpi.fn = fn;
+	wpi.arg = arg;
+
+	ret = handle_walk_parents(hanp, hlen, ioctl_bufsize,
+			find_parent_component, &wpi);
+
+	path_list_free(wpi.path);
+	return ret;
+}
+
+/*
+ * Call the given function on all known paths from the vfs root to the inode
+ * referred to by the file description.  Returns 0 or positive errno.
+ */
+int
+fd_walk_paths(
+	int			fd,
+	size_t			ioctl_bufsize,
+	walk_path_fn		fn,
+	void			*arg)
+{
+	void			*hanp;
+	size_t			hlen;
+	int			ret;
+
+	ret = fd_to_handle(fd, &hanp, &hlen);
+	if (ret)
+		return errno;
+
+	ret = handle_walk_paths(hanp, hlen, ioctl_bufsize, fn, arg);
+	free_handle(hanp, hlen);
+	return ret;
+}
+
+struct gather_path_info {
+	char			*buf;
+	size_t			len;
+	size_t			written;
+};
+
+/* Helper that stringifies the first full path that we find. */
+static int
+path_to_string(
+	const char		*mntpt,
+	const struct path_list	*path,
+	void			*arg)
+{
+	struct gather_path_info	*gpi = arg;
+	int			mntpt_len = strlen(mntpt);
+	int			ret;
+
+	/* Trim trailing slashes from the mountpoint */
+	while (mntpt_len > 0 && mntpt[mntpt_len - 1] == '/')
+		mntpt_len--;
+
+	ret = snprintf(gpi->buf, gpi->len, "%.*s", mntpt_len, mntpt);
+	if (ret < 0 || (size_t)ret >= gpi->len)
+		return ENAMETOOLONG;
+	gpi->written += ret;
+
+	ret = path_list_to_string(path, gpi->buf + ret, gpi->len - ret);
+	if (ret < 0)
+		return ENAMETOOLONG;
+
+	gpi->written += ret;
+	return ECANCELED;
+}
+
+/*
+ * Return any eligible path to this file handle.  Returns 0 for success or
+ * positive errno.
+ */
+int
+handle_to_path(
+	const void		*hanp,
+	size_t			hlen,
+	size_t			ioctl_bufsize,
+	char			*path,
+	size_t			pathlen)
+{
+	struct gather_path_info	gpi = { .buf = path, .len = pathlen };
+	int			ret;
+
+	ret = handle_walk_paths(hanp, hlen, ioctl_bufsize, path_to_string,
+			&gpi);
+	if (ret && ret != ECANCELED)
+		return ret;
+	if (!gpi.written)
+		return ENODATA;
+
+	path[gpi.written] = 0;
+	return 0;
+}
+
+/*
+ * Return any eligible path to this file description.  Returns 0 for success
+ * or positive errno.
+ */
+int
+fd_to_path(
+	int			fd,
+	size_t			ioctl_bufsize,
+	char			*path,
+	size_t			pathlen)
+{
+	struct gather_path_info	gpi = { .buf = path, .len = pathlen };
+	int			ret;
+
+	ret = fd_walk_paths(fd, ioctl_bufsize, path_to_string, &gpi);
+	if (ret && ret != ECANCELED)
+		return ret;
+	if (!gpi.written)
+		return ENODATA;
+
+	path[gpi.written] = 0;
+	return 0;
+}
diff --git a/libfrog/getparents.h b/libfrog/getparents.h
new file mode 100644
index 000000000000..8098d594219b
--- /dev/null
+++ b/libfrog/getparents.h
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2023-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef	__LIBFROG_GETPARENTS_H_
+#define	__LIBFROG_GETPARENTS_H_
+
+struct path_list;
+
+struct parent_rec {
+	/* File handle to parent directory */
+	struct xfs_handle	p_handle;
+
+	/* Null-terminated directory entry name in the parent */
+	char			*p_name;
+
+	/* Flags for this record; see PARENTREC_* below */
+	uint32_t		p_flags;
+};
+
+/* This is the root directory. */
+#define PARENTREC_FILE_IS_ROOT	(1U << 0)
+
+typedef int (*walk_parent_fn)(const struct parent_rec *rec, void *arg);
+
+int fd_walk_parents(int fd, size_t ioctl_bufsize, walk_parent_fn fn, void *arg);
+int handle_walk_parents(const void *hanp, size_t hanlen, size_t ioctl_bufsize,
+		walk_parent_fn fn, void *arg);
+
+typedef int (*walk_path_fn)(const char *mntpt, const struct path_list *path,
+		void *arg);
+
+int fd_walk_paths(int fd, size_t ioctl_bufsize, walk_path_fn fn, void *arg);
+int handle_walk_paths(const void *hanp, size_t hanlen, size_t ioctl_bufsize,
+		walk_path_fn fn, void *arg);
+
+int fd_to_path(int fd, size_t ioctl_bufsize, char *path, size_t pathlen);
+int handle_to_path(const void *hanp, size_t hlen, size_t ioctl_bufsize,
+		char *path, size_t pathlen);
+
+#endif /* __LIBFROG_GETPARENTS_H_ */
diff --git a/libfrog/paths.c b/libfrog/paths.c
index 320b26dbf25b..a5dfab48ec1e 100644
--- a/libfrog/paths.c
+++ b/libfrog/paths.c
@@ -16,6 +16,7 @@
 #include "input.h"
 #include "projects.h"
 #include <mntent.h>
+#include "list.h"
 #include <limits.h>
 
 extern char *progname;
@@ -560,3 +561,170 @@ fs_table_insert_project_path(
 
 	return error;
 }
+
+/* Structured path components. */
+
+struct path_list {
+	struct list_head	p_head;
+};
+
+struct path_component {
+	struct list_head	pc_list;
+	uint64_t		pc_ino;
+	char			*pc_fname;
+};
+
+/* Initialize a path component with a given name. */
+struct path_component *
+path_component_init(
+	const char		*name,
+	uint64_t		ino)
+{
+	struct path_component	*pc;
+
+	pc = malloc(sizeof(struct path_component));
+	if (!pc)
+		return NULL;
+	INIT_LIST_HEAD(&pc->pc_list);
+	pc->pc_fname = strdup(name);
+	if (!pc->pc_fname) {
+		free(pc);
+		return NULL;
+	}
+	pc->pc_ino = ino;
+	return pc;
+}
+
+/* Free a path component. */
+void
+path_component_free(
+	struct path_component	*pc)
+{
+	free(pc->pc_fname);
+	free(pc);
+}
+
+/* Initialize an empty pathname; returns NULL with errno set on failure. */
+struct path_list *
+path_list_init(void)
+{
+	struct path_list	*path;
+
+	path = malloc(sizeof(struct path_list));
+	if (!path)
+		return NULL;
+	INIT_LIST_HEAD(&path->p_head);
+	return path;
+}
+
+/* Empty out a pathname. */
+void
+path_list_free(
+	struct path_list	*path)
+{
+	struct path_component	*pos;
+	struct path_component	*n;
+
+	list_for_each_entry_safe(pos, n, &path->p_head, pc_list) {
+		path_list_del_component(path, pos);
+		path_component_free(pos);
+	}
+	free(path);
+}
+
+/* Add a parent component to a pathname. */
+void
+path_list_add_parent_component(
+	struct path_list	*path,
+	struct path_component	*pc)
+{
+	list_add(&pc->pc_list, &path->p_head);
+}
+
+/* Add a component to a pathname. */
+void
+path_list_add_component(
+	struct path_list	*path,
+	struct path_component	*pc)
+{
+	list_add_tail(&pc->pc_list, &path->p_head);
+}
+
+/* Remove a component from a pathname. */
+void
+path_list_del_component(
+	struct path_list	*path,
+	struct path_component	*pc)
+{
+	list_del_init(&pc->pc_list);
+}
+
+/*
+ * Convert a pathname into a string.  Returns the number of bytes written, or
+ * -1 if the buffer isn't long enough.
+ */
+ssize_t
+path_list_to_string(
+	const struct path_list	*path,
+	char			*buf,
+	size_t			buflen)
+{
+	struct path_component	*pos;
+	char			*buf_end = buf + buflen;
+	ssize_t			bytes = 0;
+	int			ret;
+
+	list_for_each_entry(pos, &path->p_head, pc_list) {
+		if (buf >= buf_end)
+			return -1;
+
+		ret = snprintf(buf, buflen, "/%s", pos->pc_fname);
+		if (ret < 0 || ret >= buflen)
+			return -1;
+
+		bytes += ret;
+		buf += ret;
+		buflen -= ret;
+	}
+	return bytes;
+}
+
+/* Walk each component of a path. */
+int
+path_walk_components(
+	const struct path_list	*path,
+	path_walk_fn_t		fn,
+	void			*arg)
+{
+	struct path_component	*pos;
+	int			ret;
+
+	list_for_each_entry(pos, &path->p_head, pc_list) {
+		ret = fn(pos->pc_fname, pos->pc_ino, arg);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+/* Will this path contain a loop if we add this inode? */
+bool
+path_will_loop(
+	const struct path_list	*path_list,
+	uint64_t		ino)
+{
+	struct path_component	*pc;
+	unsigned int		nr = 0;
+
+	list_for_each_entry(pc, &path_list->p_head, pc_list) {
+		if (pc->pc_ino == ino)
+			return true;
+
+		/* 256 path components should be enough for anyone. */
+		if (++nr > 256)
+			return true;
+	}
+
+	return false;
+}
diff --git a/libfrog/paths.h b/libfrog/paths.h
index f20a2c3ef582..306fd3cb8fde 100644
--- a/libfrog/paths.h
+++ b/libfrog/paths.h
@@ -58,4 +58,29 @@ typedef struct fs_cursor {
 extern void fs_cursor_initialise(char *__dir, uint __flags, fs_cursor_t *__cp);
 extern fs_path_t *fs_cursor_next_entry(fs_cursor_t *__cp);
 
+/* Path information. */
+
+struct path_list;
+struct path_component;
+
+struct path_component *path_component_init(const char *name, uint64_t ino);
+void path_component_free(struct path_component *pc);
+
+struct path_list *path_list_init(void);
+void path_list_free(struct path_list *path);
+void path_list_add_parent_component(struct path_list *path,
+		struct path_component *pc);
+void path_list_add_component(struct path_list *path, struct path_component *pc);
+void path_list_del_component(struct path_list *path, struct path_component *pc);
+
+ssize_t path_list_to_string(const struct path_list *path, char *buf,
+		size_t buflen);
+
+typedef int (*path_walk_fn_t)(const char *name, uint64_t ino, void *arg);
+
+int path_walk_components(const struct path_list *path, path_walk_fn_t fn,
+		void *arg);
+
+bool path_will_loop(const struct path_list *path, uint64_t ino);
+
 #endif	/* __LIBFROG_PATH_H__ */
diff --git a/libhandle/handle.c b/libhandle/handle.c
index 333c21909007..1e8fe9ac5f10 100644
--- a/libhandle/handle.c
+++ b/libhandle/handle.c
@@ -29,7 +29,6 @@ typedef union {
 } comarg_t;
 
 static int obj_to_handle(char *, int, unsigned int, comarg_t, void**, size_t*);
-static int handle_to_fsfd(void *, char **);
 static char *path_to_fspath(char *path);
 
 
@@ -203,8 +202,10 @@ handle_to_fshandle(
 	return 0;
 }
 
-static int
-handle_to_fsfd(void *hanp, char **path)
+int
+handle_to_fsfd(
+	void		*hanp,
+	char		**path)
 {
 	struct fdhash	*fdhp;
 

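The upward walk in find_parent_component() is spread across callbacks, which can make the recursion hard to follow. The sketch below replays the same idea against an in-memory table instead of XFS_IOC_GETPARENTS_BY_HANDLE: for each parent pointer of an inode, push the name, recurse into the parent, print a path on reaching the root, then pop. The pp_* names, the edge table, and PP_ROOT_INO are hypothetical; only the cycle check is modeled on path_will_loop(), including its 256-component cap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PP_ROOT_INO	1	/* hypothetical root directory inode number */
#define PP_MAX_DEPTH	256	/* same cap that path_will_loop() uses */

/* Hypothetical parent pointer: "child" is linked as "name" in "parent". */
struct pp_edge {
	uint64_t	child;
	uint64_t	parent;
	const char	*name;
};

struct pp_walk {
	const struct pp_edge	*tab;
	size_t			nr;
	/* comp[0] is the target's own name; higher indexes are ancestors. */
	const char		*comp[PP_MAX_DEPTH];
	uint64_t		comp_ino[PP_MAX_DEPTH];	/* dir holding comp[i] */
	size_t			depth;
	int			paths;	/* complete paths printed */
};

/* Would stepping into directory @ino revisit it, or go too deep? */
static int pp_will_loop(const struct pp_walk *w, uint64_t ino)
{
	size_t i;

	if (w->depth >= PP_MAX_DEPTH)
		return 1;
	for (i = 0; i < w->depth; i++)
		if (w->comp_ino[i] == ino)
			return 1;
	return 0;
}

/* Print the components root-first, i.e. in reverse order of discovery. */
static void pp_emit(struct pp_walk *w)
{
	size_t i;

	if (w->depth == 0)
		printf("/");
	for (i = w->depth; i > 0; i--)
		printf("/%s", w->comp[i - 1]);
	printf("\n");
	w->paths++;
}

/* Recursively follow every parent pointer of @ino up to the root. */
static void pp_find(struct pp_walk *w, uint64_t ino)
{
	size_t i;

	if (ino == PP_ROOT_INO) {
		pp_emit(w);
		return;
	}
	for (i = 0; i < w->nr; i++) {
		const struct pp_edge *e = &w->tab[i];

		if (e->child != ino || pp_will_loop(w, e->parent))
			continue;
		assert(w->depth < PP_MAX_DEPTH);
		w->comp[w->depth] = e->name;
		w->comp_ino[w->depth] = e->parent;
		w->depth++;
		pp_find(w, e->parent);
		w->depth--;	/* pop before trying the next hard link */
	}
}

/* Returns how many distinct root-to-@ino paths were printed. */
static int pp_count_paths(const struct pp_edge *tab, size_t nr, uint64_t ino)
{
	struct pp_walk w = { .tab = tab, .nr = nr };

	pp_find(&w, ino);
	return w.paths;
}
```

With edges home -> alice -> notes.txt plus a second link named link.txt in the root, asking for the notes.txt inode prints both /home/alice/notes.txt and /link.txt, while two directories that point at each other and never reach the root print nothing.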

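path_to_string() in this patch, and paths_print() in xfs_io, build the final string with snprintf(). Because snprintf() returns the number of bytes it would have written, a "%.*s" conversion returns mntpt_len whether or not the output was truncated, so a truncation test has to compare the return value against the space remaining. The hypothetical build_path() below is a minimal sketch of that bounded-append pattern; it is not part of libfrog.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<mntpt>/<comp0>/<comp1>/..." into @buf.
 * Returns 0, or ENAMETOOLONG if @len bytes (including the NUL) are not
 * enough.
 */
static int build_path(char *buf, size_t len, const char *mntpt,
		      const char *const *comps, size_t nr)
{
	int mntpt_len;
	size_t used;
	size_t i;
	int ret;

	assert(buf != NULL && mntpt != NULL);
	mntpt_len = (int)strlen(mntpt);

	/* Trim trailing slashes so "/mnt/" does not yield "/mnt//home". */
	while (mntpt_len > 0 && mntpt[mntpt_len - 1] == '/')
		mntpt_len--;

	/*
	 * snprintf() returns the length it wanted to write.  With "%.*s" that
	 * is always mntpt_len, so truncation must be detected by comparing
	 * against the space available, not against mntpt_len.
	 */
	ret = snprintf(buf, len, "%.*s", mntpt_len, mntpt);
	if (ret < 0 || (size_t)ret >= len)
		return ENAMETOOLONG;
	used = ret;

	for (i = 0; i < nr; i++) {
		ret = snprintf(buf + used, len - used, "/%s", comps[i]);
		if (ret < 0 || (size_t)ret >= len - used)
			return ENAMETOOLONG;
		used += ret;
	}
	return 0;
}
```

Checking `ret >= len - used` also keeps `len - used` from underflowing on the next append, which a `ret != mntpt_len` test cannot guarantee.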

* [PATCH 07/24] xfs_io: adapt parent command to new parent pointer ioctls
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-07-02  1:12   ` [PATCH 06/24] libfrog: add parent pointer support code Darrick J. Wong
@ 2024-07-02  1:12   ` Darrick J. Wong
  2024-07-02  6:26     ` Christoph Hellwig
  2024-07-02  1:12   ` [PATCH 08/24] xfs_io: Add i, n and f flags to parent command Darrick J. Wong
                     ` (16 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:12 UTC (permalink / raw)
  To: djwong, cem
  Cc: Allison Henderson, catherine.hoang, linux-xfs, allison.henderson,
	hch

From: Darrick J. Wong <djwong@kernel.org>

For ages, xfs_io has had a totally useless 'parent' command that enabled
callers to walk the parents or print the directory tree path of an open
file.  This code used the ioctl interface presented by SGI's version of
parent pointers that was never merged.  Rework the code in here to use
the new ioctl interfaces that we've settled upon.  Get rid of the old
parent pointer checking code since xfs_repair/xfs_scrub will take care
of that.

(This originally was in the "xfsprogs: implement the upper half of
parent pointers" megapatch.)

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 io/parent.c       |  504 +++++++++++++++--------------------------------------
 man/man8/xfs_io.8 |   25 ++-
 2 files changed, 161 insertions(+), 368 deletions(-)


diff --git a/io/parent.c b/io/parent.c
index 8f63607ffec2..927d05d70b0c 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -7,365 +7,88 @@
 #include "command.h"
 #include "input.h"
 #include "libfrog/paths.h"
-#include "parent.h"
+#include "libfrog/getparents.h"
 #include "handle.h"
-#include "jdm.h"
 #include "init.h"
 #include "io.h"
 
-#define PARENTBUF_SZ		16384
-#define BSTATBUF_SZ		16384
-
 static cmdinfo_t parent_cmd;
-static int verbose_flag;
-static int err_status;
-static __u64 inodes_checked;
 static char *mntpt;
 
-/*
- * check out a parent entry to see if the values seem valid
- */
-static void
-check_parent_entry(struct xfs_bstat *bstatp, parent_t *parent)
-{
-	int sts;
-	char fullpath[PATH_MAX];
-	struct stat statbuf;
-	char *str;
-
-	sprintf(fullpath, _("%s%s"), mntpt, parent->p_name);
-
-	sts = lstat(fullpath, &statbuf);
-	if (sts != 0) {
-		fprintf(stderr,
-			_("inode-path for inode: %llu is incorrect - path \"%s\" non-existent\n"),
-			(unsigned long long) bstatp->bs_ino, fullpath);
-		if (verbose_flag) {
-			fprintf(stderr,
-				_("path \"%s\" does not stat for inode: %llu; err = %s\n"),
-				fullpath,
-			       (unsigned long long) bstatp->bs_ino,
-				strerror(errno));
-		}
-		err_status++;
-		return;
-	} else {
-		if (verbose_flag > 1) {
-			printf(_("path \"%s\" found\n"), fullpath);
-		}
-	}
-
-	if (statbuf.st_ino != bstatp->bs_ino) {
-		fprintf(stderr,
-			_("inode-path for inode: %llu is incorrect - wrong inode#\n"),
-		       (unsigned long long) bstatp->bs_ino);
-		if (verbose_flag) {
-			fprintf(stderr,
-				_("ino mismatch for path \"%s\" %llu vs %llu\n"),
-				fullpath,
-				(unsigned long long)statbuf.st_ino,
-				(unsigned long long)bstatp->bs_ino);
-		}
-		err_status++;
-		return;
-	} else if (verbose_flag > 1) {
-		printf(_("inode number match: %llu\n"),
-			(unsigned long long)statbuf.st_ino);
-	}
-
-	/* get parent path */
-	str = strrchr(fullpath, '/');
-	*str = '\0';
-	sts = stat(fullpath, &statbuf);
-	if (sts != 0) {
-		fprintf(stderr,
-			_("parent path \"%s\" does not stat: %s\n"),
-			fullpath,
-			strerror(errno));
-		err_status++;
-		return;
-	} else {
-		if (parent->p_ino != statbuf.st_ino) {
-			fprintf(stderr,
-				_("inode-path for inode: %llu is incorrect - wrong parent inode#\n"),
-			       (unsigned long long) bstatp->bs_ino);
-			if (verbose_flag) {
-				fprintf(stderr,
-					_("ino mismatch for path \"%s\" %llu vs %llu\n"),
-					fullpath,
-					(unsigned long long)parent->p_ino,
-					(unsigned long long)statbuf.st_ino);
-			}
-			err_status++;
-			return;
-		} else {
-			if (verbose_flag > 1) {
-			       printf(_("parent ino match for %llu\n"),
-				       (unsigned long long) parent->p_ino);
-			}
-		}
-	}
-}
-
-static void
-check_parents(parent_t *parentbuf, size_t *parentbuf_size,
-	     jdm_fshandle_t *fshandlep, struct xfs_bstat *statp)
-{
-	int error, i;
-	__u32 count;
-	parent_t *entryp;
-
-	do {
-		error = jdm_parentpaths(fshandlep, statp, parentbuf, *parentbuf_size, &count);
-
-		if (error == ERANGE) {
-			*parentbuf_size *= 2;
-			parentbuf = (parent_t *)realloc(parentbuf, *parentbuf_size);
-		} else if (error) {
-			fprintf(stderr, _("parentpaths failed for ino %llu: %s\n"),
-			       (unsigned long long) statp->bs_ino,
-				strerror(errno));
-			err_status++;
-			break;
-		}
-	} while (error == ERANGE);
-
-
-	if (count == 0) {
-		/* no links for inode - something wrong here */
-	       fprintf(stderr, _("inode-path for inode: %llu is missing\n"),
-			       (unsigned long long) statp->bs_ino);
-		err_status++;
-	}
-
-	entryp = parentbuf;
-	for (i = 0; i < count; i++) {
-		check_parent_entry(statp, entryp);
-		entryp = (parent_t*) (((char*)entryp) + entryp->p_reclen);
-	}
-}
-
-static int
-do_bulkstat(parent_t *parentbuf, size_t *parentbuf_size,
-	    struct xfs_bstat *bstatbuf, int fsfd, jdm_fshandle_t *fshandlep)
-{
-	__s32 buflenout;
-	__u64 lastino = 0;
-	struct xfs_bstat *p;
-	struct xfs_bstat *endp;
-	struct xfs_fsop_bulkreq bulkreq;
-	struct stat mntstat;
-
-	if (stat(mntpt, &mntstat)) {
-		fprintf(stderr, _("can't stat mount point \"%s\": %s\n"),
-			mntpt, strerror(errno));
-		return 1;
-	}
-
-	bulkreq.lastip  = &lastino;
-	bulkreq.icount  = BSTATBUF_SZ;
-	bulkreq.ubuffer = (void *)bstatbuf;
-	bulkreq.ocount  = &buflenout;
-
-	while (xfsctl(mntpt, fsfd, XFS_IOC_FSBULKSTAT, &bulkreq) == 0) {
-		if (*(bulkreq.ocount) == 0) {
-			return 0;
-		}
-		for (p = bstatbuf, endp = bstatbuf + *bulkreq.ocount; p < endp; p++) {
-
-			/* inode being modified, get synced data with iget */
-			if ( (!p->bs_nlink || !p->bs_mode) && p->bs_ino != 0 ) {
-
-				if (xfsctl(mntpt, fsfd, XFS_IOC_FSBULKSTAT_SINGLE, &bulkreq) < 0) {
-				    fprintf(stderr,
-					  _("failed to get bulkstat information for inode %llu\n"),
-					 (unsigned long long) p->bs_ino);
-				    continue;
-				}
-				if (!p->bs_nlink || !p->bs_mode || !p->bs_ino) {
-				    fprintf(stderr,
-					  _("failed to get valid bulkstat information for inode %llu\n"),
-					 (unsigned long long) p->bs_ino);
-				    continue;
-				}
-			}
-
-			/* skip root */
-			if (p->bs_ino == mntstat.st_ino) {
-				continue;
-			}
-
-			if (verbose_flag > 1) {
-			       printf(_("checking inode %llu\n"),
-				       (unsigned long long) p->bs_ino);
-			}
-
-			/* print dotted progress */
-			if ((inodes_checked % 100) == 0 && verbose_flag == 1) {
-				printf("."); fflush(stdout);
-			}
-			inodes_checked++;
-
-			check_parents(parentbuf, parentbuf_size, fshandlep, p);
-		}
-
-	}/*while*/
-
-	fprintf(stderr, _("syssgi bulkstat failed: %s\n"), strerror(errno));
-	return 1;
-}
+struct pptr_args {
+	char		*pathbuf;
+};
 
 static int
-parent_check(void)
+pptr_print(
+	const struct parent_rec	*rec,
+	void			*arg)
 {
-	int fsfd;
-	jdm_fshandle_t *fshandlep;
-	parent_t *parentbuf;
-	size_t parentbuf_size = PARENTBUF_SZ;
-	struct xfs_bstat *bstatbuf;
+	const struct xfs_fid	*fid = &rec->p_handle.ha_fid;
 
-	err_status = 0;
-	inodes_checked = 0;
-
-	sync();
-
-        fsfd = file->fd;
-
-	fshandlep = jdm_getfshandle(mntpt);
-	if (fshandlep == NULL) {
-		fprintf(stderr, _("unable to open \"%s\" for jdm: %s\n"),
-		      mntpt,
-		      strerror(errno));
-		return 1;
+	if (rec->p_flags & PARENTREC_FILE_IS_ROOT) {
+		printf(_("Root directory.\n"));
+		return 0;
 	}
 
-	/* allocate buffers */
-        bstatbuf = (struct xfs_bstat *)calloc(BSTATBUF_SZ, sizeof(struct xfs_bstat));
-	parentbuf = (parent_t *)malloc(parentbuf_size);
-	if (!bstatbuf || !parentbuf) {
-		fprintf(stderr, _("unable to allocate buffers: %s\n"),
-			strerror(errno));
-		err_status = 1;
-		goto out;
-	}
-
-	if (do_bulkstat(parentbuf, &parentbuf_size, bstatbuf, fsfd, fshandlep) != 0)
-		err_status++;
-
-	if (err_status > 0)
-		fprintf(stderr, _("num errors: %d\n"), err_status);
-	else
-		printf(_("succeeded checking %llu inodes\n"),
-			(unsigned long long) inodes_checked);
-
-out:
-	free(bstatbuf);
-	free(parentbuf);
-	free(fshandlep);
-	return err_status;
-}
+	printf(_("p_ino     = %llu\n"), (unsigned long long)fid->fid_ino);
+	printf(_("p_gen     = %u\n"), (unsigned int)fid->fid_gen);
+	printf(_("p_namelen = %zu\n"), strlen(rec->p_name));
+	printf(_("p_name    = \"%s\"\n\n"), rec->p_name);
 
-static void
-print_parent_entry(parent_t *parent, int fullpath)
-{
-       printf(_("p_ino    = %llu\n"),  (unsigned long long) parent->p_ino);
-	printf(_("p_gen    = %u\n"),	parent->p_gen);
-	printf(_("p_reclen = %u\n"),	parent->p_reclen);
-	if (fullpath)
-		printf(_("p_name   = \"%s%s\"\n"), mntpt, parent->p_name);
-	else
-		printf(_("p_name   = \"%s\"\n"), parent->p_name);
+	return 0;
 }
 
 static int
-parent_list(int fullpath)
+paths_print(
+	const char		*mntpt,
+	const struct path_list	*path,
+	void			*arg)
 {
-	void *handlep = NULL;
-	size_t handlen;
-	int error, i;
-	int retval = 1;
-	__u32 count;
-	parent_t *entryp;
-	parent_t *parentbuf = NULL;
-	char *path = file->name;
-	int pb_size = PARENTBUF_SZ;
-
-	/* XXXX for linux libhandle version - to set libhandle fsfd cache */
-	{
-		void *fshandle;
-		size_t fshlen;
-
-		if (path_to_fshandle(mntpt, &fshandle, &fshlen) != 0) {
-			fprintf(stderr, _("%s: failed path_to_fshandle \"%s\": %s\n"),
-				progname, path, strerror(errno));
-			goto error;
-		}
-		free_handle(fshandle, fshlen);
-	}
-
-	if (path_to_handle(path, &handlep, &handlen) != 0) {
-		fprintf(stderr, _("%s: path_to_handle failed for \"%s\"\n"), progname, path);
-		goto error;
-	}
-
-	do {
-		parentbuf = (parent_t *)realloc(parentbuf, pb_size);
-		if (!parentbuf) {
-			fprintf(stderr, _("%s: unable to allocate parent buffer: %s\n"),
-				progname, strerror(errno));
-			goto error;
-		}
-
-		if (fullpath) {
-			error = parentpaths_by_handle(handlep,
-						       handlen,
-						       parentbuf,
-						       pb_size,
-						       &count);
-		} else {
-			error = parents_by_handle(handlep,
-						   handlen,
-						   parentbuf,
-						   pb_size,
-						   &count);
-		}
-		if (error == ERANGE) {
-			pb_size *= 2;
-		} else if (error) {
-			fprintf(stderr, _("%s: %s call failed for \"%s\": %s\n"),
-				progname, fullpath ? "parentpaths" : "parents",
-				path, strerror(errno));
-			goto error;
-		}
-	} while (error == ERANGE);
-
-	if (count == 0) {
-		/* no links for inode - something wrong here */
-		fprintf(stderr, _("%s: inode-path is missing\n"), progname);
-		goto error;
-	}
-
-	entryp = parentbuf;
-	for (i = 0; i < count; i++) {
-		print_parent_entry(entryp, fullpath);
-		entryp = (parent_t*) (((char*)entryp) + entryp->p_reclen);
-	}
-
-	retval = 0;
-error:
-	free(handlep);
-	free(parentbuf);
-	return retval;
+	struct pptr_args	*args = arg;
+	char			*buf = args->pathbuf;
+	size_t			len = MAXPATHLEN;
+	int			mntpt_len = strlen(mntpt);
+	int			ret;
+
+	/* Trim trailing slashes from the mountpoint */
+	while (mntpt_len > 0 && mntpt[mntpt_len - 1] == '/')
+		mntpt_len--;
+
+	ret = snprintf(buf, len, "%.*s", mntpt_len, mntpt);
+	if (ret != mntpt_len)
+		return ENAMETOOLONG;
+
+	ret = path_list_to_string(path, buf + ret, len - ret);
+	if (ret < 0)
+		return ENAMETOOLONG;
+
+	printf("%s\n", buf);
+	return 0;
 }
 
 static int
-parent_f(int argc, char **argv)
+parent_f(
+	int			argc,
+	char			**argv)
 {
-	int c;
-	int listpath_flag = 0;
-	int check_flag = 0;
-	fs_path_t *fs;
-	static int tab_init;
+	char			pathbuf[MAXPATHLEN + 1];
+	struct pptr_args	args = {
+		.pathbuf	= pathbuf,
+	};
+	struct xfs_handle	handle;
+	void			*hanp = NULL;
+	size_t			hlen;
+	struct fs_path		*fs;
+	char			*p;
+	uint64_t		ino = 0;
+	uint32_t		gen = 0;
+	int			c;
+	int			listpath_flag = 0;
+	int			ret;
+	size_t			ioctl_bufsize = 8192;
+	bool			single_path = false;
+	static int		tab_init;
 
 	if (!tab_init) {
 		tab_init = 1;
@@ -380,46 +103,110 @@ parent_f(int argc, char **argv)
 	}
 	mntpt = fs->fs_dir;
 
-	verbose_flag = 0;
-
-	while ((c = getopt(argc, argv, "cpv")) != EOF) {
+	while ((c = getopt(argc, argv, "b:pz")) != EOF) {
 		switch (c) {
-		case 'c':
-			check_flag = 1;
+		case 'b':
+			errno = 0;
+			ioctl_bufsize = atoi(optarg);
+			if (errno) {
+				perror(optarg);
+				exitcode = 1;
+				return 1;
+			}
 			break;
 		case 'p':
 			listpath_flag = 1;
 			break;
-		case 'v':
-			verbose_flag++;
+		case 'z':
+			single_path = true;
 			break;
 		default:
 			return command_usage(&parent_cmd);
 		}
 	}
 
-	if (!check_flag && !listpath_flag) /* default case */
-		exitcode = parent_list(listpath_flag);
-	else {
-		if (listpath_flag)
-			exitcode = parent_list(listpath_flag);
-		if (check_flag)
-			exitcode = parent_check();
+	/*
+	 * Always initialize the fshandle table because we need it for
+	 * the ppaths functions to work.
+	 */
+	ret = path_to_fshandle((char *)mntpt, &hanp, &hlen);
+	if (ret) {
+		perror(mntpt);
+		return 0;
 	}
 
+	if (optind + 2 == argc) {
+		ino = strtoull(argv[optind], &p, 0);
+		if (*p != '\0' || ino == 0) {
+			fprintf(stderr,
+				_("Bad inode number '%s'.\n"),
+				argv[optind]);
+			return 0;
+		}
+		gen = strtoul(argv[optind + 1], &p, 0);
+		if (*p != '\0') {
+			fprintf(stderr,
+				_("Bad generation number '%s'.\n"),
+				argv[optind + 1]);
+			return 0;
+		}
+
+		memcpy(&handle, hanp, sizeof(handle));
+		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
+				sizeof(handle.ha_fid.fid_len);
+		handle.ha_fid.fid_pad = 0;
+		handle.ha_fid.fid_ino = ino;
+		handle.ha_fid.fid_gen = gen;
+	} else if (optind != argc) {
+		return command_usage(&parent_cmd);
+	}
+
+	if (single_path) {
+		if (ino)
+			ret = handle_to_path(&handle, sizeof(handle),
+					ioctl_bufsize, pathbuf, MAXPATHLEN);
+		else
+			ret = fd_to_path(file->fd, ioctl_bufsize,
+					pathbuf, MAXPATHLEN);
+		if (!ret)
+			printf("%s\n", pathbuf);
+	} else if (listpath_flag) {
+		if (ino)
+			ret = handle_walk_paths(&handle, sizeof(handle),
+					ioctl_bufsize, paths_print, &args);
+		else
+			ret = fd_walk_paths(file->fd, ioctl_bufsize,
+					paths_print, &args);
+	} else {
+		if (ino)
+			ret = handle_walk_parents(&handle, sizeof(handle),
+					ioctl_bufsize, pptr_print, &args);
+		else
+			ret = fd_walk_parents(file->fd, ioctl_bufsize,
+					pptr_print, &args);
+	}
+
+	if (hanp)
+		free_handle(hanp, hlen);
+	if (ret) {
+		exitcode = 1;
+		fprintf(stderr, "%s: %s\n", file->name, strerror(ret));
+	}
 	return 0;
 }
 
 static void
 parent_help(void)
 {
-	printf(_(
+	printf(_(
 "\n"
 " list the current file's parents and their filenames\n"
 "\n"
-" -c -- check the current file's file system for parent consistency\n"
-" -p -- list the current file's parents and their full paths\n"
-" -v -- verbose mode\n"
+" -b -- use this many bytes to hold parent pointer records\n"
+" -p -- list the current file's paths up to the root\n"
+" -z -- print only the first path from the root\n"
+"\n"
+"If ino and gen are supplied, use them instead.\n"
 "\n"));
 }
 
@@ -430,11 +217,10 @@ parent_init(void)
 	parent_cmd.cfunc = parent_f;
 	parent_cmd.argmin = 0;
 	parent_cmd.argmax = -1;
-	parent_cmd.args = _("[-cpv]");
+	parent_cmd.args = _("[-pz] [-b bufsize] [ino gen]");
 	parent_cmd.flags = CMD_NOMAP_OK;
-	parent_cmd.oneline = _("print or check parent inodes");
+	parent_cmd.oneline = _("print parent inodes");
 	parent_cmd.help = parent_help;
 
-	if (expert)
-		add_command(&parent_cmd);
+	add_command(&parent_cmd);
 }
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 2a7c67f7cf3a..b9d5447703fc 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -1004,25 +1004,32 @@ and
 options behave as described above, in
 .B chproj.
 .TP
-.BR parent " [ " \-cpv " ]"
+.BR parent " [ " \-pz " ] [ " \-b " bufsize ] [" " ino gen " "]"
 By default this command prints out the parent inode numbers,
 inode generation numbers and basenames of all the hardlinks which
 point to the inode of the current file.
+
+If the optional
+.B ino
+and
+.B gen
+parameters are provided, they will be used to create a file handle on the same
+filesystem as the open file.
+The parents of the file represented by the handle will be reported instead of
+the open file.
+
 .RS 1.0i
 .PD 0
 .TP 0.4i
+.B \-b
+Use a buffer of this size to receive parent pointer records from the kernel.
+.TP
 .B \-p
 the output is similar to the default output except pathnames up to
 the mount-point are printed out instead of the component name.
 .TP
-.B \-c
-the file's filesystem will check all the parent attributes for consistency.
-.TP
-.B \-v
-verbose output will be printed.
-.RE
-.IP
-.B [NOTE: Not currently operational on Linux.]
+.B \-z
+Print only the first path from the root.
 .RE
 .PD
 .TP


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 08/24] xfs_io: Add i, n and f flags to parent command
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-07-02  1:12   ` [PATCH 07/24] xfs_io: adapt parent command to new parent pointer ioctls Darrick J. Wong
@ 2024-07-02  1:12   ` Darrick J. Wong
  2024-07-02  6:26     ` Christoph Hellwig
  2024-07-02  1:13   ` [PATCH 09/24] xfs_logprint: decode parent pointers in ATTRI items fully Darrick J. Wong
                     ` (15 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:12 UTC (permalink / raw)
  To: djwong, cem
  Cc: Allison Henderson, catherine.hoang, linux-xfs, allison.henderson,
	hch

From: Allison Henderson <allison.henderson@oracle.com>

This patch adds the flags i, n, and s to the parent command. These flags add
filtering options that are used by the new parent pointer tests in xfstests, and
help to improve the test run time.  The flags are:

-i: Only show parent pointer records containing the given inode
-n: Only show parent pointer records containing the given filename
-s: Print records in short format: ino:gen:namelen:name
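
As a rough illustration, the -i/-n filtering and the short output format can
be sketched in standalone C.  The struct and function names below
(pptr_sketch_rec, pptr_sketch_filter, pptr_sketch_matches, pptr_sketch_short)
are hypothetical stand-ins for the patch's struct pptr_args and pptr_print(),
not libfrog API.  As in the patch, a zero inode number or a NULL name means
"no filter":

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one parent pointer record. */
struct pptr_sketch_rec {
	uint64_t	ino;	/* parent directory inode number */
	uint32_t	gen;	/* parent directory generation */
	const char	*name;	/* dirent name in the parent */
};

/* Hypothetical filter; a zero ino or NULL name disables that filter. */
struct pptr_sketch_filter {
	uint64_t	ino;
	const char	*name;
};

/* Return true if @rec passes every enabled filter. */
static bool
pptr_sketch_matches(
	const struct pptr_sketch_filter	*f,
	const struct pptr_sketch_rec	*rec)
{
	if (f->ino && rec->ino != f->ino)
		return false;
	if (f->name && strcmp(f->name, rec->name) != 0)
		return false;
	return true;
}

/* Render @rec in the short format ino:gen:namelen:name. */
static int
pptr_sketch_short(
	char				*buf,
	size_t				len,
	const struct pptr_sketch_rec	*rec)
{
	assert(buf != NULL && rec != NULL);
	return snprintf(buf, len, "%llu:%u:%zu:%s",
			(unsigned long long)rec->ino,
			(unsigned int)rec->gen,
			strlen(rec->name), rec->name);
}
```

A record for "foo" in directory inode 131, generation 7, renders as
"131:7:3:foo".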

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: adapt to new getparents ioctl]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 io/parent.c       |   61 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 man/man8/xfs_io.8 |   13 ++++++++++-
 2 files changed, 70 insertions(+), 4 deletions(-)


diff --git a/io/parent.c b/io/parent.c
index 927d05d70b0c..8db93d987552 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -17,6 +17,9 @@ static char *mntpt;
 
 struct pptr_args {
 	char		*pathbuf;
+	char		*filter_name;
+	uint64_t	filter_ino;
+	bool		shortformat;
 };
 
 static int
@@ -25,12 +28,27 @@ pptr_print(
 	void			*arg)
 {
 	const struct xfs_fid	*fid = &rec->p_handle.ha_fid;
+	struct pptr_args	*args = arg;
 
 	if (rec->p_flags & PARENTREC_FILE_IS_ROOT) {
 		printf(_("Root directory.\n"));
 		return 0;
 	}
 
+	if (args->filter_ino && fid->fid_ino != args->filter_ino)
+		return 0;
+	if (args->filter_name && strcmp(args->filter_name, rec->p_name))
+		return 0;
+
+	if (args->shortformat) {
+		printf("%llu:%u:%zu:%s\n",
+				(unsigned long long)fid->fid_ino,
+				(unsigned int)fid->fid_gen,
+				strlen(rec->p_name),
+				rec->p_name);
+		return 0;
+	}
+
 	printf(_("p_ino     = %llu\n"), (unsigned long long)fid->fid_ino);
 	printf(_("p_gen     = %u\n"), (unsigned int)fid->fid_gen);
 	printf(_("p_namelen = %zu\n"), strlen(rec->p_name));
@@ -39,6 +57,21 @@ pptr_print(
 	return 0;
 }
 
+static int
+filter_path_components(
+	const char		*name,
+	uint64_t		ino,
+	void			*arg)
+{
+	struct pptr_args	*args = arg;
+
+	if (args->filter_ino && ino == args->filter_ino)
+		return ECANCELED;
+	if (args->filter_name && !strcmp(args->filter_name, name))
+		return ECANCELED;
+	return 0;
+}
+
 static int
 paths_print(
 	const char		*mntpt,
@@ -51,6 +84,12 @@ paths_print(
 	int			mntpt_len = strlen(mntpt);
 	int			ret;
 
+	if (args->filter_ino || args->filter_name) {
+		ret = path_walk_components(path, filter_path_components, args);
+		if (ret != ECANCELED)
+			return 0;
+	}
+
 	/* Trim trailing slashes from the mountpoint */
 	while (mntpt_len > 0 && mntpt[mntpt_len - 1] == '/')
 		mntpt_len--;
@@ -103,7 +142,7 @@ parent_f(
 	}
 	mntpt = fs->fs_dir;
 
-	while ((c = getopt(argc, argv, "b:pz")) != EOF) {
+	while ((c = getopt(argc, argv, "b:i:n:psz")) != EOF) {
 		switch (c) {
 		case 'b':
 			errno = 0;
@@ -114,9 +153,24 @@ parent_f(
 				return 1;
 			}
 			break;
+		case 'i':
+			args.filter_ino = strtoull(optarg, &p, 0);
+			if (*p != '\0' || args.filter_ino == 0) {
+				fprintf(stderr, _("Bad inode number '%s'.\n"),
+						optarg);
+				exitcode = 1;
+				return 1;
+			}
+			break;
+		case 'n':
+			args.filter_name = optarg;
+			break;
 		case 'p':
 			listpath_flag = 1;
 			break;
+		case 's':
+			args.shortformat = true;
+			break;
 		case 'z':
 			single_path = true;
 			break;
@@ -203,7 +257,10 @@ printf(_(
 " list the current file's parents and their filenames\n"
 "\n"
 " -b -- use this many bytes to hold parent pointer records\n"
+" -i -- only show parent pointer records containing the given inode\n"
+" -n -- only show parent pointer records containing the given filename\n"
 " -p -- list the current file's paths up to the root\n"
+" -s -- print records in short format: ino:gen:namelen:filename\n"
 " -z -- print only the first path from the root\n"
 "\n"
 "If ino and gen are supplied, use them instead.\n"
@@ -217,7 +274,7 @@ parent_init(void)
 	parent_cmd.cfunc = parent_f;
 	parent_cmd.argmin = 0;
 	parent_cmd.argmax = -1;
-	parent_cmd.args = _("[-pz] [-b bufsize] [ino gen]");
+	parent_cmd.args = _("[-psz] [-b bufsize] [-i ino] [-n name] [ino gen]");
 	parent_cmd.flags = CMD_NOMAP_OK;
 	parent_cmd.oneline = _("print parent inodes");
 	parent_cmd.help = parent_help;
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index b9d5447703fc..02036e3d093b 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -1004,7 +1004,7 @@ and
 options behave as described above, in
 .B chproj.
 .TP
-.BR parent " [ " \-pz " ] [ " \-b " bufsize ] [" " ino gen " "]"
+.BR parent " [ " \-psz " ] [ " \-b " bufsize ] [ " \-i " ino ] [ " \-n " name ] [" " ino gen " "]"
 By default this command prints out the parent inode numbers,
 inode generation numbers and basenames of all the hardlinks which
 point to the inode of the current file.
@@ -1023,11 +1023,20 @@ the open file.
 .TP 0.4i
 .B \-b
 Use a buffer of this size to receive parent pointer records from the kernel.
-.TP
+.TP 0.4i
+.B \-i
+Only show parent pointer records containing this inode number.
+.TP 0.4i
+.B \-n
+Only show parent pointer records containing this directory entry name.
+.TP 0.4i
 .B \-p
 the output is similar to the default output except pathnames up to
 the mount-point are printed out instead of the component name.
 .TP
+.B \-s
+Print records in short format: ino:gen:namelen:name
+.TP
 .B \-z
 Print only the first path from the root.
 .RE



* [PATCH 09/24] xfs_logprint: decode parent pointers in ATTRI items fully
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-07-02  1:12   ` [PATCH 08/24] xfs_io: Add i, n and f flags to parent command Darrick J. Wong
@ 2024-07-02  1:13   ` Darrick J. Wong
  2024-07-02  6:27     ` Christoph Hellwig
  2024-07-02  1:13   ` [PATCH 10/24] xfs_spaceman: report file paths Darrick J. Wong
                     ` (14 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:13 UTC (permalink / raw)
  To: djwong, cem
  Cc: Allison Henderson, catherine.hoang, linux-xfs, allison.henderson,
	hch

From: Allison Henderson <allison.henderson@oracle.com>

This patch modifies the ATTRI print routines to look for the parent
pointer flag, and decode logged parent pointers fully when dumping log
contents.  Between the existing ATTRI: printouts and the new ones
introduced here, we can figure out what was stored in each log iovec,
as well as the higher level parent pointer that was logged.
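
For illustration, decoding a logged parent pointer along the lines of
dump_pptr() can be sketched as follows.  The sketch assumes that the attr
value is the 12-byte packed struct xfs_parent_rec (a big-endian 64-bit parent
inode number followed by a big-endian 32-bit generation) and that the attr
name is the dirent name, which is not NUL-terminated.  pptr_sketch_format()
and get_be() are hypothetical helpers, not xfsprogs functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed on-disk size of struct xfs_parent_rec: __be64 + __be32, packed. */
#define PPTR_SKETCH_REC_SIZE	12

/* Read an @nbytes-wide big-endian integer starting at @p. */
static uint64_t
get_be(const unsigned char *p, size_t nbytes)
{
	uint64_t	v = 0;
	size_t		i;

	for (i = 0; i < nbytes; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Format one logged parent pointer into @out.  The name is printed with
 * an explicit length because it is not NUL-terminated.
 */
static int
pptr_sketch_format(
	char		*out,
	size_t		outlen,
	const char	*tag,
	const void	*name_ptr,
	unsigned int	name_len,
	const void	*value_ptr,
	unsigned int	value_len)
{
	const unsigned char	*rec = value_ptr;

	assert(out != NULL && tag != NULL);
	if (value_len < PPTR_SKETCH_REC_SIZE)
		return snprintf(out, outlen, "PPTR: %s CORRUPT", tag);

	return snprintf(out, outlen,
			"PPTR: %s parent_ino %llu parent_gen %u name '%.*s'",
			tag,
			(unsigned long long)get_be(rec, 8),
			(unsigned int)get_be(rec + 8, 4),
			(int)name_len, (const char *)name_ptr);
}
```

A value shorter than the record is reported as CORRUPT instead of being read
past its end, which matches the check at the top of dump_pptr().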

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: adjust to new ondisk format]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    3 ++
 logprint/log_redo.c      |   77 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index c36a6ac81a7b..b7947591d2db 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -190,6 +190,9 @@
 #define xfs_log_sb			libxfs_log_sb
 #define xfs_mode_to_ftype		libxfs_mode_to_ftype
 #define xfs_mkdir_space_res		libxfs_mkdir_space_res
+#define xfs_parent_add			libxfs_parent_add
+#define xfs_parent_finish		libxfs_parent_finish
+#define xfs_parent_start		libxfs_parent_start
 #define xfs_perag_get			libxfs_perag_get
 #define xfs_perag_hold			libxfs_perag_hold
 #define xfs_perag_put			libxfs_perag_put
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index 1d55164a90ff..684e5f4a3f32 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -674,6 +674,55 @@ xfs_attri_copy_log_format(
 	return 1;
 }
 
+static void
+dump_pptr(
+	const char			*tag,
+	const void			*name_ptr,
+	unsigned int			name_len,
+	const void			*value_ptr,
+	unsigned int			value_len)
+{
+	const struct xfs_parent_rec	*rec = value_ptr;
+
+	if (value_len < sizeof(struct xfs_parent_rec)) {
+		printf("PPTR: %s CORRUPT\n", tag);
+		return;
+	}
+
+	printf("PPTR: %s attr_namelen %u attr_valuelen %u\n", tag, name_len, value_len);
+	printf("PPTR: %s parent_ino %llu parent_gen %u name '%.*s'\n",
+			tag,
+			(unsigned long long)be64_to_cpu(rec->p_ino),
+			(unsigned int)be32_to_cpu(rec->p_gen),
+			name_len,
+			(char *)name_ptr);
+}
+
+static void
+dump_pptr_update(
+	const void	*name_ptr,
+	unsigned int	name_len,
+	const void	*new_name_ptr,
+	unsigned int	new_name_len,
+	const void	*value_ptr,
+	unsigned int	value_len,
+	const void	*new_value_ptr,
+	unsigned int	new_value_len)
+{
+	if (new_name_ptr && name_ptr) {
+		dump_pptr("OLDNAME", name_ptr, name_len, value_ptr, value_len);
+		dump_pptr("NEWNAME", new_name_ptr, new_name_len, new_value_ptr,
+				new_value_len);
+		return;
+	}
+
+	if (name_ptr)
+		dump_pptr("NAME", name_ptr, name_len, value_ptr, value_len);
+	if (new_name_ptr)
+		dump_pptr("NEWNAME", new_name_ptr, new_name_len, new_value_ptr,
+				new_value_len);
+}
+
 static inline unsigned int
 xfs_attr_log_item_op(const struct xfs_attri_log_format *attrp)
 {
@@ -688,6 +737,10 @@ xlog_print_trans_attri(
 {
 	struct xfs_attri_log_format	*src_f = NULL;
 	xlog_op_header_t		*head = NULL;
+	void				*name_ptr = NULL;
+	void				*new_name_ptr = NULL;
+	void				*value_ptr = NULL;
+	void				*new_value_ptr = NULL;
 	uint				dst_len;
 	unsigned int			name_len = 0;
 	unsigned int			new_name_len = 0;
@@ -742,6 +795,7 @@ xlog_print_trans_attri(
 		(*i)++;
 		head = (xlog_op_header_t *)*ptr;
 		xlog_print_op_header(head, *i, ptr);
+		name_ptr = *ptr;
 		error = xlog_print_trans_attri_name(ptr,
 				be32_to_cpu(head->oh_len), "name");
 		if (error)
@@ -753,6 +807,7 @@ xlog_print_trans_attri(
 		(*i)++;
 		head = (xlog_op_header_t *)*ptr;
 		xlog_print_op_header(head, *i, ptr);
+		new_name_ptr = *ptr;
 		error = xlog_print_trans_attri_name(ptr,
 				be32_to_cpu(head->oh_len), "newname");
 		if (error)
@@ -764,6 +819,7 @@ xlog_print_trans_attri(
 		(*i)++;
 		head = (xlog_op_header_t *)*ptr;
 		xlog_print_op_header(head, *i, ptr);
+		value_ptr = *ptr;
 		error = xlog_print_trans_attri_value(ptr,
 				be32_to_cpu(head->oh_len), value_len, "value");
 		if (error)
@@ -775,12 +831,19 @@ xlog_print_trans_attri(
 		(*i)++;
 		head = (xlog_op_header_t *)*ptr;
 		xlog_print_op_header(head, *i, ptr);
+		new_value_ptr = *ptr;
 		error = xlog_print_trans_attri_value(ptr,
 				be32_to_cpu(head->oh_len), new_value_len,
 				"newvalue");
 		if (error)
 			goto error;
 	}
+
+	if (src_f->alfi_attr_filter & XFS_ATTR_PARENT)
+		dump_pptr_update(name_ptr, name_len,
+				 new_name_ptr, new_name_len,
+				 value_ptr, value_len,
+				 new_value_ptr, new_value_len);
 error:
 	free(src_f);
 
@@ -823,6 +886,10 @@ xlog_recover_print_attri(
 	struct xlog_recover_item	*item)
 {
 	struct xfs_attri_log_format	*f, *src_f = NULL;
+	void				*name_ptr = NULL;
+	void				*new_name_ptr = NULL;
+	void				*value_ptr = NULL;
+	void				*new_value_ptr = NULL;
 	uint				src_len, dst_len;
 	unsigned int			name_len = 0;
 	unsigned int			new_name_len = 0;
@@ -874,6 +941,7 @@ xlog_recover_print_attri(
 		printf(_("ATTRI:  name len:%u\n"), name_len);
 		print_or_dump((char *)item->ri_buf[region].i_addr,
 			       name_len);
+		name_ptr = item->ri_buf[region].i_addr;
 	}
 
 	if (new_name_len > 0) {
@@ -881,6 +949,7 @@ xlog_recover_print_attri(
 		printf(_("ATTRI:  newname len:%u\n"), new_name_len);
 		print_or_dump((char *)item->ri_buf[region].i_addr,
 			       new_name_len);
+		new_name_ptr = item->ri_buf[region].i_addr;
 	}
 
 	if (value_len > 0) {
@@ -889,6 +958,7 @@ xlog_recover_print_attri(
 		region++;
 		printf(_("ATTRI:  value len:%u\n"), value_len);
 		print_or_dump((char *)item->ri_buf[region].i_addr, len);
+		value_ptr = item->ri_buf[region].i_addr;
 	}
 
 	if (new_value_len > 0) {
@@ -897,8 +967,15 @@ xlog_recover_print_attri(
 		region++;
 		printf(_("ATTRI:  newvalue len:%u\n"), new_value_len);
 		print_or_dump((char *)item->ri_buf[region].i_addr, len);
+		new_value_ptr = item->ri_buf[region].i_addr;
 	}
 
+	if (src_f->alfi_attr_filter & XFS_ATTR_PARENT)
+		dump_pptr_update(name_ptr, name_len,
+				 new_name_ptr, new_name_len,
+				 value_ptr, value_len,
+				 new_value_ptr, new_value_len);
+
 out:
 	free(f);
 



* [PATCH 10/24] xfs_spaceman: report file paths
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-07-02  1:13   ` [PATCH 09/24] xfs_logprint: decode parent pointers in ATTRI items fully Darrick J. Wong
@ 2024-07-02  1:13   ` Darrick J. Wong
  2024-07-02  6:27     ` Christoph Hellwig
  2024-07-02  1:13   ` [PATCH 11/24] xfs_scrub: use parent pointers when possible to report file operations Darrick J. Wong
                     ` (13 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:13 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Teach the health command to report file paths when possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man8/xfs_spaceman.8 |    7 +++++-
 spaceman/Makefile       |   16 +++++++++++---
 spaceman/file.c         |    7 ++++++
 spaceman/health.c       |   53 ++++++++++++++++++++++++++++++++++++++---------
 spaceman/space.h        |    3 +++
 5 files changed, 71 insertions(+), 15 deletions(-)


diff --git a/man/man8/xfs_spaceman.8 b/man/man8/xfs_spaceman.8
index ece840d7300a..0d299132a788 100644
--- a/man/man8/xfs_spaceman.8
+++ b/man/man8/xfs_spaceman.8
@@ -91,7 +91,7 @@ The output will have the same format that
 .BR "xfs_info" "(8)"
 prints when querying a filesystem.
 .TP
-.BI "health [ \-a agno] [ \-c ] [ \-f ] [ \-i inum ] [ \-q ] [ paths ]"
+.BI "health [ \-a agno] [ \-c ] [ \-f ] [ \-i inum ] [ \-n ] [ \-q ] [ paths ]"
 Reports the health of the given group of filesystem metadata.
 .RS 1.0i
 .PD 0
@@ -111,6 +111,11 @@ Report on the health of metadata that affect the entire filesystem.
 .B \-i inum
 Report on the health of a specific inode.
 .TP
+.B \-n
+When reporting on the health of a file, report its full path instead of its
+inode number if the path can be determined.
+This option is disabled by default to minimize runtime.
+.TP
 .B \-q
 Report only unhealthy metadata.
 .TP
diff --git a/spaceman/Makefile b/spaceman/Makefile
index 1f048d54a4d2..358db9edf5cb 100644
--- a/spaceman/Makefile
+++ b/spaceman/Makefile
@@ -6,12 +6,20 @@ TOPDIR = ..
 include $(TOPDIR)/include/builddefs
 
 LTCOMMAND = xfs_spaceman
-HFILES = init.h space.h
-CFILES = info.c init.c file.c health.c prealloc.c trim.c
+HFILES = \
+	init.h \
+	space.h
+CFILES = \
+	file.c \
+	health.c \
+	info.c \
+	init.c \
+	prealloc.c \
+	trim.c
 LSRCFILES = xfs_info.sh
 
-LLDLIBS = $(LIBXCMD) $(LIBFROG)
-LTDEPENDENCIES = $(LIBXCMD) $(LIBFROG)
+LLDLIBS = $(LIBHANDLE) $(LIBXCMD) $(LIBFROG)
+LTDEPENDENCIES = $(LIBHANDLE) $(LIBXCMD) $(LIBFROG)
 LLDFLAGS = -static
 
 ifeq ($(ENABLE_EDITLINE),yes)
diff --git a/spaceman/file.c b/spaceman/file.c
index eec7ee9f4ba9..850688ace15d 100644
--- a/spaceman/file.c
+++ b/spaceman/file.c
@@ -14,6 +14,7 @@
 #include "libfrog/paths.h"
 #include "libfrog/fsgeom.h"
 #include "space.h"
+#include "handle.h"
 
 static cmdinfo_t print_cmd;
 
@@ -106,6 +107,12 @@ addfile(
 	file->name = filename;
 	memcpy(&file->xfd, xfd, sizeof(struct xfs_fd));
 	memcpy(&file->fs_path, fs_path, sizeof(file->fs_path));
+
+	/* Try to capture a fs handle for reporting paths. */
+	file->fshandle = NULL;
+	file->fshandle_len = 0;
+	path_to_fshandle(filename, &file->fshandle, &file->fshandle_len);
+
 	return 0;
 }
 
diff --git a/spaceman/health.c b/spaceman/health.c
index 88b12c0b0ea3..6722babf5888 100644
--- a/spaceman/health.c
+++ b/spaceman/health.c
@@ -13,11 +13,13 @@
 #include "libfrog/fsgeom.h"
 #include "libfrog/bulkstat.h"
 #include "space.h"
+#include "libfrog/getparents.h"
 
 static cmdinfo_t health_cmd;
 static unsigned long long reported;
 static bool comprehensive;
 static bool quiet;
+static bool report_paths;
 
 static bool has_realtime(const struct xfs_fsop_geom *g)
 {
@@ -265,6 +267,38 @@ report_file_health(
 
 #define BULKSTAT_NR		(128)
 
+static void
+report_inode(
+	const struct xfs_bulkstat	*bs)
+{
+	char				descr[PATH_MAX];
+	int				ret;
+
+	if (report_paths && file->fshandle &&
+	    (file->xfd.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_PARENT)) {
+		struct xfs_handle handle;
+
+		memcpy(&handle.ha_fsid, file->fshandle, sizeof(handle.ha_fsid));
+		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
+				sizeof(handle.ha_fid.fid_len);
+		handle.ha_fid.fid_pad = 0;
+		handle.ha_fid.fid_ino = bs->bs_ino;
+		handle.ha_fid.fid_gen = bs->bs_gen;
+
+		ret = handle_to_path(&handle, sizeof(struct xfs_handle), 0,
+				descr, sizeof(descr) - 1);
+		if (ret)
+			goto report_inum;
+
+		goto report_status;
+	}
+
+report_inum:
+	snprintf(descr, sizeof(descr) - 1, _("inode %"PRIu64), bs->bs_ino);
+report_status:
+	report_sick(descr, inode_flags, bs->bs_sick, bs->bs_checked);
+}
+
 /*
  * Report on all files' health for a given @agno.  If @agno is NULLAGNUMBER,
  * report on all files in the filesystem.
@@ -274,7 +308,6 @@ report_bulkstat_health(
 	xfs_agnumber_t		agno)
 {
 	struct xfs_bulkstat_req	*breq;
-	char			descr[256];
 	uint32_t		i;
 	int			error;
 
@@ -292,13 +325,8 @@ report_bulkstat_health(
 		error = -xfrog_bulkstat(&file->xfd, breq);
 		if (error)
 			break;
-		for (i = 0; i < breq->hdr.ocount; i++) {
-			snprintf(descr, sizeof(descr) - 1, _("inode %"PRIu64),
-					breq->bulkstat[i].bs_ino);
-			report_sick(descr, inode_flags,
-					breq->bulkstat[i].bs_sick,
-					breq->bulkstat[i].bs_checked);
-		}
+		for (i = 0; i < breq->hdr.ocount; i++)
+			report_inode(&breq->bulkstat[i]);
 	} while (breq->hdr.ocount > 0);
 
 	if (error)
@@ -308,7 +336,7 @@ report_bulkstat_health(
 	return error;
 }
 
-#define OPT_STRING ("a:cfi:q")
+#define OPT_STRING ("a:cfi:nq")
 
 /* Report on health problems in XFS filesystem. */
 static int
@@ -323,6 +351,7 @@ health_f(
 	int			ret;
 
 	reported = 0;
+	report_paths = false;
 
 	if (file->xfd.fsgeom.version != XFS_FSOP_GEOM_VERSION_V5) {
 		perror("health");
@@ -358,6 +387,9 @@ health_f(
 				return 1;
 			}
 			break;
+		case 'n':
+			report_paths = true;
+			break;
 		case 'q':
 			quiet = true;
 			break;
@@ -445,6 +477,7 @@ health_help(void)
 " -c       -- Report on the health of all inodes.\n"
 " -f       -- Report health of the overall filesystem.\n"
 " -i inum  -- Report health of a given inode number.\n"
+" -n       -- Try to report file names.\n"
 " -q       -- Only report unhealthy metadata.\n"
 " paths    -- Report health of the given file path.\n"
 "\n"));
@@ -456,7 +489,7 @@ static cmdinfo_t health_cmd = {
 	.cfunc = health_f,
 	.argmin = 0,
 	.argmax = -1,
-	.args = "[-a agno] [-c] [-f] [-i inum] [-q] [paths]",
+	.args = "[-a agno] [-c] [-f] [-i inum] [-n] [-q] [paths]",
 	.flags = CMD_FLAG_ONESHOT,
 	.help = health_help,
 };
diff --git a/spaceman/space.h b/spaceman/space.h
index 723209edd998..28fa35a30479 100644
--- a/spaceman/space.h
+++ b/spaceman/space.h
@@ -10,6 +10,9 @@ struct fileio {
 	struct xfs_fd	xfd;		/* XFS runtime support context */
 	struct fs_path	fs_path;	/* XFS path information */
 	char		*name;		/* file name at time of open */
+
+	void		*fshandle;
+	size_t		fshandle_len;
 };
 
 extern struct fileio	*filetable;	/* open file table */



* [PATCH 11/24] xfs_scrub: use parent pointers when possible to report file operations
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-07-02  1:13   ` [PATCH 10/24] xfs_spaceman: report file paths Darrick J. Wong
@ 2024-07-02  1:13   ` Darrick J. Wong
  2024-07-02  6:27     ` Christoph Hellwig
  2024-07-02  1:13   ` [PATCH 12/24] xfs_scrub: use parent pointers to report lost file data Darrick J. Wong
                     ` (12 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:13 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

If parent pointers are available, use them to supply file paths when
doing things to files, instead of merely printing the inode number.
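
The handle setup in this patch copies the filesystem id from the fs handle
and then fills in the fid by hand.  Below is a standalone sketch of that step.
The sketch_* structs are hypothetical mirrors of xfs_fsid_t, xfs_fid_t and
struct xfs_handle from xfs_fs.h.  They are written out only so the example
compiles on its own, and their layout here is an assumption:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed mirrors of the xfs_fs.h handle types (illustration only). */
struct sketch_fsid {
	uint32_t	val[2];
};

struct sketch_fid {
	uint16_t	fid_len;	/* bytes that follow this field */
	uint16_t	fid_pad;
	uint32_t	fid_gen;
	uint64_t	fid_ino;
};

struct sketch_handle {
	struct sketch_fsid	ha_fsid;
	struct sketch_fid	ha_fid;
};

/* Build a handle for (@ino, @gen) on the fs described by @fshandle. */
static void
sketch_make_handle(
	struct sketch_handle	*h,
	const void		*fshandle,
	uint64_t		ino,
	uint32_t		gen)
{
	assert(h != NULL && fshandle != NULL);

	/* The fs handle begins with the filesystem id. */
	memcpy(&h->ha_fsid, fshandle, sizeof(h->ha_fsid));
	h->ha_fid.fid_len = sizeof(struct sketch_fid) -
			sizeof(h->ha_fid.fid_len);
	h->ha_fid.fid_pad = 0;
	h->ha_fid.fid_ino = ino;
	h->ha_fid.fid_gen = gen;
}
```

The scrub code then falls back to the inode number whenever handle_to_path()
fails or the resulting path would leave fewer than 16 bytes for the error
description.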

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/common.c |   41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)


diff --git a/scrub/common.c b/scrub/common.c
index aca596487113..f86546556f46 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -9,6 +9,7 @@
 #include <syslog.h>
 #include "platform_defs.h"
 #include "libfrog/paths.h"
+#include "libfrog/getparents.h"
 #include "xfs_scrub.h"
 #include "common.h"
 #include "progress.h"
@@ -405,19 +406,55 @@ scrub_render_ino_descr(
 	...)
 {
 	va_list			args;
+	size_t			pathlen = 0;
 	uint32_t		agno;
 	uint32_t		agino;
 	int			ret;
 
+	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_PARENT) {
+		struct xfs_handle handle;
+
+		memcpy(&handle.ha_fsid, ctx->fshandle, sizeof(handle.ha_fsid));
+		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
+				sizeof(handle.ha_fid.fid_len);
+		handle.ha_fid.fid_pad = 0;
+		handle.ha_fid.fid_ino = ino;
+		handle.ha_fid.fid_gen = gen;
+
+		ret = handle_to_path(&handle, sizeof(struct xfs_handle), 4096,
+				buf, buflen);
+		if (ret)
+			goto report_inum;
+
+		/*
+		 * Leave at least 16 bytes for the description of what went
+		 * wrong.  If we can't do that, we'll use the inode number.
+		 */
+		pathlen = strlen(buf);
+		if (pathlen >= buflen - 16)
+			goto report_inum;
+
+		if (format) {
+			buf[pathlen] = ' ';
+			buf[pathlen + 1] = 0;
+			pathlen++;
+		}
+
+		goto report_format;
+	}
+
+report_inum:
 	agno = cvt_ino_to_agno(&ctx->mnt, ino);
 	agino = cvt_ino_to_agino(&ctx->mnt, ino);
 	ret = snprintf(buf, buflen, _("inode %"PRIu64" (%"PRIu32"/%"PRIu32")%s"),
 			ino, agno, agino, format ? " " : "");
 	if (ret < 0 || ret >= buflen || format == NULL)
 		return ret;
+	pathlen = ret;
 
+report_format:
 	va_start(args, format);
-	ret += vsnprintf(buf + ret, buflen - ret, format, args);
+	pathlen += vsnprintf(buf + pathlen, buflen - pathlen, format, args);
 	va_end(args);
-	return ret;
+	return pathlen;
 }



* [PATCH 12/24] xfs_scrub: use parent pointers to report lost file data
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-07-02  1:13   ` [PATCH 11/24] xfs_scrub: use parent pointers when possible to report file operations Darrick J. Wong
@ 2024-07-02  1:13   ` Darrick J. Wong
  2024-07-02  6:28     ` Christoph Hellwig
  2024-07-02  1:14   ` [PATCH 13/24] xfs_db: report parent pointers in version command Darrick J. Wong
                     ` (11 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:13 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

If parent pointers are enabled, compute the path to the file while we're
doing the fsmap scan and report that, instead of walking the entire
directory tree to print the paths of the (hopefully few) files that lost
data.
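
The offset arithmetic in report_ioerr_fsmap() is small enough to sketch on
its own.  The error's offset is measured from the physical start of the
mapping, and it is clamped to zero when the bad range begins before the
mapping.  sketch_err_off() is a hypothetical helper written only for
illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a media error within a mapping that starts at @map_physical. */
static uint64_t
sketch_err_off(uint64_t err_physical, uint64_t map_physical)
{
	/* Errors that start before the mapping are reported at offset 0. */
	if (err_physical > map_physical)
		return err_physical - map_physical;
	return 0;
}
```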

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase6.c |   75 +++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 63 insertions(+), 12 deletions(-)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index 193d3b4e9083..a61853019e29 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -22,6 +22,7 @@
 #include "spacemap.h"
 #include "vfs.h"
 #include "common.h"
+#include "libfrog/bulkstat.h"
 
 /*
  * Phase 6: Verify data file integrity.
@@ -381,6 +382,24 @@ report_dirent_loss(
 	return error;
 }
 
+struct ioerr_filerange {
+	uint64_t		physical;
+	uint64_t		length;
+};
+
+/*
+ * If reverse mapping and parent pointers are enabled, we can map media errors
+ * directly back to a filename and a file position without needing to walk the
+ * directory tree.
+ */
+static inline bool
+can_use_pptrs(
+	const struct scrub_ctx	*ctx)
+{
+	return  (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_PARENT) &&
+		(ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_RMAPBT);
+}
+
 /* Use a fsmap to report metadata lost to a media error. */
 static int
 report_ioerr_fsmap(
@@ -389,16 +408,18 @@ report_ioerr_fsmap(
 	void			*arg)
 {
 	const char		*type;
+	struct xfs_bulkstat	bs = { };
 	char			buf[DESCR_BUFSZ];
-	uint64_t		err_physical = *(uint64_t *)arg;
+	struct ioerr_filerange	*fr = arg;
 	uint64_t		err_off;
+	int			ret;
 
 	/* Don't care about unwritten extents. */
 	if (map->fmr_flags & FMR_OF_PREALLOC)
 		return 0;
 
-	if (err_physical > map->fmr_physical)
-		err_off = err_physical - map->fmr_physical;
+	if (fr->physical > map->fmr_physical)
+		err_off = fr->physical - map->fmr_physical;
 	else
 		err_off = 0;
 
@@ -421,23 +442,43 @@ report_ioerr_fsmap(
 		}
 	}
 
+	if (can_use_pptrs(ctx)) {
+		ret = -xfrog_bulkstat_single(&ctx->mnt, map->fmr_owner, 0, &bs);
+		if (ret)
+			str_liberror(ctx, ret,
+					_("bulkstat for media error report"));
+	}
+
 	/* Report extent maps */
 	if (map->fmr_flags & FMR_OF_EXTENT_MAP) {
 		bool		attr = (map->fmr_flags & FMR_OF_ATTR_FORK);
 
 		scrub_render_ino_descr(ctx, buf, DESCR_BUFSZ,
-				map->fmr_owner, 0, " %s",
+				map->fmr_owner, bs.bs_gen, " %s",
 				attr ? _("extended attribute") :
 				       _("file data"));
 		str_corrupt(ctx, buf, _("media error in extent map"));
 	}
 
 	/*
-	 * XXX: If we had a getparent() call we could report IO errors
-	 * efficiently.  Until then, we'll have to scan the dir tree
-	 * to find the bad file's pathname.
+	 * If directory parent pointers are available, use that to find the
+	 * pathname to a file, and report that path as having lost its
+	 * extended attributes, or the precise offset of the lost file data.
 	 */
+	if (!can_use_pptrs(ctx))
+		return 0;
 
+	scrub_render_ino_descr(ctx, buf, DESCR_BUFSZ, map->fmr_owner,
+			bs.bs_gen, NULL);
+
+	if (map->fmr_flags & FMR_OF_ATTR_FORK) {
+		str_corrupt(ctx, buf, _("media error in extended attributes"));
+		return 0;
+	}
+
+	str_unfixable_error(ctx, buf,
+ _("media error at data offset %llu length %llu."),
+			err_off, fr->length);
 	return 0;
 }
 
@@ -452,6 +493,10 @@ report_ioerr(
 	void				*arg)
 {
 	struct fsmap			keys[2];
+	struct ioerr_filerange		fr = {
+		.physical		= start,
+		.length			= length,
+	};
 	struct disk_ioerr_report	*dioerr = arg;
 	dev_t				dev;
 
@@ -467,7 +512,7 @@ report_ioerr(
 	(keys + 1)->fmr_offset = ULLONG_MAX;
 	(keys + 1)->fmr_flags = UINT_MAX;
 	return -scrub_iterate_fsmap(dioerr->ctx, keys, report_ioerr_fsmap,
-			&start);
+			&fr);
 }
 
 /* Report all the media errors found on a disk. */
@@ -511,10 +556,16 @@ report_all_media_errors(
 		return ret;
 	}
 
-	/* Scan the directory tree to get file paths. */
-	ret = scan_fs_tree(ctx, report_dir_loss, report_dirent_loss, vs);
-	if (ret)
-		return ret;
+	/*
+	 * Scan the directory tree to get file paths if we didn't already use
+	 * directory parent pointers to report the loss.
+	 */
+	if (!can_use_pptrs(ctx)) {
+		ret = scan_fs_tree(ctx, report_dir_loss, report_dirent_loss,
+				vs);
+		if (ret)
+			return ret;
+	}
 
 	/* Scan for unlinked files. */
 	return scrub_scan_all_inodes(ctx, report_inode_loss, vs);

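This patch adds two small decisions to report_ioerr_fsmap(). It takes the parent pointer path only when the geometry advertises both parent pointers and the reverse-mapping btree. It also measures the bad range from the start of the fsmap record, clamping to zero when the error begins before the mapping does. Here is a minimal sketch of that logic in isolation. The SKETCH_GEOM_* bits and sketch_* helpers are hypothetical stand-ins, not the real XFS_FSOP_GEOM_FLAGS_* values or xfs_scrub functions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical placeholder bits; the real code tests
 * XFS_FSOP_GEOM_FLAGS_PARENT and XFS_FSOP_GEOM_FLAGS_RMAPBT.
 */
#define SKETCH_GEOM_RMAPBT	(1u << 0)
#define SKETCH_GEOM_PARENT	(1u << 1)

/* Both features are required: rmap finds the owner, pptrs name it. */
static bool sketch_can_use_pptrs(uint32_t geom_flags)
{
	return (geom_flags & SKETCH_GEOM_PARENT) &&
	       (geom_flags & SKETCH_GEOM_RMAPBT);
}

/*
 * Distance of the bad range's start from the start of the mapping,
 * clamped to zero when the error begins before the mapping.
 */
static uint64_t sketch_err_off(uint64_t err_physical, uint64_t map_physical)
{
	return err_physical > map_physical ? err_physical - map_physical : 0;
}
```

With this structure, the full directory tree walk in report_all_media_errors() is needed only when the first predicate is false. That is the fallback the last hunk preserves.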

* [PATCH 13/24] xfs_db: report parent pointers in version command
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (11 preceding siblings ...)
  2024-07-02  1:13   ` [PATCH 12/24] xfs_scrub: use parent pointers to report lost file data Darrick J. Wong
@ 2024-07-02  1:14   ` Darrick J. Wong
  2024-07-02  6:28     ` Christoph Hellwig
  2024-07-02  1:14   ` [PATCH 14/24] xfs_db: report parent bit on xattrs Darrick J. Wong
                     ` (10 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:14 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Report the presence of parent pointers (the PARENT feature flag) in the
output of the version subcommand.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/sb.c |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/db/sb.c b/db/sb.c
index c39011634198..7836384a1faa 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -708,6 +708,8 @@ version_string(
 		strcat(s, ",NREXT64");
 	if (xfs_has_exchange_range(mp))
 		strcat(s, ",EXCHANGE");
+	if (xfs_has_parent(mp))
+		strcat(s, ",PARENT");
 	return s;
 }
 

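The version command builds its feature list by appending comma-prefixed tokens, so ",PARENT" now follows ",EXCHANGE" when both features are present. A script or test that inspects this output should therefore match whole tokens rather than substrings. Here is a minimal sketch under that assumption. has_feature_token() is a hypothetical helper, not part of xfs_db, and the sample strings in the checks are illustrative, not captured output.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `flag` appears as a whole comma-separated token in `feats`. */
static bool has_feature_token(const char *feats, const char *flag)
{
	size_t flen = strlen(flag);
	const char *p = feats;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t tlen = end ? (size_t)(end - p) : strlen(p);

		/* Compare the length first so "PARENTX" never matches "PARENT". */
		if (tlen == flen && !strncmp(p, flag, flen))
			return true;
		if (!end)
			break;
		p = end + 1;
	}
	return false;
}
```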

* [PATCH 14/24] xfs_db: report parent bit on xattrs
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (12 preceding siblings ...)
  2024-07-02  1:14   ` [PATCH 13/24] xfs_db: report parent pointers in version command Darrick J. Wong
@ 2024-07-02  1:14   ` Darrick J. Wong
  2024-07-02  6:28     ` Christoph Hellwig
  2024-07-02  1:14   ` [PATCH 15/24] xfs_db: report parent pointers embedded in xattrs Darrick J. Wong
                     ` (9 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:14 UTC (permalink / raw)
  To: djwong, cem
  Cc: Allison Henderson, catherine.hoang, linux-xfs, allison.henderson,
	hch

From: Darrick J. Wong <djwong@kernel.org>

Display the parent bit on xattr keys

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 db/attr.c      |    3 +++
 db/attrshort.c |    3 +++
 2 files changed, 6 insertions(+)


diff --git a/db/attr.c b/db/attr.c
index de68d6276337..3252f388614e 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -82,6 +82,9 @@ const field_t	attr_leaf_entry_flds[] = {
 	{ "local", FLDT_UINT1,
 	  OI(LEOFF(flags) + bitsz(uint8_t) - XFS_ATTR_LOCAL_BIT - 1), C1, 0,
 	  TYP_NONE },
+	{ "parent", FLDT_UINT1,
+	  OI(LEOFF(flags) + bitsz(uint8_t) - XFS_ATTR_PARENT_BIT - 1), C1, 0,
+	  TYP_NONE },
 	{ "pad2", FLDT_UINT8X, OI(LEOFF(pad2)), C1, FLD_SKIPALL, TYP_NONE },
 	{ NULL }
 };
diff --git a/db/attrshort.c b/db/attrshort.c
index 7c386d46f88f..978f58d67a7b 100644
--- a/db/attrshort.c
+++ b/db/attrshort.c
@@ -43,6 +43,9 @@ const field_t	attr_sf_entry_flds[] = {
 	{ "secure", FLDT_UINT1,
 	  OI(EOFF(flags) + bitsz(uint8_t) - XFS_ATTR_SECURE_BIT - 1), C1, 0,
 	  TYP_NONE },
+	{ "parent", FLDT_UINT1,
+	  OI(EOFF(flags) + bitsz(uint8_t) - XFS_ATTR_PARENT_BIT - 1), C1, 0,
+	  TYP_NONE },
 	{ "name", FLDT_CHARNS, OI(EOFF(nameval)), attr_sf_entry_name_count,
 	  FLD_COUNT, TYP_NONE },
 	{ "value", FLDT_CHARNS, attr_sf_entry_value_offset,

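Each one-bit flag field is placed at `EOFF(flags) + bitsz(uint8_t) - XFS_ATTR_PARENT_BIT - 1` (or LEOFF for leaf entries). The formula implies that xfs_db counts field bit offsets from the most significant bit, while the XFS_ATTR_*_BIT constants count from the least significant bit of the flags byte. Here is a minimal sketch of that conversion. The helper names are hypothetical, a plain byte offset stands in for the EOFF()/LEOFF() macros, and bit 3 in the checks is just an example value (0x08 has only LSB-numbered bit 3 set).

```c
#include <stdint.h>

/*
 * MSB-first bit offset, from the start of the record, of LSB-numbered
 * bit `bit` in the flags byte located at byte offset `byte_off`.
 */
static unsigned int sketch_flag_bitoff(unsigned int byte_off, unsigned int bit)
{
	return byte_off * 8 + 8 - bit - 1;
}

/* Read a single bit from `rec` using MSB-first bit numbering. */
static unsigned int sketch_get_bit(const uint8_t *rec, unsigned int bitoff)
{
	return (rec[bitoff / 8] >> (7 - bitoff % 8)) & 1u;
}
```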

* [PATCH 15/24] xfs_db: report parent pointers embedded in xattrs
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (13 preceding siblings ...)
  2024-07-02  1:14   ` [PATCH 14/24] xfs_db: report parent bit on xattrs Darrick J. Wong
@ 2024-07-02  1:14   ` Darrick J. Wong
  2024-07-02  6:28     ` Christoph Hellwig
  2024-07-02  1:14   ` [PATCH 16/24] xfs_db: obfuscate dirent and parent pointer names consistently Darrick J. Wong
                     ` (8 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:14 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Decode the parent pointer inode, generation, and name fields if the
parent pointer passes basic validation checks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/attr.c      |   28 ++++++++++++++++++++++++++++
 db/attrshort.c |   24 ++++++++++++++++++++++++
 db/field.c     |   10 ++++++++++
 db/field.h     |    3 +++
 4 files changed, 65 insertions(+)


diff --git a/db/attr.c b/db/attr.c
index 3252f388614e..3b556c43def5 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -33,6 +33,8 @@ static int	attr_remote_data_count(void *obj, int startoff);
 static int	attr3_remote_hdr_count(void *obj, int startoff);
 static int	attr3_remote_data_count(void *obj, int startoff);
 
+static int	attr_leaf_value_pptr_count(void *obj, int startoff);
+
 const field_t	attr_hfld[] = {
 	{ "", FLDT_ATTR, OI(0), C1, 0, TYP_NONE },
 	{ NULL }
@@ -118,6 +120,8 @@ const field_t	attr_leaf_name_flds[] = {
 	  attr_leaf_name_local_count, FLD_COUNT, TYP_NONE },
 	{ "name", FLDT_CHARNS, OI(LNOFF(nameval)),
 	  attr_leaf_name_local_name_count, FLD_COUNT, TYP_NONE },
+	{ "parent_dir", FLDT_PARENT_REC, attr_leaf_name_local_value_offset,
+	  attr_leaf_value_pptr_count, FLD_COUNT | FLD_OFFSET, TYP_NONE },
 	{ "value", FLDT_CHARNS, attr_leaf_name_local_value_offset,
 	  attr_leaf_name_local_value_count, FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ "valueblk", FLDT_UINT32X, OI(LVOFF(valueblk)),
@@ -307,6 +311,8 @@ __attr_leaf_name_local_value_count(
 
 	if (!(e->flags & XFS_ATTR_LOCAL))
 		return 0;
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_PARENT)
+		return 0;
 
 	l = xfs_attr3_leaf_name_local(leaf, i);
 	return be16_to_cpu(l->valuelen);
@@ -514,6 +520,28 @@ attr3_remote_hdr_count(
 	return be32_to_cpu(node->rm_magic) == XFS_ATTR3_RMT_MAGIC;
 }
 
+static int
+__leaf_pptr_count(
+	struct xfs_attr_leafblock	*leaf,
+	struct xfs_attr_leaf_entry      *e,
+	int				i)
+{
+	if (!(e->flags & XFS_ATTR_LOCAL))
+		return 0;
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) != XFS_ATTR_PARENT)
+		return 0;
+
+	return 1;
+}
+
+static int
+attr_leaf_value_pptr_count(
+	void				*obj,
+	int				startoff)
+{
+	return attr_leaf_entry_walk(obj, startoff, __leaf_pptr_count);
+}
+
 int
 attr_size(
 	void	*obj,
diff --git a/db/attrshort.c b/db/attrshort.c
index 978f58d67a7b..7e5c94ca533d 100644
--- a/db/attrshort.c
+++ b/db/attrshort.c
@@ -18,6 +18,8 @@ static int	attr_sf_entry_value_offset(void *obj, int startoff, int idx);
 static int	attr_shortform_list_count(void *obj, int startoff);
 static int	attr_shortform_list_offset(void *obj, int startoff, int idx);
 
+static int	attr_sf_entry_pptr_count(void *obj, int startoff);
+
 const field_t	attr_shortform_flds[] = {
 	{ "hdr", FLDT_ATTR_SF_HDR, OI(0), C1, 0, TYP_NONE },
 	{ "list", FLDT_ATTR_SF_ENTRY, attr_shortform_list_offset,
@@ -48,6 +50,8 @@ const field_t	attr_sf_entry_flds[] = {
 	  TYP_NONE },
 	{ "name", FLDT_CHARNS, OI(EOFF(nameval)), attr_sf_entry_name_count,
 	  FLD_COUNT, TYP_NONE },
+	{ "parent_dir", FLDT_PARENT_REC, attr_sf_entry_value_offset,
+	  attr_sf_entry_pptr_count, FLD_COUNT | FLD_OFFSET, TYP_NONE },
 	{ "value", FLDT_CHARNS, attr_sf_entry_value_offset,
 	  attr_sf_entry_value_count, FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ NULL }
@@ -92,6 +96,10 @@ attr_sf_entry_value_count(
 
 	ASSERT(bitoffs(startoff) == 0);
 	e = (struct xfs_attr_sf_entry *)((char *)obj + byteize(startoff));
+
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_PARENT)
+		return 0;
+
 	return e->valuelen;
 }
 
@@ -159,3 +167,19 @@ attrshort_size(
 		e = xfs_attr_sf_nextentry(e);
 	return bitize((int)((char *)e - (char *)hdr));
 }
+
+static int
+attr_sf_entry_pptr_count(
+	void				*obj,
+	int				startoff)
+{
+	struct xfs_attr_sf_entry	*e;
+
+	ASSERT(bitoffs(startoff) == 0);
+	e = (struct xfs_attr_sf_entry *)((char *)obj + byteize(startoff));
+
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) != XFS_ATTR_PARENT)
+		return 0;
+
+	return 1;
+}
diff --git a/db/field.c b/db/field.c
index a3e47ee81ccf..a61ccc9ef6d0 100644
--- a/db/field.c
+++ b/db/field.c
@@ -24,6 +24,14 @@
 #include "dir2sf.h"
 #include "symlink.h"
 
+#define	PPOFF(f)	bitize(offsetof(struct xfs_parent_rec, f))
+const field_t		parent_flds[] = {
+	{ "inumber", FLDT_INO, OI(PPOFF(p_ino)), C1, 0, TYP_INODE },
+	{ "gen", FLDT_UINT32D, OI(PPOFF(p_gen)), C1, 0, TYP_NONE },
+	{ NULL }
+};
+#undef PPOFF
+
 const ftattr_t	ftattrtab[] = {
 	{ FLDT_AGBLOCK, "agblock", fp_num, "%u", SI(bitsz(xfs_agblock_t)),
 	  FTARG_DONULL, fa_agblock, NULL },
@@ -384,6 +392,8 @@ const ftattr_t	ftattrtab[] = {
 	{ FLDT_UINT8X, "uint8x", fp_num, "%#x", SI(bitsz(uint8_t)), 0, NULL,
 	  NULL },
 	{ FLDT_UUID, "uuid", fp_uuid, NULL, SI(bitsz(uuid_t)), 0, NULL, NULL },
+	{ FLDT_PARENT_REC, "parent", NULL, (char *)parent_flds,
+	  SI(bitsz(struct xfs_parent_rec)), 0, NULL, parent_flds },
 	{ FLDT_ZZZ, NULL }
 };
 
diff --git a/db/field.h b/db/field.h
index 634742a572c8..b1bfdbed19ce 100644
--- a/db/field.h
+++ b/db/field.h
@@ -187,6 +187,9 @@ typedef enum fldt	{
 	FLDT_UINT8O,
 	FLDT_UINT8X,
 	FLDT_UUID,
+
+	FLDT_PARENT_REC,
+
 	FLDT_ZZZ			/* mark last entry */
 } fldt_t;
 

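The new "parent_dir" field is counted only when an entry's namespace bits are exactly XFS_ATTR_PARENT. In that case the raw "value" field is suppressed, and FLDT_PARENT_REC prints the record as an inode number and a generation. Here is a minimal sketch of that split and of the decode, for shortform entries only (the leaf version also requires XFS_ATTR_LOCAL). It assumes the record is a big-endian 64-bit p_ino followed by a big-endian 32-bit p_gen with no padding, 12 bytes in all. The SKETCH_ATTR_* masks are hypothetical placeholders for XFS_ATTR_PARENT and XFS_ATTR_NSP_ONDISK_MASK, and the sketch_* names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholders for XFS_ATTR_PARENT / XFS_ATTR_NSP_ONDISK_MASK. */
#define SKETCH_ATTR_PARENT	(1u << 3)
#define SKETCH_ATTR_NSP_MASK	((1u << 1) | (1u << 2) | (1u << 3))

/* Decoded form of one parent pointer value. */
struct sketch_pptr_rec {
	uint64_t	ino;	/* parent directory inode number */
	uint32_t	gen;	/* parent directory generation */
};

/* Load big-endian integers byte by byte, independent of host endianness. */
static uint64_t sketch_load_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

static uint32_t sketch_load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a parent pointer value; reject anything that is not 12 bytes. */
static int sketch_decode_pptr(const uint8_t *value, size_t valuelen,
			      struct sketch_pptr_rec *rec)
{
	if (valuelen != 12)
		return -1;
	rec->ino = sketch_load_be64(value);
	rec->gen = sketch_load_be32(value + 8);
	return 0;
}

/*
 * A pptr entry shows one decoded record and no raw value bytes; any other
 * entry shows its raw value bytes and no decoded record.
 */
static void sketch_field_counts(unsigned int flags, unsigned int valuelen,
				int *pptr_count, int *value_count)
{
	bool is_pptr = (flags & SKETCH_ATTR_NSP_MASK) == SKETCH_ATTR_PARENT;

	*pptr_count = is_pptr ? 1 : 0;
	*value_count = is_pptr ? 0 : (int)valuelen;
}
```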

* [PATCH 16/24] xfs_db: obfuscate dirent and parent pointer names consistently
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (14 preceding siblings ...)
  2024-07-02  1:14   ` [PATCH 15/24] xfs_db: report parent pointers embedded in xattrs Darrick J. Wong
@ 2024-07-02  1:14   ` Darrick J. Wong
  2024-07-02  6:33     ` Christoph Hellwig
  2024-07-02  1:15   ` [PATCH 17/24] libxfs: export attr3_leaf_hdr_from_disk via libxfs_api_defs.h Darrick J. Wong
                     ` (7 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:14 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

When someone wants to perform an obfuscated metadump of a filesystem
where parent pointers are enabled, we have to use the *exact* same
obfuscated name for both the directory entry and the parent pointer.

Create a name remapping table so that when we obfuscate a dirent name or
a parent pointer name, we can apply the same obfuscation when we find
the corresponding parent pointer or dirent.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/metadump.c            |  310 ++++++++++++++++++++++++++++++++++++++++++++--
 libxfs/libxfs_api_defs.h |    1 
 2 files changed, 299 insertions(+), 12 deletions(-)


diff --git a/db/metadump.c b/db/metadump.c
index c1bf5d002751..e95238fb0471 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -21,6 +21,14 @@
 #include "dir2.h"
 #include "obfuscate.h"
 
+#undef REMAP_DEBUG
+
+#ifdef REMAP_DEBUG
+# define remap_debug		printf
+#else
+# define remap_debug(...)	((void)0)
+#endif
+
 #define DEFAULT_MAX_EXT_SIZE	XFS_MAX_BMBT_EXTLEN
 
 /* copy all metadata structures to/from a file */
@@ -719,6 +727,111 @@ nametable_add(xfs_dahash_t hash, int namelen, unsigned char *name)
 	return ent;
 }
 
+/*
+ * Obfuscated name remapping table for parent pointer-enabled filesystems.
+ * When this feature is enabled, we have to maintain consistency between the
+ * names that appear in the dirent and the corresponding parent pointer.
+ */
+
+struct remap_ent {
+	struct remap_ent	*next;
+	xfs_ino_t		dir_ino;
+	xfs_dahash_t		namehash;
+	uint8_t			namelen;
+
+	uint8_t			names[];
+};
+
+static inline uint8_t *remap_ent_before(struct remap_ent *ent)
+{
+	return &ent->names[0];
+}
+
+static inline uint8_t *remap_ent_after(struct remap_ent *ent)
+{
+	return &ent->names[ent->namelen];
+}
+
+#define REMAP_TABLE_SIZE		4096
+
+static struct remap_ent		*remaptable[REMAP_TABLE_SIZE];
+
+static void
+remaptable_clear(void)
+{
+	int			i;
+	struct remap_ent	*ent, *next;
+
+	for (i = 0; i < REMAP_TABLE_SIZE; i++) {
+		ent = remaptable[i];
+
+		while (ent) {
+			next = ent->next;
+			free(ent);
+			ent = next;
+		}
+	}
+}
+
+/* Try to find a remapping table entry. */
+static struct remap_ent *
+remaptable_find(
+	xfs_ino_t		dir_ino,
+	xfs_dahash_t		namehash,
+	const unsigned char	*name,
+	unsigned int		namelen)
+{
+	struct remap_ent	*ent = remaptable[namehash % REMAP_TABLE_SIZE];
+
+	remap_debug("REMAP FIND: 0x%lx hash 0x%x '%.*s'\n",
+			dir_ino, namehash, namelen, name);
+
+	while (ent) {
+		remap_debug("REMAP ENT: 0x%lx hash 0x%x '%.*s'\n",
+				ent->dir_ino, ent->namehash, ent->namelen,
+				remap_ent_before(ent));
+
+		if (ent->dir_ino == dir_ino &&
+		    ent->namehash == namehash &&
+		    ent->namelen == namelen &&
+		    !memcmp(remap_ent_before(ent), name, namelen))
+			return ent;
+		ent = ent->next;
+	}
+
+	return NULL;
+}
+
+/* Remember the remapping for a particular dirent that we obfuscated. */
+static struct remap_ent *
+remaptable_add(
+	xfs_ino_t		dir_ino,
+	xfs_dahash_t		namehash,
+	const unsigned char	*old_name,
+	unsigned int		namelen,
+	const unsigned char	*new_name)
+{
+	struct remap_ent	*ent;
+
+	ent = malloc(sizeof(struct remap_ent) + (namelen * 2));
+	if (!ent)
+		return NULL;
+
+	ent->dir_ino = dir_ino;
+	ent->namehash = namehash;
+	ent->namelen = namelen;
+	memcpy(remap_ent_before(ent), old_name, namelen);
+	memcpy(remap_ent_after(ent), new_name, namelen);
+	ent->next = remaptable[namehash % REMAP_TABLE_SIZE];
+
+	remaptable[namehash % REMAP_TABLE_SIZE] = ent;
+
+	remap_debug("REMAP ADD: 0x%lx hash 0x%x '%.*s' -> '%.*s'\n",
+			dir_ino, namehash, namelen, old_name, namelen,
+			new_name);
+	return ent;
+}
+
 #define	ORPHANAGE	"lost+found"
 #define	ORPHANAGE_LEN	(sizeof (ORPHANAGE) - 1)
 
@@ -844,6 +957,7 @@ generate_obfuscated_name(
 	int			namelen,
 	unsigned char		*name)
 {
+	unsigned char		*orig_name = NULL;
 	xfs_dahash_t		hash;
 
 	/*
@@ -865,8 +979,37 @@ generate_obfuscated_name(
 		name++;
 
 	/* Obfuscate the name (if possible) */
-
 	hash = dirattr_hashname(ino != 0, name, namelen);
+
+	/*
+	 * If we're obfuscating a dirent name on a pptrs filesystem, see if we
+	 * already processed the parent pointer and use the same name.
+	 */
+	if (xfs_has_parent(mp) && ino) {
+		struct remap_ent	*remap;
+
+		remap = remaptable_find(metadump.cur_ino, hash, name, namelen);
+		if (remap) {
+			remap_debug("found obfuscated dir 0x%lx '%.*s' -> 0x%lx -> '%.*s' \n",
+					metadump.cur_ino, namelen,
+					remap_ent_before(remap), ino, namelen,
+					remap_ent_after(remap));
+			memcpy(name, remap_ent_after(remap), namelen);
+			return;
+		}
+
+		/*
+		 * If we haven't processed this dirent name before, save the
+		 * old name for a remap table entry.  Obfuscate the name.
+		 */
+		orig_name = malloc(namelen);
+		if (!orig_name) {
+			orig_name = name;
+			goto add_remap;
+		}
+		memcpy(orig_name, name, namelen);
+	}
+
 	obfuscate_name(hash, namelen, name, ino != 0);
 	ASSERT(hash == dirattr_hashname(ino != 0, name, namelen));
 
@@ -891,6 +1034,26 @@ generate_obfuscated_name(
 				"in dir inode %llu\n",
 			(unsigned long long) ino,
 			(unsigned long long) metadump.cur_ino);
+
+	/*
+	 * We've obfuscated a name in the directory entry.  Remember this
+	 * remapping for when we come across the parent pointer later.
+	 */
+	if (!orig_name)
+		return;
+
+add_remap:
+	remap_debug("obfuscating dir 0x%lx '%.*s' -> 0x%lx -> '%.*s' \n",
+			metadump.cur_ino, namelen, orig_name, ino, namelen,
+			name);
+
+	if (!remaptable_add(metadump.cur_ino, hash, orig_name, namelen, name))
+		print_warning("unable to record remapped dirent name for inode %llu "
+				"in dir inode %llu\n",
+			(unsigned long long) ino,
+			(unsigned long long) metadump.cur_ino);
+	if (orig_name && orig_name != name)
+		free(orig_name);
 }
 
 static void
@@ -1026,6 +1189,106 @@ process_sf_symlink(
 		memset(&buf[len], 0, XFS_DFORK_DSIZE(dip, mp) - len);
 }
 
+/*
+ * Decide if we want to obfuscate this parent pointer.  If we do, either find
+ * the obfuscated name that we created when we scanned the corresponding dirent
+ * and replace the name with that; or create a new obfuscated name for later
+ * use.
+ */
+static void
+maybe_obfuscate_pptr(
+	unsigned int			attr_flags,
+	uint8_t				*name,
+	int				namelen,
+	const void			*value,
+	int				valuelen)
+{
+	unsigned char			old_name[MAXNAMELEN];
+	struct remap_ent		*remap;
+	xfs_dahash_t			hash;
+	xfs_ino_t			child_ino = metadump.cur_ino;
+	xfs_ino_t			parent_ino;
+	int				error;
+
+	if (!metadump.obfuscate)
+		return;
+
+	if (!(attr_flags & XFS_ATTR_PARENT))
+		return;
+
+	/* Do not obfuscate this parent pointer if it fails basic checks. */
+	error = -libxfs_parent_from_attr(mp, attr_flags, name, namelen, value,
+			valuelen, &parent_ino, NULL);
+	if (error)
+		return;
+	memcpy(old_name, name, namelen);
+
+	/*
+	 * We don't obfuscate "lost+found" or any orphan files therein.  When
+	 * the name table is used for extended attributes, the inode number
+	 * provided is 0, in which case we don't need to make this check.
+	 */
+	metadump.cur_ino = parent_ino;
+	if (in_lost_found(child_ino, namelen, name)) {
+		metadump.cur_ino = child_ino;
+		return;
+	}
+	metadump.cur_ino = child_ino;
+
+	hash = dirattr_hashname(true, name, namelen);
+
+	/*
+	 * If we already processed the dirent, use the same name for the parent
+	 * pointer.
+	 */
+	remap = remaptable_find(parent_ino, hash, name, namelen);
+	if (remap) {
+		remap_debug(
+ "found obfuscated pptr 0x%lx '%.*s' -> 0x%lx -> '%.*s' \n",
+				parent_ino, namelen, remap_ent_before(remap),
+				metadump.cur_ino, namelen,
+				remap_ent_after(remap));
+		memcpy(name, remap_ent_after(remap), namelen);
+		return;
+	}
+
+	/*
+	 * Obfuscate the parent pointer name and remember this for later
+	 * in case we encounter the dirent and need to reuse the name there.
+	 */
+	obfuscate_name(hash, namelen, name, true);
+
+	remap_debug("obfuscated pptr 0x%lx '%.*s' -> 0x%lx -> '%.*s'\n",
+			parent_ino, namelen, old_name, metadump.cur_ino,
+			namelen, name);
+	if (!remaptable_add(parent_ino, hash, old_name, namelen, name))
+		print_warning(
+ "unable to record remapped pptr name for inode %llu in dir inode %llu\n",
+			(unsigned long long) metadump.cur_ino,
+			(unsigned long long) parent_ino);
+}
+
+static inline bool
+want_obfuscate_attr(
+	unsigned int	nsp_flags,
+	const void	*name,
+	unsigned int	namelen,
+	const void	*value,
+	unsigned int	valuelen)
+{
+	if (!metadump.obfuscate)
+		return false;
+
+	/*
+	 * If we didn't already obfuscate the parent pointer, it's probably
+	 * corrupt.  Leave it intact for analysis.
+	 */
+	if (nsp_flags & XFS_ATTR_PARENT)
+		return false;
+
+	return true;
+}
+
 static void
 process_sf_attr(
 	struct xfs_dinode		*dip)
@@ -1053,7 +1316,7 @@ process_sf_attr(
 
 	for (i = 0; (i < hdr->count) &&
 			((char *)asfep - (char *)hdr < ino_attr_size); i++) {
-
+		void	*name, *value;
 		int	namelen = asfep->namelen;
 
 		if (namelen == 0) {
@@ -1070,11 +1333,16 @@ process_sf_attr(
 			break;
 		}
 
-		if (metadump.obfuscate) {
-			generate_obfuscated_name(0, asfep->namelen,
-						 &asfep->nameval[0]);
-			memset(&asfep->nameval[asfep->namelen], 'v',
-			       asfep->valuelen);
+		name = &asfep->nameval[0];
+		value = &asfep->nameval[asfep->namelen];
+
+		if (asfep->flags & XFS_ATTR_PARENT) {
+			maybe_obfuscate_pptr(asfep->flags, name, namelen,
+					value, asfep->valuelen);
+		} else if (want_obfuscate_attr(asfep->flags, name, namelen,
+					value, asfep->valuelen)) {
+			generate_obfuscated_name(0, asfep->namelen, name);
+			memset(value, 'v', asfep->valuelen);
 		}
 
 		asfep = (struct xfs_attr_sf_entry *)((char *)asfep +
@@ -1443,6 +1711,9 @@ process_attr_block(
 			break;
 		}
 		if (entry->flags & XFS_ATTR_LOCAL) {
+			void *name, *value;
+			unsigned int valuelen;
+
 			local = xfs_attr3_leaf_name_local(leaf, i);
 			if (local->namelen == 0) {
 				if (metadump.show_warnings)
@@ -1451,11 +1722,21 @@ process_attr_block(
 						(long long)metadump.cur_ino);
 				break;
 			}
-			if (metadump.obfuscate) {
+
+			name = &local->nameval[0];
+			value = &local->nameval[local->namelen];
+			valuelen = be16_to_cpu(local->valuelen);
+
+			if (entry->flags & XFS_ATTR_PARENT) {
+				maybe_obfuscate_pptr(entry->flags, name,
+						local->namelen, value,
+						valuelen);
+			} else if (want_obfuscate_attr(entry->flags, name,
+						local->namelen, value,
+						valuelen)) {
 				generate_obfuscated_name(0, local->namelen,
-					&local->nameval[0]);
-				memset(&local->nameval[local->namelen], 'v',
-					be16_to_cpu(local->valuelen));
+						name);
+				memset(value, 'v', valuelen);
 			}
 			/* zero from end of nameval[] to next name start */
 			nlen = local->namelen;
@@ -1474,7 +1755,11 @@ process_attr_block(
 						(long long)metadump.cur_ino);
 				break;
 			}
-			if (metadump.obfuscate) {
+			if (entry->flags & XFS_ATTR_PARENT) {
+				/* do not obfuscate obviously busted pptr */
+				add_remote_vals(be32_to_cpu(remote->valueblk),
+						be32_to_cpu(remote->valuelen));
+			} else if (metadump.obfuscate) {
 				generate_obfuscated_name(0, remote->namelen,
 							 &remote->name[0]);
 				add_remote_vals(be32_to_cpu(remote->valueblk),
@@ -3044,5 +3329,6 @@ metadump_f(
 		metadump.mdops->release();
 
 out:
+	remaptable_clear();
 	return 0;
 }
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index b7947591d2db..c3dde15116c4 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -193,6 +193,7 @@
 #define xfs_parent_add			libxfs_parent_add
 #define xfs_parent_finish		libxfs_parent_finish
 #define xfs_parent_start		libxfs_parent_start
+#define xfs_parent_from_attr		libxfs_parent_from_attr
 #define xfs_perag_get			libxfs_perag_get
 #define xfs_perag_hold			libxfs_perag_hold
 #define xfs_perag_put			libxfs_perag_put

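The remapping table is a fixed-size array of singly linked hash chains, keyed on (directory inode, name hash, original name). Each entry packs the old and new names into one flexible array member. Whichever side is visited first, the dirent or the parent pointer, creates the entry, and the other side finds it and copies the new name. The sketch below reproduces that structure in isolation. The type and function names are hypothetical, and the bucket count is reduced for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_REMAP_BUCKETS	64	/* the patch uses 4096 buckets */

/* One remembered rename: names[] holds the old name, then the new one. */
struct sketch_remap {
	struct sketch_remap	*next;
	uint64_t		dir_ino;
	uint32_t		namehash;
	uint8_t			namelen;
	uint8_t			names[];
};

static struct sketch_remap *sketch_remaps[SKETCH_REMAP_BUCKETS];

/* Look up a (directory, hash, original name) triple. */
static struct sketch_remap *
sketch_remap_find(uint64_t dir_ino, uint32_t hash, const uint8_t *name,
		  uint8_t namelen)
{
	struct sketch_remap *ent = sketch_remaps[hash % SKETCH_REMAP_BUCKETS];

	for (; ent; ent = ent->next) {
		if (ent->dir_ino == dir_ino && ent->namehash == hash &&
		    ent->namelen == namelen &&
		    !memcmp(ent->names, name, namelen))
			return ent;
	}
	return NULL;
}

/* Record old_name -> new_name; obfuscation keeps both the same length. */
static struct sketch_remap *
sketch_remap_add(uint64_t dir_ino, uint32_t hash, const uint8_t *old_name,
		 uint8_t namelen, const uint8_t *new_name)
{
	struct sketch_remap *ent = malloc(sizeof(*ent) + 2 * (size_t)namelen);
	size_t bucket = hash % SKETCH_REMAP_BUCKETS;

	if (!ent)
		return NULL;
	ent->dir_ino = dir_ino;
	ent->namehash = hash;
	ent->namelen = namelen;
	memcpy(ent->names, old_name, namelen);
	memcpy(ent->names + namelen, new_name, namelen);
	ent->next = sketch_remaps[bucket];
	sketch_remaps[bucket] = ent;
	return ent;
}

/* Free every chain and reset the bucket heads so the table can be reused. */
static void sketch_remap_clear(void)
{
	for (size_t i = 0; i < SKETCH_REMAP_BUCKETS; i++) {
		struct sketch_remap *ent = sketch_remaps[i];

		while (ent) {
			struct sketch_remap *next = ent->next;

			free(ent);
			ent = next;
		}
		sketch_remaps[i] = NULL;
	}
}
```

This sketch differs from the posted remaptable_clear() in one respect: it also resets each bucket head to NULL after freeing its chain. Without that reset, a second metadump in the same xfs_db session would start from bucket heads that still point at freed entries.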

* [PATCH 17/24] libxfs: export attr3_leaf_hdr_from_disk via libxfs_api_defs.h
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (15 preceding siblings ...)
  2024-07-02  1:14   ` [PATCH 16/24] xfs_db: obfuscate dirent and parent pointer names consistently Darrick J. Wong
@ 2024-07-02  1:15   ` Darrick J. Wong
  2024-07-02  6:35     ` Christoph Hellwig
  2024-07-02  1:15   ` [PATCH 18/24] xfs_db: add a parents command to list the parents of a file Darrick J. Wong
                     ` (6 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:15 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Do the xfs -> libxfs switcheroo and cleanups separately so the next
patch doesn't become an even larger mess.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/attr.c                |    2 +-
 db/metadump.c            |    2 +-
 libxfs/libxfs_api_defs.h |    5 +++++
 repair/attr_repair.c     |    6 +++---
 4 files changed, 10 insertions(+), 5 deletions(-)


diff --git a/db/attr.c b/db/attr.c
index 3b556c43def5..0b1f498e457c 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -248,7 +248,7 @@ attr_leaf_entry_walk(
 		return 0;
 
 	off = byteize(startoff);
-	xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
+	libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
 	entries = xfs_attr3_leaf_entryp(leaf);
 
 	for (i = 0; i < leafhdr.count; i++) {
diff --git a/db/metadump.c b/db/metadump.c
index e95238fb0471..424544f9f032 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -1680,7 +1680,7 @@ process_attr_block(
 	}
 
 	/* Ok, it's a leaf - get header; accounts for crc & non-crc */
-	xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &hdr, leaf);
+	libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &hdr, leaf);
 
 	nentries = hdr.count;
 	if (nentries == 0 ||
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index c3dde15116c4..5713e5221fe6 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -36,10 +36,13 @@
 
 #define xfs_ascii_ci_hashname		libxfs_ascii_ci_hashname
 
+#define xfs_attr3_leaf_hdr_from_disk	libxfs_attr3_leaf_hdr_from_disk
+#define xfs_attr3_leaf_read		libxfs_attr3_leaf_read
 #define xfs_attr_check_namespace	libxfs_attr_check_namespace
 #define xfs_attr_get			libxfs_attr_get
 #define xfs_attr_hashname		libxfs_attr_hashname
 #define xfs_attr_hashval		libxfs_attr_hashval
+#define xfs_attr_is_leaf		libxfs_attr_is_leaf
 #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
 #define xfs_attr_namecheck		libxfs_attr_namecheck
 #define xfs_attr_set			libxfs_attr_set
@@ -103,6 +106,7 @@
 #define xfs_compute_rextslog		libxfs_compute_rextslog
 #define xfs_create_space_res		libxfs_create_space_res
 #define xfs_da3_node_hdr_from_disk	libxfs_da3_node_hdr_from_disk
+#define xfs_da3_node_read		libxfs_da3_node_read
 #define xfs_da_get_buf			libxfs_da_get_buf
 #define xfs_da_hashname			libxfs_da_hashname
 #define xfs_da_read_buf			libxfs_da_read_buf
@@ -177,6 +181,7 @@
 #define xfs_inobt_stage_cursor		libxfs_inobt_stage_cursor
 #define xfs_inode_from_disk		libxfs_inode_from_disk
 #define xfs_inode_from_disk_ts		libxfs_inode_from_disk_ts
+#define xfs_inode_hasattr		libxfs_inode_hasattr
 #define xfs_inode_to_disk		libxfs_inode_to_disk
 #define xfs_inode_validate_cowextsize	libxfs_inode_validate_cowextsize
 #define xfs_inode_validate_extsize	libxfs_inode_validate_extsize
diff --git a/repair/attr_repair.c b/repair/attr_repair.c
index 8321c9b679b2..2e97fd9775ba 100644
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -610,7 +610,7 @@ process_leaf_attr_block(
 	da_freemap_t *attr_freemap;
 	struct xfs_attr3_icleaf_hdr leafhdr;
 
-	xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
+	libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
 	clearit = usedbs = 0;
 	firstb = mp->m_sb.sb_blocksize;
 	stop = xfs_attr3_leaf_hdr_size(leaf);
@@ -849,7 +849,7 @@ process_leaf_attr_level(xfs_mount_t	*mp,
 		}
 
 		leaf = bp->b_addr;
-		xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
+		libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
 
 		/* check magic number for leaf directory btree block */
 		if (!(leafhdr.magic == XFS_ATTR_LEAF_MAGIC ||
@@ -1052,7 +1052,7 @@ process_longform_leaf_root(
 	 * check sibling pointers in leaf block or root block 0 before
 	 * we have to release the btree block
 	 */
-	xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, bp->b_addr);
+	libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, bp->b_addr);
 	if (leafhdr.forw != 0 || leafhdr.back != 0)  {
 		if (!no_modify)  {
 			do_warn(

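The entries added to libxfs_api_defs.h rely on plain preprocessor aliasing. Once `#define xfs_foo libxfs_foo` is in scope, every `xfs_foo` token becomes `libxfs_foo`, including the one in the function definition. As we understand the convention, this is why the tools can call libxfs_attr3_leaf_hdr_from_disk() while the shared code keeps the kernel's name. Here is a minimal sketch of the mechanism with hypothetical names (sketch_double / libsketch_double). It is not taken from libxfs.

```c
/*
 * Hypothetical alias in the style of libxfs_api_defs.h.  From here on, the
 * preprocessor rewrites every sketch_double token to libsketch_double.
 */
#define sketch_double		libsketch_double

/* Shared code keeps the short name, but this defines libsketch_double(). */
int sketch_double(int x)
{
	return 2 * x;
}
```

Because the alias is purely textual, callers could use either spelling. The patch settles on the libxfs_ spelling in db/ and repair/, presumably for consistency with the other libxfs_ calls in those files.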

* [PATCH 18/24] xfs_db: add a parents command to list the parents of a file
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (16 preceding siblings ...)
  2024-07-02  1:15   ` [PATCH 17/24] libxfs: export attr3_leaf_hdr_from_disk via libxfs_api_defs.h Darrick J. Wong
@ 2024-07-02  1:15   ` Darrick J. Wong
  2024-07-02  6:35     ` Christoph Hellwig
  2024-07-02  1:15   ` [PATCH 19/24] xfs_db: make attr_set and attr_remove handle parent pointers Darrick J. Wong
                     ` (5 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:15 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Create a command to dump the parents of a file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/namei.c        |  323 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 man/man8/xfs_db.8 |    9 +
 2 files changed, 332 insertions(+)


diff --git a/db/namei.c b/db/namei.c
index 41ccaa04b659..46b4cacb507e 100644
--- a/db/namei.c
+++ b/db/namei.c
@@ -596,6 +596,326 @@ static struct cmdinfo ls_cmd = {
 	.help		= ls_help,
 };
 
+static void
+pptr_emit(
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const uint8_t		*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_ino_t		parent_ino;
+	uint32_t		parent_gen;
+	int			error;
+
+	if (!(attr_flags & XFS_ATTR_PARENT))
+		return;
+
+	error = -libxfs_parent_from_attr(mp, attr_flags, name, namelen, value,
+			valuelen, &parent_ino, &parent_gen);
+	if (error)
+		return;
+
+	dbprintf("%18llu:0x%08x %3d %.*s\n", parent_ino, parent_gen, namelen,
+			namelen, name);
+}
+
+static int
+list_sf_pptrs(
+	struct xfs_inode		*ip)
+{
+	struct xfs_attr_sf_hdr		*hdr = ip->i_af.if_data;
+	struct xfs_attr_sf_entry	*sfe;
+	unsigned int			i;
+
+	sfe = libxfs_attr_sf_firstentry(hdr);
+	for (i = 0; i < hdr->count; i++) {
+		pptr_emit(ip, sfe->flags, sfe->nameval, sfe->namelen,
+				sfe->nameval + sfe->namelen, sfe->valuelen);
+
+		sfe = xfs_attr_sf_nextentry(sfe);
+	}
+
+	return 0;
+}
+
+static void
+list_leaf_pptr_entries(
+	struct xfs_inode		*ip,
+	struct xfs_buf			*bp)
+{
+	struct xfs_attr3_icleaf_hdr	ichdr;
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_attr_leafblock	*leaf = bp->b_addr;
+	struct xfs_attr_leaf_entry	*entry;
+	unsigned int			i;
+
+	libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &ichdr, leaf);
+	entry = xfs_attr3_leaf_entryp(leaf);
+
+	for (i = 0; i < ichdr.count; entry++, i++) {
+		struct xfs_attr_leaf_name_local	*name_loc;
+
+		/*
+		 * Parent pointers cannot be remote values; don't bother
+		 * decoding this xattr name.
+		 */
+		if (!(entry->flags & XFS_ATTR_LOCAL))
+			continue;
+
+		name_loc = xfs_attr3_leaf_name_local(leaf, i);
+		pptr_emit(ip, entry->flags, name_loc->nameval,
+				name_loc->namelen,
+				name_loc->nameval + name_loc->namelen,
+				be16_to_cpu(name_loc->valuelen));
+	}
+}
+
+static int
+list_leaf_pptrs(
+	struct xfs_inode		*ip)
+{
+	struct xfs_buf			*leaf_bp;
+	int				error;
+
+	error = -libxfs_attr3_leaf_read(NULL, ip, ip->i_ino, 0, &leaf_bp);
+	if (error)
+		return error;
+
+	list_leaf_pptr_entries(ip, leaf_bp);
+	libxfs_trans_brelse(NULL, leaf_bp);
+	return 0;
+}
+
+static int
+find_leftmost_attr_leaf(
+	struct xfs_inode		*ip,
+	struct xfs_buf			**leaf_bpp)
+{
+	struct xfs_da3_icnode_hdr	nodehdr;
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_da_intnode		*node;
+	struct xfs_da_node_entry	*btree;
+	struct xfs_buf			*bp;
+	xfs_dablk_t			blkno = 0;
+	unsigned int			expected_level = 0;
+	int				error;
+
+	for (;;) {
+		uint16_t		magic;
+
+		error = -libxfs_da3_node_read(NULL, ip, blkno, &bp,
+				XFS_ATTR_FORK);
+		if (error)
+			return error;
+
+		node = bp->b_addr;
+		magic = be16_to_cpu(node->hdr.info.magic);
+		if (magic == XFS_ATTR_LEAF_MAGIC ||
+		    magic == XFS_ATTR3_LEAF_MAGIC)
+			break;
+
+		error = EFSCORRUPTED;
+		if (magic != XFS_DA_NODE_MAGIC &&
+		    magic != XFS_DA3_NODE_MAGIC)
+			goto out_buf;
+
+		libxfs_da3_node_hdr_from_disk(mp, &nodehdr, node);
+
+		if (nodehdr.count == 0 || nodehdr.level >= XFS_DA_NODE_MAXDEPTH)
+			goto out_buf;
+
+		/* Check the level from the root node. */
+		if (blkno == 0)
+			expected_level = nodehdr.level - 1;
+		else if (expected_level != nodehdr.level)
+			goto out_buf;
+		else
+			expected_level--;
+
+		/* Find the next level towards the leaves of the dabtree. */
+		btree = nodehdr.btree;
+		blkno = be32_to_cpu(btree->before);
+		libxfs_trans_brelse(NULL, bp);
+	}
+
+	error = EFSCORRUPTED;
+	if (expected_level != 0)
+		goto out_buf;
+
+	*leaf_bpp = bp;
+	return 0;
+
+out_buf:
+	libxfs_trans_brelse(NULL, bp);
+	return error;
+}
+
+static int
+list_node_pptrs(
+	struct xfs_inode		*ip)
+{
+	struct xfs_attr3_icleaf_hdr	leafhdr;
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_attr_leafblock	*leaf;
+	struct xfs_buf			*leaf_bp;
+	int				error;
+
+	error = find_leftmost_attr_leaf(ip, &leaf_bp);
+	if (error)
+		return error;
+
+	for (;;) {
+		list_leaf_pptr_entries(ip, leaf_bp);
+
+		/* Find the right sibling of this leaf block. */
+		leaf = leaf_bp->b_addr;
+		libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
+		if (leafhdr.forw == 0)
+			goto out_leaf;
+
+		libxfs_trans_brelse(NULL, leaf_bp);
+
+		error = -libxfs_attr3_leaf_read(NULL, ip, ip->i_ino,
+				leafhdr.forw, &leaf_bp);
+		if (error)
+			return error;
+	}
+
+out_leaf:
+	libxfs_trans_brelse(NULL, leaf_bp);
+	return error;
+}
+
+static int
+list_pptrs(
+	struct xfs_inode	*ip)
+{
+	int			error;
+
+	if (!libxfs_inode_hasattr(ip))
+		return 0;
+
+	if (ip->i_af.if_format == XFS_DINODE_FMT_LOCAL)
+		return list_sf_pptrs(ip);
+
+	/* attr functions require that the attr fork is loaded */
+	error = -libxfs_iread_extents(NULL, ip, XFS_ATTR_FORK);
+	if (error)
+		return error;
+
+	if (libxfs_attr_is_leaf(ip))
+		return list_leaf_pptrs(ip);
+
+	return list_node_pptrs(ip);
+}
+
+/* If the io cursor points to a file, list its parents. */
+static int
+parent_cur(
+	char			*tag)
+{
+	struct xfs_inode	*ip;
+	int			error = 0;
+
+	if (!xfs_has_parent(mp))
+		return 0;
+
+	if (iocur_top->typ != &typtab[TYP_INODE])
+		return ENOTDIR;
+
+	error = -libxfs_iget(mp, NULL, iocur_top->ino, 0, &ip);
+	if (error)
+		return error;
+
+	/* List the parents of a file. */
+	if (tag)
+		dbprintf(_("%s:\n"), tag);
+
+	error = list_pptrs(ip);
+	if (error)
+		goto rele;
+
+rele:
+	libxfs_irele(ip);
+	return error;
+}
+
+static void
+parent_help(void)
+{
+	dbprintf(_(
+"\n"
+" List the parents of the currently selected file.\n"
+"\n"
+" Parent pointers will be listed in the format:\n"
+" inode_number:inode_gen name_length name\n"
+	));
+}
+
+static int
+parent_f(
+	int			argc,
+	char			**argv)
+{
+	int			c;
+	int			error = 0;
+
+	while ((c = getopt(argc, argv, "")) != -1) {
+		switch (c) {
+		default:
+			parent_help();
+			return 0;
+		}
+	}
+
+	if (optind == argc) {
+		error = parent_cur(NULL);
+		if (error) {
+			dbprintf("%s\n", strerror(error));
+			exitcode = 1;
+		}
+
+		return 0;
+	}
+
+	for (c = optind; c < argc; c++) {
+		push_cur();
+
+		error = path_walk(argv[c]);
+		if (error)
+			goto err_cur;
+
+		error = parent_cur(argv[c]);
+		if (error)
+			goto err_cur;
+
+		pop_cur();
+	}
+
+	return 0;
+err_cur:
+	pop_cur();
+	if (error) {
+		dbprintf("%s: %s\n", argv[c], strerror(error));
+		exitcode = 1;
+	}
+	return 0;
+}
+
+static struct cmdinfo parent_cmd = {
+	.name		= "parent",
+	.altname	= "pptr",
+	.cfunc		= parent_f,
+	.argmin		= 0,
+	.argmax		= -1,
+	.canpush	= 0,
+	.args		= "[paths...]",
+	.help		= parent_help,
+};
+
 void
 namei_init(void)
 {
@@ -604,4 +924,7 @@ namei_init(void)
 
 	ls_cmd.oneline = _("list directory contents");
 	add_command(&ls_cmd);
+
+	parent_cmd.oneline = _("list parent pointers");
+	add_command(&parent_cmd);
 }
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index a7f6d55ed8be..937b17e79a35 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -943,6 +943,15 @@ See the
 .B print
 command.
 .TP
+.BI "parent [" paths "]..."
+List the parents of a file.
+If a path resolves to a file, the parents of that file will be listed.
+If no paths are supplied and the IO cursor points at an inode, the parents of
+that file will be listed.
+
+The output format is:
+parent inode number, parent inode generation, name length, name.
+.TP
 .BI "path " dir_path
 Walk the directory tree to an inode using the supplied path.
 Absolute and relative paths are supported.

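The level bookkeeping in find_leftmost_attr_leaf() can be exercised on its own. The sketch below is a hypothetical in-memory model (struct model_blk, MODEL_MAXDEPTH) rather than real dabtree buffers: it follows the first entry's before pointer at every node, requires each node under the root to sit exactly one level below its parent, and only accepts the leaf if the expected level has reached zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAXDEPTH	5	/* hypothetical stand-in for XFS_DA_NODE_MAXDEPTH */

/* Hypothetical in-memory dabtree block; real code reads buffers by block number. */
struct model_blk {
	bool			is_leaf;	/* real code decides this from the magic */
	unsigned int		level;		/* node level; leaves sit at level 0 */
	unsigned int		count;		/* number of entries in a node */
	const struct model_blk	*before;	/* child pointer of the first entry */
};

/*
 * Walk the first entry of each node down to the leftmost leaf.  Each node
 * below the root must be exactly one level lower than its parent, and the
 * walk must end on a leaf once the expected level reaches zero.
 * Returns NULL if the tree shape is inconsistent.
 */
static const struct model_blk *
find_leftmost_leaf(const struct model_blk *root)
{
	const struct model_blk *blk = root;
	unsigned int expected_level = 0;

	while (!blk->is_leaf) {
		if (blk->count == 0 || blk->level >= MODEL_MAXDEPTH ||
		    !blk->before)
			return NULL;

		if (blk == root)
			expected_level = blk->level - 1; /* wraps if a node claims level 0 */
		else if (blk->level != expected_level)
			return NULL;
		else
			expected_level--;

		blk = blk->before;
	}
	return expected_level == 0 ? blk : NULL;
}
```

Because every step below the root must match a strictly decreasing expected level that is bounded by the maximum depth, a corrupt tree with a cycle or a mislabelled level ends the walk with an error instead of looping.
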

^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 19/24] xfs_db: make attr_set and attr_remove handle parent pointers
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (17 preceding siblings ...)
  2024-07-02  1:15   ` [PATCH 18/24] xfs_db: add a parents command to list the parents of a file Darrick J. Wong
@ 2024-07-02  1:15   ` Darrick J. Wong
  2024-07-02  6:36     ` Christoph Hellwig
  2024-07-02  1:15   ` [PATCH 20/24] xfs_db: add link and unlink expert commands Darrick J. Wong
                     ` (4 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:15 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

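Teach the attr_set and attr_remove commands to operate on the parent
pointer xattr namespace (-p), and to read the attribute name (-N) and,
for attr_set, the value (-V) from a file.  This makes it possible for
xfs_db to load up the filesystem (somewhat uselessly) with parent pointers.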

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
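For reference, a small standalone sketch of the namespace handling this patch adopts: each of -r, -s, -u and -p first clears every namespace bit covered by a single mask (LIBXFS_ATTR_NS in the patch) and then sets its own, so the options are mutually exclusive and the last one given wins. The NS_* bit values and apply_ns_option() are hypothetical stand-ins, not the real XFS_ATTR_* values or an xfsprogs helper.

```c
#include <assert.h>

/* Hypothetical namespace bits standing in for LIBXFS_ATTR_ROOT/SECURE/PARENT. */
#define NS_ROOT		(1U << 1)
#define NS_SECURE	(1U << 2)
#define NS_PARENT	(1U << 3)
#define NS_MASK		(NS_ROOT | NS_SECURE | NS_PARENT)

/* Apply one namespace option; the last one given wins.  -1 if unknown. */
static int apply_ns_option(unsigned int *filter, char opt)
{
	unsigned int ns;

	switch (opt) {
	case 'r':
		ns = NS_ROOT;
		break;
	case 's':
		ns = NS_SECURE;
		break;
	case 'p':
		ns = NS_PARENT;
		break;
	case 'u':
		ns = 0;		/* user namespace: no namespace bits set */
		break;
	default:
		return -1;
	}
	/* Clear every namespace bit, keep unrelated filter bits, set ours. */
	*filter = (*filter & ~NS_MASK) | ns;
	return 0;
}
```

Using one mask also means a future namespace only has to be added to the mask, instead of to every pairwise clear.
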
 db/attrset.c             |  202 +++++++++++++++++++++++++++++++++++++---------
 libxfs/libxfs_api_defs.h |    1 
 man/man8/xfs_db.8        |   21 ++++-
 3 files changed, 181 insertions(+), 43 deletions(-)


diff --git a/db/attrset.c b/db/attrset.c
index 3b5db7c2aedb..d9ab79fa7849 100644
--- a/db/attrset.c
+++ b/db/attrset.c
@@ -24,11 +24,11 @@ static void		attrset_help(void);
 
 static const cmdinfo_t	attr_set_cmd =
 	{ "attr_set", "aset", attr_set_f, 1, -1, 0,
-	  N_("[-r|-s|-u] [-n] [-R|-C] [-v n] name"),
+	  N_("[-r|-s|-u|-p] [-n] [-R|-C] [-v n] name"),
 	  N_("set the named attribute on the current inode"), attrset_help };
 static const cmdinfo_t	attr_remove_cmd =
 	{ "attr_remove", "aremove", attr_remove_f, 1, -1, 0,
-	  N_("[-r|-s|-u] [-n] name"),
+	  N_("[-r|-s|-u|-p] [-n] name"),
 	  N_("remove the named attribute from the current inode"), attrset_help };
 
 static void
@@ -44,6 +44,7 @@ attrset_help(void)
 "  -r -- 'root'\n"
 "  -u -- 'user'		(default)\n"
 "  -s -- 'secure'\n"
+"  -p -- 'parent'\n"
 "\n"
 " For attr_set, these options further define the type of set operation:\n"
 "  -C -- 'create'    - create attribute, fail if it already exists\n"
@@ -62,6 +63,49 @@ attrset_init(void)
 	add_command(&attr_remove_cmd);
 }
 
+static unsigned char *
+get_buf_from_file(
+	const char	*fname,
+	size_t		bufsize,
+	int		*namelen)
+{
+	FILE		*fp;
+	unsigned char	*buf;
+	size_t		sz;
+
+	buf = malloc(bufsize + 1);
+	if (!buf) {
+		perror("malloc");
+		return NULL;
+	}
+
+	fp = fopen(fname, "r");
+	if (!fp) {
+		perror(fname);
+		goto out_free;
+	}
+
+	sz = fread(buf, sizeof(char), bufsize, fp);
+	if (sz == 0) {
+		printf("%s: Could not read anything from file\n", fname);
+		goto out_fp;
+	}
+
+	fclose(fp);
+
+	*namelen = sz;
+	return buf;
+out_fp:
+	fclose(fp);
+out_free:
+	free(buf);
+	return NULL;
+}
+
+#define LIBXFS_ATTR_NS		(LIBXFS_ATTR_SECURE | \
+				 LIBXFS_ATTR_ROOT | \
+				 LIBXFS_ATTR_PARENT)
+
 static int
 attr_set_f(
 	int			argc,
@@ -69,6 +113,8 @@ attr_set_f(
 {
 	struct xfs_da_args	args = { };
 	char			*sp;
+	char			*name_from_file = NULL;
+	char			*value_from_file = NULL;
 	enum xfs_attr_update	op = XFS_ATTRUPDATE_UPSERT;
 	int			c;
 
@@ -81,20 +127,23 @@ attr_set_f(
 		return 0;
 	}
 
-	while ((c = getopt(argc, argv, "rusCRnv:")) != EOF) {
+	while ((c = getopt(argc, argv, "ruspCRnN:v:V:")) != EOF) {
 		switch (c) {
 		/* namespaces */
 		case 'r':
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
 			args.attr_filter |= LIBXFS_ATTR_ROOT;
-			args.attr_filter &= ~LIBXFS_ATTR_SECURE;
 			break;
 		case 'u':
-			args.attr_filter &= ~(LIBXFS_ATTR_ROOT |
-					      LIBXFS_ATTR_SECURE);
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
 			break;
 		case 's':
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
 			args.attr_filter |= LIBXFS_ATTR_SECURE;
-			args.attr_filter &= ~LIBXFS_ATTR_ROOT;
+			break;
+		case 'p':
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
+			args.attr_filter |= LIBXFS_ATTR_PARENT;
 			break;
 
 		/* modifiers */
@@ -105,6 +154,10 @@ attr_set_f(
 			op = XFS_ATTRUPDATE_REPLACE;
 			break;
 
+		case 'N':
+			name_from_file = optarg;
+			break;
+
 		case 'n':
 			/*
 			 * We never touch attr2 these days; leave this here to
@@ -114,6 +167,11 @@ attr_set_f(
 
 		/* value length */
 		case 'v':
+			if (value_from_file) {
+				dbprintf(_("already set value file\n"));
+				return 0;
+			}
+
 			args.valuelen = strtol(optarg, &sp, 0);
 			if (*sp != '\0' ||
 			    args.valuelen < 0 || args.valuelen > 64 * 1024) {
@@ -122,30 +180,64 @@ attr_set_f(
 			}
 			break;
 
+		case 'V':
+			if (args.valuelen != 0) {
+				dbprintf(_("already set valuelen\n"));
+				return 0;
+			}
+
+			value_from_file = optarg;
+			break;
+
 		default:
 			dbprintf(_("bad option for attr_set command\n"));
 			return 0;
 		}
 	}
 
-	if (optind != argc - 1) {
-		dbprintf(_("too few options for attr_set (no name given)\n"));
-		return 0;
-	}
+	if (name_from_file) {
+		int namelen;
 
-	args.name = (const unsigned char *)argv[optind];
-	if (!args.name) {
-		dbprintf(_("invalid name\n"));
-		return 0;
-	}
+		if (optind != argc) {
+			dbprintf(_("too many options for attr_set (no name needed)\n"));
+			return 0;
+		}
+
+		args.name = get_buf_from_file(name_from_file, MAXNAMELEN,
+				&namelen);
+		if (!args.name)
+			return 0;
+
+		args.namelen = namelen;
+	} else {
+		if (optind != argc - 1) {
+			dbprintf(_("too few options for attr_set (no name given)\n"));
+			return 0;
+		}
 
-	args.namelen = strlen(argv[optind]);
-	if (args.namelen >= MAXNAMELEN) {
-		dbprintf(_("name too long\n"));
-		return 0;
+		args.name = (const unsigned char *)argv[optind];
+		if (!args.name) {
+			dbprintf(_("invalid name\n"));
+			return 0;
+		}
+
+		args.namelen = strlen(argv[optind]);
+		if (args.namelen >= MAXNAMELEN) {
+			dbprintf(_("name too long\n"));
+			goto out;
+		}
 	}
 
-	if (args.valuelen) {
+	if (value_from_file) {
+		int valuelen;
+
+		args.value = get_buf_from_file(value_from_file,
+				XFS_XATTR_SIZE_MAX, &valuelen);
+		if (!args.value)
+			goto out;
+
+		args.valuelen = valuelen;
+	} else if (args.valuelen) {
 		args.value = memalign(getpagesize(), args.valuelen);
 		if (!args.value) {
 			dbprintf(_("cannot allocate buffer (%d)\n"),
@@ -175,6 +267,8 @@ attr_set_f(
 		libxfs_irele(args.dp);
 	if (args.value)
 		free(args.value);
+	if (name_from_file)
+		free((void *)args.name);
 	return 0;
 }
 
@@ -184,6 +278,7 @@ attr_remove_f(
 	char			**argv)
 {
 	struct xfs_da_args	args = { };
+	char			*name_from_file = NULL;
 	int			c;
 
 	if (cur_typ == NULL) {
@@ -195,20 +290,27 @@ attr_remove_f(
 		return 0;
 	}
 
-	while ((c = getopt(argc, argv, "rusn")) != EOF) {
+	while ((c = getopt(argc, argv, "ruspnN:")) != EOF) {
 		switch (c) {
 		/* namespaces */
 		case 'r':
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
 			args.attr_filter |= LIBXFS_ATTR_ROOT;
-			args.attr_filter &= ~LIBXFS_ATTR_SECURE;
 			break;
 		case 'u':
-			args.attr_filter &= ~(LIBXFS_ATTR_ROOT |
-					      LIBXFS_ATTR_SECURE);
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
 			break;
 		case 's':
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
 			args.attr_filter |= LIBXFS_ATTR_SECURE;
-			args.attr_filter &= ~LIBXFS_ATTR_ROOT;
+			break;
+		case 'p':
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
+			args.attr_filter |= LIBXFS_ATTR_PARENT;
+			break;
+
+		case 'N':
+			name_from_file = optarg;
 			break;
 
 		case 'n':
@@ -224,21 +326,37 @@ attr_remove_f(
 		}
 	}
 
-	if (optind != argc - 1) {
-		dbprintf(_("too few options for attr_remove (no name given)\n"));
-		return 0;
-	}
-
-	args.name = (const unsigned char *)argv[optind];
-	if (!args.name) {
-		dbprintf(_("invalid name\n"));
-		return 0;
-	}
-
-	args.namelen = strlen(argv[optind]);
-	if (args.namelen >= MAXNAMELEN) {
-		dbprintf(_("name too long\n"));
-		return 0;
+	if (name_from_file) {
+		int namelen;
+
+		if (optind != argc) {
+			dbprintf(_("too many options for attr_remove (no name needed)\n"));
+			return 0;
+		}
+
+		args.name = get_buf_from_file(name_from_file, MAXNAMELEN,
+				&namelen);
+		if (!args.name)
+			return 0;
+
+		args.namelen = namelen;
+	} else {
+		if (optind != argc - 1) {
+			dbprintf(_("too few options for attr_remove (no name given)\n"));
+			return 0;
+		}
+
+		args.name = (const unsigned char *)argv[optind];
+		if (!args.name) {
+			dbprintf(_("invalid name\n"));
+			return 0;
+		}
+
+		args.namelen = strlen(argv[optind]);
+		if (args.namelen >= MAXNAMELEN) {
+			dbprintf(_("name too long\n"));
+			return 0;
+		}
 	}
 
 	if (libxfs_iget(mp, NULL, iocur_top->ino, 0, &args.dp)) {
@@ -260,5 +378,7 @@ attr_remove_f(
 out:
 	if (args.dp)
 		libxfs_irele(args.dp);
+	if (name_from_file)
+		free((void *)args.name);
 	return 0;
 }
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 5713e5221fe6..bceaab8baa8c 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -15,6 +15,7 @@
  */
 #define LIBXFS_ATTR_ROOT		XFS_ATTR_ROOT
 #define LIBXFS_ATTR_SECURE		XFS_ATTR_SECURE
+#define LIBXFS_ATTR_PARENT		XFS_ATTR_PARENT
 
 #define xfs_agfl_size			libxfs_agfl_size
 #define xfs_agfl_walk			libxfs_agfl_walk
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 937b17e79a35..a561bdc492d4 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -184,10 +184,14 @@ Displays the length, free block count, per-AG reservation size, and per-AG
 reservation usage for a given AG.
 If no argument is given, display information for all AGs.
 .TP
-.BI "attr_remove [\-r|\-u|\-s] [\-n] " name
+.BI "attr_remove [\-p|\-r|\-u|\-s] [\-n] [\-N " namefile "|" name "] "
 Remove the specified extended attribute from the current file.
 .RS 1.0i
 .TP 0.4i
+.B \-p
+Sets the attribute in the parent namespace.
+Only one namespace option can be specified.
+.TP
 .B \-r
 Sets the attribute in the root namespace.
 Only one namespace option can be specified.
@@ -200,14 +204,21 @@ Only one namespace option can be specified.
 Sets the attribute in the secure namespace.
 Only one namespace option can be specified.
 .TP
+.BI "\-N " namefile
+Read the attribute name from the file \fInamefile\fR instead of the command line.
+.TP
 .B \-n
 Do not enable 'noattr2' mode on V4 filesystems.
 .RE
 .TP
-.BI "attr_set [\-r|\-u|\-s] [\-n] [\-R|\-C] [\-v " namelen "] " name
+.BI "attr_set [\-p|\-r|\-u|\-s] [\-n] [\-R|\-C] [\-v " valuelen "|\-V " valuefile "] [\-N " namefile "|" name "] "
 Sets an extended attribute on the current file with the given name.
 .RS 1.0i
 .TP 0.4i
+.B \-p
+Sets the attribute in the parent namespace.
+Only one namespace option can be specified.
+.TP
 .B \-r
 Sets the attribute in the root namespace.
 Only one namespace option can be specified.
@@ -220,6 +231,9 @@ Only one namespace option can be specified.
 Sets the attribute in the secure namespace.
 Only one namespace option can be specified.
 .TP
+.BI "\-N " namefile
+Read the attribute name from the file \fInamefile\fR instead of the command line.
+.TP
 .B \-n
 Do not enable 'noattr2' mode on V4 filesystems.
 .TP
@@ -231,6 +245,9 @@ The command will fail if the attribute does not already exist.
 Create the attribute.
 The command will fail if the attribute already exists.
 .TP
+.BI "\-V " valuefile
+Read the attribute value from the file \fIvaluefile\fR; cannot be combined with \fB\-v\fR.
+.TP
 .B \-v
 Set the attribute value to a string of this length containing the letter 'v'.
 .RE

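get_buf_from_file() reads at most bufsize bytes, so a name file longer than MAXNAMELEN or a value file longer than XFS_XATTR_SIZE_MAX is silently truncated rather than rejected. If rejecting is preferred, one option is to request one extra byte and treat an over-full read as an error. read_bounded() below is a hypothetical helper sketching that idea, not part of xfsprogs; it takes a FILE * so it can be tested without naming a file, and like the original it does not NUL-terminate the buffer because the length is returned separately.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Read at most @bufsize bytes from @fp into a new buffer.  Ask fread() for
 * one extra byte so an over-long file is rejected instead of truncated.
 * Returns NULL (and frees the buffer) for empty or over-long input.
 */
static unsigned char *read_bounded(FILE *fp, size_t bufsize, size_t *lenp)
{
	unsigned char	*buf = malloc(bufsize + 1);
	size_t		sz;

	if (!buf)
		return NULL;

	sz = fread(buf, 1, bufsize + 1, fp);
	if (sz == 0 || sz > bufsize) {
		free(buf);
		return NULL;
	}

	*lenp = sz;
	return buf;
}
```
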

^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 20/24] xfs_db: add link and unlink expert commands
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (18 preceding siblings ...)
  2024-07-02  1:15   ` [PATCH 19/24] xfs_db: make attr_set and attr_remove handle parent pointers Darrick J. Wong
@ 2024-07-02  1:15   ` Darrick J. Wong
  2024-07-02  6:36     ` Christoph Hellwig
  2024-07-02  1:16   ` [PATCH 21/24] xfs_db: compute hashes of parent pointers Darrick J. Wong
                     ` (3 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:15 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Create a pair of commands to create and remove directory entries to
support functional testing of directory tree corruption.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
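A standalone sketch of how link_f() resolves the -t argument: ftype_map is a sparse table built with designated initializers, so slots without a name (including the unknown type at index 0) stay NULL and can never match. The enum below is assumed to mirror the XFS_DIR3_FT_* numbering and parse_ftype() is a hypothetical helper; check xfs_da_format.h before relying on the values.

```c
#include <assert.h>
#include <string.h>

/* Assumed to mirror the XFS_DIR3_FT_* numbering (hypothetical copy). */
enum ftype {
	FT_UNKNOWN, FT_REG_FILE, FT_DIR, FT_CHRDEV, FT_BLKDEV,
	FT_FIFO, FT_SOCK, FT_SYMLINK, FT_WHT,
};

/* Sparse name table; FT_UNKNOWN deliberately has no name. */
static const char *ftype_names[] = {
	[FT_REG_FILE]	= "reg",
	[FT_DIR]	= "dir",
	[FT_CHRDEV]	= "cdev",
	[FT_BLKDEV]	= "bdev",
	[FT_FIFO]	= "fifo",
	[FT_SOCK]	= "sock",
	[FT_SYMLINK]	= "symlink",
	[FT_WHT]	= "whiteout",
};

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* Map a -t argument to a file type; NULL slots are skipped.  -1 if unknown. */
static int parse_ftype(const char *arg)
{
	for (unsigned int i = 0; i < ARRAY_SIZE(ftype_names); i++)
		if (ftype_names[i] && !strcmp(ftype_names[i], arg))
			return (int)i;
	return -1;
}
```

Leaving index 0 unnamed means that omitting -t is the only way to get XFS_DIR3_FT_UNKNOWN, in which case create_child() derives the type from the child's mode.
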
 db/namei.c               |  378 ++++++++++++++++++++++++++++++++++++++++++++++
 include/xfs_inode.h      |    4 
 libxfs/libxfs_api_defs.h |    8 +
 man/man8/xfs_db.8        |   20 ++
 4 files changed, 409 insertions(+), 1 deletion(-)


diff --git a/db/namei.c b/db/namei.c
index 46b4cacb507e..d57ead4f165d 100644
--- a/db/namei.c
+++ b/db/namei.c
@@ -916,6 +916,376 @@ static struct cmdinfo parent_cmd = {
 	.help		= parent_help,
 };
 
+static void
+link_help(void)
+{
+	dbprintf(_(
+"\n"
+" Create a directory entry in the current directory that points to the\n"
+" specified file.\n"
+"\n"
+" Options:\n"
+"   -i   -- Point to this specific inode number.\n"
+"   -p   -- Point to the inode given by this path.\n"
+"   -t   -- Set the file type to this value.\n"
+"   name -- Create this directory entry with this name.\n"
+	));
+}
+
+static int
+create_child(
+	struct xfs_mount	*mp,
+	xfs_ino_t		parent_ino,
+	const char		*name,
+	unsigned int		ftype,
+	xfs_ino_t		child_ino)
+{
+	struct xfs_name		xname = {
+		.name		= (const unsigned char *)name,
+		.len		= strlen(name),
+		.type		= ftype,
+	};
+	struct xfs_parent_args	*ppargs = NULL;
+	struct xfs_trans	*tp;
+	struct xfs_inode	*dp, *ip;
+	unsigned int		resblks;
+	bool			isdir;
+	int			error;
+
+	error = -libxfs_iget(mp, NULL, parent_ino, 0, &dp);
+	if (error)
+		return error;
+
+	if (!S_ISDIR(VFS_I(dp)->i_mode)) {
+		error = ENOTDIR;
+		goto out_dp;
+	}
+
+	error = -libxfs_iget(mp, NULL, child_ino, 0, &ip);
+	if (error)
+		goto out_dp;
+	isdir = S_ISDIR(VFS_I(ip)->i_mode);
+
+	if (xname.type == XFS_DIR3_FT_UNKNOWN)
+		xname.type = libxfs_mode_to_ftype(VFS_I(ip)->i_mode);
+
+	error = -libxfs_parent_start(mp, &ppargs);
+	if (error)
+		goto out_ip;
+
+	resblks = libxfs_link_space_res(mp, MAXNAMELEN);
+	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_link, resblks, 0, 0,
+			&tp);
+	if (error)
+		goto out_parent;
+
+	libxfs_trans_ijoin(tp, dp, 0);
+	libxfs_trans_ijoin(tp, ip, 0);
+
+	error = -libxfs_dir_createname(tp, dp, &xname, ip->i_ino, resblks);
+	if (error)
+		goto out_trans;
+
+	/* bump dp's link to ip */
+	libxfs_bumplink(tp, ip);
+
+	/* bump ip's dotdot link to dp */
+	if (isdir)
+		libxfs_bumplink(tp, dp);
+
+	/* Replace the dotdot entry in the child directory. */
+	if (isdir) {
+		error = -libxfs_dir_replace(tp, ip, &xfs_name_dotdot,
+				dp->i_ino, resblks);
+		if (error)
+			goto out_trans;
+	}
+
+	if (ppargs) {
+		error = -libxfs_parent_addname(tp, ppargs, dp, &xname, ip);
+		if (error)
+			goto out_trans;
+	}
+
+	error = -libxfs_trans_commit(tp);
+	goto out_parent;
+
+out_trans:
+	libxfs_trans_cancel(tp);
+out_parent:
+	libxfs_parent_finish(mp, ppargs);
+out_ip:
+	libxfs_irele(ip);
+out_dp:
+	libxfs_irele(dp);
+	return error;
+}
+
+static const char *ftype_map[] = {
+	[XFS_DIR3_FT_REG_FILE]	= "reg",
+	[XFS_DIR3_FT_DIR]	= "dir",
+	[XFS_DIR3_FT_CHRDEV]	= "cdev",
+	[XFS_DIR3_FT_BLKDEV]	= "bdev",
+	[XFS_DIR3_FT_FIFO]	= "fifo",
+	[XFS_DIR3_FT_SOCK]	= "sock",
+	[XFS_DIR3_FT_SYMLINK]	= "symlink",
+	[XFS_DIR3_FT_WHT]	= "whiteout",
+};
+
+static int
+link_f(
+	int			argc,
+	char			**argv)
+{
+	xfs_ino_t		child_ino = NULLFSINO;
+	int			ftype = XFS_DIR3_FT_UNKNOWN;
+	unsigned int		i;
+	int			c;
+	int			error = 0;
+
+	while ((c = getopt(argc, argv, "i:p:t:")) != -1) {
+		switch (c) {
+		case 'i':
+			errno = 0;
+			child_ino = strtoull(optarg, NULL, 0);
+			if (errno == ERANGE) {
+				printf("%s: unknown inode number\n", optarg);
+				exitcode = 1;
+				return 0;
+			}
+			break;
+		case 'p':
+			push_cur();
+			error = path_walk(optarg);
+			if (error) {
+				printf("%s: %s\n", optarg, strerror(error));
+				pop_cur();
+				exitcode = 1;
+				return 0;
+			} else if (iocur_top->typ != &typtab[TYP_INODE]) {
+				printf("%s: does not point to an inode\n", optarg);
+				pop_cur();
+				exitcode = 1;
+				return 0;
+			}
+			child_ino = iocur_top->ino;
+			pop_cur();
+			break;
+		case 't':
+			for (i = 0; i < ARRAY_SIZE(ftype_map); i++) {
+				if (ftype_map[i] &&
+				    !strcmp(ftype_map[i], optarg)) {
+					ftype = i;
+					break;
+				}
+			}
+			if (i == ARRAY_SIZE(ftype_map)) {
+				printf("%s: unknown file type\n", optarg);
+				exitcode = 1;
+				return 0;
+			}
+			break;
+		default:
+			link_help();
+			return 0;
+		}
+	}
+
+	if (child_ino == NULLFSINO) {
+		printf("link: need to specify child via -i or -p\n");
+		exitcode = 1;
+		return 0;
+	}
+
+	if (iocur_top->typ != &typtab[TYP_INODE]) {
+		printf("io cursor does not point to an inode.\n");
+		exitcode = 1;
+		return 0;
+	}
+
+	if (optind + 1 != argc) {
+		printf("link: need directory entry name\n");
+		exitcode = 1;
+		return 0;
+	}
+
+	error = create_child(mp, iocur_top->ino, argv[optind], ftype,
+			child_ino);
+	if (error) {
+		printf("link failed: %s\n", strerror(error));
+		exitcode = 1;
+		return 0;
+	}
+
+	return 0;
+}
+
+static struct cmdinfo link_cmd = {
+	.name		= "link",
+	.cfunc		= link_f,
+	.argmin		= 0,
+	.argmax		= -1,
+	.canpush	= 0,
+	.args		= "[-i ino] [-p path] [-t ftype] name",
+	.help		= link_help,
+};
+
+static void
+unlink_help(void)
+{
+	dbprintf(_(
+"\n"
+" Remove a directory entry from the current directory.\n"
+"\n"
+" Options:\n"
+"   name -- Remove the directory entry with this name.\n"
+	));
+}
+
+static void
+droplink(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip)
+{
+	struct inode		*inode = VFS_I(ip);
+
+	libxfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
+
+	if (inode->i_nlink != XFS_NLINK_PINNED)
+		drop_nlink(inode);
+
+	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+}
+
+static int
+remove_child(
+	struct xfs_mount	*mp,
+	xfs_ino_t		parent_ino,
+	const char		*name)
+{
+	struct xfs_name		xname = {
+		.name		= (const unsigned char *)name,
+		.len		= strlen(name),
+	};
+	struct xfs_parent_args	*ppargs;
+	struct xfs_trans	*tp;
+	struct xfs_inode	*dp, *ip;
+	xfs_ino_t		child_ino;
+	unsigned int		resblks;
+	int			error;
+
+	error = -libxfs_iget(mp, NULL, parent_ino, 0, &dp);
+	if (error)
+		return error;
+
+	if (!S_ISDIR(VFS_I(dp)->i_mode)) {
+		error = ENOTDIR;
+		goto out_dp;
+	}
+
+	error = -libxfs_dir_lookup(NULL, dp, &xname, &child_ino, NULL);
+	if (error)
+		goto out_dp;
+
+	error = -libxfs_iget(mp, NULL, child_ino, 0, &ip);
+	if (error)
+		goto out_dp;
+
+	error = -libxfs_parent_start(mp, &ppargs);
+	if (error)
+		goto out_ip;
+
+	resblks = libxfs_remove_space_res(mp, MAXNAMELEN);
+	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_remove, resblks, 0, 0,
+			&tp);
+	if (error)
+		goto out_parent;
+
+	libxfs_trans_ijoin(tp, dp, 0);
+	libxfs_trans_ijoin(tp, ip, 0);
+
+	if (S_ISDIR(VFS_I(ip)->i_mode)) {
+		/* drop ip's dotdot link to dp */
+		droplink(tp, dp);
+	} else {
+		libxfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
+	}
+
+	/* drop dp's link to ip */
+	droplink(tp, ip);
+
+	error = -libxfs_dir_removename(tp, dp, &xname, ip->i_ino, resblks);
+	if (error)
+		goto out_trans;
+
+	if (ppargs) {
+		error = -libxfs_parent_removename(tp, ppargs, dp, &xname, ip);
+		if (error)
+			goto out_trans;
+	}
+
+	error = -libxfs_trans_commit(tp);
+	goto out_parent;
+
+out_trans:
+	libxfs_trans_cancel(tp);
+out_parent:
+	libxfs_parent_finish(mp, ppargs);
+out_ip:
+	libxfs_irele(ip);
+out_dp:
+	libxfs_irele(dp);
+	return error;
+}
+
+static int
+unlink_f(
+	int			argc,
+	char			**argv)
+{
+	int			c;
+	int			error = 0;
+
+	while ((c = getopt(argc, argv, "")) != -1) {
+		switch (c) {
+		default:
+			unlink_help();
+			return 0;
+		}
+	}
+
+	if (iocur_top->typ != &typtab[TYP_INODE]) {
+		printf("io cursor does not point to an inode.\n");
+		exitcode = 1;
+		return 0;
+	}
+
+	if (optind + 1 != argc) {
+		printf("unlink: need directory entry name\n");
+		exitcode = 1;
+		return 0;
+	}
+
+	error = remove_child(mp, iocur_top->ino, argv[optind]);
+	if (error) {
+		printf("unlink failed: %s\n", strerror(error));
+		exitcode = 1;
+		return 0;
+	}
+
+	return 0;
+}
+
+static struct cmdinfo unlink_cmd = {
+	.name		= "unlink",
+	.cfunc		= unlink_f,
+	.argmin		= 0,
+	.argmax		= -1,
+	.canpush	= 0,
+	.args		= "name",
+	.help		= unlink_help,
+};
+
 void
 namei_init(void)
 {
@@ -927,4 +1297,12 @@ namei_init(void)
 
 	parent_cmd.oneline = _("list parent pointers");
 	add_command(&parent_cmd);
+
+	if (expert_mode) {
+		link_cmd.oneline = _("create directory link");
+		add_command(&link_cmd);
+
+		unlink_cmd.oneline = _("remove directory link");
+		add_command(&unlink_cmd);
+	}
 }
diff --git a/include/xfs_inode.h b/include/xfs_inode.h
index 45339b42621a..9bbf37225197 100644
--- a/include/xfs_inode.h
+++ b/include/xfs_inode.h
@@ -315,6 +315,10 @@ static inline void inc_nlink(struct inode *inode)
 {
 	inode->i_nlink++;
 }
+static inline void drop_nlink(struct inode *inode)
+{
+	inode->i_nlink--;
+}
 
 static inline bool xfs_is_reflink_inode(struct xfs_inode *ip)
 {
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index bceaab8baa8c..b7edaf788051 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -147,12 +147,16 @@
 #define xfs_dir_init			libxfs_dir_init
 #define xfs_dir_ino_validate		libxfs_dir_ino_validate
 #define xfs_dir_lookup			libxfs_dir_lookup
+#define xfs_dir_removename		libxfs_dir_removename
 #define xfs_dir_replace			libxfs_dir_replace
 
 #define xfs_dqblk_repair		libxfs_dqblk_repair
 #define xfs_dquot_from_disk_ts		libxfs_dquot_from_disk_ts
 #define xfs_dquot_verify		libxfs_dquot_verify
 
+#define xfs_bumplink			libxfs_bumplink
+#define xfs_droplink			libxfs_droplink
+
 #define xfs_finobt_calc_reserves	libxfs_finobt_calc_reserves
 #define xfs_finobt_init_cursor		libxfs_finobt_init_cursor
 #define xfs_free_extent			libxfs_free_extent
@@ -191,13 +195,15 @@
 
 #define xfs_iread_extents		libxfs_iread_extents
 #define xfs_irele			libxfs_irele
+#define xfs_link_space_res		libxfs_link_space_res
 #define xfs_log_calc_minimum_size	libxfs_log_calc_minimum_size
 #define xfs_log_get_max_trans_res	libxfs_log_get_max_trans_res
 #define xfs_log_sb			libxfs_log_sb
 #define xfs_mode_to_ftype		libxfs_mode_to_ftype
 #define xfs_mkdir_space_res		libxfs_mkdir_space_res
-#define xfs_parent_add			libxfs_parent_add
+#define xfs_parent_addname		libxfs_parent_addname
 #define xfs_parent_finish		libxfs_parent_finish
+#define xfs_parent_removename		libxfs_parent_removename
 #define xfs_parent_start		libxfs_parent_start
 #define xfs_parent_from_attr		libxfs_parent_from_attr
 #define xfs_perag_get			libxfs_perag_get
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index a561bdc492d4..f8db6c36f40a 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -901,6 +901,21 @@ will result in truncation and a warning will be issued. If no
 .I label
 is given, the current filesystem label is printed.
 .TP
+.BI "link [\-i " ino "] [\-p " path "] [\-t " ftype "] " name
+In the current directory, create a directory entry with the given
+.I name
+pointing to a file.
+The file must be specified either as a directory tree path as given by the
+.I path
+option; or directly as an inode number as given by the
+.I ino
+option.
+The file type in the directory entry will be determined from the mode of the
+child file unless the
+.I ftype
+option is given.
+The file being targeted must not be on the iunlink list.
+.TP
 .BI "log [stop | start " filename ]
 Start logging output to
 .IR filename ,
@@ -1052,6 +1067,11 @@ Print the timestamps in the current locale's date and time format instead of
 raw seconds since the Unix epoch.
 .RE
 .TP
+.BI "unlink " name
+In the current directory, remove a directory entry with the given
+.IR name .
+The targeted file is not put on the iunlink list, even if its link count drops to zero.
+.TP
 .BI "uuid [" uuid " | " generate " | " rewrite " | " restore ]
 Set the filesystem universally unique identifier (UUID).
 The filesystem UUID can be used by

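To make the link count bookkeeping in create_child() and remove_child() easier to follow, here is a hypothetical model (struct model_inode, NLINK_PINNED and the model_* functions) rather than libxfs code. Adding an entry bumps the child's link count, and if the child is a directory its .. entry also counts against the new parent; removal undoes both, except that a pinned link count is never decremented, matching the droplink() helper in this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define NLINK_PINNED	(~0U)	/* hypothetical stand-in for XFS_NLINK_PINNED */

struct model_inode {
	bool		isdir;
	unsigned int	nlink;
};

/* Model of create_child(): a new entry in dp now names ip. */
static void model_link(struct model_inode *dp, struct model_inode *ip)
{
	ip->nlink++;			/* dp's new entry points at ip */
	if (ip->isdir)
		dp->nlink++;		/* ip's ".." now points back at dp */
}

/* Model of droplink(): pinned counts are left alone. */
static void model_droplink(struct model_inode *ip)
{
	if (ip->nlink != NLINK_PINNED)
		ip->nlink--;
}

/* Model of remove_child(): the entry naming ip is removed from dp. */
static void model_unlink(struct model_inode *dp, struct model_inode *ip)
{
	if (ip->isdir)
		model_droplink(dp);	/* ip's ".." no longer counts against dp */
	model_droplink(ip);		/* dp's entry for ip is gone */
}
```

As in the patch, nothing here moves an inode whose count reaches zero onto the iunlink list; for an expert command meant to build directory tree corruption for testing, that is the documented behaviour.
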

^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 21/24] xfs_db: compute hashes of parent pointers
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (19 preceding siblings ...)
  2024-07-02  1:15   ` [PATCH 20/24] xfs_db: add link and unlink expert commands Darrick J. Wong
@ 2024-07-02  1:16   ` Darrick J. Wong
  2024-07-02  6:37     ` Christoph Hellwig
  2024-07-02  1:16   ` [PATCH 22/24] libxfs: create new files with attr forks if necessary Darrick J. Wong
                     ` (2 subsequent siblings)
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:16 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Enhance the hash command to compute the hashes of parent pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
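A standalone sketch of the hash that hash -p is expected to print. It is reconstructed from my reading of xfs_da_hashname() and libxfs_parent_hashval() rather than copied from libxfs, and it ignores the case-insensitive variant of the directory hash, so verify it against libxfs/xfs_da_btree.c and libxfs/xfs_parent.c before depending on it. The assumption is that the parent pointer hash is the directory name hash with both 32-bit halves of the parent inode number XORed in; da_hashname() and pptr_hashval() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rol32(uint32_t w, unsigned int s)
{
	return (w << s) | (w >> (32 - s));	/* s is always 7..28 here */
}

/* Sketch of the XFS dabtree name hash: four bytes per round, then the tail. */
static uint32_t da_hashname(const uint8_t *name, int namelen)
{
	uint32_t hash;

	for (hash = 0; namelen >= 4; namelen -= 4, name += 4)
		hash = ((uint32_t)name[0] << 21) ^ ((uint32_t)name[1] << 14) ^
		       ((uint32_t)name[2] << 7) ^ name[3] ^ rol32(hash, 7 * 4);

	switch (namelen) {
	case 3:
		return ((uint32_t)name[0] << 14) ^ ((uint32_t)name[1] << 7) ^
		       name[2] ^ rol32(hash, 7 * 3);
	case 2:
		return ((uint32_t)name[0] << 7) ^ name[1] ^ rol32(hash, 7 * 2);
	case 1:
		return name[0] ^ rol32(hash, 7);
	default:
		return hash;
	}
}

/* Assumed parent pointer hash: name hash XOR both halves of the parent ino. */
static uint32_t pptr_hashval(const uint8_t *name, int namelen,
			     uint64_t parent_ino)
{
	return da_hashname(name, namelen) ^ (uint32_t)(parent_ino >> 32) ^
	       (uint32_t)parent_ino;
}
```

Under that assumption, hash -p 0 prints the same value as hash -d, and a parent inode number whose upper and lower 32-bit halves are equal leaves the name hash unchanged.
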
 db/hash.c                |   36 ++++++++++++++++++++++++++++++------
 libxfs/libxfs_api_defs.h |    1 +
 man/man8/xfs_db.8        |    9 ++++++++-
 3 files changed, 39 insertions(+), 7 deletions(-)


diff --git a/db/hash.c b/db/hash.c
index 1500a6032fd7..50f6da0b1a20 100644
--- a/db/hash.c
+++ b/db/hash.c
@@ -36,26 +36,42 @@ hash_help(void)
 " 'hash' prints out the calculated hash value for a string using the\n"
 "directory/attribute code hash function.\n"
 "\n"
-" Usage:  \"hash <string>\"\n"
+" Usage:  \"hash [-d|-p parent_ino] <string>\"\n"
 "\n"
 ));
 
 }
 
+enum hash_what {
+	ATTR,
+	DIRECTORY,
+	PPTR,
+};
+
 /* ARGSUSED */
 static int
 hash_f(
 	int		argc,
 	char		**argv)
 {
+	xfs_ino_t	p_ino = 0;
 	xfs_dahash_t	hashval;
-	bool		use_dir2_hash = false;
+	enum hash_what	what = ATTR;
 	int		c;
 
-	while ((c = getopt(argc, argv, "d")) != EOF) {
+	while ((c = getopt(argc, argv, "dp:")) != EOF) {
 		switch (c) {
 		case 'd':
-			use_dir2_hash = true;
+			what = DIRECTORY;
+			break;
+		case 'p':
+			errno = 0;
+			p_ino = strtoull(optarg, NULL, 0);
+			if (errno) {
+				perror(optarg);
+				exitcode = 1;
+				return 0;
+			}
+			what = PPTR;
 			break;
 		default:
 			exitcode = 1;
@@ -70,10 +86,18 @@ hash_f(
 			.len	= strlen(argv[c]),
 		};
 
-		if (use_dir2_hash)
+		switch (what) {
+		case DIRECTORY:
 			hashval = libxfs_dir2_hashname(mp, &xname);
-		else
+			break;
+		case PPTR:
+			hashval = libxfs_parent_hashval(mp, xname.name,
+					xname.len, p_ino);
+			break;
+		case ATTR:
 			hashval = libxfs_attr_hashname(xname.name, xname.len);
+			break;
+		}
 		dbprintf("0x%x\n", hashval);
 	}
 
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index b7edaf788051..df83aabdc13d 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -203,6 +203,7 @@
 #define xfs_mkdir_space_res		libxfs_mkdir_space_res
 #define xfs_parent_addname		libxfs_parent_addname
 #define xfs_parent_finish		libxfs_parent_finish
+#define xfs_parent_hashval		libxfs_parent_hashval
 #define xfs_parent_removename		libxfs_parent_removename
 #define xfs_parent_start		libxfs_parent_start
 #define xfs_parent_from_attr		libxfs_parent_from_attr
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index f8db6c36f40a..9f6fea5748d4 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -822,7 +822,7 @@ Skip write verifiers but perform CRC recalculation; allows invalid data to be
 written to disk to test detection of invalid data.
 .RE
 .TP
-.BI hash [-d]" strings
+.BI hash [-d|-p parent_ino]" strings
 Prints the hash value of
 .I string
 using the hash function of the XFS directory and attribute implementation.
@@ -832,6 +832,13 @@ If the
 option is specified, the directory-specific hash function is used.
 This only makes a difference on filesystems with ascii case-insensitive
 lookups enabled.
+
+If the
+.B \-p
+option is specified, the parent pointer-specific hash function is used.
+The parent directory inumber must be specified as an argument.
+The parent directory inumber is mixed into the directory-specific name
+hash, so the result depends on the parent as well as on the name.
 .TP
 .BI "hashcoll [-a] [-s seed] [-n " nr "] [-p " path "] -i | " names...
 Create directory entries or extended attributes names that all have the same


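For reviewers who want to cross-check `hash -p` output by hand, here is a minimal standalone sketch of the computation. It assumes, from our reading of libxfs rather than from this patch, that on filesystems without ascii-ci the name hash is the shared `xfs_da_hashname()` (four bytes per round with 7-bit rotations), and that `libxfs_parent_hashval()` XORs that hash with the upper and lower 32 bits of the parent inumber. The names `da_hashname()` and `pptr_hashval()` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left; the shift is always 7..28 here. */
static inline uint32_t rol32(uint32_t word, unsigned int shift)
{
	return (word << shift) | (word >> (32 - shift));
}

/* Sketch of the non-CI dir/attr name hash, four bytes per round. */
static uint32_t da_hashname(const uint8_t *name, int namelen)
{
	uint32_t	hash;

	assert(namelen >= 0);
	for (hash = 0; namelen >= 4; namelen -= 4, name += 4)
		hash = ((uint32_t)name[0] << 21) ^ ((uint32_t)name[1] << 14) ^
		       ((uint32_t)name[2] << 7) ^ name[3] ^ rol32(hash, 7 * 4);

	/* Fold in the trailing 0-3 bytes. */
	switch (namelen) {
	case 3:
		return ((uint32_t)name[0] << 14) ^ ((uint32_t)name[1] << 7) ^
		       name[2] ^ rol32(hash, 7 * 3);
	case 2:
		return ((uint32_t)name[0] << 7) ^ name[1] ^ rol32(hash, 7 * 2);
	case 1:
		return name[0] ^ rol32(hash, 7 * 1);
	default:
		return hash;
	}
}

/* Assumed parent pointer hash: name hash mixed with the parent inumber. */
static uint32_t pptr_hashval(const char *name, uint64_t parent_ino)
{
	uint32_t	hash = da_hashname((const uint8_t *)name,
					   (int)strlen(name));

	return hash ^ (uint32_t)(parent_ino >> 32) ^ (uint32_t)parent_ino;
}
```

Under those assumptions the sketch gives 0xe1 for the name `a` under parent inode 128; the matching xfs_db invocation would be `hash -p 128 a`.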

* [PATCH 22/24] libxfs: create new files with attr forks if necessary
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (20 preceding siblings ...)
  2024-07-02  1:16   ` [PATCH 21/24] xfs_db: compute hashes of parent pointers Darrick J. Wong
@ 2024-07-02  1:16   ` Darrick J. Wong
  2024-07-02  6:37     ` Christoph Hellwig
  2024-07-02  1:16   ` [PATCH 23/24] mkfs: Add parent pointers during protofile creation Darrick J. Wong
  2024-07-02  1:16   ` [PATCH 24/24] mkfs: enable formatting with parent pointers Darrick J. Wong
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:16 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Create new files with attr forks if they're going to have parent
pointers.  In the next patch we'll fix mkfs to use the same parent
pointer creation functions as the kernel, so we're going to need this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/init.c |    4 ++++
 libxfs/util.c |   19 ++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)


diff --git a/libxfs/init.c b/libxfs/init.c
index 95de1e6d1d4f..90a539e04161 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -602,14 +602,18 @@ void
 libxfs_compute_all_maxlevels(
 	struct xfs_mount	*mp)
 {
+	struct xfs_ino_geometry *igeo = M_IGEO(mp);
+
 	xfs_alloc_compute_maxlevels(mp);
 	xfs_bmap_compute_maxlevels(mp, XFS_DATA_FORK);
 	xfs_bmap_compute_maxlevels(mp, XFS_ATTR_FORK);
+	igeo->attr_fork_offset = xfs_bmap_compute_attr_offset(mp);
 	xfs_ialloc_setup_geometry(mp);
 	xfs_rmapbt_compute_maxlevels(mp);
 	xfs_refcountbt_compute_maxlevels(mp);
 
 	xfs_agbtree_compute_maxlevels(mp);
+
 }
 
 /*
diff --git a/libxfs/util.c b/libxfs/util.c
index 74eea0fcbe07..373749457cc1 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -274,11 +274,12 @@ libxfs_init_new_inode(
 	struct fsxattr		*fsx,
 	struct xfs_inode	**ipp)
 {
+	struct xfs_mount	*mp = tp->t_mountp;
 	struct xfs_inode	*ip;
 	unsigned int		flags;
 	int			error;
 
-	error = libxfs_iget(tp->t_mountp, tp, ino, XFS_IGET_CREATE, &ip);
+	error = libxfs_iget(mp, tp, ino, XFS_IGET_CREATE, &ip);
 	if (error != 0)
 		return error;
 	ASSERT(ip != NULL);
@@ -340,6 +341,22 @@ libxfs_init_new_inode(
 		ASSERT(0);
 	}
 
+	/*
+	 * If we're going to set a parent pointer on this file, we need to
+	 * create an attr fork to receive that parent pointer.
+	 */
+	if (pip && xfs_has_parent(mp)) {
+		ip->i_forkoff = xfs_default_attroffset(ip) >> 3;
+		xfs_ifork_init_attr(ip, XFS_DINODE_FMT_EXTENTS, 0);
+
+		if (!xfs_has_attr(mp)) {
+			spin_lock(&mp->m_sb_lock);
+			xfs_add_attr(mp);
+			spin_unlock(&mp->m_sb_lock);
+			xfs_log_sb(tp);
+		}
+	}
+
 	/*
 	 * Log the new values stuffed into the inode.
 	 */


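The `>> 3` converts a byte offset into the units stored in `i_forkoff`, which counts 8-byte words from the start of the inode literal area. Below is a minimal worked sketch for a V5 filesystem with 512-byte inodes. It assumes, from our reading of libxfs rather than from this patch, a 176-byte v3 inode core, a 4-byte bmdr block header, 16 bytes per bmbt key/pointer pair, and a default attr fork offset of `xfs_bmdr_space_calc(6 * MINABTPTRS)` with `MINABTPTRS == 2` for inodes larger than 256 bytes. The constants and helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed on-disk sizes; see the paragraph above. */
#define DINODE_V3_CORE_SIZE	176	/* v3 inode core, bytes */
#define BMDR_BLOCK_SIZE		4	/* bb_level + bb_numrecs */
#define BMBT_KEY_PTR_SIZE	16	/* 8-byte key + 8-byte pointer */
#define MINABTPTRS		2

/* Bytes needed for an in-inode bmap btree root holding nrecs records. */
static size_t bmdr_space_calc(unsigned int nrecs)
{
	return BMDR_BLOCK_SIZE + (size_t)nrecs * BMBT_KEY_PTR_SIZE;
}

/* Size of the literal area (data + attr forks) after the inode core. */
static size_t litino(size_t inodesize)
{
	assert(inodesize > DINODE_V3_CORE_SIZE);
	return inodesize - DINODE_V3_CORE_SIZE;
}

/* Default attr fork offset in bytes for inodes larger than 256 bytes. */
static size_t default_attr_offset(void)
{
	return bmdr_space_calc(6 * MINABTPTRS);
}

/* i_forkoff is stored in 8-byte units, hence the >> 3. */
static unsigned int bytes_to_forkoff(size_t bytes)
{
	return (unsigned int)(bytes >> 3);
}

/* Bytes left for the attr fork once forkoff has been chosen. */
static size_t attr_fork_bytes(size_t inodesize, unsigned int forkoff)
{
	return litino(inodesize) - ((size_t)forkoff << 3);
}
```

Note that the shift truncates: 196 bytes becomes 24 units, so in this sketch the data fork gets 192 bytes and the attr fork the remaining 144.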

* [PATCH 23/24] mkfs: Add parent pointers during protofile creation
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (21 preceding siblings ...)
  2024-07-02  1:16   ` [PATCH 22/24] libxfs: create new files with attr forks if necessary Darrick J. Wong
@ 2024-07-02  1:16   ` Darrick J. Wong
  2024-07-02  6:37     ` Christoph Hellwig
  2024-07-02  1:16   ` [PATCH 24/24] mkfs: enable formatting with parent pointers Darrick J. Wong
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:16 UTC (permalink / raw)
  To: djwong, cem
  Cc: Allison Henderson, catherine.hoang, linux-xfs, allison.henderson,
	hch

From: Allison Henderson <allison.henderson@oracle.com>

Inodes created from protofile parsing will also need to add the
appropriate parent pointers.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: use xfs_parent_addname from libxfs instead of open-coding xfs_attr_set]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 mkfs/proto.c |   62 +++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 48 insertions(+), 14 deletions(-)


diff --git a/mkfs/proto.c b/mkfs/proto.c
index a9a9b704a3ca..8e16eb1506f1 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -348,11 +348,12 @@ newregfile(
 
 static void
 newdirent(
-	xfs_mount_t	*mp,
-	xfs_trans_t	*tp,
-	xfs_inode_t	*pip,
-	struct xfs_name	*name,
-	xfs_ino_t	inum)
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_inode	*pip,
+	struct xfs_name		*name,
+	struct xfs_inode	*ip,
+	struct xfs_parent_args	*ppargs)
 {
 	int	error;
 	int	rsv;
@@ -365,9 +366,15 @@ newdirent(
 
 	rsv = XFS_DIRENTER_SPACE_RES(mp, name->len);
 
-	error = -libxfs_dir_createname(tp, pip, name, inum, rsv);
+	error = -libxfs_dir_createname(tp, pip, name, ip->i_ino, rsv);
 	if (error)
 		fail(_("directory createname error"), error);
+
+	if (ppargs) {
+		error = -libxfs_parent_addname(tp, ppargs, pip, name, ip);
+		if (error)
+			fail(_("parent addname error"), error);
+	}
 }
 
 static void
@@ -384,6 +391,20 @@ newdirectory(
 		fail(_("directory create error"), error);
 }
 
+static struct xfs_parent_args *
+newpptr(
+	struct xfs_mount	*mp)
+{
+	struct xfs_parent_args	*ret;
+	int			error;
+
+	error = -libxfs_parent_start(mp, &ret);
+	if (error)
+		fail(_("initializing parent pointer"), error);
+
+	return ret;
+}
+
 static void
 parseproto(
 	xfs_mount_t	*mp,
@@ -418,6 +439,7 @@ parseproto(
 	struct cred	creds;
 	char		*value;
 	struct xfs_name	xname;
+	struct xfs_parent_args *ppargs = NULL;
 
 	memset(&creds, 0, sizeof(creds));
 	mstr = getstr(pp);
@@ -492,6 +514,7 @@ parseproto(
 	case IF_REGULAR:
 		buf = newregfile(pp, &len);
 		tp = getres(mp, XFS_B_TO_FSB(mp, len));
+		ppargs = newpptr(mp);
 		error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFREG, 1, 0,
 					   &creds, fsxp, &ip);
 		if (error)
@@ -501,7 +524,7 @@ parseproto(
 			free(buf);
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_REG_FILE;
-		newdirent(mp, tp, pip, &xname, ip->i_ino);
+		newdirent(mp, tp, pip, &xname, ip, ppargs);
 		break;
 
 	case IF_RESERVED:			/* pre-allocated space only */
@@ -515,7 +538,7 @@ parseproto(
 			exit(1);
 		}
 		tp = getres(mp, XFS_B_TO_FSB(mp, llen));
-
+		ppargs = newpptr(mp);
 		error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFREG, 1, 0,
 					  &creds, fsxp, &ip);
 		if (error)
@@ -524,17 +547,19 @@ parseproto(
 		libxfs_trans_ijoin(tp, pip, 0);
 
 		xname.type = XFS_DIR3_FT_REG_FILE;
-		newdirent(mp, tp, pip, &xname, ip->i_ino);
+		newdirent(mp, tp, pip, &xname, ip, ppargs);
 		libxfs_trans_log_inode(tp, ip, flags);
 		error = -libxfs_trans_commit(tp);
 		if (error)
 			fail(_("Space preallocation failed."), error);
+		libxfs_parent_finish(mp, ppargs);
 		rsvfile(mp, ip, llen);
 		libxfs_irele(ip);
 		return;
 
 	case IF_BLOCK:
 		tp = getres(mp, 0);
+		ppargs = newpptr(mp);
 		majdev = getnum(getstr(pp), 0, 0, false);
 		mindev = getnum(getstr(pp), 0, 0, false);
 		error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFBLK, 1,
@@ -544,12 +569,13 @@ parseproto(
 		}
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_BLKDEV;
-		newdirent(mp, tp, pip, &xname, ip->i_ino);
+		newdirent(mp, tp, pip, &xname, ip, ppargs);
 		flags |= XFS_ILOG_DEV;
 		break;
 
 	case IF_CHAR:
 		tp = getres(mp, 0);
+		ppargs = newpptr(mp);
 		majdev = getnum(getstr(pp), 0, 0, false);
 		mindev = getnum(getstr(pp), 0, 0, false);
 		error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFCHR, 1,
@@ -558,24 +584,26 @@ parseproto(
 			fail(_("Inode allocation failed"), error);
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_CHRDEV;
-		newdirent(mp, tp, pip, &xname, ip->i_ino);
+		newdirent(mp, tp, pip, &xname, ip, ppargs);
 		flags |= XFS_ILOG_DEV;
 		break;
 
 	case IF_FIFO:
 		tp = getres(mp, 0);
+		ppargs = newpptr(mp);
 		error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFIFO, 1, 0,
 				&creds, fsxp, &ip);
 		if (error)
 			fail(_("Inode allocation failed"), error);
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_FIFO;
-		newdirent(mp, tp, pip, &xname, ip->i_ino);
+		newdirent(mp, tp, pip, &xname, ip, ppargs);
 		break;
 	case IF_SYMLINK:
 		buf = getstr(pp);
 		len = (int)strlen(buf);
 		tp = getres(mp, XFS_B_TO_FSB(mp, len));
+		ppargs = newpptr(mp);
 		error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFLNK, 1, 0,
 				&creds, fsxp, &ip);
 		if (error)
@@ -583,7 +611,7 @@ parseproto(
 		writesymlink(tp, ip, buf, len);
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_SYMLINK;
-		newdirent(mp, tp, pip, &xname, ip->i_ino);
+		newdirent(mp, tp, pip, &xname, ip, ppargs);
 		break;
 	case IF_DIRECTORY:
 		tp = getres(mp, 0);
@@ -598,9 +626,10 @@ parseproto(
 			libxfs_log_sb(tp);
 			isroot = 1;
 		} else {
+			ppargs = newpptr(mp);
 			libxfs_trans_ijoin(tp, pip, 0);
 			xname.type = XFS_DIR3_FT_DIR;
-			newdirent(mp, tp, pip, &xname, ip->i_ino);
+			newdirent(mp, tp, pip, &xname, ip, ppargs);
 			libxfs_bumplink(tp, pip);
 			libxfs_trans_log_inode(tp, pip, XFS_ILOG_CORE);
 		}
@@ -609,6 +638,9 @@ parseproto(
 		error = -libxfs_trans_commit(tp);
 		if (error)
 			fail(_("Directory inode allocation failed."), error);
+
+		libxfs_parent_finish(mp, ppargs);
+
 		/*
 		 * RT initialization.  Do this here to ensure that
 		 * the RT inodes get placed after the root inode.
@@ -636,6 +668,8 @@ parseproto(
 		fail(_("Error encountered creating file from prototype file"),
 			error);
 	}
+
+	libxfs_parent_finish(mp, ppargs);
 	libxfs_irele(ip);
 }
 


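The pattern this patch repeats for every file type (set up parent pointer args before allocating the inode, queue the parent pointer after the dirent, commit, then finish) can be reduced to a standalone sketch. The `pptr_*` helpers and `struct pptr_args` are hypothetical stand-ins for `libxfs_parent_start()`, `libxfs_parent_addname()`, `libxfs_parent_finish()` and `struct xfs_parent_args`. The NULL-when-disabled behaviour is assumed from the `if (ppargs)` check in `newdirent()`, and the NULL-tolerant finish from the unconditional `libxfs_parent_finish()` calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xfs_parent_args. */
struct pptr_args {
	int	nr_queued;	/* parent pointers queued in this transaction */
};

static int live_args;	/* outstanding pptr_args allocations, for testing */

/* Stand-in for libxfs_parent_start(): NULL when the feature is off. */
static int pptr_start(bool has_parent, struct pptr_args **argsp)
{
	*argsp = NULL;
	if (!has_parent)
		return 0;
	*argsp = calloc(1, sizeof(**argsp));
	if (!*argsp)
		return -ENOMEM;
	live_args++;
	return 0;
}

/* Stand-in for libxfs_parent_finish(): accepts NULL. */
static void pptr_finish(struct pptr_args *args)
{
	if (!args)
		return;
	assert(live_args > 0);
	free(args);
	live_args--;
}

/*
 * One protofile entry: start, create the dirent, queue a parent pointer
 * only if args exist, commit, finish.  Returns the number queued, or a
 * negative errno.
 */
static int create_entry(bool has_parent)
{
	struct pptr_args	*args;
	int			queued = 0;
	int			error;

	error = pptr_start(has_parent, &args);
	if (error)
		return error;

	/* ... allocate the inode and create the directory entry ... */
	if (args) {
		args->nr_queued++;	/* stands in for libxfs_parent_addname() */
		queued = args->nr_queued;
	}
	/* ... commit the transaction ... */

	pptr_finish(args);
	return queued;
}
```

Keeping the start call unconditional and pushing the feature check into it is what lets every `case` in `parseproto()` use the same three calls.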

* [PATCH 24/24] mkfs: enable formatting with parent pointers
  2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (22 preceding siblings ...)
  2024-07-02  1:16   ` [PATCH 23/24] mkfs: Add parent pointers during protofile creation Darrick J. Wong
@ 2024-07-02  1:16   ` Darrick J. Wong
  2024-07-02  6:38     ` Christoph Hellwig
  23 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:16 UTC (permalink / raw)
  To: djwong, cem
  Cc: Allison Henderson, catherine.hoang, linux-xfs, allison.henderson,
	hch

From: Allison Henderson <allison.henderson@oracle.com>

Enable parent pointer support in mkfs via the '-n parent' parameter.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: move the no-V4 filesystem check to join the rest]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 mkfs/lts_4.19.conf |    3 +++
 mkfs/lts_5.10.conf |    3 +++
 mkfs/lts_5.15.conf |    3 +++
 mkfs/lts_5.4.conf  |    3 +++
 mkfs/lts_6.1.conf  |    3 +++
 mkfs/lts_6.6.conf  |    3 +++
 mkfs/xfs_mkfs.c    |   45 ++++++++++++++++++++++++++++++++++++++++++---
 7 files changed, 60 insertions(+), 3 deletions(-)


diff --git a/mkfs/lts_4.19.conf b/mkfs/lts_4.19.conf
index 92e8eba6ba8f..9fa1f9378f32 100644
--- a/mkfs/lts_4.19.conf
+++ b/mkfs/lts_4.19.conf
@@ -13,3 +13,6 @@ rmapbt=0
 sparse=1
 nrext64=0
 exchange=0
+
+[naming]
+parent=0
diff --git a/mkfs/lts_5.10.conf b/mkfs/lts_5.10.conf
index 34e7662cd671..d64bcdf8c46b 100644
--- a/mkfs/lts_5.10.conf
+++ b/mkfs/lts_5.10.conf
@@ -13,3 +13,6 @@ rmapbt=0
 sparse=1
 nrext64=0
 exchange=0
+
+[naming]
+parent=0
diff --git a/mkfs/lts_5.15.conf b/mkfs/lts_5.15.conf
index a36a5c2b7850..775fd9ab91b8 100644
--- a/mkfs/lts_5.15.conf
+++ b/mkfs/lts_5.15.conf
@@ -13,3 +13,6 @@ rmapbt=0
 sparse=1
 nrext64=0
 exchange=0
+
+[naming]
+parent=0
diff --git a/mkfs/lts_5.4.conf b/mkfs/lts_5.4.conf
index 4204d5b8f235..6f43a6c6d469 100644
--- a/mkfs/lts_5.4.conf
+++ b/mkfs/lts_5.4.conf
@@ -13,3 +13,6 @@ rmapbt=0
 sparse=1
 nrext64=0
 exchange=0
+
+[naming]
+parent=0
diff --git a/mkfs/lts_6.1.conf b/mkfs/lts_6.1.conf
index 9a90def8f489..a78a4f9e35dc 100644
--- a/mkfs/lts_6.1.conf
+++ b/mkfs/lts_6.1.conf
@@ -13,3 +13,6 @@ rmapbt=0
 sparse=1
 nrext64=0
 exchange=0
+
+[naming]
+parent=0
diff --git a/mkfs/lts_6.6.conf b/mkfs/lts_6.6.conf
index 3f7fb651937d..91a25bd8121f 100644
--- a/mkfs/lts_6.6.conf
+++ b/mkfs/lts_6.6.conf
@@ -13,3 +13,6 @@ rmapbt=1
 sparse=1
 nrext64=1
 exchange=0
+
+[naming]
+parent=0
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 991ecbdd03ff..394a35771246 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -114,6 +114,7 @@ enum {
 	N_SIZE = 0,
 	N_VERSION,
 	N_FTYPE,
+	N_PARENT,
 	N_MAX_OPTS,
 };
 
@@ -656,6 +657,7 @@ static struct opt_params nopts = {
 		[N_SIZE] = "size",
 		[N_VERSION] = "version",
 		[N_FTYPE] = "ftype",
+		[N_PARENT] = "parent",
 		[N_MAX_OPTS] = NULL,
 	},
 	.subopt_params = {
@@ -679,6 +681,14 @@ static struct opt_params nopts = {
 		  .maxval = 1,
 		  .defaultval = 1,
 		},
+		{ .index = N_PARENT,
+		  .conflicts = { { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 1,
+		},
+
+
 	},
 };
 
@@ -1040,7 +1050,7 @@ usage( void )
 			    sunit=value|su=num,sectsize=num,lazy-count=0|1,\n\
 			    concurrency=num]\n\
 /* label */		[-L label (maximum 12 characters)]\n\
-/* naming */		[-n size=num,version=2|ci,ftype=0|1]\n\
+/* naming */		[-n size=num,version=2|ci,ftype=0|1,parent=0|1]\n\
 /* no-op info only */	[-N]\n\
 /* prototype file */	[-p fname]\n\
 /* quiet */		[-q]\n\
@@ -1878,6 +1888,9 @@ naming_opts_parser(
 	case N_FTYPE:
 		cli->sb_feat.dirftype = getnum(value, opts, subopt);
 		break;
+	case N_PARENT:
+		cli->sb_feat.parent_pointers = getnum(value, opts, subopt);
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -2385,6 +2398,14 @@ _("exchange-range not supported without CRC support\n"));
 			usage();
 		}
 		cli->sb_feat.exchrange = false;
+
+		if (cli->sb_feat.parent_pointers &&
+		    cli_opt_set(&nopts, N_PARENT)) {
+			fprintf(stderr,
+_("parent pointers not supported without CRC support\n"));
+			usage();
+		}
+		cli->sb_feat.parent_pointers = false;
 	}
 
 	if (!cli->sb_feat.finobt) {
@@ -2419,6 +2440,17 @@ _("cowextsize not supported without reflink support\n"));
 		usage();
 	}
 
+	/*
+	 * Turn on exchange-range if parent pointers are enabled and the caller
+	 * did not provide an explicit exchange-range parameter so that users
+	 * can take advantage of online repair.  It's not required for correct
+	 * operation, but it costs us nothing to enable it.
+	 */
+	if (cli->sb_feat.parent_pointers && !cli->sb_feat.exchrange &&
+	    !cli_opt_set(&iopts, I_EXCHANGE)) {
+		cli->sb_feat.exchrange = true;
+	}
+
 	/*
 	 * Copy features across to config structure now.
 	 */
@@ -3458,8 +3490,6 @@ sb_set_features(
 		sbp->sb_features2 |= XFS_SB_VERSION2_LAZYSBCOUNTBIT;
 	if (fp->projid32bit)
 		sbp->sb_features2 |= XFS_SB_VERSION2_PROJID32BIT;
-	if (fp->parent_pointers)
-		sbp->sb_features2 |= XFS_SB_VERSION2_PARENTBIT;
 	if (fp->crcs_enabled)
 		sbp->sb_features2 |= XFS_SB_VERSION2_CRCBIT;
 	if (fp->attr_version == 2)
@@ -3520,6 +3550,15 @@ sb_set_features(
 		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_NREXT64;
 	if (fp->exchrange)
 		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_EXCHRANGE;
+	if (fp->parent_pointers) {
+		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_PARENT;
+		/*
+		 * Set ATTRBIT even if mkfs doesn't write out a single parent
+		 * pointer so that the kernel doesn't have to do that for us
+		 * with a synchronous write to the primary super at runtime.
+		 */
+		sbp->sb_versionnum |= XFS_SB_VERSION_ATTRBIT;
+	}
 }
 
 /*


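The option handling in this patch boils down to a small decision table: an explicit `-n parent=1` on a V4 (non-CRC) filesystem is a usage error, a defaulted one is silently dropped, and on V5 exchange-range is switched on alongside parent pointers unless the caller gave an explicit exchange-range option (`I_EXCHANGE` in `iopts`). A minimal sketch of that logic, using a hypothetical `struct feat` whose `*_set` fields stand in for `cli_opt_set()`:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of cli->sb_feat plus "given on the CLI?" flags. */
struct feat {
	bool	crcs_enabled;
	bool	parent_pointers;
	bool	parent_set;	/* -n parent=... given explicitly */
	bool	exchrange;
	bool	exchange_set;	/* exchange-range option given explicitly */
};

/* Returns -1 where mkfs would print an error and call usage(). */
static int resolve_features(struct feat *f)
{
	if (!f->crcs_enabled) {
		/* V4: explicit requests are errors, defaults are dropped. */
		if (f->exchrange && f->exchange_set)
			return -1;
		f->exchrange = false;

		if (f->parent_pointers && f->parent_set)
			return -1;
		f->parent_pointers = false;
	}

	/*
	 * Enable exchange-range alongside parent pointers unless the user
	 * passed an explicit exchange-range value.
	 */
	if (f->parent_pointers && !f->exchrange && !f->exchange_set)
		f->exchrange = true;
	return 0;
}
```

Checking the explicit-option flag rather than the value is what lets a defaulted `parent=1` degrade quietly on V4 while `-n parent=1` still fails loudly.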

* [PATCH 1/2] xfs: create a blob array data structure
  2024-07-02  0:52 ` [PATCHSET v13.7 12/16] xfsprogs: scrubbing for " Darrick J. Wong
@ 2024-07-02  1:17   ` Darrick J. Wong
  2024-07-02  6:41     ` Christoph Hellwig
  2024-07-02  1:17   ` [PATCH 2/2] man2: update ioctl_xfs_scrub_metadata.2 for parent pointers Darrick J. Wong
  1 sibling, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:17 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Create a simple 'blob array' data structure for storage of arbitrarily
sized metadata objects that will be used to reconstruct metadata.  For
the intended usage (temporarily storing extended attribute names and
values) we only have to support storing objects and retrieving them.
Use the xfile abstraction to store the attribute information in memory
that can be swapped out.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/Makefile |    2 +
 libxfs/xfblob.c |  147 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfblob.h |   24 +++++++++
 libxfs/xfile.c  |   11 ++++
 libxfs/xfile.h  |    1 
 5 files changed, 185 insertions(+)
 create mode 100644 libxfs/xfblob.c
 create mode 100644 libxfs/xfblob.h


diff --git a/libxfs/Makefile b/libxfs/Makefile
index 9fb53d9cc32c..4e8f9a135818 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -28,6 +28,7 @@ HFILES = \
 	linux-err.h \
 	topology.h \
 	buf_mem.h \
+	xfblob.h \
 	xfile.h \
 	xfs_ag_resv.h \
 	xfs_alloc.h \
@@ -73,6 +74,7 @@ CFILES = buf_mem.c \
 	topology.c \
 	trans.c \
 	util.c \
+	xfblob.c \
 	xfile.c \
 	xfs_ag.c \
 	xfs_ag_resv.c \
diff --git a/libxfs/xfblob.c b/libxfs/xfblob.c
new file mode 100644
index 000000000000..7d8caaa4c164
--- /dev/null
+++ b/libxfs/xfblob.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2022-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs_priv.h"
+#include "libxfs.h"
+#include "libxfs/xfile.h"
+#include "libxfs/xfblob.h"
+
+/*
+ * XFS Blob Storage
+ * ================
+ * Stores and retrieves blobs using an xfile.  Objects are appended to the file
+ * and the offset is returned as a magic cookie for retrieval.
+ */
+
+#define XB_KEY_MAGIC	0xABAADDAD
+struct xb_key {
+	uint32_t		xb_magic;  /* XB_KEY_MAGIC */
+	uint32_t		xb_size;   /* size of the blob, in bytes */
+	loff_t			xb_offset; /* byte offset of this key */
+	/* blob comes after here */
+} __packed;
+
+/* Initialize a blob storage object. */
+int
+xfblob_create(
+	const char		*description,
+	struct xfblob		**blobp)
+{
+	struct xfblob		*blob;
+	struct xfile		*xfile;
+	int			error;
+
+	error = xfile_create(description, 0, &xfile);
+	if (error)
+		return error;
+
+	blob = malloc(sizeof(struct xfblob));
+	if (!blob) {
+		error = -ENOMEM;
+		goto out_xfile;
+	}
+
+	blob->xfile = xfile;
+	blob->last_offset = PAGE_SIZE;
+
+	*blobp = blob;
+	return 0;
+
+out_xfile:
+	xfile_destroy(xfile);
+	return error;
+}
+
+/* Destroy a blob storage object. */
+void
+xfblob_destroy(
+	struct xfblob	*blob)
+{
+	xfile_destroy(blob->xfile);
+	kfree(blob);
+}
+
+/* Retrieve a blob. */
+int
+xfblob_load(
+	struct xfblob	*blob,
+	xfblob_cookie	cookie,
+	void		*ptr,
+	uint32_t	size)
+{
+	struct xb_key	key;
+	int		error;
+
+	error = xfile_load(blob->xfile, &key, sizeof(key), cookie);
+	if (error)
+		return error;
+
+	if (key.xb_magic != XB_KEY_MAGIC || key.xb_offset != cookie) {
+		ASSERT(0);
+		return -ENODATA;
+	}
+	if (size < key.xb_size) {
+		ASSERT(0);
+		return -EFBIG;
+	}
+
+	return xfile_load(blob->xfile, ptr, key.xb_size,
+			cookie + sizeof(key));
+}
+
+/* Store a blob. */
+int
+xfblob_store(
+	struct xfblob	*blob,
+	xfblob_cookie	*cookie,
+	const void	*ptr,
+	uint32_t	size)
+{
+	struct xb_key	key = {
+		.xb_offset = blob->last_offset,
+		.xb_magic = XB_KEY_MAGIC,
+		.xb_size = size,
+	};
+	loff_t		pos = blob->last_offset;
+	int		error;
+
+	error = xfile_store(blob->xfile, &key, sizeof(key), pos);
+	if (error)
+		return error;
+
+	pos += sizeof(key);
+	error = xfile_store(blob->xfile, ptr, size, pos);
+	if (error)
+		goto out_err;
+
+	*cookie = blob->last_offset;
+	blob->last_offset += sizeof(key) + size;
+	return 0;
+out_err:
+	xfile_discard(blob->xfile, blob->last_offset, sizeof(key));
+	return error;
+}
+
+/* Free a blob. */
+int
+xfblob_free(
+	struct xfblob	*blob,
+	xfblob_cookie	cookie)
+{
+	struct xb_key	key;
+	int		error;
+
+	error = xfile_load(blob->xfile, &key, sizeof(key), cookie);
+	if (error)
+		return error;
+
+	if (key.xb_magic != XB_KEY_MAGIC || key.xb_offset != cookie) {
+		ASSERT(0);
+		return -ENODATA;
+	}
+
+	xfile_discard(blob->xfile, cookie, sizeof(key) + key.xb_size);
+	return 0;
+}
diff --git a/libxfs/xfblob.h b/libxfs/xfblob.h
new file mode 100644
index 000000000000..28bf4ab2898f
--- /dev/null
+++ b/libxfs/xfblob.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2022-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_SCRUB_XFBLOB_H__
+#define __XFS_SCRUB_XFBLOB_H__
+
+struct xfblob {
+	struct xfile	*xfile;
+	loff_t		last_offset;
+};
+
+typedef loff_t		xfblob_cookie;
+
+int xfblob_create(const char *descr, struct xfblob **blobp);
+void xfblob_destroy(struct xfblob *blob);
+int xfblob_load(struct xfblob *blob, xfblob_cookie cookie, void *ptr,
+		uint32_t size);
+int xfblob_store(struct xfblob *blob, xfblob_cookie *cookie, const void *ptr,
+		uint32_t size);
+int xfblob_free(struct xfblob *blob, xfblob_cookie cookie);
+
+#endif /* __XFS_SCRUB_XFBLOB_H__ */
diff --git a/libxfs/xfile.c b/libxfs/xfile.c
index 6e0fa809a296..b4908b49b6d5 100644
--- a/libxfs/xfile.c
+++ b/libxfs/xfile.c
@@ -391,3 +391,14 @@ xfile_bytes(
 
 	return (unsigned long long)statbuf.st_blocks << 9;
 }
+
+/* Discard pages backing a range of the xfile. */
+void
+xfile_discard(
+	struct xfile		*xf,
+	loff_t			pos,
+	unsigned long long	count)
+{
+	fallocate(xf->fcb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+			pos, count);
+}
diff --git a/libxfs/xfile.h b/libxfs/xfile.h
index 180a42bbbaa2..7ee8b90cdf30 100644
--- a/libxfs/xfile.h
+++ b/libxfs/xfile.h
@@ -30,5 +30,6 @@ ssize_t xfile_load(struct xfile *xf, void *buf, size_t count, loff_t pos);
 ssize_t xfile_store(struct xfile *xf, const void *buf, size_t count, loff_t pos);
 
 unsigned long long xfile_bytes(struct xfile *xf);
+void xfile_discard(struct xfile *xf, loff_t pos, unsigned long long count);
 
 #endif /* __LIBXFS_XFILE_H__ */

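The blob array layout is simple enough to model in a few lines: each record is a packed 16-byte `xb_key` (magic, size, its own offset) followed by the payload, and the cookie handed back is the key's offset. The sketch below assumes a growable heap buffer in place of the xfile and omits `xfblob_free()`; `struct memblob` and the `mb_*` names are hypothetical. `xfblob_create()` starts appending at `PAGE_SIZE`; here the caller picks the starting offset.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define XB_KEY_MAGIC	0xABAADDADU

/* Same field order as struct xb_key in xfblob.c: 16 bytes when packed. */
struct xb_key {
	uint32_t	xb_magic;
	uint32_t	xb_size;
	int64_t		xb_offset;
} __attribute__((packed));

/* Hypothetical in-memory stand-in for struct xfblob and its xfile. */
struct memblob {
	unsigned char	*buf;
	size_t		cap;
	int64_t		last_offset;	/* next append position */
};

/* Copy len bytes to pos, growing (and zero-filling) the buffer as needed. */
static int mb_write(struct memblob *mb, const void *p, size_t len, int64_t pos)
{
	size_t	need = (size_t)pos + len;

	if (need > mb->cap) {
		size_t		ncap = mb->cap ? mb->cap : 64;
		unsigned char	*nbuf;

		while (ncap < need)
			ncap *= 2;
		nbuf = realloc(mb->buf, ncap);
		if (!nbuf)
			return -ENOMEM;
		memset(nbuf + mb->cap, 0, ncap - mb->cap);
		mb->buf = nbuf;
		mb->cap = ncap;
	}
	memcpy(mb->buf + pos, p, len);
	return 0;
}

/*
 * Append key + payload and return the key's offset as the cookie.  On
 * failure last_offset is not advanced, so the partial record is simply
 * overwritten later (xfblob_store() punches the key out instead).
 */
static int mb_store(struct memblob *mb, int64_t *cookie, const void *p,
		    uint32_t size)
{
	struct xb_key	key = {
		.xb_magic	= XB_KEY_MAGIC,
		.xb_size	= size,
		.xb_offset	= mb->last_offset,
	};
	int64_t		pos = mb->last_offset;
	int		error;

	error = mb_write(mb, &key, sizeof(key), pos);
	if (!error)
		error = mb_write(mb, p, size, pos + (int64_t)sizeof(key));
	if (error)
		return error;

	*cookie = pos;
	mb->last_offset = pos + (int64_t)sizeof(key) + size;
	return 0;
}

/* Validate the key at the cookie, then copy the payload out. */
static int mb_load(const struct memblob *mb, int64_t cookie, void *p,
		   uint32_t size)
{
	struct xb_key	key;

	if (cookie < 0 || (size_t)cookie + sizeof(key) > mb->cap)
		return -ENODATA;
	memcpy(&key, mb->buf + cookie, sizeof(key));
	if (key.xb_magic != XB_KEY_MAGIC || key.xb_offset != cookie)
		return -ENODATA;
	if (size < key.xb_size)
		return -EFBIG;
	memcpy(p, mb->buf + cookie + sizeof(key), key.xb_size);
	return 0;
}
```

Storing the key's own offset inside the key is what lets a load reject a cookie that points into the middle of some other record.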

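To see what `xfile_discard()` does to its backing file, here is a standalone sketch that issues the same `fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)` call against a memfd, which is assumed here as a stand-in for the xfile's backing file (Linux, glibc 2.27 or later). The punched range reads back as zeroes while the file size stays the same. Unlike `xfile_discard()`, the helper reports failure; `discard_range()`, `make_filled_memfd()` and `byte_at()` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_CHUNK	4096

/* Same flags as xfile_discard(), but report failure as -errno. */
static int discard_range(int fd, off_t pos, off_t count)
{
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      pos, count) < 0)
		return -errno;
	return 0;
}

/* Create a memfd holding nr_chunks * 4096 bytes of 0xff; -1 on error. */
static int make_filled_memfd(size_t nr_chunks)
{
	unsigned char	chunk[SKETCH_CHUNK];
	int		fd = memfd_create("xfile-sketch", 0);

	if (fd < 0)
		return -1;
	memset(chunk, 0xff, sizeof(chunk));
	for (size_t i = 0; i < nr_chunks; i++) {
		off_t	pos = (off_t)(i * sizeof(chunk));

		if (pwrite(fd, chunk, sizeof(chunk), pos) !=
		    (ssize_t)sizeof(chunk)) {
			close(fd);
			return -1;
		}
	}
	return fd;
}

/* Read back a single byte; returns 0..255, or -1 on error. */
static int byte_at(int fd, off_t pos)
{
	unsigned char	c;

	if (pread(fd, &c, 1, pos) != 1)
		return -1;
	return c;
}
```

Because of `FALLOC_FL_KEEP_SIZE`, later `xfile_load()` calls into a discarded range still succeed and simply see zeroes.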

* [PATCH 2/2] man2: update ioctl_xfs_scrub_metadata.2 for parent pointers
  2024-07-02  0:52 ` [PATCHSET v13.7 12/16] xfsprogs: scrubbing for " Darrick J. Wong
  2024-07-02  1:17   ` [PATCH 1/2] xfs: create a blob array data structure Darrick J. Wong
@ 2024-07-02  1:17   ` Darrick J. Wong
  2024-07-02  6:41     ` Christoph Hellwig
  1 sibling, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:17 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Update the man page for the scrub ioctl to reflect the new scrubbing
abilities when parent pointers are enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man2/ioctl_xfs_scrub_metadata.2 |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)


diff --git a/man/man2/ioctl_xfs_scrub_metadata.2 b/man/man2/ioctl_xfs_scrub_metadata.2
index 9963f1913e60..75ae52bb5847 100644
--- a/man/man2/ioctl_xfs_scrub_metadata.2
+++ b/man/man2/ioctl_xfs_scrub_metadata.2
@@ -109,12 +109,11 @@ must be zero.
 .nf
 .B XFS_SCRUB_TYPE_BMBTD
 .B XFS_SCRUB_TYPE_BMBTA
+.fi
+.TP
 .B XFS_SCRUB_TYPE_BMBTC
-.fi
-.TP
-.B XFS_SCRUB_TYPE_PARENT
 Examine a given inode's data block map, extended attribute block map,
-copy on write block map, or parent inode pointer.
+or copy on write block map.
 Inode records are examined for obviously incorrect values and
 discrepancies with the three block map types.
 The block maps are checked for obviously wrong values and
@@ -133,9 +132,22 @@ The inode to examine can be specified in the same manner as
 .TP
 .B XFS_SCRUB_TYPE_DIR
 Examine the entries in a given directory for invalid data or dangling pointers.
+If the filesystem supports directory parent pointers, each entry will be
+checked to confirm that the child file has a matching parent pointer.
 The directory to examine can be specified in the same manner as
 .BR XFS_SCRUB_TYPE_INODE "."
 
+.TP
+.B XFS_SCRUB_TYPE_PARENT
+For filesystems that support directory parent pointers, this scrubber
+examines all the parent pointers attached to a file and confirms that the
+parent directory has an entry matching the parent pointer.
+For filesystems that do not support directory parent pointers, this scrubber
+checks that a subdirectory's dotdot entry points to a directory with an entry
+that points back to the subdirectory.
+The inode to examine can be specified in the same manner as
+.BR XFS_SCRUB_TYPE_INODE "."
+
 .TP
 .B XFS_SCRUB_TYPE_SYMLINK
 Examine the target of a symbolic link for obvious pathname problems.



* [PATCH 01/12] xfs_db: remove some boilerplate from xfs_attr_set
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
@ 2024-07-02  1:17   ` Darrick J. Wong
  2024-07-02  6:41     ` Christoph Hellwig
  2024-07-02  1:17   ` [PATCH 02/12] xfs_db: actually report errors from libxfs_attr_set Darrick J. Wong
                     ` (10 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:17 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

In preparation for online/offline repair wanting to use xfs_attr_set,
move some of the boilerplate out of this function into the callers.
Repair can initialize the da_args completely, and the userspace flag
handling/twisting goes away once we move it to xfs_attr_change.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/attrset.c             |   18 ++++++++++++++++--
 libxfs/libxfs_api_defs.h |    1 +
 2 files changed, 17 insertions(+), 2 deletions(-)


diff --git a/db/attrset.c b/db/attrset.c
index d9ab79fa7849..008662571323 100644
--- a/db/attrset.c
+++ b/db/attrset.c
@@ -111,7 +111,11 @@ attr_set_f(
 	int			argc,
 	char			**argv)
 {
-	struct xfs_da_args	args = { };
+	struct xfs_da_args	args = {
+		.geo		= mp->m_attr_geo,
+		.whichfork	= XFS_ATTR_FORK,
+		.op_flags	= XFS_DA_OP_OKNOENT,
+	};
 	char			*sp;
 	char			*name_from_file = NULL;
 	char			*value_from_file = NULL;
@@ -253,6 +257,9 @@ attr_set_f(
 		goto out;
 	}
 
+	args.owner = iocur_top->ino;
+	libxfs_attr_sethash(&args);
+
 	if (libxfs_attr_set(&args, op, false)) {
 		dbprintf(_("failed to set attr %s on inode %llu\n"),
 			args.name, (unsigned long long)iocur_top->ino);
@@ -277,7 +284,11 @@ attr_remove_f(
 	int			argc,
 	char			**argv)
 {
-	struct xfs_da_args	args = { };
+	struct xfs_da_args	args = {
+		.geo		= mp->m_attr_geo,
+		.whichfork	= XFS_ATTR_FORK,
+		.op_flags	= XFS_DA_OP_OKNOENT,
+	};
 	char			*name_from_file = NULL;
 	int			c;
 
@@ -365,6 +376,9 @@ attr_remove_f(
 		goto out;
 	}
 
+	args.owner = iocur_top->ino;
+	libxfs_attr_sethash(&args);
+
 	if (libxfs_attr_set(&args, XFS_ATTRUPDATE_REMOVE, false)) {
 		dbprintf(_("failed to remove attr %s from inode %llu\n"),
 			(unsigned char *)args.name,
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index df83aabdc13d..bf1d3c9d37f6 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -47,6 +47,7 @@
 #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
 #define xfs_attr_namecheck		libxfs_attr_namecheck
 #define xfs_attr_set			libxfs_attr_set
+#define xfs_attr_sethash		libxfs_attr_sethash
 #define xfs_attr_sf_firstentry		libxfs_attr_sf_firstentry
 #define xfs_attr_shortform_verify	libxfs_attr_shortform_verify
 



* [PATCH 02/12] xfs_db: actually report errors from libxfs_attr_set
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
  2024-07-02  1:17   ` [PATCH 01/12] xfs_db: remove some boilerplate from xfs_attr_set Darrick J. Wong
@ 2024-07-02  1:17   ` Darrick J. Wong
  2024-07-02  6:41     ` Christoph Hellwig
  2024-07-02  1:18   ` [PATCH 03/12] xfs_repair: junk parent pointer attributes when filesystem doesn't support them Darrick J. Wong
                     ` (9 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:17 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Actually tell the user what went wrong when setting or removing xattrs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/attrset.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)


diff --git a/db/attrset.c b/db/attrset.c
index 008662571323..81d530055193 100644
--- a/db/attrset.c
+++ b/db/attrset.c
@@ -121,6 +121,7 @@ attr_set_f(
 	char			*value_from_file = NULL;
 	enum xfs_attr_update	op = XFS_ATTRUPDATE_UPSERT;
 	int			c;
+	int			error;
 
 	if (cur_typ == NULL) {
 		dbprintf(_("no current type\n"));
@@ -260,9 +261,11 @@ attr_set_f(
 	args.owner = iocur_top->ino;
 	libxfs_attr_sethash(&args);
 
-	if (libxfs_attr_set(&args, op, false)) {
-		dbprintf(_("failed to set attr %s on inode %llu\n"),
-			args.name, (unsigned long long)iocur_top->ino);
+	error = -libxfs_attr_set(&args, op, false);
+	if (error) {
+		dbprintf(_("failed to set attr %s on inode %llu: %s\n"),
+			args.name, (unsigned long long)iocur_top->ino,
+			strerror(error));
 		goto out;
 	}
 
@@ -291,6 +294,7 @@ attr_remove_f(
 	};
 	char			*name_from_file = NULL;
 	int			c;
+	int			error;
 
 	if (cur_typ == NULL) {
 		dbprintf(_("no current type\n"));
@@ -379,10 +383,12 @@ attr_remove_f(
 	args.owner = iocur_top->ino;
 	libxfs_attr_sethash(&args);
 
-	if (libxfs_attr_set(&args, XFS_ATTRUPDATE_REMOVE, false)) {
-		dbprintf(_("failed to remove attr %s from inode %llu\n"),
+	error = -libxfs_attr_set(&args, XFS_ATTRUPDATE_REMOVE, false);
+	if (error) {
+		dbprintf(_("failed to remove attr %s from inode %llu: %s\n"),
 			(unsigned char *)args.name,
-			(unsigned long long)iocur_top->ino);
+			(unsigned long long)iocur_top->ino,
+			strerror(error));
 		goto out;
 	}
 


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 03/12] xfs_repair: junk parent pointer attributes when filesystem doesn't support them
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
  2024-07-02  1:17   ` [PATCH 01/12] xfs_db: remove some boilerplate from xfs_attr_set Darrick J. Wong
  2024-07-02  1:17   ` [PATCH 02/12] xfs_db: actually report errors from libxfs_attr_set Darrick J. Wong
@ 2024-07-02  1:18   ` Darrick J. Wong
  2024-07-02  6:42     ` Christoph Hellwig
  2024-07-02  1:18   ` [PATCH 04/12] xfs_repair: add parent pointers when messing with /lost+found Darrick J. Wong
                     ` (8 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:18 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Drop a parent pointer xattr if the filesystem doesn't support parent
pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/attr_repair.c |   30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)


diff --git a/repair/attr_repair.c b/repair/attr_repair.c
index 2e97fd9775ba..50159b9a5338 100644
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -327,6 +327,13 @@ process_shortform_attr(
 					NULL, currententry->namelen,
 					currententry->valuelen);
 
+		if ((currententry->flags & XFS_ATTR_PARENT) &&
+		    !xfs_has_parent(mp)) {
+			do_warn(
+ _("parent pointer found on filesystem that doesn't support parent pointers\n"));
+			junkit |= 1;
+		}
+
 		remainingspace = remainingspace -
 					xfs_attr_sf_entsize(currententry);
 
@@ -527,6 +534,15 @@ process_leaf_attr_local(
 			return -1;
 		}
 	}
+
+	if ((entry->flags & XFS_ATTR_PARENT) && !xfs_has_parent(mp)) {
+		do_warn(
+ _("parent pointer found in attribute entry %d in attr block %u, inode %"
+   PRIu64 " on filesystem that doesn't support parent pointers\n"),
+				i, da_bno, ino);
+		return -1;
+	}
+
 	return xfs_attr_leaf_entsize_local(local->namelen,
 						be16_to_cpu(local->valuelen));
 }
@@ -562,6 +578,20 @@ process_leaf_attr_remote(
 		return -1;
 	}
 
+	if (entry->flags & XFS_ATTR_PARENT) {
+		if (!xfs_has_parent(mp))
+			do_warn(
+ _("parent pointer found in attribute entry %d in attr block %u, inode %"
+   PRIu64 " on filesystem that doesn't support parent pointers\n"),
+					i, da_bno, ino);
+		else
+			do_warn(
+ _("parent pointer found in attribute entry %d in attr block %u, inode %"
+   PRIu64 " with bogus remote value\n"),
+					i, da_bno, ino);
+		return -1;
+	}
+
 	value = malloc(be32_to_cpu(remotep->valuelen));
 	if (value == NULL) {
 		do_warn(



* [PATCH 04/12] xfs_repair: add parent pointers when messing with /lost+found
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-07-02  1:18   ` [PATCH 03/12] xfs_repair: junk parent pointer attributes when filesystem doesn't support them Darrick J. Wong
@ 2024-07-02  1:18   ` Darrick J. Wong
  2024-07-02  6:42     ` Christoph Hellwig
  2024-07-02  1:18   ` [PATCH 05/12] xfs_repair: junk duplicate hashtab entries when processing sf dirents Darrick J. Wong
                     ` (7 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:18 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Make sure that /lost+found is created with parent pointers, and that
lost children moved into it get new parent pointers.
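
The new add_orphan_pptr() helper looks up the (orphanage, name) parent pointer before adding one, so an existing record is never duplicated. A simplified in-memory model of that lookup-then-add order follows. The fixed-size arrays and demo_* names are hypothetical; the real code stores parent pointers as xattrs inside a transaction.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_PPTRS	8
#define DEMO_NAMELEN	32

/* Hypothetical in-memory model of one child's parent pointers. */
struct demo_pptr {
	uint64_t	parent_ino;
	char		name[DEMO_NAMELEN];
};

struct demo_child {
	unsigned int	 nr;
	struct demo_pptr pptrs[DEMO_MAX_PPTRS];
};

/* Return 0 if (parent_ino, name) exists, else ENOENT (cf. ENOATTR). */
static int demo_pptr_lookup(const struct demo_child *c, uint64_t parent_ino,
			    const char *name)
{
	for (unsigned int i = 0; i < c->nr; i++)
		if (c->pptrs[i].parent_ino == parent_ino &&
		    strcmp(c->pptrs[i].name, name) == 0)
			return 0;
	return ENOENT;
}

/* Add a pointer back to the orphanage unless an identical one exists. */
static int demo_add_orphan_pptr(struct demo_child *c, uint64_t orphanage_ino,
				const char *name)
{
	assert(c != NULL && name != NULL);
	if (demo_pptr_lookup(c, orphanage_ino, name) == 0)
		return 0;		/* already present; nothing to do */
	if (c->nr == DEMO_MAX_PPTRS || strlen(name) >= DEMO_NAMELEN)
		return ENOSPC;
	c->pptrs[c->nr].parent_ino = orphanage_ino;
	strcpy(c->pptrs[c->nr].name, name);
	c->nr++;
	return 0;
}
```

mv_orphanage() names each lost file after its decimal inode number, so a name like "4242" is what the lookup would see in practice.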

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    2 +
 repair/phase6.c          |   76 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index bf1d3c9d37f6..d3611e05bade 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -52,6 +52,7 @@
 #define xfs_attr_shortform_verify	libxfs_attr_shortform_verify
 
 #define __xfs_bmap_add_free		__libxfs_bmap_add_free
+#define xfs_bmap_add_attrfork		libxfs_bmap_add_attrfork
 #define xfs_bmap_validate_extent	libxfs_bmap_validate_extent
 #define xfs_bmapi_read			libxfs_bmapi_read
 #define xfs_bmapi_remap			libxfs_bmapi_remap
@@ -205,6 +206,7 @@
 #define xfs_parent_addname		libxfs_parent_addname
 #define xfs_parent_finish		libxfs_parent_finish
 #define xfs_parent_hashval		libxfs_parent_hashval
+#define xfs_parent_lookup		libxfs_parent_lookup
 #define xfs_parent_removename		libxfs_parent_removename
 #define xfs_parent_start		libxfs_parent_start
 #define xfs_parent_from_attr		libxfs_parent_from_attr
diff --git a/repair/phase6.c b/repair/phase6.c
index 47dd9de2741c..791f7d36fa8a 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -903,6 +903,12 @@ mk_orphanage(xfs_mount_t *mp)
 	const int	mode = 0755;
 	int		nres;
 	struct xfs_name	xname;
+	struct xfs_parent_args *ppargs = NULL;
+
+	i = -libxfs_parent_start(mp, &ppargs);
+	if (i)
+		do_error(_("%d - couldn't allocate parent pointer for %s\n"),
+			i, ORPHANAGE);
 
 	/*
 	 * check for an existing lost+found first, if it exists, return
@@ -994,6 +1000,14 @@ mk_orphanage(xfs_mount_t *mp)
 		_("can't make %s, createname error %d\n"),
 			ORPHANAGE, error);
 
+	if (ppargs) {
+		error = -libxfs_parent_addname(tp, ppargs, pip, &xname, ip);
+		if (error)
+			do_error(
+ _("can't make %s, parent addname error %d\n"),
+					ORPHANAGE, error);
+	}
+
 	/*
 	 * bump up the link count in the root directory to account
 	 * for .. in the new directory, and update the irec copy of the
@@ -1016,10 +1030,52 @@ mk_orphanage(xfs_mount_t *mp)
 	libxfs_irele(ip);
 out_pip:
 	libxfs_irele(pip);
+	libxfs_parent_finish(mp, ppargs);
 
 	return(ino);
 }
 
+/*
+ * Add a parent pointer back to the orphanage for any file we're moving into
+ * the orphanage, being careful not to trip over any existing parent pointer.
+ * You never know when the orphanage might get corrupted.
+ */
+static void
+add_orphan_pptr(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*orphanage_ip,
+	const struct xfs_name	*xname,
+	struct xfs_inode	*ip,
+	struct xfs_parent_args	*ppargs)
+{
+	struct xfs_parent_rec	pptr = { };
+	struct xfs_da_args	scratch;
+	int			error;
+
+	xfs_inode_to_parent_rec(&pptr, orphanage_ip);
+	error = -libxfs_parent_lookup(tp, ip, xname, &pptr, &scratch);
+	if (!error)
+		return;
+	if (error != ENOATTR)
+		do_log(
+ _("cannot look up parent pointer for '%.*s', err %d\n"),
+				xname->len, xname->name, error);
+
+	if (!xfs_inode_has_attr_fork(ip)) {
+		error = -libxfs_bmap_add_attrfork(tp, ip,
+				sizeof(struct xfs_attr_sf_hdr), true);
+		if (error)
+			do_error(_("can't add attr fork to inode 0x%llx\n"),
+					(unsigned long long)ip->i_ino);
+	}
+
+	error = -libxfs_parent_addname(tp, ppargs, orphanage_ip, xname, ip);
+	if (error)
+		do_error(
+ _("can't add parent pointer for '%.*s', error %d\n"),
+				xname->len, xname->name, error);
+}
+
 /*
  * move a file to the orphange.
  */
@@ -1040,6 +1096,13 @@ mv_orphanage(
 	ino_tree_node_t		*irec;
 	int			ino_offset = 0;
 	struct xfs_name		xname;
+	struct xfs_parent_args	*ppargs;
+
+	err = -libxfs_parent_start(mp, &ppargs);
+	if (err)
+		do_error(
+ _("%d - couldn't allocate parent pointer for lost inode\n"),
+			err);
 
 	xname.name = fname;
 	xname.len = snprintf((char *)fname, sizeof(fname), "%llu",
@@ -1091,6 +1154,10 @@ mv_orphanage(
 				do_error(
 	_("name create failed in %s (%d)\n"), ORPHANAGE, err);
 
+			if (ppargs)
+				add_orphan_pptr(tp, orphanage_ip, &xname,
+						ino_p, ppargs);
+
 			if (irec)
 				add_inode_ref(irec, ino_offset);
 			else
@@ -1125,6 +1192,10 @@ mv_orphanage(
 				do_error(
 	_("name create failed in %s (%d)\n"), ORPHANAGE, err);
 
+			if (ppargs)
+				add_orphan_pptr(tp, orphanage_ip, &xname,
+						ino_p, ppargs);
+
 			if (irec)
 				add_inode_ref(irec, ino_offset);
 			else
@@ -1173,6 +1244,10 @@ mv_orphanage(
 	_("name create failed in %s (%d)\n"), ORPHANAGE, err);
 		ASSERT(err == 0);
 
+		if (ppargs)
+			add_orphan_pptr(tp, orphanage_ip, &xname, ino_p,
+					ppargs);
+
 		set_nlink(VFS_I(ino_p), 1);
 		libxfs_trans_log_inode(tp, ino_p, XFS_ILOG_CORE);
 		err = -libxfs_trans_commit(tp);
@@ -1182,6 +1257,7 @@ mv_orphanage(
 	}
 	libxfs_irele(ino_p);
 	libxfs_irele(orphanage_ip);
+	libxfs_parent_finish(mp, ppargs);
 }
 
 static int



* [PATCH 05/12] xfs_repair: junk duplicate hashtab entries when processing sf dirents
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-07-02  1:18   ` [PATCH 04/12] xfs_repair: add parent pointers when messing with /lost+found Darrick J. Wong
@ 2024-07-02  1:18   ` Darrick J. Wong
  2024-07-02  6:43     ` Christoph Hellwig
  2024-07-02  1:19   ` [PATCH 06/12] xfs_repair: build a parent pointer index Darrick J. Wong
                     ` (6 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:18 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

dir_hash_add() adds the passed-in dirent to the directory hashtab even
if there's already a duplicate.  Therefore, if we detect a duplicate or
a garbage entry while processing a shortform directory's entries, we
need to junk the newly added entry, just like we do when processing
directory data blocks.

This will become particularly relevant in the next patch, where we
generate a master index of parent pointers from the non-junked hashtab
entries of each directory that phase6 scans.
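
A toy model of the hashtab behaviour described above, assuming a hypothetical flat array in place of the real struct dir_hash_tab: the add step always records the entry, so the caller has to mark it junk afterwards by its directory offset. That is presumably why the patch now keys shortform entries by their real offset (xfs_dir2_byte_to_dataptr()), so that dir_hash_junkit() can find them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_ENTS	16

/* Hypothetical model of a directory hash table entry. */
struct demo_ent {
	uint32_t	diroffset;	/* key used to find the entry later */
	uint64_t	inum;
	const char	*name;
	bool		junkit;
};

struct demo_hashtab {
	unsigned int	nr;
	struct demo_ent	ents[DEMO_MAX_ENTS];
};

/*
 * Like dir_hash_add(): always record the entry, even for a duplicate
 * name.  Returns the inode of an earlier live entry with the same name,
 * or 0 if the name was not seen before.
 */
static uint64_t demo_hash_add(struct demo_hashtab *ht, uint32_t diroffset,
			      uint64_t inum, const char *name)
{
	uint64_t dup = 0;

	assert(ht->nr < DEMO_MAX_ENTS);
	for (unsigned int i = 0; i < ht->nr; i++)
		if (!ht->ents[i].junkit && strcmp(ht->ents[i].name, name) == 0)
			dup = ht->ents[i].inum;
	ht->ents[ht->nr++] = (struct demo_ent){ diroffset, inum, name, false };
	return dup;
}

/* Like dir_hash_junkit(): mark the entry at @diroffset as junk. */
static void demo_hash_junkit(struct demo_hashtab *ht, uint32_t diroffset)
{
	for (unsigned int i = 0; i < ht->nr; i++)
		if (ht->ents[i].diroffset == diroffset)
			ht->ents[i].junkit = true;
}

/* Count entries that would feed the parent pointer index. */
static unsigned int demo_count_live(const struct demo_hashtab *ht)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < ht->nr; i++)
		if (!ht->ents[i].junkit)
			n++;
	return n;
}
```

If the duplicate were left unmarked, both entries would later be turned into expected parent pointers.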

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/phase6.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)


diff --git a/repair/phase6.c b/repair/phase6.c
index 791f7d36fa8a..9d41aad784a1 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -2562,6 +2562,7 @@ shortform_dir2_entry_check(
 	struct xfs_dir2_sf_entry *next_sfep;
 	struct xfs_ifork	*ifp;
 	struct ino_tree_node	*irec;
+	xfs_dir2_dataptr_t	diroffset;
 	int			max_size;
 	int			ino_offset;
 	int			i;
@@ -2739,8 +2740,9 @@ shortform_dir2_entry_check(
 		/*
 		 * check for duplicate names in directory.
 		 */
-		dup_inum = dir_hash_add(mp, hashtab, (xfs_dir2_dataptr_t)
-				(sfep - xfs_dir2_sf_firstentry(sfp)),
+		diroffset = xfs_dir2_byte_to_dataptr(
+				xfs_dir2_sf_get_offset(sfep));
+		dup_inum = dir_hash_add(mp, hashtab, diroffset,
 				lino, sfep->namelen, sfep->name,
 				libxfs_dir2_sf_get_ftype(mp, sfep));
 		if (dup_inum != NULLFSINO) {
@@ -2775,6 +2777,7 @@ _("entry \"%s\" (ino %" PRIu64 ") in dir %" PRIu64 " already points to ino %" PR
 				next_sfep = shortform_dir2_junk(mp, sfp, sfep,
 						lino, &max_size, &i,
 						&bytes_deleted, ino_dirty);
+				dir_hash_junkit(hashtab, diroffset);
 				continue;
 			} else if (parent == ino)  {
 				add_inode_reached(irec, ino_offset);
@@ -2799,6 +2802,7 @@ _("entry \"%s\" (ino %" PRIu64 ") in dir %" PRIu64 " already points to ino %" PR
 				next_sfep = shortform_dir2_junk(mp, sfp, sfep,
 						lino, &max_size, &i,
 						&bytes_deleted, ino_dirty);
+				dir_hash_junkit(hashtab, diroffset);
 				continue;
 			}
 		}


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 06/12] xfs_repair: build a parent pointer index
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-07-02  1:18   ` [PATCH 05/12] xfs_repair: junk duplicate hashtab entries when processing sf dirents Darrick J. Wong
@ 2024-07-02  1:19   ` Darrick J. Wong
  2024-07-02  6:45     ` Christoph Hellwig
  2024-07-02  1:19   ` [PATCH 07/12] xfs_repair: move the global dirent name store to a separate object Darrick J. Wong
                     ` (5 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:19 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

When we're walking directories during phase 6, build an index of parent
pointers that we expect to find.
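
The per-AG index described in the pptr.c comment can be pictured with the sketch below. It assumes a hypothetical inode-number layout (AG number in the high bits, AG-relative inode in the low DEMO_AGINO_BITS) as a stand-in for XFS_INO_TO_AGNO()/XFS_INO_TO_AGINO(), and it leaves out the per-AG mutex, the name hash and the xfblob name storage that the real add_parent_ptr() uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry standing in for XFS_INO_TO_AGNO/AGINO. */
#define DEMO_AGINO_BITS	20
#define DEMO_NR_AGS	4
#define DEMO_MAX_RECS	16

/* Backwards record: (child_agino*, dir_ino*, dir_gen, name*). */
struct demo_pptr_rec {
	uint32_t	child_agino;	/* key */
	uint64_t	parent_ino;	/* key */
	uint32_t	parent_gen;
	uint16_t	namelen;
	const char	*name;		/* real code keeps a blob cookie */
};

struct demo_ag_index {
	unsigned int		nr;
	struct demo_pptr_rec	recs[DEMO_MAX_RECS];
};

static struct demo_ag_index demo_index[DEMO_NR_AGS];

/* Turn the dirent (dir_ino, name, child_ino) into a per-AG record. */
static void demo_add_parent_ptr(uint64_t child_ino, const char *name,
				uint64_t dir_ino, uint32_t dir_gen)
{
	uint32_t agno = (uint32_t)(child_ino >> DEMO_AGINO_BITS);
	struct demo_ag_index *ag;

	assert(agno < DEMO_NR_AGS);
	ag = &demo_index[agno];
	assert(ag->nr < DEMO_MAX_RECS);
	ag->recs[ag->nr++] = (struct demo_pptr_rec){
		.child_agino	= (uint32_t)(child_ino &
					((1U << DEMO_AGINO_BITS) - 1)),
		.parent_ino	= dir_ino,
		.parent_gen	= dir_gen,
		.namelen	= (uint16_t)strlen(name),
		.name		= name,
	};
}
```

Records land in the AG that owns the child, so each AG's list can later be compared against the parent pointers actually stored on that AG's inodes.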

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/Makefile |    2 +
 repair/phase6.c |   35 +++++++++
 repair/pptr.c   |  215 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/pptr.h   |   15 ++++
 4 files changed, 267 insertions(+)
 create mode 100644 repair/pptr.c
 create mode 100644 repair/pptr.h


diff --git a/repair/Makefile b/repair/Makefile
index d948588781de..b0acc8c46a33 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -24,6 +24,7 @@ HFILES = \
 	err_protos.h \
 	globals.h \
 	incore.h \
+	pptr.h \
 	prefetch.h \
 	progress.h \
 	protos.h \
@@ -63,6 +64,7 @@ CFILES = \
 	phase5.c \
 	phase6.c \
 	phase7.c \
+	pptr.c \
 	prefetch.c \
 	progress.c \
 	quotacheck.c \
diff --git a/repair/phase6.c b/repair/phase6.c
index 9d41aad784a1..fe56feb6ecef 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -18,6 +18,7 @@
 #include "dinode.h"
 #include "progress.h"
 #include "versions.h"
+#include "repair/pptr.h"
 
 static struct cred		zerocr;
 static struct fsxattr 		zerofsx;
@@ -999,6 +1000,7 @@ mk_orphanage(xfs_mount_t *mp)
 		do_error(
 		_("can't make %s, createname error %d\n"),
 			ORPHANAGE, error);
+	add_parent_ptr(ip->i_ino, (unsigned char *)ORPHANAGE, pip, false);
 
 	if (ppargs) {
 		error = -libxfs_parent_addname(tp, ppargs, pip, &xname, ip);
@@ -1255,6 +1257,10 @@ mv_orphanage(
 			do_error(
 	_("orphanage name create failed (%d)\n"), err);
 	}
+
+	if (xfs_has_parent(mp))
+		add_parent_ptr(ino_p->i_ino, xname.name, orphanage_ip, false);
+
 	libxfs_irele(ino_p);
 	libxfs_irele(orphanage_ip);
 	libxfs_parent_finish(mp, ppargs);
@@ -2894,6 +2900,30 @@ _("entry \"%s\" (ino %" PRIu64 ") in dir %" PRIu64 " already points to ino %" PR
 	}
 }
 
+static void
+dir_hash_add_parent_ptrs(
+	struct xfs_inode	*dp,
+	struct dir_hash_tab	*hashtab)
+{
+	struct dir_hash_ent	*p;
+
+	if (!xfs_has_parent(dp->i_mount))
+		return;
+
+	for (p = hashtab->first; p; p = p->nextbyorder) {
+		if (p->junkit)
+			continue;
+		if (p->name.name[0] == '/')
+			continue;
+		if (p->name.name[0] == '.' &&
+		    (p->name.len == 1 ||
+		     (p->name.len == 2 && p->name.name[1] == '.')))
+			continue;
+
+		add_parent_ptr(p->inum, p->name.name, dp, dotdot_update);
+	}
+}
+
 /*
  * processes all reachable inodes in directories
  */
@@ -3020,6 +3050,7 @@ _("error %d fixing shortform directory %llu\n"),
 		default:
 			break;
 	}
+	dir_hash_add_parent_ptrs(ip, hashtab);
 	dir_hash_done(hashtab);
 
 	/*
@@ -3311,6 +3342,8 @@ phase6(xfs_mount_t *mp)
 	ino_tree_node_t		*irec;
 	int			i;
 
+	parent_ptr_init(mp);
+
 	memset(&zerocr, 0, sizeof(struct cred));
 	memset(&zerofsx, 0, sizeof(struct fsxattr));
 	orphanage_ino = 0;
@@ -3411,4 +3444,6 @@ _("        - resetting contents of realtime bitmap and summary inodes\n"));
 			irec = next_ino_rec(irec);
 		}
 	}
+
+	parent_ptr_free(mp);
 }
diff --git a/repair/pptr.c b/repair/pptr.c
new file mode 100644
index 000000000000..193daa0a5467
--- /dev/null
+++ b/repair/pptr.c
@@ -0,0 +1,215 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2023-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs.h"
+#include "libxfs/xfile.h"
+#include "libxfs/xfblob.h"
+#include "libfrog/platform.h"
+#include "repair/err_protos.h"
+#include "repair/slab.h"
+#include "repair/pptr.h"
+
+#undef PPTR_DEBUG
+
+#ifdef PPTR_DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+/*
+ * Parent Pointer Validation
+ * =========================
+ *
+ * Phase 6 validates the connectivity of the directory tree after validating
+ * that all the space metadata are correct, and confirming all the inodes that
+ * we intend to keep.  The first part of phase 6 walks the directories of the
+ * filesystem to ensure that every file that isn't the root directory has a
+ * parent.  Unconnected files are attached to the orphanage.  Filesystems with
+ * the directory parent pointer feature enabled must also ensure that for every
+ * directory entry that points to a child file, that child has a matching
+ * parent pointer.
+ *
+ * There are many ways that we could check the parent pointers, but the means
+ * that we have chosen is to build a per-AG master index of all parent pointers
+ * of all inodes stored in that AG, and use that as the basis for comparison.
+ * This consumes a lot of memory, but performing both a forward scan to check
+ * dirent -> parent pointer and a backwards scan of parent pointer -> dirent
+ * takes longer than the simple method presented here.  Userspace adds the
+ * additional twist that inodes are not cached (and there are no ILOCKs), which
+ * makes that approach even less attractive.
+ *
+ * During the directory walk at the start of phase 6, we transform each child
+ * directory entry found into its parent pointer equivalent.  In other words,
+ * the forward information:
+ *
+ *     (dir_ino, name, child_ino)
+ *
+ * becomes this backwards information:
+ *
+ *     (child_agino*, dir_ino*, dir_gen, name*)
+ *
+ * Key fields are starred.
+ *
+ * This tuple is recorded in the per-AG master parent pointer index.  Note
+ * that names are stored separately in an xfblob data structure so that the
+ * rest of the information can be sorted and processed as fixed-size records;
+ * the incore parent pointer record contains a pointer to the xfblob data.
+ */
+
+struct ag_pptr {
+	/* parent directory handle */
+	xfs_ino_t		parent_ino;
+	uint32_t		parent_gen;
+
+	/* dirent name length */
+	unsigned short		namelen;
+
+	/* AG_PPTR_* flags */
+	unsigned short		flags;
+
+	/* cookie for the actual dirent name */
+	xfblob_cookie		name_cookie;
+
+	/* agino of the child file */
+	xfs_agino_t		child_agino;
+
+	/* hash of the dirent name */
+	xfs_dahash_t		namehash;
+};
+
+/* This might be a duplicate due to dotdot reprocessing */
+#define AG_PPTR_POSSIBLE_DUP	(1U << 0)
+
+struct ag_pptrs {
+	/* Lock to protect pptr_recs during the dirent scan. */
+	pthread_mutex_t		lock;
+
+	/* Parent pointer records for files in this AG. */
+	struct xfs_slab		*pptr_recs;
+};
+
+/* Global names storage file. */
+static struct xfblob	*names;
+static pthread_mutex_t	names_mutex = PTHREAD_MUTEX_INITIALIZER;
+static struct ag_pptrs	*fs_pptrs;
+
+void
+parent_ptr_free(
+	struct xfs_mount	*mp)
+{
+	xfs_agnumber_t		agno;
+
+	if (!xfs_has_parent(mp))
+		return;
+
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		free_slab(&fs_pptrs[agno].pptr_recs);
+		pthread_mutex_destroy(&fs_pptrs[agno].lock);
+	}
+	free(fs_pptrs);
+	fs_pptrs = NULL;
+
+	xfblob_destroy(names);
+}
+
+void
+parent_ptr_init(
+	struct xfs_mount	*mp)
+{
+	char			*descr;
+	xfs_agnumber_t		agno;
+	int			error;
+
+	if (!xfs_has_parent(mp))
+		return;
+
+	descr = kasprintf(GFP_KERNEL, "xfs_repair (%s): parent pointer names",
+			mp->m_fsname);
+	error = -xfblob_create(descr, &names);
+	kfree(descr);
+	if (error)
+		do_error(_("init parent pointer names failed: %s\n"),
+				strerror(error));
+
+	fs_pptrs = calloc(mp->m_sb.sb_agcount, sizeof(struct ag_pptrs));
+	if (!fs_pptrs)
+		do_error(
+ _("init parent pointer per-AG record array failed: %s\n"),
+				strerror(errno));
+
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		error = pthread_mutex_init(&fs_pptrs[agno].lock, NULL);
+		if (error)
+			do_error(
+ _("init agno %u parent pointer lock failed: %s\n"),
+					agno, strerror(error));
+
+		error = -init_slab(&fs_pptrs[agno].pptr_recs,
+				sizeof(struct ag_pptr));
+		if (error)
+			do_error(
+ _("init agno %u parent pointer recs failed: %s\n"),
+					agno, strerror(error));
+	}
+}
+
+/* Remember that @dp has a dirent (@fname, @ino). */
+void
+add_parent_ptr(
+	xfs_ino_t		ino,
+	const unsigned char	*fname,
+	struct xfs_inode	*dp,
+	bool			possible_dup)
+{
+	struct xfs_mount	*mp = dp->i_mount;
+	struct xfs_name		dname = {
+		.name		= fname,
+		.len		= strlen((char *)fname),
+	};
+	struct ag_pptr		ag_pptr = {
+		.child_agino	= XFS_INO_TO_AGINO(mp, ino),
+		.parent_ino	= dp->i_ino,
+		.parent_gen	= VFS_I(dp)->i_generation,
+		.namelen	= dname.len,
+	};
+	struct ag_pptrs		*ag_pptrs;
+	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, ino);
+	int			error;
+
+	if (!xfs_has_parent(mp))
+		return;
+
+	if (possible_dup)
+		ag_pptr.flags |= AG_PPTR_POSSIBLE_DUP;
+
+	ag_pptr.namehash = libxfs_dir2_hashname(mp, &dname);
+
+	pthread_mutex_lock(&names_mutex);
+	error = -xfblob_store(names, &ag_pptr.name_cookie, fname,
+			ag_pptr.namelen);
+	pthread_mutex_unlock(&names_mutex);
+	if (error)
+		do_error(_("storing name '%s' failed: %s\n"),
+				fname, strerror(error));
+
+	ag_pptrs = &fs_pptrs[agno];
+	pthread_mutex_lock(&ag_pptrs->lock);
+	error = -slab_add(ag_pptrs->pptr_recs, &ag_pptr);
+	pthread_mutex_unlock(&ag_pptrs->lock);
+	if (error)
+		do_error(_("storing name '%s' key failed: %s\n"),
+				fname, strerror(error));
+
+	dbg_printf(
+ _("%s: dp %llu gen 0x%x fname '%s' namehash 0x%x ino %llu namecookie 0x%llx\n"),
+			__func__,
+			(unsigned long long)dp->i_ino,
+			VFS_I(dp)->i_generation,
+			fname,
+			ag_pptr.namehash,
+			(unsigned long long)ino,
+			(unsigned long long)ag_pptr.name_cookie);
+}
diff --git a/repair/pptr.h b/repair/pptr.h
new file mode 100644
index 000000000000..7522379423ff
--- /dev/null
+++ b/repair/pptr.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2023-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __REPAIR_PPTR_H__
+#define __REPAIR_PPTR_H__
+
+void parent_ptr_free(struct xfs_mount *mp);
+void parent_ptr_init(struct xfs_mount *mp);
+
+void add_parent_ptr(xfs_ino_t ino, const unsigned char *fname,
+		struct xfs_inode *dp, bool possible_dup);
+
+#endif /* __REPAIR_PPTR_H__ */



* [PATCH 07/12] xfs_repair: move the global dirent name store to a separate object
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-07-02  1:19   ` [PATCH 06/12] xfs_repair: build a parent pointer index Darrick J. Wong
@ 2024-07-02  1:19   ` Darrick J. Wong
  2024-07-02  6:45     ` Christoph Hellwig
  2024-07-02  1:19   ` [PATCH 08/12] xfs_repair: deduplicate strings stored in string blob Darrick J. Wong
                     ` (4 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:19 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Abstract the main parent pointer dirent names xfblob object into a
separate data structure to hide implementation details.

The goals here are (a) to reduce memory usage where possible by
deduplicating dirent names that exist in multiple directories; and (b) to
provide a unique id for each name in the system so that incore parent
pointer records can be sorted in a stable manner.  Fast stable sorting of
records is required for the dirent <-> pptr matching algorithm.
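
With a unique cookie per distinct name, records can be ordered by fixed-size keys alone. The comparator below is a hypothetical sketch, not code from this series. It orders records by (child_agino, parent_ino, name_cookie). Once names are deduplicated, records for the same dirent compare equal and land next to each other, which makes duplicates easy to spot. Note that qsort(3) is not itself a stable sort. What the unique cookie provides is a key that fully identifies the name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size record; name_cookie stands in for xfblob_cookie. */
struct demo_rec {
	uint32_t	child_agino;
	uint64_t	parent_ino;
	uint64_t	name_cookie;
};

/* Three-way compare of one field without overflow. */
#define DEMO_CMP(a, b)	(((a) > (b)) - ((a) < (b)))

/* Order by (child_agino, parent_ino, name_cookie). */
static int demo_rec_cmp(const void *a, const void *b)
{
	const struct demo_rec *ra = a, *rb = b;
	int c;

	c = DEMO_CMP(ra->child_agino, rb->child_agino);
	if (c)
		return c;
	c = DEMO_CMP(ra->parent_ino, rb->parent_ino);
	if (c)
		return c;
	return DEMO_CMP(ra->name_cookie, rb->name_cookie);
}

/* Sort, then count adjacent duplicates (same child, parent and name). */
static unsigned int demo_sort_count_dups(struct demo_rec *recs, size_t nr)
{
	unsigned int dups = 0;

	assert(recs != NULL && nr > 0);
	qsort(recs, nr, sizeof(*recs), demo_rec_cmp);
	for (size_t i = 1; i < nr; i++)
		if (demo_rec_cmp(&recs[i - 1], &recs[i]) == 0)
			dups++;
	return dups;
}
```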

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/Makefile   |    2 +
 repair/pptr.c     |   11 ++++---
 repair/strblobs.c |   79 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/strblobs.h |   19 +++++++++++++
 4 files changed, 106 insertions(+), 5 deletions(-)
 create mode 100644 repair/strblobs.c
 create mode 100644 repair/strblobs.h


diff --git a/repair/Makefile b/repair/Makefile
index b0acc8c46a33..a36a95e353a5 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -35,6 +35,7 @@ HFILES = \
 	rt.h \
 	scan.h \
 	slab.h \
+	strblobs.h \
 	threads.h \
 	versions.h
 
@@ -75,6 +76,7 @@ CFILES = \
 	sb.c \
 	scan.c \
 	slab.c \
+	strblobs.c \
 	threads.c \
 	versions.c \
 	xfs_repair.c
diff --git a/repair/pptr.c b/repair/pptr.c
index 193daa0a5467..9f219b398208 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -10,6 +10,7 @@
 #include "repair/err_protos.h"
 #include "repair/slab.h"
 #include "repair/pptr.h"
+#include "repair/strblobs.h"
 
 #undef PPTR_DEBUG
 
@@ -56,7 +57,7 @@
  * This tuple is recorded in the per-AG master parent pointer index.  Note
  * that names are stored separately in an xfblob data structure so that the
  * rest of the information can be sorted and processed as fixed-size records;
- * the incore parent pointer record contains a pointer to the xfblob data.
+ * the incore parent pointer record contains a pointer to the strblob data.
  */
 
 struct ag_pptr {
@@ -92,7 +93,7 @@ struct ag_pptrs {
 };
 
 /* Global names storage file. */
-static struct xfblob	*names;
+static struct strblobs	*nameblobs;
 static pthread_mutex_t	names_mutex = PTHREAD_MUTEX_INITIALIZER;
 static struct ag_pptrs	*fs_pptrs;
 
@@ -112,7 +113,7 @@ parent_ptr_free(
 	free(fs_pptrs);
 	fs_pptrs = NULL;
 
-	xfblob_destroy(names);
+	strblobs_destroy(&nameblobs);
 }
 
 void
@@ -128,7 +129,7 @@ parent_ptr_init(
 
 	descr = kasprintf(GFP_KERNEL, "xfs_repair (%s): parent pointer names",
 			mp->m_fsname);
-	error = -xfblob_create(descr, &names);
+	error = strblobs_init(descr, &nameblobs);
 	kfree(descr);
 	if (error)
 		do_error(_("init parent pointer names failed: %s\n"),
@@ -188,7 +189,7 @@ add_parent_ptr(
 	ag_pptr.namehash = libxfs_dir2_hashname(mp, &dname);
 
 	pthread_mutex_lock(&names_mutex);
-	error = -xfblob_store(names, &ag_pptr.name_cookie, fname,
+	error = strblobs_store(nameblobs, &ag_pptr.name_cookie, fname,
 			ag_pptr.namelen);
 	pthread_mutex_unlock(&names_mutex);
 	if (error)
diff --git a/repair/strblobs.c b/repair/strblobs.c
new file mode 100644
index 000000000000..45d2559c7223
--- /dev/null
+++ b/repair/strblobs.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2023-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs.h"
+#include "libxfs/xfile.h"
+#include "libxfs/xfblob.h"
+#include "repair/strblobs.h"
+
+/*
+ * String Blob Structure
+ * =====================
+ *
+ * This data structure wraps the storage of strings with explicit length in an
+ * xfblob structure.
+ */
+struct strblobs {
+	struct xfblob		*strings;
+};
+
+/* Initialize a string blob structure. */
+int
+strblobs_init(
+	const char		*descr,
+	struct strblobs		**sblobs)
+{
+	struct strblobs		*sb;
+	int			error;
+
+	sb = malloc(sizeof(struct strblobs));
+	if (!sb)
+		return ENOMEM;
+
+	error = -xfblob_create(descr, &sb->strings);
+	if (error)
+		goto out_free;
+
+	*sblobs = sb;
+	return 0;
+
+out_free:
+	free(sb);
+	return error;
+}
+
+/* Deconstruct a string blob structure. */
+void
+strblobs_destroy(
+	struct strblobs		**sblobs)
+{
+	struct strblobs		*sb = *sblobs;
+
+	xfblob_destroy(sb->strings);
+	free(sb);
+	*sblobs = NULL;
+}
+
+/* Store a string and return a cookie for its retrieval. */
+int
+strblobs_store(
+	struct strblobs		*sblobs,
+	xfblob_cookie		*str_cookie,
+	const unsigned char	*str,
+	unsigned int		str_len)
+{
+	return -xfblob_store(sblobs->strings, str_cookie, str, str_len);
+}
+
+/* Retrieve a previously stored string. */
+int
+strblobs_load(
+	struct strblobs		*sblobs,
+	xfblob_cookie		str_cookie,
+	unsigned char		*str,
+	unsigned int		str_len)
+{
+	return -xfblob_load(sblobs->strings, str_cookie, str, str_len);
+}
diff --git a/repair/strblobs.h b/repair/strblobs.h
new file mode 100644
index 000000000000..27e98eee2080
--- /dev/null
+++ b/repair/strblobs.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2023-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __REPAIR_STRBLOBS_H__
+#define __REPAIR_STRBLOBS_H__
+
+struct strblobs;
+
+int strblobs_init(const char *descr, struct strblobs **sblobs);
+void strblobs_destroy(struct strblobs **sblobs);
+
+int strblobs_store(struct strblobs *sblobs, xfblob_cookie *str_cookie,
+		const unsigned char *str, unsigned int str_len);
+int strblobs_load(struct strblobs *sblobs, xfblob_cookie str_cookie,
+		unsigned char *str, unsigned int str_len);
+
+#endif /* __REPAIR_STRBLOBS_H__ */



* [PATCH 08/12] xfs_repair: deduplicate strings stored in string blob
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-07-02  1:19   ` [PATCH 07/12] xfs_repair: move the global dirent name store to a separate object Darrick J. Wong
@ 2024-07-02  1:19   ` Darrick J. Wong
  2024-07-02  6:46     ` Christoph Hellwig
  2024-07-02  1:19   ` [PATCH 09/12] xfs_repair: check parent pointers Darrick J. Wong
                     ` (3 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:19 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Reduce the memory requirements of the string blob structure by
deduplicating the strings stored within.
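
One way to picture the deduplication: hash the string, walk the matching bucket comparing hash and length first, compare bytes only on a candidate match, and return the existing cookie if one is found. The sketch below is a simplified model. It assumes an in-memory arena instead of an xfblob and FNV-1a instead of the dirent name hash that add_parent_ptr() passes in. The demo_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NR_BUCKETS	7
#define DEMO_ARENA_SIZE	4096

/* Hypothetical stand-ins: an in-memory arena instead of an xfblob. */
struct demo_hashent {
	struct demo_hashent	*next;
	uint64_t		cookie;	/* offset of the string in the arena */
	unsigned int		len;
	uint32_t		hash;
};

struct demo_strblobs {
	unsigned char		arena[DEMO_ARENA_SIZE];
	size_t			used;
	struct demo_hashent	*buckets[DEMO_NR_BUCKETS];
};

/* FNV-1a, standing in for the dirent name hash. */
static uint32_t demo_hash(const unsigned char *s, unsigned int len)
{
	uint32_t h = 2166136261u;

	for (unsigned int i = 0; i < len; i++)
		h = (h ^ s[i]) * 16777619u;
	return h;
}

/* Store @s, or find an identical copy, and return its cookie. */
static int demo_store(struct demo_strblobs *sb, uint64_t *cookie,
		      const unsigned char *s, unsigned int len)
{
	uint32_t hash = demo_hash(s, len);
	unsigned int bucket = hash % DEMO_NR_BUCKETS;
	struct demo_hashent *ent;

	assert(sb != NULL && cookie != NULL && s != NULL);
	/* Cheap hash/length checks first; compare bytes only on a match. */
	for (ent = sb->buckets[bucket]; ent; ent = ent->next) {
		if (ent->hash == hash && ent->len == len &&
		    memcmp(sb->arena + ent->cookie, s, len) == 0) {
			*cookie = ent->cookie;
			return 0;
		}
	}

	if (sb->used + len > DEMO_ARENA_SIZE)
		return ENOSPC;
	ent = malloc(sizeof(*ent));
	if (!ent)
		return ENOMEM;
	memcpy(sb->arena + sb->used, s, len);
	*ent = (struct demo_hashent){ .next = sb->buckets[bucket],
		.cookie = sb->used, .len = len, .hash = hash };
	sb->buckets[bucket] = ent;
	*cookie = sb->used;
	sb->used += len;
	return 0;
}

/* Free the hash chains (the arena itself is part of the struct). */
static void demo_destroy(struct demo_strblobs *sb)
{
	for (unsigned int b = 0; b < DEMO_NR_BUCKETS; b++) {
		struct demo_hashent *ent = sb->buckets[b];

		while (ent) {
			struct demo_hashent *next = ent->next;

			free(ent);
			ent = next;
		}
		sb->buckets[b] = NULL;
	}
}
```

For scale: the patch caps the bucket count at 1048573 (just under 2^20), so the bucket pointer array costs about 1048573 × 8 = 8,388,584 bytes, roughly 8 MiB on a 64-bit host, which matches the "about 8M" comment in parent_ptr_init().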

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/pptr.c     |   13 ++++-
 repair/strblobs.c |  140 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 repair/strblobs.h |    9 +++
 3 files changed, 153 insertions(+), 9 deletions(-)


diff --git a/repair/pptr.c b/repair/pptr.c
index 9f219b398208..f8db57b2c6f6 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -50,7 +50,7 @@
  *
  * becomes this backwards information:
  *
- *     (child_agino*, dir_ino*, dir_gen, name*)
+ *     (child_agino*, dir_ino*, dir_gen, name_cookie*)
  *
  * Key fields are starred.
  *
@@ -58,6 +58,10 @@
  * that names are stored separately in an xfblob data structure so that the
  * rest of the information can be sorted and processed as fixed-size records;
  * the incore parent pointer record contains a pointer to the strblob data.
+ * Because string blobs are deduplicated, there's a 1:1 mapping of name cookies
+ * to strings, which means that we can use the name cookie as a comparison key
+ * instead of loading the full dentry name every time we want to perform a
+ * comparison.
  */
 
 struct ag_pptr {
@@ -121,15 +125,18 @@ parent_ptr_init(
 	struct xfs_mount	*mp)
 {
 	char			*descr;
+	uint64_t		iused;
 	xfs_agnumber_t		agno;
 	int			error;
 
 	if (!xfs_has_parent(mp))
 		return;
 
+	/* One hash bucket per inode, up to about 8M of memory on 64-bit. */
+	iused = min(mp->m_sb.sb_icount - mp->m_sb.sb_ifree, 1048573);
 	descr = kasprintf(GFP_KERNEL, "xfs_repair (%s): parent pointer names",
 			mp->m_fsname);
-	error = strblobs_init(descr, &nameblobs);
+	error = strblobs_init(descr, iused, &nameblobs);
 	kfree(descr);
 	if (error)
 		do_error(_("init parent pointer names failed: %s\n"),
@@ -190,7 +197,7 @@ add_parent_ptr(
 
 	pthread_mutex_lock(&names_mutex);
 	error = strblobs_store(nameblobs, &ag_pptr.name_cookie, fname,
-			ag_pptr.namelen);
+			ag_pptr.namelen, ag_pptr.namehash);
 	pthread_mutex_unlock(&names_mutex);
 	if (error)
 		do_error(_("storing name '%s' failed: %s\n"),
diff --git a/repair/strblobs.c b/repair/strblobs.c
index 45d2559c7223..3cd678dbbab9 100644
--- a/repair/strblobs.c
+++ b/repair/strblobs.c
@@ -13,22 +13,42 @@
  * =====================
  *
  * This data structure wraps the storage of strings with explicit length in an
- * xfblob structure.
+ * xfblob structure.  It stores a hashtable of string checksums to provide
+ * fast(ish) lookups of existing strings to enable deduplication of the strings
+ * contained within.
  */
+struct strblob_hashent {
+	struct strblob_hashent	*next;
+
+	xfblob_cookie		str_cookie;
+	unsigned int		str_len;
+	xfs_dahash_t		str_hash;
+};
+
 struct strblobs {
 	struct xfblob		*strings;
+	unsigned int		nr_buckets;
+
+	struct strblob_hashent	*buckets[];
 };
 
+static inline size_t strblobs_sizeof(unsigned int nr_buckets)
+{
+	return sizeof(struct strblobs) +
+			(nr_buckets * sizeof(struct strblob_hashent *));
+}
+
 /* Initialize a string blob structure. */
 int
 strblobs_init(
 	const char		*descr,
+	unsigned int		hash_buckets,
 	struct strblobs		**sblobs)
 {
 	struct strblobs		*sb;
 	int			error;
 
-	sb = malloc(sizeof(struct strblobs));
+	sb = calloc(strblobs_sizeof(hash_buckets), 1);
 	if (!sb)
 		return ENOMEM;
 
@@ -36,6 +56,7 @@ strblobs_init(
 	if (error)
 		goto out_free;
 
+	sb->nr_buckets = hash_buckets;
 	*sblobs = sb;
 	return 0;
 
@@ -50,21 +71,132 @@ strblobs_destroy(
 	struct strblobs		**sblobs)
 {
 	struct strblobs		*sb = *sblobs;
+	struct strblob_hashent	*ent, *ent_next;
+	unsigned int		bucket;
+
+	for (bucket = 0; bucket < sb->nr_buckets; bucket++) {
+		ent = sb->buckets[bucket];
+		while (ent != NULL) {
+			ent_next = ent->next;
+			free(ent);
+			ent = ent_next;
+		}
+	}
 
 	xfblob_destroy(sb->strings);
 	free(sb);
 	*sblobs = NULL;
 }
 
+/*
+ * Search the string hashtable for a matching entry.  Sets the cookie and
+ * returns 0 if one is found; ENOENT if there is no match; or a positive errno.
+ */
+static int
+__strblobs_lookup(
+	struct strblobs		*sblobs,
+	xfblob_cookie		*str_cookie,
+	const unsigned char	*str,
+	unsigned int		str_len,
+	xfs_dahash_t		str_hash)
+{
+	struct strblob_hashent	*ent;
+	unsigned char		*buf = NULL;
+	unsigned int		bucket;
+	int			error;
+
+	bucket = str_hash % sblobs->nr_buckets;
+	ent = sblobs->buckets[bucket];
+
+	for (ent = sblobs->buckets[bucket]; ent != NULL; ent = ent->next) {
+		if (ent->str_len != str_len || ent->str_hash != str_hash)
+			continue;
+
+		if (!buf) {
+			buf = malloc(str_len);
+			if (!buf)
+				return ENOMEM;
+		}
+
+		error = strblobs_load(sblobs, ent->str_cookie, buf, str_len);
+		if (error)
+			goto out;
+
+		if (memcmp(str, buf, str_len))
+			continue;
+
+		*str_cookie = ent->str_cookie;
+		goto out;
+	}
+	error = ENOENT;
+
+out:
+	free(buf);
+	return error;
+}
+
+/*
+ * Search the string hashtable for a matching entry.  Sets the cookie and
+ * returns 0 if one is found; ENOENT if there is no match; or a positive errno.
+ */
+int
+strblobs_lookup(
+	struct strblobs		*sblobs,
+	xfblob_cookie		*str_cookie,
+	const unsigned char	*str,
+	unsigned int		str_len,
+	xfs_dahash_t		str_hash)
+{
+	return __strblobs_lookup(sblobs, str_cookie, str, str_len, str_hash);
+}
+
+/* Remember a string in the hashtable. */
+static int
+strblobs_hash(
+	struct strblobs		*sblobs,
+	xfblob_cookie		str_cookie,
+	const unsigned char	*str,
+	unsigned int		str_len,
+	xfs_dahash_t		str_hash)
+{
+	struct strblob_hashent	*ent;
+	unsigned int		bucket;
+
+	bucket = str_hash % sblobs->nr_buckets;
+
+	ent = malloc(sizeof(struct strblob_hashent));
+	if (!ent)
+		return ENOMEM;
+
+	ent->str_cookie = str_cookie;
+	ent->str_len = str_len;
+	ent->str_hash = str_hash;
+	ent->next = sblobs->buckets[bucket];
+
+	sblobs->buckets[bucket] = ent;
+	return 0;
+}
+
 /* Store a string and return a cookie for its retrieval. */
 int
 strblobs_store(
 	struct strblobs		*sblobs,
 	xfblob_cookie		*str_cookie,
 	const unsigned char	*str,
-	unsigned int		str_len)
+	unsigned int		str_len,
+	xfs_dahash_t		str_hash)
 {
-	return -xfblob_store(sblobs->strings, str_cookie, str, str_len);
+	int			error;
+
+	error = __strblobs_lookup(sblobs, str_cookie, str, str_len, str_hash);
+	if (error != ENOENT)
+		return error;
+
+	error = -xfblob_store(sblobs->strings, str_cookie, str, str_len);
+	if (error)
+		return error;
+
+	return strblobs_hash(sblobs, *str_cookie, str, str_len, str_hash);
 }
 
 /* Retrieve a previously stored string. */
diff --git a/repair/strblobs.h b/repair/strblobs.h
index 27e98eee2080..40cd6d8e91c0 100644
--- a/repair/strblobs.h
+++ b/repair/strblobs.h
@@ -8,12 +8,17 @@
 
 struct strblobs;
 
-int strblobs_init(const char *descr, struct strblobs **sblobs);
+int strblobs_init(const char *descr, unsigned int hash_buckets,
+		struct strblobs **sblobs);
 void strblobs_destroy(struct strblobs **sblobs);
 
 int strblobs_store(struct strblobs *sblobs, xfblob_cookie *str_cookie,
-		const unsigned char *str, unsigned int str_len);
+		const unsigned char *str, unsigned int str_len,
+		xfs_dahash_t hash);
 int strblobs_load(struct strblobs *sblobs, xfblob_cookie str_cookie,
 		unsigned char *str, unsigned int str_len);
+int strblobs_lookup(struct strblobs *sblobs, xfblob_cookie *str_cookie,
+		const unsigned char *str, unsigned int str_len,
+		xfs_dahash_t hash);
 
 #endif /* __REPAIR_STRBLOBS_H__ */

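Patch 08 turns strblobs into a deduplicating store: a chained hash table keyed on (length, dahash) sits in front of the xfblob, and a store first looks for an identical string before appending a new one. The self-contained sketch below models the same idea with plain malloc'd memory. struct blobstore is a hypothetical stand-in for xfblob (a cookie here is just a byte offset), and the hash value is supplied by the caller, the way xfs_repair passes ag_pptr.namehash.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for xfblob: cookies are byte offsets into data. */
struct blobstore {
	unsigned char	*data;
	size_t		used;
	size_t		cap;
};

/* One chain entry per distinct string, like struct strblob_hashent. */
struct hashent {
	struct hashent	*next;
	size_t		cookie;
	unsigned int	len;
	uint32_t	hash;
};

struct dedup_table {
	struct blobstore	store;
	unsigned int		nr_buckets;
	struct hashent		*buckets[];	/* chain heads */
};

static struct dedup_table *dedup_init(unsigned int nr_buckets)
{
	struct dedup_table	*t;

	/* calloc zeroes the store and every chain head. */
	t = calloc(1, sizeof(*t) + nr_buckets * sizeof(struct hashent *));
	if (t)
		t->nr_buckets = nr_buckets;
	return t;
}

/*
 * Set *cookie for @str, appending it only if no identical copy exists.
 * @len must be nonzero, as dirent names are.
 */
static int dedup_store(struct dedup_table *t, size_t *cookie,
		const unsigned char *str, unsigned int len, uint32_t hash)
{
	unsigned int	bucket = hash % t->nr_buckets;
	struct hashent	*ent;

	/* Cheap length/hash filter first, then a byte-for-byte compare. */
	for (ent = t->buckets[bucket]; ent != NULL; ent = ent->next) {
		if (ent->len == len && ent->hash == hash &&
		    memcmp(t->store.data + ent->cookie, str, len) == 0) {
			*cookie = ent->cookie;
			return 0;
		}
	}

	/* Not found: grow the store if needed and append the bytes. */
	if (t->store.used + len > t->store.cap) {
		size_t		ncap = (t->store.used + len) * 2;
		unsigned char	*p = realloc(t->store.data, ncap);

		if (!p)
			return ENOMEM;
		t->store.data = p;
		t->store.cap = ncap;
	}
	ent = malloc(sizeof(*ent));
	if (!ent)
		return ENOMEM;
	memcpy(t->store.data + t->store.used, str, len);
	ent->cookie = t->store.used;
	ent->len = len;
	ent->hash = hash;
	ent->next = t->buckets[bucket];	/* push onto the chain head */
	t->buckets[bucket] = ent;
	t->store.used += len;
	*cookie = ent->cookie;
	return 0;
}

static void dedup_destroy(struct dedup_table *t)
{
	for (unsigned int b = 0; b < t->nr_buckets; b++) {
		struct hashent	*ent = t->buckets[b];

		while (ent != NULL) {
			struct hashent	*next = ent->next;

			free(ent);
			ent = next;
		}
	}
	free(t->store.data);
	free(t);
}
```

Because equal strings always yield the same cookie, callers can compare cookies instead of bytes, which is what the name_cookie sort key relies on. The real __strblobs_lookup() has to copy each candidate out of the xfile with strblobs_load() before the memcmp(), whereas this sketch compares in place.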


* [PATCH 09/12] xfs_repair: check parent pointers
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-07-02  1:19   ` [PATCH 08/12] xfs_repair: deduplicate strings stored in string blob Darrick J. Wong
@ 2024-07-02  1:19   ` Darrick J. Wong
  2024-07-02  6:47     ` Christoph Hellwig
  2024-07-02  1:20   ` [PATCH 10/12] xfs_repair: dump garbage parent pointer attributes Darrick J. Wong
                     ` (2 subsequent siblings)
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:19 UTC
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Use the parent pointer index that we constructed in the previous patch
to check that each file's parent pointer records exactly match the
directory entries that we recorded while walking directories.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfblob.c    |    9 +
 libxfs/xfblob.h    |    2 
 repair/Makefile    |    2 
 repair/listxattr.c |  271 +++++++++++++++++
 repair/listxattr.h |   15 +
 repair/phase6.c    |    2 
 repair/pptr.c      |  846 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/pptr.h      |    2 
 8 files changed, 1149 insertions(+)
 create mode 100644 repair/listxattr.c
 create mode 100644 repair/listxattr.h


diff --git a/libxfs/xfblob.c b/libxfs/xfblob.c
index 7d8caaa4c164..00f8ed5e5a7b 100644
--- a/libxfs/xfblob.c
+++ b/libxfs/xfblob.c
@@ -145,3 +145,12 @@ xfblob_free(
 	xfile_discard(blob->xfile, cookie, sizeof(key) + key.xb_size);
 	return 0;
 }
+
+/* Drop all the blobs. */
+void
+xfblob_truncate(
+	struct xfblob	*blob)
+{
+	xfile_discard(blob->xfile, PAGE_SIZE, blob->last_offset - PAGE_SIZE);
+	blob->last_offset = PAGE_SIZE;
+}
diff --git a/libxfs/xfblob.h b/libxfs/xfblob.h
index 28bf4ab2898f..1939202e12db 100644
--- a/libxfs/xfblob.h
+++ b/libxfs/xfblob.h
@@ -21,4 +21,6 @@ int xfblob_store(struct xfblob *blob, xfblob_cookie *cookie, const void *ptr,
 		uint32_t size);
 int xfblob_free(struct xfblob *blob, xfblob_cookie cookie);
 
+void xfblob_truncate(struct xfblob *blob);
+
 #endif /* __XFS_SCRUB_XFBLOB_H__ */
diff --git a/repair/Makefile b/repair/Makefile
index a36a95e353a5..e7445d53e918 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -24,6 +24,7 @@ HFILES = \
 	err_protos.h \
 	globals.h \
 	incore.h \
+	listxattr.h \
 	pptr.h \
 	prefetch.h \
 	progress.h \
@@ -58,6 +59,7 @@ CFILES = \
 	incore_ext.c \
 	incore_ino.c \
 	init.c \
+	listxattr.c \
 	phase1.c \
 	phase2.c \
 	phase3.c \
diff --git a/repair/listxattr.c b/repair/listxattr.c
new file mode 100644
index 000000000000..2af77b7b2195
--- /dev/null
+++ b/repair/listxattr.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2022-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs.h"
+#include "libxlog.h"
+#include "libfrog/bitmap.h"
+#include "repair/listxattr.h"
+
+/* Call a function for every entry in a shortform xattr structure. */
+STATIC int
+xattr_walk_sf(
+	struct xfs_inode		*ip,
+	xattr_walk_fn			attr_fn,
+	void				*priv)
+{
+	struct xfs_attr_sf_hdr		*hdr = ip->i_af.if_data;
+	struct xfs_attr_sf_entry	*sfe;
+	unsigned int			i;
+	int				error;
+
+	sfe = libxfs_attr_sf_firstentry(hdr);
+	for (i = 0; i < hdr->count; i++) {
+		error = attr_fn(ip, sfe->flags, sfe->nameval, sfe->namelen,
+				&sfe->nameval[sfe->namelen], sfe->valuelen,
+				priv);
+		if (error)
+			return error;
+
+		sfe = xfs_attr_sf_nextentry(sfe);
+	}
+
+	return 0;
+}
+
+/* Call a function for every entry in this xattr leaf block. */
+STATIC int
+xattr_walk_leaf_entries(
+	struct xfs_inode		*ip,
+	xattr_walk_fn			attr_fn,
+	struct xfs_buf			*bp,
+	void				*priv)
+{
+	struct xfs_attr3_icleaf_hdr	ichdr;
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_attr_leafblock	*leaf = bp->b_addr;
+	struct xfs_attr_leaf_entry	*entry;
+	unsigned int			i;
+	int				error;
+
+	libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &ichdr, leaf);
+	entry = xfs_attr3_leaf_entryp(leaf);
+
+	for (i = 0; i < ichdr.count; entry++, i++) {
+		void			*value;
+		unsigned char		*name;
+		unsigned int		namelen, valuelen;
+
+		if (entry->flags & XFS_ATTR_LOCAL) {
+			struct xfs_attr_leaf_name_local		*name_loc;
+
+			name_loc = xfs_attr3_leaf_name_local(leaf, i);
+			name = name_loc->nameval;
+			namelen = name_loc->namelen;
+			value = &name_loc->nameval[name_loc->namelen];
+			valuelen = be16_to_cpu(name_loc->valuelen);
+		} else {
+			struct xfs_attr_leaf_name_remote	*name_rmt;
+
+			name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
+			name = name_rmt->name;
+			namelen = name_rmt->namelen;
+			value = NULL;
+			valuelen = be32_to_cpu(name_rmt->valuelen);
+		}
+
+		error = attr_fn(ip, entry->flags, name, namelen, value,
+				valuelen, priv);
+		if (error)
+			return error;
+
+	}
+
+	return 0;
+}
+
+/*
+ * Call a function for every entry in a leaf-format xattr structure.  Avoid
+ * memory allocations for the loop detector since there's only one block.
+ */
+STATIC int
+xattr_walk_leaf(
+	struct xfs_inode		*ip,
+	xattr_walk_fn			attr_fn,
+	void				*priv)
+{
+	struct xfs_buf			*leaf_bp;
+	int				error;
+
+	error = -libxfs_attr3_leaf_read(NULL, ip, ip->i_ino, 0, &leaf_bp);
+	if (error)
+		return error;
+
+	error = xattr_walk_leaf_entries(ip, attr_fn, leaf_bp, priv);
+	libxfs_trans_brelse(NULL, leaf_bp);
+	return error;
+}
+
+/* Find the leftmost leaf in the xattr dabtree. */
+STATIC int
+xattr_walk_find_leftmost_leaf(
+	struct xfs_inode		*ip,
+	struct bitmap			*seen_blocks,
+	struct xfs_buf			**leaf_bpp)
+{
+	struct xfs_da3_icnode_hdr	nodehdr;
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_da_intnode		*node;
+	struct xfs_da_node_entry	*btree;
+	struct xfs_buf			*bp;
+	//xfs_failaddr_t			fa;
+	xfs_dablk_t			blkno = 0;
+	unsigned int			expected_level = 0;
+	int				error;
+
+	for (;;) {
+		uint16_t		magic;
+
+		error = -libxfs_da3_node_read(NULL, ip, blkno, &bp,
+				XFS_ATTR_FORK);
+		if (error)
+			return error;
+
+		node = bp->b_addr;
+		magic = be16_to_cpu(node->hdr.info.magic);
+		if (magic == XFS_ATTR_LEAF_MAGIC ||
+		    magic == XFS_ATTR3_LEAF_MAGIC)
+			break;
+
+		error = EFSCORRUPTED;
+		if (magic != XFS_DA_NODE_MAGIC &&
+		    magic != XFS_DA3_NODE_MAGIC)
+			goto out_buf;
+
+		libxfs_da3_node_hdr_from_disk(mp, &nodehdr, node);
+
+		if (nodehdr.count == 0 || nodehdr.level >= XFS_DA_NODE_MAXDEPTH)
+			goto out_buf;
+
+		/* Check the level from the root node. */
+		if (blkno == 0)
+			expected_level = nodehdr.level - 1;
+		else if (expected_level != nodehdr.level)
+			goto out_buf;
+		else
+			expected_level--;
+
+		/* Remember that we've seen this node. */
+		error = -bitmap_set(seen_blocks, blkno, 1);
+		if (error)
+			goto out_buf;
+
+		/* Find the next level towards the leaves of the dabtree. */
+		btree = nodehdr.btree;
+		blkno = be32_to_cpu(btree->before);
+		libxfs_trans_brelse(NULL, bp);
+
+		/* Make sure we haven't seen this new block already. */
+		if (bitmap_test(seen_blocks, blkno, 1))
+			return EFSCORRUPTED;
+	}
+
+	error = EFSCORRUPTED;
+	if (expected_level != 0)
+		goto out_buf;
+
+	/* Remember that we've seen this leaf. */
+	error = -bitmap_set(seen_blocks, blkno, 1);
+	if (error)
+		goto out_buf;
+
+	*leaf_bpp = bp;
+	return 0;
+
+out_buf:
+	libxfs_trans_brelse(NULL, bp);
+	return error;
+}
+
+/* Call a function for every entry in a node-format xattr structure. */
+STATIC int
+xattr_walk_node(
+	struct xfs_inode		*ip,
+	xattr_walk_fn			attr_fn,
+	void				*priv)
+{
+	struct xfs_attr3_icleaf_hdr	leafhdr;
+	struct bitmap			*seen_blocks;
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_attr_leafblock	*leaf;
+	struct xfs_buf			*leaf_bp;
+	int				error;
+
+	bitmap_alloc(&seen_blocks);
+
+	error = xattr_walk_find_leftmost_leaf(ip, seen_blocks, &leaf_bp);
+	if (error)
+		goto out_bitmap;
+
+	for (;;) {
+		error = xattr_walk_leaf_entries(ip, attr_fn, leaf_bp,
+				priv);
+		if (error)
+			goto out_leaf;
+
+		/* Find the right sibling of this leaf block. */
+		leaf = leaf_bp->b_addr;
+		libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
+		if (leafhdr.forw == 0)
+			goto out_leaf;
+
+		libxfs_trans_brelse(NULL, leaf_bp);
+
+		/* Make sure we haven't seen this new leaf already. */
+		if (bitmap_test(seen_blocks, leafhdr.forw, 1))
+			goto out_bitmap;
+
+		error = -libxfs_attr3_leaf_read(NULL, ip, ip->i_ino,
+				leafhdr.forw, &leaf_bp);
+		if (error)
+			goto out_bitmap;
+
+		/* Remember that we've seen this new leaf. */
+		error = -bitmap_set(seen_blocks, leafhdr.forw, 1);
+		if (error)
+			goto out_leaf;
+	}
+
+out_leaf:
+	libxfs_trans_brelse(NULL, leaf_bp);
+out_bitmap:
+	bitmap_free(&seen_blocks);
+	return error;
+}
+
+/* Call a function for every extended attribute in a file. */
+int
+xattr_walk(
+	struct xfs_inode	*ip,
+	xattr_walk_fn		attr_fn,
+	void			*priv)
+{
+	int			error;
+
+	if (!libxfs_inode_hasattr(ip))
+		return 0;
+
+	if (ip->i_af.if_format == XFS_DINODE_FMT_LOCAL)
+		return xattr_walk_sf(ip, attr_fn, priv);
+
+	/* attr functions require that the attr fork is loaded */
+	error = -libxfs_iread_extents(NULL, ip, XFS_ATTR_FORK);
+	if (error)
+		return error;
+
+	if (libxfs_attr_is_leaf(ip))
+		return xattr_walk_leaf(ip, attr_fn, priv);
+
+	return xattr_walk_node(ip, attr_fn, priv);
+}
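xattr_walk_node() follows each leaf's forw pointer and records every visited block in a bitmap, so a corrupt sibling chain that loops back on itself cannot trap the walker. A minimal sketch of that loop-detection idea follows, assuming a hypothetical in-memory array in which forw[blk] is the right sibling of block blk and 0 ends the chain (as leafhdr.forw == 0 does); a bool array stands in for the libfrog bitmap. This sketch reports a revisited block as corruption, the way xattr_walk_find_leftmost_leaf() does; as posted, xattr_walk_node() instead jumps to out_bitmap with error still 0 and stops quietly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* XFS maps EFSCORRUPTED to EUCLEAN on Linux; fall back to EIO elsewhere. */
#ifdef EUCLEAN
# define WALK_EFSCORRUPTED	EUCLEAN
#else
# define WALK_EFSCORRUPTED	EIO
#endif

/*
 * Visit blocks starting at @start, following forw[] until it reads 0.
 * Each visited block number is appended to @order; a pointer outside the
 * array or a block seen twice is reported as corruption.
 */
static int walk_sibling_chain(const unsigned int *forw, unsigned int nr_blocks,
		unsigned int start, unsigned int *order, unsigned int *nr_visited)
{
	bool		*seen = calloc(nr_blocks, sizeof(*seen));
	unsigned int	blk = start;
	int		error = 0;

	if (!seen)
		return ENOMEM;
	*nr_visited = 0;

	for (;;) {
		if (blk >= nr_blocks || seen[blk]) {
			error = WALK_EFSCORRUPTED;
			break;
		}
		seen[blk] = true;		/* like bitmap_set() */
		order[(*nr_visited)++] = blk;

		if (forw[blk] == 0)		/* no right sibling: done */
			break;
		blk = forw[blk];
	}

	free(seen);
	return error;
}
```

Since each block can be visited at most once, @order never needs more than nr_blocks slots, and the walk is bounded even when the chain is corrupt.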
diff --git a/repair/listxattr.h b/repair/listxattr.h
new file mode 100644
index 000000000000..2d26fce0f323
--- /dev/null
+++ b/repair/listxattr.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2022-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __REPAIR_LISTXATTR_H__
+#define __REPAIR_LISTXATTR_H__
+
+typedef int (*xattr_walk_fn)(struct xfs_inode *ip, unsigned int attr_flags,
+		const unsigned char *name, unsigned int namelen,
+		const void *value, unsigned int valuelen, void *priv);
+
+int xattr_walk(struct xfs_inode *ip, xattr_walk_fn attr_fn, void *priv);
+
+#endif /* __REPAIR_LISTXATTR_H__ */
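Callers drive xattr_walk() with an xattr_walk_fn callback and an opaque priv pointer; examine_xattr() in pptr.c is the real user. The sketch below shows the pattern in isolation: fake_walk() is a hypothetical stand-in for xattr_walk() that feeds entries from an in-memory array, and SKETCH_ATTR_PARENT is a made-up bit standing in for XFS_ATTR_PARENT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct xfs_inode;	/* opaque: the walker only passes it through */

/* Same shape as xattr_walk_fn in repair/listxattr.h. */
typedef int (*xattr_walk_fn)(struct xfs_inode *ip, unsigned int attr_flags,
		const unsigned char *name, unsigned int namelen,
		const void *value, unsigned int valuelen, void *priv);

/* Made-up flag bit standing in for XFS_ATTR_PARENT. */
#define SKETCH_ATTR_PARENT	(1U << 7)

/* Per-walk state handed to the callback through @priv. */
struct pptr_count {
	unsigned int	nr_parent;
	unsigned int	nr_other;
};

/* Callback: tally attributes by namespace; nonzero would stop the walk. */
static int count_parent_attrs(struct xfs_inode *ip, unsigned int attr_flags,
		const unsigned char *name, unsigned int namelen,
		const void *value, unsigned int valuelen, void *priv)
{
	struct pptr_count	*pc = priv;

	(void)ip; (void)name; (void)namelen; (void)value; (void)valuelen;

	if (attr_flags & SKETCH_ATTR_PARENT)
		pc->nr_parent++;
	else
		pc->nr_other++;
	return 0;
}

/* Hypothetical in-memory attribute list for driving the callback. */
struct fake_attr {
	unsigned int	flags;
	const char	*name;
};

/* Stand-in for xattr_walk(): call @fn per entry, stop on the first error. */
static int fake_walk(const struct fake_attr *attrs, unsigned int nr,
		xattr_walk_fn fn, void *priv)
{
	unsigned int	i;
	int		error;

	for (i = 0; i < nr; i++) {
		error = fn(NULL, attrs[i].flags,
				(const unsigned char *)attrs[i].name,
				(unsigned int)strlen(attrs[i].name), NULL, 0, priv);
		if (error)
			return error;
	}
	return 0;
}
```

Keeping all per-walk state behind priv lets the same walker serve several scanners without globals, which is how examine_xattr() reaches its struct file_scan.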
diff --git a/repair/phase6.c b/repair/phase6.c
index fe56feb6ecef..ad067ba0ac87 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -3445,5 +3445,7 @@ _("        - resetting contents of realtime bitmap and summary inodes\n"));
 		}
 	}
 
+	/* Check and repair directory parent pointers, if enabled. */
+	check_parent_ptrs(mp);
 	parent_ptr_free(mp);
 }
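The backwards scan described in the pptr.c comment that follows reduces, for one file, to a merge of two lists sorted on the same key: the records observed during the dirent scan and the parent pointers read from the file's xattrs. A simplified sketch of that lockstep walk follows. struct pptr_rec and crosscheck() are hypothetical; the key is only (dir_ino, name_cookie), leaving out the namehash and name_in_nameblobs ordering the patch uses, and the sketch counts outcomes where the patch warns and (eventually) repairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: (dir_ino*, name_cookie*, dir_gen), keys starred. */
struct pptr_rec {
	uint64_t	dir_ino;
	uint64_t	name_cookie;
	uint32_t	dir_gen;
};

struct pptr_tally {
	unsigned int	added;		/* expected but missing ondisk */
	unsigned int	removed;	/* ondisk but never seen in a dirent */
	unsigned int	updated;	/* same key, different dir_gen */
	unsigned int	matched;
};

/* Sort on the key fields only. */
static int cmp_pptr_key(const void *a, const void *b)
{
	const struct pptr_rec	*pa = a;
	const struct pptr_rec	*pb = b;

	if (pa->dir_ino != pb->dir_ino)
		return pa->dir_ino < pb->dir_ino ? -1 : 1;
	if (pa->name_cookie != pb->name_cookie)
		return pa->name_cookie < pb->name_cookie ? -1 : 1;
	return 0;
}

/* Sort both lists, then walk them together and classify every record. */
static struct pptr_tally crosscheck(struct pptr_rec *expect, size_t nr_expect,
		struct pptr_rec *ondisk, size_t nr_ondisk)
{
	struct pptr_tally	t = { 0, 0, 0, 0 };
	size_t			i = 0, j = 0;

	qsort(expect, nr_expect, sizeof(*expect), cmp_pptr_key);
	qsort(ondisk, nr_ondisk, sizeof(*ondisk), cmp_pptr_key);

	while (i < nr_expect || j < nr_ondisk) {
		int		c;

		/* Compare ondisk to expected, as cmp_file_to_ag_pptr() does. */
		if (j == nr_ondisk)
			c = 1;			/* ondisk list exhausted */
		else if (i == nr_expect)
			c = -1;			/* expected list exhausted */
		else
			c = cmp_pptr_key(&ondisk[j], &expect[i]);

		if (c < 0) {
			t.removed++;		/* stale ondisk pointer */
			j++;
		} else if (c > 0) {
			t.added++;		/* dirent with no pointer */
			i++;
		} else {
			if (ondisk[j].dir_gen != expect[i].dir_gen)
				t.updated++;
			else
				t.matched++;
			i++;
			j++;
		}
	}
	return t;
}
```

For example, with expected records (dir_ino, name_cookie, dir_gen) = {(128,7,1), (128,3,1), (200,5,9)} and ondisk records {(128,3,1), (200,5,8), (300,1,1)}, the walk reports one match (128,3), one addition (128,7), one update (200,5) and one removal (300,1). Each list is traversed once after sorting, so the per-file cost is dominated by the two sorts.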
diff --git a/repair/pptr.c b/repair/pptr.c
index f8db57b2c6f6..bd9bb6f8bf36 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -7,8 +7,13 @@
 #include "libxfs/xfile.h"
 #include "libxfs/xfblob.h"
 #include "libfrog/platform.h"
+#include "libfrog/workqueue.h"
+#include "repair/globals.h"
 #include "repair/err_protos.h"
 #include "repair/slab.h"
+#include "repair/listxattr.h"
+#include "repair/threads.h"
+#include "repair/incore.h"
 #include "repair/pptr.h"
 #include "repair/strblobs.h"
 
@@ -62,6 +67,65 @@
  * to strings, which means that we can use the name cookie as a comparison key
  * instead of loading the full dentry name every time we want to perform a
  * comparison.
+ *
+ * Once we've finished with the forward scan, we get to work on the backwards
+ * scan.  Each AG is processed independently.  First, we sort the per-AG master
+ * records in order of child_agino, dir_ino, and name_cookie.  Each inode in
+ * the AG is then processed in numerical order.
+ *
+ * The first thing that happens to the file is that we read all the extended
+ * attributes to look for parent pointers.  Attributes that claim to be parent
+ * pointers but are obviously garbage are thrown away.  The rest of the ondisk
+ * parent pointers for that file are stored in memory like this:
+ *
+ *     (dir_ino*, dir_gen, name_cookie*)
+ *
+ * After loading the ondisk parent pointer name, we search the strblobs
+ * structure to see if it has already recorded the name.  If so, this value is
+ * used as the name cookie.  If the name has not yet been recorded, we flag the
+ * incore record for later deletion.
+ *
+ * When we've concluded the xattr scan, the per-file records are sorted in
+ * order of dir_ino and name_cookie.
+ *
+ * There are three possibilities here:
+ *
+ * A. The first record in the per-AG master index is an exact match for the
+ * first record in the per-file index.  Everything is consistent, and we can
+ * proceed with the lockstep scan detailed below.
+ *
+ * B. The per-AG master index cursor points to a higher inode number than the
+ * first inode we are scanning.  Delete the ondisk parent pointers
+ * corresponding to the per-file records until condition (B) is no longer true.
+ *
+ * C. The per-AG master index cursor instead points to a lower inode number
+ * than the one we are scanning.  This means that there exists a directory
+ * entry pointing at an inode that is free.  We supposedly already settled
+ * which inodes are free and which aren't, which means in-memory information is
+ * inconsistent.  Abort.
+ *
+ * Otherwise, we are ready to check the file parent pointers against the
+ * master.  If the ondisk directory metadata are all consistent, this recordset
+ * should correspond exactly to the subset of the master records with a
+ * child_agino matching the file that we're scanning.  We should be able to
+ * walk both sets in lockstep, and find one of the following outcomes:
+ *
+ * 1) The master index cursor is ahead of the ondisk index cursor.  This means
+ * that the inode has parent pointers that were not found during the dirent
+ * scan.  These should be deleted.
+ *
+ * 2) The ondisk index gets ahead of the master index.  This means that the
+ * dirent scan found parent pointers that are not attached to the inode.
+ * These should be added.
+ *
+ * 3) The parent_gen or (dirent) name is not consistent.  Update the parent
+ * pointer to the values that we found during the dirent scan.
+ *
+ * 4) Everything matches.  Move on to the next parent pointer.
+ *
+ * The current implementation does not try to rebuild directories from parent
+ * pointer information, as this requires a lengthy scan of the filesystem for
+ * each broken directory.
  */
 
 struct ag_pptr {
@@ -88,6 +152,24 @@ struct ag_pptr {
 /* This might be a duplicate due to dotdot reprocessing */
 #define AG_PPTR_POSSIBLE_DUP	(1U << 0)
 
+struct file_pptr {
+	/* parent directory handle */
+	xfs_ino_t		parent_ino;
+	uint32_t		parent_gen;
+
+	/* Is the name stored in the global nameblobs structure? */
+	unsigned int		name_in_nameblobs;
+
+	/* hash of the dirent name */
+	xfs_dahash_t		namehash;
+
+	/* parent pointer name length */
+	unsigned int		namelen;
+
+	/* cookie for the file dirent name */
+	xfblob_cookie		name_cookie;
+};
+
 struct ag_pptrs {
 	/* Lock to protect pptr_recs during the dirent scan. */
 	pthread_mutex_t		lock;
@@ -96,11 +178,99 @@ struct ag_pptrs {
 	struct xfs_slab		*pptr_recs;
 };
 
+struct file_scan {
+	struct ag_pptrs		*ag_pptrs;
+
+	/* cursor for comparing ag_pptrs.pptr_recs against file_pptr_recs */
+	struct xfs_slab_cursor	*ag_pptr_recs_cur;
+
+	/* file_pptr records for a file that we're checking */
+	struct xfs_slab		*file_pptr_recs;
+
+	/* cursor for comparing file_pptr_recs against ag_pptrs.pptr_recs */
+	struct xfs_slab_cursor	*file_pptr_recs_cur;
+
+	/* names associated with file_pptr_recs */
+	struct xfblob		*file_pptr_names;
+
+	/* Number of parent pointers recorded for this file. */
+	unsigned int		nr_file_pptrs;
+
+	/* Does this file have garbage xattrs with ATTR_PARENT set? */
+	bool			have_garbage;
+};
+
 /* Global names storage file. */
 static struct strblobs	*nameblobs;
 static pthread_mutex_t	names_mutex = PTHREAD_MUTEX_INITIALIZER;
 static struct ag_pptrs	*fs_pptrs;
 
+static int
+cmp_ag_pptr(
+	const void		*a,
+	const void		*b)
+{
+	const struct ag_pptr	*pa = a;
+	const struct ag_pptr	*pb = b;
+
+	if (pa->child_agino < pb->child_agino)
+		return -1;
+	if (pa->child_agino > pb->child_agino)
+		return 1;
+
+	if (pa->parent_ino < pb->parent_ino)
+		return -1;
+	if (pa->parent_ino > pb->parent_ino)
+		return 1;
+
+	if (pa->namehash < pb->namehash)
+		return -1;
+	if (pa->namehash > pb->namehash)
+		return 1;
+
+	if (pa->name_cookie < pb->name_cookie)
+		return -1;
+	if (pa->name_cookie > pb->name_cookie)
+		return 1;
+
+	return 0;
+}
+
+static int
+cmp_file_pptr(
+	const void		*a,
+	const void		*b)
+{
+	const struct file_pptr	*pa = a;
+	const struct file_pptr	*pb = b;
+
+	if (pa->parent_ino < pb->parent_ino)
+		return -1;
+	if (pa->parent_ino > pb->parent_ino)
+		return 1;
+
+	/*
+	 * Push the parent pointer names that we didn't find in the dirent scan
+	 * towards the end of the list so that we delete them as excess.
+	 */
+	if (!pa->name_in_nameblobs && pb->name_in_nameblobs)
+		return 1;
+	if (pa->name_in_nameblobs && !pb->name_in_nameblobs)
+		return -1;
+
+	if (pa->namehash < pb->namehash)
+		return -1;
+	if (pa->namehash > pb->namehash)
+		return 1;
+
+	if (pa->name_cookie < pb->name_cookie)
+		return -1;
+	if (pa->name_cookie > pb->name_cookie)
+		return 1;
+
+	return 0;
+}
+
 void
 parent_ptr_free(
 	struct xfs_mount	*mp)
@@ -221,3 +391,679 @@ add_parent_ptr(
 			(unsigned long long)ino,
 			(unsigned long long)ag_pptr.name_cookie);
 }
+
+/* Schedule this ATTR_PARENT extended attribute for deletion. */
+static void
+record_garbage_xattr(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan,
+	unsigned int		attr_filter,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen)
+{
+	if (no_modify) {
+		if (!fscan->have_garbage)
+			do_warn(
+ _("would delete garbage parent pointer extended attributes in ino %llu\n"),
+					(unsigned long long)ip->i_ino);
+		fscan->have_garbage = true;
+		return;
+	}
+
+	if (fscan->have_garbage)
+		return;
+	fscan->have_garbage = true;
+
+	do_warn(
+ _("deleting garbage parent pointer extended attributes in ino %llu\n"),
+			(unsigned long long)ip->i_ino);
+	/* XXX do the work */
+}
+
+/*
+ * Store this file parent pointer's name in the file scan namelist unless it's
+ * already in the global list.
+ */
+static int
+store_file_pptr_name(
+	struct file_scan		*fscan,
+	struct file_pptr		*file_pptr,
+	const struct xfs_name		*xname)
+{
+	int				error;
+
+	error = strblobs_lookup(nameblobs, &file_pptr->name_cookie,
+			xname->name, xname->len, file_pptr->namehash);
+	if (!error) {
+		file_pptr->name_in_nameblobs = true;
+		return 0;
+	}
+	if (error != ENOENT)
+		return error;
+
+	file_pptr->name_in_nameblobs = false;
+	return -xfblob_store(fscan->file_pptr_names, &file_pptr->name_cookie,
+			xname->name, xname->len);
+}
+
+/* Decide if this is a directory parent pointer and stash it if so. */
+static int
+examine_xattr(
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen,
+	void			*priv)
+{
+	struct file_pptr	file_pptr = {
+		.namelen	= namelen,
+	};
+	struct xfs_name		xname = {
+		.name		= name,
+		.len		= namelen,
+	};
+	struct xfs_mount	*mp = ip->i_mount;
+	struct file_scan	*fscan = priv;
+	int			error;
+
+	if (!(attr_flags & XFS_ATTR_PARENT))
+		return 0;
+
+	error = -libxfs_parent_from_attr(mp, attr_flags, name, namelen, value,
+			valuelen, &file_pptr.parent_ino, &file_pptr.parent_gen);
+	if (error)
+		goto corrupt;
+
+	file_pptr.namehash = libxfs_dir2_hashname(mp, &xname);
+
+	error = store_file_pptr_name(fscan, &file_pptr, &xname);
+	if (error)
+		do_error(
+ _("storing ino %llu parent pointer '%.*s' failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				namelen,
+				(const char *)name,
+				strerror(error));
+
+	error = -slab_add(fscan->file_pptr_recs, &file_pptr);
+	if (error)
+		do_error(_("storing ino %llu parent pointer rec failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+
+	dbg_printf(
+ _("%s: dp %llu gen 0x%x fname '%.*s' namelen %u namehash 0x%x ino %llu namecookie 0x%llx global? %d\n"),
+			__func__,
+			(unsigned long long)file_pptr.parent_ino,
+			file_pptr.parent_gen,
+			namelen,
+			(const char *)name,
+			namelen,
+			file_pptr.namehash,
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)file_pptr.name_cookie,
+			file_pptr.name_in_nameblobs);
+
+	fscan->nr_file_pptrs++;
+	return 0;
+corrupt:
+	record_garbage_xattr(ip, fscan, attr_flags, name, namelen, value,
+			valuelen);
+	return 0;
+}
+
+/* Load a file parent pointer name from wherever we stored it. */
+static int
+load_file_pptr_name(
+	struct file_scan	*fscan,
+	const struct file_pptr	*file_pptr,
+	unsigned char		*name)
+{
+	if (file_pptr->name_in_nameblobs)
+		return strblobs_load(nameblobs, file_pptr->name_cookie,
+				name, file_pptr->namelen);
+
+	return -xfblob_load(fscan->file_pptr_names, file_pptr->name_cookie,
+			name, file_pptr->namelen);
+}
+
+/* Remove all pptrs from @ip. */
+static void
+clear_all_pptrs(
+	struct xfs_inode	*ip)
+{
+	if (no_modify) {
+		do_warn(_("would delete unlinked ino %llu parent pointers\n"),
+				(unsigned long long)ip->i_ino);
+		return;
+	}
+
+	do_warn(_("deleting unlinked ino %llu parent pointers\n"),
+			(unsigned long long)ip->i_ino);
+	/* XXX actually do the work */
+}
+
+/* Add @ag_pptr to @ip. */
+static void
+add_missing_parent_ptr(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan,
+	const struct ag_pptr	*ag_pptr)
+{
+	unsigned char		name[MAXNAMELEN];
+	int			error;
+
+	error = strblobs_load(nameblobs, ag_pptr->name_cookie, name,
+			ag_pptr->namelen);
+	if (error)
+		do_error(
+ _("loading missing name for ino %llu parent pointer (ino %llu gen 0x%x namecookie 0x%llx) failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)ag_pptr->parent_ino,
+				ag_pptr->parent_gen,
+				(unsigned long long)ag_pptr->name_cookie,
+				strerror(error));
+
+	if (no_modify) {
+		do_warn(
+ _("would add missing ino %llu parent pointer (ino %llu gen 0x%x name '%.*s')\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)ag_pptr->parent_ino,
+				ag_pptr->parent_gen,
+				ag_pptr->namelen,
+				name);
+		return;
+	} else {
+		do_warn(
+ _("adding missing ino %llu parent pointer (ino %llu gen 0x%x name '%.*s')\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)ag_pptr->parent_ino,
+				ag_pptr->parent_gen,
+				ag_pptr->namelen,
+				name);
+	}
+
+	/* XXX actually do the work */
+}
+
+/* Remove @file_pptr from @ip. */
+static void
+remove_incorrect_parent_ptr(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan,
+	const struct file_pptr	*file_pptr)
+{
+	unsigned char		name[MAXNAMELEN] = { };
+	int			error;
+
+	error = load_file_pptr_name(fscan, file_pptr, name);
+	if (error)
+		do_error(
+ _("loading incorrect name for ino %llu parent pointer (ino %llu gen 0x%x namecookie 0x%llx) failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)file_pptr->parent_ino,
+				file_pptr->parent_gen,
+				(unsigned long long)file_pptr->name_cookie,
+				strerror(error));
+
+	if (no_modify) {
+		do_warn(
+ _("would remove bad ino %llu parent pointer (ino %llu gen 0x%x name '%.*s')\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)file_pptr->parent_ino,
+				file_pptr->parent_gen,
+				file_pptr->namelen,
+				name);
+		return;
+	}
+
+	do_warn(
+ _("removing bad ino %llu parent pointer (ino %llu gen 0x%x name '%.*s')\n"),
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)file_pptr->parent_ino,
+			file_pptr->parent_gen,
+			file_pptr->namelen,
+			name);
+
+	/* XXX actually do the work */
+}
+
+/*
+ * We found parent pointers that point to the same parent directory with the
+ * same name hash and cookie.  Make sure the generation number and name match.
+ */
+static void
+compare_parent_ptrs(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan,
+	const struct ag_pptr	*ag_pptr,
+	const struct file_pptr	*file_pptr)
+{
+	unsigned char		name1[MAXNAMELEN] = { };
+	unsigned char		name2[MAXNAMELEN] = { };
+	int			error;
+
+	error = strblobs_load(nameblobs, ag_pptr->name_cookie, name1,
+			ag_pptr->namelen);
+	if (error)
+		do_error(
+ _("loading master-list name for ino %llu parent pointer (ino %llu gen 0x%x namecookie 0x%llx namelen %u) failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)ag_pptr->parent_ino,
+				ag_pptr->parent_gen,
+				(unsigned long long)ag_pptr->name_cookie,
+				ag_pptr->namelen,
+				strerror(error));
+
+	error = load_file_pptr_name(fscan, file_pptr, name2);
+	if (error)
+		do_error(
+ _("loading file-list name for ino %llu parent pointer (ino %llu gen 0x%x namecookie 0x%llx namelen %u) failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)file_pptr->parent_ino,
+				file_pptr->parent_gen,
+				(unsigned long long)file_pptr->name_cookie,
+				file_pptr->namelen,
+				strerror(error));
+
+	if (ag_pptr->parent_gen != file_pptr->parent_gen)
+		goto reset;
+	if (ag_pptr->namelen != file_pptr->namelen)
+		goto reset;
+	if (ag_pptr->namehash != file_pptr->namehash)
+		goto reset;
+	if (memcmp(name1, name2, ag_pptr->namelen))
+		goto reset;
+
+	return;
+
+reset:
+	if (no_modify) {
+		do_warn(
+ _("would update ino %llu parent pointer (ino %llu gen 0x%x name '%.*s') -> (ino %llu gen 0x%x name '%.*s')\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)file_pptr->parent_ino,
+				file_pptr->parent_gen,
+				file_pptr->namelen,
+				name2,
+				(unsigned long long)ag_pptr->parent_ino,
+				ag_pptr->parent_gen,
+				ag_pptr->namelen,
+				name1);
+		return;
+	}
+
+	do_warn(
+ _("updating ino %llu parent pointer (ino %llu gen 0x%x name '%.*s') -> (ino %llu gen 0x%x name '%.*s')\n"),
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)file_pptr->parent_ino,
+			file_pptr->parent_gen,
+			file_pptr->namelen,
+			name2,
+			(unsigned long long)ag_pptr->parent_ino,
+			ag_pptr->parent_gen,
+			ag_pptr->namelen,
+			name1);
+
+	/* XXX do the work */
+}
+
+static int
+cmp_file_to_ag_pptr(
+	const struct file_pptr	*fp,
+	const struct ag_pptr	*ap)
+{
+	/*
+	 * We finished iterating all the pptrs attached to the file before we
+	 * ran out of pptrs that we found in the directory scan.  Return 1 so
+	 * the caller adds the pptr from the dir scan.
+	 */
+	if (!fp)
+		return 1;
+
+	if (fp->parent_ino > ap->parent_ino)
+		return 1;
+	if (fp->parent_ino < ap->parent_ino)
+		return -1;
+
+	if (fp->namehash < ap->namehash)
+		return -1;
+	if (fp->namehash > ap->namehash)
+		return 1;
+
+	/*
+	 * If this parent pointer wasn't found in the dirent scan, we know it
+	 * should be removed.
+	 */
+	if (!fp->name_in_nameblobs)
+		return -1;
+
+	if (fp->name_cookie < ap->name_cookie)
+		return -1;
+	if (fp->name_cookie > ap->name_cookie)
+		return 1;
+
+	return 0;
+}
+
+/*
+ * If this parent pointer that we got via directory scan thinks it might be a
+ * duplicate, compare it to the previous pptr found by the directory scan.  If
+ * they are the same, then we got duplicate entries on account of dotdot
+ * reprocessing and we can ignore this one.
+ */
+static inline bool
+crosscheck_want_skip_dup(
+	const struct ag_pptr	*ag_pptr,
+	const struct ag_pptr	*prev_ag_pptr)
+{
+	if (!(ag_pptr->flags & AG_PPTR_POSSIBLE_DUP) || !prev_ag_pptr)
+		return false;
+
+	if (ag_pptr->parent_ino  == prev_ag_pptr->parent_ino &&
+	    ag_pptr->parent_gen  == prev_ag_pptr->parent_gen &&
+	    ag_pptr->namelen     == prev_ag_pptr->namelen &&
+	    ag_pptr->name_cookie == prev_ag_pptr->name_cookie &&
+	    ag_pptr->child_agino == prev_ag_pptr->child_agino)
+		return true;
+
+	return false;
+}
+
+/*
+ * Make sure that the parent pointers we observed match the ones ondisk.
+ *
+ * Earlier, we generated a master list of parent pointers for files in this AG
+ * based on what we saw during the directory walk at the start of phase 6.
+ * Now that we've read in all of this file's parent pointers, make sure the
+ * lists match.
+ */
+static void
+crosscheck_file_parent_ptrs(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan)
+{
+	struct ag_pptr		*ag_pptr, *prev_ag_pptr = NULL;
+	struct file_pptr	*file_pptr;
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
+	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
+	int			error;
+
+	ag_pptr = peek_slab_cursor(fscan->ag_pptr_recs_cur);
+
+	if (!ag_pptr || ag_pptr->child_agino > agino) {
+		/*
+		 * The cursor for the master pptr list has gone beyond this
+		 * file that we're scanning.  Evidently it has no parents at
+		 * all, so we better not have found any pptrs attached to the
+		 * file.
+		 */
+		if (fscan->nr_file_pptrs > 0)
+			clear_all_pptrs(ip);
+
+		return;
+	}
+
+	if (ag_pptr->child_agino < agino) {
+		/*
+		 * The cursor for the master pptr list is behind the file that
+		 * we're scanning.  This suggests that the incore inode tree
+		 * doesn't know about a file that is mentioned by a dirent.
+		 * At this point the inode liveness is supposed to be settled,
+		 * which means our incore information is inconsistent.
+		 */
+		do_error(
+ _("found dirent referring to ino %llu even though inobt scan moved on to ino %llu?!\n"),
+				(unsigned long long)XFS_AGINO_TO_INO(mp, agno,
+					ag_pptr->child_agino),
+				(unsigned long long)ip->i_ino);
+		/* does not return */
+	}
+
+	/*
+	 * The master pptr list cursor is pointing to the inode that we want
+	 * to check.  Sort the pptr records that we recorded from the ondisk
+	 * pptrs for this file, then set up for the comparison.
+	 */
+	qsort_slab(fscan->file_pptr_recs, cmp_file_pptr);
+
+	error = -init_slab_cursor(fscan->file_pptr_recs, cmp_file_pptr,
+			&fscan->file_pptr_recs_cur);
+	if (error)
+		do_error(_("init ino %llu parent pointer cursor failed: %s\n"),
+				(unsigned long long)ip->i_ino, strerror(error));
+
+	do {
+		int	cmp_result;
+
+		if (crosscheck_want_skip_dup(ag_pptr, prev_ag_pptr)) {
+			/*
+			 * This master parent pointer thinks it's a duplicate
+			 * and it matches the previous master parent pointer.
+			 * We don't want to add duplicate parent pointers, so
+			 * advance the master pptr cursor and loop again.
+			 */
+			dbg_printf(
+ _("%s: dp %llu dp_gen 0x%x namelen %u ino %llu namecookie 0x%llx (skip_dup)\n"),
+					__func__,
+					(unsigned long long)ag_pptr->parent_ino,
+					ag_pptr->parent_gen,
+					ag_pptr->namelen,
+					(unsigned long long)ip->i_ino,
+					(unsigned long long)ag_pptr->name_cookie);
+			prev_ag_pptr = ag_pptr;
+			advance_slab_cursor(fscan->ag_pptr_recs_cur);
+			ag_pptr = peek_slab_cursor(fscan->ag_pptr_recs_cur);
+			continue;
+		}
+		prev_ag_pptr = ag_pptr;
+
+		file_pptr = peek_slab_cursor(fscan->file_pptr_recs_cur);
+
+		dbg_printf(
+ _("%s: dp %llu dp_gen 0x%x namelen %u ino %llu namecookie 0x%llx (master)\n"),
+				__func__,
+				(unsigned long long)ag_pptr->parent_ino,
+				ag_pptr->parent_gen,
+				ag_pptr->namelen,
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)ag_pptr->name_cookie);
+
+		if (file_pptr) {
+			dbg_printf(
+ _("%s: dp %llu dp_gen 0x%x namelen %u ino %llu namecookie 0x%llx (file)\n"),
+					__func__,
+					(unsigned long long)file_pptr->parent_ino,
+					file_pptr->parent_gen,
+					file_pptr->namelen,
+					(unsigned long long)ip->i_ino,
+					(unsigned long long)file_pptr->name_cookie);
+		} else {
+			dbg_printf(
+ _("%s: ran out of parent pointers for ino %llu (file)\n"),
+					__func__,
+					(unsigned long long)ip->i_ino);
+		}
+
+		cmp_result = cmp_file_to_ag_pptr(file_pptr, ag_pptr);
+		if (cmp_result > 0) {
+			/*
+			 * The master pptr list knows about pptrs that are not
+			 * in the ondisk metadata.  Add the missing pptr and
+			 * advance only the master pptr cursor.
+			 */
+			add_missing_parent_ptr(ip, fscan, ag_pptr);
+			advance_slab_cursor(fscan->ag_pptr_recs_cur);
+		} else if (cmp_result < 0) {
+			/*
+			 * The ondisk pptrs mention a link that is not in the
+			 * master list.  Delete the extra pptr and advance only
+			 * the file pptr cursor.
+			 */
+			remove_incorrect_parent_ptr(ip, fscan, file_pptr);
+			advance_slab_cursor(fscan->file_pptr_recs_cur);
+		} else {
+			/*
+			 * Exact match, make sure the parent_gen and dirent
+			 * name parts of the parent pointer match.  Move both
+			 * cursors forward.
+			 */
+			compare_parent_ptrs(ip, fscan, ag_pptr, file_pptr);
+			advance_slab_cursor(fscan->ag_pptr_recs_cur);
+			advance_slab_cursor(fscan->file_pptr_recs_cur);
+		}
+
+		ag_pptr = peek_slab_cursor(fscan->ag_pptr_recs_cur);
+	} while (ag_pptr && ag_pptr->child_agino == agino);
+
+	while ((file_pptr = pop_slab_cursor(fscan->file_pptr_recs_cur))) {
+		dbg_printf(
+ _("%s: dp %llu dp_gen 0x%x namelen %u ino %llu namecookie 0x%llx (excess)\n"),
+				__func__,
+				(unsigned long long)file_pptr->parent_ino,
+				file_pptr->parent_gen,
+				file_pptr->namelen,
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)file_pptr->name_cookie);
+
+		/*
+		 * The master pptr list does not have any more pptrs for this
+		 * file, but we still have unprocessed ondisk pptrs.  Delete
+		 * all these ondisk pptrs.
+		 */
+		remove_incorrect_parent_ptr(ip, fscan, file_pptr);
+	}
+}
+
+/* Ensure this file's parent pointers match what we found in the dirent scan. */
+static void
+check_file_parent_ptrs(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan)
+{
+	int			error;
+
+	error = -init_slab(&fscan->file_pptr_recs, sizeof(struct file_pptr));
+	if (error)
+		do_error(_("init file parent pointer recs failed: %s\n"),
+				strerror(error));
+
+	fscan->have_garbage = false;
+	fscan->nr_file_pptrs = 0;
+
+	error = xattr_walk(ip, examine_xattr, fscan);
+	if (error && !no_modify)
+		do_error(_("ino %llu parent pointer scan failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+	if (error) {
+		do_warn(_("ino %llu parent pointer scan failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+		goto out_free;
+	}
+
+	crosscheck_file_parent_ptrs(ip, fscan);
+
+out_free:
+	free_slab(&fscan->file_pptr_recs);
+	xfblob_truncate(fscan->file_pptr_names);
+}
+
+/* Check all the parent pointers of files in this AG. */
+static void
+check_ag_parent_ptrs(
+	struct workqueue	*wq,
+	uint32_t		agno,
+	void			*arg)
+{
+	struct xfs_mount	*mp = wq->wq_ctx;
+	struct file_scan	fscan = {
+		.ag_pptrs	= &fs_pptrs[agno],
+	};
+	struct ag_pptrs		*ag_pptrs = &fs_pptrs[agno];
+	struct ino_tree_node	*irec;
+	char			*descr;
+	int			error;
+
+	qsort_slab(ag_pptrs->pptr_recs, cmp_ag_pptr);
+
+	error = -init_slab_cursor(ag_pptrs->pptr_recs, cmp_ag_pptr,
+			&fscan.ag_pptr_recs_cur);
+	if (error)
+		do_error(
+ _("init agno %u parent pointer slab cursor failed: %s\n"),
+				agno, strerror(error));
+
+	descr = kasprintf(GFP_KERNEL,
+			"xfs_repair (%s): file parent pointer names",
+			mp->m_fsname);
+	error = -xfblob_create(descr, &fscan.file_pptr_names);
+	kfree(descr);
+	if (error)
+		do_error(
+ _("init agno %u file parent pointer names failed: %s\n"),
+				agno, strerror(error));
+
+	for (irec = findfirst_inode_rec(agno);
+	     irec != NULL;
+	     irec = next_ino_rec(irec)) {
+		unsigned int	ino_offset;
+
+		for (ino_offset = 0;
+		     ino_offset < XFS_INODES_PER_CHUNK;
+		     ino_offset++) {
+			struct xfs_inode *ip;
+			xfs_ino_t	ino;
+
+			if (is_inode_free(irec, ino_offset))
+				continue;
+
+			ino = XFS_AGINO_TO_INO(mp, agno,
+					irec->ino_startnum + ino_offset);
+			error = -libxfs_iget(mp, NULL, ino, 0, &ip);
+			if (error && !no_modify)
+				do_error(
+ _("loading ino %llu for parent pointer check failed: %s\n"),
+						(unsigned long long)ino,
+						strerror(error));
+			if (error) {
+				do_warn(
+ _("loading ino %llu for parent pointer check failed: %s\n"),
+						(unsigned long long)ino,
+						strerror(error));
+				continue;
+			}
+
+			check_file_parent_ptrs(ip, &fscan);
+			libxfs_irele(ip);
+		}
+	}
+
+	xfblob_destroy(fscan.file_pptr_names);
+	free_slab_cursor(&fscan.ag_pptr_recs_cur);
+}
+
+/* Check all the parent pointers of all files in this filesystem. */
+void
+check_parent_ptrs(
+	struct xfs_mount	*mp)
+{
+	struct workqueue	wq;
+	xfs_agnumber_t		agno;
+
+	if (!xfs_has_parent(mp))
+		return;
+
+	create_work_queue(&wq, mp, ag_stride);
+
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++)
+		queue_work(&wq, check_ag_parent_ptrs, agno, NULL);
+
+	destroy_work_queue(&wq);
+}
diff --git a/repair/pptr.h b/repair/pptr.h
index 7522379423ff..65acff963a3f 100644
--- a/repair/pptr.h
+++ b/repair/pptr.h
@@ -12,4 +12,6 @@ void parent_ptr_init(struct xfs_mount *mp);
 void add_parent_ptr(xfs_ino_t ino, const unsigned char *fname,
 		struct xfs_inode *dp, bool possible_dup);
 
+void check_parent_ptrs(struct xfs_mount *mp);
+
 #endif /* __REPAIR_PPTR_H__ */


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 10/12] xfs_repair: dump garbage parent pointer attributes
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-07-02  1:19   ` [PATCH 09/12] xfs_repair: check parent pointers Darrick J. Wong
@ 2024-07-02  1:20   ` Darrick J. Wong
  2024-07-02  6:47     ` Christoph Hellwig
  2024-07-02  1:20   ` [PATCH 11/12] xfs_repair: update ondisk parent pointer records Darrick J. Wong
  2024-07-02  1:20   ` [PATCH 12/12] xfs_repair: wipe ondisk parent pointers when there are none Darrick J. Wong
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Delete xattrs that have ATTR_PARENT set but are so badly malformed that
they clearly aren't parent pointers.
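
The patch defers the deletions. While xattr_walk() is iterating,
record_garbage_xattr() only stashes each bad attr's name and value, and
remove_garbage_xattrs() deletes them once the walk has finished,
presumably so that the attr fork is not modified underneath the walker.
A minimal sketch of that collect-then-remove pattern follows. It uses a
hypothetical in-memory attr table; toy_xattr, toy_file, garbage_list and
the helper functions are illustrative names, not xfsprogs types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS	16
#define MAX_NAME	32

/* Hypothetical in-memory stand-in for a file's xattr fork. */
struct toy_xattr {
	char	name[MAX_NAME];
	bool	parent_flag;	/* stands in for ATTR_PARENT */
	bool	well_formed;	/* did it decode as a parent pointer? */
};

struct toy_file {
	struct toy_xattr	attrs[MAX_ATTRS];
	size_t			nr;
};

/* Side list of names to delete, filled in during the walk. */
struct garbage_list {
	char	names[MAX_ATTRS][MAX_NAME];
	size_t	nr;
};

/* Phase 1: walk without modifying anything; just remember the garbage. */
static void record_garbage(const struct toy_file *f, struct garbage_list *g)
{
	g->nr = 0;
	for (size_t i = 0; i < f->nr; i++) {
		const struct toy_xattr *xa = &f->attrs[i];

		if (xa->parent_flag && !xa->well_formed) {
			assert(g->nr < MAX_ATTRS);
			strcpy(g->names[g->nr++], xa->name);
		}
	}
}

/* Remove one attr by name; returns false if it was not found. */
static bool remove_by_name(struct toy_file *f, const char *name)
{
	for (size_t i = 0; i < f->nr; i++) {
		if (strcmp(f->attrs[i].name, name) == 0) {
			/* Shift the tail down over the removed slot. */
			memmove(&f->attrs[i], &f->attrs[i + 1],
				(f->nr - i - 1) * sizeof(f->attrs[0]));
			f->nr--;
			return true;
		}
	}
	return false;
}

/* Phase 2: after the walk is done, delete everything we recorded. */
static size_t remove_garbage(struct toy_file *f, const struct garbage_list *g)
{
	size_t removed = 0;

	for (size_t i = 0; i < g->nr; i++)
		if (remove_by_name(f, g->names[i]))
			removed++;
	return removed;
}
```

Recording names rather than table positions keeps phase 2 correct even
though each removal shifts the remaining entries.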

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    1 
 repair/pptr.c            |  149 +++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 148 insertions(+), 2 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index d3611e05bade..e12f0a40b00a 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -46,6 +46,7 @@
 #define xfs_attr_is_leaf		libxfs_attr_is_leaf
 #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
 #define xfs_attr_namecheck		libxfs_attr_namecheck
+#define xfs_attr_removename		libxfs_attr_removename
 #define xfs_attr_set			libxfs_attr_set
 #define xfs_attr_sethash		libxfs_attr_sethash
 #define xfs_attr_sf_firstentry		libxfs_attr_sf_firstentry
diff --git a/repair/pptr.c b/repair/pptr.c
index bd9bb6f8bf36..61466009d88b 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -198,6 +198,29 @@ struct file_scan {
 
 	/* Does this file have garbage xattrs with ATTR_PARENT set? */
 	bool			have_garbage;
+
+	/* xattrs that we have to remove from this file */
+	struct xfs_slab		*garbage_xattr_recs;
+
+	/* attr names associated with garbage_xattr_recs */
+	struct xfblob		*garbage_xattr_names;
+};
+
+struct garbage_xattr {
+	/* xfs_da_args.attr_filter for the attribute being removed */
+	unsigned int		attr_filter;
+
+	/* attribute name length */
+	unsigned int		attrnamelen;
+
+	/* attribute value length */
+	unsigned int		attrvaluelen;
+
+	/* cookie for the attribute name */
+	xfblob_cookie		attrname_cookie;
+
+	/* cookie for the attribute value */
+	xfblob_cookie		attrvalue_cookie;
 };
 
 /* Global names storage file. */
@@ -392,6 +415,82 @@ add_parent_ptr(
 			(unsigned long long)ag_pptr.name_cookie);
 }
 
+/* Remove garbage extended attributes that have ATTR_PARENT set. */
+static void
+remove_garbage_xattrs(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan)
+{
+	struct xfs_slab_cursor	*cur;
+	struct garbage_xattr	*ga;
+	void			*buf = NULL;
+	size_t			bufsize = 0;
+	int			error;
+
+	error = -init_slab_cursor(fscan->garbage_xattr_recs, NULL, &cur);
+	if (error)
+		do_error(_("init garbage xattr cursor failed: %s\n"),
+				strerror(error));
+
+	while ((ga = pop_slab_cursor(cur)) != NULL) {
+		struct xfs_da_args	args = {
+			.dp		= ip,
+			.attr_filter	= ga->attr_filter,
+			.namelen	= ga->attrnamelen,
+			.valuelen	= ga->attrvaluelen,
+			.owner		= ip->i_ino,
+			.geo		= ip->i_mount->m_attr_geo,
+			.whichfork	= XFS_ATTR_FORK,
+			.op_flags	= XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED,
+		};
+		size_t		desired = ga->attrnamelen + ga->attrvaluelen;
+
+		if (desired > bufsize) {
+			free(buf);
+			buf = malloc(desired);
+			if (!buf)
+				do_error(
+ _("allocating %zu bytes to remove ino %llu garbage xattr failed: %s\n"),
+						desired,
+						(unsigned long long)ip->i_ino,
+						strerror(errno));
+			bufsize = desired;
+		}
+
+		args.name = buf;
+		args.value = buf + ga->attrnamelen;
+
+		error = -xfblob_load(fscan->garbage_xattr_names,
+				ga->attrname_cookie, buf, ga->attrnamelen);
+		if (error)
+			do_error(
+ _("loading garbage xattr name failed: %s\n"),
+					strerror(error));
+
+		error = -xfblob_load(fscan->garbage_xattr_names,
+				ga->attrvalue_cookie, args.value,
+				ga->attrvaluelen);
+		if (error)
+			do_error(
+ _("loading garbage xattr value failed: %s\n"),
+					strerror(error));
+
+		libxfs_attr_sethash(&args);
+		error = -libxfs_attr_set(&args, XFS_ATTRUPDATE_REMOVE, true);
+		if (error)
+			do_error(
+ _("removing ino %llu garbage xattr failed: %s\n"),
+					(unsigned long long)ip->i_ino,
+					strerror(error));
+	}
+
+	free(buf);
+	free_slab_cursor(&cur);
+	free_slab(&fscan->garbage_xattr_recs);
+	xfblob_destroy(fscan->garbage_xattr_names);
+	fscan->garbage_xattr_names = NULL;
+}
+
 /* Schedule this ATTR_PARENT extended attribute for deletion. */
 static void
 record_garbage_xattr(
@@ -403,6 +502,15 @@ record_garbage_xattr(
 	const void		*value,
 	unsigned int		valuelen)
 {
+	struct garbage_xattr	garbage_xattr = {
+		.attr_filter	= attr_filter,
+		.attrnamelen	= namelen,
+		.attrvaluelen	= valuelen,
+	};
+	struct xfs_mount	*mp = ip->i_mount;
+	char			*descr;
+	int			error;
+
 	if (no_modify) {
 		if (!fscan->have_garbage)
 			do_warn(
@@ -413,13 +521,47 @@ record_garbage_xattr(
 	}
 
 	if (fscan->have_garbage)
-		return;
+		goto stuffit;
 	fscan->have_garbage = true;
 
 	do_warn(
  _("deleting garbage parent pointer extended attributes in ino %llu\n"),
 			(unsigned long long)ip->i_ino);
-	/* XXX do the work */
+
+	error = -init_slab(&fscan->garbage_xattr_recs,
+			sizeof(struct garbage_xattr));
+	if (error)
+		do_error(_("init garbage xattr recs failed: %s\n"),
+				strerror(error));
+
+	descr = kasprintf(GFP_KERNEL, "xfs_repair (%s): garbage xattr names",
+			mp->m_fsname);
+	error = -xfblob_create(descr, &fscan->garbage_xattr_names);
+	kfree(descr);
+	if (error)
+		do_error(_("init garbage xattr names failed: %s\n"),
+				strerror(error));
+
+stuffit:
+	error = -xfblob_store(fscan->garbage_xattr_names,
+			&garbage_xattr.attrname_cookie, name, namelen);
+	if (error)
+		do_error(_("storing ino %llu garbage xattr failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+
+	error = -xfblob_store(fscan->garbage_xattr_names,
+			&garbage_xattr.attrvalue_cookie, value, valuelen);
+	if (error)
+		do_error(_("storing ino %llu garbage xattr failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+
+	error = -slab_add(fscan->garbage_xattr_recs, &garbage_xattr);
+	if (error)
+		do_error(_("storing ino %llu garbage xattr rec failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
 }
 
 /*
@@ -968,6 +1110,9 @@ check_file_parent_ptrs(
 		goto out_free;
 	}
 
+	if (!no_modify && fscan->have_garbage)
+		remove_garbage_xattrs(ip, fscan);
+
 	crosscheck_file_parent_ptrs(ip, fscan);
 
 out_free:



* [PATCH 11/12] xfs_repair: update ondisk parent pointer records
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-07-02  1:20   ` [PATCH 10/12] xfs_repair: dump garbage parent pointer attributes Darrick J. Wong
@ 2024-07-02  1:20   ` Darrick J. Wong
  2024-07-02  6:47     ` Christoph Hellwig
  2024-07-02  1:20   ` [PATCH 12/12] xfs_repair: wipe ondisk parent pointers when there are none Darrick J. Wong
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

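Update the ondisk parent pointer records as necessary: add the parent
pointers that the dirent scan found but the file lacks, remove the ones
that no dirent backs, and rewrite the ones whose generation or name is
wrong.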
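
These updates are driven by crosscheck_file_parent_ptrs(), which sorts the
file's ondisk pptrs and then walks them in lockstep with the sorted master
list built from the dirent scan. A record only in the master list is
added, a record only on disk is removed, and records with matching keys
have their generation and name compared. Below is a simplified sketch of
that merge-join. The toy_pptr type and keying on (parent_ino, name) are
assumptions for illustration; the real code orders by parent inode, name
hash, and name cookie.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified parent pointer record. */
struct toy_pptr {
	uint64_t	parent_ino;
	uint32_t	parent_gen;
	const char	*name;
};

struct reconcile_stats {
	int added;	/* in the master list, missing on disk */
	int removed;	/* on disk, not in the master list */
	int updated;	/* same key, different generation */
};

/* Order by (parent_ino, name); the generation is not part of the key. */
static int cmp_pptr_key(const struct toy_pptr *a, const struct toy_pptr *b)
{
	if (a->parent_ino != b->parent_ino)
		return a->parent_ino < b->parent_ino ? -1 : 1;
	return strcmp(a->name, b->name);
}

static int qsort_cmp(const void *a, const void *b)
{
	return cmp_pptr_key(a, b);
}

/* Merge-join the two lists; both are sorted in place first. */
static struct reconcile_stats
reconcile(struct toy_pptr *master, size_t nr_master,
	  struct toy_pptr *ondisk, size_t nr_ondisk)
{
	struct reconcile_stats st = { 0, 0, 0 };
	size_t i = 0, j = 0;

	qsort(master, nr_master, sizeof(*master), qsort_cmp);
	qsort(ondisk, nr_ondisk, sizeof(*ondisk), qsort_cmp);

	while (i < nr_master) {
		/* Out of ondisk records: everything left is missing. */
		int c = (j == nr_ondisk) ? 1 :
				cmp_pptr_key(&ondisk[j], &master[i]);

		if (c > 0) {
			st.added++;		/* add master[i] to disk */
			i++;
		} else if (c < 0) {
			st.removed++;		/* drop ondisk[j] */
			j++;
		} else {
			if (ondisk[j].parent_gen != master[i].parent_gen)
				st.updated++;	/* remove old, add new */
			i++;
			j++;
		}
	}

	/* Leftover ondisk records have no dirent backing them. */
	st.removed += (int)(nr_ondisk - j);
	return st;
}
```

Because both inputs are sorted, each record is visited once and the whole
comparison costs O(n log n) for the sorts plus a linear merge.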

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    2 +
 repair/pptr.c            |   88 ++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 87 insertions(+), 3 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index e12f0a40b00a..df316727bd62 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -209,8 +209,10 @@
 #define xfs_parent_hashval		libxfs_parent_hashval
 #define xfs_parent_lookup		libxfs_parent_lookup
 #define xfs_parent_removename		libxfs_parent_removename
+#define xfs_parent_set			libxfs_parent_set
 #define xfs_parent_start		libxfs_parent_start
 #define xfs_parent_from_attr		libxfs_parent_from_attr
+#define xfs_parent_unset		libxfs_parent_unset
 #define xfs_perag_get			libxfs_perag_get
 #define xfs_perag_hold			libxfs_perag_hold
 #define xfs_perag_put			libxfs_perag_put
diff --git a/repair/pptr.c b/repair/pptr.c
index 61466009d88b..94d6d834627c 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -673,6 +673,44 @@ load_file_pptr_name(
 			name, file_pptr->namelen);
 }
 
+/* Add an on disk parent pointer to a file. */
+static int
+add_file_pptr(
+	struct xfs_inode		*ip,
+	const struct ag_pptr		*ag_pptr,
+	const unsigned char		*name)
+{
+	struct xfs_name			xname = {
+		.name			= name,
+		.len			= ag_pptr->namelen,
+	};
+	struct xfs_parent_rec		pptr_rec = { };
+	struct xfs_da_args		scratch;
+
+	xfs_parent_rec_init(&pptr_rec, ag_pptr->parent_ino,
+			ag_pptr->parent_gen);
+	return -libxfs_parent_set(ip, ip->i_ino, &xname, &pptr_rec, &scratch);
+}
+
+/* Remove an on disk parent pointer from a file. */
+static int
+remove_file_pptr(
+	struct xfs_inode		*ip,
+	const struct file_pptr		*file_pptr,
+	const unsigned char		*name)
+{
+	struct xfs_name			xname = {
+		.name			= name,
+		.len			= file_pptr->namelen,
+	};
+	struct xfs_parent_rec		pptr_rec = { };
+	struct xfs_da_args		scratch;
+
+	xfs_parent_rec_init(&pptr_rec, file_pptr->parent_ino,
+			file_pptr->parent_gen);
+	return -libxfs_parent_unset(ip, ip->i_ino, &xname, &pptr_rec, &scratch);
+}
+
 /* Remove all pptrs from @ip. */
 static void
 clear_all_pptrs(
@@ -729,7 +767,16 @@ add_missing_parent_ptr(
 				name);
 	}
 
-	/* XXX actually do the work */
+	error = add_file_pptr(ip, ag_pptr, name);
+	if (error)
+		do_error(
+ _("adding ino %llu pptr (ino %llu gen 0x%x name '%.*s') failed: %s\n"),
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)ag_pptr->parent_ino,
+			ag_pptr->parent_gen,
+			ag_pptr->namelen,
+			name,
+			strerror(error));
 }
 
 /* Remove @file_pptr from @ip. */
@@ -771,7 +818,16 @@ remove_incorrect_parent_ptr(
 			file_pptr->namelen,
 			name);
 
-	/* XXX actually do the work */
+	error = remove_file_pptr(ip, file_pptr, name);
+	if (error)
+		do_error(
+ _("removing ino %llu pptr (ino %llu gen 0x%x name '%.*s') failed: %s\n"),
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)file_pptr->parent_ino,
+			file_pptr->parent_gen,
+			file_pptr->namelen,
+			name,
+			strerror(error));
 }
 
 /*
@@ -851,7 +907,33 @@ compare_parent_ptrs(
 			ag_pptr->namelen,
 			name1);
 
-	/* XXX do the work */
+	/* Remove the parent pointer that we don't want. */
+	error = remove_file_pptr(ip, file_pptr, name2);
+	if (error)
+		do_error(
+_("erasing ino %llu pptr (ino %llu gen 0x%x name '%.*s') failed: %s\n"),
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)file_pptr->parent_ino,
+			file_pptr->parent_gen,
+			file_pptr->namelen,
+			name2,
+			strerror(error));
+
+	/*
+	 * Add the parent pointer that we do want.  It's possible that this
+	 * parent pointer already exists but we haven't gotten that far in the
+	 * scan, so we'll keep going on EEXIST.
+	 */
+	error = add_file_pptr(ip, ag_pptr, name1);
+	if (error && error != EEXIST)
+		do_error(
+ _("updating ino %llu pptr (ino %llu gen 0x%x name '%.*s') failed: %s\n"),
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)ag_pptr->parent_ino,
+			ag_pptr->parent_gen,
+			ag_pptr->namelen,
+			name1,
+			strerror(error));
 }
 
 static int



* [PATCH 12/12] xfs_repair: wipe ondisk parent pointers when there are none
  2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-07-02  1:20   ` [PATCH 11/12] xfs_repair: update ondisk parent pointer records Darrick J. Wong
@ 2024-07-02  1:20   ` Darrick J. Wong
  2024-07-02  6:47     ` Christoph Hellwig
  11 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: catherine.hoang, linux-xfs, allison.henderson, hch

From: Darrick J. Wong <djwong@kernel.org>

Erase all of a file's ondisk parent pointers when the directory entry
scan found no directory entries pointing to that file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/pptr.c |   41 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 3 deletions(-)


diff --git a/repair/pptr.c b/repair/pptr.c
index 94d6d834627c..8ec6a51d2c3d 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -714,8 +714,13 @@ remove_file_pptr(
 /* Remove all pptrs from @ip. */
 static void
 clear_all_pptrs(
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan)
 {
+	struct xfs_slab_cursor	*cur;
+	struct file_pptr	*file_pptr;
+	int			error;
+
 	if (no_modify) {
 		do_warn(_("would delete unlinked ino %llu parent pointers\n"),
 				(unsigned long long)ip->i_ino);
@@ -724,7 +729,37 @@ clear_all_pptrs(
 
 	do_warn(_("deleting unlinked ino %llu parent pointers\n"),
 			(unsigned long long)ip->i_ino);
-	/* XXX actually do the work */
+
+	error = -init_slab_cursor(fscan->file_pptr_recs, NULL, &cur);
+	if (error)
+		do_error(_("init ino %llu pptr cursor failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+
+	while ((file_pptr = pop_slab_cursor(cur)) != NULL) {
+		unsigned char	name[MAXNAMELEN];
+
+		error = load_file_pptr_name(fscan, file_pptr, name);
+		if (error)
+			do_error(
+  _("loading incorrect name for ino %llu parent pointer (ino %llu gen 0x%x namecookie 0x%llx) failed: %s\n"),
+					(unsigned long long)ip->i_ino,
+					(unsigned long long)file_pptr->parent_ino,
+					file_pptr->parent_gen,
+					(unsigned long long)file_pptr->name_cookie,
+					strerror(error));
+
+		error = remove_file_pptr(ip, file_pptr, name);
+		if (error)
+			do_error(
+ _("wiping ino %llu pptr (ino %llu gen 0x%x) failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)file_pptr->parent_ino,
+				file_pptr->parent_gen,
+				strerror(error));
+	}
+
+	free_slab_cursor(&cur);
 }
 
 /* Add @ag_pptr to @ip. */
@@ -1028,7 +1063,7 @@ crosscheck_file_parent_ptrs(
 		 * file.
 		 */
 		if (fscan->nr_file_pptrs > 0)
-			clear_all_pptrs(ip);
+			clear_all_pptrs(ip, fscan);
 
 		return;
 	}



* [PATCH 1/5] libfrog: add directory tree structure scrubber to scrub library
  2024-07-02  0:52 ` [PATCHSET v13.7 14/16] xfsprogs: detect and correct directory tree problems Darrick J. Wong
@ 2024-07-02  1:20   ` Darrick J. Wong
  2024-07-02  6:48     ` Christoph Hellwig
  2024-07-02  1:21   ` [PATCH 2/5] xfs_spaceman: report directory tree corruption in the health information Darrick J. Wong
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Make it so that scrub clients can detect corruptions within the
directory tree structure itself.  Update the documentation for the scrub
ioctl to mention this new functionality.
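
As the new man page text for XFS_SCRUB_TYPE_DIRTREE explains, the check
amounts to walking parent pointers toward the root while remembering which
directories have already been visited, and flagging any directory with
more than one parent. The sketch below is a deliberately simplified,
single-path model of that idea over a hypothetical in-memory table
(toy_dir, check_dirtree and the DIRTREE_* results are made-up names); the
kernel scrubber itself is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DIRS	64
#define TOY_ROOT	0

/* Hypothetical directory: its parent pointers, by directory index. */
struct toy_dir {
	size_t	nr_parents;
	size_t	parents[4];
};

enum dirtree_result {
	DIRTREE_OK,		/* unique path to the root */
	DIRTREE_LOOP,		/* the walk revisited a directory */
	DIRTREE_MULTIPATH,	/* some directory has more than one parent */
	DIRTREE_DISCONNECTED,	/* hit a non-root directory with no parent */
};

static enum dirtree_result
check_dirtree(const struct toy_dir *dirs, size_t nr_dirs, size_t start)
{
	bool seen[TOY_MAX_DIRS] = { false };
	size_t cur = start;

	assert(nr_dirs <= TOY_MAX_DIRS && start < nr_dirs);

	while (cur != TOY_ROOT) {
		/* Compare against every directory already examined. */
		if (seen[cur])
			return DIRTREE_LOOP;
		seen[cur] = true;

		/* Count the parent pointers attached to this directory. */
		if (dirs[cur].nr_parents == 0)
			return DIRTREE_DISCONNECTED;
		if (dirs[cur].nr_parents > 1)
			return DIRTREE_MULTIPATH;

		cur = dirs[cur].parents[0];
		assert(cur < nr_dirs);
	}
	return DIRTREE_OK;
}
```

The visited set also catches a loop further up the tree that never returns
to the starting directory.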

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/scrub.c                     |    5 +++++
 man/man2/ioctl_xfs_scrub_metadata.2 |   14 ++++++++++++++
 2 files changed, 19 insertions(+)


diff --git a/libfrog/scrub.c b/libfrog/scrub.c
index baaa4b4d9402..a2146e228f5b 100644
--- a/libfrog/scrub.c
+++ b/libfrog/scrub.c
@@ -149,6 +149,11 @@ const struct xfrog_scrub_descr xfrog_scrubbers[XFS_SCRUB_TYPE_NR] = {
 		.descr	= "retained health records",
 		.group	= XFROG_SCRUB_GROUP_NONE,
 	},
+	[XFS_SCRUB_TYPE_DIRTREE] = {
+		.name	= "dirtree",
+		.descr	= "directory tree structure",
+		.group	= XFROG_SCRUB_GROUP_INODE,
+	},
 };
 #undef DEP
 
diff --git a/man/man2/ioctl_xfs_scrub_metadata.2 b/man/man2/ioctl_xfs_scrub_metadata.2
index 75ae52bb5847..44aa139b297a 100644
--- a/man/man2/ioctl_xfs_scrub_metadata.2
+++ b/man/man2/ioctl_xfs_scrub_metadata.2
@@ -148,6 +148,20 @@ that points back to the subdirectory.
 The inode to examine can be specified in the same manner as
 .BR XFS_SCRUB_TYPE_INODE "."
 
+.TP
+.B XFS_SCRUB_TYPE_DIRTREE
+This scrubber looks for problems in the directory tree structure such as loops
+and directories accessible through more than one path.
+Problems are detected by walking parent pointers upwards towards the root.
+Loops are detected by comparing the parent directory at each step against the
+directories already examined.
+Directories with multiple paths are detected by counting the parent pointers
+attached to a directory.
+Non-directories do not have links pointing away from the directory tree root
+and can be skipped.
+The directory to examine can be specified in the same manner as
+.BR XFS_SCRUB_TYPE_INODE "."
+
 .TP
 .B XFS_SCRUB_TYPE_SYMLINK
 Examine the target of a symbolic link for obvious pathname problems.



* [PATCH 2/5] xfs_spaceman: report directory tree corruption in the health information
  2024-07-02  0:52 ` [PATCHSET v13.7 14/16] xfsprogs: detect and correct directory tree problems Darrick J. Wong
  2024-07-02  1:20   ` [PATCH 1/5] libfrog: add directory tree structure scrubber to scrub library Darrick J. Wong
@ 2024-07-02  1:21   ` Darrick J. Wong
  2024-07-02  6:48     ` Christoph Hellwig
  2024-07-02  1:21   ` [PATCH 3/5] xfs_scrub: fix erroring out of check_inode_names Darrick J. Wong
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:21 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Report directories that are the source of corruption in the directory
tree.  While we're at it, add the documentation updates for the new
reporting flags and scrub type.
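
spaceman/health.c decodes the sick mask through a zero-terminated table of
{ mask, descr } entries, so reporting the new flag only needs one more
table row. A minimal sketch of that lookup pattern follows; the TOY_SICK_*
mask values and describe_sick() are hypothetical stand-ins, not the real
XFS_BS_SICK_* constants or spaceman code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mask values; the real XFS_BS_SICK_* live in xfs_fs.h. */
#define TOY_SICK_PARENT		(1U << 0)
#define TOY_SICK_DIRTREE	(1U << 1)

struct toy_flag_map {
	unsigned int	mask;
	const char	*descr;
};

/* Zero-terminated table, in the style of spaceman's inode_flags[]. */
static const struct toy_flag_map toy_inode_flags[] = {
	{ .mask = TOY_SICK_PARENT,  .descr = "parent pointers" },
	{ .mask = TOY_SICK_DIRTREE, .descr = "directory tree structure" },
	{ 0 },
};

/* Write the description of every set flag into buf; return the count. */
static int describe_sick(unsigned int sick, char *buf, size_t len)
{
	const struct toy_flag_map *f;
	int found = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (f = toy_inode_flags; f->mask != 0; f++) {
		if (!(sick & f->mask))
			continue;

		size_t used = strlen(buf);

		/* snprintf truncates safely if buf fills up. */
		snprintf(buf + used, len - used, "%s%s",
			 found ? ", " : "", f->descr);
		found++;
	}
	return found;
}
```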

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man2/ioctl_xfs_bulkstat.2   |    3 +++
 man/man2/ioctl_xfs_fsbulkstat.2 |    3 +++
 spaceman/health.c               |    4 ++++
 3 files changed, 10 insertions(+)


diff --git a/man/man2/ioctl_xfs_bulkstat.2 b/man/man2/ioctl_xfs_bulkstat.2
index 3203ca0c5d27..b6d51aa43811 100644
--- a/man/man2/ioctl_xfs_bulkstat.2
+++ b/man/man2/ioctl_xfs_bulkstat.2
@@ -326,6 +326,9 @@ Symbolic link target.
 .TP
 .B XFS_BS_SICK_PARENT
 Parent pointers.
+.TP
+.B XFS_BS_SICK_DIRTREE
+Directory is the source of corruption in the directory tree.
 .RE
 .SH ERRORS
 Error codes can be one of, but are not limited to, the following:
diff --git a/man/man2/ioctl_xfs_fsbulkstat.2 b/man/man2/ioctl_xfs_fsbulkstat.2
index 3f059942a219..cd38d2fd6f26 100644
--- a/man/man2/ioctl_xfs_fsbulkstat.2
+++ b/man/man2/ioctl_xfs_fsbulkstat.2
@@ -239,6 +239,9 @@ Symbolic link target.
 .TP
 .B XFS_BS_SICK_PARENT
 Parent pointers.
+.TP
+.B XFS_BS_SICK_DIRTREE
+Directory is the source of corruption in the directory tree.
 .RE
 .SH RETURN VALUE
 On error, \-1 is returned, and
diff --git a/spaceman/health.c b/spaceman/health.c
index 6722babf5888..d88a7f6c6e53 100644
--- a/spaceman/health.c
+++ b/spaceman/health.c
@@ -165,6 +165,10 @@ static const struct flag_map inode_flags[] = {
 		.mask = XFS_BS_SICK_PARENT,
 		.descr = "parent pointers",
 	},
+	{
+		.mask = XFS_BS_SICK_DIRTREE,
+		.descr = "directory tree structure",
+	},
 	{0},
 };
 



* [PATCH 3/5] xfs_scrub: fix erroring out of check_inode_names
  2024-07-02  0:52 ` [PATCHSET v13.7 14/16] xfsprogs: detect and correct directory tree problems Darrick J. Wong
  2024-07-02  1:20   ` [PATCH 1/5] libfrog: add directory tree structure scrubber to scrub library Darrick J. Wong
  2024-07-02  1:21   ` [PATCH 2/5] xfs_spaceman: report directory tree corruption in the health information Darrick J. Wong
@ 2024-07-02  1:21   ` Darrick J. Wong
  2024-07-02  6:48     ` Christoph Hellwig
  2024-07-02  1:21   ` [PATCH 4/5] xfs_scrub: detect and repair directory tree corruptions Darrick J. Wong
  2024-07-02  1:21   ` [PATCH 5/5] xfs_scrub: defer phase5 file scans if dirloop fails Darrick J. Wong
  4 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:21 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

The early exit logic in this function is a bit suboptimal -- we don't
need to close the @fd if we haven't even opened it, and since all errors
are fatal, we don't need to bump the progress counter.  The logic in
this function is about to get more involved due to the addition of the
directory tree structure checker, so clean up these warts.
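
The restructured exits follow the usual C unwinding idiom. Each error
jumps to a label that undoes only what has already been set up, and the
success path bumps the progress counter and then falls through into the
same cleanup. Here is a self-contained sketch of that control flow;
toy_open(), toy_close(), check_one() and the two counters are hypothetical
stand-ins for the real handle open, close(), and progress_add().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for scrub's progress counter and fd tracking. */
static int progress;
static int open_fds;

static int toy_open(bool fail)
{
	if (fail)
		return -1;
	open_fds++;
	return 3;
}

static void toy_close(int fd)
{
	(void)fd;
	open_fds--;
}

static int check_one(bool xattr_err, bool open_err, bool dirent_err)
{
	int fd = -1;
	int error = 0;

	if (xattr_err) {
		error = 1;
		goto err;		/* nothing opened yet */
	}

	fd = toy_open(open_err);
	if (fd < 0) {
		error = 2;
		goto err;		/* open failed, nothing to close */
	}

	if (dirent_err) {
		error = 3;
		goto err_fd;		/* close the fd, but skip progress */
	}

	progress++;			/* only successful scans count */
err_fd:
	if (fd >= 0)
		toy_close(fd);
err:
	return error;
}
```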

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase5.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)


diff --git a/scrub/phase5.c b/scrub/phase5.c
index 0df8c46e9f5b..b37196277554 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -279,7 +279,7 @@ check_inode_names(
 	if (bstat->bs_xflags & FS_XFLAG_HASATTR) {
 		error = check_xattr_names(ctx, &dsc, handle, bstat);
 		if (error)
-			goto out;
+			goto err;
 	}
 
 	/*
@@ -295,16 +295,16 @@ check_inode_names(
 			if (error == ESTALE)
 				return ESTALE;
 			str_errno(ctx, descr_render(&dsc));
-			goto out;
+			goto err;
 		}
 
 		error = check_dirent_names(ctx, &dsc, &fd, bstat);
 		if (error)
-			goto out;
+			goto err_fd;
 	}
 
-out:
 	progress_add(1);
+err_fd:
 	if (fd >= 0) {
 		err2 = close(fd);
 		if (err2)
@@ -312,7 +312,7 @@ check_inode_names(
 		if (!error && err2)
 			error = err2;
 	}
-
+err:
 	if (error)
 		*aborted = true;
 	if (!error && *aborted)



* [PATCH 4/5] xfs_scrub: detect and repair directory tree corruptions
  2024-07-02  0:52 ` [PATCHSET v13.7 14/16] xfsprogs: detect and correct directory tree problems Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-07-02  1:21   ` [PATCH 3/5] xfs_scrub: fix erroring out of check_inode_names Darrick J. Wong
@ 2024-07-02  1:21   ` Darrick J. Wong
  2024-07-02  6:49     ` Christoph Hellwig
  2024-07-02  1:21   ` [PATCH 5/5] xfs_scrub: defer phase5 file scans if dirloop fails Darrick J. Wong
  4 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:21 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Now that we have online fsck for directory tree structure problems, we
need to find a place to call it.  The scanner requires that parent
pointers are enabled, that directory link counts are correct, and that
every directory entry has a corresponding parent pointer.  Therefore, we
can only run it after phase 4 fixes every file, and phase 5 resets the
link counts.

In other words, we call it as part of the phase 5 file scan that we do
to warn about weird looking file names.  This has the added benefit that
opening the directory by handle is less likely to fail if there are
loops in the directory structure.  For now, only plumb in enough to try
to fix directory tree problems right away; the next patch will make
phase 5 retry the dirloop scanner until the problems are fixed or we
stop making forward progress.
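
The per-inode control flow described above can be modeled as a small decision function. This is a simplified sketch with hypothetical names (demo_next_step and friends). It assumes, as in the patch below, that check_dir_connection() returns EADDRNOTAVAIL when a directory tree problem is left unfixed. That result skips the name checks for that inode without aborting the whole scan, while any other error aborts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum demo_action {
	DEMO_CHECK_NAMES,	/* go on to the xattr/dirent name checks */
	DEMO_SKIP_NAMES,	/* tree problem remains; skip, don't abort */
	DEMO_ABORT,		/* runtime error; abort the scan */
};

/*
 * Hypothetical model.  @dir_result stands in for the return value of
 * check_dir_connection(): 0 if the tree is fine or was fixed,
 * EADDRNOTAVAIL if a problem remains, or any other errno for a runtime
 * failure.  Non-directories never run the directory tree check.
 */
static enum demo_action demo_next_step(bool is_dir, int dir_result)
{
	if (!is_dir)
		return DEMO_CHECK_NAMES;
	if (dir_result == EADDRNOTAVAIL)
		return DEMO_SKIP_NAMES;
	if (dir_result != 0)
		return DEMO_ABORT;
	return DEMO_CHECK_NAMES;
}
```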

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase5.c |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)


diff --git a/scrub/phase5.c b/scrub/phase5.c
index b37196277554..6c8dee66e6e2 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -252,6 +252,47 @@ render_ino_from_handle(
 			bstat->bs_gen, NULL);
 }
 
+/*
+ * Check the directory structure for problems that could cause open_by_handle
+ * not to work.  Returns 0 for no problems; EADDRNOTAVAIL if the there are
+ * problems that would prevent name checking.
+ */
+static int
+check_dir_connection(
+	struct scrub_ctx		*ctx,
+	struct descr			*dsc,
+	const struct xfs_bulkstat	*bstat)
+{
+	struct scrub_item		sri = { };
+	int				error;
+
+	/* The dirtree scrubber only works when parent pointers are enabled */
+	if (!(ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_PARENT))
+		return 0;
+
+	scrub_item_init_file(&sri, bstat);
+	scrub_item_schedule(&sri, XFS_SCRUB_TYPE_DIRTREE);
+
+	error = scrub_item_check_file(ctx, &sri, -1);
+	if (error) {
+		str_liberror(ctx, error, _("checking directory loops"));
+		return error;
+	}
+
+	error = repair_file_corruption(ctx, &sri, -1);
+	if (error) {
+		str_liberror(ctx, error, _("repairing directory loops"));
+		return error;
+	}
+
+	/* No directory tree problems?  Clear this inode if it was deferred. */
+	if (repair_item_count_needsrepair(&sri) == 0)
+		return 0;
+
+	str_corrupt(ctx, descr_render(dsc), _("directory loop uncorrected!"));
+	return EADDRNOTAVAIL;
+}
+
 /*
  * Verify the connectivity of the directory tree.
  * We know that the kernel's open-by-handle function will try to reconnect
@@ -275,6 +316,20 @@ check_inode_names(
 	descr_set(&dsc, bstat);
 	background_sleep();
 
+	/*
+	 * Try to fix directory loops before we have problems opening files by
+	 * handle.
+	 */
+	if (S_ISDIR(bstat->bs_mode)) {
+		error = check_dir_connection(ctx, &dsc, bstat);
+		if (error == EADDRNOTAVAIL) {
+			error = 0;
+			goto out;
+		}
+		if (error)
+			goto err;
+	}
+
 	/* Warn about naming problems in xattrs. */
 	if (bstat->bs_xflags & FS_XFLAG_HASATTR) {
 		error = check_xattr_names(ctx, &dsc, handle, bstat);
@@ -315,6 +370,7 @@ check_inode_names(
 err:
 	if (error)
 		*aborted = true;
+out:
 	if (!error && *aborted)
 		error = ECANCELED;
 



* [PATCH 5/5] xfs_scrub: defer phase5 file scans if dirloop fails
  2024-07-02  0:52 ` [PATCHSET v13.7 14/16] xfsprogs: detect and correct directory tree problems Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-07-02  1:21   ` [PATCH 4/5] xfs_scrub: detect and repair directory tree corruptions Darrick J. Wong
@ 2024-07-02  1:21   ` Darrick J. Wong
  2024-07-02  6:49     ` Christoph Hellwig
  4 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:21 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

If we cannot fix dirloop problems during the initial phase 5 inode scan,
defer them until later.
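
The deferral scheme amounts to a fixed-point loop. Phase 5 keeps re-scanning the deferred set while each pass fixes something and still leaves work behind. It then makes one last pass that reports whatever is still broken instead of deferring it again.

The sketch below models that loop under two assumptions. A 64-bit mask stands in for libfrog's bitmap. A made-up dependency rule says that item i can be repaired only after item i - 1. All names are hypothetical, and this is not the xfs_scrub implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical repair model; bit i of each mask refers to item i. */
struct demo_repair {
	uint64_t	fixed;		/* items repaired so far */
	uint64_t	stuck;		/* items that can never be repaired */
	int		passes;		/* passes run, including the last call */
	uint64_t	complained;	/* items reported on the last call */
};

/* Item i is fixable only once item i - 1 is fixed (made-up rule). */
static bool demo_try_fix(struct demo_repair *r, int i)
{
	uint64_t bit = UINT64_C(1) << i;

	if (r->stuck & bit)
		return false;
	if (i > 0 && !(r->fixed & (UINT64_C(1) << (i - 1))))
		return false;
	r->fixed |= bit;
	return true;
}

/* One pass over @cur, highest item first; returns items to retry. */
static uint64_t demo_pass(struct demo_repair *r, uint64_t cur,
			  bool last_call, bool *fixed_something)
{
	uint64_t next = 0;

	r->passes++;
	for (int i = 63; i >= 0; i--) {
		uint64_t bit = UINT64_C(1) << i;

		if (!(cur & bit))
			continue;
		if (demo_try_fix(r, i))
			*fixed_something = true;
		else if (last_call)
			r->complained |= bit;	/* report, don't defer */
		else
			next |= bit;		/* defer to the next pass */
	}
	return next;
}

/* Retry until no forward progress, then make one final reporting pass. */
static void demo_retry_deferred(struct demo_repair *r, uint64_t deferred)
{
	bool fixed_something;

	if (!deferred)
		return;
	do {
		fixed_something = false;
		deferred = demo_pass(r, deferred, false, &fixed_something);
	} while (fixed_something && deferred);

	if (deferred)
		demo_pass(r, deferred, true, &fixed_something);
}
```

With items 0-2 deferred and nothing stuck, the model needs three passes because each pass fixes one link of the chain. With item 0 stuck, the first pass makes no progress, so the last call reports both items.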

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase5.c |  215 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 scrub/repair.c |   13 +++
 scrub/repair.h |    2 +
 3 files changed, 216 insertions(+), 14 deletions(-)


diff --git a/scrub/phase5.c b/scrub/phase5.c
index 6c8dee66e6e2..f6c295c64ada 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -18,6 +18,8 @@
 #include "libfrog/workqueue.h"
 #include "libfrog/fsgeom.h"
 #include "libfrog/scrub.h"
+#include "libfrog/bitmap.h"
+#include "libfrog/bulkstat.h"
 #include "xfs_scrub.h"
 #include "common.h"
 #include "inodes.h"
@@ -29,6 +31,36 @@
 
 /* Phase 5: Full inode scans and check directory connectivity. */
 
+struct ncheck_state {
+	struct scrub_ctx	*ctx;
+
+	/* Have we aborted this scan? */
+	bool			aborted;
+
+	/* Is this the last time we're going to process deferred inodes? */
+	bool			last_call;
+
+	/* Did we fix at least one thing while walking @cur->deferred? */
+	bool			fixed_something;
+
+	/* Lock for this structure */
+	pthread_mutex_t		lock;
+
+	/*
+	 * Inodes that are involved with directory tree structure corruptions
+	 * are marked here.  This will be NULL until the first corruption is
+	 * noted.
+	 */
+	struct bitmap		*new_deferred;
+
+	/*
+	 * Inodes that we're reprocessing due to earlier directory tree
+	 * structure corruption problems are marked here.  This will be NULL
+	 * during the first (parallel) inode scan.
+	 */
+	struct bitmap		*cur_deferred;
+};
+
 /*
  * Warn about problematic bytes in a directory/attribute name.  That means
  * terminal control characters and escape sequences, since that could be used
@@ -252,6 +284,26 @@ render_ino_from_handle(
 			bstat->bs_gen, NULL);
 }
 
+/* Defer this inode until later. */
+static inline int
+defer_inode(
+	struct ncheck_state	*ncs,
+	uint64_t		ino)
+{
+	int			error;
+
+	pthread_mutex_lock(&ncs->lock);
+	if (!ncs->new_deferred) {
+		error = -bitmap_alloc(&ncs->new_deferred);
+		if (error)
+			goto unlock;
+	}
+	error = -bitmap_set(ncs->new_deferred, ino, 1);
+unlock:
+	pthread_mutex_unlock(&ncs->lock);
+	return error;
+}
+
 /*
  * Check the directory structure for problems that could cause open_by_handle
  * not to work.  Returns 0 for no problems; EADDRNOTAVAIL if the there are
@@ -260,7 +312,7 @@ render_ino_from_handle(
 static int
 check_dir_connection(
 	struct scrub_ctx		*ctx,
-	struct descr			*dsc,
+	struct ncheck_state		*ncs,
 	const struct xfs_bulkstat	*bstat)
 {
 	struct scrub_item		sri = { };
@@ -279,17 +331,31 @@ check_dir_connection(
 		return error;
 	}
 
-	error = repair_file_corruption(ctx, &sri, -1);
+	if (ncs->last_call)
+		error = repair_file_corruption_now(ctx, &sri, -1);
+	else
+		error = repair_file_corruption(ctx, &sri, -1);
 	if (error) {
 		str_liberror(ctx, error, _("repairing directory loops"));
 		return error;
 	}
 
 	/* No directory tree problems?  Clear this inode if it was deferred. */
-	if (repair_item_count_needsrepair(&sri) == 0)
+	if (repair_item_count_needsrepair(&sri) == 0) {
+		if (ncs->cur_deferred)
+			ncs->fixed_something = true;
 		return 0;
+	}
+
+	/* Don't defer anything during last call. */
+	if (ncs->last_call)
+		return 0;
+
+	/* Directory tree structure problems exist; do not check names yet. */
+	error = defer_inode(ncs, bstat->bs_ino);
+	if (error)
+		return error;
 
-	str_corrupt(ctx, descr_render(dsc), _("directory loop uncorrected!"));
 	return EADDRNOTAVAIL;
 }
 
@@ -308,7 +374,7 @@ check_inode_names(
 	void			*arg)
 {
 	DEFINE_DESCR(dsc, ctx, render_ino_from_handle);
-	bool			*aborted = arg;
+	struct ncheck_state	*ncs = arg;
 	int			fd = -1;
 	int			error = 0;
 	int			err2;
@@ -321,7 +387,7 @@ check_inode_names(
 	 * handle.
 	 */
 	if (S_ISDIR(bstat->bs_mode)) {
-		error = check_dir_connection(ctx, &dsc, bstat);
+		error = check_dir_connection(ctx, ncs, bstat);
 		if (error == EADDRNOTAVAIL) {
 			error = 0;
 			goto out;
@@ -369,14 +435,120 @@ check_inode_names(
 	}
 err:
 	if (error)
-		*aborted = true;
+		ncs->aborted = true;
 out:
-	if (!error && *aborted)
+	if (!error && ncs->aborted)
 		error = ECANCELED;
 
 	return error;
 }
 
+/* Try to check_inode_names on a specific inode. */
+static int
+retry_deferred_inode(
+	struct ncheck_state	*ncs,
+	struct xfs_handle	*handle,
+	uint64_t		ino)
+{
+	struct xfs_bulkstat	bstat;
+	struct scrub_ctx	*ctx = ncs->ctx;
+	unsigned int		flags = 0;
+	int			error;
+
+	error = -xfrog_bulkstat_single(&ctx->mnt, ino, flags, &bstat);
+	if (error == ENOENT) {
+		/* Directory is gone, mark it clear. */
+		ncs->fixed_something = true;
+		return 0;
+	}
+	if (error)
+		return error;
+
+	handle->ha_fid.fid_ino = bstat.bs_ino;
+	handle->ha_fid.fid_gen = bstat.bs_gen;
+
+	return check_inode_names(ncs->ctx, handle, &bstat, ncs);
+}
+
+/* Try to check_inode_names on a range of inodes from the bitmap. */
+static int
+retry_deferred_inode_range(
+	uint64_t		ino,
+	uint64_t		len,
+	void			*arg)
+{
+	struct xfs_handle	handle = { };
+	struct ncheck_state	*ncs = arg;
+	struct scrub_ctx	*ctx = ncs->ctx;
+	uint64_t		i;
+	int			error;
+
+	memcpy(&handle.ha_fsid, ctx->fshandle, sizeof(handle.ha_fsid));
+	handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
+			sizeof(handle.ha_fid.fid_len);
+	handle.ha_fid.fid_pad = 0;
+
+	for (i = 0; i < len; i++) {
+		error = retry_deferred_inode(ncs, &handle, ino + i);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
+/*
+ * Try to check_inode_names on inodes that were deferred due to directory tree
+ * problems until we stop making progress.
+ */
+static int
+retry_deferred_inodes(
+	struct scrub_ctx	*ctx,
+	struct ncheck_state	*ncs)
+{
+	int			error;
+
+	if  (!ncs->new_deferred)
+		return 0;
+
+	/*
+	 * Try to repair things until we stop making forward progress or we
+	 * don't observe any new corruptions.  During the loop, we do not
+	 * complain about the corruptions that do not get fixed.
+	 */
+	do {
+		ncs->cur_deferred = ncs->new_deferred;
+		ncs->new_deferred = NULL;
+		ncs->fixed_something = false;
+
+		error = -bitmap_iterate(ncs->cur_deferred,
+				retry_deferred_inode_range, ncs);
+		if (error)
+			return error;
+
+		bitmap_free(&ncs->cur_deferred);
+	} while (ncs->fixed_something && ncs->new_deferred);
+
+	/*
+	 * Try one last time to fix things, and complain about any problems
+	 * that remain.
+	 */
+	if (!ncs->new_deferred)
+		return 0;
+
+	ncs->cur_deferred = ncs->new_deferred;
+	ncs->new_deferred = NULL;
+	ncs->last_call = true;
+
+	error = -bitmap_iterate(ncs->cur_deferred,
+			retry_deferred_inode_range, ncs);
+	if (error)
+		return error;
+
+	bitmap_free(&ncs->cur_deferred);
+	return 0;
+}
+
 #ifndef FS_IOC_GETFSLABEL
 # define FSLABEL_MAX		256
 # define FS_IOC_GETFSLABEL	_IOR(0x94, 49, char[FSLABEL_MAX])
@@ -568,9 +740,10 @@ int
 phase5_func(
 	struct scrub_ctx	*ctx)
 {
-	bool			aborted = false;
+	struct ncheck_state	ncs = { .ctx = ctx };
 	int			ret;
 
+
 	/*
 	 * Check and fix anything that requires a full filesystem scan.  We do
 	 * this after we've checked all inodes and repaired anything that could
@@ -590,14 +763,28 @@ _("Filesystem has errors, skipping connectivity checks."));
 	if (ret)
 		return ret;
 
-	ret = scrub_scan_all_inodes(ctx, check_inode_names, &aborted);
+	pthread_mutex_init(&ncs.lock, NULL);
+
+	ret = scrub_scan_all_inodes(ctx, check_inode_names, &ncs);
 	if (ret)
-		return ret;
-	if (aborted)
-		return ECANCELED;
+		goto out_lock;
+	if (ncs.aborted) {
+		ret = ECANCELED;
+		goto out_lock;
+	}
+
+	ret = retry_deferred_inodes(ctx, &ncs);
+	if (ret)
+		goto out_lock;
 
 	scrub_report_preen_triggers(ctx);
-	return 0;
+out_lock:
+	pthread_mutex_destroy(&ncs.lock);
+	if (ncs.new_deferred)
+		bitmap_free(&ncs.new_deferred);
+	if (ncs.cur_deferred)
+		bitmap_free(&ncs.cur_deferred);
+	return ret;
 }
 
 /* Estimate how much work we're going to do. */
diff --git a/scrub/repair.c b/scrub/repair.c
index 0258210722ba..4fed86134eda 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -732,6 +732,19 @@ repair_file_corruption(
 			XRM_REPAIR_ONLY | XRM_NOPROGRESS);
 }
 
+/* Repair all parts of this file or complain if we cannot. */
+int
+repair_file_corruption_now(
+	struct scrub_ctx	*ctx,
+	struct scrub_item	*sri,
+	int			override_fd)
+{
+	repair_item_boost_priorities(sri);
+
+	return repair_item_class(ctx, sri, override_fd, SCRUB_ITEM_CORRUPT,
+			XRM_REPAIR_ONLY | XRM_NOPROGRESS | XRM_FINAL_WARNING);
+}
+
 /*
  * Repair everything in this filesystem object that needs it.  This includes
  * cross-referencing and preening.
diff --git a/scrub/repair.h b/scrub/repair.h
index 411a379f6faa..ec4aa381a82d 100644
--- a/scrub/repair.h
+++ b/scrub/repair.h
@@ -76,6 +76,8 @@ int action_list_process(struct scrub_ctx *ctx, struct action_list *alist,
 int repair_item_corruption(struct scrub_ctx *ctx, struct scrub_item *sri);
 int repair_file_corruption(struct scrub_ctx *ctx, struct scrub_item *sri,
 		int override_fd);
+int repair_file_corruption_now(struct scrub_ctx *ctx, struct scrub_item *sri,
+		int override_fd);
 int repair_item(struct scrub_ctx *ctx, struct scrub_item *sri,
 		unsigned int repair_flags);
 int repair_item_to_action_item(struct scrub_ctx *ctx,



* [PATCH 01/10] man: document vectored scrub mode
  2024-07-02  0:53 ` [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls Darrick J. Wong
@ 2024-07-02  1:22   ` Darrick J. Wong
  2024-07-02  6:56     ` Christoph Hellwig
  2024-07-02  1:22   ` [PATCH 02/10] libfrog: support vectored scrub Darrick J. Wong
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:22 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Add a manpage to document XFS_IOC_SCRUBV_METADATA.  From the kernel
patch:

Introduce a variant on XFS_SCRUB_METADATA that allows for a vectored
mode.  The caller specifies the principal metadata object that they want
to scrub (allocation group, inode, etc.) once, followed by an array of
scrub types they want called on that object.  The kernel runs the scrub
operations and writes the output flags and errno code to the
corresponding array element.

A new pseudo scrub type BARRIER is introduced to force the kernel to
return to userspace if any corruptions have been found when scrubbing
the previous scrub types in the array.  This enables userspace to
schedule, for example, the sequence:

 1. data fork
 2. barrier
 3. directory

If the data fork scrub is clean, then the kernel will perform the
directory scrub.  If not, the barrier in 2 will exit back to userspace.
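
The barrier behaviour can be modeled in userspace. This sketch uses local stand-in types and constants (demo_vec, DEMO_TYPE_BARRIER and DEMO_OFLAG_CORRUPT, whose values are illustrative rather than the real xfs_fs.h ones). It follows the check that the libfrog fallback in this series performs:

- The barrier's own sv_flags acts as a mask.
- Processing stops at the barrier if any earlier non-barrier vector has a matching output flag.
- Processing also stops if an earlier vector returned an error other than the retry codes -EBUSY, -ENOENT and -EUSERS.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_TYPE_BARRIER	0xFFu		/* illustrative value only */
#define DEMO_OFLAG_CORRUPT	(1u << 1)	/* illustrative value only */

/* Hypothetical mirror of the fields of struct xfs_scrub_vec used here. */
struct demo_vec {
	uint32_t	sv_type;
	uint32_t	sv_flags;	/* barrier: fail mask; others: result */
	int32_t		sv_ret;
};

typedef void (*demo_scrub_fn)(struct demo_vec *v);

/* Return -ECANCELED if anything before @stop failed per @stop's mask. */
static int demo_barrier_tripped(const struct demo_vec *vecs,
				const struct demo_vec *stop)
{
	for (const struct demo_vec *v = vecs; v < stop; v++) {
		if (v->sv_type == DEMO_TYPE_BARRIER)
			continue;
		if (v->sv_ret != 0 && v->sv_ret != -EBUSY &&
		    v->sv_ret != -ENOENT && v->sv_ret != -EUSERS)
			return -ECANCELED;
		if (v->sv_flags & stop->sv_flags)
			return -ECANCELED;
	}
	return 0;
}

/* Run vectors in order; returns how many vectors were processed. */
static unsigned int demo_run(struct demo_vec *vecs, unsigned int nr,
			     demo_scrub_fn scrub)
{
	for (unsigned int i = 0; i < nr; i++) {
		struct demo_vec *v = &vecs[i];

		if (v->sv_type == DEMO_TYPE_BARRIER) {
			v->sv_ret = demo_barrier_tripped(vecs, v);
			if (v->sv_ret)
				return i + 1;	/* stop after the barrier */
			continue;
		}
		scrub(v);
	}
	return nr;
}

/* Toy scrubber: everything is clean. */
static void demo_scrub_clean(struct demo_vec *v)
{
	v->sv_ret = 0;
	v->sv_flags = 0;
}

/* Toy scrubber: pretend scrub type 1 (the "data fork") is corrupt. */
static void demo_scrub_data_fork_bad(struct demo_vec *v)
{
	v->sv_ret = 0;
	v->sv_flags = (v->sv_type == 1) ? DEMO_OFLAG_CORRUPT : 0;
}
```

Scheduling "data fork, barrier, directory" with a corrupt data fork stops after the barrier. That is the early exit the commit message describes.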

The alternative would have been an interface where userspace passes a
pointer to an empty buffer, and the kernel formats that with
xfs_scrub_vecs that tell userspace what it scrubbed and what the outcome
was.  With that design, the kernel would also have to tell userspace
how large the buffer needed to be, even though XFS_SCRUB_TYPE_NR + 2
entries would always be enough for our use cases.

Compared to that, this design keeps all the dependency policy and
ordering logic in userspace where it already resides instead of
duplicating it in the kernel. The downside of that is that it needs the
barrier logic.

When running fstests in "rebuild all metadata after each test" mode, I
observed a 10% reduction in runtime due to fewer transitions across the
system call boundary.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man2/ioctl_xfs_scrubv_metadata.2 |  171 ++++++++++++++++++++++++++++++++++
 1 file changed, 171 insertions(+)
 create mode 100644 man/man2/ioctl_xfs_scrubv_metadata.2


diff --git a/man/man2/ioctl_xfs_scrubv_metadata.2 b/man/man2/ioctl_xfs_scrubv_metadata.2
new file mode 100644
index 000000000000..532916756df5
--- /dev/null
+++ b/man/man2/ioctl_xfs_scrubv_metadata.2
@@ -0,0 +1,171 @@
+.\" Copyright (c) 2023-2024 Oracle.  All rights reserved.
+.\"
+.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
+.\" SPDX-License-Identifier: GPL-2.0-or-later
+.\" %%%LICENSE_END
+.TH IOCTL-XFS-SCRUBV-METADATA 2 2024-05-21 "XFS"
+.SH NAME
+ioctl_xfs_scrubv_metadata \- check a lot of XFS filesystem metadata
+.SH SYNOPSIS
+.br
+.B #include <xfs/xfs_fs.h>
+.PP
+.BI "int ioctl(int " dest_fd ", XFS_IOC_SCRUBV_METADATA, struct xfs_scrub_vec_head *" arg );
+.SH DESCRIPTION
+This XFS ioctl asks the kernel driver to examine several pieces of filesystem
+metadata for errors or suboptimal metadata.
+Multiple scrub types can be invoked to target a single filesystem object.
+See
+.BR ioctl_xfs_scrub_metadata (2)
+for a discussion of metadata validation, and documentation of the various
+.B XFS_SCRUB_TYPE
+and
+.B XFS_SCRUB_FLAGS
+values referenced below.
+
+The types and location of the metadata to scrub are conveyed as a vector with
+a header of the following form:
+.PP
+.in +4n
+.nf
+
+struct xfs_scrub_vec_head {
+	__u64 svh_ino;
+	__u32 svh_gen;
+	__u32 svh_agno;
+	__u32 svh_flags;
+	__u16 svh_rest_us;
+	__u16 svh_nr;
+	__u64 svh_reserved;
+	__u64 svh_vectors;
+};
+.fi
+.in
+.PP
+The fields
+.IR svh_ino ,
+.IR svh_gen ,
+and
+.IR svh_agno
+correspond to the
+.IR sm_ino ,
+.IR sm_gen ,
+and
+.IR sm_agno
+fields of the regular scrub ioctl.
+Exactly one filesystem object can be specified in a single call.
+The kernel will proceed with each vector in
+.I svh_vectors
+until progress is no longer possible.
+
+The field
+.I svh_rest_us
+specifies the number of microseconds to pause after each scrub invocation to give
+the system a chance to process other requests.
+
+The field
+.I svh_nr
+specifies the number of vectors in the
+.I svh_vectors
+array.
+
+The field
+.I svh_vectors
+is a pointer to an array of
+.B struct xfs_scrub_vec
+structures.
+
+.PP
+The field
+.I svh_reserved
+must be zero.
+
+Each vector has the following form:
+.PP
+.in +4n
+.nf
+
+struct xfs_scrub_vec {
+	__u32 sv_type;
+	__u32 sv_flags;
+	__s32 sv_ret;
+	__u32 sv_reserved;
+};
+.fi
+.in
+
+.PP
+The fields
+.I sv_type
+and
+.I sv_flags
+indicate the type of metadata to check and the behavioral changes that
+userspace will permit of the kernel.
+The
+.I sv_flags
+field will be updated upon completion of the scrub call.
+See the documentation of
+.B XFS_SCRUB_TYPE_*
+and
+.B XFS_SCRUB_[IO]FLAG_*
+values in
+.BR ioctl_xfs_scrub_metadata (2)
+for a detailed description of their purpose.
+
+.PP
+If a vector's
+.I sv_type
+field is set to the value
+.BR XFS_SCRUB_TYPE_BARRIER ,
+the kernel will stop processing vectors and return to userspace if a preceding
+scrubber flagged corruption by setting one of the
+.B XFS_SCRUB_OFLAG_*
+values in
+.I sv_flags
+or
+returned an operation error in
+.IR sv_ret .
+Otherwise, the kernel returns only after processing all vectors.
+
+The
+.I sv_ret
+field is set to the return value of the scrub function.
+See the RETURN VALUE
+section of the
+.BR ioctl_xfs_scrub_metadata (2)
+manual page for more information.
+
+The
+.I sv_reserved
+field must be zero.
+
+.SH RETURN VALUE
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.PP
+.SH ERRORS
+Error codes can be one of, but are not limited to, the following:
+.TP
+.B EINVAL
+One or more of the arguments specified is invalid.
+.TP
+.B EINTR
+The operation was interrupted.
+.TP
+.B ENOMEM
+There was not sufficient memory to perform the scrub or repair operation.
+.TP
+.B EFAULT
+A memory fault was encountered while reading or writing the vector.
+.SH CONFORMING TO
+This API is specific to the XFS filesystem on the Linux kernel.
+.SH NOTES
+These operations may block other filesystem operations for a long time.
+A calling process can stop the operation by being sent a fatal
+signal, but non-fatal signals are blocked.
+.SH SEE ALSO
+.BR ioctl (2)
+.BR ioctl_xfs_scrub_metadata (2)
+.BR xfs_scrub (8)
+.BR xfs_repair (8)

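The following is a sketch of how a caller might fill in the header and vector array described in this manual page. It uses local mirror structs with hypothetical demo_* names and the field layout copied from the page, so it builds without the XFS headers. The key detail is that svh_vectors carries the array's address as a 64-bit integer, which is also what xfrog_scrubv_init() in the next patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirrors of the structures in the manual page. */
struct demo_scrub_vec {
	uint32_t	sv_type;
	uint32_t	sv_flags;
	int32_t		sv_ret;
	uint32_t	sv_reserved;	/* must be zero */
};

struct demo_scrub_vec_head {
	uint64_t	svh_ino;
	uint32_t	svh_gen;
	uint32_t	svh_agno;
	uint32_t	svh_flags;
	uint16_t	svh_rest_us;
	uint16_t	svh_nr;
	uint64_t	svh_reserved;	/* must be zero */
	uint64_t	svh_vectors;	/* user pointer stored as an integer */
};

#define DEMO_MAX_VECS	8

struct demo_scrubv {
	struct demo_scrub_vec_head	head;
	struct demo_scrub_vec		vecs[DEMO_MAX_VECS];
};

/* Zero everything (reserved fields included) and point at the array. */
static void demo_scrubv_init(struct demo_scrubv *s, uint32_t agno)
{
	memset(s, 0, sizeof(*s));
	s->head.svh_agno = agno;
	s->head.svh_vectors = (uint64_t)(uintptr_t)s->vecs;
}

/* Append one scrub type; returns 0, or -1 if the array is full. */
static int demo_scrubv_add(struct demo_scrubv *s, uint32_t type,
			   uint32_t flags)
{
	struct demo_scrub_vec *v;

	if (s->head.svh_nr >= DEMO_MAX_VECS)
		return -1;
	v = &s->vecs[s->head.svh_nr++];
	v->sv_type = type;
	v->sv_flags = flags;
	return 0;
}
```

A real caller would instead include <xfs/xfs_fs.h>, use struct xfs_scrub_vec_head with the actual XFS_SCRUB_TYPE_* values, and pass the header to ioctl(fd, XFS_IOC_SCRUBV_METADATA, &head).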


* [PATCH 02/10] libfrog: support vectored scrub
  2024-07-02  0:53 ` [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls Darrick J. Wong
  2024-07-02  1:22   ` [PATCH 01/10] man: document vectored scrub mode Darrick J. Wong
@ 2024-07-02  1:22   ` Darrick J. Wong
  2024-07-02  6:56     ` Christoph Hellwig
  2024-07-02  1:22   ` [PATCH 03/10] xfs_io: " Darrick J. Wong
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:22 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Enhance libfrog to support performing vectored metadata scrub.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/fsgeom.h |    6 ++
 libfrog/scrub.c  |  137 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libfrog/scrub.h  |   35 ++++++++++++++
 3 files changed, 178 insertions(+)


diff --git a/libfrog/fsgeom.h b/libfrog/fsgeom.h
index f327dc7d70d0..df2ca2a408e7 100644
--- a/libfrog/fsgeom.h
+++ b/libfrog/fsgeom.h
@@ -50,6 +50,12 @@ struct xfs_fd {
 /* Only use v5 bulkstat/inumbers ioctls. */
 #define XFROG_FLAG_BULKSTAT_FORCE_V5	(1 << 1)
 
+/* Only use the older one-at-a-time scrub ioctl. */
+#define XFROG_FLAG_SCRUB_FORCE_SINGLE	(1 << 2)
+
+/* Only use the vectored scrub ioctl. */
+#define XFROG_FLAG_SCRUB_FORCE_VECTOR	(1 << 3)
+
 /* Static initializers */
 #define XFS_FD_INIT(_fd)	{ .fd = (_fd), }
 #define XFS_FD_INIT_EMPTY	XFS_FD_INIT(-1)
diff --git a/libfrog/scrub.c b/libfrog/scrub.c
index a2146e228f5b..e233c0f9c8e1 100644
--- a/libfrog/scrub.c
+++ b/libfrog/scrub.c
@@ -171,3 +171,140 @@ xfrog_scrub_metadata(
 
 	return 0;
 }
+
+/* Decide if there have been any scrub failures up to this point. */
+static inline int
+xfrog_scrubv_check_barrier(
+	const struct xfs_scrub_vec	*vectors,
+	const struct xfs_scrub_vec	*stop_vec)
+{
+	const struct xfs_scrub_vec	*v;
+	__u32				failmask;
+
+	failmask = stop_vec->sv_flags & XFS_SCRUB_FLAGS_OUT;
+
+	for (v = vectors; v < stop_vec; v++) {
+		if (v->sv_type == XFS_SCRUB_TYPE_BARRIER)
+			continue;
+
+		/*
+		 * Runtime errors count as a previous failure, except the ones
+		 * used to ask userspace to retry.
+		 */
+		switch (v->sv_ret) {
+		case -EBUSY:
+		case -ENOENT:
+		case -EUSERS:
+		case 0:
+			break;
+		default:
+			return -ECANCELED;
+		}
+
+		/*
+		 * If any of the out-flags on the scrub vector match the mask
+		 * that was set on the barrier vector, that's a previous fail.
+		 */
+		if (v->sv_flags & failmask)
+			return -ECANCELED;
+	}
+
+	return 0;
+}
+
+static int
+xfrog_scrubv_fallback(
+	struct xfs_fd			*xfd,
+	struct xfrog_scrubv		*scrubv)
+{
+	struct xfs_scrub_vec		*vectors = scrubv->vectors;
+	struct xfs_scrub_vec		*v;
+	unsigned int			i;
+
+	if (scrubv->head.svh_flags & ~XFS_SCRUB_VEC_FLAGS_ALL)
+		return -EINVAL;
+
+	foreach_xfrog_scrubv_vec(scrubv, i, v) {
+		if (v->sv_reserved)
+			return -EINVAL;
+
+		if (v->sv_type == XFS_SCRUB_TYPE_BARRIER &&
+		    (v->sv_flags & ~XFS_SCRUB_FLAGS_OUT))
+			return -EINVAL;
+	}
+
+	/* Run all the scrubbers. */
+	foreach_xfrog_scrubv_vec(scrubv, i, v) {
+		struct xfs_scrub_metadata	sm = {
+			.sm_type		= v->sv_type,
+			.sm_flags		= v->sv_flags,
+			.sm_ino			= scrubv->head.svh_ino,
+			.sm_gen			= scrubv->head.svh_gen,
+			.sm_agno		= scrubv->head.svh_agno,
+		};
+		struct timespec	tv;
+
+		if (v->sv_type == XFS_SCRUB_TYPE_BARRIER) {
+			v->sv_ret = xfrog_scrubv_check_barrier(vectors, v);
+			if (v->sv_ret)
+				break;
+			continue;
+		}
+
+		v->sv_ret = xfrog_scrub_metadata(xfd, &sm);
+		v->sv_flags = sm.sm_flags;
+
+		if (scrubv->head.svh_rest_us) {
+			tv.tv_sec = 0;
+			tv.tv_nsec = scrubv->head.svh_rest_us * 1000;
+			nanosleep(&tv, NULL);
+		}
+	}
+
+	return 0;
+}
+
+/* Invoke the vectored scrub ioctl. */
+static int
+xfrog_scrubv_call(
+	struct xfs_fd			*xfd,
+	struct xfs_scrub_vec_head	*vhead)
+{
+	int				ret;
+
+	ret = ioctl(xfd->fd, XFS_IOC_SCRUBV_METADATA, vhead);
+	if (ret)
+		return -errno;
+
+	return 0;
+}
+
+/* Invoke the vectored scrub ioctl.  Returns zero or negative error code. */
+int
+xfrog_scrubv_metadata(
+	struct xfs_fd			*xfd,
+	struct xfrog_scrubv		*scrubv)
+{
+	int				error = 0;
+
+	if (scrubv->head.svh_nr > XFROG_SCRUBV_MAX_VECTORS)
+		return -EINVAL;
+
+	if (xfd->flags & XFROG_FLAG_SCRUB_FORCE_SINGLE)
+		goto try_single;
+
+	error = xfrog_scrubv_call(xfd, &scrubv->head);
+	if (error == 0 || (xfd->flags & XFROG_FLAG_SCRUB_FORCE_VECTOR))
+		return error;
+
+	/* If the vectored scrub ioctl wasn't found, force single mode. */
+	switch (error) {
+	case -EOPNOTSUPP:
+	case -ENOTTY:
+		xfd->flags |= XFROG_FLAG_SCRUB_FORCE_SINGLE;
+		break;
+	}
+
+try_single:
+	return xfrog_scrubv_fallback(xfd, scrubv);
+}
diff --git a/libfrog/scrub.h b/libfrog/scrub.h
index 27230c62f71a..b564c0d7bd0f 100644
--- a/libfrog/scrub.h
+++ b/libfrog/scrub.h
@@ -28,4 +28,39 @@ extern const struct xfrog_scrub_descr xfrog_scrubbers[XFS_SCRUB_TYPE_NR];
 
 int xfrog_scrub_metadata(struct xfs_fd *xfd, struct xfs_scrub_metadata *meta);
 
+/*
+ * Allow enough space to call all scrub types with a barrier between each.
+ * This is overkill for every caller in xfsprogs.
+ */
+#define XFROG_SCRUBV_MAX_VECTORS	(XFS_SCRUB_TYPE_NR * 2)
+
+struct xfrog_scrubv {
+	struct xfs_scrub_vec_head	head;
+	struct xfs_scrub_vec		vectors[XFROG_SCRUBV_MAX_VECTORS];
+};
+
+/* Initialize a scrubv structure; callers must have zeroed @scrubv. */
+static inline void
+xfrog_scrubv_init(struct xfrog_scrubv *scrubv)
+{
+	scrubv->head.svh_vectors = (uintptr_t)scrubv->vectors;
+}
+
+/* Return the next free vector from the scrubv structure. */
+static inline struct xfs_scrub_vec *
+xfrog_scrubv_next_vector(struct xfrog_scrubv *scrubv)
+{
+	if (scrubv->head.svh_nr >= XFROG_SCRUBV_MAX_VECTORS)
+		return NULL;
+
+	return &scrubv->vectors[scrubv->head.svh_nr++];
+}
+
+#define foreach_xfrog_scrubv_vec(scrubv, i, vec) \
+	for ((i) = 0, (vec) = (scrubv)->vectors; \
+	     (i) < (scrubv)->head.svh_nr; \
+	     (i)++, (vec)++)
+
+int xfrog_scrubv_metadata(struct xfs_fd *xfd, struct xfrog_scrubv *scrubv);
+
 #endif	/* __LIBFROG_SCRUB_H__ */

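The selection logic in xfrog_scrubv_metadata() above is a sticky capability probe. The first -ENOTTY or -EOPNOTSUPP from the vectored ioctl sets XFROG_FLAG_SCRUB_FORCE_SINGLE, so later calls go straight to the one-at-a-time fallback. XFROG_FLAG_SCRUB_FORCE_VECTOR instead pins the vectored path and returns its error. Any other error from the vectored call still falls back for that call, but it does not set the sticky flag.

Below is a simulated version with hypothetical demo_* names and no real ioctl:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_FORCE_SINGLE	(1u << 0)	/* hypothetical flag names */
#define DEMO_FORCE_VECTOR	(1u << 1)

struct demo_fd {
	unsigned int	flags;
	bool		kernel_has_scrubv;	/* simulated capability */
	int		vector_calls;
	int		single_calls;
};

/* Simulated vectored ioctl: 0 on success, -ENOTTY if unsupported. */
static int demo_call_vector(struct demo_fd *fd)
{
	fd->vector_calls++;
	return fd->kernel_has_scrubv ? 0 : -ENOTTY;
}

/* Simulated fallback that issues one scrub call per vector. */
static int demo_call_single_loop(struct demo_fd *fd)
{
	fd->single_calls++;
	return 0;
}

static int demo_scrubv(struct demo_fd *fd)
{
	int error;

	if (fd->flags & DEMO_FORCE_SINGLE)
		return demo_call_single_loop(fd);

	error = demo_call_vector(fd);
	if (error == 0 || (fd->flags & DEMO_FORCE_VECTOR))
		return error;

	/* Remember that the vectored ioctl is missing; don't probe again. */
	if (error == -EOPNOTSUPP || error == -ENOTTY)
		fd->flags |= DEMO_FORCE_SINGLE;

	return demo_call_single_loop(fd);
}
```

On an older kernel, the first call probes once and then falls back. Every later call skips the probe entirely.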


* [PATCH 03/10] xfs_io: support vectored scrub
  2024-07-02  0:53 ` [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls Darrick J. Wong
  2024-07-02  1:22   ` [PATCH 01/10] man: document vectored scrub mode Darrick J. Wong
  2024-07-02  1:22   ` [PATCH 02/10] libfrog: support vectored scrub Darrick J. Wong
@ 2024-07-02  1:22   ` Darrick J. Wong
  2024-07-02  6:57     ` Christoph Hellwig
  2024-07-02  1:22   ` [PATCH 04/10] xfs_scrub: split the scrub epilogue code into a separate function Darrick J. Wong
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:22 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Create a new scrubv command to xfs_io to support the vectored scrub
ioctl.
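
The patch factors the flag-to-message printing out of scrub_f() and repair_f() into report_scrub_outcome() and report_repair_outcome(), so the new scrubv command can reuse it. As an alternative sketch, and not what the patch does, the same decoding can be table-driven. The DEMO_OFLAG_* values are illustrative stand-ins for the XFS_SCRUB_OFLAG_* bits and should be checked against xfs_fs.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-ins for XFS_SCRUB_OFLAG_*; verify against xfs_fs.h. */
#define DEMO_OFLAG_CORRUPT	(1u << 1)
#define DEMO_OFLAG_PREEN	(1u << 2)
#define DEMO_OFLAG_XFAIL	(1u << 3)
#define DEMO_OFLAG_XCORRUPT	(1u << 4)
#define DEMO_OFLAG_INCOMPLETE	(1u << 5)

/* One message per output flag, in the order the scrub command prints them. */
static const struct {
	uint32_t	flag;
	const char	*msg;
} demo_scrub_msgs[] = {
	{ DEMO_OFLAG_CORRUPT,	 "Corruption detected." },
	{ DEMO_OFLAG_PREEN,	 "Optimization possible." },
	{ DEMO_OFLAG_XFAIL,	 "Cross-referencing failed." },
	{ DEMO_OFLAG_XCORRUPT,	 "Corruption detected during cross-referencing." },
	{ DEMO_OFLAG_INCOMPLETE, "Scan was not complete." },
};

#define DEMO_NR_MSGS	(sizeof(demo_scrub_msgs) / sizeof(demo_scrub_msgs[0]))

/* Collect the messages for @flags into @out; returns how many were stored. */
static size_t demo_scrub_outcome(uint32_t flags, const char **out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < DEMO_NR_MSGS && n < max; i++)
		if (flags & demo_scrub_msgs[i].flag)
			out[n++] = demo_scrub_msgs[i].msg;
	return n;
}
```

A repair variant would use a second table with the "remains"/"still detected" wording plus the NO_REPAIR_NEEDED message.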

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 io/scrub.c        |  368 +++++++++++++++++++++++++++++++++++++++++++++++------
 man/man8/xfs_io.8 |   51 +++++++
 2 files changed, 379 insertions(+), 40 deletions(-)


diff --git a/io/scrub.c b/io/scrub.c
index a77cd872fede..e03c1d0eaf1d 100644
--- a/io/scrub.c
+++ b/io/scrub.c
@@ -12,10 +12,13 @@
 #include "libfrog/paths.h"
 #include "libfrog/fsgeom.h"
 #include "libfrog/scrub.h"
+#include "libfrog/logging.h"
 #include "io.h"
+#include "list.h"
 
 static struct cmdinfo scrub_cmd;
 static struct cmdinfo repair_cmd;
+static const struct cmdinfo scrubv_cmd;
 
 static void
 scrub_help(void)
@@ -197,31 +200,38 @@ parse_args(
 	return 0;
 }
 
-static int
-scrub_f(
-	int				argc,
-	char				**argv)
+static void
+report_scrub_outcome(
+	uint32_t	flags)
 {
-	struct xfs_scrub_metadata	meta;
-	int				error;
-
-	error = parse_args(argc, argv, &scrub_cmd, &meta);
-	if (error)
-		return error;
-
-	error = ioctl(file->fd, XFS_IOC_SCRUB_METADATA, &meta);
-	if (error)
-		perror("scrub");
-	if (meta.sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+	if (flags & XFS_SCRUB_OFLAG_CORRUPT)
 		printf(_("Corruption detected.\n"));
-	if (meta.sm_flags & XFS_SCRUB_OFLAG_PREEN)
+	if (flags & XFS_SCRUB_OFLAG_PREEN)
 		printf(_("Optimization possible.\n"));
-	if (meta.sm_flags & XFS_SCRUB_OFLAG_XFAIL)
+	if (flags & XFS_SCRUB_OFLAG_XFAIL)
 		printf(_("Cross-referencing failed.\n"));
-	if (meta.sm_flags & XFS_SCRUB_OFLAG_XCORRUPT)
+	if (flags & XFS_SCRUB_OFLAG_XCORRUPT)
 		printf(_("Corruption detected during cross-referencing.\n"));
-	if (meta.sm_flags & XFS_SCRUB_OFLAG_INCOMPLETE)
+	if (flags & XFS_SCRUB_OFLAG_INCOMPLETE)
 		printf(_("Scan was not complete.\n"));
+}
+
+static int
+scrub_f(
+	int				argc,
+	char				**argv)
+{
+	struct xfs_scrub_metadata	meta;
+	int				error;
+
+	error = parse_args(argc, argv, &scrub_cmd, &meta);
+	if (error)
+		return error;
+
+	error = ioctl(file->fd, XFS_IOC_SCRUB_METADATA, &meta);
+	if (error)
+		perror("scrub");
+	report_scrub_outcome(meta.sm_flags);
 	return 0;
 }
 
@@ -239,6 +249,7 @@ scrub_init(void)
 	scrub_cmd.help = scrub_help;
 
 	add_command(&scrub_cmd);
+	add_command(&scrubv_cmd);
 }
 
 static void
@@ -267,34 +278,41 @@ repair_help(void)
 	printf("\n");
 }
 
-static int
-repair_f(
-	int				argc,
-	char				**argv)
+static void
+report_repair_outcome(
+	uint32_t	flags)
 {
-	struct xfs_scrub_metadata	meta;
-	int				error;
-
-	error = parse_args(argc, argv, &repair_cmd, &meta);
-	if (error)
-		return error;
-	meta.sm_flags |= XFS_SCRUB_IFLAG_REPAIR;
-
-	error = ioctl(file->fd, XFS_IOC_SCRUB_METADATA, &meta);
-	if (error)
-		perror("repair");
-	if (meta.sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+	if (flags & XFS_SCRUB_OFLAG_CORRUPT)
 		printf(_("Corruption remains.\n"));
-	if (meta.sm_flags & XFS_SCRUB_OFLAG_PREEN)
+	if (flags & XFS_SCRUB_OFLAG_PREEN)
 		printf(_("Optimization possible.\n"));
-	if (meta.sm_flags & XFS_SCRUB_OFLAG_XFAIL)
+	if (flags & XFS_SCRUB_OFLAG_XFAIL)
 		printf(_("Cross-referencing failed.\n"));
-	if (meta.sm_flags & XFS_SCRUB_OFLAG_XCORRUPT)
+	if (flags & XFS_SCRUB_OFLAG_XCORRUPT)
 		printf(_("Corruption still detected during cross-referencing.\n"));
-	if (meta.sm_flags & XFS_SCRUB_OFLAG_INCOMPLETE)
+	if (flags & XFS_SCRUB_OFLAG_INCOMPLETE)
 		printf(_("Repair was not complete.\n"));
-	if (meta.sm_flags & XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED)
+	if (flags & XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED)
 		printf(_("Metadata did not need repair or optimization.\n"));
+}
+
+static int
+repair_f(
+	int				argc,
+	char				**argv)
+{
+	struct xfs_scrub_metadata	meta;
+	int				error;
+
+	error = parse_args(argc, argv, &repair_cmd, &meta);
+	if (error)
+		return error;
+	meta.sm_flags |= XFS_SCRUB_IFLAG_REPAIR;
+
+	error = ioctl(file->fd, XFS_IOC_SCRUB_METADATA, &meta);
+	if (error)
+		perror("repair");
+	report_repair_outcome(meta.sm_flags);
 	return 0;
 }
 
@@ -315,3 +333,273 @@ repair_init(void)
 
 	add_command(&repair_cmd);
 }
+
+static void
+scrubv_help(void)
+{
+	printf(_(
+"\n"
+" Scrubs pieces of XFS filesystem metadata.  The first argument is the group\n"
+" of metadata to examine.  If the group is 'ag', the second parameter should\n"
+" be the AG number.  If the group is 'inode', the second and third parameters\n"
+" should be the inode number and generation number to act upon; if these are\n"
+" omitted, the scrub is performed on the open file.  If the group is 'fs',\n"
+" 'summary', or 'probe', there are no other parameters.\n"
+"\n"
+" Flags are -d for debug, and -r to allow repairs.\n"
+" -b NN will insert a scrub barrier after every NN scrubs, and -m sets the\n"
+" desired corruption mask in all barriers. -w pauses for some microseconds\n"
+" after each scrub call.\n"
+"\n"
+" Example:\n"
+" 'scrubv ag 3' - scrub all metadata in AG 3.\n"
+" 'scrubv ag 3 -b 2 -m 0x4' - scrub all metadata in AG 3, and use barriers\n"
+"            every third scrub to exit early if there are optimizations.\n"
+" 'scrubv fs' - scrub all non-AG non-file metadata.\n"
+" 'scrubv inode' - scrub all metadata for the open file.\n"
+" 'scrubv inode 128 13525' - scrub all metadata for inode 128 gen 13525.\n"
+" 'scrubv probe' - check for presence of online scrub.\n"
+" 'scrubv summary' - scrub all summary metadata.\n"));
+}
+
+/* Fill out the scrub vectors for a group of scrubbers (ag, ino, fs, summary) */
+static void
+scrubv_fill_group(
+	struct xfrog_scrubv		*scrubv,
+	int				barrier_interval,
+	__u32				barrier_mask,
+	enum xfrog_scrub_group		group)
+{
+	const struct xfrog_scrub_descr	*d;
+	struct xfs_scrub_vec		*v;
+	unsigned int			i;
+
+	for (i = 0, d = xfrog_scrubbers; i < XFS_SCRUB_TYPE_NR; i++, d++) {
+		if (d->group != group)
+			continue;
+
+		v = xfrog_scrubv_next_vector(scrubv);
+		v->sv_type = i;
+
+		if (barrier_interval &&
+		    scrubv->head.svh_nr % (barrier_interval + 1) == 0) {
+			v = xfrog_scrubv_next_vector(scrubv);
+			v->sv_flags = barrier_mask;
+			v->sv_type = XFS_SCRUB_TYPE_BARRIER;
+		}
+	}
+}
+
+static int
+scrubv_f(
+	int				argc,
+	char				**argv)
+{
+	struct xfrog_scrubv		scrubv = { };
+	struct xfs_fd			xfd = XFS_FD_INIT(file->fd);
+	struct xfs_scrub_vec		*v;
+	uint32_t			flags = 0;
+	__u32				barrier_mask = XFS_SCRUB_OFLAG_CORRUPT;
+	enum xfrog_scrub_group		group;
+	bool				debug = false;
+	int				version = -1;
+	int				barrier_interval = 0;
+	int				rest_us = 0;
+	int				c;
+	int				error;
+
+	xfrog_scrubv_init(&scrubv);
+
+	while ((c = getopt(argc, argv, "b:dm:rv:w:")) != EOF) {
+		switch (c) {
+		case 'b':
+			barrier_interval = atoi(optarg);
+			if (barrier_interval < 0) {
+				fprintf(stderr,
+ _("Negative barrier interval makes no sense.\n"));
+				exitcode = 1;
+				return command_usage(&scrubv_cmd);
+			}
+			break;
+		case 'd':
+			debug = true;
+			break;
+		case 'm':
+			barrier_mask = strtoul(optarg, NULL, 0);
+			break;
+		case 'r':
+			flags |= XFS_SCRUB_IFLAG_REPAIR;
+			break;
+		case 'v':
+			if (!strcmp("single", optarg)) {
+				version = 0;
+			} else if (!strcmp("vector", optarg)) {
+				version = 1;
+			} else {
+				fprintf(stderr,
+ _("API version must be 'single' or 'vector'.\n"));
+				exitcode = 1;
+				return command_usage(&scrubv_cmd);
+			}
+			break;
+		case 'w':
+			rest_us = atoi(optarg);
+			if (rest_us < 0) {
+				fprintf(stderr,
+ _("Rest time must not be negative.\n"));
+				exitcode = 1;
+				return command_usage(&scrubv_cmd);
+			}
+			break;
+		default:
+			exitcode = 1;
+			return command_usage(&scrubv_cmd);
+		}
+	}
+	if (optind > argc - 1) {
+		fprintf(stderr,
+ _("Must have at least one positional argument.\n"));
+		exitcode = 1;
+		return command_usage(&scrubv_cmd);
+	}
+
+	if ((flags & XFS_SCRUB_IFLAG_REPAIR) && !expert) {
+		printf(_("Repair flag requires expert mode.\n"));
+		return 1;
+	}
+
+	scrubv.head.svh_rest_us = rest_us;
+	foreach_xfrog_scrubv_vec(&scrubv, c, v)
+		v->sv_flags = flags;
+
+	/* Extract group and domain information from cmdline. */
+	if (!strcmp(argv[optind], "probe"))
+		group = XFROG_SCRUB_GROUP_NONE;
+	else if (!strcmp(argv[optind], "agheader"))
+		group = XFROG_SCRUB_GROUP_AGHEADER;
+	else if (!strcmp(argv[optind], "ag"))
+		group = XFROG_SCRUB_GROUP_PERAG;
+	else if (!strcmp(argv[optind], "fs"))
+		group = XFROG_SCRUB_GROUP_FS;
+	else if (!strcmp(argv[optind], "inode"))
+		group = XFROG_SCRUB_GROUP_INODE;
+	else if (!strcmp(argv[optind], "iscan"))
+		group = XFROG_SCRUB_GROUP_ISCAN;
+	else if (!strcmp(argv[optind], "summary"))
+		group = XFROG_SCRUB_GROUP_SUMMARY;
+	else {
+		printf(_("Unknown group '%s'.\n"), argv[optind]);
+		exitcode = 1;
+		return command_usage(&scrubv_cmd);
+	}
+	optind++;
+
+	switch (group) {
+	case XFROG_SCRUB_GROUP_INODE:
+		if (!parse_inode(argc, argv, optind, &scrubv.head.svh_ino,
+						     &scrubv.head.svh_gen)) {
+			exitcode = 1;
+			return command_usage(&scrubv_cmd);
+		}
+		break;
+	case XFROG_SCRUB_GROUP_AGHEADER:
+	case XFROG_SCRUB_GROUP_PERAG:
+		if (!parse_agno(argc, argv, optind, &scrubv.head.svh_agno)) {
+			exitcode = 1;
+			return command_usage(&scrubv_cmd);
+		}
+		break;
+	case XFROG_SCRUB_GROUP_FS:
+	case XFROG_SCRUB_GROUP_SUMMARY:
+	case XFROG_SCRUB_GROUP_ISCAN:
+	case XFROG_SCRUB_GROUP_NONE:
+		if (!parse_none(argc, optind)) {
+			exitcode = 1;
+			return command_usage(&scrubv_cmd);
+		}
+		break;
+	default:
+		ASSERT(0);
+		break;
+	}
+	scrubv_fill_group(&scrubv, barrier_interval, barrier_mask, group);
+	assert(scrubv.head.svh_nr <= XFROG_SCRUBV_MAX_VECTORS);
+
+	error = -xfd_prepare_geometry(&xfd);
+	if (error) {
+		xfrog_perror(error, "xfd_prepare_geometry");
+		exitcode = 1;
+		return 0;
+	}
+
+	switch (version) {
+	case 0:
+		xfd.flags |= XFROG_FLAG_SCRUB_FORCE_SINGLE;
+		break;
+	case 1:
+		xfd.flags |= XFROG_FLAG_SCRUB_FORCE_VECTOR;
+		break;
+	default:
+		break;
+	}
+
+	error = -xfrog_scrubv_metadata(&xfd, &scrubv);
+	if (error) {
+		xfrog_perror(error, "xfrog_scrubv_metadata");
+		exitcode = 1;
+		return 0;
+	}
+
+	/* Dump what happened. */
+	if (debug) {
+		foreach_xfrog_scrubv_vec(&scrubv, c, v) {
+			const char	*type;
+
+			if (v->sv_type == XFS_SCRUB_TYPE_BARRIER)
+				type = _("barrier");
+			else
+				type = _(xfrog_scrubbers[v->sv_type].descr);
+			printf(_("[%02u] %-25s: flags 0x%x ret %d\n"), c, type,
+					v->sv_flags, v->sv_ret);
+		}
+	}
+
+	/* Figure out what happened. */
+	foreach_xfrog_scrubv_vec(&scrubv, c, v) {
+		/* Report barrier failures. */
+		if (v->sv_type == XFS_SCRUB_TYPE_BARRIER) {
+			if (v->sv_ret) {
+				printf(_("barrier: FAILED\n"));
+				break;
+			}
+			continue;
+		}
+
+		printf("%s: ", _(xfrog_scrubbers[v->sv_type].descr));
+		switch (v->sv_ret) {
+		case 0:
+			break;
+		default:
+			printf("%s\n", strerror(-v->sv_ret));
+			continue;
+		}
+		if (!(v->sv_flags & XFS_SCRUB_FLAGS_OUT))
+			printf(_("OK.\n"));
+		else if (v->sv_flags & XFS_SCRUB_IFLAG_REPAIR)
+			report_repair_outcome(v->sv_flags);
+		else
+			report_scrub_outcome(v->sv_flags);
+	}
+
+	return 0;
+}
+
+static const struct cmdinfo scrubv_cmd = {
+	.name		= "scrubv",
+	.cfunc		= scrubv_f,
+	.argmin		= 1,
+	.argmax		= -1,
+	.flags		= CMD_NOMAP_OK,
+	.oneline	= N_("vectored metadata scrub"),
+	.help		= scrubv_help,
+};
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 02036e3d093b..657bdaec4381 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -1392,6 +1392,57 @@ inode number and generation number are specified.
 .RE
 .PD
 .TP
+.BI "scrubv [ \-b NN ] [ \-d ] [ \-m oflags ] [ \-r ] [ \-v version ] [ \-w us ] " group " [ " agnumber " | " "ino" " " "gen" " ]"
+Scrub a group of internal XFS filesystem metadata objects.
+The
+.BI group
+parameter specifies which group of metadata to scrub.
+Valid groups are
+.IR ag ", " agheader ", " inode ", " iscan ", " fs ", " probe ", " rtgroup ", or " summary .
+
+For
+.BR ag " and " agheader
+metadata, one AG number must be specified.
+For
+.B inode
+metadata, the scrub is applied to the open file unless the
+inode number and generation number are specified.
+For
+.B rtgroup
+metadata, one rt group number must be specified.
+
+.RS 1.0i
+.PD 0
+.TP
+.BI "\-b " NN
+Inject scrub barriers into the vector stream at the given interval.
+A barrier aborts vector processing if any previous scrub function reported
+one of the output flags selected by
+.BR \-m ;
+by default, this means that corruption was found.
+.TP
+.BI \-d
+Enables debug mode.
+.TP
+.BI "\-m " oflags
+Fail scrub barriers if any of the scrub output flags in
+.I oflags
+are seen.
+The default is XFS_SCRUB_OFLAG_CORRUPT.
+.TP
+.BI \-r
+Repair metadata if corruption is found.
+This option requires expert mode.
+.TP
+.BI "\-v " version
+Force a particular API version.
+.B single
+selects XFS_SCRUB_METADATA (one-by-one).
+.B vector
+selects XFS_SCRUBV_METADATA (vectored).
+If this option is not specified, vector mode is used, with a fallback to
+single mode if the kernel does not recognize the vectored ioctl.
+.TP
+.BI "\-w " us
+Wait the given number of microseconds between each scrub function.
+.RE
+.PD
+.TP
 .BI "repair " type " [ " agnumber " | " "ino" " " "gen" " ]"
 Repair internal XFS filesystem metadata.  The
 .BI type

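The barrier placement rule in scrubv_fill_group() and the barrier behaviour described in the man page can be modelled in userspace. What follows is a minimal sketch under stated assumptions, not the libfrog or kernel code: struct sketch_vec, SKETCH_TYPE_BARRIER and SKETCH_MAX_VECS are hypothetical stand-ins, the scrub vectors' flags are assumed to already hold each scrub's (simulated) output, and a failed barrier is marked with a placeholder -1 rather than whatever errno the kernel actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for XFS_SCRUB_TYPE_BARRIER and the vector array. */
#define SKETCH_TYPE_BARRIER	0xFFu
#define SKETCH_MAX_VECS		64

struct sketch_vec {
	uint32_t	type;	/* scrub type, or SKETCH_TYPE_BARRIER */
	uint32_t	flags;	/* scrub output flags, or a barrier's mask */
	int		ret;	/* 0 on success, negative on failure */
};

/*
 * Same placement rule as scrubv_fill_group(): after appending each scrub,
 * append a barrier whenever the vector count is a multiple of interval + 1.
 * Returns the number of vectors written.
 */
static size_t
sketch_fill(struct sketch_vec *v, const uint32_t *types, size_t ntypes,
	    int interval, uint32_t mask)
{
	size_t	nr = 0;

	/* Stop early so that a scrub plus its barrier always fit. */
	for (size_t i = 0; i < ntypes && nr < SKETCH_MAX_VECS - 1; i++) {
		v[nr++] = (struct sketch_vec){ .type = types[i] };
		if (interval && nr % (size_t)(interval + 1) == 0)
			v[nr++] = (struct sketch_vec){
				.type = SKETCH_TYPE_BARRIER, .flags = mask };
	}
	return nr;
}

/*
 * Model of barrier processing: a barrier fails, and processing stops, if any
 * earlier scrub reported an output flag in the barrier's mask.  Returns the
 * number of vectors processed, counting a failed barrier.
 */
static size_t
sketch_process(struct sketch_vec *v, size_t nr)
{
	uint32_t	seen = 0;

	for (size_t i = 0; i < nr; i++) {
		if (v[i].type != SKETCH_TYPE_BARRIER) {
			seen |= v[i].flags;
			continue;
		}
		if (seen & v[i].flags) {
			v[i].ret = -1;	/* placeholder, not the kernel's errno */
			return i + 1;
		}
	}
	return nr;
}
```

For a six-scrub group with `-b 2`, this rule puts barriers in slots 3 and 6, so the first barrier follows three scrubs and later ones follow two, because the modulus counts the barrier slots as well.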

* [PATCH 04/10] xfs_scrub: split the scrub epilogue code into a separate function
  2024-07-02  0:53 ` [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-07-02  1:22   ` [PATCH 03/10] xfs_io: " Darrick J. Wong
@ 2024-07-02  1:22   ` Darrick J. Wong
  2024-07-02  6:57     ` Christoph Hellwig
  2024-07-02  1:23   ` [PATCH 05/10] xfs_scrub: split the repair " Darrick J. Wong
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:22 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Move all the code that updates the internal state in response to a scrub
ioctl() call completion into a separate function.  This will help with
vectorizing scrub calls later on.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/scrub.c |   52 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 38 insertions(+), 14 deletions(-)


diff --git a/scrub/scrub.c b/scrub/scrub.c
index 1b0609e7418b..c4b4367e458f 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -22,6 +22,10 @@
 #include "descr.h"
 #include "scrub_private.h"
 
+static int scrub_epilogue(struct scrub_ctx *ctx, struct descr *dsc,
+		struct scrub_item *sri, struct xfs_scrub_metadata *meta,
+		int error);
+
 /* Online scrub and repair wrappers. */
 
 /* Format a scrub description. */
@@ -117,12 +121,32 @@ xfs_check_metadata(
 	dbg_printf("check %s flags %xh\n", descr_render(&dsc), meta.sm_flags);
 
 	error = -xfrog_scrub_metadata(xfdp, &meta);
+	return scrub_epilogue(ctx, &dsc, sri, &meta, error);
+}
+
+/*
+ * Update all internal state after a scrub ioctl call.
+ * Returns 0 for success, or ECANCELED to abort the program.
+ */
+static int
+scrub_epilogue(
+	struct scrub_ctx		*ctx,
+	struct descr			*dsc,
+	struct scrub_item		*sri,
+	struct xfs_scrub_metadata	*meta,
+	int				error)
+{
+	unsigned int			scrub_type = meta->sm_type;
+	enum xfrog_scrub_group		group;
+
+	group = xfrog_scrubbers[scrub_type].group;
+
 	switch (error) {
 	case 0:
 		/* No operational errors encountered. */
 		if (!sri->sri_revalidate &&
 		    debug_tweak_on("XFS_SCRUB_FORCE_REPAIR"))
-			meta.sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+			meta->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
 		break;
 	case ENOENT:
 		/* Metadata not present, just skip it. */
@@ -130,13 +154,13 @@ xfs_check_metadata(
 		return 0;
 	case ESHUTDOWN:
 		/* FS already crashed, give up. */
-		str_error(ctx, descr_render(&dsc),
+		str_error(ctx, descr_render(dsc),
 _("Filesystem is shut down, aborting."));
 		return ECANCELED;
 	case EIO:
 	case ENOMEM:
 		/* Abort on I/O errors or insufficient memory. */
-		str_liberror(ctx, error, descr_render(&dsc));
+		str_liberror(ctx, error, descr_render(dsc));
 		return ECANCELED;
 	case EDEADLOCK:
 	case EBUSY:
@@ -152,7 +176,7 @@ _("Filesystem is shut down, aborting."));
 		return 0;
 	default:
 		/* Operational error.  Log it and move on. */
-		str_liberror(ctx, error, descr_render(&dsc));
+		str_liberror(ctx, error, descr_render(dsc));
 		scrub_item_clean_state(sri, scrub_type);
 		return 0;
 	}
@@ -163,27 +187,27 @@ _("Filesystem is shut down, aborting."));
 	 * we'll try the scan again, just in case the fs was busy.
 	 * Only retry so many times.
 	 */
-	if (want_retry(&meta) && scrub_item_schedule_retry(sri, scrub_type))
+	if (want_retry(meta) && scrub_item_schedule_retry(sri, scrub_type))
 		return 0;
 
 	/* Complain about incomplete or suspicious metadata. */
-	scrub_warn_incomplete_scrub(ctx, &dsc, &meta);
+	scrub_warn_incomplete_scrub(ctx, dsc, meta);
 
 	/*
 	 * If we need repairs or there were discrepancies, schedule a
 	 * repair if desired, otherwise complain.
 	 */
-	if (is_corrupt(&meta) || xref_disagrees(&meta)) {
+	if (is_corrupt(meta) || xref_disagrees(meta)) {
 		if (ctx->mode != SCRUB_MODE_REPAIR) {
 			/* Dry-run mode, so log an error and forget it. */
-			str_corrupt(ctx, descr_render(&dsc),
+			str_corrupt(ctx, descr_render(dsc),
 _("Repairs are required."));
 			scrub_item_clean_state(sri, scrub_type);
 			return 0;
 		}
 
 		/* Schedule repairs. */
-		scrub_item_save_state(sri, scrub_type, meta.sm_flags);
+		scrub_item_save_state(sri, scrub_type, meta->sm_flags);
 		return 0;
 	}
 
@@ -191,12 +215,12 @@ _("Repairs are required."));
 	 * If we could optimize, schedule a repair if desired,
 	 * otherwise complain.
 	 */
-	if (is_unoptimized(&meta)) {
+	if (is_unoptimized(meta)) {
 		if (ctx->mode == SCRUB_MODE_DRY_RUN) {
 			/* Dry-run mode, so log an error and forget it. */
 			if (group != XFROG_SCRUB_GROUP_INODE) {
 				/* AG or FS metadata, always warn. */
-				str_info(ctx, descr_render(&dsc),
+				str_info(ctx, descr_render(dsc),
 _("Optimization is possible."));
 			} else if (!ctx->preen_triggers[scrub_type]) {
 				/* File metadata, only warn once per type. */
@@ -210,7 +234,7 @@ _("Optimization is possible."));
 		}
 
 		/* Schedule optimizations. */
-		scrub_item_save_state(sri, scrub_type, meta.sm_flags);
+		scrub_item_save_state(sri, scrub_type, meta->sm_flags);
 		return 0;
 	}
 
@@ -221,8 +245,8 @@ _("Optimization is possible."));
 	 * re-examine the object as repairs progress to see if the kernel will
 	 * deem it completely consistent at some point.
 	 */
-	if (xref_failed(&meta) && ctx->mode == SCRUB_MODE_REPAIR) {
-		scrub_item_save_state(sri, scrub_type, meta.sm_flags);
+	if (xref_failed(meta) && ctx->mode == SCRUB_MODE_REPAIR) {
+		scrub_item_save_state(sri, scrub_type, meta->sm_flags);
 		return 0;
 	}
 

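Once the epilogue depends only on the call's result, its errno triage can be written and tested without issuing an ioctl. This is a minimal sketch of the error switch in scrub_epilogue(), covering only the cases visible in the hunks above; the enum names are hypothetical, EDEADLOCK/EBUSY are marked "not modeled" because their handling is elided from the diff, and any case labels hidden by the hunk boundaries are lumped into the default here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names for what scrub_epilogue() does with each errno. */
enum sketch_action {
	SKETCH_EXAMINE_FLAGS,	/* no operational error; inspect the flags */
	SKETCH_SKIP,		/* metadata not present */
	SKETCH_ABORT,		/* return ECANCELED to abort the program */
	SKETCH_LOG_AND_SKIP,	/* log the operational error and move on */
	SKETCH_NOT_MODELED,	/* handling elided from the hunk */
};

/* @error is a positive errno, as scrub_epilogue() receives it. */
static enum sketch_action
sketch_scrub_error_action(int error)
{
	switch (error) {
	case 0:
		return SKETCH_EXAMINE_FLAGS;
	case ENOENT:
		return SKETCH_SKIP;
	case ESHUTDOWN:
	case EIO:
	case ENOMEM:
		return SKETCH_ABORT;
	case EDEADLOCK:
	case EBUSY:
		return SKETCH_NOT_MODELED;
	default:
		return SKETCH_LOG_AND_SKIP;
	}
}
```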

* [PATCH 05/10] xfs_scrub: split the repair epilogue code into a separate function
  2024-07-02  0:53 ` [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-07-02  1:22   ` [PATCH 04/10] xfs_scrub: split the scrub epilogue code into a separate function Darrick J. Wong
@ 2024-07-02  1:23   ` Darrick J. Wong
  2024-07-02  6:57     ` Christoph Hellwig
  2024-07-02  1:23   ` [PATCH 06/10] xfs_scrub: convert scrub and repair epilogues to use xfs_scrub_vec Darrick J. Wong
                     ` (4 subsequent siblings)
  9 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:23 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Move all the code that updates the internal state in response to a
repair ioctl() call completion into a separate function.  This will help
with vectorizing repair calls later on.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/repair.c |   69 +++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 45 insertions(+), 24 deletions(-)


diff --git a/scrub/repair.c b/scrub/repair.c
index 4fed86134eda..0b99e3351918 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -20,6 +20,11 @@
 #include "descr.h"
 #include "scrub_private.h"
 
+static int repair_epilogue(struct scrub_ctx *ctx, struct descr *dsc,
+		struct scrub_item *sri, unsigned int repair_flags,
+		struct xfs_scrub_metadata *oldm,
+		struct xfs_scrub_metadata *meta, int error);
+
 /* General repair routines. */
 
 /*
@@ -133,6 +138,22 @@ xfs_repair_metadata(
 				_("Attempting optimization."));
 
 	error = -xfrog_scrub_metadata(xfdp, &meta);
+	return repair_epilogue(ctx, &dsc, sri, repair_flags, &oldm, &meta,
+			error);
+}
+
+static int
+repair_epilogue(
+	struct scrub_ctx		*ctx,
+	struct descr			*dsc,
+	struct scrub_item		*sri,
+	unsigned int			repair_flags,
+	struct xfs_scrub_metadata	*oldm,
+	struct xfs_scrub_metadata	*meta,
+	int				error)
+{
+	unsigned int			scrub_type = meta->sm_type;
+
 	switch (error) {
 	case 0:
 		/* No operational errors encountered. */
@@ -141,12 +162,12 @@ xfs_repair_metadata(
 	case EBUSY:
 		/* Filesystem is busy, try again later. */
 		if (debug || verbose)
-			str_info(ctx, descr_render(&dsc),
+			str_info(ctx, descr_render(dsc),
 _("Filesystem is busy, deferring repair."));
 		return 0;
 	case ESHUTDOWN:
 		/* Filesystem is already shut down, abort. */
-		str_error(ctx, descr_render(&dsc),
+		str_error(ctx, descr_render(dsc),
 _("Filesystem is shut down, aborting."));
 		return ECANCELED;
 	case ENOTTY:
@@ -157,7 +178,7 @@ _("Filesystem is shut down, aborting."));
 		 * how to perform the repair, don't requeue the request.  Mark
 		 * it done and move on.
 		 */
-		if (is_unoptimized(&oldm) ||
+		if (is_unoptimized(oldm) ||
 		    debug_tweak_on("XFS_SCRUB_FORCE_REPAIR")) {
 			scrub_item_clean_state(sri, scrub_type);
 			return 0;
@@ -175,14 +196,14 @@ _("Filesystem is shut down, aborting."));
 		fallthrough;
 	case EINVAL:
 		/* Kernel doesn't know how to repair this? */
-		str_corrupt(ctx, descr_render(&dsc),
+		str_corrupt(ctx, descr_render(dsc),
 _("Don't know how to fix; offline repair required."));
 		scrub_item_clean_state(sri, scrub_type);
 		return 0;
 	case EROFS:
 		/* Read-only filesystem, can't fix. */
-		if (verbose || debug || needs_repair(&oldm))
-			str_error(ctx, descr_render(&dsc),
+		if (verbose || debug || needs_repair(oldm))
+			str_error(ctx, descr_render(dsc),
 _("Read-only filesystem; cannot make changes."));
 		return ECANCELED;
 	case ENOENT:
@@ -192,7 +213,7 @@ _("Read-only filesystem; cannot make changes."));
 	case ENOMEM:
 	case ENOSPC:
 		/* Don't care if preen fails due to low resources. */
-		if (is_unoptimized(&oldm) && !needs_repair(&oldm)) {
+		if (is_unoptimized(oldm) && !needs_repair(oldm)) {
 			scrub_item_clean_state(sri, scrub_type);
 			return 0;
 		}
@@ -207,7 +228,7 @@ _("Read-only filesystem; cannot make changes."));
 		 */
 		if (!(repair_flags & XRM_FINAL_WARNING))
 			return 0;
-		str_liberror(ctx, error, descr_render(&dsc));
+		str_liberror(ctx, error, descr_render(dsc));
 		scrub_item_clean_state(sri, scrub_type);
 		return 0;
 	}
@@ -218,12 +239,12 @@ _("Read-only filesystem; cannot make changes."));
 	 * the repair again, just in case the fs was busy.  Only retry so many
 	 * times.
 	 */
-	if (want_retry(&meta) && scrub_item_schedule_retry(sri, scrub_type))
+	if (want_retry(meta) && scrub_item_schedule_retry(sri, scrub_type))
 		return 0;
 
 	if (repair_flags & XRM_FINAL_WARNING)
-		scrub_warn_incomplete_scrub(ctx, &dsc, &meta);
-	if (needs_repair(&meta) || is_incomplete(&meta)) {
+		scrub_warn_incomplete_scrub(ctx, dsc, meta);
+	if (needs_repair(meta) || is_incomplete(meta)) {
 		/*
 		 * Still broken; if we've been told not to complain then we
 		 * just requeue this and try again later.  Otherwise we
@@ -231,9 +252,9 @@ _("Read-only filesystem; cannot make changes."));
 		 */
 		if (!(repair_flags & XRM_FINAL_WARNING))
 			return 0;
-		str_corrupt(ctx, descr_render(&dsc),
+		str_corrupt(ctx, descr_render(dsc),
 _("Repair unsuccessful; offline repair required."));
-	} else if (xref_failed(&meta)) {
+	} else if (xref_failed(meta)) {
 		/*
 		 * This metadata object itself looks ok, but we still noticed
 		 * inconsistencies when comparing it with the other filesystem
@@ -242,31 +263,31 @@ _("Repair unsuccessful; offline repair required."));
 		 * reverify the cross-referencing as repairs progress.
 		 */
 		if (repair_flags & XRM_FINAL_WARNING) {
-			str_info(ctx, descr_render(&dsc),
+			str_info(ctx, descr_render(dsc),
  _("Seems correct but cross-referencing failed; offline repair recommended."));
 		} else {
 			if (verbose)
-				str_info(ctx, descr_render(&dsc),
+				str_info(ctx, descr_render(dsc),
  _("Seems correct but cross-referencing failed; will keep checking."));
 			return 0;
 		}
-	} else if (meta.sm_flags & XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED) {
+	} else if (meta->sm_flags & XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED) {
 		if (verbose)
-			str_info(ctx, descr_render(&dsc),
+			str_info(ctx, descr_render(dsc),
 					_("No modification needed."));
 	} else {
 		/* Clean operation, no corruption detected. */
-		if (is_corrupt(&oldm))
-			record_repair(ctx, descr_render(&dsc),
+		if (is_corrupt(oldm))
+			record_repair(ctx, descr_render(dsc),
  _("Repairs successful."));
-		else if (xref_disagrees(&oldm))
-			record_repair(ctx, descr_render(&dsc),
+		else if (xref_disagrees(oldm))
+			record_repair(ctx, descr_render(dsc),
  _("Repairs successful after discrepancy in cross-referencing."));
-		else if (xref_failed(&oldm))
-			record_repair(ctx, descr_render(&dsc),
+		else if (xref_failed(oldm))
+			record_repair(ctx, descr_render(dsc),
  _("Repairs successful after cross-referencing failure."));
 		else
-			record_preen(ctx, descr_render(&dsc),
+			record_preen(ctx, descr_render(dsc),
  _("Optimization successful."));
 	}
 

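After a clean repair call, repair_epilogue() chooses its log message from the flags recorded before the repair (oldm), checked in a fixed order. A minimal sketch of that selection follows; the SK_OFLAG_* values are assumed to mirror the XFS_SCRUB_OFLAG_* bits in xfs_fs.h and should be checked against your headers, and the helper name is hypothetical. The last message corresponds to the record_preen() path, the others to record_repair().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match XFS_SCRUB_OFLAG_{CORRUPT,XFAIL,XCORRUPT}; verify locally. */
#define SK_OFLAG_CORRUPT	(1u << 1)
#define SK_OFLAG_XFAIL		(1u << 3)
#define SK_OFLAG_XCORRUPT	(1u << 4)

/* Message logged after a clean repair, given the pre-repair flags. */
static const char *
sketch_clean_repair_message(uint32_t old_flags)
{
	if (old_flags & SK_OFLAG_CORRUPT)
		return "Repairs successful.";
	if (old_flags & SK_OFLAG_XCORRUPT)
		return "Repairs successful after discrepancy in cross-referencing.";
	if (old_flags & SK_OFLAG_XFAIL)
		return "Repairs successful after cross-referencing failure.";
	/* Nothing was broken, so this was only an optimization (preen). */
	return "Optimization successful.";
}
```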

* [PATCH 06/10] xfs_scrub: convert scrub and repair epilogues to use xfs_scrub_vec
  2024-07-02  0:53 ` [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-07-02  1:23   ` [PATCH 05/10] xfs_scrub: split the repair " Darrick J. Wong
@ 2024-07-02  1:23   ` Darrick J. Wong
  2024-07-02  6:57     ` Christoph Hellwig
  2024-07-02  1:23   ` [PATCH 07/10] xfs_scrub: vectorize scrub calls Darrick J. Wong
                     ` (3 subsequent siblings)
  9 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:23 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Convert the scrub and repair epilogue code to pass around xfs_scrub_vecs
as we prepare for vectorized operation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/repair.c        |   35 ++++++++++++++++++-----------------
 scrub/scrub.c         |   27 ++++++++++++++-------------
 scrub/scrub_private.h |   34 +++++++++++++++++-----------------
 3 files changed, 49 insertions(+), 47 deletions(-)


diff --git a/scrub/repair.c b/scrub/repair.c
index 0b99e3351918..7a710a159e69 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -22,8 +22,8 @@
 
 static int repair_epilogue(struct scrub_ctx *ctx, struct descr *dsc,
 		struct scrub_item *sri, unsigned int repair_flags,
-		struct xfs_scrub_metadata *oldm,
-		struct xfs_scrub_metadata *meta, int error);
+		const struct xfs_scrub_vec *oldm,
+		const struct xfs_scrub_vec *meta);
 
 /* General repair routines. */
 
@@ -93,10 +93,9 @@ xfs_repair_metadata(
 	unsigned int			repair_flags)
 {
 	struct xfs_scrub_metadata	meta = { 0 };
-	struct xfs_scrub_metadata	oldm;
+	struct xfs_scrub_vec		oldm, vec;
 	DEFINE_DESCR(dsc, ctx, format_scrub_descr);
 	bool				repair_only;
-	int				error;
 
 	/*
 	 * If the caller boosted the priority of this scrub type on behalf of a
@@ -124,22 +123,24 @@ xfs_repair_metadata(
 		break;
 	}
 
-	if (!is_corrupt(&meta) && repair_only)
+	vec.sv_type = scrub_type;
+	vec.sv_flags = sri->sri_state[scrub_type] & SCRUB_ITEM_REPAIR_ANY;
+	memcpy(&oldm, &vec, sizeof(struct xfs_scrub_vec));
+	if (!is_corrupt(&vec) && repair_only)
 		return 0;
 
-	memcpy(&oldm, &meta, sizeof(oldm));
-	oldm.sm_flags = sri->sri_state[scrub_type] & SCRUB_ITEM_REPAIR_ANY;
-	descr_set(&dsc, &oldm);
+	descr_set(&dsc, &meta);
 
-	if (needs_repair(&oldm))
+	if (needs_repair(&vec))
 		str_info(ctx, descr_render(&dsc), _("Attempting repair."));
 	else if (debug || verbose)
 		str_info(ctx, descr_render(&dsc),
 				_("Attempting optimization."));
 
-	error = -xfrog_scrub_metadata(xfdp, &meta);
-	return repair_epilogue(ctx, &dsc, sri, repair_flags, &oldm, &meta,
-			error);
+	vec.sv_ret = xfrog_scrub_metadata(xfdp, &meta);
+	vec.sv_flags = meta.sm_flags;
+
+	return repair_epilogue(ctx, &dsc, sri, repair_flags, &oldm, &vec);
 }
 
 static int
@@ -148,11 +149,11 @@ repair_epilogue(
 	struct descr			*dsc,
 	struct scrub_item		*sri,
 	unsigned int			repair_flags,
-	struct xfs_scrub_metadata	*oldm,
-	struct xfs_scrub_metadata	*meta,
-	int				error)
+	const struct xfs_scrub_vec	*oldm,
+	const struct xfs_scrub_vec	*meta)
 {
-	unsigned int			scrub_type = meta->sm_type;
+	unsigned int			scrub_type = meta->sv_type;
+	int				error = -meta->sv_ret;
 
 	switch (error) {
 	case 0:
@@ -271,7 +272,7 @@ _("Repair unsuccessful; offline repair required."));
  _("Seems correct but cross-referencing failed; will keep checking."));
 			return 0;
 		}
-	} else if (meta->sm_flags & XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED) {
+	} else if (meta->sv_flags & XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED) {
 		if (verbose)
 			str_info(ctx, descr_render(dsc),
 					_("No modification needed."));
diff --git a/scrub/scrub.c b/scrub/scrub.c
index c4b4367e458f..2fb2293558e5 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -23,8 +23,7 @@
 #include "scrub_private.h"
 
 static int scrub_epilogue(struct scrub_ctx *ctx, struct descr *dsc,
-		struct scrub_item *sri, struct xfs_scrub_metadata *meta,
-		int error);
+		struct scrub_item *sri, struct xfs_scrub_vec *vec);
 
 /* Online scrub and repair wrappers. */
 
@@ -62,7 +61,7 @@ void
 scrub_warn_incomplete_scrub(
 	struct scrub_ctx		*ctx,
 	struct descr			*dsc,
-	struct xfs_scrub_metadata	*meta)
+	const struct xfs_scrub_vec	*meta)
 {
 	if (is_incomplete(meta))
 		str_info(ctx, descr_render(dsc), _("Check incomplete."));
@@ -91,8 +90,8 @@ xfs_check_metadata(
 {
 	DEFINE_DESCR(dsc, ctx, format_scrub_descr);
 	struct xfs_scrub_metadata	meta = { };
+	struct xfs_scrub_vec		vec;
 	enum xfrog_scrub_group		group;
-	int				error;
 
 	background_sleep();
 
@@ -120,8 +119,10 @@ xfs_check_metadata(
 
 	dbg_printf("check %s flags %xh\n", descr_render(&dsc), meta.sm_flags);
 
-	error = -xfrog_scrub_metadata(xfdp, &meta);
-	return scrub_epilogue(ctx, &dsc, sri, &meta, error);
+	vec.sv_ret = xfrog_scrub_metadata(xfdp, &meta);
+	vec.sv_type = scrub_type;
+	vec.sv_flags = meta.sm_flags;
+	return scrub_epilogue(ctx, &dsc, sri, &vec);
 }
 
 /*
@@ -133,11 +134,11 @@ scrub_epilogue(
 	struct scrub_ctx		*ctx,
 	struct descr			*dsc,
 	struct scrub_item		*sri,
-	struct xfs_scrub_metadata	*meta,
-	int				error)
+	struct xfs_scrub_vec		*meta)
 {
-	unsigned int			scrub_type = meta->sm_type;
+	unsigned int			scrub_type = meta->sv_type;
 	enum xfrog_scrub_group		group;
+	int				error = -meta->sv_ret;
 
 	group = xfrog_scrubbers[scrub_type].group;
 
@@ -146,7 +147,7 @@ scrub_epilogue(
 		/* No operational errors encountered. */
 		if (!sri->sri_revalidate &&
 		    debug_tweak_on("XFS_SCRUB_FORCE_REPAIR"))
-			meta->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+			meta->sv_flags |= XFS_SCRUB_OFLAG_CORRUPT;
 		break;
 	case ENOENT:
 		/* Metadata not present, just skip it. */
@@ -207,7 +208,7 @@ _("Repairs are required."));
 		}
 
 		/* Schedule repairs. */
-		scrub_item_save_state(sri, scrub_type, meta->sm_flags);
+		scrub_item_save_state(sri, scrub_type, meta->sv_flags);
 		return 0;
 	}
 
@@ -234,7 +235,7 @@ _("Optimization is possible."));
 		}
 
 		/* Schedule optimizations. */
-		scrub_item_save_state(sri, scrub_type, meta->sm_flags);
+		scrub_item_save_state(sri, scrub_type, meta->sv_flags);
 		return 0;
 	}
 
@@ -246,7 +247,7 @@ _("Optimization is possible."));
 	 * deem it completely consistent at some point.
 	 */
 	if (xref_failed(meta) && ctx->mode == SCRUB_MODE_REPAIR) {
-		scrub_item_save_state(sri, scrub_type, meta->sm_flags);
+		scrub_item_save_state(sri, scrub_type, meta->sv_flags);
 		return 0;
 	}
 
diff --git a/scrub/scrub_private.h b/scrub/scrub_private.h
index bcfabda16be1..98a9238f2aac 100644
--- a/scrub/scrub_private.h
+++ b/scrub/scrub_private.h
@@ -13,40 +13,40 @@ int format_scrub_descr(struct scrub_ctx *ctx, char *buf, size_t buflen,
 
 /* Predicates for scrub flag state. */
 
-static inline bool is_corrupt(struct xfs_scrub_metadata *sm)
+static inline bool is_corrupt(const struct xfs_scrub_vec *sv)
 {
-	return sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT;
+	return sv->sv_flags & XFS_SCRUB_OFLAG_CORRUPT;
 }
 
-static inline bool is_unoptimized(struct xfs_scrub_metadata *sm)
+static inline bool is_unoptimized(const struct xfs_scrub_vec *sv)
 {
-	return sm->sm_flags & XFS_SCRUB_OFLAG_PREEN;
+	return sv->sv_flags & XFS_SCRUB_OFLAG_PREEN;
 }
 
-static inline bool xref_failed(struct xfs_scrub_metadata *sm)
+static inline bool xref_failed(const struct xfs_scrub_vec *sv)
 {
-	return sm->sm_flags & XFS_SCRUB_OFLAG_XFAIL;
+	return sv->sv_flags & XFS_SCRUB_OFLAG_XFAIL;
 }
 
-static inline bool xref_disagrees(struct xfs_scrub_metadata *sm)
+static inline bool xref_disagrees(const struct xfs_scrub_vec *sv)
 {
-	return sm->sm_flags & XFS_SCRUB_OFLAG_XCORRUPT;
+	return sv->sv_flags & XFS_SCRUB_OFLAG_XCORRUPT;
 }
 
-static inline bool is_incomplete(struct xfs_scrub_metadata *sm)
+static inline bool is_incomplete(const struct xfs_scrub_vec *sv)
 {
-	return sm->sm_flags & XFS_SCRUB_OFLAG_INCOMPLETE;
+	return sv->sv_flags & XFS_SCRUB_OFLAG_INCOMPLETE;
 }
 
-static inline bool is_suspicious(struct xfs_scrub_metadata *sm)
+static inline bool is_suspicious(const struct xfs_scrub_vec *sv)
 {
-	return sm->sm_flags & XFS_SCRUB_OFLAG_WARNING;
+	return sv->sv_flags & XFS_SCRUB_OFLAG_WARNING;
 }
 
 /* Should we fix it? */
-static inline bool needs_repair(struct xfs_scrub_metadata *sm)
+static inline bool needs_repair(const struct xfs_scrub_vec *sv)
 {
-	return is_corrupt(sm) || xref_disagrees(sm);
+	return is_corrupt(sv) || xref_disagrees(sv);
 }
 
 /*
@@ -54,13 +54,13 @@ static inline bool needs_repair(struct xfs_scrub_metadata *sm)
  * scan/repair; or if there were cross-referencing problems but the object was
  * not obviously corrupt.
  */
-static inline bool want_retry(struct xfs_scrub_metadata *sm)
+static inline bool want_retry(const struct xfs_scrub_vec *sv)
 {
-	return is_incomplete(sm) || (xref_disagrees(sm) && !is_corrupt(sm));
+	return is_incomplete(sv) || (xref_disagrees(sv) && !is_corrupt(sv));
 }
 
 void scrub_warn_incomplete_scrub(struct scrub_ctx *ctx, struct descr *dsc,
-		struct xfs_scrub_metadata *meta);
+		const struct xfs_scrub_vec *meta);
 
 /* Scrub item functions */
 

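The conversion above keeps the predicates' logic unchanged and only moves them from sm_flags to sv_flags, while the single-call path now stores the negative return in sv_ret and the epilogues recover a positive errno with `-sv_ret`. A minimal sketch of that packing and of want_retry()/needs_repair() follows, assuming a cut-down stand-in for struct xfs_scrub_vec (struct sk_vec, hypothetical) and flag values assumed to mirror xfs_fs.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the XFS_SCRUB_OFLAG_* bits in xfs_fs.h; verify locally. */
#define SK_OFLAG_CORRUPT	(1u << 1)
#define SK_OFLAG_XCORRUPT	(1u << 4)
#define SK_OFLAG_INCOMPLETE	(1u << 5)

/* Hypothetical stand-in with only the xfs_scrub_vec fields used here. */
struct sk_vec {
	uint32_t	sv_type;
	uint32_t	sv_flags;
	int32_t		sv_ret;		/* 0 or a negative errno */
};

static bool sk_is_corrupt(const struct sk_vec *sv)
{
	return sv->sv_flags & SK_OFLAG_CORRUPT;
}

static bool sk_xref_disagrees(const struct sk_vec *sv)
{
	return sv->sv_flags & SK_OFLAG_XCORRUPT;
}

static bool sk_is_incomplete(const struct sk_vec *sv)
{
	return sv->sv_flags & SK_OFLAG_INCOMPLETE;
}

/* Should we fix it? */
static bool sk_needs_repair(const struct sk_vec *sv)
{
	return sk_is_corrupt(sv) || sk_xref_disagrees(sv);
}

/* Retry if incomplete, or if xref disagreed but the object looked fine. */
static bool sk_want_retry(const struct sk_vec *sv)
{
	return sk_is_incomplete(sv) ||
	       (sk_xref_disagrees(sv) && !sk_is_corrupt(sv));
}

/* Pack a one-shot result: negative return in sv_ret, output flags in sv_flags. */
static struct sk_vec sk_pack(uint32_t type, uint32_t out_flags, int ret)
{
	return (struct sk_vec){ .sv_type = type, .sv_flags = out_flags,
				.sv_ret = ret };
}

/* The epilogues turn sv_ret back into a positive errno. */
static int sk_error(const struct sk_vec *sv)
{
	return -sv->sv_ret;
}
```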

* [PATCH 07/10] xfs_scrub: vectorize scrub calls
  2024-07-02  0:53 ` [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-07-02  1:23   ` [PATCH 06/10] xfs_scrub: convert scrub and repair epilogues to use xfs_scrub_vec Darrick J. Wong
@ 2024-07-02  1:23   ` Darrick J. Wong
  2024-07-02  6:58     ` Christoph Hellwig
  2024-07-02  1:23   ` [PATCH 08/10] xfs_scrub: vectorize repair calls Darrick J. Wong
                     ` (2 subsequent siblings)
  9 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:23 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Use the new vectorized kernel scrub calls to reduce the overhead of
checking metadata.
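The overhead saving comes from issuing one call for a whole batch instead of one call per scrub type, with a fallback to single calls when the kernel lacks the vectored ioctl (as the xfs_io man page describes). This is a minimal model, not the libfrog implementation: struct sk_req, struct sk_kernel and sk_scrub_many() are hypothetical, and "not recognized" is assumed to be reported as -ENOTTY, the usual errno for an unknown ioctl. The check_scrubv() call added to phase 1 suggests xfs_scrub probes for support once up front, which this sketch does not model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-item request; stands in for struct xfs_scrub_vec. */
struct sk_req {
	unsigned int	type;
	int		ret;
};

/*
 * Hypothetical "kernel": one entry point that handles a whole batch and one
 * that handles a single item.  Both return 0 or a negative errno.
 */
struct sk_kernel {
	int		(*scrub_vector)(struct sk_kernel *k, struct sk_req *reqs,
					size_t nr);
	int		(*scrub_single)(struct sk_kernel *k, struct sk_req *req);
	unsigned int	calls;		/* number of "syscalls" issued */
};

/*
 * Try the batched call first; if the kernel does not recognize it (-ENOTTY
 * in this model), fall back to one call per item.
 */
static int
sk_scrub_many(struct sk_kernel *k, struct sk_req *reqs, size_t nr)
{
	int	ret = k->scrub_vector(k, reqs, nr);

	if (ret != -ENOTTY)
		return ret;
	for (size_t i = 0; i < nr; i++)
		reqs[i].ret = k->scrub_single(k, &reqs[i]);
	return 0;
}
```

In this model a five-item batch costs one call on a kernel with the vectored ioctl and six calls (one failed attempt plus five singles) on an older kernel.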

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase1.c        |    2 
 scrub/scrub.c         |  277 ++++++++++++++++++++++++++++++++++++-------------
 scrub/scrub.h         |    2 
 scrub/scrub_private.h |   16 +++
 scrub/xfs_scrub.c     |    1 
 5 files changed, 225 insertions(+), 73 deletions(-)


diff --git a/scrub/phase1.c b/scrub/phase1.c
index 095c045915a7..091b59e57e7b 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -216,6 +216,8 @@ _("Kernel metadata scrubbing facility is not available."));
 		return ECANCELED;
 	}
 
+	check_scrubv(ctx);
+
 	/*
 	 * Normally, callers are required to pass -n if the provided path is a
 	 * readonly filesystem or the kernel wasn't built with online repair
diff --git a/scrub/scrub.c b/scrub/scrub.c
index 2fb2293558e5..0c77f947244a 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -22,11 +22,48 @@
 #include "descr.h"
 #include "scrub_private.h"
 
-static int scrub_epilogue(struct scrub_ctx *ctx, struct descr *dsc,
-		struct scrub_item *sri, struct xfs_scrub_vec *vec);
-
 /* Online scrub and repair wrappers. */
 
+/* Describe the current state of a vectored scrub. */
+int
+format_scrubv_descr(
+	struct scrub_ctx		*ctx,
+	char				*buf,
+	size_t				buflen,
+	void				*where)
+{
+	struct scrubv_descr		*vdesc = where;
+	struct xfrog_scrubv		*scrubv = vdesc->scrubv;
+	struct xfs_scrub_vec_head	*vhead = &scrubv->head;
+	const struct xfrog_scrub_descr	*sc;
+	unsigned int			scrub_type;
+
+	if (vdesc->idx >= 0)
+		scrub_type = scrubv->vectors[vdesc->idx].sv_type;
+	else if (scrubv->head.svh_nr > 0)
+		scrub_type = scrubv->vectors[scrubv->head.svh_nr - 1].sv_type;
+	else
+		scrub_type = XFS_SCRUB_TYPE_PROBE;
+	sc = &xfrog_scrubbers[scrub_type];
+
+	switch (sc->group) {
+	case XFROG_SCRUB_GROUP_AGHEADER:
+	case XFROG_SCRUB_GROUP_PERAG:
+		return snprintf(buf, buflen, _("AG %u %s"), vhead->svh_agno,
+				_(sc->descr));
+	case XFROG_SCRUB_GROUP_INODE:
+		return scrub_render_ino_descr(ctx, buf, buflen,
+				vhead->svh_ino, vhead->svh_gen, "%s",
+				_(sc->descr));
+	case XFROG_SCRUB_GROUP_FS:
+	case XFROG_SCRUB_GROUP_SUMMARY:
+	case XFROG_SCRUB_GROUP_ISCAN:
+	case XFROG_SCRUB_GROUP_NONE:
+		return snprintf(buf, buflen, _("%s"), _(sc->descr));
+	}
+	return -1;
+}
+
 /* Format a scrub description. */
 int
 format_scrub_descr(
@@ -80,51 +117,6 @@ scrub_warn_incomplete_scrub(
 				_("Cross-referencing failed."));
 }
 
-/* Do a read-only check of some metadata. */
-static int
-xfs_check_metadata(
-	struct scrub_ctx		*ctx,
-	struct xfs_fd			*xfdp,
-	unsigned int			scrub_type,
-	struct scrub_item		*sri)
-{
-	DEFINE_DESCR(dsc, ctx, format_scrub_descr);
-	struct xfs_scrub_metadata	meta = { };
-	struct xfs_scrub_vec		vec;
-	enum xfrog_scrub_group		group;
-
-	background_sleep();
-
-	group = xfrog_scrubbers[scrub_type].group;
-	meta.sm_type = scrub_type;
-	switch (group) {
-	case XFROG_SCRUB_GROUP_AGHEADER:
-	case XFROG_SCRUB_GROUP_PERAG:
-		meta.sm_agno = sri->sri_agno;
-		break;
-	case XFROG_SCRUB_GROUP_FS:
-	case XFROG_SCRUB_GROUP_SUMMARY:
-	case XFROG_SCRUB_GROUP_ISCAN:
-	case XFROG_SCRUB_GROUP_NONE:
-		break;
-	case XFROG_SCRUB_GROUP_INODE:
-		meta.sm_ino = sri->sri_ino;
-		meta.sm_gen = sri->sri_gen;
-		break;
-	}
-
-	assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL"));
-	assert(scrub_type < XFS_SCRUB_TYPE_NR);
-	descr_set(&dsc, &meta);
-
-	dbg_printf("check %s flags %xh\n", descr_render(&dsc), meta.sm_flags);
-
-	vec.sv_ret = xfrog_scrub_metadata(xfdp, &meta);
-	vec.sv_type = scrub_type;
-	vec.sv_flags = meta.sm_flags;
-	return scrub_epilogue(ctx, &dsc, sri, &vec);
-}
-
 /*
  * Update all internal state after a scrub ioctl call.
  * Returns 0 for success, or ECANCELED to abort the program.
@@ -256,6 +248,87 @@ _("Optimization is possible."));
 	return 0;
 }
 
+/* Fill out the scrub vector header from a scrub item. */
+void
+xfrog_scrubv_from_item(
+	struct xfrog_scrubv		*scrubv,
+	const struct scrub_item		*sri)
+{
+	xfrog_scrubv_init(scrubv);
+
+	if (bg_mode > 1)
+		scrubv->head.svh_rest_us = bg_mode - 1;
+	if (sri->sri_agno != -1)
+		scrubv->head.svh_agno = sri->sri_agno;
+	if (sri->sri_ino != -1ULL) {
+		scrubv->head.svh_ino = sri->sri_ino;
+		scrubv->head.svh_gen = sri->sri_gen;
+	}
+}
+
+/* Add a scrubber to the scrub vector. */
+void
+xfrog_scrubv_add_item(
+	struct xfrog_scrubv		*scrubv,
+	const struct scrub_item		*sri,
+	unsigned int			scrub_type)
+{
+	struct xfs_scrub_vec		*v;
+
+	v = xfrog_scrubv_next_vector(scrubv);
+	v->sv_type = scrub_type;
+}
+
+/* Do a read-only check of some metadata. */
+static int
+scrub_call_kernel(
+	struct scrub_ctx		*ctx,
+	struct xfs_fd			*xfdp,
+	struct scrub_item		*sri)
+{
+	DEFINE_DESCR(dsc, ctx, format_scrubv_descr);
+	struct xfrog_scrubv		scrubv = { };
+	struct scrubv_descr		vdesc = SCRUBV_DESCR(&scrubv);
+	struct xfs_scrub_vec		*v;
+	unsigned int			scrub_type;
+	int				error;
+
+	assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL"));
+
+	xfrog_scrubv_from_item(&scrubv, sri);
+	descr_set(&dsc, &vdesc);
+
+	foreach_scrub_type(scrub_type) {
+		if (!(sri->sri_state[scrub_type] & SCRUB_ITEM_NEEDSCHECK))
+			continue;
+		xfrog_scrubv_add_item(&scrubv, sri, scrub_type);
+
+		dbg_printf("check %s flags %xh tries %u\n", descr_render(&dsc),
+				sri->sri_state[scrub_type],
+				sri->sri_tries[scrub_type]);
+	}
+
+	error = -xfrog_scrubv_metadata(xfdp, &scrubv);
+	if (error)
+		return error;
+
+	foreach_xfrog_scrubv_vec(&scrubv, vdesc.idx, v) {
+		error = scrub_epilogue(ctx, &dsc, sri, v);
+		if (error)
+			return error;
+
+		/*
+		 * Progress is counted by the inode for inode metadata; for
+		 * everything else, it's counted for each scrub call.
+		 */
+		if (!(sri->sri_state[v->sv_type] & SCRUB_ITEM_NEEDSCHECK) &&
+		    sri->sri_ino == -1ULL)
+			progress_add(1);
+	}
+
+	return 0;
+}
+
 /* Bulk-notify user about things that could be optimized. */
 void
 scrub_report_preen_triggers(
@@ -291,6 +364,37 @@ scrub_item_schedule_group(
 	}
 }
 
+/* Decide if we call the kernel again to finish scrub/repair activity. */
+static inline bool
+scrub_item_call_kernel_again_future(
+	struct scrub_item	*sri,
+	uint8_t			work_mask,
+	const struct scrub_item	*old)
+{
+	unsigned int		scrub_type;
+	unsigned int		nr = 0;
+
+	/* If there's nothing to do, we're done. */
+	foreach_scrub_type(scrub_type) {
+		if (sri->sri_state[scrub_type] & work_mask)
+			nr++;
+	}
+	if (!nr)
+		return false;
+
+	foreach_scrub_type(scrub_type) {
+		uint8_t		statex = sri->sri_state[scrub_type] ^
+					 old->sri_state[scrub_type];
+
+		if (statex & work_mask)
+			return true;
+		if (sri->sri_tries[scrub_type] != old->sri_tries[scrub_type])
+			return true;
+	}
+
+	return false;
+}
+
 /* Decide if we call the kernel again to finish scrub/repair activity. */
 bool
 scrub_item_call_kernel_again(
@@ -319,6 +423,29 @@ scrub_item_call_kernel_again(
 	return false;
 }
 
+/*
+ * For each scrub item whose state matches the state_flags, set up the item
+ * state for a kernel call.  Returns true if any work was scheduled.
+ */
+bool
+scrub_item_schedule_work(
+	struct scrub_item	*sri,
+	uint8_t			state_flags)
+{
+	unsigned int		scrub_type;
+	unsigned int		nr = 0;
+
+	foreach_scrub_type(scrub_type) {
+		if (!(sri->sri_state[scrub_type] & state_flags))
+			continue;
+
+		sri->sri_tries[scrub_type] = SCRUB_ITEM_MAX_RETRIES;
+		nr++;
+	}
+
+	return nr > 0;
+}
+
 /* Run all the incomplete scans on this scrub principal. */
 int
 scrub_item_check_file(
@@ -329,8 +456,10 @@ scrub_item_check_file(
 	struct xfs_fd			xfd;
 	struct scrub_item		old_sri;
 	struct xfs_fd			*xfdp = &ctx->mnt;
-	unsigned int			scrub_type;
-	int				error;
+	int				error = 0;
+
+	if (!scrub_item_schedule_work(sri, SCRUB_ITEM_NEEDSCHECK))
+		return 0;
 
 	/*
 	 * If the caller passed us a file descriptor for a scrub, use it
@@ -343,31 +472,15 @@ scrub_item_check_file(
 		xfdp = &xfd;
 	}
 
-	foreach_scrub_type(scrub_type) {
-		if (!(sri->sri_state[scrub_type] & SCRUB_ITEM_NEEDSCHECK))
-			continue;
-
-		sri->sri_tries[scrub_type] = SCRUB_ITEM_MAX_RETRIES;
-		do {
-			memcpy(&old_sri, sri, sizeof(old_sri));
-			error = xfs_check_metadata(ctx, xfdp, scrub_type, sri);
-			if (error)
-				return error;
-		} while (scrub_item_call_kernel_again(sri, scrub_type,
-					SCRUB_ITEM_NEEDSCHECK, &old_sri));
-
-		/*
-		 * Progress is counted by the inode for inode metadata; for
-		 * everything else, it's counted for each scrub call.
-		 */
-		if (sri->sri_ino == -1ULL)
-			progress_add(1);
-
+	do {
+		memcpy(&old_sri, sri, sizeof(old_sri));
+		error = scrub_call_kernel(ctx, xfdp, sri);
 		if (error)
-			break;
-	}
+			return error;
+	} while (scrub_item_call_kernel_again_future(sri, SCRUB_ITEM_NEEDSCHECK,
+				&old_sri));
 
-	return error;
+	return 0;
 }
 
 /* How many items do we have to check? */
@@ -562,3 +675,21 @@ can_force_rebuild(
 	return __scrub_test(ctx, XFS_SCRUB_TYPE_PROBE,
 			XFS_SCRUB_IFLAG_REPAIR | XFS_SCRUB_IFLAG_FORCE_REBUILD);
 }
+
+void
+check_scrubv(
+	struct scrub_ctx	*ctx)
+{
+	struct xfrog_scrubv	scrubv = { };
+
+	xfrog_scrubv_init(&scrubv);
+
+	if (debug_tweak_on("XFS_SCRUB_FORCE_SINGLE"))
+		ctx->mnt.flags |= XFROG_FLAG_SCRUB_FORCE_SINGLE;
+
+	/*
+	 * We set the fallback flag if calling the kernel with a zero-length
+	 * vector doesn't work.
+	 */
+	xfrog_scrubv_metadata(&ctx->mnt, &scrubv);
+}
diff --git a/scrub/scrub.h b/scrub/scrub.h
index 90578108a1c8..183b89379cb4 100644
--- a/scrub/scrub.h
+++ b/scrub/scrub.h
@@ -138,6 +138,8 @@ bool can_scrub_parent(struct scrub_ctx *ctx);
 bool can_repair(struct scrub_ctx *ctx);
 bool can_force_rebuild(struct scrub_ctx *ctx);
 
+void check_scrubv(struct scrub_ctx *ctx);
+
 int scrub_file(struct scrub_ctx *ctx, int fd, const struct xfs_bulkstat *bstat,
 		unsigned int type, struct scrub_item *sri);
 
diff --git a/scrub/scrub_private.h b/scrub/scrub_private.h
index 98a9238f2aac..bf53ee5af2cf 100644
--- a/scrub/scrub_private.h
+++ b/scrub/scrub_private.h
@@ -8,9 +8,24 @@
 
 /* Shared code between scrub.c and repair.c. */
 
+void xfrog_scrubv_from_item(struct xfrog_scrubv *scrubv,
+		const struct scrub_item *sri);
+void xfrog_scrubv_add_item(struct xfrog_scrubv *scrubv,
+		const struct scrub_item *sri, unsigned int scrub_type);
+
 int format_scrub_descr(struct scrub_ctx *ctx, char *buf, size_t buflen,
 		void *where);
 
+struct scrubv_descr {
+	struct xfrog_scrubv	*scrubv;
+	int			idx;
+};
+
+#define SCRUBV_DESCR(sv)	{ .scrubv = (sv), .idx = -1 }
+
+int format_scrubv_descr(struct scrub_ctx *ctx, char *buf, size_t buflen,
+		void *where);
+
 /* Predicates for scrub flag state. */
 
 static inline bool is_corrupt(const struct xfs_scrub_vec *sv)
@@ -104,5 +119,6 @@ scrub_item_schedule_retry(struct scrub_item *sri, unsigned int scrub_type)
 bool scrub_item_call_kernel_again(struct scrub_item *sri,
 		unsigned int scrub_type, uint8_t work_mask,
 		const struct scrub_item *old);
+bool scrub_item_schedule_work(struct scrub_item *sri, uint8_t state_flags);
 
 #endif /* XFS_SCRUB_SCRUB_PRIVATE_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index bb316f73e02c..f5b58de12812 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -115,6 +115,7 @@
  * XFS_SCRUB_THREADS		-- start exactly this number of threads
  * XFS_SCRUB_DISK_ERROR_INTERVAL-- simulate a disk error every this many bytes
  * XFS_SCRUB_DISK_VERIFY_SKIP	-- pretend disk verify read calls succeeded
+ * XFS_SCRUB_FORCE_SINGLE	-- fall back to ioctl-per-item scrubbing
  *
  * Available even in non-debug mode:
  * SERVICE_MODE			-- compress all error codes to 1 for LSB


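`check_scrubv()` makes one vectored call with zero vectors during phase 1 so that libfrog can decide whether to fall back to one ioctl per item; the `XFS_SCRUB_FORCE_SINGLE` debug knob forces that fallback. The C sketch below models this probe-then-dispatch pattern against a simulated kernel. All names, the flag value and the error code an older kernel returns (`-ENOTTY` here) are assumptions for illustration, not the libfrog implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag; not the libfrog XFROG_FLAG_SCRUB_FORCE_SINGLE value. */
#define FLAG_FORCE_SINGLE	(1U << 0)

struct sketch_fd {
	unsigned int	flags;
	bool		kernel_has_vector;	/* simulated kernel capability */
	unsigned int	calls;			/* number of "ioctls" issued */
};

/* Simulated vectored call: fails like an unknown ioctl on old kernels. */
static int
vector_call(struct sketch_fd *fd, const unsigned int *types, unsigned int nr)
{
	(void)types;
	(void)nr;
	fd->calls++;
	return fd->kernel_has_vector ? 0 : -ENOTTY;
}

/* Simulated single-item call, assumed to be supported everywhere. */
static int
single_call(struct sketch_fd *fd, unsigned int type)
{
	(void)type;
	fd->calls++;
	return 0;
}

/* Probe once with an empty vector and remember whether to fall back. */
static void
probe_vector_support(struct sketch_fd *fd)
{
	if (vector_call(fd, NULL, 0) != 0)
		fd->flags |= FLAG_FORCE_SINGLE;
}

/* Issue a batch as one vectored call, or as one call per item. */
static int
scrub_batch(struct sketch_fd *fd, const unsigned int *types, unsigned int nr)
{
	unsigned int i;
	int error;

	if (!(fd->flags & FLAG_FORCE_SINGLE))
		return vector_call(fd, types, nr);

	for (i = 0; i < nr; i++) {
		error = single_call(fd, types[i]);
		if (error)
			return error;
	}
	return 0;
}
```

Probing once up front keeps the per-batch path down to a single flag test instead of retrying a failed vectored call on every batch.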

* [PATCH 08/10] xfs_scrub: vectorize repair calls
  2024-07-02  0:53 ` [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-07-02  1:23   ` [PATCH 07/10] xfs_scrub: vectorize scrub calls Darrick J. Wong
@ 2024-07-02  1:23   ` Darrick J. Wong
  2024-07-02  6:58     ` Christoph Hellwig
  2024-07-02  1:24   ` [PATCH 09/10] xfs_scrub: use scrub barriers to reduce kernel calls Darrick J. Wong
  2024-07-02  1:24   ` [PATCH 10/10] xfs_scrub: try spot repairs of metadata items to make scrub progress Darrick J. Wong
  9 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:23 UTC
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Use the new vectorized scrub kernel calls to reduce the overhead of
performing repairs.
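
This patch also moves `repair_item_dependencies_ok()` so that `can_repair_now()` can leave out any type whose lower-level dependencies still need repair, unless `XRM_FINAL_WARNING` is set. The mask walk it performs is sketched below as standalone C; the type numbering, the `deps[]` table and the `NEEDSREPAIR` bit value are hypothetical stand-ins for `repair_deps[]` and `SCRUB_ITEM_NEEDSREPAIR`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type numbering, chosen only for this sketch. */
enum { T_SB, T_AGF, T_AGI, T_BNOBT, T_INOBT, T_NR };

#define DEP(x)		(1U << (x))
#define NEEDSREPAIR	(1U << 0)	/* hypothetical state bit */

/* deps[t] has bit b set if type t can only be repaired once b is sound. */
static const unsigned int deps[T_NR] = {
	[T_AGF]   = DEP(T_SB),
	[T_AGI]   = DEP(T_SB),
	[T_BNOBT] = DEP(T_AGF),
	[T_INOBT] = DEP(T_AGI),
};

static bool
dependencies_ok(const uint8_t state[T_NR], unsigned int type)
{
	unsigned int mask = deps[type];
	unsigned int b;

	/* Walk the mask one bit at a time; stop early once it is exhausted. */
	for (b = 0; mask && b < T_NR; b++, mask >>= 1) {
		if (!(mask & 1))
			continue;
		if (state[b] & NEEDSREPAIR)
			return false;	/* a lower-level object is still broken */
	}
	return true;
}
```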

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/repair.c        |  268 +++++++++++++++++++++++++++----------------------
 scrub/scrub.c         |   77 +++-----------
 scrub/scrub_private.h |    9 +-
 3 files changed, 166 insertions(+), 188 deletions(-)


diff --git a/scrub/repair.c b/scrub/repair.c
index 7a710a159e69..6a5fd40fd02f 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -20,11 +20,6 @@
 #include "descr.h"
 #include "scrub_private.h"
 
-static int repair_epilogue(struct scrub_ctx *ctx, struct descr *dsc,
-		struct scrub_item *sri, unsigned int repair_flags,
-		const struct xfs_scrub_vec *oldm,
-		const struct xfs_scrub_vec *meta);
-
 /* General repair routines. */
 
 /*
@@ -83,64 +78,14 @@ repair_want_service_downgrade(
 	return false;
 }
 
-/* Repair some metadata. */
-static int
-xfs_repair_metadata(
-	struct scrub_ctx		*ctx,
-	struct xfs_fd			*xfdp,
-	unsigned int			scrub_type,
-	struct scrub_item		*sri,
-	unsigned int			repair_flags)
+static inline void
+restore_oldvec(
+	struct xfs_scrub_vec	*oldvec,
+	const struct scrub_item	*sri,
+	unsigned int		scrub_type)
 {
-	struct xfs_scrub_metadata	meta = { 0 };
-	struct xfs_scrub_vec		oldm, vec;
-	DEFINE_DESCR(dsc, ctx, format_scrub_descr);
-	bool				repair_only;
-
-	/*
-	 * If the caller boosted the priority of this scrub type on behalf of a
-	 * higher level repair by setting IFLAG_REPAIR, turn off REPAIR_ONLY.
-	 */
-	repair_only = (repair_flags & XRM_REPAIR_ONLY) &&
-			scrub_item_type_boosted(sri, scrub_type);
-
-	assert(scrub_type < XFS_SCRUB_TYPE_NR);
-	assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL"));
-	meta.sm_type = scrub_type;
-	meta.sm_flags = XFS_SCRUB_IFLAG_REPAIR;
-	if (use_force_rebuild)
-		meta.sm_flags |= XFS_SCRUB_IFLAG_FORCE_REBUILD;
-	switch (xfrog_scrubbers[scrub_type].group) {
-	case XFROG_SCRUB_GROUP_AGHEADER:
-	case XFROG_SCRUB_GROUP_PERAG:
-		meta.sm_agno = sri->sri_agno;
-		break;
-	case XFROG_SCRUB_GROUP_INODE:
-		meta.sm_ino = sri->sri_ino;
-		meta.sm_gen = sri->sri_gen;
-		break;
-	default:
-		break;
-	}
-
-	vec.sv_type = scrub_type;
-	vec.sv_flags = sri->sri_state[scrub_type] & SCRUB_ITEM_REPAIR_ANY;
-	memcpy(&oldm, &vec, sizeof(struct xfs_scrub_vec));
-	if (!is_corrupt(&vec) && repair_only)
-		return 0;
-
-	descr_set(&dsc, &meta);
-
-	if (needs_repair(&vec))
-		str_info(ctx, descr_render(&dsc), _("Attempting repair."));
-	else if (debug || verbose)
-		str_info(ctx, descr_render(&dsc),
-				_("Attempting optimization."));
-
-	vec.sv_ret = xfrog_scrub_metadata(xfdp, &meta);
-	vec.sv_flags = meta.sm_flags;
-
-	return repair_epilogue(ctx, &dsc, sri, repair_flags, &oldm, &vec);
+	oldvec->sv_type = scrub_type;
+	oldvec->sv_flags = sri->sri_state[scrub_type] & SCRUB_ITEM_REPAIR_ANY;
 }
 
 static int
@@ -149,12 +94,15 @@ repair_epilogue(
 	struct descr			*dsc,
 	struct scrub_item		*sri,
 	unsigned int			repair_flags,
-	const struct xfs_scrub_vec	*oldm,
 	const struct xfs_scrub_vec	*meta)
 {
+	struct xfs_scrub_vec		oldv;
+	struct xfs_scrub_vec		*oldm = &oldv;
 	unsigned int			scrub_type = meta->sv_type;
 	int				error = -meta->sv_ret;
 
+	restore_oldvec(oldm, sri, meta->sv_type);
+
 	switch (error) {
 	case 0:
 		/* No operational errors encountered. */
@@ -296,6 +244,133 @@ _("Repair unsuccessful; offline repair required."));
 	return 0;
 }
 
+/* Decide if the dependent scrub types of the given scrub type are ok. */
+static bool
+repair_item_dependencies_ok(
+	const struct scrub_item	*sri,
+	unsigned int		scrub_type)
+{
+	unsigned int		dep_mask = repair_deps[scrub_type];
+	unsigned int		b;
+
+	for (b = 0; dep_mask && b < XFS_SCRUB_TYPE_NR; b++, dep_mask >>= 1) {
+		if (!(dep_mask & 1))
+			continue;
+		/*
+		 * If this lower level object also needs repair, we can't fix
+		 * the higher level item.
+		 */
+		if (sri->sri_state[b] & SCRUB_ITEM_NEEDSREPAIR)
+			return false;
+	}
+
+	return true;
+}
+
+/* Decide if we want to repair a particular type of metadata. */
+static bool
+can_repair_now(
+	const struct scrub_item	*sri,
+	unsigned int		scrub_type,
+	__u32			repair_mask,
+	unsigned int		repair_flags)
+{
+	struct xfs_scrub_vec	oldvec;
+	bool			repair_only;
+
+	/* Do we even need to repair this thing? */
+	if (!(sri->sri_state[scrub_type] & repair_mask))
+		return false;
+
+	restore_oldvec(&oldvec, sri, scrub_type);
+
+	/*
+	 * If the caller boosted the priority of this scrub type on behalf of a
+	 * higher level repair by setting IFLAG_REPAIR, ignore REPAIR_ONLY.
+	 */
+	repair_only = (repair_flags & XRM_REPAIR_ONLY) &&
+		      !(sri->sri_state[scrub_type] & SCRUB_ITEM_BOOST_REPAIR);
+	if (!is_corrupt(&oldvec) && repair_only)
+		return false;
+
+	/*
+	 * Don't try to repair higher level items if their lower-level
+	 * dependencies haven't been verified, unless this is our last chance
+	 * to fix things without complaint.
+	 */
+	if (!(repair_flags & XRM_FINAL_WARNING) &&
+	    !repair_item_dependencies_ok(sri, scrub_type))
+		return false;
+
+	return true;
+}
+
+/*
+ * Repair some metadata.
+ *
+ * Returns 0 for success (or repair item deferral), or ECANCELED to abort the
+ * program.
+ */
+static int
+repair_call_kernel(
+	struct scrub_ctx		*ctx,
+	struct xfs_fd			*xfdp,
+	struct scrub_item		*sri,
+	__u32				repair_mask,
+	unsigned int			repair_flags)
+{
+	DEFINE_DESCR(dsc, ctx, format_scrubv_descr);
+	struct xfrog_scrubv		scrubv = { };
+	struct scrubv_descr		vdesc = SCRUBV_DESCR(&scrubv);
+	struct xfs_scrub_vec		*v;
+	unsigned int			scrub_type;
+	int				error;
+
+	assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL"));
+
+	xfrog_scrubv_from_item(&scrubv, sri);
+	descr_set(&dsc, &vdesc);
+
+	foreach_scrub_type(scrub_type) {
+		if (scrub_excessive_errors(ctx))
+			return ECANCELED;
+
+		if (!can_repair_now(sri, scrub_type, repair_mask,
+					repair_flags))
+			continue;
+
+		xfrog_scrubv_add_item(&scrubv, sri, scrub_type, true);
+
+		if (sri->sri_state[scrub_type] & SCRUB_ITEM_NEEDSREPAIR)
+			str_info(ctx, descr_render(&dsc),
+					_("Attempting repair."));
+		else if (debug || verbose)
+			str_info(ctx, descr_render(&dsc),
+					_("Attempting optimization."));
+
+		dbg_printf("repair %s flags %xh tries %u\n", descr_render(&dsc),
+				sri->sri_state[scrub_type],
+				sri->sri_tries[scrub_type]);
+	}
+
+	error = -xfrog_scrubv_metadata(xfdp, &scrubv);
+	if (error)
+		return error;
+
+	foreach_xfrog_scrubv_vec(&scrubv, vdesc.idx, v) {
+		error = repair_epilogue(ctx, &dsc, sri, repair_flags, v);
+		if (error)
+			return error;
+
+		/* Maybe update progress if we fixed the problem. */
+		if (!(repair_flags & XRM_NOPROGRESS) &&
+		    !(sri->sri_state[v->sv_type] & SCRUB_ITEM_REPAIR_ANY))
+			progress_add(1);
+	}
+
+	return 0;
+}
+
 /*
  * Prioritize action items in order of how long we can wait.
  *
@@ -632,29 +707,6 @@ action_list_process(
 	return ret;
 }
 
-/* Decide if the dependent scrub types of the given scrub type are ok. */
-static bool
-repair_item_dependencies_ok(
-	const struct scrub_item	*sri,
-	unsigned int		scrub_type)
-{
-	unsigned int		dep_mask = repair_deps[scrub_type];
-	unsigned int		b;
-
-	for (b = 0; dep_mask && b < XFS_SCRUB_TYPE_NR; b++, dep_mask >>= 1) {
-		if (!(dep_mask & 1))
-			continue;
-		/*
-		 * If this lower level object also needs repair, we can't fix
-		 * the higher level item.
-		 */
-		if (sri->sri_state[b] & SCRUB_ITEM_NEEDSREPAIR)
-			return false;
-	}
-
-	return true;
-}
-
 /*
  * For a given filesystem object, perform all repairs of a given class
  * (corrupt, xcorrupt, xfail, preen) if the repair item says it's needed.
@@ -670,13 +722,14 @@ repair_item_class(
 	struct xfs_fd			xfd;
 	struct scrub_item		old_sri;
 	struct xfs_fd			*xfdp = &ctx->mnt;
-	unsigned int			scrub_type;
 	int				error = 0;
 
 	if (ctx->mode == SCRUB_MODE_DRY_RUN)
 		return 0;
 	if (ctx->mode == SCRUB_MODE_PREEN && !(repair_mask & SCRUB_ITEM_PREEN))
 		return 0;
+	if (!scrub_item_schedule_work(sri, repair_mask))
+		return 0;
 
 	/*
 	 * If the caller passed us a file descriptor for a scrub, use it
@@ -689,39 +742,14 @@ repair_item_class(
 		xfdp = &xfd;
 	}
 
-	foreach_scrub_type(scrub_type) {
-		if (scrub_excessive_errors(ctx))
-			return ECANCELED;
-
-		if (!(sri->sri_state[scrub_type] & repair_mask))
-			continue;
-
-		/*
-		 * Don't try to repair higher level items if their lower-level
-		 * dependencies haven't been verified, unless this is our last
-		 * chance to fix things without complaint.
-		 */
-		if (!(flags & XRM_FINAL_WARNING) &&
-		    !repair_item_dependencies_ok(sri, scrub_type))
-			continue;
-
-		sri->sri_tries[scrub_type] = SCRUB_ITEM_MAX_RETRIES;
-		do {
-			memcpy(&old_sri, sri, sizeof(old_sri));
-			error = xfs_repair_metadata(ctx, xfdp, scrub_type, sri,
-					flags);
-			if (error)
-				return error;
-		} while (scrub_item_call_kernel_again(sri, scrub_type,
-					repair_mask, &old_sri));
-
-		/* Maybe update progress if we fixed the problem. */
-		if (!(flags & XRM_NOPROGRESS) &&
-		    !(sri->sri_state[scrub_type] & SCRUB_ITEM_REPAIR_ANY))
-			progress_add(1);
-	}
-
-	return error;
+	do {
+		memcpy(&old_sri, sri, sizeof(struct scrub_item));
+		error = repair_call_kernel(ctx, xfdp, sri, repair_mask, flags);
+		if (error)
+			return error;
+	} while (scrub_item_call_kernel_again(sri, repair_mask, &old_sri));
+
+	return 0;
 }
 
 /*
diff --git a/scrub/scrub.c b/scrub/scrub.c
index 0c77f947244a..d582dafbbe4e 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -64,35 +64,6 @@ format_scrubv_descr(
 	return -1;
 }
 
-/* Format a scrub description. */
-int
-format_scrub_descr(
-	struct scrub_ctx		*ctx,
-	char				*buf,
-	size_t				buflen,
-	void				*where)
-{
-	struct xfs_scrub_metadata	*meta = where;
-	const struct xfrog_scrub_descr	*sc = &xfrog_scrubbers[meta->sm_type];
-
-	switch (sc->group) {
-	case XFROG_SCRUB_GROUP_AGHEADER:
-	case XFROG_SCRUB_GROUP_PERAG:
-		return snprintf(buf, buflen, _("AG %u %s"), meta->sm_agno,
-				_(sc->descr));
-	case XFROG_SCRUB_GROUP_INODE:
-		return scrub_render_ino_descr(ctx, buf, buflen,
-				meta->sm_ino, meta->sm_gen, "%s",
-				_(sc->descr));
-	case XFROG_SCRUB_GROUP_FS:
-	case XFROG_SCRUB_GROUP_SUMMARY:
-	case XFROG_SCRUB_GROUP_ISCAN:
-	case XFROG_SCRUB_GROUP_NONE:
-		return snprintf(buf, buflen, _("%s"), _(sc->descr));
-	}
-	return -1;
-}
-
 /* Warn about strange circumstances after scrub. */
 void
 scrub_warn_incomplete_scrub(
@@ -271,12 +242,17 @@ void
 xfrog_scrubv_add_item(
 	struct xfrog_scrubv		*scrubv,
 	const struct scrub_item		*sri,
-	unsigned int			scrub_type)
+	unsigned int			scrub_type,
+	bool				want_repair)
 {
 	struct xfs_scrub_vec		*v;
 
 	v = xfrog_scrubv_next_vector(scrubv);
 	v->sv_type = scrub_type;
+	if (want_repair)
+		v->sv_flags |= XFS_SCRUB_IFLAG_REPAIR;
+	if (want_repair && use_force_rebuild)
+		v->sv_flags |= XFS_SCRUB_IFLAG_FORCE_REBUILD;
 }
 
 /* Do a read-only check of some metadata. */
@@ -301,7 +277,7 @@ scrub_call_kernel(
 	foreach_scrub_type(scrub_type) {
 		if (!(sri->sri_state[scrub_type] & SCRUB_ITEM_NEEDSCHECK))
 			continue;
-		xfrog_scrubv_add_item(&scrubv, sri, scrub_type);
+		xfrog_scrubv_add_item(&scrubv, sri, scrub_type, false);
 
 		dbg_printf("check %s flags %xh tries %u\n", descr_render(&dsc),
 				sri->sri_state[scrub_type],
@@ -365,8 +341,8 @@ scrub_item_schedule_group(
 }
 
 /* Decide if we call the kernel again to finish scrub/repair activity. */
-static inline bool
-scrub_item_call_kernel_again_future(
+bool
+scrub_item_call_kernel_again(
 	struct scrub_item	*sri,
 	uint8_t			work_mask,
 	const struct scrub_item	*old)
@@ -382,6 +358,11 @@ scrub_item_call_kernel_again_future(
 	if (!nr)
 		return false;
 
+	/*
+	 * We are willing to go again if the last call had any effect on the
+	 * state of the scrub item that the caller cares about or if the kernel
+	 * asked us to try again.
+	 */
 	foreach_scrub_type(scrub_type) {
 		uint8_t		statex = sri->sri_state[scrub_type] ^
 					 old->sri_state[scrub_type];
@@ -395,34 +376,6 @@ scrub_item_call_kernel_again_future(
 	return false;
 }
 
-/* Decide if we call the kernel again to finish scrub/repair activity. */
-bool
-scrub_item_call_kernel_again(
-	struct scrub_item	*sri,
-	unsigned int		scrub_type,
-	uint8_t			work_mask,
-	const struct scrub_item	*old)
-{
-	uint8_t			statex;
-
-	/* If there's nothing to do, we're done. */
-	if (!(sri->sri_state[scrub_type] & work_mask))
-		return false;
-
-	/*
-	 * We are willing to go again if the last call had any effect on the
-	 * state of the scrub item that the caller cares about, if the freeze
-	 * flag got set, or if the kernel asked us to try again...
-	 */
-	statex = sri->sri_state[scrub_type] ^ old->sri_state[scrub_type];
-	if (statex & work_mask)
-		return true;
-	if (sri->sri_tries[scrub_type] != old->sri_tries[scrub_type])
-		return true;
-
-	return false;
-}
-
 /*
  * For each scrub item whose state matches the state_flags, set up the item
  * state for a kernel call.  Returns true if any work was scheduled.
@@ -477,7 +430,7 @@ scrub_item_check_file(
 		error = scrub_call_kernel(ctx, xfdp, sri);
 		if (error)
 			return error;
-	} while (scrub_item_call_kernel_again_future(sri, SCRUB_ITEM_NEEDSCHECK,
+	} while (scrub_item_call_kernel_again(sri, SCRUB_ITEM_NEEDSCHECK,
 				&old_sri));
 
 	return 0;
diff --git a/scrub/scrub_private.h b/scrub/scrub_private.h
index bf53ee5af2cf..d9de18ce1795 100644
--- a/scrub/scrub_private.h
+++ b/scrub/scrub_private.h
@@ -11,10 +11,8 @@
 void xfrog_scrubv_from_item(struct xfrog_scrubv *scrubv,
 		const struct scrub_item *sri);
 void xfrog_scrubv_add_item(struct xfrog_scrubv *scrubv,
-		const struct scrub_item *sri, unsigned int scrub_type);
-
-int format_scrub_descr(struct scrub_ctx *ctx, char *buf, size_t buflen,
-		void *where);
+		const struct scrub_item *sri, unsigned int scrub_type,
+		bool want_repair);
 
 struct scrubv_descr {
 	struct xfrog_scrubv	*scrubv;
@@ -116,8 +114,7 @@ scrub_item_schedule_retry(struct scrub_item *sri, unsigned int scrub_type)
 	return true;
 }
 
-bool scrub_item_call_kernel_again(struct scrub_item *sri,
-		unsigned int scrub_type, uint8_t work_mask,
+bool scrub_item_call_kernel_again(struct scrub_item *sri, uint8_t work_mask,
 		const struct scrub_item *old);
 bool scrub_item_schedule_work(struct scrub_item *sri, uint8_t state_flags);
 


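With the per-type loop gone, `scrub_item_call_kernel_again()` decides whether to issue another vectored call for the whole item: stop if no type has work left, otherwise go again only if some type's work bits changed or its retry counter moved. A standalone C model of that decision follows; `sketch_item` and `call_again()` are hypothetical names, and `NR_TYPES` is shrunk to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_TYPES 4	/* hypothetical, small for illustration */

struct sketch_item {
	uint8_t		state[NR_TYPES];	/* per-type work bits */
	unsigned int	tries[NR_TYPES];	/* per-type retry budget */
};

static bool
call_again(const struct sketch_item *now, uint8_t work_mask,
	   const struct sketch_item *old)
{
	unsigned int type;
	bool any_work = false;

	/* Nothing left to do for any type: stop. */
	for (type = 0; type < NR_TYPES; type++)
		if (now->state[type] & work_mask)
			any_work = true;
	if (!any_work)
		return false;

	/* Go again only if the last call changed something we care about. */
	for (type = 0; type < NR_TYPES; type++) {
		if ((now->state[type] ^ old->state[type]) & work_mask)
			return true;
		if (now->tries[type] != old->tries[type])
			return true;
	}
	return false;
}
```

The second loop matters because a call that changes no work bits and schedules no retry would otherwise make the caller's `do`/`while` loop spin forever.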

* [PATCH 09/10] xfs_scrub: use scrub barriers to reduce kernel calls
  2024-07-02  0:53 ` [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-07-02  1:23   ` [PATCH 08/10] xfs_scrub: vectorize repair calls Darrick J. Wong
@ 2024-07-02  1:24   ` Darrick J. Wong
  2024-07-02  6:58     ` Christoph Hellwig
  2024-07-02  1:24   ` [PATCH 10/10] xfs_scrub: try spot repairs of metadata items to make scrub progress Darrick J. Wong
  9 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:24 UTC
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Use scrub barriers so that we can submit a single scrub request for a
batch of metadata types and have the kernel stop at a barrier if any
item before it was found to be broken.
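
A rough userspace model of the barrier semantics this patch relies on, written in C. It assumes the kernel walks the vector in order and, on reaching a barrier, stops with `-ECANCELED` if any earlier non-barrier item reported an output flag in the barrier's `sv_flags` mask; the real check lives in the kernel and may differ in detail. All names and flag values below are hypothetical, and only two of the four flags that `xfrog_scrubv_add_barrier()` sets are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical constants standing in for the real scrub ABI values. */
#define TYPE_BARRIER	0xff
#define OFLAG_CORRUPT	(1U << 0)
#define OFLAG_XFAIL	(1U << 1)

struct sketch_vec {
	unsigned int	type;
	unsigned int	flags;	/* barrier: failure mask; item: output flags */
	int		ret;
	bool		ran;	/* did the simulated kernel process this item? */
};

/*
 * Simulated kernel loop.  check() supplies the output flags for an item.
 * A barrier stops processing with -ECANCELED if any earlier non-barrier
 * item reported a flag in the barrier's mask.
 */
static void
run_vector(struct sketch_vec *v, unsigned int nr,
	   unsigned int (*check)(unsigned int type))
{
	unsigned int i, j;

	for (i = 0; i < nr; i++) {
		if (v[i].type != TYPE_BARRIER) {
			v[i].flags = check(v[i].type);
			v[i].ret = 0;
			v[i].ran = true;
			continue;
		}
		for (j = 0; j < i; j++) {
			if (v[j].type != TYPE_BARRIER &&
			    (v[j].flags & v[i].flags)) {
				v[i].ret = -ECANCELED;
				return;
			}
		}
	}
}

/* Pretend type 1 is corrupt and everything else is clean. */
static unsigned int
fake_check(unsigned int type)
{
	return type == 1 ? OFLAG_CORRUPT : 0;
}
```

On the userspace side, `repair_call_kernel()` treats a barrier with `-ECANCELED` as a normal early stop and returns 0 without running the epilogue on the items after it, so `scrub_item_call_kernel_again()` can decide whether another call is needed.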

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase2.c        |   15 ++--------
 scrub/phase3.c        |   17 +----------
 scrub/repair.c        |   32 +++++++++++++++++++-
 scrub/scrub.c         |   77 ++++++++++++++++++++++++++++++++++++++++++++++++-
 scrub/scrub.h         |   17 +++++++++++
 scrub/scrub_private.h |    4 ++-
 6 files changed, 130 insertions(+), 32 deletions(-)


diff --git a/scrub/phase2.c b/scrub/phase2.c
index 57c6d0ef2137..d435da07125a 100644
--- a/scrub/phase2.c
+++ b/scrub/phase2.c
@@ -91,21 +91,12 @@ scan_ag_metadata(
 	snprintf(descr, DESCR_BUFSZ, _("AG %u"), agno);
 
 	/*
-	 * First we scrub and fix the AG headers, because we need
-	 * them to work well enough to check the AG btrees.
+	 * First we scrub and fix the AG headers, because we need them to work
+	 * well enough to check the AG btrees.  Then scrub the AG btrees.
 	 */
 	scrub_item_schedule_group(&sri, XFROG_SCRUB_GROUP_AGHEADER);
-	ret = scrub_item_check(ctx, &sri);
-	if (ret)
-		goto err;
-
-	/* Repair header damage. */
-	ret = repair_item_corruption(ctx, &sri);
-	if (ret)
-		goto err;
-
-	/* Now scrub the AG btrees. */
 	scrub_item_schedule_group(&sri, XFROG_SCRUB_GROUP_PERAG);
+
 	ret = scrub_item_check(ctx, &sri);
 	if (ret)
 		goto err;
diff --git a/scrub/phase3.c b/scrub/phase3.c
index 98e5c5a1f9f4..09a1ea452bb9 100644
--- a/scrub/phase3.c
+++ b/scrub/phase3.c
@@ -145,25 +145,11 @@ scrub_inode(
 
 	/* Scrub the inode. */
 	scrub_item_schedule(&sri, XFS_SCRUB_TYPE_INODE);
-	error = scrub_item_check_file(ctx, &sri, fd);
-	if (error)
-		goto out;
-
-	error = try_inode_repair(ictx, &sri, fd);
-	if (error)
-		goto out;
 
 	/* Scrub all block mappings. */
 	scrub_item_schedule(&sri, XFS_SCRUB_TYPE_BMBTD);
 	scrub_item_schedule(&sri, XFS_SCRUB_TYPE_BMBTA);
 	scrub_item_schedule(&sri, XFS_SCRUB_TYPE_BMBTC);
-	error = scrub_item_check_file(ctx, &sri, fd);
-	if (error)
-		goto out;
-
-	error = try_inode_repair(ictx, &sri, fd);
-	if (error)
-		goto out;
 
 	/*
 	 * Check file data contents, e.g. symlink and directory entries.
@@ -182,11 +168,12 @@ scrub_inode(
 
 	scrub_item_schedule(&sri, XFS_SCRUB_TYPE_XATTR);
 	scrub_item_schedule(&sri, XFS_SCRUB_TYPE_PARENT);
+
+	/* Try to check and repair the file while it's open. */
 	error = scrub_item_check_file(ctx, &sri, fd);
 	if (error)
 		goto out;
 
-	/* Try to repair the file while it's open. */
 	error = try_inode_repair(ictx, &sri, fd);
 	if (error)
 		goto out;
diff --git a/scrub/repair.c b/scrub/repair.c
index 6a5fd40fd02f..8a28f6b13d87 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -324,6 +324,7 @@ repair_call_kernel(
 	struct scrubv_descr		vdesc = SCRUBV_DESCR(&scrubv);
 	struct xfs_scrub_vec		*v;
 	unsigned int			scrub_type;
+	bool				need_barrier = false;
 	int				error;
 
 	assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL"));
@@ -339,6 +340,11 @@ repair_call_kernel(
 					repair_flags))
 			continue;
 
+		if (need_barrier) {
+			xfrog_scrubv_add_barrier(&scrubv);
+			need_barrier = false;
+		}
+
 		xfrog_scrubv_add_item(&scrubv, sri, scrub_type, true);
 
 		if (sri->sri_state[scrub_type] & SCRUB_ITEM_NEEDSREPAIR)
@@ -351,6 +357,17 @@ repair_call_kernel(
 		dbg_printf("repair %s flags %xh tries %u\n", descr_render(&dsc),
 				sri->sri_state[scrub_type],
 				sri->sri_tries[scrub_type]);
+
+		/*
+		 * One of the other scrub types depends on this one.  Set us up
+		 * to add a repair barrier if we decide to schedule a repair
+		 * after this one.  If XRM_FINAL_WARNING is set, this is our
+		 * last chance to fix things, so we skip the barriers and
+		 * just let everything run.
+		 */
+		if (!(repair_flags & XRM_FINAL_WARNING) &&
+		    (sri->sri_state[scrub_type] & SCRUB_ITEM_BARRIER))
+			need_barrier = true;
 	}
 
 	error = -xfrog_scrubv_metadata(xfdp, &scrubv);
@@ -358,6 +375,16 @@ repair_call_kernel(
 		return error;
 
 	foreach_xfrog_scrubv_vec(&scrubv, vdesc.idx, v) {
+		/* Deal with barriers separately. */
+		if (v->sv_type == XFS_SCRUB_TYPE_BARRIER) {
+			/* -ECANCELED means the kernel stopped here. */
+			if (v->sv_ret == -ECANCELED)
+				return 0;
+			if (v->sv_ret)
+				return -v->sv_ret;
+			continue;
+		}
+
 		error = repair_epilogue(ctx, &dsc, sri, repair_flags, v);
 		if (error)
 			return error;
@@ -446,7 +473,8 @@ repair_item_boost_priorities(
  * bits are left untouched to force a rescan in phase 4.
  */
 #define MUSTFIX_STATES	(SCRUB_ITEM_CORRUPT | \
-			 SCRUB_ITEM_BOOST_REPAIR)
+			 SCRUB_ITEM_BOOST_REPAIR | \
+			 SCRUB_ITEM_BARRIER)
 /*
  * Figure out which AG metadata must be fixed before we can move on
  * to the inode scan.
@@ -728,7 +756,7 @@ repair_item_class(
 		return 0;
 	if (ctx->mode == SCRUB_MODE_PREEN && !(repair_mask & SCRUB_ITEM_PREEN))
 		return 0;
-	if (!scrub_item_schedule_work(sri, repair_mask))
+	if (!scrub_item_schedule_work(sri, repair_mask, repair_deps))
 		return 0;
 
 	/*
diff --git a/scrub/scrub.c b/scrub/scrub.c
index d582dafbbe4e..44c4049899d2 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -24,6 +24,35 @@
 
 /* Online scrub and repair wrappers. */
 
+/*
+ * Bitmap showing the correctness dependencies between scrub types for scrubs.
+ * Dependencies cannot cross scrub groups.
+ */
+#define DEP(x) (1U << (x))
+static const unsigned int scrub_deps[XFS_SCRUB_TYPE_NR] = {
+	[XFS_SCRUB_TYPE_AGF]		= DEP(XFS_SCRUB_TYPE_SB),
+	[XFS_SCRUB_TYPE_AGFL]		= DEP(XFS_SCRUB_TYPE_SB) |
+					  DEP(XFS_SCRUB_TYPE_AGF),
+	[XFS_SCRUB_TYPE_AGI]		= DEP(XFS_SCRUB_TYPE_SB),
+	[XFS_SCRUB_TYPE_BNOBT]		= DEP(XFS_SCRUB_TYPE_AGF),
+	[XFS_SCRUB_TYPE_CNTBT]		= DEP(XFS_SCRUB_TYPE_AGF),
+	[XFS_SCRUB_TYPE_INOBT]		= DEP(XFS_SCRUB_TYPE_AGI),
+	[XFS_SCRUB_TYPE_FINOBT]		= DEP(XFS_SCRUB_TYPE_AGI),
+	[XFS_SCRUB_TYPE_RMAPBT]		= DEP(XFS_SCRUB_TYPE_AGF),
+	[XFS_SCRUB_TYPE_REFCNTBT]	= DEP(XFS_SCRUB_TYPE_AGF),
+	[XFS_SCRUB_TYPE_BMBTD]		= DEP(XFS_SCRUB_TYPE_INODE),
+	[XFS_SCRUB_TYPE_BMBTA]		= DEP(XFS_SCRUB_TYPE_INODE),
+	[XFS_SCRUB_TYPE_BMBTC]		= DEP(XFS_SCRUB_TYPE_INODE),
+	[XFS_SCRUB_TYPE_DIR]		= DEP(XFS_SCRUB_TYPE_BMBTD),
+	[XFS_SCRUB_TYPE_XATTR]		= DEP(XFS_SCRUB_TYPE_BMBTA),
+	[XFS_SCRUB_TYPE_SYMLINK]	= DEP(XFS_SCRUB_TYPE_BMBTD),
+	[XFS_SCRUB_TYPE_PARENT]		= DEP(XFS_SCRUB_TYPE_BMBTD),
+	[XFS_SCRUB_TYPE_QUOTACHECK]	= DEP(XFS_SCRUB_TYPE_UQUOTA) |
+					  DEP(XFS_SCRUB_TYPE_GQUOTA) |
+					  DEP(XFS_SCRUB_TYPE_PQUOTA),
+};
+#undef DEP
+
 /* Describe the current state of a vectored scrub. */
 int
 format_scrubv_descr(
@@ -255,6 +284,20 @@ xfrog_scrubv_add_item(
 		v->sv_flags |= XFS_SCRUB_IFLAG_FORCE_REBUILD;
 }
 
+/* Add a barrier to the scrub vector. */
+void
+xfrog_scrubv_add_barrier(
+	struct xfrog_scrubv		*scrubv)
+{
+	struct xfs_scrub_vec		*v;
+
+	v = xfrog_scrubv_next_vector(scrubv);
+
+	v->sv_type = XFS_SCRUB_TYPE_BARRIER;
+	v->sv_flags = XFS_SCRUB_OFLAG_CORRUPT | XFS_SCRUB_OFLAG_XFAIL |
+		      XFS_SCRUB_OFLAG_XCORRUPT | XFS_SCRUB_OFLAG_INCOMPLETE;
+}
+
 /* Do a read-only check of some metadata. */
 static int
 scrub_call_kernel(
@@ -267,6 +310,7 @@ scrub_call_kernel(
 	struct scrubv_descr		vdesc = SCRUBV_DESCR(&scrubv);
 	struct xfs_scrub_vec		*v;
 	unsigned int			scrub_type;
+	bool				need_barrier = false;
 	int				error;
 
 	assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL"));
@@ -277,8 +321,17 @@ scrub_call_kernel(
 	foreach_scrub_type(scrub_type) {
 		if (!(sri->sri_state[scrub_type] & SCRUB_ITEM_NEEDSCHECK))
 			continue;
+
+		if (need_barrier) {
+			xfrog_scrubv_add_barrier(&scrubv);
+			need_barrier = false;
+		}
+
 		xfrog_scrubv_add_item(&scrubv, sri, scrub_type, false);
 
+		if (sri->sri_state[scrub_type] & SCRUB_ITEM_BARRIER)
+			need_barrier = true;
+
 		dbg_printf("check %s flags %xh tries %u\n", descr_render(&dsc),
 				sri->sri_state[scrub_type],
 				sri->sri_tries[scrub_type]);
@@ -289,6 +342,16 @@ scrub_call_kernel(
 		return error;
 
 	foreach_xfrog_scrubv_vec(&scrubv, vdesc.idx, v) {
+		/* Deal with barriers separately. */
+		if (v->sv_type == XFS_SCRUB_TYPE_BARRIER) {
+			/* -ECANCELED means the kernel stopped here. */
+			if (v->sv_ret == -ECANCELED)
+				return 0;
+			if (v->sv_ret)
+				return -v->sv_ret;
+			continue;
+		}
+
 		error = scrub_epilogue(ctx, &dsc, sri, v);
 		if (error)
 			return error;
@@ -383,15 +446,25 @@ scrub_item_call_kernel_again(
 bool
 scrub_item_schedule_work(
 	struct scrub_item	*sri,
-	uint8_t			state_flags)
+	uint8_t			state_flags,
+	const unsigned int	*schedule_deps)
 {
 	unsigned int		scrub_type;
 	unsigned int		nr = 0;
 
 	foreach_scrub_type(scrub_type) {
+		unsigned int	j;
+
+		sri->sri_state[scrub_type] &= ~SCRUB_ITEM_BARRIER;
+
 		if (!(sri->sri_state[scrub_type] & state_flags))
 			continue;
 
+		foreach_scrub_type(j) {
+			if (schedule_deps[scrub_type] & (1U << j))
+				sri->sri_state[j] |= SCRUB_ITEM_BARRIER;
+		}
+
 		sri->sri_tries[scrub_type] = SCRUB_ITEM_MAX_RETRIES;
 		nr++;
 	}
@@ -411,7 +484,7 @@ scrub_item_check_file(
 	struct xfs_fd			*xfdp = &ctx->mnt;
 	int				error = 0;
 
-	if (!scrub_item_schedule_work(sri, SCRUB_ITEM_NEEDSCHECK))
+	if (!scrub_item_schedule_work(sri, SCRUB_ITEM_NEEDSCHECK, scrub_deps))
 		return 0;
 
 	/*
diff --git a/scrub/scrub.h b/scrub/scrub.h
index 183b89379cb4..c3eed1b261d5 100644
--- a/scrub/scrub.h
+++ b/scrub/scrub.h
@@ -30,6 +30,9 @@ enum xfrog_scrub_group;
 /* This scrub type needs to be checked. */
 #define SCRUB_ITEM_NEEDSCHECK	(1 << 5)
 
+/* Scrub barrier. */
+#define SCRUB_ITEM_BARRIER	(1 << 6)
+
 /* All of the state flags that we need to prioritize repair work. */
 #define SCRUB_ITEM_REPAIR_ANY	(SCRUB_ITEM_CORRUPT | \
 				 SCRUB_ITEM_PREEN | \
@@ -126,6 +129,20 @@ scrub_item_check(struct scrub_ctx *ctx, struct scrub_item *sri)
 	return scrub_item_check_file(ctx, sri, -1);
 }
 
+/* Count the number of metadata objects still needing a scrub. */
+static inline unsigned int
+scrub_item_count_needscheck(
+	const struct scrub_item		*sri)
+{
+	unsigned int			ret = 0;
+	unsigned int			i;
+
+	foreach_scrub_type(i)
+		if (sri->sri_state[i] & SCRUB_ITEM_NEEDSCHECK)
+			ret++;
+	return ret;
+}
+
 void scrub_report_preen_triggers(struct scrub_ctx *ctx);
 
 bool can_scrub_fs_metadata(struct scrub_ctx *ctx);
diff --git a/scrub/scrub_private.h b/scrub/scrub_private.h
index d9de18ce1795..c5e0a7c08be0 100644
--- a/scrub/scrub_private.h
+++ b/scrub/scrub_private.h
@@ -13,6 +13,7 @@ void xfrog_scrubv_from_item(struct xfrog_scrubv *scrubv,
 void xfrog_scrubv_add_item(struct xfrog_scrubv *scrubv,
 		const struct scrub_item *sri, unsigned int scrub_type,
 		bool want_repair);
+void xfrog_scrubv_add_barrier(struct xfrog_scrubv *scrubv);
 
 struct scrubv_descr {
 	struct xfrog_scrubv	*scrubv;
@@ -116,6 +117,7 @@ scrub_item_schedule_retry(struct scrub_item *sri, unsigned int scrub_type)
 
 bool scrub_item_call_kernel_again(struct scrub_item *sri, uint8_t work_mask,
 		const struct scrub_item *old);
-bool scrub_item_schedule_work(struct scrub_item *sri, uint8_t state_flags);
+bool scrub_item_schedule_work(struct scrub_item *sri, uint8_t state_flags,
+		const unsigned int *schedule_deps);
 
 #endif /* XFS_SCRUB_SCRUB_PRIVATE_H_ */
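
The barrier scheduling in scrub.c above has two steps. First, scrub_item_schedule_work() flags every type that a scheduled type depends on (per scrub_deps[]) with SCRUB_ITEM_BARRIER. Then scrub_call_kernel() places an XFS_SCRUB_TYPE_BARRIER vector after each flagged item, but only when another item follows it. What follows is a minimal userspace sketch of that bookkeeping alone. It assumes a reduced, hypothetical set of four types (T_SB, T_AGF, T_AGFL, T_BNOBT), uses -1 to stand for a barrier slot, and makes no ioctl calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of scrub types; the real table is scrub_deps[]. */
enum { T_SB, T_AGF, T_AGFL, T_BNOBT, T_NR };

#define DEP(x)          (1U << (x))
#define ITEM_NEEDSCHECK (1U << 5)   /* stands in for SCRUB_ITEM_NEEDSCHECK */
#define ITEM_BARRIER    (1U << 6)   /* stands in for SCRUB_ITEM_BARRIER */
#define VEC_BARRIER     (-1)        /* marks a barrier slot in the call list */

/* deps[t] is the bitmap of types that must be clean before t is checked. */
static const unsigned int deps[T_NR] = {
	[T_AGF]   = DEP(T_SB),
	[T_AGFL]  = DEP(T_SB) | DEP(T_AGF),
	[T_BNOBT] = DEP(T_AGF),
};

/* Flag every type that some scheduled type depends on as a barrier. */
static void mark_barriers(unsigned int state[T_NR])
{
	for (unsigned int t = 0; t < T_NR; t++)
		state[t] &= ~ITEM_BARRIER;

	for (unsigned int t = 0; t < T_NR; t++) {
		if (!(state[t] & ITEM_NEEDSCHECK))
			continue;
		for (unsigned int j = 0; j < T_NR; j++)
			if (deps[t] & DEP(j))
				state[j] |= ITEM_BARRIER;
	}
}

/*
 * Emit the scheduled types in order, inserting a barrier slot after a
 * barrier-flagged type only when another scheduled type follows it.
 * out[] must hold 2 * T_NR slots; returns the number of slots written.
 */
static size_t build_call_list(const unsigned int state[T_NR], int out[])
{
	size_t n = 0;
	int need_barrier = 0;

	for (unsigned int t = 0; t < T_NR; t++) {
		if (!(state[t] & ITEM_NEEDSCHECK))
			continue;
		if (need_barrier) {
			out[n++] = VEC_BARRIER;
			need_barrier = 0;
		}
		out[n++] = (int)t;
		if (state[t] & ITEM_BARRIER)
			need_barrier = 1;
	}
	assert(n <= 2 * T_NR);
	return n;
}
```

With all four types scheduled, SB and AGF become barriers and the call list comes out as SB, barrier, AGF, barrier, AGFL, BNOBT. The sketch clears the barrier bits in a separate pass for clarity, where the patch clears them inside its single loop. It also does not model the kernel side, where the patch treats -ECANCELED on a barrier vector as "the kernel stopped here".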


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 10/10] xfs_scrub: try spot repairs of metadata items to make scrub progress
  2024-07-02  0:53 ` [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-07-02  1:24   ` [PATCH 09/10] xfs_scrub: use scrub barriers to reduce kernel calls Darrick J. Wong
@ 2024-07-02  1:24   ` Darrick J. Wong
  2024-07-02  6:59     ` Christoph Hellwig
  9 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:24 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Now that we've enabled scrub dependency barriers, it's possible that a
scrub_item_check call will return with some of the scrub items still in
NEEDSCHECK state.  If, for example, scrub type B depends on scrub type
A being clean and A is not clean, B will still be in NEEDSCHECK state.

In order to make as much scanning progress as possible during phase 2
and phase 3, allow ourselves to try some spot repairs in the hopes that
it will enable us to make progress towards at least scanning the whole
metadata item.  If we can't make any forward progress, we'll queue the
scrub item for repair in phase 4, which means that anything still in
NEEDSCHECK state becomes CORRUPT state.  (At worst, the NEEDSCHECK item
will actually be clean by phase 4, and xfs_scrub will report that it
didn't need any work after all.)

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase2.c |   78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/phase3.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 scrub/repair.c |   15 +++++++++++
 3 files changed, 163 insertions(+), 1 deletion(-)
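
repair_and_scrub_loop() and repair_and_scrub_inode_loop() below share one pattern. Each counts the items still in NEEDSCHECK, runs a round of repairs, re-checks, and defers to phase 4 as soon as a round leaves the count unchanged. The following is a minimal sketch of that control flow. Hypothetical callbacks take the place of repair_item_corruption(), scrub_item_check() and scrub_item_count_needscheck(), and a toy model, assumed purely for illustration, clears a fixed number of items per check round.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks; each action returns 0 or a negative errno. */
struct progress_ops {
	int (*repair)(void *priv);
	int (*check)(void *priv);
	unsigned int (*count_needscheck)(void *priv);
};

/*
 * Alternate repair and re-check until nothing is left unchecked, or until
 * a round leaves the unchecked count unchanged.  *defer is set in the latter
 * case so the caller can hand the remainder to a later pass (phase 4).
 */
static int repair_until_stuck(const struct progress_ops *ops, void *priv,
		int *defer)
{
	unsigned int to_check;
	int ret;

	assert(ops != NULL && defer != NULL);
	*defer = 0;
	to_check = ops->count_needscheck(priv);
	while (to_check > 0) {
		unsigned int nr;

		ret = ops->repair(priv);
		if (ret)
			return ret;
		ret = ops->check(priv);
		if (ret)
			return ret;

		nr = ops->count_needscheck(priv);
		if (nr == to_check) {
			*defer = 1;	/* no forward progress this round */
			return 0;
		}
		to_check = nr;
	}
	return 0;
}

/* Toy model: repair is a no-op; each check clears up to fix_per_round items. */
struct toy {
	unsigned int unchecked;
	unsigned int fix_per_round;
};

static int toy_repair(void *p) { (void)p; return 0; }

static int toy_check(void *p)
{
	struct toy *t = p;
	unsigned int n = t->fix_per_round;

	if (n > t->unchecked)
		n = t->unchecked;
	t->unchecked -= n;
	return 0;
}

static unsigned int toy_count(void *p) { return ((struct toy *)p)->unchecked; }

static const struct progress_ops toy_ops = { toy_repair, toy_check, toy_count };
```

With fix_per_round = 1 and three unchecked items, the loop runs three rounds and finishes with nothing deferred. With fix_per_round = 0, the first round makes no progress, so *defer is set and all three items are left for phase 4.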


diff --git a/scrub/phase2.c b/scrub/phase2.c
index d435da07125a..c24d137358c7 100644
--- a/scrub/phase2.c
+++ b/scrub/phase2.c
@@ -69,6 +69,53 @@ defer_fs_repair(
 	return 0;
 }
 
+/*
+ * If we couldn't check all the scheduled metadata items, try performing spot
+ * repairs until we check everything or stop making forward progress.
+ */
+static int
+repair_and_scrub_loop(
+	struct scrub_ctx	*ctx,
+	struct scrub_item	*sri,
+	const char		*descr,
+	bool			*defer)
+{
+	unsigned int		to_check;
+	int			ret;
+
+	*defer = false;
+	if (ctx->mode != SCRUB_MODE_REPAIR)
+		return 0;
+
+	to_check = scrub_item_count_needscheck(sri);
+	while (to_check > 0) {
+		unsigned int	nr;
+
+		ret = repair_item_corruption(ctx, sri);
+		if (ret)
+			return ret;
+
+		ret = scrub_item_check(ctx, sri);
+		if (ret)
+			return ret;
+
+		nr = scrub_item_count_needscheck(sri);
+		if (nr == to_check) {
+			/*
+			 * We cannot make forward scanning progress with this
+			 * metadata, so defer the rest until phase 4.
+			 */
+			str_info(ctx, descr,
+ _("Unable to make forward checking progress; will try again in phase 4."));
+			*defer = true;
+			return 0;
+		}
+		to_check = nr;
+	}
+
+	return 0;
+}
+
 /* Scrub each AG's metadata btrees. */
 static void
 scan_ag_metadata(
@@ -82,6 +129,7 @@ scan_ag_metadata(
 	struct scan_ctl			*sctl = arg;
 	char				descr[DESCR_BUFSZ];
 	unsigned int			difficulty;
+	bool				defer_repairs;
 	int				ret;
 
 	if (sctl->aborted)
@@ -97,10 +145,22 @@ scan_ag_metadata(
 	scrub_item_schedule_group(&sri, XFROG_SCRUB_GROUP_AGHEADER);
 	scrub_item_schedule_group(&sri, XFROG_SCRUB_GROUP_PERAG);
 
+	/*
+	 * Try to check all of the AG metadata items that we just scheduled.
+	 * If we return with some types still needing a check, try repairing
+	 * any damaged metadata that we've found so far, and try again.  Abort
+	 * if we stop making forward progress.
+	 */
 	ret = scrub_item_check(ctx, &sri);
 	if (ret)
 		goto err;
 
+	ret = repair_and_scrub_loop(ctx, &sri, descr, &defer_repairs);
+	if (ret)
+		goto err;
+	if (defer_repairs)
+		goto defer;
+
 	/*
 	 * Figure out if we need to perform early fixing.  The only
 	 * reason we need to do this is if the inobt is broken, which
@@ -117,6 +177,7 @@ scan_ag_metadata(
 	if (ret)
 		goto err;
 
+defer:
 	/* Everything else gets fixed during phase 4. */
 	ret = defer_fs_repair(ctx, &sri);
 	if (ret)
@@ -137,11 +198,18 @@ scan_fs_metadata(
 	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
 	struct scan_ctl		*sctl = arg;
 	unsigned int		difficulty;
+	bool			defer_repairs;
 	int			ret;
 
 	if (sctl->aborted)
 		goto out;
 
+	/*
+	 * Try to check all of the metadata files that we just scheduled.  If
+	 * we return with some types still needing a check, try repairing any
+	 * damaged metadata that we've found so far, and try again.  Abort if
+	 * we stop making forward progress.
+	 */
 	scrub_item_init_fs(&sri);
 	scrub_item_schedule(&sri, type);
 	ret = scrub_item_check(ctx, &sri);
@@ -150,10 +218,20 @@ scan_fs_metadata(
 		goto out;
 	}
 
+	ret = repair_and_scrub_loop(ctx, &sri, xfrog_scrubbers[type].descr,
+			&defer_repairs);
+	if (ret) {
+		sctl->aborted = true;
+		goto out;
+	}
+	if (defer_repairs)
+		goto defer;
+
 	/* Complain about metadata corruptions that might not be fixable. */
 	difficulty = repair_item_difficulty(&sri);
 	warn_repair_difficulties(ctx, difficulty, xfrog_scrubbers[type].descr);
 
+defer:
 	ret = defer_fs_repair(ctx, &sri);
 	if (ret) {
 		sctl->aborted = true;
diff --git a/scrub/phase3.c b/scrub/phase3.c
index 09a1ea452bb9..046a42c1da8b 100644
--- a/scrub/phase3.c
+++ b/scrub/phase3.c
@@ -99,6 +99,58 @@ try_inode_repair(
 	return repair_file_corruption(ictx->ctx, sri, fd);
 }
 
+/*
+ * If we couldn't check all the scheduled file metadata items, try performing
+ * spot repairs until we check everything or stop making forward progress.
+ */
+static int
+repair_and_scrub_inode_loop(
+	struct scrub_ctx	*ctx,
+	struct xfs_bulkstat	*bstat,
+	int			fd,
+	struct scrub_item	*sri,
+	bool			*defer)
+{
+	unsigned int		to_check;
+	int			error;
+
+	*defer = false;
+	if (ctx->mode != SCRUB_MODE_REPAIR)
+		return 0;
+
+	to_check = scrub_item_count_needscheck(sri);
+	while (to_check > 0) {
+		unsigned int	nr;
+
+		error = repair_file_corruption(ctx, sri, fd);
+		if (error)
+			return error;
+
+		error = scrub_item_check_file(ctx, sri, fd);
+		if (error)
+			return error;
+
+		nr = scrub_item_count_needscheck(sri);
+		if (nr == to_check) {
+			char	descr[DESCR_BUFSZ];
+
+			/*
+			 * We cannot make forward scanning progress with this
+			 * inode, so defer the rest until phase 4.
+			 */
+			scrub_render_ino_descr(ctx, descr, DESCR_BUFSZ,
+					bstat->bs_ino, bstat->bs_gen, NULL);
+			str_info(ctx, descr,
+ _("Unable to make forward checking progress; will try again in phase 4."));
+			*defer = true;
+			return 0;
+		}
+		to_check = nr;
+	}
+
+	return 0;
+}
+
 /* Verify the contents, xattrs, and extent maps of an inode. */
 static int
 scrub_inode(
@@ -169,11 +221,28 @@ scrub_inode(
 	scrub_item_schedule(&sri, XFS_SCRUB_TYPE_XATTR);
 	scrub_item_schedule(&sri, XFS_SCRUB_TYPE_PARENT);
 
-	/* Try to check and repair the file while it's open. */
+	/*
+	 * Try to check all of the metadata items that we just scheduled.  If
+	 * we return with some types still needing a check and the space
+	 * metadata isn't also in need of repairs, try repairing any damaged
+	 * file metadata that we've found so far, and try checking the file
+	 * again.  Worst case, defer the repairs and the checks to phase 4 if
+	 * we can't make any progress on anything.
+	 */
 	error = scrub_item_check_file(ctx, &sri, fd);
 	if (error)
 		goto out;
 
+	if (!ictx->always_defer_repairs) {
+		bool	defer_repairs;
+
+		error = repair_and_scrub_inode_loop(ctx, bstat, fd, &sri,
+				&defer_repairs);
+		if (error || defer_repairs)
+			goto out;
+	}
+
+	/* Try to repair the file while it's open. */
 	error = try_inode_repair(ictx, &sri, fd);
 	if (error)
 		goto out;
diff --git a/scrub/repair.c b/scrub/repair.c
index 8a28f6b13d87..e594e704f515 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -860,6 +860,7 @@ repair_item_to_action_item(
 	struct action_item	**aitemp)
 {
 	struct action_item	*aitem;
+	unsigned int		scrub_type;
 
 	if (repair_item_count_needsrepair(sri) == 0)
 		return 0;
@@ -875,6 +876,20 @@ repair_item_to_action_item(
 	INIT_LIST_HEAD(&aitem->list);
 	memcpy(&aitem->sri, sri, sizeof(struct scrub_item));
 
+	/*
+	 * If the scrub item indicates that there is unchecked metadata, assume
+	 * that the scrub type checker depends on something that couldn't be
+	 * fixed.  Mark that type as corrupt so that phase 4 will try it again.
+	 */
+	foreach_scrub_type(scrub_type) {
+		__u8		*state = aitem->sri.sri_state;
+
+		if (state[scrub_type] & SCRUB_ITEM_NEEDSCHECK) {
+			state[scrub_type] &= ~SCRUB_ITEM_NEEDSCHECK;
+			state[scrub_type] |= SCRUB_ITEM_CORRUPT;
+		}
+	}
+
 	*aitemp = aitem;
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* [PATCH 1/1] xfs_repair: allow symlinks with short remote targets
  2024-07-02  0:53 ` [PATCHSET v30.7 16/16] xfs_repair: small remote symlinks are ok Darrick J. Wong
@ 2024-07-02  1:24   ` Darrick J. Wong
  2024-07-02  6:59     ` Christoph Hellwig
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02  1:24 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Symbolic links can have extended attributes.  If the attr fork consumes
enough space in the inode record, a shortform symlink can become a
remote symlink.  However, if we delete those extended attributes, the
target is not moved back into the inode core.

IOWs, we can end up with a symlink inode that looks like this:

core.magic = 0x494e
core.mode = 0120777
core.version = 3
core.format = 2 (extents)
core.nlinkv2 = 1
core.nextents = 1
core.size = 297
core.nblocks = 1
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,12,1,0]

This is a symbolic link with a 297-byte target stored in a disk block,
which is to say this is a symlink with a remote target.  The forkoff is
0, which is to say that there are 512 - 176 == 336 bytes in the inode core
to store the data fork.

Prior to kernel commit 1eb70f54c445f, the kernel was ok with this
arrangement, but the change to symlink validation in that patch now
produces corruption errors on filesystems written by older kernels that
are not otherwise inconsistent.  Those changes were inspired by reports
of illegal memory accesses, which I think were a result of making data
fork access decisions based on symlink di_size and not on di_format.

Unfortunately, for a very long time xfs_repair has flagged these inodes
as being corrupt, even though the kernel has historically been willing
to read and write symlinks with these properties.  Resolve the conflict
by adjusting the xfs_repair corruption tests to allow extents format.
This change matches the kernel patch "xfs: allow symlinks with short
remote targets".

While we're at it, fix a lurking bad symlink fork access.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/dinode.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
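
The arithmetic and the changed predicates in the message above can be sketched as follows. This is a simplified model, not xfs_repair's code. The FMT_* values mirror the on-disk format numbers shown in the inode dump (2 = extents). dfork_size() assumes the rule that a nonzero forkoff counts 8-byte units, and only the short-target branch of the format check is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for XFS_DINODE_FMT_LOCAL and XFS_DINODE_FMT_EXTENTS. */
enum fork_fmt { FMT_LOCAL = 1, FMT_EXTENTS = 2 };

/*
 * Bytes available to the data fork.  With forkoff == 0 there is no attr
 * fork, so the whole literal area after the inode core is data fork.
 */
static unsigned int dfork_size(unsigned int inode_size,
		unsigned int core_size, unsigned int forkoff)
{
	assert(core_size < inode_size);
	return forkoff ? forkoff * 8 : inode_size - core_size;
}

/* Old rule for a target that fits in the data fork: must be local. */
static bool old_short_target_ok(uint64_t size, unsigned int dsize,
		enum fork_fmt fmt)
{
	assert(size <= dsize);	/* only the short-target branch is modeled */
	return fmt == FMT_LOCAL;
}

/* New rule: a short target may also live in a remote (extents) block. */
static bool new_short_target_ok(uint64_t size, unsigned int dsize,
		enum fork_fmt fmt)
{
	assert(size <= dsize);
	return fmt == FMT_LOCAL || fmt == FMT_EXTENTS;
}

/* Decide where to read the target from by format, not by size. */
static bool target_is_inline(enum fork_fmt fmt)
{
	return fmt == FMT_LOCAL;
}
```

For the inode dumped above, dfork_size(512, 176, 0) gives 336, so the 297-byte target would fit inline. The old rule rejects the inode because its format is extents, while the new rule accepts it. target_is_inline() then sends the read to the remote block instead of the inode core, which is the "lurking bad symlink fork access" the second hunk fixes.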


diff --git a/repair/dinode.c b/repair/dinode.c
index 168cbf484906..e36de9bf1a1b 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -1036,7 +1036,8 @@ process_symlink_extlist(
 	int			max_blocks;
 
 	if (be64_to_cpu(dino->di_size) <= XFS_DFORK_DSIZE(dino, mp)) {
-		if (dino->di_format == XFS_DINODE_FMT_LOCAL)
+		if (dino->di_format == XFS_DINODE_FMT_LOCAL ||
+		    dino->di_format == XFS_DINODE_FMT_EXTENTS)
 			return 0;
 		do_warn(
 _("mismatch between format (%d) and size (%" PRId64 ") in symlink ino %" PRIu64 "\n"),
@@ -1368,7 +1369,7 @@ process_symlink(
 	 * get symlink contents into data area
 	 */
 	symlink = &data[0];
-	if (be64_to_cpu(dino->di_size) <= XFS_DFORK_DSIZE(dino, mp))  {
+	if (dino->di_format == XFS_DINODE_FMT_LOCAL) {
 		/*
 		 * local symlink, just copy the symlink out of the
 		 * inode into the data area


^ permalink raw reply related	[flat|nested] 296+ messages in thread

* Re: [PATCH 01/12] man: document the exchange-range ioctl
  2024-07-02  0:53   ` [PATCH 01/12] man: document the exchange-range ioctl Darrick J. Wong
@ 2024-07-02  5:08     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 02/12] man: document XFS_FSOP_GEOM_FLAGS_EXCHRANGE
  2024-07-02  0:54   ` [PATCH 02/12] man: document XFS_FSOP_GEOM_FLAGS_EXCHRANGE Darrick J. Wong
@ 2024-07-02  5:08     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 03/12] libhandle: add support for bulkstat v5
  2024-07-02  0:54   ` [PATCH 03/12] libhandle: add support for bulkstat v5 Darrick J. Wong
@ 2024-07-02  5:09     ` Christoph Hellwig
  2024-07-02 19:48       ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

> +extern void jdm_new_filehandle_v5(jdm_filehandle_t **handlep, size_t *hlen,
> +		jdm_fshandle_t *fshandlep, struct xfs_bulkstat *sp);

No need for the extern here.  Same for the others below.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 04/12] libfrog: add support for exchange range ioctl family
  2024-07-02  0:54   ` [PATCH 04/12] libfrog: add support for exchange range ioctl family Darrick J. Wong
@ 2024-07-02  5:09     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 05/12] xfs_db: advertise exchange-range in the version command
  2024-07-02  0:54   ` [PATCH 05/12] xfs_db: advertise exchange-range in the version command Darrick J. Wong
@ 2024-07-02  5:10     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 06/12] xfs_logprint: support dumping exchmaps log items
  2024-07-02  0:55   ` [PATCH 06/12] xfs_logprint: support dumping exchmaps log items Darrick J. Wong
@ 2024-07-02  5:11     ` Christoph Hellwig
  2024-07-02 19:49       ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

> +extern int xlog_print_trans_xmi(char **ptr, uint src_len, int continued);
> +extern void xlog_recover_print_xmi(struct xlog_recover_item *item);
> +extern int xlog_print_trans_xmd(char **ptr, uint len);
> +extern void xlog_recover_print_xmd(struct xlog_recover_item *item);

Drop the pointless externs here.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 07/12] xfs_fsr: convert to bulkstat v5 ioctls
  2024-07-02  0:55   ` [PATCH 07/12] xfs_fsr: convert to bulkstat v5 ioctls Darrick J. Wong
@ 2024-07-02  5:13     ` Christoph Hellwig
  2024-07-02 19:51       ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

> +			if (lseek(file_fd->fd, outmap[extent].bmv_length, SEEK_CUR) < 0) {
>  				fsrprintf(_("could not lseek in file: %s : %s\n"),

Maybe avoid the overly long lines here?

> -int	read_fd_bmap(int fd, struct xfs_bstat *sin, int *cur_nextents)
> +int	read_fd_bmap(int fd, struct xfs_bulkstat *sin, int *cur_nextents)

Maybe fix the weird formatting here while you're at it?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 08/12] xfs_fsr: skip the xattr/forkoff levering with the newer swapext implementations
  2024-07-02  0:55   ` [PATCH 08/12] xfs_fsr: skip the xattr/forkoff levering with the newer swapext implementations Darrick J. Wong
@ 2024-07-02  5:14     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 09/12] xfs_io: create exchangerange command to test file range exchange ioctl
  2024-07-02  0:55   ` [PATCH 09/12] xfs_io: create exchangerange command to test file range exchange ioctl Darrick J. Wong
@ 2024-07-02  5:15     ` Christoph Hellwig
  2024-07-02 19:53       ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

On Mon, Jul 01, 2024 at 05:55:49PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Create a new xfs_Io command to make raw calls to the

s/xfs_Io/xfs_io/ ?

> +extern void		exchangerange_init(void);

No need for the extern.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 10/12] libfrog: advertise exchange-range support
  2024-07-02  0:56   ` [PATCH 10/12] libfrog: advertise exchange-range support Darrick J. Wong
@ 2024-07-02  5:15     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 11/12] xfs_repair: add exchange-range to file systems
  2024-07-02  0:56   ` [PATCH 11/12] xfs_repair: add exchange-range to file systems Darrick J. Wong
@ 2024-07-02  5:15     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 12/12] mkfs: add a formatting option for exchange-range
  2024-07-02  0:56   ` [PATCH 12/12] mkfs: add a formatting option for exchange-range Darrick J. Wong
@ 2024-07-02  5:16     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 1/3] xfs_{db,repair}: add an explicit owner field to xfs_da_args
  2024-07-02  0:56   ` [PATCH 1/3] xfs_{db,repair}: add an explicit owner field to xfs_da_args Darrick J. Wong
@ 2024-07-02  5:16     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 2/3] libxfs: port the bumplink function from the kernel
  2024-07-02  0:57   ` [PATCH 2/3] libxfs: port the bumplink function from the kernel Darrick J. Wong
@ 2024-07-02  5:16     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 3/3] mkfs/repair: pin inodes that would otherwise overflow link count
  2024-07-02  0:57   ` [PATCH 3/3] mkfs/repair: pin inodes that would otherwise overflow link count Darrick J. Wong
@ 2024-07-02  5:17     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:17 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

On Mon, Jul 01, 2024 at 05:57:23PM -0700, Darrick J. Wong wrote:
>  	case sizeof(uint32_t):
> -		irec->ino_un.ex_data->counted_nlinks.un32[ino_offset]++;
> +		if (irec->ino_un.ex_data->counted_nlinks.un32[ino_offset] != XFS_NLINK_PINNED)

Avoid the long line here.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 01/13] xfs_scrub: use proper UChar string iterators
  2024-07-02  0:57   ` [PATCH 01/13] xfs_scrub: use proper UChar string iterators Darrick J. Wong
@ 2024-07-02  5:18     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

On Mon, Jul 01, 2024 at 05:57:39PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> For code that wants to examine a UChar string, use libicu's string
> iterators to walk UChar strings, instead of the open-coded U16_NEXT*
> macros that perform no typechecking.

I don't claim to understand libicu, but the code looks sane to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 02/13] xfs_scrub: hoist code that removes ignorable characters
  2024-07-02  0:57   ` [PATCH 02/13] xfs_scrub: hoist code that removes ignorable characters Darrick J. Wong
@ 2024-07-02  5:21     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points
  2024-07-02  0:58   ` [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points Darrick J. Wong
@ 2024-07-02  5:22     ` Christoph Hellwig
  2024-07-03  1:59       ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

On Mon, Jul 01, 2024 at 05:58:10PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> I missed a few non-rendering code points in the "zero width"
> classification code.  Add them now, and sort the list.
> 
> $ wget https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
> $ grep -E '(zero width|invisible|joiner|application)' -i UnicodeData.txt

Should this be automated?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
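
On the question of automating the UnicodeData.txt search, here is a hypothetical C filter that mirrors the quoted grep -E '(zero width|invisible|joiner|application)' -i. Like grep, it does a case-insensitive substring match against each whole line. The 512-byte line buffer in scan_ucd() is an assumption about UnicodeData.txt line lengths. The matches would still need manual review before any code points are added to the list.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Case-insensitive substring test, like grep -i for a fixed string. */
static bool contains_ci(const char *hay, const char *needle)
{
	size_t n = strlen(needle);

	for (; *hay; hay++) {
		size_t i = 0;

		while (i < n && hay[i] &&
		       tolower((unsigned char)hay[i]) ==
		       tolower((unsigned char)needle[i]))
			i++;
		if (i == n)
			return true;
	}
	return n == 0;
}

/* Same keyword set as the grep alternation quoted above. */
static bool is_candidate_line(const char *line)
{
	static const char *const keys[] = {
		"zero width", "invisible", "joiner", "application",
	};

	for (size_t k = 0; k < sizeof(keys) / sizeof(keys[0]); k++)
		if (contains_ci(line, keys[k]))
			return true;
	return false;
}

/* Copy each matching line from in to out; return the number of matches. */
static unsigned int scan_ucd(FILE *in, FILE *out)
{
	char line[512];
	unsigned int hits = 0;

	while (fgets(line, sizeof(line), in)) {
		if (is_candidate_line(line)) {
			fputs(line, out);
			hits++;
		}
	}
	return hits;
}
```

Usage would be along the lines of scan_ucd(fopen("UnicodeData.txt", "r"), stdout), with error checking added.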

^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 04/13] xfs_scrub: avoid potential UAF after freeing a duplicate name entry
  2024-07-02  0:58   ` [PATCH 04/13] xfs_scrub: avoid potential UAF after freeing a duplicate name entry Darrick J. Wong
@ 2024-07-02  5:22     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 05/13] xfs_scrub: guard against libicu returning negative buffer lengths
  2024-07-02  0:58   ` [PATCH 05/13] xfs_scrub: guard against libicu returning negative buffer lengths Darrick J. Wong
@ 2024-07-02  5:22     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 06/13] xfs_scrub: hoist non-rendering character predicate
  2024-07-02  0:58   ` [PATCH 06/13] xfs_scrub: hoist non-rendering character predicate Darrick J. Wong
@ 2024-07-02  5:23     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 07/13] xfs_scrub: store bad flags with the name entry
  2024-07-02  0:59   ` [PATCH 07/13] xfs_scrub: store bad flags with the name entry Darrick J. Wong
@ 2024-07-02  5:23     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 08/13] xfs_scrub: rename UNICRASH_ZERO_WIDTH to UNICRASH_INVISIBLE
  2024-07-02  0:59   ` [PATCH 08/13] xfs_scrub: rename UNICRASH_ZERO_WIDTH to UNICRASH_INVISIBLE Darrick J. Wong
@ 2024-07-02  5:23     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 09/13] xfs_scrub: type-coerce the UNICRASH_* flags
  2024-07-02  0:59   ` [PATCH 09/13] xfs_scrub: type-coerce the UNICRASH_* flags Darrick J. Wong
@ 2024-07-02  5:24     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 296+ messages in thread

* Re: [PATCH 10/13] xfs_scrub: reduce size of struct name_entry
  2024-07-02  0:59   ` [PATCH 10/13] xfs_scrub: reduce size of struct name_entry Darrick J. Wong
@ 2024-07-02  5:24     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 11/13] xfs_scrub: rename struct unicrash.normalizer
  2024-07-02  1:00   ` [PATCH 11/13] xfs_scrub: rename struct unicrash.normalizer Darrick J. Wong
@ 2024-07-02  5:24     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 12/13] xfs_scrub: report deceptive file extensions
  2024-07-02  1:00   ` [PATCH 12/13] xfs_scrub: report deceptive file extensions Darrick J. Wong
@ 2024-07-02  5:25     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 13/13] xfs_scrub: dump unicode points
  2024-07-02  1:00   ` [PATCH 13/13] xfs_scrub: dump unicode points Darrick J. Wong
@ 2024-07-02  5:25     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 1/8] xfs_scrub: move FITRIM to phase 8
  2024-07-02  1:01   ` [PATCH 1/8] xfs_scrub: move FITRIM to phase 8 Darrick J. Wong
@ 2024-07-02  5:27     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 2/8] xfs_scrub: ignore phase 8 if the user disabled fstrim
  2024-07-02  1:01   ` [PATCH 2/8] xfs_scrub: ignore phase 8 if the user disabled fstrim Darrick J. Wong
@ 2024-07-02  5:27     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/8] xfs_scrub: collapse trim_filesystem
  2024-07-02  1:01   ` [PATCH 3/8] xfs_scrub: collapse trim_filesystem Darrick J. Wong
@ 2024-07-02  5:27     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 4/8] xfs_scrub: fix the work estimation for phase 8
  2024-07-02  1:01   ` [PATCH 4/8] xfs_scrub: fix the work estimation for phase 8 Darrick J. Wong
@ 2024-07-02  5:27     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 5/8] xfs_scrub: report FITRIM errors properly
  2024-07-02  1:02   ` [PATCH 5/8] xfs_scrub: report FITRIM errors properly Darrick J. Wong
@ 2024-07-02  5:28     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 6/8] xfs_scrub: don't call FITRIM after runtime errors
  2024-07-02  1:02   ` [PATCH 6/8] xfs_scrub: don't call FITRIM after runtime errors Darrick J. Wong
@ 2024-07-02  5:28     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 7/8] xfs_scrub: don't trim the first agbno of each AG for better performance
  2024-07-02  1:02   ` [PATCH 7/8] xfs_scrub: don't trim the first agbno of each AG for better performance Darrick J. Wong
@ 2024-07-02  5:30     ` Christoph Hellwig
  2024-07-03  3:45       ` Darrick J. Wong
  2024-07-03  3:52     ` [PATCH v30.7.1 7/8] xfs_scrub: improve responsiveness while trimming the filesystem Darrick J. Wong
  1 sibling, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

On Mon, Jul 01, 2024 at 06:02:36PM -0700, Darrick J. Wong wrote:
> Therefore, we created a second implementation that will walk the bnobt
> and perform the trims in block number order.  This implementation avoids
> the worst problems of the original code, though it lacks the desirable
> attribute of freeing the biggest chunks first.
> 
> On the other hand, this second implementation will be much easier to
> constrain the system call latency, and makes it much easier to report
> fstrim progress to anyone who's running xfs_scrub.  Skip the first block
> of each AG to ensure that we get the sub-AG algorithm.

I completely fail to understand how skipping the first block improves
performance and is otherwise a good idea.  What am I missing?
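
The quoted description can be illustrated with a small C sketch of how a caller might split one whole-filesystem FITRIM into per-AG requests that each begin one block past the AG boundary. Block 0 of every AG starts with the AG headers (at minimum a superblock copy), so it is never free space and skipping it cannot lose any trimmable blocks. The helper name plan_ag_trims, its parameters, and the trim_range struct (a stand-in mirroring the layout of struct fstrim_range from <linux/fs.h>) are hypothetical and not xfs_scrub's actual code; the sketch also assumes the AG geometry (dblocks, agblocks, block size) is already known.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in mirroring the field layout of struct fstrim_range. */
struct trim_range {
	uint64_t	start;	/* byte offset into the data device */
	uint64_t	len;	/* length of the range in bytes */
	uint64_t	minlen;	/* ignore free extents shorter than this */
};

/*
 * Hypothetical helper: fill one FITRIM range per AG, each starting one
 * block past the AG boundary.  The last AG may be shorter than agblocks.
 * @ranges must have room for ceil(dblocks / agblocks) entries.
 * Returns the number of ranges filled in.
 */
static uint32_t
plan_ag_trims(
	uint64_t		dblocks,
	uint64_t		agblocks,
	uint32_t		blocksize,
	uint64_t		minlen,
	struct trim_range	*ranges)
{
	uint32_t		agcount;

	assert(agblocks > 1 && blocksize > 0);
	agcount = (dblocks + agblocks - 1) / agblocks;

	for (uint32_t agno = 0; agno < agcount; agno++) {
		uint64_t	agstart = (uint64_t)agno * agblocks;
		uint64_t	aglen = dblocks - agstart;

		if (aglen > agblocks)
			aglen = agblocks;

		/* Skip block 0 of the AG, which always holds AG headers. */
		ranges[agno].start = (agstart + 1) * blocksize;
		ranges[agno].len = (aglen - 1) * blocksize;
		ranges[agno].minlen = minlen;
	}
	return agcount;
}
```

Each entry would then be copied into a struct fstrim_range and passed to ioctl(fd, FITRIM, &range) in turn. Issuing one call per AG rather than one call for the whole device bounds how long any single system call runs, which is the responsiveness property the patch description is after.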


* Re: [PATCH 1/7] libfrog: hoist free space histogram code
  2024-07-02  1:03   ` [PATCH 1/7] libfrog: hoist free space histogram code Darrick J. Wong
@ 2024-07-02  5:32     ` Christoph Hellwig
  2024-07-03  2:47       ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

> +struct histent
> +{

The brace goes on the previous line in our normal coding style.

Also, the histent/histogram naming seems a bit too generic for something
this block-specific.  Should the names be adjusted a bit?
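
Both review points (brace placement and block-specific naming) can be seen together in a minimal sketch of a log2-bucketed free space histogram. The freesp_* names are purely hypothetical, chosen only to illustrate more block-specific naming, and the layout is an assumption rather than the patch's actual struct histent.

```c
#include <assert.h>
#include <stdint.h>

#define NR_FREESP_BUCKETS	64

/* One power-of-two bucket: free extents with length in [2^i, 2^(i+1)). */
struct freesp_bucket {
	uint64_t	extents;	/* number of free extents */
	uint64_t	blocks;		/* total free blocks in those extents */
};

struct freesp_hist {
	struct freesp_bucket	buckets[NR_FREESP_BUCKETS];
	uint64_t		totextents;
	uint64_t		totblocks;
};

/* Index of the highest set bit, i.e. floor(log2(len)) for len > 0. */
static unsigned int
freesp_bucket_index(
	uint64_t		len)
{
	unsigned int		i = 0;

	assert(len > 0);
	while (len >>= 1)
		i++;
	return i;
}

/* Account one free extent of @len filesystem blocks. */
static void
freesp_hist_add(
	struct freesp_hist	*hist,
	uint64_t		len)
{
	struct freesp_bucket	*b = &hist->buckets[freesp_bucket_index(len)];

	b->extents++;
	b->blocks += len;
	hist->totextents++;
	hist->totblocks += len;
}
```

With 64 buckets, any 64-bit extent length maps to a valid index (at most 63), so no bounds check is needed in freesp_hist_add.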


* Re: [PATCH 2/7] libfrog: print wider columns for free space histogram
  2024-07-02  1:03   ` [PATCH 2/7] libfrog: print wider columns for free space histogram Darrick J. Wong
@ 2024-07-02  5:33     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/7] libfrog: print cdf of free space buckets
  2024-07-02  1:03   ` [PATCH 3/7] libfrog: print cdf of free space buckets Darrick J. Wong
@ 2024-07-02  5:33     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 4/7] xfs_scrub: don't close stdout when closing the progress bar
  2024-07-02  1:03   ` [PATCH 4/7] xfs_scrub: don't close stdout when closing the progress bar Darrick J. Wong
@ 2024-07-02  5:33     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 5/7] xfs_scrub: remove pointless spacemap.c arguments
  2024-07-02  1:04   ` [PATCH 5/7] xfs_scrub: remove pointless spacemap.c arguments Darrick J. Wong
@ 2024-07-02  5:33     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 6/7] xfs_scrub: collect free space histograms during phase 7
  2024-07-02  1:04   ` [PATCH 6/7] xfs_scrub: collect free space histograms during phase 7 Darrick J. Wong
@ 2024-07-02  5:34     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms
  2024-07-02  1:04   ` [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms Darrick J. Wong
@ 2024-07-02  5:36     ` Christoph Hellwig
  2024-07-03  2:29       ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

On Mon, Jul 01, 2024 at 06:04:41PM -0700, Darrick J. Wong wrote:
> Add a new -o fstrim_pct= option to xfs_scrub just in case there are
> users out there who want a different percentage.  For example, accepting
> a 95% trim would net us a speed increase of nearly two orders of
> magnitude, ignoring system call overhead.  Setting it to 100% will trim
> everything, just like fstrim(8).

It might also make sense to default the parameter to the
discard_granularity reported in sysfs.  While a lot of devices don't
report a useful value there, it avoids pointless work for those that
do.

Otherwise this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
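
The fstrim_pct idea and the discard_granularity suggestion can be sketched together as follows. This is an illustration under stated assumptions, not the algorithm in the patch: it assumes the phase 7 histogram is reduced to per-power-of-two block totals and that minlen is rounded down to a bucket boundary, and both function names are hypothetical. The granularity is the value Linux reports in bytes at /sys/block/<disk>/queue/discard_granularity; reading that file is left out so the arithmetic stays testable.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: return the largest power-of-two minlen (in fs blocks)
 * such that free extents at least that long still cover @pct percent of
 * @totblocks.  blocks[i] holds the free blocks in extents whose length
 * lies in [2^i, 2^(i+1)).
 */
static uint64_t
minlen_for_pct(
	const uint64_t		*blocks,
	unsigned int		nr,
	uint64_t		totblocks,
	unsigned int		pct)
{
	uint64_t		covered = 0;

	assert(nr > 0 && nr <= 64 && pct <= 100);

	/* Walk from the largest bucket down, accumulating coverage. */
	for (unsigned int i = nr; i-- > 0;) {
		covered += blocks[i];
		if (covered * 100 >= (uint64_t)pct * totblocks)
			return (uint64_t)1 << i;
	}
	return 1;	/* trim everything */
}

/* Never ask for a minlen below the device's discard granularity. */
static uint64_t
clamp_to_discard_granularity(
	uint64_t		minlen_blocks,
	uint64_t		gran_bytes,
	uint32_t		blocksize)
{
	uint64_t		gran_blocks;

	gran_blocks = (gran_bytes + blocksize - 1) / blocksize;
	return minlen_blocks > gran_blocks ? minlen_blocks : gran_blocks;
}
```

FITRIM's minlen is in bytes, so a caller would multiply the result by the block size before filling in struct fstrim_range. A reported granularity of 0 leaves the histogram-derived value unchanged, so devices that don't report a useful value lose nothing.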


* Re: [PATCH 1/6] xfs_scrub: allow auxiliary pathnames for sandboxing
  2024-07-02  1:04   ` [PATCH 1/6] xfs_scrub: allow auxiliary pathnames for sandboxing Darrick J. Wong
@ 2024-07-02  5:37     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 5/6] xfs_scrub_fail: tighten up the security on the background systemd service
  2024-07-02  1:06   ` [PATCH 5/6] xfs_scrub_fail: " Darrick J. Wong
@ 2024-07-02  5:37     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 6/6] xfs_scrub_all: tighten up the security on the background systemd service
  2024-07-02  1:06   ` [PATCH 6/6] xfs_scrub_all: " Darrick J. Wong
@ 2024-07-02  5:37     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 1/6] xfs_scrub_all: only use the xfs_scrub@ systemd services in service mode
  2024-07-02  1:06   ` [PATCH 1/6] xfs_scrub_all: only use the xfs_scrub@ systemd services in service mode Darrick J. Wong
@ 2024-07-02  5:38     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 2/6] xfs_scrub_all: remove journalctl background process
  2024-07-02  1:06   ` [PATCH 2/6] xfs_scrub_all: remove journalctl background process Darrick J. Wong
@ 2024-07-02  5:38     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/6] xfs_scrub_all: support metadata+media scans of all filesystems
  2024-07-02  1:07   ` [PATCH 3/6] xfs_scrub_all: support metadata+media scans of all filesystems Darrick J. Wong
@ 2024-07-02  5:39     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 4/6] xfs_scrub_all: enable periodic file data scrubs automatically
  2024-07-02  1:07   ` [PATCH 4/6] xfs_scrub_all: enable periodic file data scrubs automatically Darrick J. Wong
@ 2024-07-02  5:39     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 5/6] xfs_scrub_all: trigger automatic media scans once per month
  2024-07-02  1:07   ` [PATCH 5/6] xfs_scrub_all: trigger automatic media scans once per month Darrick J. Wong
@ 2024-07-02  5:39     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 6/6] xfs_scrub_all: failure reporting for the xfs_scrub_all job
  2024-07-02  1:07   ` [PATCH 6/6] xfs_scrub_all: failure reporting for the xfs_scrub_all job Darrick J. Wong
@ 2024-07-02  5:39     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 1/5] xfs_scrub_all: encapsulate all the subprocess code in an object
  2024-07-02  1:08   ` [PATCH 1/5] xfs_scrub_all: encapsulate all the subprocess code in an object Darrick J. Wong
@ 2024-07-02  5:40     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 2/5] xfs_scrub_all: encapsulate all the systemctl code in an object
  2024-07-02  1:08   ` [PATCH 2/5] xfs_scrub_all: encapsulate all the systemctl " Darrick J. Wong
@ 2024-07-02  5:41     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/5] xfs_scrub_all: add CLI option for easier debugging
  2024-07-02  1:08   ` [PATCH 3/5] xfs_scrub_all: add CLI option for easier debugging Darrick J. Wong
@ 2024-07-02  5:42     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 4/5] xfs_scrub_all: convert systemctl calls to dbus
  2024-07-02  1:08   ` [PATCH 4/5] xfs_scrub_all: convert systemctl calls to dbus Darrick J. Wong
@ 2024-07-02  5:42     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 5/5] xfs_scrub_all: implement retry and backoff for dbus calls
  2024-07-02  1:09   ` [PATCH 5/5] xfs_scrub_all: implement retry and backoff for dbus calls Darrick J. Wong
@ 2024-07-02  5:42     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 1/3] xfs_scrub: automatic downgrades to dry-run mode in service mode
  2024-07-02  1:09   ` [PATCH 1/3] xfs_scrub: automatic downgrades to dry-run mode in service mode Darrick J. Wong
@ 2024-07-02  5:43     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 2/3] xfs_scrub: add an optimization-only mode
  2024-07-02  1:09   ` [PATCH 2/3] xfs_scrub: add an optimization-only mode Darrick J. Wong
@ 2024-07-02  5:43     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-02  1:09   ` [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default Darrick J. Wong
@ 2024-07-02  5:44     ` Christoph Hellwig
  2024-07-03  2:59       ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:44 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

On Mon, Jul 01, 2024 at 06:09:54PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Now that we're finished building online fsck, enable the periodic
> background scrub service by default.  This involves the postinst script
> starting the resource management slice and the timer.
> 
> No other sub-services need to be enabled or unmasked explicitly.  They
> also shouldn't be started or restarted because that might interrupt
> background operation unnecessarily.

I'm still not sure we quite want to run this by default on all
debian systems.  Are you ready to handle all the bug reports from
people complaining about the I/O load?


* Re: [PATCH 1/3] xfs_repair: check free space requirements before allowing upgrades
  2024-07-02  1:10   ` [PATCH 1/3] xfs_repair: check free space requirements before allowing upgrades Darrick J. Wong
@ 2024-07-02  5:44     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:44 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, Chandan Babu R, Dave Chinner, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 2/3] xfs_repair: enforce one namespace bit per extended attribute
  2024-07-02  1:10   ` [PATCH 2/3] xfs_repair: enforce one namespace bit per extended attribute Darrick J. Wong
@ 2024-07-02  5:44     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:44 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 3/3] xfs_repair: check for unknown flags in attr entries
  2024-07-02  1:10   ` [PATCH 3/3] xfs_repair: check for unknown flags in attr entries Darrick J. Wong
@ 2024-07-02  5:45     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  5:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 01/24] libxfs: create attr log item opcodes and formats for parent pointers
  2024-07-02  1:10   ` [PATCH 01/24] libxfs: create attr log item opcodes and formats for parent pointers Darrick J. Wong
@ 2024-07-02  6:24     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 02/24] xfs_{db,repair}: implement new attr hash value function
  2024-07-02  1:11   ` [PATCH 02/24] xfs_{db,repair}: implement new attr hash value function Darrick J. Wong
@ 2024-07-02  6:24     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 03/24] xfs_logprint: dump new attr log item fields
  2024-07-02  1:11   ` [PATCH 03/24] xfs_logprint: dump new attr log item fields Darrick J. Wong
@ 2024-07-02  6:25     ` Christoph Hellwig
  2024-07-02 20:02       ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

> +extern int xlog_print_trans_attri_name(char **ptr, uint src_len,
> +		const char *tag);
> +extern int xlog_print_trans_attri_value(char **ptr, uint src_len, int value_len,
> +		const char *tag);

Maybe drop the pointless externs?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 04/24] man: document the XFS_IOC_GETPARENTS ioctl
  2024-07-02  1:11   ` [PATCH 04/24] man: document the XFS_IOC_GETPARENTS ioctl Darrick J. Wong
@ 2024-07-02  6:25     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 05/24] libfrog: report parent pointers to userspace
  2024-07-02  1:11   ` [PATCH 05/24] libfrog: report parent pointers to userspace Darrick J. Wong
@ 2024-07-02  6:25     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 06/24] libfrog: add parent pointer support code
  2024-07-02  1:12   ` [PATCH 06/24] libfrog: add parent pointer support code Darrick J. Wong
@ 2024-07-02  6:26     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 07/24] xfs_io: adapt parent command to new parent pointer ioctls
  2024-07-02  1:12   ` [PATCH 07/24] xfs_io: adapt parent command to new parent pointer ioctls Darrick J. Wong
@ 2024-07-02  6:26     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, Allison Henderson, catherine.hoang, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 08/24] xfs_io: Add i, n and f flags to parent command
  2024-07-02  1:12   ` [PATCH 08/24] xfs_io: Add i, n and f flags to parent command Darrick J. Wong
@ 2024-07-02  6:26     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, Allison Henderson, catherine.hoang, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 09/24] xfs_logprint: decode parent pointers in ATTRI items fully
  2024-07-02  1:13   ` [PATCH 09/24] xfs_logprint: decode parent pointers in ATTRI items fully Darrick J. Wong
@ 2024-07-02  6:27     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, Allison Henderson, catherine.hoang, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 10/24] xfs_spaceman: report file paths
  2024-07-02  1:13   ` [PATCH 10/24] xfs_spaceman: report file paths Darrick J. Wong
@ 2024-07-02  6:27     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 11/24] xfs_scrub: use parent pointers when possible to report file operations
  2024-07-02  1:13   ` [PATCH 11/24] xfs_scrub: use parent pointers when possible to report file operations Darrick J. Wong
@ 2024-07-02  6:27     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 12/24] xfs_scrub: use parent pointers to report lost file data
  2024-07-02  1:13   ` [PATCH 12/24] xfs_scrub: use parent pointers to report lost file data Darrick J. Wong
@ 2024-07-02  6:28     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 13/24] xfs_db: report parent pointers in version command
  2024-07-02  1:14   ` [PATCH 13/24] xfs_db: report parent pointers in version command Darrick J. Wong
@ 2024-07-02  6:28     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 14/24] xfs_db: report parent bit on xattrs
  2024-07-02  1:14   ` [PATCH 14/24] xfs_db: report parent bit on xattrs Darrick J. Wong
@ 2024-07-02  6:28     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, Allison Henderson, catherine.hoang, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 15/24] xfs_db: report parent pointers embedded in xattrs
  2024-07-02  1:14   ` [PATCH 15/24] xfs_db: report parent pointers embedded in xattrs Darrick J. Wong
@ 2024-07-02  6:28     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 16/24] xfs_db: obfuscate dirent and parent pointer names consistently
  2024-07-02  1:14   ` [PATCH 16/24] xfs_db: obfuscate dirent and parent pointer names consistently Darrick J. Wong
@ 2024-07-02  6:33     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 17/24] libxfs: export attr3_leaf_hdr_from_disk via libxfs_api_defs.h
  2024-07-02  1:15   ` [PATCH 17/24] libxfs: export attr3_leaf_hdr_from_disk via libxfs_api_defs.h Darrick J. Wong
@ 2024-07-02  6:35     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 18/24] xfs_db: add a parents command to list the parents of a file
  2024-07-02  1:15   ` [PATCH 18/24] xfs_db: add a parents command to list the parents of a file Darrick J. Wong
@ 2024-07-02  6:35     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 19/24] xfs_db: make attr_set and attr_remove handle parent pointers
  2024-07-02  1:15   ` [PATCH 19/24] xfs_db: make attr_set and attr_remove handle parent pointers Darrick J. Wong
@ 2024-07-02  6:36     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 20/24] xfs_db: add link and unlink expert commands
  2024-07-02  1:15   ` [PATCH 20/24] xfs_db: add link and unlink expert commands Darrick J. Wong
@ 2024-07-02  6:36     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 21/24] xfs_db: compute hashes of parent pointers
  2024-07-02  1:16   ` [PATCH 21/24] xfs_db: compute hashes of parent pointers Darrick J. Wong
@ 2024-07-02  6:37     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 22/24] libxfs: create new files with attr forks if necessary
  2024-07-02  1:16   ` [PATCH 22/24] libxfs: create new files with attr forks if necessary Darrick J. Wong
@ 2024-07-02  6:37     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 23/24] mkfs: Add parent pointers during protofile creation
  2024-07-02  1:16   ` [PATCH 23/24] mkfs: Add parent pointers during protofile creation Darrick J. Wong
@ 2024-07-02  6:37     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, Allison Henderson, catherine.hoang, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 24/24] mkfs: enable formatting with parent pointers
  2024-07-02  1:16   ` [PATCH 24/24] mkfs: enable formatting with parent pointers Darrick J. Wong
@ 2024-07-02  6:38     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, Allison Henderson, catherine.hoang, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 1/2] xfs: create a blob array data structure
  2024-07-02  1:17   ` [PATCH 1/2] xfs: create a blob array data structure Darrick J. Wong
@ 2024-07-02  6:41     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 2/2] man2: update ioctl_xfs_scrub_metadata.2 for parent pointers
  2024-07-02  1:17   ` [PATCH 2/2] man2: update ioctl_xfs_scrub_metadata.2 for parent pointers Darrick J. Wong
@ 2024-07-02  6:41     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 01/12] xfs_db: remove some boilerplate from xfs_attr_set
  2024-07-02  1:17   ` [PATCH 01/12] xfs_db: remove some boilerplate from xfs_attr_set Darrick J. Wong
@ 2024-07-02  6:41     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 02/12] xfs_db: actually report errors from libxfs_attr_set
  2024-07-02  1:17   ` [PATCH 02/12] xfs_db: actually report errors from libxfs_attr_set Darrick J. Wong
@ 2024-07-02  6:41     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 03/12] xfs_repair: junk parent pointer attributes when filesystem doesn't support them
  2024-07-02  1:18   ` [PATCH 03/12] xfs_repair: junk parent pointer attributes when filesystem doesn't support them Darrick J. Wong
@ 2024-07-02  6:42     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 04/12] xfs_repair: add parent pointers when messing with /lost+found
  2024-07-02  1:18   ` [PATCH 04/12] xfs_repair: add parent pointers when messing with /lost+found Darrick J. Wong
@ 2024-07-02  6:42     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 05/12] xfs_repair: junk duplicate hashtab entries when processing sf dirents
  2024-07-02  1:18   ` [PATCH 05/12] xfs_repair: junk duplicate hashtab entries when processing sf dirents Darrick J. Wong
@ 2024-07-02  6:43     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 06/12] xfs_repair: build a parent pointer index
  2024-07-02  1:19   ` [PATCH 06/12] xfs_repair: build a parent pointer index Darrick J. Wong
@ 2024-07-02  6:45     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 07/12] xfs_repair: move the global dirent name store to a separate object
  2024-07-02  1:19   ` [PATCH 07/12] xfs_repair: move the global dirent name store to a separate object Darrick J. Wong
@ 2024-07-02  6:45     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 08/12] xfs_repair: deduplicate strings stored in string blob
  2024-07-02  1:19   ` [PATCH 08/12] xfs_repair: deduplicate strings stored in string blob Darrick J. Wong
@ 2024-07-02  6:46     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 09/12] xfs_repair: check parent pointers
  2024-07-02  1:19   ` [PATCH 09/12] xfs_repair: check parent pointers Darrick J. Wong
@ 2024-07-02  6:47     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 10/12] xfs_repair: dump garbage parent pointer attributes
  2024-07-02  1:20   ` [PATCH 10/12] xfs_repair: dump garbage parent pointer attributes Darrick J. Wong
@ 2024-07-02  6:47     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 11/12] xfs_repair: update ondisk parent pointer records
  2024-07-02  1:20   ` [PATCH 11/12] xfs_repair: update ondisk parent pointer records Darrick J. Wong
@ 2024-07-02  6:47     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 12/12] xfs_repair: wipe ondisk parent pointers when there are none
  2024-07-02  1:20   ` [PATCH 12/12] xfs_repair: wipe ondisk parent pointers when there are none Darrick J. Wong
@ 2024-07-02  6:47     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 1/5] libfrog: add directory tree structure scrubber to scrub library
  2024-07-02  1:20   ` [PATCH 1/5] libfrog: add directory tree structure scrubber to scrub library Darrick J. Wong
@ 2024-07-02  6:48     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 2/5] xfs_spaceman: report directory tree corruption in the health information
  2024-07-02  1:21   ` [PATCH 2/5] xfs_spaceman: report directory tree corruption in the health information Darrick J. Wong
@ 2024-07-02  6:48     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 3/5] xfs_scrub: fix erroring out of check_inode_names
  2024-07-02  1:21   ` [PATCH 3/5] xfs_scrub: fix erroring out of check_inode_names Darrick J. Wong
@ 2024-07-02  6:48     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 4/5] xfs_scrub: detect and repair directory tree corruptions
  2024-07-02  1:21   ` [PATCH 4/5] xfs_scrub: detect and repair directory tree corruptions Darrick J. Wong
@ 2024-07-02  6:49     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 5/5] xfs_scrub: defer phase5 file scans if dirloop fails
  2024-07-02  1:21   ` [PATCH 5/5] xfs_scrub: defer phase5 file scans if dirloop fails Darrick J. Wong
@ 2024-07-02  6:49     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 01/10] man: document vectored scrub mode
  2024-07-02  1:22   ` [PATCH 01/10] man: document vectored scrub mode Darrick J. Wong
@ 2024-07-02  6:56     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 02/10] libfrog: support vectored scrub
  2024-07-02  1:22   ` [PATCH 02/10] libfrog: support vectored scrub Darrick J. Wong
@ 2024-07-02  6:56     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 03/10] xfs_io: support vectored scrub
  2024-07-02  1:22   ` [PATCH 03/10] xfs_io: " Darrick J. Wong
@ 2024-07-02  6:57     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 04/10] xfs_scrub: split the scrub epilogue code into a separate function
  2024-07-02  1:22   ` [PATCH 04/10] xfs_scrub: split the scrub epilogue code into a separate function Darrick J. Wong
@ 2024-07-02  6:57     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 05/10] xfs_scrub: split the repair epilogue code into a separate function
  2024-07-02  1:23   ` [PATCH 05/10] xfs_scrub: split the repair " Darrick J. Wong
@ 2024-07-02  6:57     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 06/10] xfs_scrub: convert scrub and repair epilogues to use xfs_scrub_vec
  2024-07-02  1:23   ` [PATCH 06/10] xfs_scrub: convert scrub and repair epilogues to use xfs_scrub_vec Darrick J. Wong
@ 2024-07-02  6:57     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 07/10] xfs_scrub: vectorize scrub calls
  2024-07-02  1:23   ` [PATCH 07/10] xfs_scrub: vectorize scrub calls Darrick J. Wong
@ 2024-07-02  6:58     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 08/10] xfs_scrub: vectorize repair calls
  2024-07-02  1:23   ` [PATCH 08/10] xfs_scrub: vectorize repair calls Darrick J. Wong
@ 2024-07-02  6:58     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 09/10] xfs_scrub: use scrub barriers to reduce kernel calls
  2024-07-02  1:24   ` [PATCH 09/10] xfs_scrub: use scrub barriers to reduce kernel calls Darrick J. Wong
@ 2024-07-02  6:58     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 10/10] xfs_scrub: try spot repairs of metadata items to make scrub progress
  2024-07-02  1:24   ` [PATCH 10/10] xfs_scrub: try spot repairs of metadata items to make scrub progress Darrick J. Wong
@ 2024-07-02  6:59     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 1/1] xfs_repair: allow symlinks with short remote targets
  2024-07-02  1:24   ` [PATCH 1/1] xfs_repair: allow symlinks with short remote targets Darrick J. Wong
@ 2024-07-02  6:59     ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-02  6:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 03/12] libhandle: add support for bulkstat v5
  2024-07-02  5:09     ` Christoph Hellwig
@ 2024-07-02 19:48       ` Darrick J. Wong
  0 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02 19:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Tue, Jul 02, 2024 at 07:09:20AM +0200, Christoph Hellwig wrote:
> > +extern void jdm_new_filehandle_v5(jdm_filehandle_t **handlep, size_t *hlen,
> > +		jdm_fshandle_t *fshandlep, struct xfs_bulkstat *sp);
> 
> No need for the extern here.  Same for the others below.

Ok, will remove.

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

* Re: [PATCH 06/12] xfs_logprint: support dumping exchmaps log items
  2024-07-02  5:11     ` Christoph Hellwig
@ 2024-07-02 19:49       ` Darrick J. Wong
  0 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02 19:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Tue, Jul 02, 2024 at 07:11:54AM +0200, Christoph Hellwig wrote:
> > +extern int xlog_print_trans_xmi(char **ptr, uint src_len, int continued);
> > +extern void xlog_recover_print_xmi(struct xlog_recover_item *item);
> > +extern int xlog_print_trans_xmd(char **ptr, uint len);
> > +extern void xlog_recover_print_xmd(struct xlog_recover_item *item);
> 
> Drop the pointless externs here.

Done.

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

* Re: [PATCH 07/12] xfs_fsr: convert to bulkstat v5 ioctls
  2024-07-02  5:13     ` Christoph Hellwig
@ 2024-07-02 19:51       ` Darrick J. Wong
  0 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02 19:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Tue, Jul 02, 2024 at 07:13:46AM +0200, Christoph Hellwig wrote:
> > +			if (lseek(file_fd->fd, outmap[extent].bmv_length, SEEK_CUR) < 0) {
> >  				fsrprintf(_("could not lseek in file: %s : %s\n"),
> 
> Maybe avoid the overly long lines here?
> 
> > -int	read_fd_bmap(int fd, struct xfs_bstat *sin, int *cur_nextents)
> > +int	read_fd_bmap(int fd, struct xfs_bulkstat *sin, int *cur_nextents)
> 
> Maybe fix the weird formatting here while you're at it?

Both fixed.

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

* Re: [PATCH 09/12] xfs_io: create exchangerange command to test file range exchange ioctl
  2024-07-02  5:15     ` Christoph Hellwig
@ 2024-07-02 19:53       ` Darrick J. Wong
  0 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02 19:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Tue, Jul 02, 2024 at 07:15:10AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 01, 2024 at 05:55:49PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Create a new xfs_Io command to make raw calls to the
> 
> s/xfs_Io/xfs_io/ ?
> 
> > +extern void		exchangerange_init(void);
> 
> No need for the extern.
> 
> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Both fixed, thanks!

--D

* Re: [PATCH 03/24] xfs_logprint: dump new attr log item fields
  2024-07-02  6:25     ` Christoph Hellwig
@ 2024-07-02 20:02       ` Darrick J. Wong
  0 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-02 20:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, catherine.hoang, linux-xfs, allison.henderson

On Tue, Jul 02, 2024 at 08:25:03AM +0200, Christoph Hellwig wrote:
> > +extern int xlog_print_trans_attri_name(char **ptr, uint src_len,
> > +		const char *tag);
> > +extern int xlog_print_trans_attri_value(char **ptr, uint src_len, int value_len,
> > +		const char *tag);
> 
> Maybe drop the pointless externs?
> 
> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Done, thanks.

--D

* Re: [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points
  2024-07-02  5:22     ` Christoph Hellwig
@ 2024-07-03  1:59       ` Darrick J. Wong
  2024-07-03  4:27         ` Christoph Hellwig
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03  1:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Tue, Jul 02, 2024 at 07:22:25AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 01, 2024 at 05:58:10PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > I missed a few non-rendering code points in the "zero width"
> > classification code.  Add them now, and sort the list.
> > 
> > $ wget https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
> > $ grep -E '(zero width|invisible|joiner|application)' -i UnicodeData.txt
> 
> Should this be automated?

That will require a bit more thought -- many distro build systems these
days operate in a sealed box with no network access, so you can't really
automate this.  libicu (the last time I looked) didn't have a predicate
to tell you if a particular code point was one of the invisible ones.

Here's what I get from running the commands right now:

009F;<control>;Cc;0;BN;;;;;N;APPLICATION PROGRAM COMMAND;;;;
034F;COMBINING GRAPHEME JOINER;Mn;0;NSM;;;;;N;;;;;
200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;
200C;ZERO WIDTH NON-JOINER;Cf;0;BN;;;;;N;;;;;
200D;ZERO WIDTH JOINER;Cf;0;BN;;;;;N;;;;;
2060;WORD JOINER;Cf;0;BN;;;;;N;;;;;
2061;FUNCTION APPLICATION;Cf;0;BN;;;;;N;;;;;
2062;INVISIBLE TIMES;Cf;0;BN;;;;;N;;;;;
2063;INVISIBLE SEPARATOR;Cf;0;BN;;;;;N;;;;;
2064;INVISIBLE PLUS;Cf;0;BN;;;;;N;;;;;
2D7F;TIFINAGH CONSONANT JOINER;Mn;9;NSM;;;;;N;;;;;
FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;
1107F;BRAHMI NUMBER JOINER;Mn;9;NSM;;;;;N;;;;;
11A47;ZANABAZAR SQUARE SUBJOINER;Mn;9;NSM;;;;;N;;;;;
11A99;SOYOMBO SUBJOINER;Mn;9;NSM;;;;;N;;;;;
11F42;KAWI CONJOINER;Mn;9;NSM;;;;;N;;;;;
13430;EGYPTIAN HIEROGLYPH VERTICAL JOINER;Cf;0;L;;;;;N;;;;;
13431;EGYPTIAN HIEROGLYPH HORIZONTAL JOINER;Cf;0;L;;;;;N;;;;;

The five-digit ones are new to me; I'll go have a look at how the
noto(fu) fonts render those.
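A minimal sketch of how a sorted table like this can be checked with bsearch(3), assuming the table simply holds the code points printed by the grep above; invisible_codepoints and is_invisible_codepoint() are hypothetical names, not necessarily what xfs_scrub uses. Keeping the list sorted is what makes the binary search valid.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Code points from the UnicodeData.txt grep above, kept in ascending
 * order because bsearch() requires a sorted array.  (Hypothetical
 * table; the list in xfs_scrub may differ.)
 */
static const uint32_t invisible_codepoints[] = {
	0x009F, 0x034F, 0x200B, 0x200C, 0x200D, 0x2060,
	0x2061, 0x2062, 0x2063, 0x2064, 0x2D7F, 0xFEFF,
	0x1107F, 0x11A47, 0x11A99, 0x11F42, 0x13430, 0x13431,
};

/* Three-way comparison of two code points for bsearch(). */
static int cmp_codepoint(const void *key, const void *elem)
{
	uint32_t a = *(const uint32_t *)key;
	uint32_t b = *(const uint32_t *)elem;

	return (a > b) - (a < b);
}

/* Return true if @cp appears in the table of non-rendering code points. */
bool is_invisible_codepoint(uint32_t cp)
{
	size_t nr = sizeof(invisible_codepoints) / sizeof(invisible_codepoints[0]);

	return bsearch(&cp, invisible_codepoints, nr,
		       sizeof(invisible_codepoints[0]), cmp_codepoint) != NULL;
}
```

With this layout, picking up a newly discovered code point only means inserting it at the right position in the array.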

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

* Re: [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms
  2024-07-02  5:36     ` Christoph Hellwig
@ 2024-07-03  2:29       ` Darrick J. Wong
  2024-07-03  4:29         ` Christoph Hellwig
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03  2:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Tue, Jul 02, 2024 at 07:36:27AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 01, 2024 at 06:04:41PM -0700, Darrick J. Wong wrote:
> > Add a new -o fstrim_pct= option to xfs_scrub just in case there are
> > users out there who want a different percentage.  For example, accepting
> > a 95% trim would net us a speed increase of nearly two orders of
> > magnitude, ignoring system call overhead.  Setting it to 100% will trim
> > everything, just like fstrim(8).
> 
> It might also make sense to default the parameter to the
> discard_granularity reported in sysfs.  While a lot of devices don't
> report a useful value there, it avoids pointless work for those that
> do.

Oooh, that's a good idea.  Let me fiddle with that & tack it on the end?

Hmm.  How do we query the discard granularity from a userspace program?
I can try to guess the /sys/block/XXX root from the devices passed in,
or maybe libblkid will tell me?  And then I'd have to go open
queue/discard_granularity underneath that to read that.

That's going to take a day or so, I suspect. :/

> Otherwise this looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

* Re: [PATCH 1/7] libfrog: hoist free space histogram code
  2024-07-02  5:32     ` Christoph Hellwig
@ 2024-07-03  2:47       ` Darrick J. Wong
  2024-07-03  4:30         ` Christoph Hellwig
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03  2:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Tue, Jul 02, 2024 at 07:32:38AM +0200, Christoph Hellwig wrote:
> > +struct histent
> > +{
> 
> The brace goes on the previous line in our normal coding style.
>
> Also the histent/histogram naming seems a bit too generic for
> something very block-specific.  Should the naming be adjusted a bit?

Hmm.  Either we prefix everything with space, e.g. "struct
space_histogram" and "int spacehist_add(...)"...

Or change "blocks" to "value" and "extents" to "observations":

struct histent {
	/* Low and high size of this bucket */
	long long	low;
	long long	high;

	/* Count of observations recorded */
	long long	nr_observations;

	/* Sum of values recorded */
	long long	sum_values;
};

struct histogram {
	/* Sum of all values recorded */
	long long	tot_values;

	/* Count of all observations recorded */
	long long	tot_observations;

	struct histent	*buckets;

	/* Number of buckets */
	unsigned int	nr_buckets;
};

int hist_add_bucket(struct histogram *hs, long long bucket_low);
void hist_add(struct histogram *hs, long long value);
void hist_prepare(struct histogram *hs, long long maxvalue);

and then hist_print/hist_summarize would have to be told the units of
the values ("blocks") and the units of the observations ("extents") to
print to stdout.

The first option is a bit lazy since there's nothing really diskspace
specific about the histogram other than the printf labels; and the
second option is more generic but maybe we should let the first
non-space histogram user figure out how to do that?

<shrug> Your thoughts?
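For illustration, a rough sketch of how the generic second option might be filled in. The function bodies are assumptions rather than the libfrog code, and hist_min_value_for_pct() is a hypothetical helper showing that a unit-agnostic histogram could still choose an fstrim minlen for a given -o fstrim_pct= value: it walks down from the largest bucket until the selected buckets hold at least pct percent of all recorded values.

```c
#include <stdlib.h>

struct histent {
	long long	low, high;		/* bucket bounds, inclusive */
	long long	nr_observations;	/* observations in this bucket */
	long long	sum_values;		/* sum of their values */
};

struct histogram {
	long long	tot_values;
	long long	tot_observations;
	struct histent	*buckets;
	unsigned int	nr_buckets;
};

/* Append a bucket starting at @bucket_low; hist_prepare() sorts later. */
int hist_add_bucket(struct histogram *hs, long long bucket_low)
{
	struct histent *b;

	b = realloc(hs->buckets, (hs->nr_buckets + 1) * sizeof(*b));
	if (!b)
		return -1;
	hs->buckets = b;
	b[hs->nr_buckets] = (struct histent){ .low = bucket_low };
	hs->nr_buckets++;
	return 0;
}

static int cmp_bucket(const void *a, const void *b)
{
	const struct histent *x = a, *y = b;

	return (x->low > y->low) - (x->low < y->low);
}

/* Sort buckets and derive each upper bound from the next bucket's low. */
void hist_prepare(struct histogram *hs, long long maxvalue)
{
	unsigned int i;

	qsort(hs->buckets, hs->nr_buckets, sizeof(*hs->buckets), cmp_bucket);
	for (i = 0; i < hs->nr_buckets; i++)
		hs->buckets[i].high = (i + 1 < hs->nr_buckets) ?
				hs->buckets[i + 1].low - 1 : maxvalue;
}

/* Record one observation; values outside every bucket are ignored. */
void hist_add(struct histogram *hs, long long value)
{
	unsigned int i;

	for (i = 0; i < hs->nr_buckets; i++) {
		struct histent *b = &hs->buckets[i];

		if (value < b->low || value > b->high)
			continue;
		b->nr_observations++;
		b->sum_values += value;
		hs->tot_observations++;
		hs->tot_values += value;
		return;
	}
}

/*
 * Hypothetical: lowest bucket bound such that the buckets at or above
 * it account for at least @pct percent of all recorded values.
 */
long long hist_min_value_for_pct(const struct histogram *hs, unsigned int pct)
{
	long long sum = 0;
	unsigned int i = hs->nr_buckets;

	while (i > 0) {
		i--;
		sum += hs->buckets[i].sum_values;
		if (sum * 100 >= hs->tot_values * (long long)pct)
			return hs->buckets[i].low;
	}
	return hs->nr_buckets ? hs->buckets[0].low : 0;
}
```

Nothing here knows about blocks or extents; only the caller that prints or interprets the result would need to supply those units.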

--D

* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-02  5:44     ` Christoph Hellwig
@ 2024-07-03  2:59       ` Darrick J. Wong
  2024-07-03  4:31         ` Christoph Hellwig
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03  2:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Tue, Jul 02, 2024 at 07:44:19AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 01, 2024 at 06:09:54PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Now that we're finished building online fsck, enable the periodic
> > background scrub service by default.  This involves the postinst script
> > starting the resource management slice and the timer.
> > 
> > No other sub-services need to be enabled or unmasked explicitly.  They
> > also shouldn't be started or restarted because that might interrupt
> > background operation unnecessarily.
> 
> I'm still not sure we quite want to run this by default on all
> debian systems.  Are you ready to handle all the bug reports from
> people complaining about the I/O load?

CONFIG_XFS_ONLINE_SCRUB isn't turned on for the 6.9.7 kernel in sid, so
there shouldn't be any complaints until we ask the kernel team to enable
it.  I don't think we should ask Debian to do that until after they lift
the Debian 13 freeze next year/summer/whenever.

--D

* Re: [PATCH 7/8] xfs_scrub: don't trim the first agbno of each AG for better performance
  2024-07-02  5:30     ` Christoph Hellwig
@ 2024-07-03  3:45       ` Darrick J. Wong
  0 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03  3:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Tue, Jul 02, 2024 at 07:30:19AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 01, 2024 at 06:02:36PM -0700, Darrick J. Wong wrote:
> > Therefore, we created a second implementation that will walk the bnobt
> > and perform the trims in block number order.  This implementation avoids
> > the worst problems of the original code, though it lacks the desirable
> > attribute of freeing the biggest chunks first.
> > 
> > On the other hand, this second implementation will be much easier to
> > constrain the system call latency, and makes it much easier to report
> > fstrim progress to anyone who's running xfs_scrub.  Skip the first block
> > of each AG to ensure that we get the sub-AG algorithm.
> 
> I completely fail to understand how skipping the first block improves
> performance and is otherwise a good idea.  What am I missing?

Actually, I think this patch doesn't have a good reason to exist on its
own anymore.  The goal of this patch and the next one is to improve
responsiveness to signals and smoothness of the progress bar by reducing
the number of bytes per FITRIM call, but I think limiting that to 11GB
per call is sufficient without the "skip the first block" games.

Those games were once the magic method for getting the kernel to use the
by-block discard algorithm instead of the by-length algorithm, but most
of the deficiencies in both algorithms have been fixed now.

So in the end the only reason for this patch to continue existing is so
that we only issue one log force and lock one AGF per FITRIM call.  The
next patch exists to constrain latencies.

I think I'll combine these two patches into a single patch that avoids
having FITRIM requests cross AGs and breaks the requests into 11G chunks
to improve latency.
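A minimal sketch of that request-splitting loop, assuming the AG size is known in bytes and that AG boundaries fall at multiples of it in the FITRIM address space; next_trim_len() and trim_in_chunks() are hypothetical names, not the actual xfs_scrub code. Note that FITRIM writes the number of bytes actually trimmed back into range.len, so the loop saves the requested length before each call.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/*
 * Length of the next FITRIM request starting at @start (bytes): never
 * past @end, never across an AG boundary, never more than @max_len.
 */
uint64_t next_trim_len(uint64_t start, uint64_t end, uint64_t ag_bytes,
		       uint64_t max_len)
{
	uint64_t ag_end = (start / ag_bytes + 1) * ag_bytes;
	uint64_t len = end - start;

	if (len > ag_end - start)
		len = ag_end - start;
	if (len > max_len)
		len = max_len;
	return len;
}

/* Trim [0, fs_bytes) in small pieces so progress can be shown between calls. */
int trim_in_chunks(int fd, uint64_t fs_bytes, uint64_t ag_bytes,
		   uint64_t max_len, uint64_t minlen)
{
	uint64_t start = 0;

	/* Zero sizes would divide by zero or loop forever. */
	if (ag_bytes == 0 || max_len == 0)
		return -1;

	while (start < fs_bytes) {
		uint64_t len = next_trim_len(start, fs_bytes, ag_bytes, max_len);
		struct fstrim_range range = {
			.start	= start,
			.len	= len,
			.minlen	= minlen,
		};

		/* The kernel overwrites range.len with the bytes trimmed. */
		if (ioctl(fd, FITRIM, &range) < 0)
			return -1;

		/* A real caller would update progress and check signals here. */
		start += len;
	}
	return 0;
}
```

With max_len set to roughly 11GB, each call covers at most one AG's worth of locking and log forcing while still returning often enough to keep a progress bar moving.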

--D

* [PATCH v30.7.1 7/8] xfs_scrub: improve responsiveness while trimming the filesystem
  2024-07-02  1:02   ` [PATCH 7/8] xfs_scrub: don't trim the first agbno of each AG for better performance Darrick J. Wong
  2024-07-02  5:30     ` Christoph Hellwig
@ 2024-07-03  3:52     ` Darrick J. Wong
  2024-07-03  4:33       ` Christoph Hellwig
  1 sibling, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03  3:52 UTC (permalink / raw)
  To: hch; +Cc: linux-xfs, cem

From: Darrick J. Wong <djwong@kernel.org>

On a 10TB filesystem where the free space in each AG is heavily
fragmented, I noticed some very high runtimes on a FITRIM call for the
entire filesystem.  xfs_scrub likes to report progress information on
each phase of the scrub, but an strace of a single FITRIM call covering
the entire filesystem:

ioctl(3, FITRIM, {start=0x0, len=10995116277760, minlen=0}) = 0 <686.209839>

shows that scrub is uncommunicative for the entire duration.  We can't
report any progress for the duration of the call, and the program is not
responsive to signals.  Reducing the size of the FITRIM requests to a
single AG at a time produces lower times for each individual call, but
even this isn't quite acceptable, because the time between progress
reports is still very high:

Strace for the first 4x 1TB AGs looks like (2):
ioctl(3, FITRIM, {start=0x0, len=1099511627776, minlen=0}) = 0 <68.352033>
ioctl(3, FITRIM, {start=0x10000000000, len=1099511627776, minlen=0}) = 0 <68.760323>
ioctl(3, FITRIM, {start=0x20000000000, len=1099511627776, minlen=0}) = 0 <67.235226>
ioctl(3, FITRIM, {start=0x30000000000, len=1099511627776, minlen=0}) = 0 <69.465744>

I then had the idea to limit the length parameter of each call to a
smallish amount (~11GB) so that we could report progress relatively
quickly, but much to my surprise, each FITRIM call still took ~68
seconds!

Unfortunately, the by-length fstrim implementation handles this poorly
because it walks the entire free space by length index (cntbt), which is
a very inefficient way to walk a subset of an AG when the free space is
fragmented.

To fix that, I created a second implementation in the kernel that will
walk the bnobt and perform the trims in block number order.  This
algorithm constrains the amount of btree scanning to something
resembling the range passed in, which reduces the amount of time it
takes to respond to a signal.
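As a rough illustration of why the bnobt walk bounds the work, here is a
simplified userspace model, not the kernel's xfs_discard code.  A sorted
array of hypothetical free_ext records stands in for the btree, and the
two functions count how many records each strategy has to examine to
trim one range.

```c
#include <stddef.h>
#include <stdint.h>

/* One free extent: start block and length, both in filesystem blocks. */
struct free_ext {
	uint64_t	bno;
	uint64_t	len;
};

/*
 * Model of a by-block-number (bnobt) walk over [start, end).  The records
 * are sorted by bno and do not overlap, so binary-search for the first one
 * that ends past @start and stop at the first one that begins at or past
 * @end.  Returns the number of records examined; *hits counts the ones
 * that overlap the range (the ones that would be discarded).
 */
static size_t
walk_by_bno(const struct free_ext *recs, size_t nr, uint64_t start,
	    uint64_t end, size_t *hits)
{
	size_t	lo = 0, hi = nr, examined = 0;

	while (lo < hi) {
		size_t	mid = lo + (hi - lo) / 2;

		if (recs[mid].bno + recs[mid].len <= start)
			lo = mid + 1;
		else
			hi = mid;
	}

	*hits = 0;
	for (size_t i = lo; i < nr && recs[i].bno < end; i++) {
		examined++;
		(*hits)++;
	}
	return examined;
}

/*
 * Model of a by-length (cntbt) walk: the index is ordered by extent
 * length, so block position gives no early exit and every record must be
 * examined to find the ones inside [start, end).
 */
static size_t
walk_by_len(const struct free_ext *recs, size_t nr, uint64_t start,
	    uint64_t end, size_t *hits)
{
	*hits = 0;
	for (size_t i = 0; i < nr; i++)
		if (recs[i].bno < end && recs[i].bno + recs[i].len > start)
			(*hits)++;
	return nr;
}
```

With 1000 one-block free extents spaced two blocks apart, trimming
blocks 100-119 examines 10 records by block number but all 1000 by
length, even though both walks find the same 10 extents.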

Therefore, break up the FITRIM calls so they don't scan more than 11GB
of space at a time.  Break the calls up by AG so that each call only has
to take one AGF lock, because each AG that we traverse causes a log
force.
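A minimal sketch of the splitting arithmetic on its own, assuming the
same linear fsblock-to-byte conversion that cvt_off_fsb_to_b() performs.
plan_fitrim_calls() and struct trim_call are hypothetical names; the
sketch only plans the calls and never issues FITRIM.

```c
#include <stddef.h>
#include <stdint.h>

#define FSTRIM_MAX_BYTES	(11ULL << 30)	/* 11 GiB, as in the patch */

struct trim_call {
	uint64_t	start;	/* byte offset for fstrim_range.start */
	uint64_t	len;	/* byte count for fstrim_range.len */
};

/*
 * Plan the FITRIM calls for the data device: walk one AG at a time so no
 * call crosses an AG boundary, and split each AG into pieces no larger
 * than FSTRIM_MAX_BYTES.  Stores up to @max calls in @out and returns the
 * total number of calls needed.
 */
static size_t
plan_fitrim_calls(uint64_t datablocks, uint64_t agblocks, uint32_t blocksize,
		  struct trim_call *out, size_t max)
{
	size_t		n = 0;

	for (uint64_t fsbno = 0; fsbno < datablocks; fsbno += agblocks) {
		uint64_t	fsbcount = datablocks - fsbno;
		uint64_t	start, len;

		/* The last AG may be shorter than agblocks. */
		if (fsbcount > agblocks)
			fsbcount = agblocks;
		start = fsbno * blocksize;
		len = fsbcount * blocksize;

		while (len > 0) {
			uint64_t	run = len < FSTRIM_MAX_BYTES ?
					      len : FSTRIM_MAX_BYTES;

			if (n < max) {
				out[n].start = start;
				out[n].len = run;
			}
			n++;
			start += run;
			len -= run;
		}
	}
	return n;
}
```

For a 10TiB filesystem with 4KiB blocks and 1TiB AGs this yields 94
calls per AG (93 of 11GiB and a final 1GiB piece), 940 in total, and the
95th call starts at 0x10000000000, the second AG's offset in the strace
above.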

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase8.c |  102 ++++++++++++++++++++++++++++++++++++++++++++++----------
 scrub/vfs.c    |   10 ++++-
 scrub/vfs.h    |    2 +
 3 files changed, 91 insertions(+), 23 deletions(-)

diff --git a/scrub/phase8.c b/scrub/phase8.c
index 75400c968595..e35bf11bf329 100644
--- a/scrub/phase8.c
+++ b/scrub/phase8.c
@@ -45,27 +45,90 @@ fstrim_ok(
 	return true;
 }
 
+/*
+ * Limit the amount of fstrim scanning that we let the kernel do in a single
+ * call so that we can implement decent progress reporting and CPU resource
+ * control.  Pick a prime number of gigabytes for interest.
+ */
+#define FSTRIM_MAX_BYTES	(11ULL << 30)
+
+/* Trim a certain range of the filesystem. */
+static int
+fstrim_fsblocks(
+	struct scrub_ctx	*ctx,
+	uint64_t		start_fsb,
+	uint64_t		fsbcount)
+{
+	uint64_t		start = cvt_off_fsb_to_b(&ctx->mnt, start_fsb);
+	uint64_t		len = cvt_off_fsb_to_b(&ctx->mnt, fsbcount);
+	int			error;
+
+	while (len > 0) {
+		uint64_t	run;
+
+		run = min(len, FSTRIM_MAX_BYTES);
+
+		error = fstrim(ctx, start, run);
+		if (error == EOPNOTSUPP) {
+			/* Pretend we finished all the work. */
+			progress_add(len);
+			return 0;
+		}
+		if (error) {
+			char		descr[DESCR_BUFSZ];
+
+			snprintf(descr, sizeof(descr) - 1,
+					_("fstrim start 0x%llx run 0x%llx"),
+					(unsigned long long)start,
+					(unsigned long long)run);
+			str_liberror(ctx, error, descr);
+			return error;
+		}
+
+		progress_add(run);
+		len -= run;
+		start += run;
+	}
+
+	return 0;
+}
+
+/* Trim each AG on the data device. */
+static int
+fstrim_datadev(
+	struct scrub_ctx	*ctx)
+{
+	struct xfs_fsop_geom	*geo = &ctx->mnt.fsgeom;
+	uint64_t		fsbno;
+	int			error;
+
+	for (fsbno = 0; fsbno < geo->datablocks; fsbno += geo->agblocks) {
+		uint64_t	fsbcount;
+
+		/*
+		 * Make sure that trim calls do not cross AG boundaries so that
+		 * the kernel only performs one log force (and takes one AGF
+		 * lock) per call.
+		 */
+		progress_add(geo->blocksize);
+		fsbcount = min(geo->datablocks - fsbno, geo->agblocks);
+		error = fstrim_fsblocks(ctx, fsbno, fsbcount);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
 /* Trim the filesystem, if desired. */
 int
 phase8_func(
 	struct scrub_ctx	*ctx)
 {
-	int			error;
-
 	if (!fstrim_ok(ctx))
 		return 0;
 
-	error = fstrim(ctx);
-	if (error == EOPNOTSUPP)
-		return 0;
-
-	if (error) {
-		str_liberror(ctx, error, _("fstrim"));
-		return error;
-	}
-
-	progress_add(1);
-	return 0;
+	return fstrim_datadev(ctx);
 }
 
 /* Estimate how much work we're going to do. */
@@ -76,12 +139,13 @@ phase8_estimate(
 	unsigned int		*nr_threads,
 	int			*rshift)
 {
-	*items = 0;
-
-	if (fstrim_ok(ctx))
-		*items = 1;
-
+	if (fstrim_ok(ctx)) {
+		*items = cvt_off_fsb_to_b(&ctx->mnt,
+				ctx->mnt.fsgeom.datablocks);
+	} else {
+		*items = 0;
+	}
 	*nr_threads = 1;
-	*rshift = 0;
+	*rshift = 30; /* GiB */
 	return 0;
 }
diff --git a/scrub/vfs.c b/scrub/vfs.c
index bcfd4f42ca8b..cc958ba9438e 100644
--- a/scrub/vfs.c
+++ b/scrub/vfs.c
@@ -298,11 +298,15 @@ struct fstrim_range {
 /* Call FITRIM to trim all the unused space in a filesystem. */
 int
 fstrim(
-	struct scrub_ctx	*ctx)
+	struct scrub_ctx	*ctx,
+	uint64_t		start,
+	uint64_t		len)
 {
-	struct fstrim_range	range = {0};
+	struct fstrim_range	range = {
+		.start		= start,
+		.len		= len,
+	};
 
-	range.len = ULLONG_MAX;
 	if (ioctl(ctx->mnt.fd, FITRIM, &range) == 0)
 		return 0;
 	if (errno == EOPNOTSUPP || errno == ENOTTY)
diff --git a/scrub/vfs.h b/scrub/vfs.h
index a8a4d72e290a..1af8d80d1de6 100644
--- a/scrub/vfs.h
+++ b/scrub/vfs.h
@@ -24,6 +24,6 @@ typedef int (*scan_fs_tree_dirent_fn)(struct scrub_ctx *, const char *,
 int scan_fs_tree(struct scrub_ctx *ctx, scan_fs_tree_dir_fn dir_fn,
 		scan_fs_tree_dirent_fn dirent_fn, void *arg);
 
-int fstrim(struct scrub_ctx *ctx);
+int fstrim(struct scrub_ctx *ctx, uint64_t start, uint64_t len);
 
 #endif /* XFS_SCRUB_VFS_H_ */


* Re: [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points
  2024-07-03  1:59       ` Darrick J. Wong
@ 2024-07-03  4:27         ` Christoph Hellwig
  2024-07-03  5:02           ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-03  4:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs

On Tue, Jul 02, 2024 at 06:59:56PM -0700, Darrick J. Wong wrote:
> > > $ wget https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
> > > $ grep -E '(zero width|invisible|joiner|application)' -i UnicodeData.txt
> > 
> > Should this be automated?
> 
> That will require a bit more thought -- many distro build systems these
> days operate in a sealed box with no network access, so you can't really
> automate this.  libicu (the last time I looked) didn't have a predicate
> to tell you if a particular code point was one of the invisible ones.

Oh, I absolutely do not suggest to run the wget from a normal build!

But if you look at the kernel unicode CI support, it allows you to
place the downloaded file into the kernel tree, and then has makefile
rules to regenerate the tables from it (see fs/unicode/Makefile).
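A sketch of the kind of generator such a makefile rule could run against
a checked-in copy of UnicodeData.txt, assuming the keyword list from the
grep above.  match_ucd_line() and gen_table() are hypothetical and are
not the fs/unicode tooling.  This only automates the keyword match; the
output would still need manual review.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Keywords from the grep.  UnicodeData.txt spells names in upper case, so
 * plain strstr() is used here instead of grep's case-insensitive -i.
 */
static const char *const keywords[] = {
	"ZERO WIDTH", "INVISIBLE", "JOINER", "APPLICATION",
};

/*
 * Check one UnicodeData.txt line.  Like the grep, search the whole line,
 * so an old-name field such as U+009F's also matches.  On a match, parse
 * the hex code point in the first field into *cp and return true.
 */
static bool
match_ucd_line(const char *line, uint32_t *cp)
{
	for (size_t i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++) {
		if (strstr(line, keywords[i])) {
			/* strtoul stops at the first ';' */
			*cp = (uint32_t)strtoul(line, NULL, 16);
			return true;
		}
	}
	return false;
}

/* Emit one C table entry for every matching line read from @in. */
static void
gen_table(FILE *in, FILE *out)
{
	char		buf[512];
	uint32_t	cp;

	while (fgets(buf, sizeof(buf), in))
		if (match_ucd_line(buf, &cp))
			fprintf(out, "\t0x%04X,\n", (unsigned int)cp);
}
```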



* Re: [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms
  2024-07-03  2:29       ` Darrick J. Wong
@ 2024-07-03  4:29         ` Christoph Hellwig
  2024-07-03  4:55           ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-03  4:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs

On Tue, Jul 02, 2024 at 07:29:14PM -0700, Darrick J. Wong wrote:
> Oooh, that's a good idea.  Let me fiddle with that & tack it on the end?
> 
> Hmm.  How do we query the discard granularity from a userspace program?
> I can try to guess the /sys/block/XXX root from the devices passed in,
> or maybe libblkid will tell me?  And then I'd have to go open
> queue/discard_granularity underneath that to read that.

Good question.  As far as I can tell there is no simple ioctl for it.
I really wonder if we need an extensible block topology ioctl that we
can keep adding fields to for new queue limits, to make it easy to query
them from userspace without all that sysfs mess..

> That's going to take a day or so, I suspect. :/

No rush, just noticed it.  Note that for the discard granularity
we should also look at the alignment and not just the size.


* Re: [PATCH 1/7] libfrog: hoist free space histogram code
  2024-07-03  2:47       ` Darrick J. Wong
@ 2024-07-03  4:30         ` Christoph Hellwig
  2024-07-03  4:55           ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-03  4:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs

On Tue, Jul 02, 2024 at 07:47:38PM -0700, Darrick J. Wong wrote:
> Or change "blocks" to "value" and "extents" to "observations":

> The first option is a bit lazy since there's nothing really diskspace
> specific about the histogram other than the printf labels; and the
> second option is more generic but maybe we should let the first
> non-space histogram user figure out how to do that?

Actually the second sounds easy enough even if we never end up using
it for anything but space tracking.
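A minimal sketch of what the more generic naming might look like: a log2
histogram of "observations", each carrying a "value", where free space is
just the case of extents and their lengths in blocks.  struct histogram,
hist_bucket() and hist_add() are hypothetical names, not the libfrog API.

```c
#include <stddef.h>
#include <stdint.h>

#define HIST_NR_BUCKETS	64

/*
 * Generic log2 histogram: bucket i collects values in [2^i, 2^(i+1)),
 * and bucket 0 also takes a value of 0.
 */
struct histogram {
	uint64_t	nr_obs[HIST_NR_BUCKETS];	/* observations per bucket */
	uint64_t	sum[HIST_NR_BUCKETS];		/* sum of values per bucket */
	uint64_t	tot_obs;
	uint64_t	tot_sum;
};

/* Index of the highest set bit, i.e. floor(log2(value)) for value > 0. */
static unsigned int
hist_bucket(uint64_t value)
{
	unsigned int	b = 0;

	while (value >>= 1)
		b++;
	return b;
}

/* Record one observation with the given value. */
static void
hist_add(struct histogram *h, uint64_t value)
{
	unsigned int	b = hist_bucket(value);

	h->nr_obs[b]++;
	h->sum[b] += value;
	h->tot_obs++;
	h->tot_sum += value;
}
```

Only the printf labels would then need to say "blocks" and "extents" for
the free space reporting.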



* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-03  2:59       ` Darrick J. Wong
@ 2024-07-03  4:31         ` Christoph Hellwig
  2024-07-03  5:01           ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-03  4:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs

On Tue, Jul 02, 2024 at 07:59:29PM -0700, Darrick J. Wong wrote:
> CONFIG_XFS_ONLINE_SCRUB isn't turned on for the 6.9.7 kernel in sid, so
> there shouldn't be any complaints until we ask the kernel team to enable
> it.  I don't think we should ask Debian to do that until after they lift
> the debian 13 freeze next year/summer/whenever.

I'm not entirely sure if we should ever do this by default.  The right
fit to me would be on of those questions asked during apt-get upgrade
to enable/disable things.  But don't ask me how those are implemented
or even called.



* Re: [PATCH v30.7.1 7/8] xfs_scrub: improve responsiveness while trimming the filesystem
  2024-07-03  3:52     ` [PATCH v30.7.1 7/8] xfs_scrub: improve responsiveness while trimming the filesystem Darrick J. Wong
@ 2024-07-03  4:33       ` Christoph Hellwig
  2024-07-03  4:50         ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-03  4:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs, cem

On Tue, Jul 02, 2024 at 08:52:27PM -0700, Darrick J. Wong wrote:
> I then had the idea to limit the length parameter of each call to a
> smallish amount (~11GB) so that we could report progress relatively
> quickly, but much to my surprise, each FITRIM call still took ~68
> seconds!

Where do those magic 11GB come from?

> +/*
> + * Limit the amount of fstrim scanning that we let the kernel do in a single
> + * call so that we can implement decent progress reporting and CPU resource
> + * control.  Pick a prime number of gigabytes for interest.

... this explains it somehow, but not really :)

The code itself looks fine, so with a better explanation or more
round number:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v30.7.1 7/8] xfs_scrub: improve responsiveness while trimming the filesystem
  2024-07-03  4:33       ` Christoph Hellwig
@ 2024-07-03  4:50         ` Darrick J. Wong
  0 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03  4:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, cem

On Wed, Jul 03, 2024 at 06:33:45AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 02, 2024 at 08:52:27PM -0700, Darrick J. Wong wrote:
> > I then had the idea to limit the length parameter of each call to a
> > smallish amount (~11GB) so that we could report progress relatively
> > quickly, but much to my surprise, each FITRIM call still took ~68
> > seconds!
> 
> Where do those magic 11GB come from?
> 
> > +/*
> > + * Limit the amount of fstrim scanning that we let the kernel do in a single
> > + * call so that we can implement decent progress reporting and CPU resource
> > + * control.  Pick a prime number of gigabytes for interest.
> 
> ... this explains it somehow, but not really :)

It's entirely arbitrary -- big enough to exceed MAXEXTLEN*4k, rounded up
to the next prime number of gigabytes.
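The arithmetic behind the number, assuming 4KiB blocks and the
historical 21-bit MAXEXTLEN of 2^21 - 1 blocks: that limit is
8589930496 bytes, just under 8GiB, so the smallest whole number of GiB
exceeding it is 8, and the next prime at or above that is 11.
pick_trim_gib() is a hypothetical helper that checks this.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAXEXTLEN	((1ULL << 21) - 1)	/* blocks; 21-bit extent length */
#define GiB		(1ULL << 30)

/* Trial division; fine for the small numbers involved here. */
static bool
is_prime(uint64_t n)
{
	if (n < 2)
		return false;
	for (uint64_t d = 2; d * d <= n; d++)
		if (n % d == 0)
			return false;
	return true;
}

/* Smallest prime number of whole GiB that is strictly larger than @bytes. */
static uint64_t
pick_trim_gib(uint64_t bytes)
{
	uint64_t	gib = bytes / GiB + 1;

	while (!is_prime(gib))
		gib++;
	return gib;
}
```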

> The code itself looks fine, so with a better explanation or more
> round number:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D


* Re: [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms
  2024-07-03  4:29         ` Christoph Hellwig
@ 2024-07-03  4:55           ` Darrick J. Wong
  2024-07-03  4:58             ` Christoph Hellwig
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03  4:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Wed, Jul 03, 2024 at 06:29:22AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 02, 2024 at 07:29:14PM -0700, Darrick J. Wong wrote:
> > Oooh, that's a good idea.  Let me fiddle with that & tack it on the end?
> > 
> > Hmm.  How do we query the discard granularity from a userspace program?
> > I can try to guess the /sys/block/XXX root from the devices passed in,
> > or maybe libblkid will tell me?  And then I'd have to go open
> > queue/discard_granularity underneath that to read that.
> 
> Good question.  As far as I can tell there is no simple ioctl for it.
> I really wonder if we need an extensible block topology ioctl that we
> can keep adding fields to for new queue limits, to make it easy to query
> them from userspace without all that sysfs mess..

Yeah.  Or implement FS_IOC_GETSYSFSPATH for block devices? :P

> > That's going to take a day or so, I suspect. :/
> 
> No rush, just noticed it.  Note that for the discard granularity
> we should also look at the alignment and not just the size.

<nod> AFAICT the xfs discard code doesn't check the alignment.  Maybe
the block layer does, but ... weirdly we've now refactored
xfs_discard_extents so that it /never/ reports errors of any kind in the
submit loop because __blkdev_issue_discard always returns 0, and the
endio doesn't check the bio status either.

--D


* Re: [PATCH 1/7] libfrog: hoist free space histogram code
  2024-07-03  4:30         ` Christoph Hellwig
@ 2024-07-03  4:55           ` Darrick J. Wong
  0 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03  4:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Wed, Jul 03, 2024 at 06:30:09AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 02, 2024 at 07:47:38PM -0700, Darrick J. Wong wrote:
> > Or change "blocks" to "value" and "extents" to "observations":
> 
> > The first option is a bit lazy since there's nothing really diskspace
> > specific about the histogram other than the printf labels; and the
> > second option is more generic but maybe we should let the first
> > non-space histogram user figure out how to do that?
> 
> Actually the second sounds easy enough even if we never end up using
> it for anything but space tracking.

Ok, will do that tomorrow.

--D


* Re: [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms
  2024-07-03  4:55           ` Darrick J. Wong
@ 2024-07-03  4:58             ` Christoph Hellwig
  2024-07-03  5:04               ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-03  4:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs

On Tue, Jul 02, 2024 at 09:55:39PM -0700, Darrick J. Wong wrote:
> > Good question.  As far as I can tell there is no simple ioctl for it.
> > I really wonder if we need an extensible block topology ioctl that we
> > can keep adding fields to for new queue limits, to make it easy to query
> > them from userspace without all that sysfs mess..
> 
> Yeah.  Or implement FS_IOC_GETSYSFSPATH for block devices? :P

I know people like to fetishize file access (and to be honest from a
shell it is really nice), but from a C program would you rather do
one ioctl to find a sysfs base path, then do string manipulation to
find the actual attribute, then open + read + close it, or do a single
ioctl and read a bunch of values from a struct?
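To make the comparison concrete, here is a sketch of the sysfs route for
a single attribute.  It assumes the /sys/dev/block/MAJOR:MINOR symlinks
and that a partition has no queue/ directory of its own, so it falls
back to the parent disk's limits.  read_queue_limit() is a hypothetical
helper, not something in xfsprogs.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/*
 * Read a queue limit such as "discard_granularity" for the block device
 * at @dev_path.  Returns -1 if the path is not a block device or the
 * attribute cannot be read.
 */
static long long
read_queue_limit(const char *dev_path, const char *attr)
{
	struct stat	st;
	char		path[256];
	FILE		*fp;
	long long	val;

	if (stat(dev_path, &st) < 0 || !S_ISBLK(st.st_mode))
		return -1;

	/* Whole disk: the attribute lives in this device's queue/ dir. */
	snprintf(path, sizeof(path), "/sys/dev/block/%u:%u/queue/%s",
		 major(st.st_rdev), minor(st.st_rdev), attr);
	fp = fopen(path, "r");
	if (!fp) {
		/* Partition: ".." resolves to the parent disk's directory. */
		snprintf(path, sizeof(path), "/sys/dev/block/%u:%u/../queue/%s",
			 major(st.st_rdev), minor(st.st_rdev), attr);
		fp = fopen(path, "r");
	}
	if (!fp)
		return -1;

	if (fscanf(fp, "%lld", &val) != 1)
		val = -1;
	fclose(fp);
	return val;
}
```

A caller would do something like read_queue_limit("/dev/sda1",
"discard_granularity"); a single topology ioctl would replace the path
building, the partition fallback, and the file I/O.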

> > > That's going to take a day or so, I suspect. :/
> > 
> > No rush, just noticed it.  Note that for the discard granularity
> > we should also look at the alignment and not just the size.
> 
> <nod> AFAICT the xfs discard code doesn't check the alignment.  Maybe
> the block layer does, but ..

The block layer checks the alignment and silently skips anything not
matching it.  So not adhering to it isn't an error; it might just cause
pointless work.
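One way a caller could avoid that pointless work is to shrink each range
to the part that lines up with the discard granularity before asking for
a discard.  A minimal sketch, assuming a zero alignment offset (a device
reporting a nonzero discard_alignment would need that offset folded in).
align_discard_range() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Shrink the byte range [*start, *start + *len) to the largest sub-range
 * whose start and end are both multiples of @gran.  Returns false if no
 * aligned part is left, in which case a discard would do nothing useful.
 */
static bool
align_discard_range(uint64_t *start, uint64_t *len, uint64_t gran)
{
	uint64_t	end = *start + *len;
	uint64_t	astart, aend;

	if (gran == 0)
		return *len > 0;

	astart = (*start + gran - 1) / gran * gran;	/* round start up */
	aend = end / gran * gran;			/* round end down */
	if (aend <= astart)
		return false;

	*start = astart;
	*len = aend - astart;
	return true;
}
```

For example, bytes 1000-10999 with a 4096-byte granularity shrink to the
single aligned chunk at 4096-8191, while a 200-byte range inside one
granule is dropped entirely.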



* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-03  4:31         ` Christoph Hellwig
@ 2024-07-03  5:01           ` Darrick J. Wong
  2024-07-09 22:53             ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03  5:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Wed, Jul 03, 2024 at 06:31:24AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 02, 2024 at 07:59:29PM -0700, Darrick J. Wong wrote:
> > CONFIG_XFS_ONLINE_SCRUB isn't turned on for the 6.9.7 kernel in sid, so
> > there shouldn't be any complaints until we ask the kernel team to enable
> > it.  I don't think we should ask Debian to do that until after they lift
> > the debian 13 freeze next year/summer/whenever.
> 
> I'm not entirely sure if we should ever do this by default.  The right
> fit to me would be one of those questions asked during apt-get upgrade
> to enable/disable things.  But don't ask me how those are implemented
> or even called.

Some debconf magicks that I don't understand. :(

A more declarative-happy way would be to make a subpackage that turns on
the background service, so the people that want it on by default can add
"Depends: xfsprogs-background-scrub" to their ... uh ... ansible?
puppet?  orchestration system.

--D


* Re: [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points
  2024-07-03  4:27         ` Christoph Hellwig
@ 2024-07-03  5:02           ` Darrick J. Wong
  2024-07-03 18:35             ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03  5:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Wed, Jul 03, 2024 at 06:27:32AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 02, 2024 at 06:59:56PM -0700, Darrick J. Wong wrote:
> > > > $ wget https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
> > > > $ grep -E '(zero width|invisible|joiner|application)' -i UnicodeData.txt
> > > 
> > > Should this be automated?
> > 
> > That will require a bit more thought -- many distro build systems these
> > days operate in a sealed box with no network access, so you can't really
> > automate this.  libicu (the last time I looked) didn't have a predicate
> > to tell you if a particular code point was one of the invisible ones.
> 
> Oh, I absolutely do not suggest to run the wget from a normal build!
> 
> But if you look at the kernel unicode CI support, it allows you to
> place the downloaded file into the kernel tree, and then has makefile
> rules to regenerate the tables from it (see fs/unicode/Makefile).

Ah ok got it, will take a closer look tomorrow.

--D


* Re: [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms
  2024-07-03  4:58             ` Christoph Hellwig
@ 2024-07-03  5:04               ` Darrick J. Wong
  2024-07-03  5:11                 ` Christoph Hellwig
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03  5:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Wed, Jul 03, 2024 at 06:58:12AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 02, 2024 at 09:55:39PM -0700, Darrick J. Wong wrote:
> > > Good question.  As far as I can tell there is no simple ioctl for it.
> > > I really wonder if we need an extensible block topology ioctl that we
> > > can keep adding fields to for new queue limits, to make it easy to query
> > > them from userspace without all that sysfs mess..
> > 
> > Yeah.  Or implement FS_IOC_GETSYSFSPATH for block devices? :P
> 
> I know people like to fetishize file access (and to be honest from a
> shell it is really nice), but from a C program would you rather do
> one ioctl to find a sysfs base path, then do string manipulation to
> find the actual attribute, then open + read + close it, or do a single
> ioctl and read a bunch of values from a struct?

Single ioctl and read from a struct.

Or single ioctl and read a bunch of json (LOL)

I wish the BLK* ioctls had kept pace with the spread of queue limits.

> > > > That's going to take a day or so, I suspect. :/
> > > 
> > > No rush, just noticed it.  Note that for the discard granularity
> > > we should also look at the alignment and not just the size.
> > 
> > <nod> AFAICT the xfs discard code doesn't check the alignment.  Maybe
> > the block layer does, but ..
> 
> The block layer checks the alignment and silently skips anything not
> matching it.  So not adhering to it isn't an error; it might just cause
> pointless work.

<nod>

--D


* Re: [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms
  2024-07-03  5:04               ` Darrick J. Wong
@ 2024-07-03  5:11                 ` Christoph Hellwig
  2024-07-03  5:17                   ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-03  5:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs

On Tue, Jul 02, 2024 at 10:04:22PM -0700, Darrick J. Wong wrote:
> > I know people like to fetishize file access (and to be honest from a
> > shell it is really nice), but from a C program would you rather do
> > one ioctl to find a sysfs base path, then do string manipulation to
> > find the actual attribute, then open + read + close it, or do a single
> > ioctl and read a bunch of values from a struct?
> 
> Single ioctl and read from a struct.
> 
> Or single ioctl and read a bunch of json (LOL)
> 
> I wish the BLK* ioctls had kept pace with the spread of queue limits.

Let me propose a new BLKLIMITS ioctl, and then we'll work from there?



* Re: [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms
  2024-07-03  5:11                 ` Christoph Hellwig
@ 2024-07-03  5:17                   ` Darrick J. Wong
  0 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03  5:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Wed, Jul 03, 2024 at 07:11:50AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 02, 2024 at 10:04:22PM -0700, Darrick J. Wong wrote:
> > > I know people like to fetishize file access (and to be honest from a
> > > shell it is really nice), but from a C program would you rather do
> > > one ioctl to find a sysfs base path, then do string manipulation to
> > > find the actual attribute, then open + read + close it, or do a single
> > > ioctl and read a bunch of values from a struct?
> > 
> > Single ioctl and read from a struct.
> > 
> > Or single ioctl and read a bunch of json (LOL)
> > 
> > I wish the BLK* ioctls had kept pace with the spread of queue limits.
> 
> Let me propose a new BLKLIMITS ioctl, and then we'll work from there?

Ok!

--D


* Re: [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points
  2024-07-03  5:02           ` Darrick J. Wong
@ 2024-07-03 18:35             ` Darrick J. Wong
  2024-07-04  7:22               ` Christoph Hellwig
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-03 18:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Tue, Jul 02, 2024 at 10:02:19PM -0700, Darrick J. Wong wrote:
> On Wed, Jul 03, 2024 at 06:27:32AM +0200, Christoph Hellwig wrote:
> > On Tue, Jul 02, 2024 at 06:59:56PM -0700, Darrick J. Wong wrote:
> > > > > $ wget https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
> > > > > $ grep -E '(zero width|invisible|joiner|application)' -i UnicodeData.txt
> > > > 
> > > > Should this be automated?
> > > 
> > > That will require a bit more thought -- many distro build systems these
> > > days operate in a sealed box with no network access, so you can't really
> > > automate this.  libicu (the last time I looked) didn't have a predicate
> > > to tell you if a particular code point was one of the invisible ones.
> > 
> > Oh, I absolutely do not suggest to run the wget from a normal build!
> > 
> > But if you look at the kernel unicode CI support, it allows you to
> > place the downloaded file into the kernel tree, and then has makefile
> > rules to regenerate the tables from it (see fs/unicode/Makefile).
> 
> Ah ok got it, will take a closer look tomorrow.

I'm not sure if there's a good way to automate this.  But let's go
through the grep output?

009F;<control>;Cc;0;BN;;;;;N;APPLICATION PROGRAM COMMAND;;;;

This one seems to render as Ÿ (Y with two dots above it) so I didn't put
this in the case statement.

034F;COMBINING GRAPHEME JOINER;Mn;0;NSM;;;;;N;;;;;

Doesn't seem to have any sort of rendering (i.e. if I type it into a
web browser, it doesn't render any glyph at all), so I put it in the
case statement.

200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;
200C;ZERO WIDTH NON-JOINER;Cf;0;BN;;;;;N;;;;;
200D;ZERO WIDTH JOINER;Cf;0;BN;;;;;N;;;;;

These ones have 'zero-width' in the name, so I put these in the
denylist.

2060;WORD JOINER;Cf;0;BN;;;;;N;;;;;

https://www.unicode.org/reports/tr14/ says that this doesn't have
any rendering, nor does my web browser try one.

Come to think of it, I should probably add these two:

2028;LINE SEPARATOR;Zl;0;WS;;;;;N;;;;;
2029;PARAGRAPH SEPARATOR;Zp;0;B;;;;;N;;;;;

because they don't render either.

2061;FUNCTION APPLICATION;Cf;0;BN;;;;;N;;;;;

https://www.unicode.org/reports/tr25/tr25-6.html says this is a
metacharacter and doesn't have its own rendering.  It seems to be there
to indicate that this:

f(x + y)

is a function taking "x + y" as a parameter, and not shorthand for:

f*x + f*y

2062;INVISIBLE TIMES;Cf;0;BN;;;;;N;;;;;
2063;INVISIBLE SEPARATOR;Cf;0;BN;;;;;N;;;;;
2064;INVISIBLE PLUS;Cf;0;BN;;;;;N;;;;;

These have 'invisible' in the name, that seems pretty obvious to me.

2D7F;TIFINAGH CONSONANT JOINER;Mn;9;NSM;;;;;N;;;;;

This one doesn't seem to render either.  Unicode 15.1 says it isn't
visibly rendered.

FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;

Another 'zero-width' one, same as the others.

1107F;BRAHMI NUMBER JOINER;Mn;9;NSM;;;;;N;;;;;
11A47;ZANABAZAR SQUARE SUBJOINER;Mn;9;NSM;;;;;N;;;;;
11A99;SOYOMBO SUBJOINER;Mn;9;NSM;;;;;N;;;;;
13430;EGYPTIAN HIEROGLYPH VERTICAL JOINER;Cf;0;L;;;;;N;;;;;
13431;EGYPTIAN HIEROGLYPH HORIZONTAL JOINER;Cf;0;L;;;;;N;;;;;

When I typed these into a browser, I got *some* kind of rendering.  I
doubt I'm using them correctly, since they're likely supposed to be
surrounded by other codepoints from the same chart.

11F42;KAWI CONJOINER;Mn;9;NSM;;;;;N;;;;;

I don't even know about this one; it generated "tofu" (aka those horrid
rectangular boxes with the byte representation in them).  On the other
hand, https://www.unicode.org/notes/tn48/UTN48-Implementing-Kawi-1.pdf
says it's not visible by itself.  But paired with Kawi script, it's
supposed to affect the formatting.

I'm not sure how exactly to write a classifier here -- the 'invisible'
and 'zero width' ones are obvious, but the 'joiner' code points don't
seem to have any obvious trend to them.

For now I think I'll take the "conservative" approach and only flag
things that sound like they're supposed to be general metacharacters,
and leave out the modifier codepoints that are ok if they're surrounded
by certain codepoints.  But this is a rather manual process.
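A sketch of that conservative classifier, built only from the code points
triaged above: general metacharacters are flagged, while script-specific
joiners (Tifinagh, Brahmi, Zanabazar, Soyombo, Egyptian, Kawi) and
U+009F are not.  is_invisible_metachar() is a hypothetical name and the
list is illustrative, not necessarily what the xfs_scrub patch ships.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for code points that render nothing by themselves in any script. */
static bool
is_invisible_metachar(uint32_t cp)
{
	switch (cp) {
	case 0x034F:	/* COMBINING GRAPHEME JOINER */
	case 0x200B:	/* ZERO WIDTH SPACE */
	case 0x200C:	/* ZERO WIDTH NON-JOINER */
	case 0x200D:	/* ZERO WIDTH JOINER */
	case 0x2028:	/* LINE SEPARATOR */
	case 0x2029:	/* PARAGRAPH SEPARATOR */
	case 0x2060:	/* WORD JOINER */
	case 0x2061:	/* FUNCTION APPLICATION */
	case 0x2062:	/* INVISIBLE TIMES */
	case 0x2063:	/* INVISIBLE SEPARATOR */
	case 0x2064:	/* INVISIBLE PLUS */
	case 0xFEFF:	/* ZERO WIDTH NO-BREAK SPACE (byte order mark) */
		return true;
	default:
		/* Script-specific joiners are fine next to their own script. */
		return false;
	}
}
```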

--D


* Re: [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points
  2024-07-03 18:35             ` Darrick J. Wong
@ 2024-07-04  7:22               ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-04  7:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs

On Wed, Jul 03, 2024 at 11:35:10AM -0700, Darrick J. Wong wrote:
> I'm not sure how exactly to write a classifier here -- the 'invisible'
> and 'zero width' ones are obvious, but the 'joiner' code points don't
> seem to have any obvious trend to them.
> 
> For now I think I'll take the "conservative" approach and only flag
> things that sound like they're supposed to be general metacharacters,
> and leave out the modifier codepoints that are ok if they're surrounded
> by certain codepoints.  But this is a rather manual process.

Oh, right.  There is no clear identification and you are just doing
a manual search based on visual output.  Yes, there's unfortunately
no really good way to automate that.



* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-03  5:01           ` Darrick J. Wong
@ 2024-07-09 22:53             ` Darrick J. Wong
  2024-07-10  6:18               ` Christoph Hellwig
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-09 22:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Tue, Jul 02, 2024 at 10:01:54PM -0700, Darrick J. Wong wrote:
> On Wed, Jul 03, 2024 at 06:31:24AM +0200, Christoph Hellwig wrote:
> > On Tue, Jul 02, 2024 at 07:59:29PM -0700, Darrick J. Wong wrote:
> > > CONFIG_XFS_ONLINE_SCRUB isn't turned on for the 6.9.7 kernel in sid, so
> > > there shouldn't be any complaints until we ask the kernel team to enable
> > > it.  I don't think we should ask Debian to do that until after they lift
> > > the debian 13 freeze next year/summer/whenever.
> > 
> > I'm not entirely sure if we should ever do this by default.  The right
> > fit to me would be one of those questions asked during apt-get upgrade
> > to enable/disable things.  But don't ask me how those are implemented
> > or even called.
> 
> Some debconf magicks that I don't understand. :(

A week's worth of digging later:

Actually, no debconf magic -- Debian policy is to enable any service
shipped in a package, by default:

https://www.debian.org/doc/debian-policy/ch-opersys.html#starting-system-services

"Debian packages that provide system services should arrange for those
services to be automatically started and stopped by the init system or
service manager."

It's up to the sysadmin to disable this behavior, either by installing
their own systemd preset file:

https://manpages.debian.org/bookworm/systemd/systemd.preset.5.en.html

that shuts off the services they don't want (or more likely enables only
the ones they do want).  Other people have proposed rc.d hacks:

https://packages.debian.org/bookworm/policy-rcd-declarative-deny-all

which prevent the postinst scripts from actually triggering the service.

> A more declarative-happy way would be to make a subpackage that turns on
> the background service, so the people that want it on by default can add
> "Depends: xfsprogs-background-scrub" to their ... uh ... ansible?
> puppet?  orchestration system.

Though I guess we /could/ decide to ship only the
xfs_scrub_all.{service,timer} files in a totally separate package.  But
that will make packaging more difficult because now we have to have
per-package .install files so that debconf knows where to put the files,
and then we have to split the package scripts too.

On Fedora it's apparently the other way 'round where one has to turn on
services manually.

--D

* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-09 22:53             ` Darrick J. Wong
@ 2024-07-10  6:18               ` Christoph Hellwig
  2024-07-16 16:46                 ` Darrick J. Wong
  2024-07-16 16:47                 ` [RFC PATCH 1/2] misc: shift install targets Darrick J. Wong
  0 siblings, 2 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-10  6:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs

On Tue, Jul 09, 2024 at 03:53:06PM -0700, Darrick J. Wong wrote:
> Though I guess we /could/ decide to ship only the
> xfs_scrub_all.{service,timer} files in a totally separate package.  But
> that will make packaging more difficult because now we have to have
> per-package .install files so that debconf knows where to put the files,
> and then we have to split the package scripts too.
> 
> On Fedora it's apparently the other way 'round where one has to turn on
> services manually.

Heh.  I'm still worried about scrub just being automatically run without
the user asking for it.

How about something different:  add a new autoscrub mount option, which
the kernel simply ignores, but which the scrub daemon uses to figure out
if it should scrub a file system?


* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-10  6:18               ` Christoph Hellwig
@ 2024-07-16 16:46                 ` Darrick J. Wong
  2024-07-17  4:59                   ` Christoph Hellwig
  2024-07-16 16:47                 ` [RFC PATCH 1/2] misc: shift install targets Darrick J. Wong
  1 sibling, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-16 16:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Wed, Jul 10, 2024 at 08:18:38AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 09, 2024 at 03:53:06PM -0700, Darrick J. Wong wrote:
> > Though I guess we /could/ decide to ship only the
> > xfs_scrub_all.{service,timer} files in a totally separate package.  But
> > that will make packaging more difficult because now we have to have
> > per-package .install files so that debconf knows where to put the files,
> > and then we have to split the package scripts too.
> > 
> > On Fedora it's apparently the other way 'round where one has to turn on
> > services manually.
> 
> Heh.  I'm still worried about scrub just being automatically run without
> the user asking for it.
> 
> How about something different:  add a new autoscrub mount option, which
> the kernel simply ignores, but which the scrub daemon uses to figure out
> if it should scrub a file system?

Hm, xfs could do that, but that would clutter up the mount options.
Or perhaps the systemd service could look for $mountpoint/.autoheal or
something.
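
For illustration only: neither the marker file nor any code that checks
for it exists in xfsprogs.  Because xfs_scrub_all is a Python script, a
minimal Python sketch of such a per-filesystem opt-in check might look
like the following.  The marker name AUTOHEAL_MARKER and the helpers
wants_autoheal() and select_autoheal() are hypothetical.

```python
import os
from typing import Iterable, List

# Hypothetical opt-in marker, named after the ".autoheal" idea above.
AUTOHEAL_MARKER = ".autoheal"


def wants_autoheal(mountpoint: str, marker: str = AUTOHEAL_MARKER) -> bool:
    """Return True if the filesystem root carries the opt-in marker."""
    path = os.path.join(mountpoint, marker)
    # Require a regular file and refuse symlinks, so the marker cannot
    # redirect to a file somewhere else.
    return os.path.isfile(path) and not os.path.islink(path)


def select_autoheal(mountpoints: Iterable[str]) -> List[str]:
    """Keep only the mountpoints whose filesystems opted in."""
    return [mp for mp in mountpoints if wants_autoheal(mp)]
```

The idea would be to pass select_autoheal() the XFS mountpoints the
service has already found, so scratch or temporarily attached file
systems without the marker are simply skipped.  Who may create the
marker is a policy question this sketch does not answer.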

It might be easier still to split the packages, let me send an RFC of
what that looks like.

--D

* [RFC PATCH 1/2] misc: shift install targets
  2024-07-10  6:18               ` Christoph Hellwig
  2024-07-16 16:46                 ` Darrick J. Wong
@ 2024-07-16 16:47                 ` Darrick J. Wong
  2024-07-16 16:49                   ` [RFC PATCH 2/2] debian: create a new package for automatic self-healing Darrick J. Wong
  2024-07-17  5:00                   ` [RFC PATCH 1/2] misc: shift install targets Christoph Hellwig
  1 sibling, 2 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-16 16:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Modify each Makefile so that "install-pkg" installs the main package
contents, and "install" just invokes "install-pkg".  We'll need this
indirection for the next patch where we add an install-selfheal target
to build the xfsprogs-self-healing package but will still want 'make
install' to install everything on a developer's workstation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 Makefile           |    8 +++++---
 copy/Makefile      |    4 +++-
 db/Makefile        |    4 +++-
 debian/Makefile    |    4 +++-
 debian/rules       |    2 +-
 doc/Makefile       |    4 +++-
 estimate/Makefile  |    4 +++-
 fsck/Makefile      |    4 +++-
 fsr/Makefile       |    4 +++-
 growfs/Makefile    |    4 +++-
 include/Makefile   |    4 +++-
 io/Makefile        |    4 +++-
 libfrog/Makefile   |    2 +-
 libhandle/Makefile |    4 +++-
 libxcmd/Makefile   |    2 +-
 libxfs/Makefile    |    4 +++-
 libxlog/Makefile   |    2 +-
 logprint/Makefile  |    4 +++-
 m4/Makefile        |    2 +-
 man/Makefile       |    8 +++++---
 man/man2/Makefile  |    4 +++-
 man/man3/Makefile  |    4 +++-
 man/man5/Makefile  |    5 ++++-
 man/man8/Makefile  |    4 +++-
 mdrestore/Makefile |    4 +++-
 mkfs/Makefile      |    4 +++-
 po/Makefile        |    4 +++-
 quota/Makefile     |    4 +++-
 repair/Makefile    |    4 +++-
 rtcp/Makefile      |    4 +++-
 scrub/Makefile     |    4 +++-
 spaceman/Makefile  |    4 +++-
 32 files changed, 91 insertions(+), 36 deletions(-)

diff --git a/Makefile b/Makefile
index 79215c64ef2f..715cc212d686 100644
--- a/Makefile
+++ b/Makefile
@@ -116,15 +116,17 @@ configure: configure.ac
 include/builddefs: configure
 	./configure $$LOCAL_CONFIGURE_OPTIONS
 
-install: $(addsuffix -install,$(SUBDIRS))
+install: install-pkg
+
+install-pkg: $(addsuffix -install-pkg,$(SUBDIRS))
 	$(INSTALL) -m 755 -d $(PKG_DOC_DIR)
 	$(INSTALL) -m 644 README $(PKG_DOC_DIR)
 
 install-dev: $(addsuffix -install-dev,$(SUBDIRS))
 
-%-install:
+%-install-pkg:
 	@echo "Installing $@"
-	$(Q)$(MAKE) $(MAKEOPTS) -C $* install
+	$(Q)$(MAKE) $(MAKEOPTS) -C $* install-pkg
 
 %-install-dev:
 	@echo "Installing $@"
diff --git a/copy/Makefile b/copy/Makefile
index 55160f848225..446d38bea576 100644
--- a/copy/Makefile
+++ b/copy/Makefile
@@ -18,7 +18,9 @@ default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
diff --git a/db/Makefile b/db/Makefile
index 83389376c36c..91e259044beb 100644
--- a/db/Makefile
+++ b/db/Makefile
@@ -81,7 +81,9 @@ default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 	$(INSTALL) -m 755 xfs_admin.sh $(PKG_SBIN_DIR)/xfs_admin
diff --git a/debian/Makefile b/debian/Makefile
index 2f9cd38c2ea6..f6a996e91871 100644
--- a/debian/Makefile
+++ b/debian/Makefile
@@ -15,7 +15,9 @@ default:
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 ifeq ($(PKG_DISTRIBUTION), debian)
 	$(INSTALL) -m 755 -d $(PKG_DOC_DIR)
 	$(INSTALL) -m 644 changelog $(PKG_DOC_DIR)/changelog.Debian
diff --git a/debian/rules b/debian/rules
index 6634dbe335a2..bc1ffd76333b 100755
--- a/debian/rules
+++ b/debian/rules
@@ -100,7 +100,7 @@ binary-arch: checkroot built
 	@echo "== dpkg-buildpackage: binary-arch" 1>&2
 	$(checkdir)
 	-rm -rf $(dirme) $(dirdev) $(dirdi)
-	$(pkgme)  $(MAKE) -C . install
+	$(pkgme)  $(MAKE) -C . install-pkg
 	$(pkgdev) $(MAKE) -C . install-dev
 	$(pkgdi)  $(MAKE) -C debian install-d-i
 	#$(pkgme)  $(MAKE) dist
diff --git a/doc/Makefile b/doc/Makefile
index 83dfa38bedd1..ad6749b8d0be 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -16,7 +16,9 @@ CHANGES.gz:
 	@echo "    [ZIP]    $@"
 	$(Q)$(ZIP) --best -c < CHANGES > $@
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_DOC_DIR)
 	$(INSTALL) -m 644 CHANGES.gz CREDITS $(PKG_DOC_DIR)
 ifeq ($(PKG_DISTRIBUTION), debian)
diff --git a/estimate/Makefile b/estimate/Makefile
index 1080129b3a62..d5f8a6d81d65 100644
--- a/estimate/Makefile
+++ b/estimate/Makefile
@@ -12,7 +12,9 @@ default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
diff --git a/fsck/Makefile b/fsck/Makefile
index 5ca529f53e42..ccba7f0b6892 100644
--- a/fsck/Makefile
+++ b/fsck/Makefile
@@ -11,7 +11,9 @@ default: $(LTCOMMAND)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(INSTALL) -m 755 xfs_fsck.sh $(PKG_SBIN_DIR)/fsck.xfs
 install-dev:
diff --git a/fsr/Makefile b/fsr/Makefile
index d57f2de24c22..3ad9f6d824c6 100644
--- a/fsr/Makefile
+++ b/fsr/Makefile
@@ -15,7 +15,9 @@ default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
diff --git a/growfs/Makefile b/growfs/Makefile
index 2f4cc66a76f3..e0ab870bd6ba 100644
--- a/growfs/Makefile
+++ b/growfs/Makefile
@@ -23,7 +23,9 @@ default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
diff --git a/include/Makefile b/include/Makefile
index f7c40a5ce1a1..23727fccfdcd 100644
--- a/include/Makefile
+++ b/include/Makefile
@@ -57,7 +57,9 @@ install-headers: $(addsuffix -hdrs, $(DKHFILES) $(HFILES))
 %-hdrs:
 	$(Q)$(LN_S) -f $(CURDIR)/$* xfs/$*
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_INC_DIR)
 
 install-dev: install
diff --git a/io/Makefile b/io/Makefile
index 3192b813c740..d035420b555c 100644
--- a/io/Makefile
+++ b/io/Makefile
@@ -85,7 +85,9 @@ default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 xfs_bmap.sh $(PKG_SBIN_DIR)/xfs_bmap
diff --git a/libfrog/Makefile b/libfrog/Makefile
index acfa228bc8ec..c4be4a7f2a73 100644
--- a/libfrog/Makefile
+++ b/libfrog/Makefile
@@ -70,6 +70,6 @@ crc32table.h: gen_crc32table.c crc32defs.h
 
 include $(BUILDRULES)
 
-install install-dev: default
+install install-pkg install-dev: default
 
 -include .ltdep
diff --git a/libhandle/Makefile b/libhandle/Makefile
index f297a59e47f9..7cfd0fa4f27e 100644
--- a/libhandle/Makefile
+++ b/libhandle/Makefile
@@ -19,7 +19,9 @@ default: ltdepend $(LTLIBRARY)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL_LTLIB)
 
 install-dev: default
diff --git a/libxcmd/Makefile b/libxcmd/Makefile
index 70e54308c34b..afd5349c8af3 100644
--- a/libxcmd/Makefile
+++ b/libxcmd/Makefile
@@ -23,6 +23,6 @@ default: ltdepend $(LTLIBRARY)
 
 include $(BUILDRULES)
 
-install install-dev: default
+install install-pkg install-dev: default
 
 -include .ltdep
diff --git a/libxfs/Makefile b/libxfs/Makefile
index 9fb53d9cc32c..63657fb21cab 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -136,7 +136,9 @@ default: ltdepend $(LTLIBRARY)
 # set up include/xfs header directory
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_INC_DIR)
 
 install-headers: $(addsuffix -hdrs, $(PKGHFILES))
diff --git a/libxlog/Makefile b/libxlog/Makefile
index b0f5ef154133..3710729fe703 100644
--- a/libxlog/Makefile
+++ b/libxlog/Makefile
@@ -21,6 +21,6 @@ default: ltdepend $(LTLIBRARY)
 
 include $(BUILDRULES)
 
-install install-dev: default
+install install-pkg install-dev: default
 
 -include .ltdep
diff --git a/logprint/Makefile b/logprint/Makefile
index bbbed5d259af..5ec02539a7bb 100644
--- a/logprint/Makefile
+++ b/logprint/Makefile
@@ -21,7 +21,9 @@ default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
diff --git a/m4/Makefile b/m4/Makefile
index 84174c3d3e30..eda4c06f6864 100644
--- a/m4/Makefile
+++ b/m4/Makefile
@@ -33,7 +33,7 @@ default:
 
 include $(BUILDRULES)
 
-install install-dev install-lib: default
+install install-pkg install-dev install-lib: default
 
 realclean: distclean
 	rm -f $(CONFIGURE)
diff --git a/man/Makefile b/man/Makefile
index cd1aed6cf202..f62286e8339d 100644
--- a/man/Makefile
+++ b/man/Makefile
@@ -9,12 +9,14 @@ SUBDIRS = man2 man3 man5 man8
 
 default : $(SUBDIRS)
 
-install : $(addsuffix -install,$(SUBDIRS))
+install : install-pkg
+
+install-pkg : $(addsuffix -install-pkg,$(SUBDIRS))
 
 install-dev : $(addsuffix -install-dev,$(SUBDIRS))
 
-%-install:
-	$(Q)$(MAKE) $(MAKEOPTS) -C $* install
+%-install-pkg:
+	$(Q)$(MAKE) $(MAKEOPTS) -C $* install-pkg
 
 %-install-dev:
 	$(Q)$(MAKE) $(MAKEOPTS) -C $* install-dev
diff --git a/man/man2/Makefile b/man/man2/Makefile
index 8aecde3321b0..190ea18e7c6c 100644
--- a/man/man2/Makefile
+++ b/man/man2/Makefile
@@ -15,7 +15,9 @@ default : $(MAN_PAGES)
 
 include $(BUILDRULES)
 
-install :
+install : install-pkg
+
+install-pkg :
 
 install-dev : default
 	$(INSTALL) -m 755 -d $(MAN_DEST)
diff --git a/man/man3/Makefile b/man/man3/Makefile
index a7f607fcbad9..1553e2b2de82 100644
--- a/man/man3/Makefile
+++ b/man/man3/Makefile
@@ -15,7 +15,9 @@ default : $(MAN_PAGES)
 
 include $(BUILDRULES)
 
-install :
+install : install-pkg
+
+install-pkg :
 
 install-dev : default
 	$(INSTALL) -m 755 -d $(MAN_DEST)
diff --git a/man/man5/Makefile b/man/man5/Makefile
index fe0aef6f016b..1fcd3995036f 100644
--- a/man/man5/Makefile
+++ b/man/man5/Makefile
@@ -15,7 +15,10 @@ default : $(MAN_PAGES)
 
 include $(BUILDRULES)
 
-install : default
+install : install-pkg
+
+install-pkg : default
 	$(INSTALL) -m 755 -d $(MAN_DEST)
 	$(INSTALL_MAN)
+
 install-dev :
diff --git a/man/man8/Makefile b/man/man8/Makefile
index 5be76ab727a1..0b40a409a913 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -22,7 +22,9 @@ default : $(MAN_PAGES)
 
 include $(BUILDRULES)
 
-install : default
+install : install-pkg
+
+install-pkg : default
 	$(INSTALL) -m 755 -d $(MAN_DEST)
 	$(INSTALL_MAN)
 
diff --git a/mdrestore/Makefile b/mdrestore/Makefile
index 4a932efb8098..0d02fb383404 100644
--- a/mdrestore/Makefile
+++ b/mdrestore/Makefile
@@ -16,7 +16,9 @@ default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
diff --git a/mkfs/Makefile b/mkfs/Makefile
index a6173083e4c2..cf945aa10c25 100644
--- a/mkfs/Makefile
+++ b/mkfs/Makefile
@@ -27,7 +27,9 @@ default: depend $(LTCOMMAND) $(CFGFILES)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 	$(INSTALL) -m 755 -d $(MKFS_CFG_DIR)
diff --git a/po/Makefile b/po/Makefile
index 1d35f5191ba2..3cc0b4177c64 100644
--- a/po/Makefile
+++ b/po/Makefile
@@ -19,7 +19,9 @@ default: $(POTHEAD) $(LINGUAS:%=%.mo)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL_LINGUAS)
 
 install-dev install-lib:
diff --git a/quota/Makefile b/quota/Makefile
index da5a1489e468..01584635b3dd 100644
--- a/quota/Makefile
+++ b/quota/Makefile
@@ -23,7 +23,9 @@ default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
diff --git a/repair/Makefile b/repair/Makefile
index d948588781de..ed5a6b1ff4db 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -102,7 +102,9 @@ include $(BUILDRULES)
 #
 #CFLAGS += ...
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
diff --git a/rtcp/Makefile b/rtcp/Makefile
index 264b4f27b5fd..4adb58c4b783 100644
--- a/rtcp/Makefile
+++ b/rtcp/Makefile
@@ -16,7 +16,9 @@ default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
diff --git a/scrub/Makefile b/scrub/Makefile
index 885b43e9948d..653aafd171b5 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -138,7 +138,9 @@ phase5.o unicrash.o xfs.o: $(builddefs)
 
 include $(BUILDRULES)
 
-install: $(INSTALL_SCRUB)
+install: install-pkg
+
+install-pkg: $(INSTALL_SCRUB)
 
 %.service: %.service.in $(builddefs)
 	@echo "    [SED]    $@"
diff --git a/spaceman/Makefile b/spaceman/Makefile
index 1f048d54a4d2..d0495ebb057e 100644
--- a/spaceman/Makefile
+++ b/spaceman/Makefile
@@ -26,7 +26,9 @@ default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
 
-install: default
+install: install-pkg
+
+install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 	$(INSTALL) -m 755 xfs_info.sh $(PKG_SBIN_DIR)/xfs_info

* [RFC PATCH 2/2] debian: create a new package for automatic self-healing
  2024-07-16 16:47                 ` [RFC PATCH 1/2] misc: shift install targets Darrick J. Wong
@ 2024-07-16 16:49                   ` Darrick J. Wong
  2024-07-17  5:00                   ` [RFC PATCH 1/2] misc: shift install targets Christoph Hellwig
  1 sibling, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-16 16:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a new package for people who explicitly want self-healing turned
on by default for XFS.  This package is named xfsprogs-self-healing.

Note: This introduces a new "install-selfheal" target to install only
the files needed for enabling online fsck by default.  Other
distributions should take note of the new target if they choose to
create a package for enabling autonomous self healing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 Makefile           |    8 +++++++-
 copy/Makefile      |    2 ++
 db/Makefile        |    2 ++
 debian/Makefile    |    2 ++
 debian/control     |    8 ++++++++
 debian/rules       |   13 +++++++++----
 doc/Makefile       |    3 +++
 estimate/Makefile  |    2 ++
 fsck/Makefile      |    3 +++
 fsr/Makefile       |    2 ++
 growfs/Makefile    |    2 ++
 include/Makefile   |    3 +++
 io/Makefile        |    2 ++
 libfrog/Makefile   |    2 ++
 libhandle/Makefile |    2 ++
 libxcmd/Makefile   |    2 ++
 libxfs/Makefile    |    3 +++
 libxlog/Makefile   |    2 ++
 logprint/Makefile  |    2 ++
 m4/Makefile        |    2 ++
 man/Makefile       |    2 ++
 mdrestore/Makefile |    2 ++
 mkfs/Makefile      |    2 ++
 po/Makefile        |    3 +++
 quota/Makefile     |    2 ++
 repair/Makefile    |    2 ++
 rtcp/Makefile      |    2 ++
 scrub/Makefile     |   12 ++++++++++--
 spaceman/Makefile  |    2 ++
 29 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index 715cc212d686..b263a2dc98aa 100644
--- a/Makefile
+++ b/Makefile
@@ -116,7 +116,7 @@ configure: configure.ac
 include/builddefs: configure
 	./configure $$LOCAL_CONFIGURE_OPTIONS
 
-install: install-pkg
+install: install-pkg install-selfheal
 
 install-pkg: $(addsuffix -install-pkg,$(SUBDIRS))
 	$(INSTALL) -m 755 -d $(PKG_DOC_DIR)
@@ -124,6 +124,8 @@ install-pkg: $(addsuffix -install-pkg,$(SUBDIRS))
 
 install-dev: $(addsuffix -install-dev,$(SUBDIRS))
 
+install-selfheal: $(addsuffix -install-selfheal,$(SUBDIRS))
+
 %-install-pkg:
 	@echo "Installing $@"
 	$(Q)$(MAKE) $(MAKEOPTS) -C $* install-pkg
@@ -132,6 +134,10 @@ install-dev: $(addsuffix -install-dev,$(SUBDIRS))
 	@echo "Installing $@"
 	$(Q)$(MAKE) $(MAKEOPTS) -C $* install-dev
 
+%-install-selfheal:
+	@echo "Installing $@"
+	$(Q)$(MAKE) $(MAKEOPTS) -C $* install-selfheal
+
 distclean: clean
 	$(Q)rm -f $(LDIRT)
 
diff --git a/copy/Makefile b/copy/Makefile
index 446d38bea576..3013dc2dca27 100644
--- a/copy/Makefile
+++ b/copy/Makefile
@@ -25,4 +25,6 @@ install-pkg: default
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
 
+install-selfheal:
+
 -include .dep
diff --git a/db/Makefile b/db/Makefile
index 91e259044beb..839c51f03593 100644
--- a/db/Makefile
+++ b/db/Makefile
@@ -91,4 +91,6 @@ install-pkg: default
 	$(INSTALL) -m 755 xfs_metadump.sh $(PKG_SBIN_DIR)/xfs_metadump
 install-dev:
 
+install-selfheal:
+
 -include .dep
diff --git a/debian/Makefile b/debian/Makefile
index f6a996e91871..104d7c5d76d9 100644
--- a/debian/Makefile
+++ b/debian/Makefile
@@ -36,3 +36,5 @@ ifeq ($(PKG_DISTRIBUTION), debian)
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(INSTALL) -m 755 $(BOOT_MKFS_BIN) $(PKG_SBIN_DIR)/mkfs.xfs
 endif
+
+install-selfheal: default
diff --git a/debian/control b/debian/control
index 31773e53a19a..aa7a920a7964 100644
--- a/debian/control
+++ b/debian/control
@@ -27,6 +27,14 @@ Description: Utilities for managing the XFS filesystem
  Refer to the documentation at https://xfs.wiki.kernel.org/
  for complete details.
 
+Package: xfsprogs-self-healing
+Depends: ${shlibs:Depends}, ${misc:Depends}, xfsprogs, systemd, udev
+Architecture: linux-any
+Description: Automatic self healing for the XFS filesystem
+ A set of background services for the XFS filesystem to make it
+ find and fix corruptions automatically.  These services are activated
+ automatically upon installation of this package.
+
 Package: xfslibs-dev
 Section: libdevel
 Depends: libc6-dev | libc-dev, uuid-dev, xfsprogs (>= 3.0.0), ${misc:Depends}
diff --git a/debian/rules b/debian/rules
index bc1ffd76333b..b9faa203bb69 100755
--- a/debian/rules
+++ b/debian/rules
@@ -14,6 +14,7 @@ endif
 package = xfsprogs
 develop = xfslibs-dev
 bootpkg = xfsprogs-udeb
+healpkg = xfsprogs-self-healing
 
 DEB_BUILD_GNU_TYPE ?= $(shell dpkg-architecture -qDEB_BUILD_GNU_TYPE)
 DEB_HOST_GNU_TYPE ?= $(shell dpkg-architecture -qDEB_HOST_GNU_TYPE)
@@ -26,9 +27,11 @@ udebpkg = $(bootpkg)_$(version)_$(target).udeb
 dirme  = debian/$(package)
 dirdev = debian/$(develop)
 dirdi  = debian/$(bootpkg)
-pkgme  = DIST_ROOT=`pwd`/$(dirme);  export DIST_ROOT;
-pkgdev = DIST_ROOT=`pwd`/$(dirdev); export DIST_ROOT;
-pkgdi  = DIST_ROOT=`pwd`/$(dirdi); export DIST_ROOT;
+dirheal= debian/$(healpkg)
+pkgme  = DIST_ROOT=`pwd`/$(dirme);   export DIST_ROOT;
+pkgdev = DIST_ROOT=`pwd`/$(dirdev);  export DIST_ROOT;
+pkgdi  = DIST_ROOT=`pwd`/$(dirdi);   export DIST_ROOT;
+pkgheal= DIST_ROOT=`pwd`/$(dirheal); export DIST_ROOT;
 stdenv = @GZIP=-q; export GZIP;
 
 configure_options = \
@@ -103,6 +106,7 @@ binary-arch: checkroot built
 	$(pkgme)  $(MAKE) -C . install-pkg
 	$(pkgdev) $(MAKE) -C . install-dev
 	$(pkgdi)  $(MAKE) -C debian install-d-i
+	$(pkgheal) $(MAKE) -C . install-selfheal
 	#$(pkgme)  $(MAKE) dist
 	install -D -m 0755 debian/local/initramfs.hook debian/xfsprogs/usr/share/initramfs-tools/hooks/xfs
 	rmdir debian/xfslibs-dev/usr/share/doc/xfsprogs
@@ -114,7 +118,8 @@ binary-arch: checkroot built
 	dh_compress
 	dh_fixperms
 	dh_makeshlibs
-	dh_installsystemd -p xfsprogs --no-enable --no-start --no-restart-after-upgrade --no-stop-on-upgrade
+	dh_installsystemd -p xfsprogs --no-restart-after-upgrade --no-stop-on-upgrade system-xfs_scrub.slice
+	dh_installsystemd -p xfsprogs-self-healing --no-restart-after-upgrade --no-stop-on-upgrade xfs_scrub_all.timer
 	dh_installdeb
 	dh_shlibdeps
 	dh_gencontrol
diff --git a/doc/Makefile b/doc/Makefile
index ad6749b8d0be..a6ec65f5edc0 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -26,3 +26,6 @@ ifeq ($(PKG_DISTRIBUTION), debian)
 endif
 
 install-dev:
+
+install-selfheal:
+
diff --git a/estimate/Makefile b/estimate/Makefile
index d5f8a6d81d65..4fce3463d55c 100644
--- a/estimate/Makefile
+++ b/estimate/Makefile
@@ -19,4 +19,6 @@ install-pkg: default
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
 
+install-selfheal:
+
 -include .dep
diff --git a/fsck/Makefile b/fsck/Makefile
index ccba7f0b6892..33fca9e18fb0 100644
--- a/fsck/Makefile
+++ b/fsck/Makefile
@@ -17,3 +17,6 @@ install-pkg: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(INSTALL) -m 755 xfs_fsck.sh $(PKG_SBIN_DIR)/fsck.xfs
 install-dev:
+
+install-selfheal:
+
diff --git a/fsr/Makefile b/fsr/Makefile
index 3ad9f6d824c6..da32bc531970 100644
--- a/fsr/Makefile
+++ b/fsr/Makefile
@@ -22,4 +22,6 @@ install-pkg: default
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
 
+install-selfheal:
+
 -include .dep
diff --git a/growfs/Makefile b/growfs/Makefile
index e0ab870bd6ba..61d184328cf6 100644
--- a/growfs/Makefile
+++ b/growfs/Makefile
@@ -30,4 +30,6 @@ install-pkg: default
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
 
+install-selfheal:
+
 -include .dep
diff --git a/include/Makefile b/include/Makefile
index 23727fccfdcd..c18ab35a9861 100644
--- a/include/Makefile
+++ b/include/Makefile
@@ -64,3 +64,6 @@ install-pkg: default
 
 install-dev: install
 	$(INSTALL) -m 644 $(HFILES) $(PKG_INC_DIR)
+
+install-selfheal:
+
diff --git a/io/Makefile b/io/Makefile
index d035420b555c..70c95a5c1425 100644
--- a/io/Makefile
+++ b/io/Makefile
@@ -95,4 +95,6 @@ install-pkg: default
 	$(LTINSTALL) -m 755 xfs_mkfile.sh $(PKG_SBIN_DIR)/xfs_mkfile
 install-dev:
 
+install-selfheal:
+
 -include .dep
diff --git a/libfrog/Makefile b/libfrog/Makefile
index c4be4a7f2a73..8b6562b194e1 100644
--- a/libfrog/Makefile
+++ b/libfrog/Makefile
@@ -72,4 +72,6 @@ include $(BUILDRULES)
 
 install install-pkg install-dev: default
 
+install-selfheal:
+
 -include .ltdep
diff --git a/libhandle/Makefile b/libhandle/Makefile
index 7cfd0fa4f27e..28c104a77f95 100644
--- a/libhandle/Makefile
+++ b/libhandle/Makefile
@@ -27,4 +27,6 @@ install-pkg: default
 install-dev: default
 	$(INSTALL_LTLIB_DEV)
 
+install-selfheal:
+
 -include .ltdep
diff --git a/libxcmd/Makefile b/libxcmd/Makefile
index afd5349c8af3..118ecd3cfc88 100644
--- a/libxcmd/Makefile
+++ b/libxcmd/Makefile
@@ -25,4 +25,6 @@ include $(BUILDRULES)
 
 install install-pkg install-dev: default
 
+install-selfheal:
+
 -include .ltdep
diff --git a/libxfs/Makefile b/libxfs/Makefile
index 63657fb21cab..b386eaaa2cd2 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -158,3 +158,6 @@ install-dev: install
 ifndef NODEP
 -include .ltdep
 endif
+
+install-selfheal:
+
diff --git a/libxlog/Makefile b/libxlog/Makefile
index 3710729fe703..37281eb058f2 100644
--- a/libxlog/Makefile
+++ b/libxlog/Makefile
@@ -23,4 +23,6 @@ include $(BUILDRULES)
 
 install install-pkg install-dev: default
 
+install-selfheal:
+
 -include .ltdep
diff --git a/logprint/Makefile b/logprint/Makefile
index 5ec02539a7bb..f1fccbdc8ca7 100644
--- a/logprint/Makefile
+++ b/logprint/Makefile
@@ -28,4 +28,6 @@ install-pkg: default
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
 
+install-selfheal:
+
 -include .dep
diff --git a/m4/Makefile b/m4/Makefile
index eda4c06f6864..b894dd3b0222 100644
--- a/m4/Makefile
+++ b/m4/Makefile
@@ -35,5 +35,7 @@ include $(BUILDRULES)
 
 install install-pkg install-dev install-lib: default
 
+install-selfheal:
+
 realclean: distclean
 	rm -f $(CONFIGURE)
diff --git a/man/Makefile b/man/Makefile
index f62286e8339d..258e8732a923 100644
--- a/man/Makefile
+++ b/man/Makefile
@@ -21,4 +21,6 @@ install-dev : $(addsuffix -install-dev,$(SUBDIRS))
 %-install-dev:
 	$(Q)$(MAKE) $(MAKEOPTS) -C $* install-dev
 
+install-selfheal:
+
 include $(BUILDRULES)
diff --git a/mdrestore/Makefile b/mdrestore/Makefile
index 0d02fb383404..c75d5875929b 100644
--- a/mdrestore/Makefile
+++ b/mdrestore/Makefile
@@ -23,4 +23,6 @@ install-pkg: default
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
 
+install-selfheal:
+
 -include .dep
diff --git a/mkfs/Makefile b/mkfs/Makefile
index cf945aa10c25..515c2db0639d 100644
--- a/mkfs/Makefile
+++ b/mkfs/Makefile
@@ -37,4 +37,6 @@ install-pkg: default
 
 install-dev:
 
+install-selfheal:
+
 -include .dep
diff --git a/po/Makefile b/po/Makefile
index 3cc0b4177c64..cb6ef954b695 100644
--- a/po/Makefile
+++ b/po/Makefile
@@ -25,3 +25,6 @@ install-pkg: default
 	$(INSTALL_LINGUAS)
 
 install-dev install-lib:
+
+install-selfheal:
+
diff --git a/quota/Makefile b/quota/Makefile
index 01584635b3dd..e1193bd203ad 100644
--- a/quota/Makefile
+++ b/quota/Makefile
@@ -30,4 +30,6 @@ install-pkg: default
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
 
+install-selfheal:
+
 -include .dep
diff --git a/repair/Makefile b/repair/Makefile
index ed5a6b1ff4db..a82368e9b6a5 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -109,4 +109,6 @@ install-pkg: default
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
 
+install-selfheal:
+
 -include .dep
diff --git a/rtcp/Makefile b/rtcp/Makefile
index 4adb58c4b783..ac638bcd1029 100644
--- a/rtcp/Makefile
+++ b/rtcp/Makefile
@@ -23,4 +23,6 @@ install-pkg: default
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
 install-dev:
 
+install-selfheal:
+
 -include .dep
diff --git a/scrub/Makefile b/scrub/Makefile
index 653aafd171b5..8a556a736d90 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -20,6 +20,7 @@ XFS_SCRUB_ARGS = -p
 XFS_SCRUB_SERVICE_ARGS = -b
 ifeq ($(HAVE_SYSTEMD),yes)
 INSTALL_SCRUB += install-systemd
+INSTALL_SELFHEAL += install-systemd-selfheal
 SYSTEMD_SERVICES=\
 	$(scrub_svcname) \
 	xfs_scrub_fail@.service \
@@ -27,9 +28,10 @@ SYSTEMD_SERVICES=\
 	xfs_scrub_media_fail@.service \
 	xfs_scrub_all.service \
 	xfs_scrub_all_fail.service \
-	xfs_scrub_all.timer \
 	system-xfs_scrub.slice
-OPTIONAL_TARGETS += $(SYSTEMD_SERVICES)
+SYSTEMD_SERVICES_SELFHEAL=\
+	xfs_scrub_all.timer
+OPTIONAL_TARGETS += $(SYSTEMD_SERVICES) $(SYSTEMD_SERVICES_SELFHEAL)
 endif
 ifeq ($(HAVE_CROND),yes)
 INSTALL_SCRUB += install-crond
@@ -164,6 +166,10 @@ install-systemd: default $(SYSTEMD_SERVICES)
 	$(INSTALL) -m 755 -d $(PKG_LIBEXEC_DIR)
 	$(INSTALL) -m 755 $(XFS_SCRUB_FAIL_PROG) $(PKG_LIBEXEC_DIR)
 
+install-systemd-selfheal: default $(SYSTEMD_SERVICES_SELFHEAL)
+	$(INSTALL) -m 755 -d $(SYSTEMD_SYSTEM_UNIT_DIR)
+	$(INSTALL) -m 644 $(SYSTEMD_SERVICES_SELFHEAL) $(SYSTEMD_SYSTEM_UNIT_DIR)
+
 install-crond: default $(CRONTABS)
 	$(INSTALL) -m 755 -d $(CROND_DIR)
 	$(INSTALL) -m 644 $(CRONTABS) $(CROND_DIR)
@@ -182,4 +188,6 @@ install-udev: $(UDEV_RULES)
 
 install-dev:
 
+install-selfheal: $(INSTALL_SELFHEAL)
+
 -include .dep
diff --git a/spaceman/Makefile b/spaceman/Makefile
index d0495ebb057e..e50595ab33c1 100644
--- a/spaceman/Makefile
+++ b/spaceman/Makefile
@@ -34,4 +34,6 @@ install-pkg: default
 	$(INSTALL) -m 755 xfs_info.sh $(PKG_SBIN_DIR)/xfs_info
 install-dev:
 
+install-selfheal:
+
 -include .dep


* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-16 16:46                 ` Darrick J. Wong
@ 2024-07-17  4:59                   ` Christoph Hellwig
  2024-07-17 16:15                     ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-17  4:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs

On Tue, Jul 16, 2024 at 09:46:29AM -0700, Darrick J. Wong wrote:
> Hm, xfs could do that, but that would clutter up the mount options.
> Or perhaps the systemd service could look for $mountpoint/.autoheal or
> something.
> 
> It might be easier still to split the packages, let me send an RFC of
> what that looks like.

So I think the package split makes sense no matter what we do,
but I really want a per file system choice and not a global one.

As a file system developer I always have scratch file systems around
that I definitely never want checked.  I'd also prefer scrub to not
randomly start checking external file systems that are only temporarily
mounted and that I want to remove again.

Maybe we'll indeed want some kind of marker in the file system.

Btw, the code in scrub_all that parses lsblk output to find file systems
also looks a bit odd.  I'd expect it to use whatever is the python
equivalent of setmntent/getmntent on /proc/self/mounts.
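
For comparison, the libc interface mentioned here looks roughly like this in C: setmntent(3) opens /proc/self/mounts, getmntent(3) returns one struct mntent per line, and endmntent(3) closes the stream. This is only a sketch of that API, assuming the caller just wants the XFS entries. count_xfs_mounts is a hypothetical helper, not code from xfs_scrub_all, which is written in Python.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: print every XFS entry in a mount table such as
 * "/proc/self/mounts" and return how many were found, or -1 if the
 * table could not be opened.
 */
int count_xfs_mounts(const char *table)
{
	FILE *fp = setmntent(table, "r");
	struct mntent *ent;
	int nr = 0;

	if (!fp)
		return -1;
	while ((ent = getmntent(fp)) != NULL) {
		if (strcmp(ent->mnt_type, "xfs") != 0)
			continue;
		/* mnt_fsname is the source device, mnt_dir the mountpoint */
		printf("%s on %s\n", ent->mnt_fsname, ent->mnt_dir);
		nr++;
	}
	endmntent(fp);
	return nr;
}
```

Note that mnt_fsname is only the device the filesystem was mounted from (for example a device-mapper node), so a mount table alone does not reveal the physical disks underneath.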




* Re: [RFC PATCH 1/2] misc: shift install targets
  2024-07-16 16:47                 ` [RFC PATCH 1/2] misc: shift install targets Darrick J. Wong
  2024-07-16 16:49                   ` [RFC PATCH 2/2] debian: create a new package for automatic self-healing Darrick J. Wong
@ 2024-07-17  5:00                   ` Christoph Hellwig
  2024-07-17 16:19                     ` Darrick J. Wong
  1 sibling, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-17  5:00 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs

On Tue, Jul 16, 2024 at 09:47:14AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Modify each Makefile so that "install-pkg" installs the main package
> contents, and "install" just invokes "install-pkg".  We'll need this
> indirection for the next patch where we add an install-selfheal target
> to build the xfsprogs-self-healing package but will still want 'make
> install' to install everything on a developer's workstation.

Maybe my debian packaging foo is getting a little rusty, but wasn't there
a concept of pattern matching to pick what files go into what subpackage
without having to change install targets?



* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-17  4:59                   ` Christoph Hellwig
@ 2024-07-17 16:15                     ` Darrick J. Wong
  2024-07-17 16:45                       ` Christoph Hellwig
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-17 16:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Wed, Jul 17, 2024 at 06:59:04AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 16, 2024 at 09:46:29AM -0700, Darrick J. Wong wrote:
> > Hm, xfs could do that, but that would clutter up the mount options.
> > Or perhaps the systemd service could look for $mountpoint/.autoheal or
> > something.
> > 
> > It might be easier still to split the packages, let me send an RFC of
> > what that looks like.
> 
> So I think the package split makes sense no matter what we do,
> but I really want a per file system choice and not a global one.

<nod> How might we do that?

1. touch $mnt/.selfheal to opt-in to online fsck?
2. touch $mnt/.noselfheal to opt-out?
3. attr -s system.selfheal to opt in?
4. attr -s system.noselfheal to opt out?
5. compat feature to opt-in
6. compat feature to opt-out?
7. filesystem property that we stuff in the metadir?

1-2 most resemble the old /.forcefsck knob, and systemd has
ConditionPathExists= that we could use to turn an xfs_scrub@<path>
service invocation into a nop.

3-4 don't clutter up the root filesystem with dotfiles, but then if we
ever reset the root directory then the selfheal property is lost.

5-6 might be my preferred way, but then I think I have to add a fsgeom
flag so that userspace can detect the selfheal preferences.

7 is a special snowflake currently :P
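
Options 1 and 2 reduce to a path-existence test on the mountpoint, the same test systemd performs for ConditionPathExists= (or its negated ConditionPathExists=!PATH form). A minimal C sketch of that check follows, assuming the .selfheal/.noselfheal names from the list above; the function name and the boolean opt_in switch are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical policy check for options 1 and 2: in opt-in mode the
 * mountpoint must contain ".selfheal"; in opt-out mode it must not
 * contain ".noselfheal".
 */
bool selfheal_marker_allows(const char *mnt, bool opt_in)
{
	char path[4096];
	const char *marker = opt_in ? ".selfheal" : ".noselfheal";
	int len = snprintf(path, sizeof(path), "%s/%s", mnt, marker);

	if (len < 0 || (size_t)len >= sizeof(path))
		return false;	/* refuse to guess on an overlong path */

	bool exists = access(path, F_OK) == 0;
	return opt_in ? exists : !exists;
}
```

Either way the decision depends on the contents of the root directory, which is the weakness pointed out for options 1-4 later in the thread.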

Other issues:

a. Are there really two bits that we might want to store in the *heal
attribute?

   00 == never allow online fsck at all
   01 == allow invocations of xfs_scrub but don't let xfs_scrub_all
         invoke it
   11 == allow all invocations of online fsck

and probably useless:

   10 == allow invocations of xfs_scrub only from xfs_scrub_all

b. Adding these knobs means more userspace code to manage them.  1-4 can
be done easily in xfs_admin, 5-8 involve a new ioctl and io/db code.

c. 1-4 can be lost if something torches the root directory.  Is that ok?
7 also has that problem, but as nobody's really designed a filesystem
properties feature, I don't know if that's acceptable.
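
The two-bit encoding in item (a) can be read as bit 0 = "explicit xfs_scrub runs are allowed" and bit 1 = "xfs_scrub_all may start xfs_scrub", which reproduces all four rows of the table. The C sketch below decodes it that way; the macro, enum and function names are hypothetical and no such attribute exists yet.

```c
#include <stdbool.h>

/* Hypothetical bit assignments for the "*heal" attribute. */
#define SELFHEAL_ALLOW_MANUAL	(1U << 0)	/* explicit xfs_scrub */
#define SELFHEAL_ALLOW_AUTO	(1U << 1)	/* started by xfs_scrub_all */

enum scrub_caller {
	CALLER_MANUAL,		/* administrator ran xfs_scrub directly */
	CALLER_SCRUB_ALL,	/* started by xfs_scrub_all or its timer */
};

/* 00 = never, 01 = manual only, 11 = always, 10 = xfs_scrub_all only */
bool selfheal_permits(unsigned int bits, enum scrub_caller caller)
{
	switch (caller) {
	case CALLER_MANUAL:
		return (bits & SELFHEAL_ALLOW_MANUAL) != 0;
	case CALLER_SCRUB_ALL:
		return (bits & SELFHEAL_ALLOW_AUTO) != 0;
	}
	return false;
}
```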

> As a file system developer I always have scratch file systems around
> that I definitely never want checked.  I'd also prefer scrub to not
> randomly start checking external file systems that are only temporarily
> mounted and that I want to remove again.

Another thing I need to figure out is whether systemd has a handy way of
terminating a service as a "prerequisite" to "stopping" a mount.
Right now you can unmount an fs that's being scrubbed from your mount
hierarchy, but the service keeps running and the mount is still
active.

> Maybe we'll indeed want some kind of marker in the file system.
> 
> Btw, the code in scrub_all that parses lsblk output to find file systems
> also looks a bit odd.  I'd expect it to use whatever is the python
> equivalent of setmntent/getmntent on /proc/self/mounts.

It parses the lsblk json output so that it can trace all the xfs mounts
back to raw block devices so that it can try to kick off as many scrubs
in parallel as it can so long as no two scrubs will hit the same bdev.
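
The constraint described here (run as many scrubs at once as possible, but never two on the same block device) can be modeled as repeatedly picking a batch of filesystems whose device sets are disjoint. The C sketch below does this greedily, assuming each filesystem's underlying devices have already been resolved from the lsblk tree into a bitmask of at most 64 devices. pick_batch and the bitmask representation are illustrative assumptions, not how the Python xfs_scrub_all is written.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * fs_devs[i] has bit d set if filesystem i sits on block device d.
 * Skip filesystems already scrubbed (done[i] != 0), and take each
 * remaining one whose devices are all still idle in this batch.
 * Returns the number of filesystem indices written to batch[].
 */
size_t pick_batch(const uint64_t *fs_devs, const int *done, size_t nr_fs,
		  size_t *batch)
{
	uint64_t busy = 0;	/* devices claimed by this batch */
	size_t n = 0;

	for (size_t i = 0; i < nr_fs; i++) {
		if (done[i] || (fs_devs[i] & busy))
			continue;
		busy |= fs_devs[i];
		batch[n++] = i;
	}
	return n;
}
```

Greedy first-fit does not guarantee the fewest batches, and a real scheduler could instead start a waiting scrub as soon as the devices it needs become free.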

--D


* Re: [RFC PATCH 1/2] misc: shift install targets
  2024-07-17  5:00                   ` [RFC PATCH 1/2] misc: shift install targets Christoph Hellwig
@ 2024-07-17 16:19                     ` Darrick J. Wong
  2024-07-17 16:43                       ` Christoph Hellwig
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-17 16:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Wed, Jul 17, 2024 at 07:00:05AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 16, 2024 at 09:47:14AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Modify each Makefile so that "install-pkg" installs the main package
> > contents, and "install" just invokes "install-pkg".  We'll need this
> > indirection for the next patch where we add an install-selfheal target
> > to build the xfsprogs-self-healing package but will still want 'make
> > install' to install everything on a developer's workstation.
> 
> Maybe my debian packaging foo is getting a little rusty, but wasn't there
> a concept of pattern matching to pick what files go into what subpackage
> without having to change install targets?

It is, and modern debhelper-based packaging workflows make this easy.
Unfortunately, xfsprogs hasn't been updated to use it. :(

--D


* Re: [RFC PATCH 1/2] misc: shift install targets
  2024-07-17 16:19                     ` Darrick J. Wong
@ 2024-07-17 16:43                       ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-17 16:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs

On Wed, Jul 17, 2024 at 09:19:05AM -0700, Darrick J. Wong wrote:
> > > Modify each Makefile so that "install-pkg" installs the main package
> > > contents, and "install" just invokes "install-pkg".  We'll need this
> > > indirection for the next patch where we add an install-selfheal target
> > > to build the xfsprogs-self-healing package but will still want 'make
> > > install' to install everything on a developer's workstation.
> > 
> > Maybe my debian packaging foo is getting a little rusty, but wasn't there
> > a concept of pattern matching to pick what files go into what subpackage
> > without having to change install targets?
> 
> It is, and modern debhelper-based packaging workflows make this easy.
> Unfortunately, xfsprogs hasn't been updated to use it. :(

Oh well.  Maybe I'll need to dust off my packaging skills and look
into that.  Because splitting the install targets just makes it harder
to move to a more modern build system eventually.  Another one of those
projects I've started and need to finish eventually.



* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-17 16:15                     ` Darrick J. Wong
@ 2024-07-17 16:45                       ` Christoph Hellwig
  2024-07-22  4:12                         ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-17 16:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs

On Wed, Jul 17, 2024 at 09:15:46AM -0700, Darrick J. Wong wrote:
> > So I think the package split makes sense no matter what we do,
> > but I really want a per file system choice and not a global one.
> 
> <nod> How might we do that?
> 
> 1. touch $mnt/.selfheal to opt-in to online fsck?
> 2. touch $mnt/.noselfheal to opt-out?
> 3. attr -s system.selfheal to opt in?
> 4. attr -s system.noselfheal to opt out?
> 5. compat feature to opt-in
> 6. compat feature to opt-out?
> 7. filesystem property that we stuff in the metadir?
> 
> 1-2 most resemble the old /.forcefsck knob, and systemd has
> ConditionPathExists= that we could use to turn an xfs_scrub@<path>
> service invocation into a nop.
> 
> 3-4 don't clutter up the root filesystem with dotfiles, but then if we
> ever reset the root directory then the selfheal property is lost.

All four of them are kinda scary in that the contents of the file system
affect policy.  Now maybe we should never chown the root directory to
an untrusted or not fully trusted user, but..

> 5-6 might be my preferred way, but then I think I have to add a fsgeom
> flag so that userspace can detect the selfheal preferences.

These actually sound like the most sensible to me, even if the flags
are of course a little annoying.

> b. Adding these knobs means more userspace code to manage them.  1-4 can
> be done easily in xfs_admin, 5-8 involve a new ioctl and io/db code.

Yeah, that's kind of the downside.

The other option would of course be some kind of global table in /etc.



* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-17 16:45                       ` Christoph Hellwig
@ 2024-07-22  4:12                         ` Darrick J. Wong
  2024-07-22 12:34                           ` Christoph Hellwig
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-22  4:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Wed, Jul 17, 2024 at 06:45:44PM +0200, Christoph Hellwig wrote:
> On Wed, Jul 17, 2024 at 09:15:46AM -0700, Darrick J. Wong wrote:
> > > So I think the package split makes sense no matter what we do,
> > > but I really want a per file system choice and not a global one.
> > 
> > <nod> How might we do that?
> > 
> > 1. touch $mnt/.selfheal to opt-in to online fsck?
> > 2. touch $mnt/.noselfheal to opt-out?
> > 3. attr -s system.selfheal to opt in?
> > 4. attr -s system.noselfheal to opt out?
> > 5. compat feature to opt-in
> > 6. compat feature to opt-out?
> > 7. filesystem property that we stuff in the metadir?
> > 
> > 1-2 most resemble the old /.forcefsck knob, and systemd has
> > ConditionPathExists= that we could use to turn an xfs_scrub@<path>
> > service invocation into a nop.
> > 
> > 3-4 don't clutter up the root filesystem with dotfiles, but then if we
> > ever reset the root directory then the selfheal property is lost.
> 
> All four of them are kinda scary in that the contents of the file system
> affect policy.  Now maybe we should never chown the root directory to
> an untrusted or not fully trusted user, but..
> 
> > 5-6 might be my preferred way, but then I think I have to add a fsgeom
> > flag so that userspace can detect the selfheal preferences.
> 
> These actually sound like the most sensible to me, even if the flags
> are of course a little annoying.
> 
> > b. Adding these knobs means more userspace code to manage them.  1-4 can
> > be done easily in xfs_admin, 5-8 involve a new ioctl and io/db code.
> 
> Yeah, that's kind of the downside.
> 
> The other option would of course be some kind of global table in /etc.

You could also do:

for x in <ephemeral mountpoints>; do
	systemctl mask xfs_scrub@$(systemd-escape --path $x)
done

(Though iirc xfs_scrub_all currently treats masked services as
failures; maybe it shouldn't.)
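
The loop depends on systemd-escape --path turning each mountpoint into an instance name. The C sketch below follows the path-escaping rules documented in systemd.unit(5): leading, trailing and duplicate slashes are dropped, "/" becomes "-", the root directory becomes "-", and any byte other than an ASCII alphanumeric, ":", "_" or a non-leading "." is written as a \xNN escape. escape_path is a hypothetical name used only to show which unit names the loop would mask; real scripts should keep calling systemd-escape.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Bytes systemd leaves unescaped (ASCII only, independent of locale). */
static bool keep_literal(unsigned char c)
{
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
	       (c >= '0' && c <= '9') || c == ':' || c == '_' || c == '.';
}

/* Returns 0 on success, -1 if out[] is too small. */
int escape_path(const char *path, char *out, size_t outlen)
{
	size_t o = 0;

	while (*path == '/')		/* drop leading slashes */
		path++;
	if (*path == '\0') {		/* the root directory is "-" */
		if (outlen < 2)
			return -1;
		strcpy(out, "-");
		return 0;
	}
	for (const char *p = path; *p != '\0'; p++) {
		unsigned char c = (unsigned char)*p;
		char buf[5];
		size_t n = 1;

		if (c == '/') {
			while (p[1] == '/')	/* collapse duplicate slashes */
				p++;
			if (p[1] == '\0')	/* drop a trailing slash */
				break;
			buf[0] = '-';
		} else if (keep_literal(c) && !(c == '.' && o == 0)) {
			buf[0] = (char)c;	/* a leading '.' gets escaped */
		} else {
			n = (size_t)snprintf(buf, sizeof(buf), "\\x%02x", c);
		}
		if (o + n >= outlen)
			return -1;
		memcpy(out + o, buf, n);
		o += n;
	}
	out[o] = '\0';
	return 0;
}
```

For example, /mnt/scratch maps to xfs_scrub@mnt-scratch, while /mnt/my-disk maps to xfs_scrub@mnt-my\x2ddisk, because a literal "-" would otherwise be read back as a "/".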

--D


* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-22  4:12                         ` Darrick J. Wong
@ 2024-07-22 12:34                           ` Christoph Hellwig
  2024-07-23 23:29                             ` Darrick J. Wong
  0 siblings, 1 reply; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-22 12:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs

On Sun, Jul 21, 2024 at 09:12:29PM -0700, Darrick J. Wong wrote:
> You could also do:
> 
> for x in <ephemeral mountpoints>; do
> 	systemctl mask xfs_scrub@$(systemd-escape --path $x)
> done

That assumes I actually know about them.

> (Though iirc xfs_scrub_all currently treats masked services as
> failures; maybe it shouldn't.)

Independent of the rest of the discussion it probably shouldn't.


* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-22 12:34                           ` Christoph Hellwig
@ 2024-07-23 23:29                             ` Darrick J. Wong
  2024-07-24 13:18                               ` Christoph Hellwig
  0 siblings, 1 reply; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-23 23:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs

On Mon, Jul 22, 2024 at 02:34:49PM +0200, Christoph Hellwig wrote:
> On Sun, Jul 21, 2024 at 09:12:29PM -0700, Darrick J. Wong wrote:
> > You could also do:
> > 
> > for x in <ephemeral mountpoints>; do
> > 	systemctl mask xfs_scrub@$(systemd-escape --path $x)
> > done
> 
> That assumes I actually know about them.

True.  So, do we want a compat flag to opt in to online fsck?  Or one to
opt out?  Or perhaps filesystems without rmap or pptrs aren't worth
autoscrubbing?

> > (Though iirc xfs_scrub_all currently treats masked services as
> > failures; maybe it shouldn't.)
> 
> Independent of the rest of the discussion it probably shouldn't.

Fixed; will have a patch out shortly.

--D


* Re: [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default
  2024-07-23 23:29                             ` Darrick J. Wong
@ 2024-07-24 13:18                               ` Christoph Hellwig
  0 siblings, 0 replies; 296+ messages in thread
From: Christoph Hellwig @ 2024-07-24 13:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs

On Tue, Jul 23, 2024 at 04:29:51PM -0700, Darrick J. Wong wrote:
> On Mon, Jul 22, 2024 at 02:34:49PM +0200, Christoph Hellwig wrote:
> > On Sun, Jul 21, 2024 at 09:12:29PM -0700, Darrick J. Wong wrote:
> > > You could also do:
> > > 
> > > for x in <ephemeral mountpoints>; do
> > > 	systemctl mask xfs_scrub@$(systemd-escape --path $x)
> > > done
> > 
> > That assumes I actually know about them.
> 
> True.  So, do we want a compat flag to opt in to online fsck?  Or one to
> opt out?  Or perhaps filesystems without rmap or pptrs aren't worth
> autoscrubbing?

I think an explicit flag is definitely the right interface, and it
seems like a compat flag is so far the least bad variant.  Note that
the flag should just be about automatic online fsck - any explicit
user call to xfs_scrub should not be affected.
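
A minimal C sketch of the policy described here, assuming the opt-in ends up reported to userspace as a bit in a filesystem geometry flags word (as suggested for options 5-6 earlier in the thread): the bit gates only automatic runs, while an explicit xfs_scrub invocation always proceeds. The flag name and bit position are placeholders, not a real XFS_FSOP_GEOM_FLAGS_* value, and the 64-bit word simply stands in for whatever the geometry ioctl would report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder for a not-yet-existing "automatic online fsck" flag. */
#define HYPOTHETICAL_GEOM_FLAGS_AUTOFSCK	(1ULL << 63)

/*
 * automatic is true when xfs_scrub_all (or its timer) is starting the
 * scrub, false when an administrator ran xfs_scrub by hand.
 */
bool should_run_online_fsck(uint64_t geom_flags, bool automatic)
{
	if (!automatic)
		return true;	/* explicit calls are never blocked */
	return (geom_flags & HYPOTHETICAL_GEOM_FLAGS_AUTOFSCK) != 0;
}
```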

> 


* [PATCH 01/24] libxfs: create attr log item opcodes and formats for parent pointers
  2024-07-30  0:21 [PATCHSET v13.8 18/23] xfsprogs: Parent Pointers Darrick J. Wong
@ 2024-07-30  1:19 ` Darrick J. Wong
  0 siblings, 0 replies; 296+ messages in thread
From: Darrick J. Wong @ 2024-07-30  1:19 UTC (permalink / raw)
  To: djwong, cem
  Cc: Christoph Hellwig, linux-xfs, catherine.hoang, allison.henderson

From: Darrick J. Wong <djwong@kernel.org>

Update xfs_attr_defer_add to use the pptr-specific opcodes if it's
reading or writing parent pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/defer_item.c |   52 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 6 deletions(-)


diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index 9955e189d..8cdf57eac 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -676,21 +676,61 @@ xfs_attr_defer_add(
 	enum xfs_attr_defer_op	op)
 {
 	struct xfs_attr_intent	*new;
+	unsigned int		log_op = 0;
+	bool			is_pptr = args->attr_filter & XFS_ATTR_PARENT;
 
-	new = kmem_cache_zalloc(xfs_attr_intent_cache, GFP_NOFS | __GFP_NOFAIL);
+	if (is_pptr) {
+		ASSERT(xfs_has_parent(args->dp->i_mount));
+		ASSERT((args->attr_filter & ~XFS_ATTR_PARENT) != 0);
+		ASSERT(args->op_flags & XFS_DA_OP_LOGGED);
+		ASSERT(args->valuelen == sizeof(struct xfs_parent_rec));
+	}
+
+	new = kmem_cache_zalloc(xfs_attr_intent_cache,
+			GFP_NOFS | __GFP_NOFAIL);
 	new->xattri_da_args = args;
 
+	/* Compute log operation from the higher level op and namespace. */
 	switch (op) {
 	case XFS_ATTR_DEFER_SET:
-		new->xattri_op_flags = XFS_ATTRI_OP_FLAGS_SET;
-		new->xattri_dela_state = xfs_attr_init_add_state(args);
+		if (is_pptr)
+			log_op = XFS_ATTRI_OP_FLAGS_PPTR_SET;
+		else
+			log_op = XFS_ATTRI_OP_FLAGS_SET;
 		break;
 	case XFS_ATTR_DEFER_REPLACE:
-		new->xattri_op_flags = XFS_ATTRI_OP_FLAGS_REPLACE;
-		new->xattri_dela_state = xfs_attr_init_replace_state(args);
+		if (is_pptr)
+			log_op = XFS_ATTRI_OP_FLAGS_PPTR_REPLACE;
+		else
+			log_op = XFS_ATTRI_OP_FLAGS_REPLACE;
 		break;
 	case XFS_ATTR_DEFER_REMOVE:
-		new->xattri_op_flags = XFS_ATTRI_OP_FLAGS_REMOVE;
+		if (is_pptr)
+			log_op = XFS_ATTRI_OP_FLAGS_PPTR_REMOVE;
+		else
+			log_op = XFS_ATTRI_OP_FLAGS_REMOVE;
+		break;
+	default:
+		ASSERT(0);
+		break;
+	}
+	new->xattri_op_flags = log_op;
+
+	/* Set up initial attr operation state. */
+	switch (log_op) {
+	case XFS_ATTRI_OP_FLAGS_PPTR_SET:
+	case XFS_ATTRI_OP_FLAGS_SET:
+		new->xattri_dela_state = xfs_attr_init_add_state(args);
+		break;
+	case XFS_ATTRI_OP_FLAGS_PPTR_REPLACE:
+		ASSERT(args->new_valuelen == args->valuelen);
+		new->xattri_dela_state = xfs_attr_init_replace_state(args);
+		break;
+	case XFS_ATTRI_OP_FLAGS_REPLACE:
+		new->xattri_dela_state = xfs_attr_init_replace_state(args);
+		break;
+	case XFS_ATTRI_OP_FLAGS_PPTR_REMOVE:
+	case XFS_ATTRI_OP_FLAGS_REMOVE:
 		new->xattri_dela_state = xfs_attr_init_remove_state(args);
 		break;
 	}
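
The patch picks the log opcode in two steps: first from the high-level defer op plus the XFS_ATTR_PARENT namespace bit, then from the log opcode to the initial state, where each parent-pointer opcode shares the state of its plain counterpart. The table-driven C model below restates that mapping so it can be exercised outside the kernel. All enum names and values are local stand-ins, not the real XFS_ATTRI_OP_FLAGS_* definitions, and the patch's extra ASSERT that a parent-pointer replace keeps the value length unchanged is not modeled.

```c
#include <stdbool.h>

/* Local stand-ins for the kernel enums; values are illustrative only. */
enum defer_op { DEFER_SET, DEFER_REPLACE, DEFER_REMOVE, DEFER_NR };
enum log_op {
	LOG_OP_NONE = 0,
	LOG_OP_SET, LOG_OP_REMOVE, LOG_OP_REPLACE,
	LOG_OP_PPTR_SET, LOG_OP_PPTR_REMOVE, LOG_OP_PPTR_REPLACE,
};
enum init_state { STATE_NONE, STATE_ADD, STATE_REPLACE, STATE_REMOVE };

/* Row: high-level op; column 0 = regular xattr, column 1 = parent ptr. */
static const enum log_op op_map[DEFER_NR][2] = {
	[DEFER_SET]	= { LOG_OP_SET,     LOG_OP_PPTR_SET },
	[DEFER_REPLACE]	= { LOG_OP_REPLACE, LOG_OP_PPTR_REPLACE },
	[DEFER_REMOVE]	= { LOG_OP_REMOVE,  LOG_OP_PPTR_REMOVE },
};

/* First switch in the patch: choose the logged opcode. */
enum log_op pick_log_op(enum defer_op op, bool is_pptr)
{
	if ((unsigned int)op >= (unsigned int)DEFER_NR)
		return LOG_OP_NONE;	/* the patch ASSERTs here instead */
	return op_map[op][is_pptr ? 1 : 0];
}

/* Second switch: both namespaces share the same initial state. */
enum init_state initial_state(enum log_op log_op)
{
	switch (log_op) {
	case LOG_OP_SET:
	case LOG_OP_PPTR_SET:
		return STATE_ADD;
	case LOG_OP_REPLACE:
	case LOG_OP_PPTR_REPLACE:
		return STATE_REPLACE;
	case LOG_OP_REMOVE:
	case LOG_OP_PPTR_REMOVE:
		return STATE_REMOVE;
	default:
		return STATE_NONE;
	}
}
```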



end of thread, other threads:[~2024-07-30  1:19 UTC | newest]

Thread overview: 296+ messages
2024-07-02  0:43 [PATCHBOMB] xfsprogs: program changes (mostly xfs_scrub) for 6.10 Darrick J. Wong
2024-07-02  0:49 ` [PATCHSET v30.7 01/16] xfsprogs: atomic file updates Darrick J. Wong
2024-07-02  0:53   ` [PATCH 01/12] man: document the exchange-range ioctl Darrick J. Wong
2024-07-02  5:08     ` Christoph Hellwig
2024-07-02  0:54   ` [PATCH 02/12] man: document XFS_FSOP_GEOM_FLAGS_EXCHRANGE Darrick J. Wong
2024-07-02  5:08     ` Christoph Hellwig
2024-07-02  0:54   ` [PATCH 03/12] libhandle: add support for bulkstat v5 Darrick J. Wong
2024-07-02  5:09     ` Christoph Hellwig
2024-07-02 19:48       ` Darrick J. Wong
2024-07-02  0:54   ` [PATCH 04/12] libfrog: add support for exchange range ioctl family Darrick J. Wong
2024-07-02  5:09     ` Christoph Hellwig
2024-07-02  0:54   ` [PATCH 05/12] xfs_db: advertise exchange-range in the version command Darrick J. Wong
2024-07-02  5:10     ` Christoph Hellwig
2024-07-02  0:55   ` [PATCH 06/12] xfs_logprint: support dumping exchmaps log items Darrick J. Wong
2024-07-02  5:11     ` Christoph Hellwig
2024-07-02 19:49       ` Darrick J. Wong
2024-07-02  0:55   ` [PATCH 07/12] xfs_fsr: convert to bulkstat v5 ioctls Darrick J. Wong
2024-07-02  5:13     ` Christoph Hellwig
2024-07-02 19:51       ` Darrick J. Wong
2024-07-02  0:55   ` [PATCH 08/12] xfs_fsr: skip the xattr/forkoff levering with the newer swapext implementations Darrick J. Wong
2024-07-02  5:14     ` Christoph Hellwig
2024-07-02  0:55   ` [PATCH 09/12] xfs_io: create exchangerange command to test file range exchange ioctl Darrick J. Wong
2024-07-02  5:15     ` Christoph Hellwig
2024-07-02 19:53       ` Darrick J. Wong
2024-07-02  0:56   ` [PATCH 10/12] libfrog: advertise exchange-range support Darrick J. Wong
2024-07-02  5:15     ` Christoph Hellwig
2024-07-02  0:56   ` [PATCH 11/12] xfs_repair: add exchange-range to file systems Darrick J. Wong
2024-07-02  5:15     ` Christoph Hellwig
2024-07-02  0:56   ` [PATCH 12/12] mkfs: add a formatting option for exchange-range Darrick J. Wong
2024-07-02  5:16     ` Christoph Hellwig
2024-07-02  0:49 ` [PATCHSET v30.7 02/16] xfsprogs: inode-related repair fixes Darrick J. Wong
2024-07-02  0:56   ` [PATCH 1/3] xfs_{db,repair}: add an explicit owner field to xfs_da_args Darrick J. Wong
2024-07-02  5:16     ` Christoph Hellwig
2024-07-02  0:57   ` [PATCH 2/3] libxfs: port the bumplink function from the kernel Darrick J. Wong
2024-07-02  5:16     ` Christoph Hellwig
2024-07-02  0:57   ` [PATCH 3/3] mkfs/repair: pin inodes that would otherwise overflow link count Darrick J. Wong
2024-07-02  5:17     ` Christoph Hellwig
2024-07-02  0:50 ` [PATCHSET v30.7 03/16] xfs_scrub: detect deceptive filename extensions Darrick J. Wong
2024-07-02  0:57   ` [PATCH 01/13] xfs_scrub: use proper UChar string iterators Darrick J. Wong
2024-07-02  5:18     ` Christoph Hellwig
2024-07-02  0:57   ` [PATCH 02/13] xfs_scrub: hoist code that removes ignorable characters Darrick J. Wong
2024-07-02  5:21     ` Christoph Hellwig
2024-07-02  0:58   ` [PATCH 03/13] xfs_scrub: add a couple of omitted invisible code points Darrick J. Wong
2024-07-02  5:22     ` Christoph Hellwig
2024-07-03  1:59       ` Darrick J. Wong
2024-07-03  4:27         ` Christoph Hellwig
2024-07-03  5:02           ` Darrick J. Wong
2024-07-03 18:35             ` Darrick J. Wong
2024-07-04  7:22               ` Christoph Hellwig
2024-07-02  0:58   ` [PATCH 04/13] xfs_scrub: avoid potential UAF after freeing a duplicate name entry Darrick J. Wong
2024-07-02  5:22     ` Christoph Hellwig
2024-07-02  0:58   ` [PATCH 05/13] xfs_scrub: guard against libicu returning negative buffer lengths Darrick J. Wong
2024-07-02  5:22     ` Christoph Hellwig
2024-07-02  0:58   ` [PATCH 06/13] xfs_scrub: hoist non-rendering character predicate Darrick J. Wong
2024-07-02  5:23     ` Christoph Hellwig
2024-07-02  0:59   ` [PATCH 07/13] xfs_scrub: store bad flags with the name entry Darrick J. Wong
2024-07-02  5:23     ` Christoph Hellwig
2024-07-02  0:59   ` [PATCH 08/13] xfs_scrub: rename UNICRASH_ZERO_WIDTH to UNICRASH_INVISIBLE Darrick J. Wong
2024-07-02  5:23     ` Christoph Hellwig
2024-07-02  0:59   ` [PATCH 09/13] xfs_scrub: type-coerce the UNICRASH_* flags Darrick J. Wong
2024-07-02  5:24     ` Christoph Hellwig
2024-07-02  0:59   ` [PATCH 10/13] xfs_scrub: reduce size of struct name_entry Darrick J. Wong
2024-07-02  5:24     ` Christoph Hellwig
2024-07-02  1:00   ` [PATCH 11/13] xfs_scrub: rename struct unicrash.normalizer Darrick J. Wong
2024-07-02  5:24     ` Christoph Hellwig
2024-07-02  1:00   ` [PATCH 12/13] xfs_scrub: report deceptive file extensions Darrick J. Wong
2024-07-02  5:25     ` Christoph Hellwig
2024-07-02  1:00   ` [PATCH 13/13] xfs_scrub: dump unicode points Darrick J. Wong
2024-07-02  5:25     ` Christoph Hellwig
2024-07-02  0:50 ` [PATCHSET v30.7 04/16] xfs_scrub: move fstrim to a separate phase Darrick J. Wong
2024-07-02  1:01   ` [PATCH 1/8] xfs_scrub: move FITRIM to phase 8 Darrick J. Wong
2024-07-02  5:27     ` Christoph Hellwig
2024-07-02  1:01   ` [PATCH 2/8] xfs_scrub: ignore phase 8 if the user disabled fstrim Darrick J. Wong
2024-07-02  5:27     ` Christoph Hellwig
2024-07-02  1:01   ` [PATCH 3/8] xfs_scrub: collapse trim_filesystem Darrick J. Wong
2024-07-02  5:27     ` Christoph Hellwig
2024-07-02  1:01   ` [PATCH 4/8] xfs_scrub: fix the work estimation for phase 8 Darrick J. Wong
2024-07-02  5:27     ` Christoph Hellwig
2024-07-02  1:02   ` [PATCH 5/8] xfs_scrub: report FITRIM errors properly Darrick J. Wong
2024-07-02  5:28     ` Christoph Hellwig
2024-07-02  1:02   ` [PATCH 6/8] xfs_scrub: don't call FITRIM after runtime errors Darrick J. Wong
2024-07-02  5:28     ` Christoph Hellwig
2024-07-02  1:02   ` [PATCH 7/8] xfs_scrub: don't trim the first agbno of each AG for better performance Darrick J. Wong
2024-07-02  5:30     ` Christoph Hellwig
2024-07-03  3:45       ` Darrick J. Wong
2024-07-03  3:52     ` [PATCH v30.7.1 7/8] xfs_scrub: improve responsiveness while trimming the filesystem Darrick J. Wong
2024-07-03  4:33       ` Christoph Hellwig
2024-07-03  4:50         ` Darrick J. Wong
2024-07-02  1:02   ` [PATCH 8/8] xfs_scrub: improve progress meter for phase 8 fstrimming Darrick J. Wong
2024-07-02  0:50 ` [PATCHSET v30.7 05/16] xfs_scrub: use free space histograms to reduce fstrim runtime Darrick J. Wong
2024-07-02  1:03   ` [PATCH 1/7] libfrog: hoist free space histogram code Darrick J. Wong
2024-07-02  5:32     ` Christoph Hellwig
2024-07-03  2:47       ` Darrick J. Wong
2024-07-03  4:30         ` Christoph Hellwig
2024-07-03  4:55           ` Darrick J. Wong
2024-07-02  1:03   ` [PATCH 2/7] libfrog: print wider columns for free space histogram Darrick J. Wong
2024-07-02  5:33     ` Christoph Hellwig
2024-07-02  1:03   ` [PATCH 3/7] libfrog: print cdf of free space buckets Darrick J. Wong
2024-07-02  5:33     ` Christoph Hellwig
2024-07-02  1:03   ` [PATCH 4/7] xfs_scrub: don't close stdout when closing the progress bar Darrick J. Wong
2024-07-02  5:33     ` Christoph Hellwig
2024-07-02  1:04   ` [PATCH 5/7] xfs_scrub: remove pointless spacemap.c arguments Darrick J. Wong
2024-07-02  5:33     ` Christoph Hellwig
2024-07-02  1:04   ` [PATCH 6/7] xfs_scrub: collect free space histograms during phase 7 Darrick J. Wong
2024-07-02  5:34     ` Christoph Hellwig
2024-07-02  1:04   ` [PATCH 7/7] xfs_scrub: tune fstrim minlen parameter based on free space histograms Darrick J. Wong
2024-07-02  5:36     ` Christoph Hellwig
2024-07-03  2:29       ` Darrick J. Wong
2024-07-03  4:29         ` Christoph Hellwig
2024-07-03  4:55           ` Darrick J. Wong
2024-07-03  4:58             ` Christoph Hellwig
2024-07-03  5:04               ` Darrick J. Wong
2024-07-03  5:11                 ` Christoph Hellwig
2024-07-03  5:17                   ` Darrick J. Wong
2024-07-02  0:50 ` [PATCHSET v30.7 06/16] xfs_scrub: tighten security of systemd services Darrick J. Wong
2024-07-02  1:04   ` [PATCH 1/6] xfs_scrub: allow auxiliary pathnames for sandboxing Darrick J. Wong
2024-07-02  5:37     ` Christoph Hellwig
2024-07-02  1:05   ` [PATCH 2/6] xfs_scrub.service: reduce background CPU usage to less than one core if possible Darrick J. Wong
2024-07-02  1:05   ` [PATCH 3/6] xfs_scrub: use dynamic users when running as a systemd service Darrick J. Wong
2024-07-02  1:05   ` [PATCH 4/6] xfs_scrub: tighten up the security on the background " Darrick J. Wong
2024-07-02  1:06   ` [PATCH 5/6] xfs_scrub_fail: " Darrick J. Wong
2024-07-02  5:37     ` Christoph Hellwig
2024-07-02  1:06   ` [PATCH 6/6] xfs_scrub_all: " Darrick J. Wong
2024-07-02  5:37     ` Christoph Hellwig
2024-07-02  0:51 ` [PATCHSET v30.7 07/16] xfs_scrub_all: automatic media scan service Darrick J. Wong
2024-07-02  1:06   ` [PATCH 1/6] xfs_scrub_all: only use the xfs_scrub@ systemd services in service mode Darrick J. Wong
2024-07-02  5:38     ` Christoph Hellwig
2024-07-02  1:06   ` [PATCH 2/6] xfs_scrub_all: remove journalctl background process Darrick J. Wong
2024-07-02  5:38     ` Christoph Hellwig
2024-07-02  1:07   ` [PATCH 3/6] xfs_scrub_all: support metadata+media scans of all filesystems Darrick J. Wong
2024-07-02  5:39     ` Christoph Hellwig
2024-07-02  1:07   ` [PATCH 4/6] xfs_scrub_all: enable periodic file data scrubs automatically Darrick J. Wong
2024-07-02  5:39     ` Christoph Hellwig
2024-07-02  1:07   ` [PATCH 5/6] xfs_scrub_all: trigger automatic media scans once per month Darrick J. Wong
2024-07-02  5:39     ` Christoph Hellwig
2024-07-02  1:07   ` [PATCH 6/6] xfs_scrub_all: failure reporting for the xfs_scrub_all job Darrick J. Wong
2024-07-02  5:39     ` Christoph Hellwig
2024-07-02  0:51 ` [PATCHSET v30.7 08/16] xfs_scrub_all: improve systemd handling Darrick J. Wong
2024-07-02  1:08   ` [PATCH 1/5] xfs_scrub_all: encapsulate all the subprocess code in an object Darrick J. Wong
2024-07-02  5:40     ` Christoph Hellwig
2024-07-02  1:08   ` [PATCH 2/5] xfs_scrub_all: encapsulate all the systemctl " Darrick J. Wong
2024-07-02  5:41     ` Christoph Hellwig
2024-07-02  1:08   ` [PATCH 3/5] xfs_scrub_all: add CLI option for easier debugging Darrick J. Wong
2024-07-02  5:42     ` Christoph Hellwig
2024-07-02  1:08   ` [PATCH 4/5] xfs_scrub_all: convert systemctl calls to dbus Darrick J. Wong
2024-07-02  5:42     ` Christoph Hellwig
2024-07-02  1:09   ` [PATCH 5/5] xfs_scrub_all: implement retry and backoff for dbus calls Darrick J. Wong
2024-07-02  5:42     ` Christoph Hellwig
2024-07-02  0:51 ` [PATCHSET v30.7 09/16] xfs_scrub: automatic optimization by default Darrick J. Wong
2024-07-02  1:09   ` [PATCH 1/3] xfs_scrub: automatic downgrades to dry-run mode in service mode Darrick J. Wong
2024-07-02  5:43     ` Christoph Hellwig
2024-07-02  1:09   ` [PATCH 2/3] xfs_scrub: add an optimization-only mode Darrick J. Wong
2024-07-02  5:43     ` Christoph Hellwig
2024-07-02  1:09   ` [PATCH 3/3] debian: enable xfs_scrub_all systemd timer services by default Darrick J. Wong
2024-07-02  5:44     ` Christoph Hellwig
2024-07-03  2:59       ` Darrick J. Wong
2024-07-03  4:31         ` Christoph Hellwig
2024-07-03  5:01           ` Darrick J. Wong
2024-07-09 22:53             ` Darrick J. Wong
2024-07-10  6:18               ` Christoph Hellwig
2024-07-16 16:46                 ` Darrick J. Wong
2024-07-17  4:59                   ` Christoph Hellwig
2024-07-17 16:15                     ` Darrick J. Wong
2024-07-17 16:45                       ` Christoph Hellwig
2024-07-22  4:12                         ` Darrick J. Wong
2024-07-22 12:34                           ` Christoph Hellwig
2024-07-23 23:29                             ` Darrick J. Wong
2024-07-24 13:18                               ` Christoph Hellwig
2024-07-16 16:47                 ` [RFC PATCH 1/2] misc: shift install targets Darrick J. Wong
2024-07-16 16:49                   ` [RFC PATCH 2/2] debian: create a new package for automatic self-healing Darrick J. Wong
2024-07-17  5:00                   ` [RFC PATCH 1/2] misc: shift install targets Christoph Hellwig
2024-07-17 16:19                     ` Darrick J. Wong
2024-07-17 16:43                       ` Christoph Hellwig
2024-07-02  0:51 ` [PATCHSET v13.7 10/16] xfsprogs: improve extended attribute validation Darrick J. Wong
2024-07-02  1:10   ` [PATCH 1/3] xfs_repair: check free space requirements before allowing upgrades Darrick J. Wong
2024-07-02  5:44     ` Christoph Hellwig
2024-07-02  1:10   ` [PATCH 2/3] xfs_repair: enforce one namespace bit per extended attribute Darrick J. Wong
2024-07-02  5:44     ` Christoph Hellwig
2024-07-02  1:10   ` [PATCH 3/3] xfs_repair: check for unknown flags in attr entries Darrick J. Wong
2024-07-02  5:45     ` Christoph Hellwig
2024-07-02  0:52 ` [PATCHSET v13.7 11/16] xfsprogs: Parent Pointers Darrick J. Wong
2024-07-02  1:10   ` [PATCH 01/24] libxfs: create attr log item opcodes and formats for parent pointers Darrick J. Wong
2024-07-02  6:24     ` Christoph Hellwig
2024-07-02  1:11   ` [PATCH 02/24] xfs_{db,repair}: implement new attr hash value function Darrick J. Wong
2024-07-02  6:24     ` Christoph Hellwig
2024-07-02  1:11   ` [PATCH 03/24] xfs_logprint: dump new attr log item fields Darrick J. Wong
2024-07-02  6:25     ` Christoph Hellwig
2024-07-02 20:02       ` Darrick J. Wong
2024-07-02  1:11   ` [PATCH 04/24] man: document the XFS_IOC_GETPARENTS ioctl Darrick J. Wong
2024-07-02  6:25     ` Christoph Hellwig
2024-07-02  1:11   ` [PATCH 05/24] libfrog: report parent pointers to userspace Darrick J. Wong
2024-07-02  6:25     ` Christoph Hellwig
2024-07-02  1:12   ` [PATCH 06/24] libfrog: add parent pointer support code Darrick J. Wong
2024-07-02  6:26     ` Christoph Hellwig
2024-07-02  1:12   ` [PATCH 07/24] xfs_io: adapt parent command to new parent pointer ioctls Darrick J. Wong
2024-07-02  6:26     ` Christoph Hellwig
2024-07-02  1:12   ` [PATCH 08/24] xfs_io: Add i, n and f flags to parent command Darrick J. Wong
2024-07-02  6:26     ` Christoph Hellwig
2024-07-02  1:13   ` [PATCH 09/24] xfs_logprint: decode parent pointers in ATTRI items fully Darrick J. Wong
2024-07-02  6:27     ` Christoph Hellwig
2024-07-02  1:13   ` [PATCH 10/24] xfs_spaceman: report file paths Darrick J. Wong
2024-07-02  6:27     ` Christoph Hellwig
2024-07-02  1:13   ` [PATCH 11/24] xfs_scrub: use parent pointers when possible to report file operations Darrick J. Wong
2024-07-02  6:27     ` Christoph Hellwig
2024-07-02  1:13   ` [PATCH 12/24] xfs_scrub: use parent pointers to report lost file data Darrick J. Wong
2024-07-02  6:28     ` Christoph Hellwig
2024-07-02  1:14   ` [PATCH 13/24] xfs_db: report parent pointers in version command Darrick J. Wong
2024-07-02  6:28     ` Christoph Hellwig
2024-07-02  1:14   ` [PATCH 14/24] xfs_db: report parent bit on xattrs Darrick J. Wong
2024-07-02  6:28     ` Christoph Hellwig
2024-07-02  1:14   ` [PATCH 15/24] xfs_db: report parent pointers embedded in xattrs Darrick J. Wong
2024-07-02  6:28     ` Christoph Hellwig
2024-07-02  1:14   ` [PATCH 16/24] xfs_db: obfuscate dirent and parent pointer names consistently Darrick J. Wong
2024-07-02  6:33     ` Christoph Hellwig
2024-07-02  1:15   ` [PATCH 17/24] libxfs: export attr3_leaf_hdr_from_disk via libxfs_api_defs.h Darrick J. Wong
2024-07-02  6:35     ` Christoph Hellwig
2024-07-02  1:15   ` [PATCH 18/24] xfs_db: add a parents command to list the parents of a file Darrick J. Wong
2024-07-02  6:35     ` Christoph Hellwig
2024-07-02  1:15   ` [PATCH 19/24] xfs_db: make attr_set and attr_remove handle parent pointers Darrick J. Wong
2024-07-02  6:36     ` Christoph Hellwig
2024-07-02  1:15   ` [PATCH 20/24] xfs_db: add link and unlink expert commands Darrick J. Wong
2024-07-02  6:36     ` Christoph Hellwig
2024-07-02  1:16   ` [PATCH 21/24] xfs_db: compute hashes of parent pointers Darrick J. Wong
2024-07-02  6:37     ` Christoph Hellwig
2024-07-02  1:16   ` [PATCH 22/24] libxfs: create new files with attr forks if necessary Darrick J. Wong
2024-07-02  6:37     ` Christoph Hellwig
2024-07-02  1:16   ` [PATCH 23/24] mkfs: Add parent pointers during protofile creation Darrick J. Wong
2024-07-02  6:37     ` Christoph Hellwig
2024-07-02  1:16   ` [PATCH 24/24] mkfs: enable formatting with parent pointers Darrick J. Wong
2024-07-02  6:38     ` Christoph Hellwig
2024-07-02  0:52 ` [PATCHSET v13.7 12/16] xfsprogs: scrubbing for " Darrick J. Wong
2024-07-02  1:17   ` [PATCH 1/2] xfs: create a blob array data structure Darrick J. Wong
2024-07-02  6:41     ` Christoph Hellwig
2024-07-02  1:17   ` [PATCH 2/2] man2: update ioctl_xfs_scrub_metadata.2 for parent pointers Darrick J. Wong
2024-07-02  6:41     ` Christoph Hellwig
2024-07-02  0:52 ` [PATCHSET v13.7 13/16] xfsprogs: offline repair " Darrick J. Wong
2024-07-02  1:17   ` [PATCH 01/12] xfs_db: remove some boilerplate from xfs_attr_set Darrick J. Wong
2024-07-02  6:41     ` Christoph Hellwig
2024-07-02  1:17   ` [PATCH 02/12] xfs_db: actually report errors from libxfs_attr_set Darrick J. Wong
2024-07-02  6:41     ` Christoph Hellwig
2024-07-02  1:18   ` [PATCH 03/12] xfs_repair: junk parent pointer attributes when filesystem doesn't support them Darrick J. Wong
2024-07-02  6:42     ` Christoph Hellwig
2024-07-02  1:18   ` [PATCH 04/12] xfs_repair: add parent pointers when messing with /lost+found Darrick J. Wong
2024-07-02  6:42     ` Christoph Hellwig
2024-07-02  1:18   ` [PATCH 05/12] xfs_repair: junk duplicate hashtab entries when processing sf dirents Darrick J. Wong
2024-07-02  6:43     ` Christoph Hellwig
2024-07-02  1:19   ` [PATCH 06/12] xfs_repair: build a parent pointer index Darrick J. Wong
2024-07-02  6:45     ` Christoph Hellwig
2024-07-02  1:19   ` [PATCH 07/12] xfs_repair: move the global dirent name store to a separate object Darrick J. Wong
2024-07-02  6:45     ` Christoph Hellwig
2024-07-02  1:19   ` [PATCH 08/12] xfs_repair: deduplicate strings stored in string blob Darrick J. Wong
2024-07-02  6:46     ` Christoph Hellwig
2024-07-02  1:19   ` [PATCH 09/12] xfs_repair: check parent pointers Darrick J. Wong
2024-07-02  6:47     ` Christoph Hellwig
2024-07-02  1:20   ` [PATCH 10/12] xfs_repair: dump garbage parent pointer attributes Darrick J. Wong
2024-07-02  6:47     ` Christoph Hellwig
2024-07-02  1:20   ` [PATCH 11/12] xfs_repair: update ondisk parent pointer records Darrick J. Wong
2024-07-02  6:47     ` Christoph Hellwig
2024-07-02  1:20   ` [PATCH 12/12] xfs_repair: wipe ondisk parent pointers when there are none Darrick J. Wong
2024-07-02  6:47     ` Christoph Hellwig
2024-07-02  0:52 ` [PATCHSET v13.7 14/16] xfsprogs: detect and correct directory tree problems Darrick J. Wong
2024-07-02  1:20   ` [PATCH 1/5] libfrog: add directory tree structure scrubber to scrub library Darrick J. Wong
2024-07-02  6:48     ` Christoph Hellwig
2024-07-02  1:21   ` [PATCH 2/5] xfs_spaceman: report directory tree corruption in the health information Darrick J. Wong
2024-07-02  6:48     ` Christoph Hellwig
2024-07-02  1:21   ` [PATCH 3/5] xfs_scrub: fix erroring out of check_inode_names Darrick J. Wong
2024-07-02  6:48     ` Christoph Hellwig
2024-07-02  1:21   ` [PATCH 4/5] xfs_scrub: detect and repair directory tree corruptions Darrick J. Wong
2024-07-02  6:49     ` Christoph Hellwig
2024-07-02  1:21   ` [PATCH 5/5] xfs_scrub: defer phase5 file scans if dirloop fails Darrick J. Wong
2024-07-02  6:49     ` Christoph Hellwig
2024-07-02  0:53 ` [PATCHSET v30.7 15/16] xfs_scrub: vectorize kernel calls Darrick J. Wong
2024-07-02  1:22   ` [PATCH 01/10] man: document vectored scrub mode Darrick J. Wong
2024-07-02  6:56     ` Christoph Hellwig
2024-07-02  1:22   ` [PATCH 02/10] libfrog: support vectored scrub Darrick J. Wong
2024-07-02  6:56     ` Christoph Hellwig
2024-07-02  1:22   ` [PATCH 03/10] xfs_io: " Darrick J. Wong
2024-07-02  6:57     ` Christoph Hellwig
2024-07-02  1:22   ` [PATCH 04/10] xfs_scrub: split the scrub epilogue code into a separate function Darrick J. Wong
2024-07-02  6:57     ` Christoph Hellwig
2024-07-02  1:23   ` [PATCH 05/10] xfs_scrub: split the repair " Darrick J. Wong
2024-07-02  6:57     ` Christoph Hellwig
2024-07-02  1:23   ` [PATCH 06/10] xfs_scrub: convert scrub and repair epilogues to use xfs_scrub_vec Darrick J. Wong
2024-07-02  6:57     ` Christoph Hellwig
2024-07-02  1:23   ` [PATCH 07/10] xfs_scrub: vectorize scrub calls Darrick J. Wong
2024-07-02  6:58     ` Christoph Hellwig
2024-07-02  1:23   ` [PATCH 08/10] xfs_scrub: vectorize repair calls Darrick J. Wong
2024-07-02  6:58     ` Christoph Hellwig
2024-07-02  1:24   ` [PATCH 09/10] xfs_scrub: use scrub barriers to reduce kernel calls Darrick J. Wong
2024-07-02  6:58     ` Christoph Hellwig
2024-07-02  1:24   ` [PATCH 10/10] xfs_scrub: try spot repairs of metadata items to make scrub progress Darrick J. Wong
2024-07-02  6:59     ` Christoph Hellwig
2024-07-02  0:53 ` [PATCHSET v30.7 16/16] xfs_repair: small remote symlinks are ok Darrick J. Wong
2024-07-02  1:24   ` [PATCH 1/1] xfs_repair: allow symlinks with short remote targets Darrick J. Wong
2024-07-02  6:59     ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2024-07-30  0:21 [PATCHSET v13.8 18/23] xfsprogs: Parent Pointers Darrick J. Wong
2024-07-30  1:19 ` [PATCH 01/24] libxfs: create attr log item opcodes and formats for parent pointers Darrick J. Wong