* [PATCHBOMB 6.19] xfs: autonomous self healing
@ 2025-10-22 23:56 Darrick J. Wong
2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong
` (3 more replies)
0 siblings, 4 replies; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-22 23:56 UTC (permalink / raw)
To: Carlos Maiolino, Christoph Hellwig
Cc: xfs, Chandan Babu R, linux-fsdevel, fstests
Hi everyone,
You might recall that 18 months ago I showed off an early draft of a
patchset implementing autonomous self healing capabilities for XFS.
The premise is quite simple -- add a few hooks to the kernel to capture
significant filesystem metadata and file health events (pretty much all
failures), queue these events to a special anonfd, and let userspace
read the events at its leisure. That's patchset 1.
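As a sketch of that consumption model: assuming, purely for illustration, that events arrive as newline-delimited JSON records with a "type" field (the real wire format is whatever the healthmon UAPI defines), a reader that drains the queue at its leisure might look like this:

```python
import json

def decode_events(buf: bytes) -> list[dict]:
    """Split one read() buffer into individual event dicts.

    The newline-delimited JSON framing here is an assumption for
    illustration only, not the actual healthmon wire format.
    """
    return [json.loads(line) for line in buf.splitlines() if line.strip()]

def drain(read_once) -> list[dict]:
    """Call read_once() until it reports no more data (empty bytes)."""
    events = []
    while buf := read_once():
        events.extend(decode_events(buf))
    return events
```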
The userspace part is more interesting, because there's a new daemon
that opens the anonfd given the root dir of a filesystem, captures a
file handle for the root dir, detaches from the root dir, and waits for
metadata events. Upon receipt of an adverse health event, it will
reopen the root directory and initiate repairs. I've left the prototype
Python script in place (patchset 2) but my ultimate goal is for everyone
to use the Rust version (patchset 3) because it's much quicker to
respond to problems.
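A rough sketch of that daemon lifecycle, in the spirit of the Python prototype: the ioctl request number below is a placeholder (the real one comes from the kernel headers), and the event dict shape is assumed for illustration.

```python
import fcntl
import os

def open_monitor(mountpoint: str) -> int:
    """Open the fs root dir, ask the kernel for the event anonfd, detach.

    XFS_IOC_HEALTH_MONITOR's request number here is a made-up placeholder;
    the real value is defined by the kernel UAPI headers.
    """
    XFS_IOC_HEALTH_MONITOR = 0x815B  # placeholder, not the real number
    rootfd = os.open(mountpoint, os.O_RDONLY | os.O_DIRECTORY)
    try:
        return fcntl.ioctl(rootfd, XFS_IOC_HEALTH_MONITOR)
    finally:
        os.close(rootfd)  # detach so the daemon does not pin the mount

def plan_response(event: dict) -> str:
    """Map one health event to the daemon's next action (illustrative)."""
    if event.get("type") == "unmount":
        return "exit"
    if event.get("type") == "shutdown":
        return "stop-repairs"
    if event.get("type") == "metadata":
        # reopen the root dir via the saved file handle, then scrub/repair
        return "repair:" + event.get("structure", "unknown")
    return "log"
```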
New QA tests are patchset 4. Zorro: No need to merge this right away.
This work was mostly complete by the end of 2024, and I've been letting
it run on my XFS QA testing fleets ever since then.  I am submitting
this patchset for upstream inclusion in 6.19.  Once this is merged, the online
fsck project will be complete.
--D
^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCHSET V2] xfs: autonomous self healing of filesystems
  2025-10-22 23:56 [PATCHBOMB 6.19] xfs: autonomous self healing Darrick J. Wong
@ 2025-10-22 23:59 ` Darrick J. Wong
  2025-10-23  0:00   ` [PATCH 01/19] docs: remove obsolete links in the xfs online repair documentation Darrick J. Wong
                     ` (18 more replies)
  2025-10-23  0:00   ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong
                     ` (2 subsequent siblings)
  3 siblings, 19 replies; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-22 23:59 UTC (permalink / raw)
  To: cem, djwong; +Cc: linux-fsdevel, linux-xfs

Hi all,

This patchset builds new functionality to deliver live information about
filesystem health events to userspace.  This is done by creating an
anonymous file that can be read() for events by userspace programs.

Events are captured by hooking various parts of XFS and iomap so that
metadata health failures, file I/O errors, and major changes in
filesystem state (unmounts, shutdowns, etc.) can be observed by
programs.  When an event occurs, the hook functions queue an event
object to each event anonfd for later processing.  Programs must have
CAP_SYS_ADMIN to open the anonfd and there's a maximum event lag to
prevent resource overconsumption.  The events themselves can be read()
from the anonfd either as json objects for human readability, or as C
structs for daemons.

In userspace, we create a new daemon program that will read the event
objects and initiate repairs automatically.  This daemon is managed
entirely by systemd and will not block unmounting of the filesystem
unless repairs are ongoing.  It is autostarted via some udev rules.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.  This has been running on
the djcloud for months with no problems.  Enjoy!

Comments and questions are, as always, welcome.
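The cover letter above says events can be read() either as json objects or as C structs. As an illustration of the C-struct path, a daemon might decode fixed-size records like this; the field names and the `<IIQQ` layout below are invented for the sketch (the real record layout lives in the XFS UAPI headers), so treat this as a decoding pattern rather than the actual ABI.

```python
import struct

# Hypothetical fixed-size event record: type, flags, timestamp, inode
# number.  The real layout is defined by the kernel UAPI; this "<IIQQ"
# format string is illustrative only.
EVENT_FMT = "<IIQQ"
EVENT_SIZE = struct.calcsize(EVENT_FMT)

def decode_event(buf: bytes, offset: int = 0) -> dict:
    """Unpack one packed event record starting at offset."""
    etype, flags, tstamp, ino = struct.unpack_from(EVENT_FMT, buf, offset)
    return {"type": etype, "flags": flags, "time": tstamp, "ino": ino}

def decode_stream(buf: bytes) -> list[dict]:
    """A single read() may return several packed records; decode them all."""
    return [decode_event(buf, off) for off in range(0, len(buf), EVENT_SIZE)]
```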
--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=health-monitoring
xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=health-monitoring
fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=health-monitoring
---
Commits in this patchset:
 * docs: remove obsolete links in the xfs online repair documentation
 * docs: discuss autonomous self healing in the xfs online repair design doc
 * xfs: create debugfs uuid aliases
 * xfs: create hooks for monitoring health updates
 * xfs: create a filesystem shutdown hook
 * xfs: create hooks for media errors
 * iomap: report buffered read and write io errors to the filesystem
 * iomap: report directio read and write errors to callers
 * xfs: create file io error hooks
 * xfs: create a special file to pass filesystem health to userspace
 * xfs: create event queuing, formatting, and discovery infrastructure
 * xfs: report metadata health events through healthmon
 * xfs: report shutdown events through healthmon
 * xfs: report media errors through healthmon
 * xfs: report file io errors through healthmon
 * xfs: allow reconfiguration of the health monitoring device
 * xfs: validate fds against running healthmon
 * xfs: add media error reporting ioctl
 * xfs: send uevents when major filesystem events happen
---
 fs/iomap/internal.h                                |    2 
 fs/xfs/libxfs/xfs_fs.h                             |  173 ++
 fs/xfs/libxfs/xfs_health.h                         |   52 +
 fs/xfs/xfs_file.h                                  |   36 
 fs/xfs/xfs_fsops.h                                 |   14 
 fs/xfs/xfs_healthmon.h                             |  107 +
 fs/xfs/xfs_linux.h                                 |    3 
 fs/xfs/xfs_mount.h                                 |   13 
 fs/xfs/xfs_notify_failure.h                        |   44 +
 fs/xfs/xfs_super.h                                 |   13 
 fs/xfs/xfs_trace.h                                 |  404 +++++
 include/linux/fs.h                                 |    4 
 include/linux/iomap.h                              |    2 
 Documentation/filesystems/vfs.rst                  |    7 
 .../filesystems/xfs/xfs-online-fsck-design.rst     |  336 +---
 fs/iomap/buffered-io.c                             |   27 
 fs/iomap/direct-io.c                               |    4 
 fs/iomap/ioend.c                                   |    4 
 fs/xfs/Kconfig                                     |    8 
 fs/xfs/Makefile                                    |    7 
 fs/xfs/libxfs/xfs_healthmon.schema.json            |  648 +++++++
 fs/xfs/xfs_aops.c                                  |    2 
 fs/xfs/xfs_file.c                                  |  167 ++
 fs/xfs/xfs_fsops.c                                 |   75 +
 fs/xfs/xfs_health.c                                |  269 +++
 fs/xfs/xfs_healthmon.c                             | 1741 ++++++++++++++++++++
 fs/xfs/xfs_ioctl.c                                 |    7 
 fs/xfs/xfs_notify_failure.c                        |  135 +-
 fs/xfs/xfs_super.c                                 |  109 +
 fs/xfs/xfs_trace.c                                 |    4 
 lib/seq_buf.c                                      |    1 
 31 files changed, 4173 insertions(+), 245 deletions(-)
 create mode 100644 fs/xfs/xfs_healthmon.h
 create mode 100644 fs/xfs/libxfs/xfs_healthmon.schema.json
 create mode 100644 fs/xfs/xfs_healthmon.c

^ permalink raw reply	[flat|nested] 80+ messages in thread
* [PATCH 01/19] docs: remove obsolete links in the xfs online repair documentation
  2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong
@ 2025-10-23  0:00   ` Darrick J. Wong
  2025-10-24  5:40     ` Christoph Hellwig
  2025-10-23  0:01   ` [PATCH 02/19] docs: discuss autonomous self healing in the xfs online repair design doc Darrick J. Wong
                     ` (17 subsequent siblings)
  18 siblings, 1 reply; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-23  0:00 UTC (permalink / raw)
  To: cem, djwong; +Cc: linux-fsdevel, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Online repair is now merged in upstream, no need to point to patchset
links anymore.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 .../filesystems/xfs/xfs-online-fsck-design.rst     |  236 +-------------------
 1 file changed, 6 insertions(+), 230 deletions(-)

diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 8cbcd3c2643430..189d1f5f40788d 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -105,10 +105,8 @@ occur; this capability aids both strategies.
 
 TLDR; Show Me the Code!
 -----------------------
 
-Code is posted to the kernel.org git trees as follows:
-`kernel changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-symlink>`_,
-`userspace changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_, and
-`QA test changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=repair-dirs>`_.
+Kernel and userspace code has been fully merged as of October 2025.
+
 Each kernel patchset adding an online repair function will use the same
 branch name across the kernel, xfsprogs, and fstests git repos.
@@ -764,12 +762,8 @@ allow the online fsck developers to compare online fsck against offline
 fsck, and they enable XFS developers to find deficiencies in the code base.
 
 Proposed patchsets include
-`general fuzzer improvements
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzzer-improvements>`_,
 `fuzzing baselines
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzz-baseline>`_,
-and `improvements in fuzz testing comprehensiveness
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=more-fuzz-testing>`_.
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzz-baseline>`_.
 
 Stress Testing
 --------------
 
@@ -801,11 +795,6 @@ Success is defined by the ability to run all of these tests without observing
 any unexpected filesystem shutdowns due to corrupted metadata, kernel hang
 check warnings, or any other sort of mischief.
 
-Proposed patchsets include `general stress testing
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes>`_
-and the `evolution of existing per-function stress testing
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_.
-
 4. User Interface
 =================
 
@@ -886,10 +875,6 @@ apply as nice of a priority to IO and CPU scheduling as possible.
 This measure was taken to minimize delays in the rest of the filesystem.
 No such hardening has been performed for the cron job.
 
-Proposed patchset:
-`Enabling the xfs_scrub background service
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_.
-
 Health Reporting
 ----------------
 
@@ -912,13 +897,6 @@ notifications and initiate a repair?
 
 *Answer*: These questions remain unanswered, but should be a part of the
 conversation with early adopters and potential downstream users of XFS.
-Proposed patchsets include
-`wiring up health reports to correction returns
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=corruption-health-reports>`_
-and
-`preservation of sickness info during memory reclaim
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=indirect-health-reporting>`_.
-
 5. Kernel Algorithms and Data Structures
 ========================================
 
@@ -1310,21 +1288,6 @@ Space allocation records are cross-referenced as follows:
    are there the same number of reverse mapping records for each block as the
    reference count record claims?
 
-Proposed patchsets are the series to find gaps in
-`refcount btree
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-refcount-gaps>`_,
-`inode btree
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-inobt-gaps>`_, and
-`rmap btree
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-rmapbt-gaps>`_ records;
-to find
-`mergeable records
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-mergeable-records>`_;
-and to
-`improve cross referencing with rmap
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-strengthen-rmap-checking>`_
-before starting a repair.
-
 Checking Extended Attributes
 ````````````````````````````
 
@@ -1756,10 +1719,6 @@ For scrub, the drain works as follows:
 To avoid polling in step 4, the drain provides a waitqueue for scrub threads
 to be woken up whenever the intent count drops to zero.
 
-The proposed patchset is the
-`scrub intent drain series
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-drain-intents>`_.
-
 ..
 _jump_labels:
 
 Static Keys (aka Jump Label Patching)
 
@@ -2036,10 +1995,6 @@ The ``xfarray_store_anywhere`` function is used to insert a record in any
 null record slot in the bag; and the ``xfarray_unset`` function removes a
 record from the bag.
 
-The proposed patchset is the
-`big in-memory array
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=big-array>`_.
-
 Iterating Array Elements
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -2172,10 +2127,6 @@ However, it should be noted that these repair functions only use blob storage
 to cache a small number of entries before adding them to a temporary ondisk
 file, which is why compaction is not required.
 
-The proposed patchset is at the start of the
-`extended attribute repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-xattrs>`_ series.
-
 .. _xfbtree:
 
 In-Memory B+Trees
 
@@ -2214,11 +2165,6 @@ xfiles enables reuse of the entire btree library.
 Btrees built atop an xfile are collectively known as ``xfbtrees``.
 The next few sections describe how they actually work.
 
-The proposed patchset is the
-`in-memory btree
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=in-memory-btrees>`_
-series.
-
 Using xfiles as a Buffer Cache Target
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -2459,14 +2405,6 @@ This enables the log to release the old EFI to keep the log moving forwards.
 EFIs have a role to play during the commit and reaping phases; please see the
 next section and the section about :ref:`reaping<reaping>` for more details.
 
-Proposed patchsets are the
-`bitmap rework
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-bitmap-rework>`_
-and the
-`preparation for bulk loading btrees
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-prep-for-bulk-loading>`_.
-
-
 Writing the New Tree
 ````````````````````
 
@@ -2623,11 +2561,6 @@ The number of records for the inode btree is the number of xfarray records,
 but the record count for the free inode btree has to be computed as inode chunk
 records are stored in the xfarray.
 
-The proposed patchset is the
-`AG btree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
-series.
-
 Case Study: Rebuilding the Space Reference Counts
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -2716,11 +2649,6 @@ Reverse mappings are added to the bag using ``xfarray_store_anywhere`` and
 removed via ``xfarray_unset``.
 Bag members are examined through ``xfarray_iter`` loops.
 
-The proposed patchset is the
-`AG btree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
-series.
-
 Case Study: Rebuilding File Fork Mapping Indices
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -2757,11 +2685,6 @@ EXTENTS format instead of BMBT, which may require a conversion.
 Third, the incore extent map must be reloaded carefully to avoid disturbing
 any delayed allocation extents.
 
-The proposed patchset is the
-`file mapping repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-file-mappings>`_
-series.
-
 .. _reaping:
 
 Reaping Old Metadata Blocks
 
@@ -2843,11 +2766,6 @@ blocks.
 As stated earlier, online repair functions use very large transactions to
 minimize the chances of this occurring.
 
-The proposed patchset is the
-`preparation for bulk loading btrees
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-prep-for-bulk-loading>`_
-series.
-
 Case Study: Reaping After a Regular Btree Repair
 ````````````````````````````````````````````````
 
@@ -2943,11 +2861,6 @@ When the walk is complete, the bitmap disunion operation ``(ag_owner_bitmap &
 btrees.
 These blocks can then be reaped using the methods outlined above.
-The proposed patchset is the
-`AG btree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
-series.
-
 .. _rmap_reap:
 
 Case Study: Reaping After Repairing Reverse Mapping Btrees
 
@@ -2972,11 +2885,6 @@ methods outlined above.
 The rest of the process of rebuilding the reverse mapping btree is discussed
 in a separate :ref:`case study<rmap_repair>`.
 
-The proposed patchset is the
-`AG btree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
-series.
-
 Case Study: Rebuilding the AGFL
 ```````````````````````````````
 
@@ -3024,11 +2932,6 @@ more complicated, because computing the correct value requires traversing the
 forks, or if that fails, leaving the fields invalid and waiting for the fork
 fsck functions to run.
 
-The proposed patchset is the
-`inode
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-inodes>`_
-repair series.
-
 Quota Record Repairs
 --------------------
 
@@ -3045,11 +2948,6 @@ checking are obviously bad limits and timer values.
 Quota usage counters are checked, repaired, and discussed separately in the
 section about :ref:`live quotacheck <quotacheck>`.
 
-The proposed patchset is the
-`quota
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quota>`_
-repair series.
-
 .. _fscounters:
 
 Freezing to Fix Summary Counters
 
@@ -3145,11 +3043,6 @@ long enough to check and correct the summary counters.
 | This bug was fixed in Linux 5.17.                                        |
 +--------------------------------------------------------------------------+
 
-The proposed patchset is the
-`summary counter cleanup
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-fscounters>`_
-series.
-
 Full Filesystem Scans
 ---------------------
 
@@ -3277,15 +3170,6 @@ Second, if the incore inode is stuck in some intermediate state, the scan
 coordinator must release the AGI and push the main filesystem to get the
 inode back into a loadable state.
 
-The proposed patches are the
-`inode scanner
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iscan>`_
-series.
-The first user of the new functionality is the
-`online quotacheck
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quotacheck>`_
-series.
-
 Inode Management
 ````````````````
 
@@ -3381,12 +3265,6 @@ To capture these nuances, the online fsck code has a separate ``xchk_irele``
 function to set or clear the ``DONTCACHE`` flag to get the required release
 behavior.
 
-Proposed patchsets include fixing
-`scrub iget usage
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iget-fixes>`_ and
-`dir iget usage
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-dir-iget-fixes>`_.
-
 .. _ilocking:
 
 Locking Inodes
 
@@ -3443,11 +3321,6 @@ If the dotdot entry changes while the directory is unlocked, then a move or
 rename operation must have changed the child's parentage, and the scan can
 exit early.
 
-The proposed patchset is the
-`directory repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-dirs>`_
-series.
-
 .. _fshooks:
 
 Filesystem Hooks
 
@@ -3594,11 +3467,6 @@ The inode scan APIs are pretty simple:
 
 - ``xchk_iscan_teardown`` to finish the scan
 
-This functionality is also a part of the
-`inode scanner
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iscan>`_
-series.
-
 .. _quotacheck:
 
 Case Study: Quota Counter Checking
 
@@ -3686,11 +3554,6 @@ needing to hold any locks for a long duration.
 If repairs are desired, the real and shadow dquots are locked and their
 resource counts are set to the values in the shadow dquot.
-The proposed patchset is the
-`online quotacheck
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quotacheck>`_
-series.
-
 .. _nlinks:
 
 Case Study: File Link Count Checking
 
@@ -3744,11 +3607,6 @@ shadow information.
 If no parents are found, the file must be :ref:`reparented <orphanage>` to
 the orphanage to prevent the file from being lost forever.
 
-The proposed patchset is the
-`file link count repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-nlinks>`_
-series.
-
 .. _rmap_repair:
 
 Case Study: Rebuilding Reverse Mapping Records
 
@@ -3828,11 +3686,6 @@ scan for reverse mapping records.
 
 12. Free the xfbtree now that it is not needed.
 
-The proposed patchset is the
-`rmap repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-rmap-btree>`_
-series.
-
 Staging Repairs with Temporary Files on Disk
 --------------------------------------------
 
@@ -3971,11 +3824,6 @@ Once a good copy of a data file has been constructed in a temporary file, it
 must be conveyed to the file being repaired, which is the topic of the next
 section.
 
-The proposed patches are in the
-`repair temporary files
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-tempfiles>`_
-series.
-
 Logged File Content Exchanges
 -----------------------------
 
@@ -4025,11 +3873,6 @@ The new ``XFS_SB_FEAT_INCOMPAT_EXCHRANGE`` incompatible feature flag in the
 superblock protects these new log item records from being replayed on old
 kernels.
 
-The proposed patchset is the
-`file contents exchange
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates>`_
-series.
-
 +--------------------------------------------------------------------------+
 | **Sidebar: Using Log-Incompatible Feature Flags**                        |
 +--------------------------------------------------------------------------+
 
@@ -4323,11 +4166,6 @@ To repair the summary file, write the xfile contents into the temporary file
 and use atomic mapping exchange to commit the new contents.
 The temporary file is then reaped.
 
-The proposed patchset is the
-`realtime summary repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-rtsummary>`_
-series.
-
 Case Study: Salvaging Extended Attributes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -4369,11 +4207,6 @@ Salvaging extended attributes is done as follows:
 
 4. Reap the temporary file.
 
-The proposed patchset is the
-`extended attribute repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-xattrs>`_
-series.
-
 Fixing Directories
 ------------------
 
@@ -4448,11 +4281,6 @@ Unfortunately, the current dentry cache design doesn't provide a means to walk
 every child dentry of a specific directory, which makes this a hard problem.
 There is no known solution.
 
-The proposed patchset is the
-`directory repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-dirs>`_
-series.
-
 Parent Pointers
 ```````````````
 
@@ -4612,11 +4440,6 @@ a :ref:`directory entry live update hook <liveupdate>` as follows:
 
 7. Reap the temporary directory.
 
-The proposed patchset is the
-`parent pointers directory repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-fsck>`_
-series.
-
 Case Study: Repairing Parent Pointers
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -4662,11 +4485,6 @@ directory reconstruction:
 
 8. Reap the temporary file.
 
-The proposed patchset is the
-`parent pointers repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-fsck>`_
-series.
-
 Digression: Offline Checking of Parent Pointers
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -4755,11 +4573,6 @@ connectivity checks:
 
 4. Move on to examining link counts, as we do today.
 
-The proposed patchset is the
-`offline parent pointers repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-fsck>`_
-series.
-
 Rebuilding directories from parent pointers in offline repair would be very
 challenging because xfs_repair currently uses two single-pass scans of the
 filesystem during phases 3 and 4 to decide which files are corrupt enough to be
 
@@ -4903,12 +4716,6 @@ Repairing the directory tree works as follows:
 
 6. If the subdirectory has zero paths, attach it to the lost and found.
 
-The proposed patches are in the
-`directory tree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-directory-tree>`_
-series.
-
-
 .. _orphanage:
 
 The Orphanage
 
@@ -4973,11 +4780,6 @@ Orphaned files are adopted by the orphanage as follows:
 
 7. If a runtime error happens, call ``xrep_adoption_cancel`` to release all
    resources.
 
-The proposed patches are in the
-`orphanage adoption
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-orphanage>`_
-series.
-
 6. Userspace Algorithms and Data Structures
 ===========================================
 
@@ -5091,14 +4893,6 @@ first workqueue's workers until the backlog eases.
 This doesn't completely solve the balancing problem, but reduces it enough to
 move on to more pressing issues.
 
-The proposed patchsets are the scrub
-`performance tweaks
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-performance-tweaks>`_
-and the
-`inode scan rebalance
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-iscan-rebalance>`_
-series.
-
 .. _scrubrepair:
 
 Scheduling Repairs
 
@@ -5179,20 +4973,6 @@ immediately.
 Corrupt file data blocks reported by phase 6 cannot be recovered by the
 filesystem.
 
-The proposed patchsets are the
-`repair warning improvements
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-better-repair-warnings>`_,
-refactoring of the
-`repair data dependency
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-data-deps>`_
-and
-`object tracking
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-object-tracking>`_,
-and the
-`repair scheduling
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-scheduling>`_
-improvement series.
-
 Checking Names for Confusable Unicode Sequences
 -----------------------------------------------
 
@@ -5372,6 +5152,8 @@ The extra flexibility enables several new use cases:
    This emulates an atomic device write in software, and can support arbitrary
    scattered writes.
 
+(This functionality was merged into mainline as of 2025)
+
 Vectorized Scrub
 ----------------
 
@@ -5393,13 +5175,7 @@ It is hoped that ``io_uring`` will pick up enough of this functionality that
 online fsck can use that instead of adding a separate vectored scrub system
 call to XFS.
 
-The relevant patchsets are the
-`kernel vectorized scrub
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=vectorized-scrub>`_
-and
-`userspace vectorized scrub
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=vectorized-scrub>`_
-series.
+(This functionality was merged into mainline as of 2025)
 
 Quality of Service Targets for Scrub
 ------------------------------------

^ permalink raw reply related	[flat|nested] 80+ messages in thread
* Re: [PATCH 01/19] docs: remove obsolete links in the xfs online repair documentation
  2025-10-23  0:00 ` [PATCH 01/19] docs: remove obsolete links in the xfs online repair documentation Darrick J. Wong
@ 2025-10-24  5:40   ` Christoph Hellwig
  2025-10-27 16:15     ` Darrick J. Wong
  0 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2025-10-24  5:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-fsdevel, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Maybe expedite this for 6.18-rc?

^ permalink raw reply	[flat|nested] 80+ messages in thread
* Re: [PATCH 01/19] docs: remove obsolete links in the xfs online repair documentation
  2025-10-24  5:40 ` Christoph Hellwig
@ 2025-10-27 16:15   ` Darrick J. Wong
  0 siblings, 0 replies; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-27 16:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-fsdevel, linux-xfs

On Thu, Oct 23, 2025 at 10:40:24PM -0700, Christoph Hellwig wrote:
> Looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Maybe expedite this for 6.18-rc?

Ok.  I guess removing obsolete links is a bug fix :)

--D

^ permalink raw reply	[flat|nested] 80+ messages in thread
* [PATCH 02/19] docs: discuss autonomous self healing in the xfs online repair design doc
  2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong
  2025-10-23  0:00 ` [PATCH 01/19] docs: remove obsolete links in the xfs online repair documentation Darrick J. Wong
@ 2025-10-23  0:01   ` Darrick J. Wong
  2025-10-30 16:38     ` Darrick J. Wong
  2025-10-23  0:01   ` [PATCH 03/19] xfs: create debugfs uuid aliases Darrick J. Wong
                     ` (16 subsequent siblings)
  18 siblings, 1 reply; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-23  0:01 UTC (permalink / raw)
  To: cem, djwong; +Cc: linux-fsdevel, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Update the XFS online repair document to describe the motivation and
design of the autonomous filesystem healing agent known as xfs_healer.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 .../filesystems/xfs/xfs-online-fsck-design.rst     |  102 ++++++++++++++++++++
 1 file changed, 100 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 189d1f5f40788d..bdbf338a9c9f0c 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -166,9 +166,12 @@ The current XFS tools leave several problems unsolved:
    malicious actors **exploit quirks of Unicode** to place misleading names
    in directories.
 
+8. **Site Reliability and Support Engineers** would like to reduce the
+   frequency of incidents requiring **manual intervention**.
+
 Given this definition of the problems to be solved and the actors who would
 benefit, the proposed solution is a third fsck tool that acts on a running
-filesystem.
+filesystem, and an autonomous agent that fixes problems as they arise.
 
 This new third program has three components: an in-kernel facility to check
 metadata, an in-kernel facility to repair metadata, and a userspace driver
 
@@ -203,6 +206,13 @@ Even if a piece of filesystem metadata can only be regenerated by scanning the
 entire system, the scan can still be done in the background while other file
 operations continue.
 
+The autonomous self healing agent should listen for metadata health impact
+reports coming from the kernel and automatically schedule repairs for the
+damaged metadata.
+If the required repairs are larger in scope than a single metadata structure,
+``xfs_scrub`` should be invoked to perform a full analysis.
+``xfs_healer`` is the name of this program.
+
 In summary, online fsck takes advantage of resource sharding and redundant
 metadata to enable targeted checking and repair operations while the
 system is running.
 
@@ -850,11 +860,16 @@ variable in the following service files:
 
 * ``xfs_scrub_all_fail.service``
 
 The decision to enable the background scan is left to the system administrator.
-This can be done by enabling either of the following services:
+This can be done system-wide by enabling either of the following services:
 
 * ``xfs_scrub_all.timer`` on systemd systems
 * ``xfs_scrub_all.cron`` on non-systemd systems
 
+To enable online repair for specific filesystems, the ``autofsck``
+filesystem property should be set to ``repair``.
+To enable only scanning, the property should be set to ``check``.
+To disable online fsck entirely, the property should be set to ``none``.
+
 This automatic weekly scan is configured out of the box to perform an
 additional media scan of all file data once per month.
 This is less foolproof than, say, storing file data block checksums, but much
 
@@ -897,6 +912,36 @@ notifications and initiate a repair?
 
 *Answer*: These questions remain unanswered, but should be a part of the
 conversation with early adopters and potential downstream users of XFS.
+Autonomous Self Healing
+-----------------------
+
+The autonomous self healing agent is a background system service that starts
+when the filesystem is mounted and runs until unmount.
+When starting up, the agent opens a special pseudofile under the specific
+mount.
+When the filesystem generates new adverse health events, the events will be
+made available for reading via the special pseudofile.
+The events need not be limited to metadata concerns; they can also reflect
+events outside of the filesystem's direct control such as file I/O errors.
+
+The agent reads these events in a loop and responds to the events
+appropriately.
+For a single trouble report about metadata, the agent initiates a targeted
+repair of the specific structure.
+If that repair fails or the agent observes too many metadata trouble reports
+over a short interval, it should then initiate a full scan of the filesystem
+via the ``xfs_scrub`` service.
+
+The decision to enable the background scan is left to the system administrator.
+This can be done system-wide by enabling the following services:
+
+* ``xfs_healer@.service`` on systemd systems
+
+To enable autonomous healing for specific filesystems, the ``autofsck``
+filesystem property should be set to ``repair``.
+To disable self healing, the property should be set to ``check``,
+``optimize``, or ``none``.
+
 5. Kernel Algorithms and Data Structures
 ========================================
 
@@ -5071,6 +5116,59 @@ and report what has been lost.
 For media errors in blocks owned by files, parent pointers can be used to
 construct file paths from inode numbers for user-friendly reporting.
 
+Autonomous Self Healing
+-----------------------
+
+When a filesystem mounts, the Linux kernel initiates a uevent describing the
+mount and the path to the data device.
+A udev rule determines the initial mountpoint from the data device path
+and starts a mount-specific ``xfs_healer`` service instance.
+The ``xfs_healer`` service opens the mountpoint and issues the
+XFS_IOC_HEALTH_MONITOR ioctl to open a special health monitoring file.
+After that is set up, the mountpoint is closed to avoid pinning the mount.
+
+The health monitoring file hooks certain points of the filesystem so that it
+may receive events about metadata health, filesystem shutdowns, media errors,
+file I/O errors, and unmounting of the filesystem.
+Events are queued up for each health monitor file and encoded into a
+``struct xfs_health_monitor_event`` object when the agent calls ``read()`` on
+the file.
+All health events are dispatched to a background threadpool to reduce stalls
+in the main event loop.
+Events can be logged into the system log for further analysis.
+
+For metadata health events, the specific details are used to construct a call
+to the scrub ioctl.
+The filesystem mountpoint is reopened, and the kernel is called.
+If events are lost or the repairs fail, a full scan will be initiated by
+starting up an ``xfs_scrub@.service`` for the given mountpoint.
+
+A filesystem shutdown causes all future repair work to cease, and an unmount
+causes the agent to exit.
+
+**Question**: Why use a pseudofile and not use existing notification methods?
+
+*Answer*: The pseudofile is a private filesystem interface only available to
+processes with the CAP_SYS_ADMIN privilege.
+Being private gives the kernel and ``xfs_healer`` the flexibility to change
+or update the event format in the future without worrying about backwards
+compatibility.
+Using existing notifications means that the event format would be frozen in
+public UAPI forever.
+
+The pseudofile can also accept ioctls, which gives ``xfs_healer`` a solid
+means to validate that prior to a repair, its reopened mountpoint is actually
+the same filesystem that is being monitored.
+
+**Future Work Question**: Should the healer daemon also register a dbus
+listener and publish events there?
+ +*Answer*: This is unclear -- if there's a demand for system monitoring daemons +to consume this information and make decisions, then yes, this could be wired +up in ``xfs_healer``. +On the other hand, systemd is in the middle of a transition to varlink, so +it makes more sense to wait and see what happens. + 7. Conclusion and Future Work ============================= ^ permalink raw reply related [flat|nested] 80+ messages in thread
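The escalation rule described in the patch above (a targeted repair per trouble report, falling back to a full ``xfs_scrub`` run when too many reports arrive in a short interval) can be modeled with a small sliding-window counter. This is an illustrative sketch only, not ``xfs_healer``'s actual policy; the window size and threshold are made-up tuning knobs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative escalation policy: keep the timestamps of the most recent
 * trouble reports in a small ring buffer.  Once WINDOW_EVENTS reports
 * have arrived within WINDOW_SECS of each other, recommend a full
 * filesystem scan instead of another targeted repair.  Both thresholds
 * are hypothetical, not xfs_healer's actual defaults.
 */
#define WINDOW_EVENTS	8
#define WINDOW_SECS	60

struct escalation {
	long	stamps[WINDOW_EVENTS];	/* report times, in seconds */
	size_t	head;			/* next slot to overwrite */
	size_t	count;			/* valid entries so far */
};

/* Record one trouble report; returns true if a full scan is warranted. */
static bool
escalation_record(struct escalation *e, long now)
{
	e->stamps[e->head] = now;
	e->head = (e->head + 1) % WINDOW_EVENTS;
	if (e->count < WINDOW_EVENTS) {
		e->count++;
		if (e->count < WINDOW_EVENTS)
			return false;
	}

	/* The slot at e->head now holds the oldest report in the window. */
	return now - e->stamps[e->head] <= WINDOW_SECS;
}
```

The same structure could just as easily be rate-based (reports per second); a fixed-size ring keeps memory use bounded regardless of how noisy the filesystem gets.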
* Re: [PATCH 02/19] docs: discuss autonomous self healing in the xfs online repair design doc 2025-10-23 0:01 ` [PATCH 02/19] docs: discuss autonomous self healing in the xfs online repair design doc Darrick J. Wong @ 2025-10-30 16:38 ` Darrick J. Wong 0 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-30 16:38 UTC (permalink / raw) To: cem; +Cc: linux-fsdevel, linux-xfs On Wed, Oct 22, 2025 at 05:01:07PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > Update the XFS online repair document to describe the motivation and > design of the autonomous filesystem healing agent known as xfs_healer. > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> /me decides (or rather it was pointed out to me) that there's a kernel component to xfs_healer, but no explicit discussion of it in section 5 ("Kernel Algorithms and Data Structures"). Also given the frequency of the question "why not reuse fsnotify?" I'll address the reasons for that here. I've added the following text, which will appear in the next revision: 5. Kernel Algorithms and Data Structures ======================================== <snip> +Health Monitoring +----------------- + +A self-correcting filesystem responds to observations of problems by scheduling +repairs of the affected areas. +The filesystem must therefore create event objects in response to stimuli +(metadata corruption, file I/O errors, etc.) and dispatch these events to +downstream consumers. +Downstream consumers that are in the kernel itself are easy to implement with +the ``xfs_hooks`` infrastructure created for other parts of online repair; these +are basically indirect function calls. + +However, the decision to translate an adverse metadata health report into a +repair should be made by userspace, and the actual scheduling done by userspace. +Some users (e.g. containers) would prefer to fast-fail the container and restart +it on another node at a previous checkpoint. 
+For workloads running in isolation, repairs may be preferable; either way this
+is something the system administrator knows, and not the kernel.
+A userspace agent (``xfs_healer``, described later) will collect events from the
+kernel and dispatch them appropriately.
+
+Exporting health events to userspace requires the creation of a new component,
+known as the health monitor.
+Because the monitor exposes itself to userspace to deliver information, a file
+descriptor is the natural abstraction to use here.
+The health monitor hooks all the relevant sources of metadata health events.
+Upon activation of the hook, a new event object is created and added to a queue.
+When the agent reads from the fd, event objects are pulled from the start of the
+queue and formatted into the user's buffer.
+The events are freed, and the read call returns to userspace to allow the agent
+to perform some work.
+Memory usage is constrained on a per-fd basis to prevent memory exhaustion; if
+an event must be discarded, a special "lost event" event is delivered to the
+agent.
+
+In short, health events are captured, queued, and eventually copied out to
+userspace for dispatching.
+
+**Question**: Why use a pseudofile and not use existing notification methods?
+
+*Answer*: The pseudofile is a private filesystem interface only available to
+processes with the CAP_SYS_ADMIN privilege and the ability to open the root
+directory.
+Being private gives the kernel and ``xfs_healer`` the flexibility to change
+or update the event format in the future without worrying about backwards
+compatibility.
+Using existing notifications means that the event format would be frozen in
+the public fsnotify UAPI forever, which would affect two subsystems.
+
+The pseudofile can also accept ioctls, which gives ``xfs_healer`` a solid
+means to validate that prior to a repair, its reopened mountpoint is actually
+the same filesystem that is being monitored.
+
+**Question**: Why not reuse fs/notify?
+
+*Answer*: It's much simpler for the healthmon code to manage its own queue of
+events and to wake up readers instead of reusing fsnotify because that's the
+only part of fsnotify that healthmon would use.
+
+Before I get started, an introduction: fsnotify expects its users (e.g.
+fanotify) to implement quite a bit of functionality; all it provides is a
+wrapper around a simple queue and a lot of code to convey information about the
+calling process to that user.
+fanotify has to actually implement all the queue management code on its own,
+and so would healthmon.
+
+So if healthmon used fsnotify, it would have to create its own fsnotify group
+structure.
+For our purposes, the group is a very large wrapper around a linked list, some
+counters, and a mutex.
+The group object is critical for ensuring that healthmon sees only its own
+events, and that nobody else (e.g. regular fanotify) ever sees these events.
+There's a lot more in there for controlling whether fanotify reports pids,
+groups, file handles, etc. that healthmon doesn't care about.
+
+Starting from the fsnotify() function call:
+
+ - I /think/ we'd have to define a new "data type", which itself is just a plain
+   int but I think they correspond to FSNOTIFY_EVENT_* values which themselves
+   are actually part of an enum.
+   The data type controls the typecasting options for the ``void *data``
+   parameter, which I guess is how I'd pass the healthmon event info from the
+   hooks into the fsnotify mechanism and back out to the healthmon code.
+
+ - Each filesystem that wants to do this probably has to add their own
+   FSNOTIFY_EVENT_{XFS,BTRFS,BFS} data type value because that's a casting
+   decision that's made inside the main fsnotify code.
+   I think this can be avoided if each fs is careful never to leak events
+   outside of the group.
+   Either way, it's harder to follow the data flows here because fsnotify can
+   only take and pass around ``void *`` pointers, and it makes various indirect
+   function calls to manage events.
+ Contrast this with doing everything with typed pointers and direct calls + within ``xfs_healthmon.c``. + + - Since healthmon is both producer and consumer of fsnotify events, we can + probably define our own "mask" value. + It's a relief that we don't have to interact with fanotify, because fanotify + has used up 22 of its 32 mask bits. + +Once healthmon gets an event into fsnotify, fsnotify will call back (into +healthmon!) to tell it that it got an event. +From there, the fsnotify implementation (healthmon) has to allocate an event +object and add it to the event queue in the group, which is what it already does +now. +Overflow control is up to the fsnotify implementation, which healthmon already +implements. + +After the event is queued, the fsnotify implementation also has to implement its +own read file op to dequeue an event and copy it to the userspace buffer in +whatever format it likes. +Again, healthmon already does all this. + +In the end, replacing the homegrown event dispatching in healthmon with fsnotify +would make the data flows much harder to understand, and all we gain is a +generic event dispatcher that relies on indirect function calls instead of +direct ones. +We still have to implement the queuing discipline ourselves! :( + +**Future Work Question**: Should these events be exposed through the fanotify +filesystem error event interface? + +*Answer*: Yes. +fanotify is much more careful about filtering out events to processes that +aren't running with privileges. +These processes should have a means to receive simple notifications about +file errors. +However, this will require coordination between fanotify, ext4, and XFS, and +is (for now) outside the scope of this project. 
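The queuing discipline that healthmon implements itself (a bounded per-fd FIFO that synthesizes a "lost event" record on overflow) can be sketched in plain C. This toy model is illustrative only: the real kernel queue is linked-list based and byte-budgeted per fd, the event codes here are invented, and whether the loss marker is delivered before or after the already-queued events is a design detail not specified in this discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes; the kernel's real events carry much more data. */
enum evt { EVT_NONE, EVT_LOST, EVT_SICK };

#define QCAP	4	/* tiny capacity to make overflow easy to show */

struct evt_queue {
	enum evt	ring[QCAP];
	size_t		head, tail, count;
	bool		lost;		/* an overflow happened */
};

/* Queue one event; on overflow, drop it and remember the loss. */
static bool
evtq_push(struct evt_queue *q, enum evt e)
{
	if (q->count == QCAP) {
		q->lost = true;
		return false;
	}
	q->ring[q->tail] = e;
	q->tail = (q->tail + 1) % QCAP;
	q->count++;
	return true;
}

/* Dequeue one event; report any loss first so the reader can escalate. */
static enum evt
evtq_pop(struct evt_queue *q)
{
	enum evt e;

	if (q->lost) {
		q->lost = false;
		return EVT_LOST;
	}
	if (q->count == 0)
		return EVT_NONE;
	e = q->ring[q->head];
	q->head = (q->head + 1) % QCAP;
	q->count--;
	return e;
}
```

A reader that sees EVT_LOST knows its picture of the filesystem is incomplete, which is exactly the condition under which the agent falls back to a full scan.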
--D

> ---
>  .../filesystems/xfs/xfs-online-fsck-design.rst | 102 ++++++++++++++++++++
>  1 file changed, 100 insertions(+), 2 deletions(-)
>
>
> diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
> index 189d1f5f40788d..bdbf338a9c9f0c 100644
> --- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
> +++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
> @@ -166,9 +166,12 @@ The current XFS tools leave several problems unsolved:
>     malicious actors **exploit quirks of Unicode** to place misleading names
>     in directories.
>
> +8. **Site Reliability and Support Engineers** would like to reduce the
> +   frequency of incidents requiring **manual intervention**.
> +
>  Given this definition of the problems to be solved and the actors who would
>  benefit, the proposed solution is a third fsck tool that acts on a running
> -filesystem.
> +filesystem, and an autonomous agent that fixes problems as they arise.
>
>  This new third program has three components: an in-kernel facility to check
>  metadata, an in-kernel facility to repair metadata, and a userspace driver
> @@ -203,6 +206,13 @@ Even if a piece of filesystem metadata can only be regenerated by scanning the
>  entire system, the scan can still be done in the background while other file
>  operations continue.
>
> +The autonomous self healing agent should listen for metadata health impact
> +reports coming from the kernel and automatically schedule repairs for the
> +damaged metadata.
> +If the required repairs are larger in scope than a single metadata structure,
> +``xfs_scrub`` should be invoked to perform a full analysis.
> +``xfs_healer`` is the name of this program.
> +
>  In summary, online fsck takes advantage of resource sharding and redundant
>  metadata to enable targeted checking and repair operations while the system
>  is running.
> @@ -850,11 +860,16 @@ variable in the following service files: > * ``xfs_scrub_all_fail.service`` > > The decision to enable the background scan is left to the system administrator. > -This can be done by enabling either of the following services: > +This can be done system-wide by enabling either of the following services: > > * ``xfs_scrub_all.timer`` on systemd systems > * ``xfs_scrub_all.cron`` on non-systemd systems > > +To enable online repair for specific filesystems, the ``autofsck`` > +filesystem property should be set to ``repair``. > +To enable only scanning, the property should be set to ``check``. > +To disable online fsck entirely, the property should be set to ``none``. > + > This automatic weekly scan is configured out of the box to perform an > additional media scan of all file data once per month. > This is less foolproof than, say, storing file data block checksums, but much > @@ -897,6 +912,36 @@ notifications and initiate a repair? > *Answer*: These questions remain unanswered, but should be a part of the > conversation with early adopters and potential downstream users of XFS. > > +Autonomous Self Healing > +----------------------- > + > +The autonomous self healing agent is a background system service that starts > +when the filesystem is mounted and runs until unmount. > +When starting up, the agent opens a special pseudofile under the specific > +mount. > +When the filesystem generates new adverse health events, the events will be > +made available for reading via the special pseudofile. > +The events need not be limited to metadata concerns; they can also reflect > +events outside of the filesystem's direct control such as file I/O errors. > + > +The agent reads these events in a loop and responds to the events > +appropriately. > +For a single trouble report about metadata, the agent initiates a targeted > +repair of the specific structure. 
> +If that repair fails or the agent observes too many metadata trouble reports > +over a short interval, it should then initiate a full scan of the filesystem > +via the ``xfs_scrub`` service. > + > +The decision to enable the background scan is left to the system administrator. > +This can be done system-wide by enabling the following services: > + > +* ``xfs_healer@.service`` on systemd systems > + > +To enable autonomous healing for specific filesystems, the ``autofsck`` > +filesystem property should be set to ``repair``. > +To disable self healing, the property should be set to ``check``, > +``optimize``, or ``none``. > + > 5. Kernel Algorithms and Data Structures > ======================================== > > @@ -5071,6 +5116,59 @@ and report what has been lost. > For media errors in blocks owned by files, parent pointers can be used to > construct file paths from inode numbers for user-friendly reporting. > > +Autonomous Self Healing > +----------------------- > + > +When a filesystem mounts, the Linux kernel initiates a uevent describing the > +mount and the path to the data device. > +A udev rule determines the initial mountpoint from the data device path > +and starts a mount-specific ``xfs_healer`` service instance. > +The ``xfs_healer`` service opens the mountpoint and issues the > +XFS_IOC_HEALTH_MONITOR ioctl to open a special health monitoring file. > +After that is set up, the mountpoint is closed to avoid pinning the mount. > + > +The health monitoring file hooks certain points of the filesystem so that it > +may receive events about metadata health, filesystem shutdowns, media errors, > +file I/O errors, and unmounting of the filesystem. > +Events are queued up for each health monitor file and encoded into a > +``struct xfs_health_monitor_event`` object when the agent calls ``read()`` on > +the file. > +All health events are dispatched to a background threadpool to reduce stalls > +in the main event loop. 
> +Events can be logged into the system log for further analysis.
> +
> +For metadata health events, the specific details are used to construct a call
> +to the scrub ioctl.
> +The filesystem mountpoint is reopened, and the kernel is called.
> +If events are lost or the repairs fail, a full scan will be initiated by
> +starting up an ``xfs_scrub@.service`` for the given mountpoint.
> +
> +A filesystem shutdown causes all future repair work to cease, and an unmount
> +causes the agent to exit.
> +
> +**Question**: Why use a pseudofile and not use existing notification methods?
> +
> +*Answer*: The pseudofile is a private filesystem interface only available to
> +processes with the CAP_SYS_ADMIN privilege.
> +Being private gives the kernel and ``xfs_healer`` the flexibility to change
> +or update the event format in the future without worrying about backwards
> +compatibility.
> +Using existing notifications means that the event format would be frozen in
> +public UAPI forever.
> +
> +The pseudofile can also accept ioctls, which gives ``xfs_healer`` a solid
> +means to validate that prior to a repair, its reopened mountpoint is actually
> +the same filesystem that is being monitored.
> +
> +**Future Work Question**: Should the healer daemon also register a dbus
> +listener and publish events there?
> +
> +*Answer*: This is unclear -- if there's a demand for system monitoring daemons
> +to consume this information and make decisions, then yes, this could be wired
> +up in ``xfs_healer``.
> +On the other hand, systemd is in the middle of a transition to varlink, so
> +it makes more sense to wait and see what happens.
> +
>  7. Conclusion and Future Work
>  =============================
>
>
^ permalink raw reply	[flat|nested] 80+ messages in thread
* [PATCH 03/19] xfs: create debugfs uuid aliases 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong 2025-10-23 0:00 ` [PATCH 01/19] docs: remove obsolete links in the xfs online repair documentation Darrick J. Wong 2025-10-23 0:01 ` [PATCH 02/19] docs: discuss autonomous self healing in the xfs online repair design doc Darrick J. Wong @ 2025-10-23 0:01 ` Darrick J. Wong 2025-10-23 0:01 ` [PATCH 04/19] xfs: create hooks for monitoring health updates Darrick J. Wong ` (15 subsequent siblings) 18 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:01 UTC (permalink / raw) To: cem, djwong; +Cc: linux-fsdevel, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create an alias for the debugfs dir so that we can find a filesystem by uuid. Unless it's mounted nouuid. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- fs/xfs/xfs_mount.h | 1 + fs/xfs/xfs_super.c | 11 +++++++++++ 2 files changed, 12 insertions(+) diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index f046d1215b043c..8643d539bc4869 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -290,6 +290,7 @@ typedef struct xfs_mount { struct delayed_work m_reclaim_work; /* background inode reclaim */ struct xfs_zone_info *m_zone_info; /* zone allocator information */ struct dentry *m_debugfs; /* debugfs parent */ + struct dentry *m_debugfs_uuid; /* debugfs symlink */ struct xfs_kobj m_kobj; struct xfs_kobj m_error_kobj; struct xfs_kobj m_error_meta_kobj; diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index d8f326d8838036..abe229fa5aa4b6 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -813,6 +813,7 @@ xfs_mount_free( if (mp->m_ddev_targp) xfs_free_buftarg(mp->m_ddev_targp); + debugfs_remove(mp->m_debugfs_uuid); debugfs_remove(mp->m_debugfs); kfree(mp->m_rtname); kfree(mp->m_logname); @@ -1963,6 +1964,16 @@ xfs_fs_fill_super( goto out_unmount; } + if (xfs_debugfs && mp->m_debugfs && !xfs_has_nouuid(mp)) { + char 
name[UUID_STRING_LEN + 1]; + + snprintf(name, UUID_STRING_LEN + 1, "%pU", &mp->m_sb.sb_uuid); + mp->m_debugfs_uuid = debugfs_create_symlink(name, xfs_debugfs, + mp->m_super->s_id); + } else { + mp->m_debugfs_uuid = NULL; + } + return 0; out_filestream_unmount: ^ permalink raw reply related [flat|nested] 80+ messages in thread
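The symlink created in the patch above is named after the filesystem UUID, rendered by the kernel's ``%pU`` format specifier into a 36-character (``UUID_STRING_LEN``) string. A userspace sketch of the same canonical 8-4-4-4-12 rendering, assuming the default lowercase big-endian ``%pU`` byte order, for tools that want to compute the debugfs alias name themselves:

```c
#include <stdio.h>

#define UUID_STRING_LEN	36	/* matches the kernel's definition */

/*
 * Render a 16-byte UUID in the canonical 8-4-4-4-12 form, the same
 * layout the kernel's %pU specifier emits for the symlink name.
 */
static void
uuid_unparse_buf(const unsigned char u[16], char out[UUID_STRING_LEN + 1])
{
	snprintf(out, UUID_STRING_LEN + 1,
		 "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
		 "%02x%02x%02x%02x%02x%02x",
		 u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
		 u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}
```

Note that this naming scheme is exactly why the patch skips the symlink for ``nouuid`` mounts: two mounts sharing a UUID would collide on the alias.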
* [PATCH 04/19] xfs: create hooks for monitoring health updates 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong ` (2 preceding siblings ...) 2025-10-23 0:01 ` [PATCH 03/19] xfs: create debugfs uuid aliases Darrick J. Wong @ 2025-10-23 0:01 ` Darrick J. Wong 2025-10-23 0:01 ` [PATCH 05/19] xfs: create a filesystem shutdown hook Darrick J. Wong ` (14 subsequent siblings) 18 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:01 UTC (permalink / raw) To: cem, djwong; +Cc: linux-fsdevel, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create hooks for monitoring health events. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- fs/xfs/libxfs/xfs_health.h | 47 ++++++++++ fs/xfs/xfs_mount.h | 3 + fs/xfs/xfs_health.c | 202 ++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_super.c | 1 4 files changed, 252 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h index b31000f7190ce5..39fef33dedc6a8 100644 --- a/fs/xfs/libxfs/xfs_health.h +++ b/fs/xfs/libxfs/xfs_health.h @@ -289,4 +289,51 @@ void xfs_bulkstat_health(struct xfs_inode *ip, struct xfs_bulkstat *bs); #define xfs_metadata_is_sick(error) \ (unlikely((error) == -EFSCORRUPTED || (error) == -EFSBADCRC)) +/* + * Parameters for tracking health updates. The enum below is passed as the + * hook function argument. + */ +enum xfs_health_update_type { + XFS_HEALTHUP_SICK = 1, /* runtime corruption observed */ + XFS_HEALTHUP_CORRUPT, /* fsck reported corruption */ + XFS_HEALTHUP_HEALTHY, /* fsck reported healthy structure */ + XFS_HEALTHUP_UNMOUNT, /* filesystem is unmounting */ +}; + +/* Where in the filesystem was the event observed? 
*/ +enum xfs_health_update_domain { + XFS_HEALTHUP_FS = 1, /* main filesystem */ + XFS_HEALTHUP_AG, /* allocation group */ + XFS_HEALTHUP_INODE, /* inode */ + XFS_HEALTHUP_RTGROUP, /* realtime group */ +}; + +struct xfs_health_update_params { + /* XFS_HEALTHUP_INODE */ + xfs_ino_t ino; + uint32_t gen; + + /* XFS_HEALTHUP_AG/RTGROUP */ + uint32_t group; + + /* XFS_SICK_* flags */ + unsigned int old_mask; + unsigned int new_mask; + + enum xfs_health_update_domain domain; +}; + +#ifdef CONFIG_XFS_LIVE_HOOKS +struct xfs_health_hook { + struct xfs_hook health_hook; +}; + +void xfs_health_hook_disable(void); +void xfs_health_hook_enable(void); + +int xfs_health_hook_add(struct xfs_mount *mp, struct xfs_health_hook *hook); +void xfs_health_hook_del(struct xfs_mount *mp, struct xfs_health_hook *hook); +void xfs_health_hook_setup(struct xfs_health_hook *hook, notifier_fn_t mod_fn); +#endif /* CONFIG_XFS_LIVE_HOOKS */ + #endif /* __XFS_HEALTH_H__ */ diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 8643d539bc4869..b810b01734d854 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -344,6 +344,9 @@ typedef struct xfs_mount { /* Hook to feed dirent updates to an active online repair. */ struct xfs_hooks m_dir_update_hooks; + + /* Hook to feed health events to a daemon. */ + struct xfs_hooks m_health_update_hooks; } xfs_mount_t; #define M_IGEO(mp) (&(mp)->m_ino_geo) diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c index 7c541fb373d5b2..abf9460ae79953 100644 --- a/fs/xfs/xfs_health.c +++ b/fs/xfs/xfs_health.c @@ -20,6 +20,157 @@ #include "xfs_quota_defs.h" #include "xfs_rtgroup.h" +#ifdef CONFIG_XFS_LIVE_HOOKS +/* + * Use a static key here to reduce the overhead of health updates. If + * the compiler supports jump labels, the static branch will be replaced by a + * nop sled when there are no hook users. Online fsck is currently the only + * caller, so this is a reasonable tradeoff. + * + * Note: Patching the kernel code requires taking the cpu hotplug lock. 
Other + * parts of the kernel allocate memory with that lock held, which means that + * XFS callers cannot hold any locks that might be used by memory reclaim or + * writeback when calling the static_branch_{inc,dec} functions. + */ +DEFINE_STATIC_XFS_HOOK_SWITCH(xfs_health_hooks_switch); + +void +xfs_health_hook_disable(void) +{ + xfs_hooks_switch_off(&xfs_health_hooks_switch); +} + +void +xfs_health_hook_enable(void) +{ + xfs_hooks_switch_on(&xfs_health_hooks_switch); +} + +/* Call downstream hooks for a filesystem unmount health update. */ +static inline void +xfs_health_unmount_hook( + struct xfs_mount *mp) +{ + if (xfs_hooks_switched_on(&xfs_health_hooks_switch)) { + struct xfs_health_update_params p = { + .domain = XFS_HEALTHUP_FS, + }; + + xfs_hooks_call(&mp->m_health_update_hooks, + XFS_HEALTHUP_UNMOUNT, &p); + } +} + +/* Call downstream hooks for a filesystem health update. */ +static inline void +xfs_fs_health_update_hook( + struct xfs_mount *mp, + enum xfs_health_update_type op, + unsigned int old_mask, + unsigned int new_mask) +{ + if (xfs_hooks_switched_on(&xfs_health_hooks_switch)) { + struct xfs_health_update_params p = { + .domain = XFS_HEALTHUP_FS, + .old_mask = old_mask, + .new_mask = new_mask, + }; + + if (new_mask) + xfs_hooks_call(&mp->m_health_update_hooks, op, &p); + } +} + +/* Call downstream hooks for a group health update. 
*/ +static inline void +xfs_group_health_update_hook( + struct xfs_group *xg, + enum xfs_health_update_type op, + unsigned int old_mask, + unsigned int new_mask) +{ + if (xfs_hooks_switched_on(&xfs_health_hooks_switch)) { + struct xfs_health_update_params p = { + .old_mask = old_mask, + .new_mask = new_mask, + .group = xg->xg_gno, + }; + struct xfs_mount *mp = xg->xg_mount; + + switch (xg->xg_type) { + case XG_TYPE_AG: + p.domain = XFS_HEALTHUP_AG; + break; + case XG_TYPE_RTG: + p.domain = XFS_HEALTHUP_RTGROUP; + break; + default: + ASSERT(0); + return; + } + + if (new_mask) + xfs_hooks_call(&mp->m_health_update_hooks, op, &p); + } +} + +/* Call downstream hooks for an inode health update. */ +static inline void +xfs_inode_health_update_hook( + struct xfs_inode *ip, + enum xfs_health_update_type op, + unsigned int old_mask, + unsigned int new_mask) +{ + if (xfs_hooks_switched_on(&xfs_health_hooks_switch)) { + struct xfs_health_update_params p = { + .domain = XFS_HEALTHUP_INODE, + .old_mask = old_mask, + .new_mask = new_mask, + .ino = ip->i_ino, + .gen = VFS_I(ip)->i_generation, + }; + struct xfs_mount *mp = ip->i_mount; + + if (new_mask) + xfs_hooks_call(&mp->m_health_update_hooks, op, &p); + } +} + +/* Call the specified function during a health update. */ +int +xfs_health_hook_add( + struct xfs_mount *mp, + struct xfs_health_hook *hook) +{ + return xfs_hooks_add(&mp->m_health_update_hooks, &hook->health_hook); +} + +/* Stop calling the specified function during a health update. */ +void +xfs_health_hook_del( + struct xfs_mount *mp, + struct xfs_health_hook *hook) +{ + xfs_hooks_del(&mp->m_health_update_hooks, &hook->health_hook); +} + +/* Configure health update hook functions. */ +void +xfs_health_hook_setup( + struct xfs_health_hook *hook, + notifier_fn_t mod_fn) +{ + xfs_hook_setup(&hook->health_hook, mod_fn); +} +#else +# define xfs_health_unmount_hook(...) 
((void)0) +# define xfs_fs_health_update_hook(a,b,o,n) do {o = o;} while(0) +# define xfs_rt_health_update_hook(a,b,o,n) do {o = o;} while(0) +# define xfs_group_health_update_hook(a,b,o,n) do {o = o;} while(0) +# define xfs_inode_health_update_hook(a,b,o,n) do {o = o;} while(0) +#endif /* CONFIG_XFS_LIVE_HOOKS */ + static void xfs_health_unmount_group( struct xfs_group *xg, @@ -50,8 +201,10 @@ xfs_health_unmount( unsigned int checked = 0; bool warn = false; - if (xfs_is_shutdown(mp)) + if (xfs_is_shutdown(mp)) { + xfs_health_unmount_hook(mp); return; + } /* Measure AG corruption levels. */ while ((pag = xfs_perag_next(mp, pag))) @@ -97,6 +250,8 @@ xfs_health_unmount( if (sick & XFS_SICK_FS_COUNTERS) xfs_fs_mark_healthy(mp, XFS_SICK_FS_COUNTERS); } + + xfs_health_unmount_hook(mp); } /* Mark unhealthy per-fs metadata. */ @@ -105,12 +260,17 @@ xfs_fs_mark_sick( struct xfs_mount *mp, unsigned int mask) { + unsigned int old_mask; + ASSERT(!(mask & ~XFS_SICK_FS_ALL)); trace_xfs_fs_mark_sick(mp, mask); spin_lock(&mp->m_sb_lock); + old_mask = mp->m_fs_sick; mp->m_fs_sick |= mask; spin_unlock(&mp->m_sb_lock); + + xfs_fs_health_update_hook(mp, XFS_HEALTHUP_SICK, old_mask, mask); } /* Mark per-fs metadata as having been checked and found unhealthy by fsck. */ @@ -119,13 +279,18 @@ xfs_fs_mark_corrupt( struct xfs_mount *mp, unsigned int mask) { + unsigned int old_mask; + ASSERT(!(mask & ~XFS_SICK_FS_ALL)); trace_xfs_fs_mark_corrupt(mp, mask); spin_lock(&mp->m_sb_lock); + old_mask = mp->m_fs_sick; mp->m_fs_sick |= mask; mp->m_fs_checked |= mask; spin_unlock(&mp->m_sb_lock); + + xfs_fs_health_update_hook(mp, XFS_HEALTHUP_CORRUPT, old_mask, mask); } /* Mark a per-fs metadata healed. 
*/ @@ -134,15 +299,20 @@ xfs_fs_mark_healthy( struct xfs_mount *mp, unsigned int mask) { + unsigned int old_mask; + ASSERT(!(mask & ~XFS_SICK_FS_ALL)); trace_xfs_fs_mark_healthy(mp, mask); spin_lock(&mp->m_sb_lock); + old_mask = mp->m_fs_sick; mp->m_fs_sick &= ~mask; if (!(mp->m_fs_sick & XFS_SICK_FS_PRIMARY)) mp->m_fs_sick &= ~XFS_SICK_FS_SECONDARY; mp->m_fs_checked |= mask; spin_unlock(&mp->m_sb_lock); + + xfs_fs_health_update_hook(mp, XFS_HEALTHUP_HEALTHY, old_mask, mask); } /* Sample which per-fs metadata are unhealthy. */ @@ -192,12 +362,17 @@ xfs_group_mark_sick( struct xfs_group *xg, unsigned int mask) { + unsigned int old_mask; + xfs_group_check_mask(xg, mask); trace_xfs_group_mark_sick(xg, mask); spin_lock(&xg->xg_state_lock); + old_mask = xg->xg_sick; xg->xg_sick |= mask; spin_unlock(&xg->xg_state_lock); + + xfs_group_health_update_hook(xg, XFS_HEALTHUP_SICK, old_mask, mask); } /* @@ -208,13 +383,18 @@ xfs_group_mark_corrupt( struct xfs_group *xg, unsigned int mask) { + unsigned int old_mask; + xfs_group_check_mask(xg, mask); trace_xfs_group_mark_corrupt(xg, mask); spin_lock(&xg->xg_state_lock); + old_mask = xg->xg_sick; xg->xg_sick |= mask; xg->xg_checked |= mask; spin_unlock(&xg->xg_state_lock); + + xfs_group_health_update_hook(xg, XFS_HEALTHUP_CORRUPT, old_mask, mask); } /* @@ -225,15 +405,20 @@ xfs_group_mark_healthy( struct xfs_group *xg, unsigned int mask) { + unsigned int old_mask; + xfs_group_check_mask(xg, mask); trace_xfs_group_mark_healthy(xg, mask); spin_lock(&xg->xg_state_lock); + old_mask = xg->xg_sick; xg->xg_sick &= ~mask; if (!(xg->xg_sick & XFS_SICK_AG_PRIMARY)) xg->xg_sick &= ~XFS_SICK_AG_SECONDARY; xg->xg_checked |= mask; spin_unlock(&xg->xg_state_lock); + + xfs_group_health_update_hook(xg, XFS_HEALTHUP_HEALTHY, old_mask, mask); } /* Sample which per-ag metadata are unhealthy. 
*/ @@ -272,10 +457,13 @@ xfs_inode_mark_sick( struct xfs_inode *ip, unsigned int mask) { + unsigned int old_mask; + ASSERT(!(mask & ~XFS_SICK_INO_ALL)); trace_xfs_inode_mark_sick(ip, mask); spin_lock(&ip->i_flags_lock); + old_mask = ip->i_sick; ip->i_sick |= mask; spin_unlock(&ip->i_flags_lock); @@ -287,6 +475,8 @@ xfs_inode_mark_sick( spin_lock(&VFS_I(ip)->i_lock); VFS_I(ip)->i_state &= ~I_DONTCACHE; spin_unlock(&VFS_I(ip)->i_lock); + + xfs_inode_health_update_hook(ip, XFS_HEALTHUP_SICK, old_mask, mask); } /* Mark inode metadata as having been checked and found unhealthy by fsck. */ @@ -295,10 +485,13 @@ xfs_inode_mark_corrupt( struct xfs_inode *ip, unsigned int mask) { + unsigned int old_mask; + ASSERT(!(mask & ~XFS_SICK_INO_ALL)); trace_xfs_inode_mark_corrupt(ip, mask); spin_lock(&ip->i_flags_lock); + old_mask = ip->i_sick; ip->i_sick |= mask; ip->i_checked |= mask; spin_unlock(&ip->i_flags_lock); @@ -311,6 +504,8 @@ xfs_inode_mark_corrupt( spin_lock(&VFS_I(ip)->i_lock); VFS_I(ip)->i_state &= ~I_DONTCACHE; spin_unlock(&VFS_I(ip)->i_lock); + + xfs_inode_health_update_hook(ip, XFS_HEALTHUP_CORRUPT, old_mask, mask); } /* Mark parts of an inode healed. */ @@ -319,15 +514,20 @@ xfs_inode_mark_healthy( struct xfs_inode *ip, unsigned int mask) { + unsigned int old_mask; + ASSERT(!(mask & ~XFS_SICK_INO_ALL)); trace_xfs_inode_mark_healthy(ip, mask); spin_lock(&ip->i_flags_lock); + old_mask = ip->i_sick; ip->i_sick &= ~mask; if (!(ip->i_sick & XFS_SICK_INO_PRIMARY)) ip->i_sick &= ~XFS_SICK_INO_SECONDARY; ip->i_checked |= mask; spin_unlock(&ip->i_flags_lock); + + xfs_inode_health_update_hook(ip, XFS_HEALTHUP_HEALTHY, old_mask, mask); } /* Sample which parts of an inode are unhealthy. 
*/ diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index abe229fa5aa4b6..cd3b7343b326a8 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -2285,6 +2285,7 @@ xfs_init_fs_context( mp->m_allocsize_log = 16; /* 64k */ xfs_hooks_init(&mp->m_dir_update_hooks); + xfs_hooks_init(&mp->m_health_update_hooks); fc->s_fs_info = mp; fc->ops = &xfs_context_ops; ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 05/19] xfs: create a filesystem shutdown hook 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong ` (3 preceding siblings ...) 2025-10-23 0:01 ` [PATCH 04/19] xfs: create hooks for monitoring health updates Darrick J. Wong @ 2025-10-23 0:01 ` Darrick J. Wong 2025-10-23 0:02 ` [PATCH 06/19] xfs: create hooks for media errors Darrick J. Wong ` (13 subsequent siblings) 18 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:01 UTC (permalink / raw) To: cem, djwong; +Cc: linux-fsdevel, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a hook so that health monitoring can report filesystem shutdown events to userspace. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- fs/xfs/xfs_fsops.h | 14 +++++++++++++ fs/xfs/xfs_mount.h | 3 +++ fs/xfs/xfs_fsops.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_super.c | 1 + 4 files changed, 75 insertions(+) diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h index 9d23c361ef56e4..7f6f876de072b1 100644 --- a/fs/xfs/xfs_fsops.h +++ b/fs/xfs/xfs_fsops.h @@ -15,4 +15,18 @@ int xfs_fs_goingdown(struct xfs_mount *mp, uint32_t inflags); int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp); void xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp); +#ifdef CONFIG_XFS_LIVE_HOOKS +struct xfs_shutdown_hook { + struct xfs_hook shutdown_hook; +}; + +void xfs_shutdown_hook_disable(void); +void xfs_shutdown_hook_enable(void); + +int xfs_shutdown_hook_add(struct xfs_mount *mp, struct xfs_shutdown_hook *hook); +void xfs_shutdown_hook_del(struct xfs_mount *mp, struct xfs_shutdown_hook *hook); +void xfs_shutdown_hook_setup(struct xfs_shutdown_hook *hook, + notifier_fn_t mod_fn); +#endif /* CONFIG_XFS_LIVE_HOOKS */ + #endif /* __XFS_FSOPS_H__ */ diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index b810b01734d854..96c920ad5add13 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -347,6 +347,9 @@ typedef struct xfs_mount { /* Hook 
to feed health events to a daemon. */ struct xfs_hooks m_health_update_hooks; + + /* Hook to feed shutdown events to a daemon. */ + struct xfs_hooks m_shutdown_hooks; } xfs_mount_t; #define M_IGEO(mp) (&(mp)->m_ino_geo) diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 0ada735693945c..69918cd1ba1dbc 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -482,6 +482,61 @@ xfs_fs_goingdown( return 0; } +#ifdef CONFIG_XFS_LIVE_HOOKS +DEFINE_STATIC_XFS_HOOK_SWITCH(xfs_shutdown_hooks_switch); + +void +xfs_shutdown_hook_disable(void) +{ + xfs_hooks_switch_off(&xfs_shutdown_hooks_switch); +} + +void +xfs_shutdown_hook_enable(void) +{ + xfs_hooks_switch_on(&xfs_shutdown_hooks_switch); +} + +/* Call downstream hooks for a filesystem shutdown. */ +static inline void +xfs_shutdown_hook( + struct xfs_mount *mp, + uint32_t flags) +{ + if (xfs_hooks_switched_on(&xfs_shutdown_hooks_switch)) + xfs_hooks_call(&mp->m_shutdown_hooks, flags, NULL); +} + +/* Call the specified function during a shutdown update. */ +int +xfs_shutdown_hook_add( + struct xfs_mount *mp, + struct xfs_shutdown_hook *hook) +{ + return xfs_hooks_add(&mp->m_shutdown_hooks, &hook->shutdown_hook); +} + +/* Stop calling the specified function during a shutdown update. */ +void +xfs_shutdown_hook_del( + struct xfs_mount *mp, + struct xfs_shutdown_hook *hook) +{ + xfs_hooks_del(&mp->m_shutdown_hooks, &hook->shutdown_hook); +} + +/* Configure shutdown update hook functions. */ +void +xfs_shutdown_hook_setup( + struct xfs_shutdown_hook *hook, + notifier_fn_t mod_fn) +{ + xfs_hook_setup(&hook->shutdown_hook, mod_fn); +} +#else +# define xfs_shutdown_hook(...) ((void)0) +#endif /* CONFIG_XFS_LIVE_HOOKS */ + /* * Force a shutdown of the filesystem instantly while keeping the filesystem * consistent. 
We don't do an unmount here; just shutdown the shop, make sure @@ -540,6 +595,8 @@ xfs_do_force_shutdown( "Please unmount the filesystem and rectify the problem(s)"); if (xfs_error_level >= XFS_ERRLEVEL_HIGH) xfs_stack_trace(); + + xfs_shutdown_hook(mp, flags); } /* diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index cd3b7343b326a8..54dcc42c65c786 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -2285,6 +2285,7 @@ xfs_init_fs_context( mp->m_allocsize_log = 16; /* 64k */ xfs_hooks_init(&mp->m_dir_update_hooks); + xfs_hooks_init(&mp->m_shutdown_hooks); xfs_hooks_init(&mp->m_health_update_hooks); fc->s_fs_info = mp; ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 06/19] xfs: create hooks for media errors 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong ` (4 preceding siblings ...) 2025-10-23 0:01 ` [PATCH 05/19] xfs: create a filesystem shutdown hook Darrick J. Wong @ 2025-10-23 0:02 ` Darrick J. Wong 2025-10-23 0:02 ` [PATCH 07/19] iomap: report buffered read and write io errors to the filesystem Darrick J. Wong ` (12 subsequent siblings) 18 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:02 UTC (permalink / raw) To: cem, djwong; +Cc: linux-fsdevel, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Set up a media error event hook so that we can send events to userspace. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- fs/xfs/xfs_mount.h | 3 ++ fs/xfs/xfs_notify_failure.h | 38 +++++++++++++++++++ fs/xfs/xfs_notify_failure.c | 84 ++++++++++++++++++++++++++++++++++++++++--- fs/xfs/xfs_super.c | 1 + 4 files changed, 121 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 96c920ad5add13..0907714c9d6f21 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -350,6 +350,9 @@ typedef struct xfs_mount { /* Hook to feed shutdown events to a daemon. */ struct xfs_hooks m_shutdown_hooks; + + /* Hook to feed media error events to a daemon. 
*/ + struct xfs_hooks m_media_error_hooks; } xfs_mount_t; #define M_IGEO(mp) (&(mp)->m_ino_geo) diff --git a/fs/xfs/xfs_notify_failure.h b/fs/xfs/xfs_notify_failure.h index 8d08ec29dd2949..528317ff24320a 100644 --- a/fs/xfs/xfs_notify_failure.h +++ b/fs/xfs/xfs_notify_failure.h @@ -8,4 +8,42 @@ extern const struct dax_holder_operations xfs_dax_holder_operations; +enum xfs_failed_device { + XFS_FAILED_DATADEV, + XFS_FAILED_LOGDEV, + XFS_FAILED_RTDEV, +}; + +#if defined(CONFIG_XFS_LIVE_HOOKS) && defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX) +struct xfs_media_error_params { + struct xfs_mount *mp; + enum xfs_failed_device fdev; + xfs_daddr_t daddr; + uint64_t bbcount; + bool pre_remove; +}; + +struct xfs_media_error_hook { + struct xfs_hook error_hook; +}; + +void xfs_media_error_hook_disable(void); +void xfs_media_error_hook_enable(void); + +int xfs_media_error_hook_add(struct xfs_mount *mp, + struct xfs_media_error_hook *hook); +void xfs_media_error_hook_del(struct xfs_mount *mp, + struct xfs_media_error_hook *hook); +void xfs_media_error_hook_setup(struct xfs_media_error_hook *hook, + notifier_fn_t mod_fn); +#else +struct xfs_media_error_params { }; +struct xfs_media_error_hook { }; +# define xfs_media_error_hook_disable() ((void)0) +# define xfs_media_error_hook_enable() ((void)0) +# define xfs_media_error_hook_add(...) (0) +# define xfs_media_error_hook_del(...) ((void)0) +# define xfs_media_error_hook_setup(...) 
((void)0) +#endif /* CONFIG_XFS_LIVE_HOOKS */ + #endif /* __XFS_NOTIFY_FAILURE_H__ */ diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c index b1767288994206..2098ff452a3b87 100644 --- a/fs/xfs/xfs_notify_failure.c +++ b/fs/xfs/xfs_notify_failure.c @@ -27,6 +27,73 @@ #include <linux/dax.h> #include <linux/fs.h> +#ifdef CONFIG_XFS_LIVE_HOOKS +DEFINE_STATIC_XFS_HOOK_SWITCH(xfs_media_error_hooks_switch); + +void +xfs_media_error_hook_disable(void) +{ + xfs_hooks_switch_off(&xfs_media_error_hooks_switch); +} + +void +xfs_media_error_hook_enable(void) +{ + xfs_hooks_switch_on(&xfs_media_error_hooks_switch); +} + +/* Call downstream hooks for a media error. */ +static inline void +xfs_media_error_hook( + struct xfs_mount *mp, + enum xfs_failed_device fdev, + xfs_daddr_t daddr, + uint64_t bbcount, + bool pre_remove) +{ + if (xfs_hooks_switched_on(&xfs_media_error_hooks_switch)) { + struct xfs_media_error_params p = { + .mp = mp, + .fdev = fdev, + .daddr = daddr, + .bbcount = bbcount, + .pre_remove = pre_remove, + }; + + xfs_hooks_call(&mp->m_media_error_hooks, 0, &p); + } +} + +/* Call the specified function during a media error. */ +int +xfs_media_error_hook_add( + struct xfs_mount *mp, + struct xfs_media_error_hook *hook) +{ + return xfs_hooks_add(&mp->m_media_error_hooks, &hook->error_hook); +} + +/* Stop calling the specified function during a media error. */ +void +xfs_media_error_hook_del( + struct xfs_mount *mp, + struct xfs_media_error_hook *hook) +{ + xfs_hooks_del(&mp->m_media_error_hooks, &hook->error_hook); +} + +/* Configure media error hook functions. */ +void +xfs_media_error_hook_setup( + struct xfs_media_error_hook *hook, + notifier_fn_t mod_fn) +{ + xfs_hook_setup(&hook->error_hook, mod_fn); +} +#else +# define xfs_media_error_hook(...) 
((void)0) +#endif /* CONFIG_XFS_LIVE_HOOKS */ + struct xfs_failure_info { xfs_agblock_t startblock; xfs_extlen_t blockcount; @@ -215,6 +282,9 @@ xfs_dax_notify_logdev_failure( if (error) return error; + xfs_media_error_hook(mp, XFS_FAILED_LOGDEV, daddr, bblen, + mf_flags & MF_MEM_PRE_REMOVE); + /* * In the pre-remove case the failure notification is attempting to * trigger a force unmount. The expectation is that the device is @@ -248,16 +318,20 @@ xfs_dax_notify_dev_failure( uint64_t bblen; struct xfs_group *xg = NULL; + error = xfs_dax_translate_range(xfs_group_type_buftarg(mp, type), + offset, len, &daddr, &bblen); + if (error) + return error; + + xfs_media_error_hook(mp, type == XG_TYPE_RTG ? + XFS_FAILED_RTDEV : XFS_FAILED_DATADEV, + daddr, bblen, mf_flags & MF_MEM_PRE_REMOVE); + if (!xfs_has_rmapbt(mp)) { xfs_debug(mp, "notify_failure() needs rmapbt enabled!"); return -EOPNOTSUPP; } - error = xfs_dax_translate_range(xfs_group_type_buftarg(mp, type), - offset, len, &daddr, &bblen); - if (error) - return error; - if (type == XG_TYPE_RTG) { start_bno = xfs_daddr_to_rtb(mp, daddr); end_bno = xfs_daddr_to_rtb(mp, daddr + bblen - 1); diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 54dcc42c65c786..51f8db95e717a8 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -2287,6 +2287,7 @@ xfs_init_fs_context( xfs_hooks_init(&mp->m_dir_update_hooks); xfs_hooks_init(&mp->m_shutdown_hooks); xfs_hooks_init(&mp->m_health_update_hooks); + xfs_hooks_init(&mp->m_media_error_hooks); fc->s_fs_info = mp; fc->ops = &xfs_context_ops; ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 07/19] iomap: report buffered read and write io errors to the filesystem 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong ` (5 preceding siblings ...) 2025-10-23 0:02 ` [PATCH 06/19] xfs: create hooks for media errors Darrick J. Wong @ 2025-10-23 0:02 ` Darrick J. Wong 2025-10-23 0:02 ` [PATCH 08/19] iomap: report directio read and write errors to callers Darrick J. Wong ` (11 subsequent siblings) 18 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:02 UTC (permalink / raw) To: cem, djwong; +Cc: linux-fsdevel, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Provide a callback so that iomap can report read and write IO errors to the caller filesystem. For now this is only wired up for iomap as a testbed for XFS. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- fs/iomap/internal.h | 2 ++ include/linux/fs.h | 4 ++++ Documentation/filesystems/vfs.rst | 7 +++++++ fs/iomap/buffered-io.c | 27 +++++++++++++++++++++++++-- fs/iomap/ioend.c | 4 ++++ 5 files changed, 42 insertions(+), 2 deletions(-) diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h index d05cb3aed96e79..06d9145b6be4fa 100644 --- a/fs/iomap/internal.h +++ b/fs/iomap/internal.h @@ -5,5 +5,7 @@ #define IOEND_BATCH_SIZE 4096 u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend); +void iomap_mapping_ioerror(struct address_space *mapping, int direction, + loff_t pos, u64 len, int error); #endif /* _IOMAP_INTERNAL_H */ diff --git a/include/linux/fs.h b/include/linux/fs.h index c895146c1444be..5e4b3a4b24823f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -477,6 +477,10 @@ struct address_space_operations { sector_t *span); void (*swap_deactivate)(struct file *file); int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); + + /* Callback for dealing with IO errors during readahead or writeback */ + void (*ioerror)(struct address_space *mapping, int direction, + loff_t pos, u64 len, int error); }; 
extern const struct address_space_operations empty_aops; diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 4f13b01e42eb5e..9e70006bf99a63 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -822,6 +822,8 @@ cache in your filesystem. The following members are defined: int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span) int (*swap_deactivate)(struct file *); int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); + void (*ioerror)(struct address_space *mapping, int direction, + loff_t pos, u64 len, int error); }; ``read_folio`` @@ -1032,6 +1034,11 @@ cache in your filesystem. The following members are defined: ``swap_rw`` Called to read or write swap pages when SWP_FS_OPS is set. +``ioerror`` + Called to deal with IO errors during readahead or writeback. + This may be called from interrupt context, and without any + locks necessarily being held. + The File Object =============== diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 8b847a1e27f13e..8dd5421cb910b5 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -288,6 +288,14 @@ static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter, pos >= i_size_read(iter->inode); } +inline void iomap_mapping_ioerror(struct address_space *mapping, int direction, + loff_t pos, u64 len, int error) +{ + if (mapping && mapping->a_ops->ioerror) + mapping->a_ops->ioerror(mapping, direction, pos, len, + error); +} + /** * iomap_read_inline_data - copy inline data into the page cache * @iter: iteration structure @@ -310,8 +318,11 @@ static int iomap_read_inline_data(const struct iomap_iter *iter, if (folio_test_uptodate(folio)) return 0; - if (WARN_ON_ONCE(size > iomap->length)) + if (WARN_ON_ONCE(size > iomap->length)) { + iomap_mapping_ioerror(folio->mapping, READ, iomap->offset, + size, -EIO); return -EIO; + } if (offset > 0) ifs_alloc(iter->inode, folio, iter->flags); @@ 
-339,6 +350,10 @@ static void iomap_finish_folio_read(struct folio *folio, size_t off, spin_unlock_irqrestore(&ifs->state_lock, flags); } + if (error) + iomap_mapping_ioerror(folio->mapping, READ, + folio_pos(folio) + off, len, error); + if (finished) folio_end_read(folio, uptodate); } @@ -558,11 +573,15 @@ static int iomap_read_folio_range(const struct iomap_iter *iter, const struct iomap *srcmap = iomap_iter_srcmap(iter); struct bio_vec bvec; struct bio bio; + int ret; bio_init(&bio, srcmap->bdev, &bvec, 1, REQ_OP_READ); bio.bi_iter.bi_sector = iomap_sector(srcmap, pos); bio_add_folio_nofail(&bio, folio, len, offset_in_folio(folio, pos)); - return submit_bio_wait(&bio); + ret = submit_bio_wait(&bio); + if (ret) + iomap_mapping_ioerror(folio->mapping, READ, pos, len, ret); + return ret; } #else static int iomap_read_folio_range(const struct iomap_iter *iter, @@ -1674,6 +1693,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio) u64 pos = folio_pos(folio); u64 end_pos = pos + folio_size(folio); u64 end_aligned = 0; + loff_t orig_pos = pos; bool wb_pending = false; int error = 0; u32 rlen; @@ -1724,6 +1744,9 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio) if (wb_pending) wpc->nr_folios++; + if (error && pos > orig_pos) + iomap_mapping_ioerror(inode->i_mapping, WRITE, orig_pos, 0, + error); /* * We can have dirty bits set past end of file in page_mkwrite path diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c index b49fa75eab260a..56e654f2d36fe9 100644 --- a/fs/iomap/ioend.c +++ b/fs/iomap/ioend.c @@ -55,6 +55,10 @@ static u32 iomap_finish_ioend_buffered(struct iomap_ioend *ioend) /* walk all folios in bio, ending page IO on them */ bio_for_each_folio_all(fi, bio) { + if (ioend->io_error) + iomap_mapping_ioerror(inode->i_mapping, WRITE, + folio_pos(fi.folio) + fi.offset, + fi.length, ioend->io_error); iomap_finish_folio_write(inode, fi.folio, fi.length); folio_count++; } ^ permalink raw reply related 
[flat|nested] 80+ messages in thread
* [PATCH 08/19] iomap: report directio read and write errors to callers 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong ` (6 preceding siblings ...) 2025-10-23 0:02 ` [PATCH 07/19] iomap: report buffered read and write io errors to the filesystem Darrick J. Wong @ 2025-10-23 0:02 ` Darrick J. Wong 2025-10-23 0:02 ` [PATCH 09/19] xfs: create file io error hooks Darrick J. Wong ` (10 subsequent siblings) 18 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:02 UTC (permalink / raw) To: cem, djwong; +Cc: linux-fsdevel, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add more hooks to report directio IO errors to the filesystem. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- include/linux/iomap.h | 2 ++ fs/iomap/direct-io.c | 4 ++++ 2 files changed, 6 insertions(+) diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 73dceabc21c8c7..ca1590e5002342 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -486,6 +486,8 @@ struct iomap_dio_ops { unsigned flags); void (*submit_io)(const struct iomap_iter *iter, struct bio *bio, loff_t file_offset); + void (*ioerror)(struct inode *inode, int direction, loff_t pos, + u64 len, int error); /* * Filesystems wishing to attach private information to a direct io bio diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 5d5d63efbd5767..1512d8dbb0d2e7 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -95,6 +95,10 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio) if (dops && dops->end_io) ret = dops->end_io(iocb, dio->size, ret, dio->flags); + if (dio->error && dops && dops->ioerror) + dops->ioerror(file_inode(iocb->ki_filp), + (dio->flags & IOMAP_DIO_WRITE) ? WRITE : READ, + offset, dio->size, dio->error); if (likely(!ret)) { ret = dio->size; ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 09/19] xfs: create file io error hooks 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong ` (7 preceding siblings ...) 2025-10-23 0:02 ` [PATCH 08/19] iomap: report directio read and write errors to callers Darrick J. Wong @ 2025-10-23 0:02 ` Darrick J. Wong 2025-10-23 0:03 ` [PATCH 10/19] xfs: create a special file to pass filesystem health to userspace Darrick J. Wong ` (9 subsequent siblings) 18 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:02 UTC (permalink / raw) To: cem, djwong; +Cc: linux-fsdevel, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create hooks within XFS to deliver IO errors to callers. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- fs/xfs/xfs_file.h | 36 +++++++++++ fs/xfs/xfs_mount.h | 3 + fs/xfs/xfs_aops.c | 2 + fs/xfs/xfs_file.c | 167 ++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_super.c | 1 5 files changed, 208 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_file.h b/fs/xfs/xfs_file.h index 2ad91f755caf35..2b4e02efefb7b1 100644 --- a/fs/xfs/xfs_file.h +++ b/fs/xfs/xfs_file.h @@ -12,4 +12,40 @@ extern const struct file_operations xfs_dir_file_operations; bool xfs_is_falloc_aligned(struct xfs_inode *ip, loff_t pos, long long int len); +enum xfs_file_ioerror_type { + XFS_FILE_IOERROR_BUFFERED_READ, + XFS_FILE_IOERROR_BUFFERED_WRITE, + XFS_FILE_IOERROR_DIRECT_READ, + XFS_FILE_IOERROR_DIRECT_WRITE, +}; + +struct xfs_file_ioerror_params { + xfs_ino_t ino; + loff_t pos; + u64 len; + u32 gen; + int error; +}; + +#ifdef CONFIG_XFS_LIVE_HOOKS +struct xfs_file_ioerror_hook { + struct xfs_hook ioerror_hook; +}; + +void xfs_file_ioerror_hook_disable(void); +void xfs_file_ioerror_hook_enable(void); + +int xfs_file_ioerror_hook_add(struct xfs_mount *mp, + struct xfs_file_ioerror_hook *hook); +void xfs_file_ioerror_hook_del(struct xfs_mount *mp, + struct xfs_file_ioerror_hook *hook); +void xfs_file_ioerror_hook_setup(struct 
xfs_file_ioerror_hook *hook, + notifier_fn_t mod_fn); + +void xfs_vm_ioerror(struct address_space *mapping, int direction, loff_t pos, + u64 len, int error); +#else +# define xfs_vm_ioerror NULL +#endif /* CONFIG_XFS_LIVE_HOOKS */ + #endif /* __XFS_FILE_H__ */ diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 0907714c9d6f21..9b17899a012fe6 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -353,6 +353,9 @@ typedef struct xfs_mount { /* Hook to feed media error events to a daemon. */ struct xfs_hooks m_media_error_hooks; + + /* Hook to feed file io error events to a daemon. */ + struct xfs_hooks m_file_ioerror_hooks; } xfs_mount_t; #define M_IGEO(mp) (&(mp)->m_ino_geo) diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index a26f798155331f..f3f28b9ae0f70e 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -22,6 +22,7 @@ #include "xfs_icache.h" #include "xfs_zone_alloc.h" #include "xfs_rtgroup.h" +#include "xfs_file.h" struct xfs_writepage_ctx { struct iomap_writepage_ctx ctx; @@ -810,6 +811,7 @@ const struct address_space_operations xfs_address_space_operations = { .is_partially_uptodate = iomap_is_partially_uptodate, .error_remove_folio = generic_error_remove_folio, .swap_activate = xfs_vm_swap_activate, + .ioerror = xfs_vm_ioerror, }; const struct address_space_operations xfs_dax_aops = { diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 2702fef2c90cd2..1c9b21ad97d46c 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -222,6 +222,169 @@ xfs_ilock_iocb_for_write( return 0; } +#ifdef CONFIG_XFS_LIVE_HOOKS +DEFINE_STATIC_XFS_HOOK_SWITCH(xfs_file_ioerror_hooks_switch); + +void +xfs_file_ioerror_hook_disable(void) +{ + xfs_hooks_switch_off(&xfs_file_ioerror_hooks_switch); +} + +void +xfs_file_ioerror_hook_enable(void) +{ + xfs_hooks_switch_on(&xfs_file_ioerror_hooks_switch); +} + +struct xfs_file_ioerror { + struct work_struct work; + struct xfs_mount *mp; + xfs_ino_t ino; + loff_t pos; + u64 len; + u32 gen; + int error; + enum 
xfs_file_ioerror_type type; +}; + +/* Call downstream hooks for a file io error update. */ +STATIC void +xfs_file_report_ioerror( + struct work_struct *work) +{ + struct xfs_file_ioerror *ioerr; + + ioerr = container_of(work, struct xfs_file_ioerror, work); + + if (xfs_hooks_switched_on(&xfs_file_ioerror_hooks_switch)) { + struct xfs_file_ioerror_params p = { + .ino = ioerr->ino, + .gen = ioerr->gen, + .pos = ioerr->pos, + .len = ioerr->len, + }; + struct xfs_mount *mp = ioerr->mp; + + xfs_hooks_call(&mp->m_file_ioerror_hooks, ioerr->type, &p); + } + + kfree(ioerr); +} + +/* Queue a directio io error notification. */ +STATIC void +xfs_dio_ioerror( + struct inode *inode, + int direction, + loff_t pos, + u64 len, + int error) +{ + struct xfs_inode *ip = XFS_I(inode); + struct xfs_mount *mp = ip->i_mount; + struct xfs_file_ioerror *ioerr; + + if (xfs_hooks_switched_on(&xfs_file_ioerror_hooks_switch)) { + ioerr = kzalloc(sizeof(*ioerr), GFP_ATOMIC); + if (!ioerr) { + xfs_err(mp, + "lost ioerror report for ino 0x%llx %s pos 0x%llx len 0x%llx error %d", + ip->i_ino, + direction == WRITE ? "WRITE" : "READ", + pos, len, error); + return; + } + + INIT_WORK(&ioerr->work, xfs_file_report_ioerror); + ioerr->mp = mp; + ioerr->ino = ip->i_ino; + ioerr->gen = VFS_I(ip)->i_generation; + ioerr->pos = pos; + ioerr->len = len; + if (direction == WRITE) + ioerr->type = XFS_FILE_IOERROR_DIRECT_WRITE; + else + ioerr->type = XFS_FILE_IOERROR_DIRECT_READ; + ioerr->error = error; + queue_work(mp->m_unwritten_workqueue, &ioerr->work); + } +} + +/* Queue a buffered io error notification. 
*/ +void +xfs_vm_ioerror( + struct address_space *mapping, + int direction, + loff_t pos, + u64 len, + int error) +{ + struct inode *inode = mapping->host; + struct xfs_inode *ip = XFS_I(inode); + struct xfs_mount *mp = ip->i_mount; + struct xfs_file_ioerror *ioerr; + + if (xfs_hooks_switched_on(&xfs_file_ioerror_hooks_switch)) { + ioerr = kzalloc(sizeof(*ioerr), GFP_ATOMIC); + if (!ioerr) { + xfs_err(mp, + "lost ioerror report for ino 0x%llx %s pos 0x%llx len 0x%llx error %d", + ip->i_ino, + direction == WRITE ? "WRITE" : "READ", + pos, len, error); + return; + } + + INIT_WORK(&ioerr->work, xfs_file_report_ioerror); + ioerr->mp = mp; + ioerr->ino = ip->i_ino; + ioerr->gen = VFS_I(ip)->i_generation; + ioerr->pos = pos; + ioerr->len = len; + if (direction == WRITE) + ioerr->type = XFS_FILE_IOERROR_BUFFERED_WRITE; + else + ioerr->type = XFS_FILE_IOERROR_BUFFERED_READ; + ioerr->error = error; + queue_work(mp->m_unwritten_workqueue, &ioerr->work); + } +} + +/* Call the specified function after a file io error. */ +int +xfs_file_ioerror_hook_add( + struct xfs_mount *mp, + struct xfs_file_ioerror_hook *hook) +{ + return xfs_hooks_add(&mp->m_file_ioerror_hooks, &hook->ioerror_hook); +} + +/* Stop calling the specified function after a file io error. */ +void +xfs_file_ioerror_hook_del( + struct xfs_mount *mp, + struct xfs_file_ioerror_hook *hook) +{ + xfs_hooks_del(&mp->m_file_ioerror_hooks, &hook->ioerror_hook); +} + +/* Configure file io error update hook functions. 
*/ +void +xfs_file_ioerror_hook_setup( + struct xfs_file_ioerror_hook *hook, + notifier_fn_t mod_fn) +{ + xfs_hook_setup(&hook->ioerror_hook, mod_fn); +} +#else +# define xfs_dio_ioerror NULL +#endif /* CONFIG_XFS_LIVE_HOOKS */ + +static const struct iomap_dio_ops xfs_dio_read_ops = { + .ioerror = xfs_dio_ioerror, +}; + STATIC ssize_t xfs_file_dio_read( struct kiocb *iocb, @@ -240,7 +403,8 @@ xfs_file_dio_read( ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_SHARED); if (ret) return ret; - ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL, 0, NULL, 0); + ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, &xfs_dio_read_ops, + 0, NULL, 0); xfs_iunlock(ip, XFS_IOLOCK_SHARED); return ret; @@ -625,6 +789,7 @@ xfs_dio_write_end_io( static const struct iomap_dio_ops xfs_dio_write_ops = { .end_io = xfs_dio_write_end_io, + .ioerror = xfs_dio_ioerror, }; static void diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 51f8db95e717a8..b6a6027b4df8d8 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -2288,6 +2288,7 @@ xfs_init_fs_context( xfs_hooks_init(&mp->m_shutdown_hooks); xfs_hooks_init(&mp->m_health_update_hooks); xfs_hooks_init(&mp->m_media_error_hooks); + xfs_hooks_init(&mp->m_file_ioerror_hooks); fc->s_fs_info = mp; fc->ops = &xfs_context_ops; ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 10/19] xfs: create a special file to pass filesystem health to userspace 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong ` (8 preceding siblings ...) 2025-10-23 0:02 ` [PATCH 09/19] xfs: create file io error hooks Darrick J. Wong @ 2025-10-23 0:03 ` Darrick J. Wong 2025-10-23 0:03 ` [PATCH 11/19] xfs: create event queuing, formatting, and discovery infrastructure Darrick J. Wong ` (8 subsequent siblings) 18 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:03 UTC (permalink / raw) To: cem, djwong; +Cc: linux-fsdevel, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create an ioctl that installs a file descriptor backed by an anon_inode file that will convey filesystem health events to userspace. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- fs/xfs/libxfs/xfs_fs.h | 8 ++ fs/xfs/xfs_healthmon.h | 16 +++++ fs/xfs/Kconfig | 8 ++ fs/xfs/Makefile | 1 fs/xfs/xfs_healthmon.c | 157 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_ioctl.c | 4 + 6 files changed, 194 insertions(+) create mode 100644 fs/xfs/xfs_healthmon.h create mode 100644 fs/xfs/xfs_healthmon.c diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 12463ba766da05..dba7896f716092 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -1003,6 +1003,13 @@ struct xfs_rtgroup_geometry { #define XFS_RTGROUP_GEOM_SICK_RMAPBT (1U << 3) /* reverse mappings */ #define XFS_RTGROUP_GEOM_SICK_REFCNTBT (1U << 4) /* reference counts */ +struct xfs_health_monitor { + __u64 flags; /* flags */ + __u8 format; /* output format */ + __u8 pad1[7]; /* zeroes */ + __u64 pad2[2]; /* zeroes */ +}; + /* * ioctl commands that are used by Linux filesystems */ @@ -1042,6 +1049,7 @@ struct xfs_rtgroup_geometry { #define XFS_IOC_GETPARENTS_BY_HANDLE _IOWR('X', 63, struct xfs_getparents_by_handle) #define XFS_IOC_SCRUBV_METADATA _IOWR('X', 64, struct xfs_scrub_vec_head) #define XFS_IOC_RTGROUP_GEOMETRY 
_IOWR('X', 65, struct xfs_rtgroup_geometry) +#define XFS_IOC_HEALTH_MONITOR _IOW ('X', 68, struct xfs_health_monitor) /* * ioctl commands that replace IRIX syssgi()'s diff --git a/fs/xfs/xfs_healthmon.h b/fs/xfs/xfs_healthmon.h new file mode 100644 index 00000000000000..07126e39281a0c --- /dev/null +++ b/fs/xfs/xfs_healthmon.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (c) 2024-2025 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#ifndef __XFS_HEALTHMON_H__ +#define __XFS_HEALTHMON_H__ + +#ifdef CONFIG_XFS_HEALTH_MONITOR +long xfs_ioc_health_monitor(struct xfs_mount *mp, + struct xfs_health_monitor __user *arg); +#else +# define xfs_ioc_health_monitor(mp, hmo) (-ENOTTY) +#endif /* CONFIG_XFS_HEALTH_MONITOR */ + +#endif /* __XFS_HEALTHMON_H__ */ diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig index 8930d5254e1da6..b5d48515236302 100644 --- a/fs/xfs/Kconfig +++ b/fs/xfs/Kconfig @@ -121,6 +121,14 @@ config XFS_RT If unsure, say N. +config XFS_HEALTH_MONITOR + bool "Report filesystem health events to userspace" + depends on XFS_FS + select XFS_LIVE_HOOKS + default y + help + Report health events to userspace programs. 
+ config XFS_DRAIN_INTENTS bool select JUMP_LABEL if HAVE_ARCH_JUMP_LABEL diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 5bf501cf827172..d4e9070a9326ba 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -157,6 +157,7 @@ xfs-$(CONFIG_XFS_DRAIN_INTENTS) += xfs_drain.o xfs-$(CONFIG_XFS_LIVE_HOOKS) += xfs_hooks.o xfs-$(CONFIG_XFS_MEMORY_BUFS) += xfs_buf_mem.o xfs-$(CONFIG_XFS_BTREE_IN_MEM) += libxfs/xfs_btree_mem.o +xfs-$(CONFIG_XFS_HEALTH_MONITOR) += xfs_healthmon.o # online scrub/repair ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y) diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c new file mode 100644 index 00000000000000..7b0d9f78b0a402 --- /dev/null +++ b/fs/xfs/xfs_healthmon.c @@ -0,0 +1,157 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2024-2025 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_inode.h" +#include "xfs_trace.h" +#include "xfs_ag.h" +#include "xfs_btree.h" +#include "xfs_da_format.h" +#include "xfs_da_btree.h" +#include "xfs_quota_defs.h" +#include "xfs_rtgroup.h" +#include "xfs_healthmon.h" + +#include <linux/anon_inodes.h> +#include <linux/eventpoll.h> +#include <linux/poll.h> + +/* + * Live Health Monitoring + * ====================== + * + * Autonomous self-healing of XFS filesystems requires a means for the kernel + * to send filesystem health events to a monitoring daemon in userspace. To + * accomplish this, we establish a thread_with_file kthread object to handle + * translating internal events about filesystem health into a format that can + * be parsed easily by userspace. Then we hook various parts of the filesystem + * to supply those internal events to the kthread. Userspace reads events + * from the file descriptor returned by the ioctl. 
+ *
+ * The healthmon abstraction has a weak reference to the host filesystem mount
+ * so that the queueing and processing of the events do not pin the mount and
+ * cannot slow down the main filesystem.  The healthmon object can exist past
+ * the end of the filesystem mount.
+ */
+
+struct xfs_healthmon {
+	struct xfs_mount	*mp;
+};
+
+/*
+ * Convey queued event data to userspace.  First copy any remaining bytes in
+ * the outbuf, then format the oldest event into the outbuf and copy that too.
+ */
+STATIC ssize_t
+xfs_healthmon_read_iter(
+	struct kiocb		*iocb,
+	struct iov_iter		*to)
+{
+	return -EIO;
+}
+
+/* Free the health monitoring information. */
+STATIC int
+xfs_healthmon_release(
+	struct inode		*inode,
+	struct file		*file)
+{
+	struct xfs_healthmon	*hm = file->private_data;
+
+	kfree(hm);
+
+	return 0;
+}
+
+/* Validate ioctl parameters. */
+static inline bool
+xfs_healthmon_validate(
+	const struct xfs_health_monitor	*hmo)
+{
+	if (hmo->flags)
+		return false;
+	if (hmo->format)
+		return false;
+	if (memchr_inv(&hmo->pad1, 0, sizeof(hmo->pad1)))
+		return false;
+	if (memchr_inv(&hmo->pad2, 0, sizeof(hmo->pad2)))
+		return false;
+	return true;
+}
+
+/* Emit some data about the health monitoring fd. */
+#ifdef CONFIG_PROC_FS
+static void
+xfs_healthmon_show_fdinfo(
+	struct seq_file		*m,
+	struct file		*file)
+{
+	struct xfs_healthmon	*hm = file->private_data;
+
+	seq_printf(m, "state:\talive\ndev:\t%s\n",
+			hm->mp->m_super->s_id);
+}
+#endif
+
+static const struct file_operations xfs_healthmon_fops = {
+	.owner			= THIS_MODULE,
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo		= xfs_healthmon_show_fdinfo,
+#endif
+	.read_iter		= xfs_healthmon_read_iter,
+	.release		= xfs_healthmon_release,
+};
+
+/*
+ * Create a health monitoring file.  Returns an index to the fd table or a
+ * negative errno.
+ */
+long
+xfs_ioc_health_monitor(
+	struct xfs_mount	*mp,
+	struct xfs_health_monitor __user *arg)
+{
+	struct xfs_health_monitor hmo;
+	struct xfs_healthmon	*hm;
+	int			fd;
+	int			ret;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (copy_from_user(&hmo, arg, sizeof(hmo)))
+		return -EFAULT;
+
+	if (!xfs_healthmon_validate(&hmo))
+		return -EINVAL;
+
+	hm = kzalloc(sizeof(*hm), GFP_KERNEL);
+	if (!hm)
+		return -ENOMEM;
+	hm->mp = mp;
+
+	/*
+	 * Create the anonymous file.  If it succeeds, the file owns hm and
+	 * can go away at any time, so we must not access it again.
+	 */
+	fd = anon_inode_getfd("xfs_healthmon", &xfs_healthmon_fops, hm,
+			O_CLOEXEC | O_RDONLY);
+	if (fd < 0) {
+		ret = fd;
+		goto out_hm;
+	}
+
+	return fd;
+
+out_hm:
+	kfree(hm);
+	return ret;
+}
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index a6bb7ee7a27ad5..08998d84554f09 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -41,6 +41,7 @@
 #include "xfs_exchrange.h"
 #include "xfs_handle.h"
 #include "xfs_rtgroup.h"
+#include "xfs_healthmon.h"
 
 #include <linux/mount.h>
 #include <linux/fileattr.h>
@@ -1421,6 +1422,9 @@ xfs_file_ioctl(
 	case XFS_IOC_COMMIT_RANGE:
 		return xfs_ioc_commit_range(filp, arg);
 
+	case XFS_IOC_HEALTH_MONITOR:
+		return xfs_ioc_health_monitor(mp, arg);
+
 	default:
 		return -ENOTTY;
 	}

^ permalink raw reply related	[flat|nested] 80+ messages in thread
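[Editorial note: the events read from this fd are formatted by patch 11 later in this series. In the JSON mode, the fd yields a stream of concatenated JSON objects, one per event, so a consumer has to decode incrementally rather than parse the whole stream at once. A minimal sketch in Python (the language of the prototype daemon mentioned in the cover letter); the sample event string below is hypothetical but follows the keys emitted by xfs_healthmon_format_json and described in xfs_healthmon.schema.json:]

```python
import json

def iter_events(buf: str):
    """Yield successive JSON objects from a buffer of concatenated events."""
    dec = json.JSONDecoder()
    pos = 0
    while pos < len(buf):
        # Skip the newlines the kernel emits between objects.
        while pos < len(buf) and buf[pos].isspace():
            pos += 1
        if pos >= len(buf):
            break
        obj, pos = dec.raw_decode(buf, pos)
        yield obj

# Hypothetical sample: a "lost events" notification for the whole fs.
sample = ('{\n'
          ' "domain": "mount",\n'
          ' "type": "lost",\n'
          ' "count": 3,\n'
          ' "time_ns": 1729641600000000000\n'
          '}\n')
events = list(iter_events(sample))
```

A real daemon would accumulate partial reads from the fd into `buf` and retry `raw_decode` when an object is still incomplete.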
* [PATCH 11/19] xfs: create event queuing, formatting, and discovery infrastructure
  2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong
                     ` (9 preceding siblings ...)
  2025-10-23  0:03 ` [PATCH 10/19] xfs: create a special file to pass filesystem health to userspace Darrick J. Wong
@ 2025-10-23  0:03 ` Darrick J. Wong
  2025-10-30 16:54   ` Darrick J. Wong
  2025-10-23  0:03 ` [PATCH 12/19] xfs: report metadata health events through healthmon Darrick J. Wong
                     ` (7 subsequent siblings)
  18 siblings, 1 reply; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-23  0:03 UTC (permalink / raw)
  To: cem, djwong; +Cc: linux-fsdevel, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create the basic infrastructure that we need to report health events to
userspace.  We need a compact form for recording critical information
about an event and queueing them; a means to notice that we've lost some
events; and a means to format the events into something that userspace
can handle.

Here, we've chosen json to export information to userspace.  The
structured key-value nature of json gives us enormous flexibility to
modify the schema of what we'll send to userspace because we can add new
keys at any time.  Userspace can use whatever json parsers are available
to consume the events and will not be confused by keys they don't
recognize.  Note that we do NOT allow sending json back to the kernel,
nor is there any intent to do that.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_fs.h                  |   50 ++
 fs/xfs/xfs_healthmon.h                  |   29 +
 fs/xfs/xfs_linux.h                      |    3 
 fs/xfs/xfs_trace.h                      |  171 +++++++
 fs/xfs/libxfs/xfs_healthmon.schema.json |  129 +++++
 fs/xfs/xfs_healthmon.c                  |  728 +++++++++++++++++++++++++++++++
 fs/xfs/xfs_trace.c                      |    2 
 lib/seq_buf.c                           |    1 
 8 files changed, 1106 insertions(+), 7 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_healthmon.schema.json

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index dba7896f716092..4b642eea18b5ca 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -1003,6 +1003,45 @@ struct xfs_rtgroup_geometry {
 #define XFS_RTGROUP_GEOM_SICK_RMAPBT	(1U << 3)  /* reverse mappings */
 #define XFS_RTGROUP_GEOM_SICK_REFCNTBT	(1U << 4)  /* reference counts */
 
+/* Health monitor event domains */
+
+/* affects the whole fs */
+#define XFS_HEALTH_MONITOR_DOMAIN_MOUNT		(0)
+
+/* Health monitor event types */
+
+/* status of the monitor itself */
+#define XFS_HEALTH_MONITOR_TYPE_RUNNING		(0)
+#define XFS_HEALTH_MONITOR_TYPE_LOST		(1)
+
+/* lost events */
+struct xfs_health_monitor_lost {
+	__u64	count;
+};
+
+struct xfs_health_monitor_event {
+	/* XFS_HEALTH_MONITOR_DOMAIN_* */
+	__u32	domain;
+
+	/* XFS_HEALTH_MONITOR_TYPE_* */
+	__u32	type;
+
+	/* Timestamp of the event, in nanoseconds since the Unix epoch */
+	__u64	time_ns;
+
+	/*
+	 * Details of the event.  The primary clients are written in python
+	 * and rust, so break this up because bindgen hates anonymous structs
+	 * and unions.
+	 */
+	union {
+		struct xfs_health_monitor_lost	lost;
+	} e;
+
+	/* zeroes */
+	__u64	pad[2];
+};
+
 struct xfs_health_monitor {
 	__u64	flags;		/* flags */
 	__u8	format;		/* output format */
@@ -1010,6 +1049,17 @@ struct xfs_health_monitor {
 	__u64	pad2[2];	/* zeroes */
 };
 
+/* Return all health status events, not just deltas */
+#define XFS_HEALTH_MONITOR_VERBOSE	(1ULL << 0)
+
+#define XFS_HEALTH_MONITOR_ALL		(XFS_HEALTH_MONITOR_VERBOSE)
+
+/* Return events in a C structure */
+#define XFS_HEALTH_MONITOR_FMT_CSTRUCT	(0)
+
+/* Return events in JSON format */
+#define XFS_HEALTH_MONITOR_FMT_JSON	(1)
+
 /*
  * ioctl commands that are used by Linux filesystems
  */
diff --git a/fs/xfs/xfs_healthmon.h b/fs/xfs/xfs_healthmon.h
index 07126e39281a0c..ea2d6a327dfb16 100644
--- a/fs/xfs/xfs_healthmon.h
+++ b/fs/xfs/xfs_healthmon.h
@@ -6,6 +6,35 @@
 #ifndef __XFS_HEALTHMON_H__
 #define __XFS_HEALTHMON_H__
 
+enum xfs_healthmon_type {
+	XFS_HEALTHMON_RUNNING,	/* monitor running */
+	XFS_HEALTHMON_LOST,	/* message lost */
+};
+
+enum xfs_healthmon_domain {
+	XFS_HEALTHMON_MOUNT,	/* affects the whole fs */
+};
+
+struct xfs_healthmon_event {
+	struct xfs_healthmon_event	*next;
+
+	enum xfs_healthmon_type		type;
+	enum xfs_healthmon_domain	domain;
+
+	uint64_t			time_ns;
+
+	union {
+		/* lost events */
+		struct {
+			uint64_t	lostcount;
+		};
+		/* mount */
+		struct {
+			unsigned int	flags;
+		};
+	};
+};
+
 #ifdef CONFIG_XFS_HEALTH_MONITOR
 long xfs_ioc_health_monitor(struct xfs_mount *mp,
 		struct xfs_health_monitor __user *arg);
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 4dd747bdbccab2..e122db938cc06b 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -63,6 +63,9 @@ typedef __u32 xfs_nlink_t;
 #include <linux/xattr.h>
 #include <linux/mnt_idmapping.h>
 #include <linux/debugfs.h>
+#ifdef CONFIG_XFS_HEALTH_MONITOR
+# include <linux/seq_buf.h>
+#endif
 
 #include <asm/page.h>
 #include <asm/div64.h>
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 79b8641880ab9d..17af5efee026c9
100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -103,6 +103,8 @@ struct xfs_refcount_intent; struct xfs_metadir_update; struct xfs_rtgroup; struct xfs_open_zone; +struct xfs_healthmon_event; +struct xfs_health_update_params; #define XFS_ATTR_FILTER_FLAGS \ { XFS_ATTR_ROOT, "ROOT" }, \ @@ -5908,6 +5910,175 @@ DEFINE_EVENT(xfs_freeblocks_resv_class, name, \ DEFINE_FREEBLOCKS_RESV_EVENT(xfs_freecounter_reserved); DEFINE_FREEBLOCKS_RESV_EVENT(xfs_freecounter_enospc); +#ifdef CONFIG_XFS_HEALTH_MONITOR +TRACE_EVENT(xfs_healthmon_lost_event, + TP_PROTO(const struct xfs_mount *mp, unsigned long long lost_prev), + TP_ARGS(mp, lost_prev), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned long long, lost_prev) + ), + TP_fast_assign( + __entry->dev = mp ? mp->m_super->s_dev : 0; + __entry->lost_prev = lost_prev; + ), + TP_printk("dev %d:%d lost_prev %llu", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->lost_prev) +); + +#define XFS_HEALTHMON_FLAGS_STRINGS \ + { XFS_HEALTH_MONITOR_VERBOSE, "verbose" } +#define XFS_HEALTHMON_FMT_STRINGS \ + { XFS_HEALTH_MONITOR_FMT_JSON, "json" }, \ + { XFS_HEALTH_MONITOR_FMT_CSTRUCT, "cstruct" } + +TRACE_EVENT(xfs_healthmon_create, + TP_PROTO(const struct xfs_mount *mp, u64 flags, u8 format), + TP_ARGS(mp, flags, format), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(u64, flags) + __field(u8, format) + ), + TP_fast_assign( + __entry->dev = mp ? 
mp->m_super->s_dev : 0; + __entry->flags = flags; + __entry->format = format; + ), + TP_printk("dev %d:%d flags %s format %s", + MAJOR(__entry->dev), MINOR(__entry->dev), + __print_flags(__entry->flags, "|", XFS_HEALTHMON_FLAGS_STRINGS), + __print_symbolic(__entry->format, XFS_HEALTHMON_FMT_STRINGS)) +); + +TRACE_EVENT(xfs_healthmon_copybuf, + TP_PROTO(const struct xfs_mount *mp, const struct iov_iter *iov, + const struct seq_buf *seqbuf, size_t outpos), + TP_ARGS(mp, iov, seqbuf, outpos), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(size_t, seqbuf_size) + __field(size_t, seqbuf_len) + __field(size_t, outpos) + __field(size_t, to_copy) + __field(size_t, iter_count) + ), + TP_fast_assign( + __entry->dev = mp ? mp->m_super->s_dev : 0; + __entry->seqbuf_size = seqbuf->size; + __entry->seqbuf_len = seqbuf->len; + __entry->outpos = outpos; + __entry->to_copy = seqbuf->len - outpos; + __entry->iter_count = iov_iter_count(iov); + ), + TP_printk("dev %d:%d seqsize %zu seqlen %zu out_pos %zu to_copy %zu iter_count %zu", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->seqbuf_size, + __entry->seqbuf_len, + __entry->outpos, + __entry->to_copy, + __entry->iter_count) +); + +DECLARE_EVENT_CLASS(xfs_healthmon_class, + TP_PROTO(const struct xfs_mount *mp, unsigned int events, + unsigned long long lost_prev), + TP_ARGS(mp, events, lost_prev), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned int, events) + __field(unsigned long long, lost_prev) + ), + TP_fast_assign( + __entry->dev = mp ? mp->m_super->s_dev : 0; + __entry->events = events; + __entry->lost_prev = lost_prev; + ), + TP_printk("dev %d:%d events %u lost_prev? 
%llu", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->events, + __entry->lost_prev) +); +#define DEFINE_HEALTHMON_EVENT(name) \ +DEFINE_EVENT(xfs_healthmon_class, name, \ + TP_PROTO(const struct xfs_mount *mp, unsigned int events, \ + unsigned long long lost_prev), \ + TP_ARGS(mp, events, lost_prev)) +DEFINE_HEALTHMON_EVENT(xfs_healthmon_read_start); +DEFINE_HEALTHMON_EVENT(xfs_healthmon_read_finish); +DEFINE_HEALTHMON_EVENT(xfs_healthmon_release); +DEFINE_HEALTHMON_EVENT(xfs_healthmon_unmount); + +#define XFS_HEALTHMON_TYPE_STRINGS \ + { XFS_HEALTHMON_LOST, "lost" } + +#define XFS_HEALTHMON_DOMAIN_STRINGS \ + { XFS_HEALTHMON_MOUNT, "mount" } + +TRACE_DEFINE_ENUM(XFS_HEALTHMON_LOST); + +TRACE_DEFINE_ENUM(XFS_HEALTHMON_MOUNT); + +DECLARE_EVENT_CLASS(xfs_healthmon_event_class, + TP_PROTO(const struct xfs_mount *mp, const struct xfs_healthmon_event *event), + TP_ARGS(mp, event), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned int, type) + __field(unsigned int, domain) + __field(unsigned int, mask) + __field(unsigned long long, ino) + __field(unsigned int, gen) + __field(unsigned int, group) + __field(unsigned long long, offset) + __field(unsigned long long, length) + __field(unsigned long long, lostcount) + ), + TP_fast_assign( + __entry->dev = mp ? 
mp->m_super->s_dev : 0; + __entry->type = event->type; + __entry->domain = event->domain; + __entry->mask = 0; + __entry->group = 0; + __entry->ino = 0; + __entry->gen = 0; + __entry->offset = 0; + __entry->length = 0; + __entry->lostcount = 0; + switch (__entry->domain) { + case XFS_HEALTHMON_MOUNT: + switch (__entry->type) { + case XFS_HEALTHMON_LOST: + __entry->lostcount = event->lostcount; + break; + } + break; + } + ), + TP_printk("dev %d:%d type %s domain %s mask 0x%x ino 0x%llx gen 0x%x offset 0x%llx len 0x%llx group 0x%x lost %llu", + MAJOR(__entry->dev), MINOR(__entry->dev), + __print_symbolic(__entry->type, XFS_HEALTHMON_TYPE_STRINGS), + __print_symbolic(__entry->domain, XFS_HEALTHMON_DOMAIN_STRINGS), + __entry->mask, + __entry->ino, + __entry->gen, + __entry->offset, + __entry->length, + __entry->group, + __entry->lostcount) +); +#define DEFINE_HEALTHMONEVENT_EVENT(name) \ +DEFINE_EVENT(xfs_healthmon_event_class, name, \ + TP_PROTO(const struct xfs_mount *mp, const struct xfs_healthmon_event *event), \ + TP_ARGS(mp, event)) +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_push); +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_pop); +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_format); +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_format_overflow); +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_drop); +#endif /* CONFIG_XFS_HEALTH_MONITOR */ + #endif /* _TRACE_XFS_H */ #undef TRACE_INCLUDE_PATH diff --git a/fs/xfs/libxfs/xfs_healthmon.schema.json b/fs/xfs/libxfs/xfs_healthmon.schema.json new file mode 100644 index 00000000000000..68762738b04191 --- /dev/null +++ b/fs/xfs/libxfs/xfs_healthmon.schema.json @@ -0,0 +1,129 @@ +{ + "$comment": [ + "SPDX-License-Identifier: GPL-2.0-or-later", + "Copyright (c) 2024-2025 Oracle. All Rights Reserved.", + "Author: Darrick J. Wong <djwong@kernel.org>", + "", + "This schema file describes the format of the json objects", + "readable from the fd returned by the XFS_IOC_HEALTHMON", + "ioctl." 
+ ], + + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/fs/xfs/libxfs/xfs_healthmon.schema.json", + + "title": "XFS Health Monitoring Events", + + "$comment": "Events must be one of the following types:", + "oneOf": [ + { + "$ref": "#/$events/running" + }, + { + "$ref": "#/$events/unmount" + }, + { + "$ref": "#/$events/lost" + } + ], + + "$comment": "Simple data types are defined here.", + "$defs": { + "time_ns": { + "title": "Time of Event", + "description": "Timestamp of the event, in nanoseconds since the Unix epoch.", + "type": "integer" + }, + "count": { + "title": "Count of events", + "description": "Number of events.", + "type": "integer", + "minimum": 1 + } + }, + + "$comment": "Event types are defined here.", + "$events": { + "running": { + "title": "Health Monitoring Running", + "$comment": [ + "The health monitor is actually running." + ], + "type": "object", + + "properties": { + "type": { + "const": "running" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "mount" + } + }, + + "required": [ + "type", + "time_ns", + "domain" + ] + }, + "unmount": { + "title": "Filesystem Unmounted", + "$comment": [ + "The filesystem was unmounted." + ], + "type": "object", + + "properties": { + "type": { + "const": "unmount" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "mount" + } + }, + + "required": [ + "type", + "time_ns", + "domain" + ] + }, + "lost": { + "title": "Health Monitoring Events Lost", + "$comment": [ + "Previous health monitoring events were", + "dropped due to memory allocation failures", + "or queue limits." 
+ ], + "type": "object", + + "properties": { + "type": { + "const": "lost" + }, + "count": { + "$ref": "#/$defs/count" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "mount" + } + }, + + "required": [ + "type", + "count", + "time_ns", + "domain" + ] + } + } +} diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c index 7b0d9f78b0a402..d5ca6ef8015c0e 100644 --- a/fs/xfs/xfs_healthmon.c +++ b/fs/xfs/xfs_healthmon.c @@ -40,12 +40,558 @@ * so that the queueing and processing of the events do not pin the mount and * cannot slow down the main filesystem. The healthmon object can exist past * the end of the filesystem mount. + * + * Please see the xfs_healthmon.schema.json file for a description of the + * format of the json events that are conveyed to userspace. */ +/* Allow this many events to build up in memory per healthmon fd. */ +#define XFS_HEALTHMON_MAX_EVENTS \ + (32768 / sizeof(struct xfs_healthmon_event)) + +struct flag_string { + unsigned int mask; + const char *str; +}; + struct xfs_healthmon { + /* lock for mp and eventlist */ + struct mutex lock; + + /* waiter for signalling the arrival of events */ + struct wait_queue_head wait; + + /* list of event objects */ + struct xfs_healthmon_event *first_event; + struct xfs_healthmon_event *last_event; + struct xfs_mount *mp; + + /* number of events */ + unsigned int events; + + /* + * Buffer for formatting events. New buffer data are appended to the + * end of the seqbuf, and outpos is used to determine where to start + * a copy_iter. Both are protected by inode_lock. + */ + struct seq_buf outbuf; + size_t outpos; + + /* XFS_HEALTH_MONITOR_FMT_* */ + uint8_t format; + + /* do we want all events? */ + bool verbose; + + /* did we lose previous events? 
*/ + unsigned long long lost_prev_event; + + /* total counts of events observed and lost events */ + unsigned long long total_events; + unsigned long long total_lost; }; +static inline void xfs_healthmon_bump_events(struct xfs_healthmon *hm) +{ + hm->events++; + hm->total_events++; +} + +static inline void xfs_healthmon_bump_lost(struct xfs_healthmon *hm) +{ + hm->lost_prev_event++; + hm->total_lost++; +} + +/* Remove an event from the head of the list. */ +static inline int +xfs_healthmon_free_head( + struct xfs_healthmon *hm, + struct xfs_healthmon_event *event) +{ + struct xfs_healthmon_event *head; + + mutex_lock(&hm->lock); + head = hm->first_event; + if (head != event) { + ASSERT(hm->first_event == event); + mutex_unlock(&hm->lock); + return -EFSCORRUPTED; + } + + if (hm->last_event == head) + hm->last_event = NULL; + hm->first_event = head->next; + hm->events--; + mutex_unlock(&hm->lock); + + trace_xfs_healthmon_pop(hm->mp, head); + kfree(event); + return 0; +} + +/* Push an event onto the end of the list. */ +static inline void +__xfs_healthmon_push( + struct xfs_healthmon *hm, + struct xfs_healthmon_event *event) +{ + if (!hm->first_event) + hm->first_event = event; + if (hm->last_event) + hm->last_event->next = event; + hm->last_event = event; + event->next = NULL; + xfs_healthmon_bump_events(hm); + wake_up(&hm->wait); + + trace_xfs_healthmon_push(hm->mp, event); +} + +/* Push an event onto the end of the list if we're not full. */ +static inline int +xfs_healthmon_push( + struct xfs_healthmon *hm, + struct xfs_healthmon_event *event) +{ + if (hm->events >= XFS_HEALTHMON_MAX_EVENTS) { + trace_xfs_healthmon_lost_event(hm->mp, hm->lost_prev_event); + + xfs_healthmon_bump_lost(hm); + return -ENOMEM; + } + + __xfs_healthmon_push(hm, event); + return 0; +} + +/* Create a new event or record that we failed. 
*/ +static struct xfs_healthmon_event * +xfs_healthmon_alloc( + struct xfs_healthmon *hm, + enum xfs_healthmon_type type, + enum xfs_healthmon_domain domain) +{ + struct timespec64 now; + struct xfs_healthmon_event *event; + + event = kzalloc(sizeof(*event), GFP_NOFS); + if (!event) { + trace_xfs_healthmon_lost_event(hm->mp, hm->lost_prev_event); + + xfs_healthmon_bump_lost(hm); + return NULL; + } + + event->type = type; + event->domain = domain; + ktime_get_coarse_real_ts64(&now); + event->time_ns = (now.tv_sec * NSEC_PER_SEC) + now.tv_nsec; + + return event; +} + +/* + * Before we accept an event notification from a live update hook, we need to + * clear out any previously lost events. + */ +static inline int +xfs_healthmon_start_live_update( + struct xfs_healthmon *hm) +{ + struct xfs_healthmon_event *event; + + /* If the queue is already full.... */ + if (hm->events >= XFS_HEALTHMON_MAX_EVENTS) { + trace_xfs_healthmon_lost_event(hm->mp, hm->lost_prev_event); + + if (hm->last_event && + hm->last_event->type == XFS_HEALTHMON_LOST) { + /* + * ...and the last event notes lost events, then add + * the number of events we already lost, plus one for + * this event that we're about to lose. + */ + hm->last_event->lostcount += hm->lost_prev_event + 1; + hm->lost_prev_event = 0; + } else { + /* + * ...try to create a new lost event. Add the number + * of events we previously lost, plus one for this + * event. + */ + event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_LOST, + XFS_HEALTHMON_MOUNT); + if (!event) { + xfs_healthmon_bump_lost(hm); + return -ENOMEM; + } + event->lostcount = hm->lost_prev_event + 1; + hm->lost_prev_event = 0; + + __xfs_healthmon_push(hm, event); + } + + return -ENOSPC; + } + + /* If we lost an event in the past, but the queue isn't yet full... */ + if (hm->lost_prev_event) { + /* + * ...try to create a new lost event. Add the number of events + * we previously lost, plus one for this event. 
+ */ + event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_LOST, + XFS_HEALTHMON_MOUNT); + if (!event) { + xfs_healthmon_bump_lost(hm); + return -ENOMEM; + } + event->lostcount = hm->lost_prev_event; + hm->lost_prev_event = 0; + + /* + * If adding this lost event pushes us over the limit, we're + * going to lose the current event. Note that in the lost + * event count too. + */ + if (hm->events == XFS_HEALTHMON_MAX_EVENTS - 1) + event->lostcount++; + + __xfs_healthmon_push(hm, event); + if (hm->events >= XFS_HEALTHMON_MAX_EVENTS) { + trace_xfs_healthmon_lost_event(hm->mp, + hm->lost_prev_event); + return -ENOSPC; + } + } + + /* + * The queue is not full and it is not currently the case that events + * were lost. + */ + return 0; +} + +/* Render the health update type as a string. */ +STATIC const char * +xfs_healthmon_typestring( + const struct xfs_healthmon_event *event) +{ + static const char *type_strings[] = { + [XFS_HEALTHMON_RUNNING] = "running", + [XFS_HEALTHMON_LOST] = "lost", + }; + + if (event->type >= ARRAY_SIZE(type_strings)) + return "?"; + + return type_strings[event->type]; +} + +/* Render the health domain as a string. */ +STATIC const char * +xfs_healthmon_domstring( + const struct xfs_healthmon_event *event) +{ + static const char *dom_strings[] = { + [XFS_HEALTHMON_MOUNT] = "mount", + }; + + if (event->domain >= ARRAY_SIZE(dom_strings)) + return "?"; + + return dom_strings[event->domain]; +} + +/* Convert a flags bitmap into a jsonable string. */ +static inline int +xfs_healthmon_format_flags( + struct seq_buf *outbuf, + const struct flag_string *strings, + size_t nr_strings, + unsigned int flags) +{ + const struct flag_string *p; + ssize_t ret; + unsigned int i; + bool first = true; + + for (i = 0, p = strings; i < nr_strings; i++, p++) { + if (!(p->mask & flags)) + continue; + + ret = seq_buf_printf(outbuf, "%s\"%s\"", + first ? 
"" : ", ", p->str); + if (ret < 0) + return ret; + + first = false; + flags &= ~p->mask; + } + + for (i = 0; flags != 0 && i < sizeof(flags) * NBBY; i++) { + if (!(flags & (1U << i))) + continue; + + /* json doesn't support hexadecimal notation */ + ret = seq_buf_printf(outbuf, "%s%u", + first ? "" : ", ", (1U << i)); + if (ret < 0) + return ret; + + first = false; + } + + return 0; +} + +/* Convert the event mask into a jsonable string. */ +static inline int +__xfs_healthmon_format_mask( + struct seq_buf *outbuf, + const char *descr, + const struct flag_string *strings, + size_t nr_strings, + unsigned int mask) +{ + ssize_t ret; + + ret = seq_buf_printf(outbuf, " \"%s\": [", descr); + if (ret < 0) + return ret; + + ret = xfs_healthmon_format_flags(outbuf, strings, nr_strings, mask); + if (ret < 0) + return ret; + + return seq_buf_printf(outbuf, "],\n"); +} + +#define xfs_healthmon_format_mask(o, d, s, m) \ + __xfs_healthmon_format_mask((o), (d), (s), ARRAY_SIZE(s), (m)) + +static inline void +xfs_healthmon_reset_outbuf( + struct xfs_healthmon *hm) +{ + hm->outpos = 0; + seq_buf_clear(&hm->outbuf); +} + +/* Render lost event mask as a string set */ +static int +xfs_healthmon_format_lost( + struct seq_buf *outbuf, + const struct xfs_healthmon_event *event) +{ + return seq_buf_printf(outbuf, " \"count\": %llu,\n", + event->lostcount); +} + +/* + * Format an event into json. Returns 0 if we formatted the event. If + * formatting the event overflows the buffer, returns -1 with the seqbuf len + * unchanged. 
+ */ +STATIC int +xfs_healthmon_format_json( + struct xfs_healthmon *hm, + const struct xfs_healthmon_event *event) +{ + struct seq_buf *outbuf = &hm->outbuf; + size_t old_seqlen = outbuf->len; + int ret; + + trace_xfs_healthmon_format(hm->mp, event); + + ret = seq_buf_printf(outbuf, "{\n"); + if (ret < 0) + goto overrun; + + ret = seq_buf_printf(outbuf, " \"domain\": \"%s\",\n", + xfs_healthmon_domstring(event)); + if (ret < 0) + goto overrun; + + ret = seq_buf_printf(outbuf, " \"type\": \"%s\",\n", + xfs_healthmon_typestring(event)); + if (ret < 0) + goto overrun; + + switch (event->domain) { + case XFS_HEALTHMON_MOUNT: + switch (event->type) { + case XFS_HEALTHMON_RUNNING: + /* nothing to format */ + break; + case XFS_HEALTHMON_LOST: + ret = xfs_healthmon_format_lost(outbuf, event); + break; + default: + break; + } + break; + } + if (ret < 0) + goto overrun; + + /* The last element in the json must not have a trailing comma. */ + ret = seq_buf_printf(outbuf, " \"time_ns\": %llu\n", + event->time_ns); + if (ret < 0) + goto overrun; + + ret = seq_buf_printf(outbuf, "}\n"); + if (ret < 0) + goto overrun; + + ASSERT(!seq_buf_has_overflowed(outbuf)); + return 0; +overrun: + /* + * We overflowed the buffer and could not format the event. Reset the + * seqbuf and tell the caller not to delete the event. 
+ */ + trace_xfs_healthmon_format_overflow(hm->mp, event); + outbuf->len = old_seqlen; + return -1; +} + +static const unsigned int domain_map[] = { + [XFS_HEALTHMON_MOUNT] = XFS_HEALTH_MONITOR_DOMAIN_MOUNT, +}; + +static const unsigned int type_map[] = { + [XFS_HEALTHMON_RUNNING] = XFS_HEALTH_MONITOR_TYPE_RUNNING, + [XFS_HEALTHMON_LOST] = XFS_HEALTH_MONITOR_TYPE_LOST, +}; + +/* Render event as a C structure */ +STATIC int +xfs_healthmon_format_cstruct( + struct xfs_healthmon *hm, + const struct xfs_healthmon_event *event) +{ + struct xfs_health_monitor_event hme = { + .time_ns = event->time_ns, + }; + struct seq_buf *outbuf = &hm->outbuf; + size_t old_seqlen = outbuf->len; + int ret; + + trace_xfs_healthmon_format(hm->mp, event); + + if (event->domain < 0 || event->domain >= ARRAY_SIZE(domain_map) || + event->type < 0 || event->type >= ARRAY_SIZE(type_map)) + return -EFSCORRUPTED; + + hme.domain = domain_map[event->domain]; + hme.type = type_map[event->type]; + + /* fill in the event-specific details */ + switch (event->domain) { + case XFS_HEALTHMON_MOUNT: + switch (event->type) { + case XFS_HEALTHMON_LOST: + hme.e.lost.count = event->lostcount; + break; + default: + break; + } + break; + default: + break; + } + + ret = seq_buf_putmem(outbuf, &hme, sizeof(hme)); + if (ret < 0) { + /* + * We overflowed the buffer and could not format the event. + * Reset the seqbuf and tell the caller not to delete the + * event. + */ + trace_xfs_healthmon_format_overflow(hm->mp, event); + outbuf->len = old_seqlen; + return -1; + } + + ASSERT(!seq_buf_has_overflowed(outbuf)); + return 0; +} + +/* How many bytes are waiting in the outbuf to be copied? */ +static inline size_t +xfs_healthmon_outbuf_bytes( + struct xfs_healthmon *hm) +{ + unsigned int used = seq_buf_used(&hm->outbuf); + + if (used > hm->outpos) + return used - hm->outpos; + return 0; +} + +/* + * Do we have something for userspace to do? 
This can mean unmount events, + * events pending in the queue, or pending bytes in the outbuf. + */ +static inline bool +xfs_healthmon_has_eventdata( + struct xfs_healthmon *hm) +{ + return hm->events > 0 || xfs_healthmon_outbuf_bytes(hm) > 0; +} + +/* Try to copy the rest of the outbuf to the iov iter. */ +STATIC ssize_t +xfs_healthmon_copybuf( + struct xfs_healthmon *hm, + struct iov_iter *to) +{ + size_t to_copy; + size_t w = 0; + + trace_xfs_healthmon_copybuf(hm->mp, to, &hm->outbuf, hm->outpos); + + to_copy = xfs_healthmon_outbuf_bytes(hm); + if (to_copy) { + w = copy_to_iter(hm->outbuf.buffer + hm->outpos, to_copy, to); + if (!w) + return -EFAULT; + + hm->outpos += w; + } + + /* + * Nothing left to copy? Reset the seqbuf pointers and outbuf to the + * start since there's no live data in the buffer. + */ + if (xfs_healthmon_outbuf_bytes(hm) == 0) + xfs_healthmon_reset_outbuf(hm); + return w; +} + +/* + * See if there's an event waiting for us. If the fs is no longer mounted, + * don't bother sending any more events. + */ +static inline struct xfs_healthmon_event * +xfs_healthmon_peek( + struct xfs_healthmon *hm) +{ + struct xfs_healthmon_event *event; + + mutex_lock(&hm->lock); + if (hm->mp) + event = hm->first_event; + else + event = NULL; + mutex_unlock(&hm->lock); + return event; +} + /* * Convey queued event data to userspace. First copy any remaining bytes in * the outbuf, then format the oldest event into the outbuf and copy that too. 
@@ -55,7 +601,125 @@ xfs_healthmon_read_iter( struct kiocb *iocb, struct iov_iter *to) { - return -EIO; + struct file *file = iocb->ki_filp; + struct inode *inode = file_inode(file); + struct xfs_healthmon *hm = file->private_data; + struct xfs_healthmon_event *event; + size_t copied = 0; + ssize_t ret = 0; + + /* Wait for data to become available */ + if (!(file->f_flags & O_NONBLOCK)) { + ret = wait_event_interruptible(hm->wait, + xfs_healthmon_has_eventdata(hm)); + if (ret) + return ret; + } else if (!xfs_healthmon_has_eventdata(hm)) { + return -EAGAIN; + } + + /* Allocate formatting buffer up to 64k if necessary */ + if (hm->outbuf.size == 0) { + void *outbuf; + size_t bufsize = min(65536, max(PAGE_SIZE, + iov_iter_count(to))); + + outbuf = kzalloc(bufsize, GFP_KERNEL); + if (!outbuf) { + bufsize = PAGE_SIZE; + outbuf = kzalloc(bufsize, GFP_KERNEL); + if (!outbuf) + return -ENOMEM; + } + + inode_lock(inode); + if (hm->outbuf.size == 0) { + seq_buf_init(&hm->outbuf, outbuf, bufsize); + hm->outpos = 0; + } else { + kfree(outbuf); + } + } else { + inode_lock(inode); + } + + trace_xfs_healthmon_read_start(hm->mp, hm->events, hm->lost_prev_event); + + /* + * If there's anything left in the seqbuf, copy that before formatting + * more events. + */ + ret = xfs_healthmon_copybuf(hm, to); + if (ret < 0) + goto out_unlock; + copied += ret; + + while (iov_iter_count(to) > 0) { + /* Format the next events into the outbuf until it's full. 
*/ + while ((event = xfs_healthmon_peek(hm)) != NULL) { + switch (hm->format) { + case XFS_HEALTH_MONITOR_FMT_JSON: + ret = xfs_healthmon_format_json(hm, event); + break; + case XFS_HEALTH_MONITOR_FMT_CSTRUCT: + ret = xfs_healthmon_format_cstruct(hm, event); + break; + default: + ret = -EINVAL; + goto out_unlock; + } + if (ret < 0) + break; + ret = xfs_healthmon_free_head(hm, event); + if (ret) + goto out_unlock; + } + + /* Copy it to userspace */ + ret = xfs_healthmon_copybuf(hm, to); + if (ret <= 0) + break; + + copied += ret; + } + +out_unlock: + trace_xfs_healthmon_read_finish(hm->mp, hm->events, hm->lost_prev_event); + inode_unlock(inode); + return copied ?: ret; +} + +/* Poll for available events. */ +STATIC __poll_t +xfs_healthmon_poll( + struct file *file, + struct poll_table_struct *wait) +{ + struct xfs_healthmon *hm = file->private_data; + __poll_t mask = 0; + + poll_wait(file, &hm->wait, wait); + + if (xfs_healthmon_has_eventdata(hm)) + mask |= EPOLLIN; + return mask; +} + +/* Free all events */ +STATIC void +xfs_healthmon_free_events( + struct xfs_healthmon *hm) +{ + struct xfs_healthmon_event *event, *next; + + event = hm->first_event; + while (event != NULL) { + trace_xfs_healthmon_drop(hm->mp, event); + next = event->next; + kfree(event); + event = next; + } + hm->first_event = hm->last_event = NULL; } /* Free the health monitoring information. 
*/ @@ -66,6 +730,14 @@ xfs_healthmon_release( { struct xfs_healthmon *hm = file->private_data; + trace_xfs_healthmon_release(hm->mp, hm->events, hm->lost_prev_event); + + wake_up_all(&hm->wait); + + mutex_destroy(&hm->lock); + xfs_healthmon_free_events(hm); + if (hm->outbuf.size) + kfree(hm->outbuf.buffer); kfree(hm); return 0; @@ -76,9 +748,10 @@ static inline bool xfs_healthmon_validate( const struct xfs_health_monitor *hmo) { - if (hmo->flags) + if (hmo->flags & ~XFS_HEALTH_MONITOR_ALL) return false; - if (hmo->format) + if (hmo->format != XFS_HEALTH_MONITOR_FMT_JSON && + hmo->format != XFS_HEALTH_MONITOR_FMT_CSTRUCT) return false; if (memchr_inv(&hmo->pad1, 0, sizeof(hmo->pad1))) return false; @@ -89,6 +762,19 @@ xfs_healthmon_validate( /* Emit some data about the health monitoring fd. */ #ifdef CONFIG_PROC_FS +static const char * +xfs_healthmon_format_string(const struct xfs_healthmon *hm) +{ + switch (hm->format) { + case XFS_HEALTH_MONITOR_FMT_JSON: + return "json"; + case XFS_HEALTH_MONITOR_FMT_CSTRUCT: + return "blob"; + } + + return ""; +} + static void xfs_healthmon_show_fdinfo( struct seq_file *m, @@ -96,8 +782,13 @@ xfs_healthmon_show_fdinfo( { struct xfs_healthmon *hm = file->private_data; - seq_printf(m, "state:\talive\ndev:\t%s\n", - hm->mp->m_super->s_id); + mutex_lock(&hm->lock); + seq_printf(m, "state:\talive\ndev:\t%s\nformat:\t%s\nevents:\t%llu\nlost:\t%llu\n", + hm->mp->m_super->s_id, + xfs_healthmon_format_string(hm), + hm->total_events, + hm->total_lost); + mutex_unlock(&hm->lock); } #endif @@ -107,6 +798,7 @@ static const struct file_operations xfs_healthmon_fops = { .show_fdinfo = xfs_healthmon_show_fdinfo, #endif .read_iter = xfs_healthmon_read_iter, + .poll = xfs_healthmon_poll, .release = xfs_healthmon_release, }; @@ -121,6 +813,7 @@ xfs_ioc_health_monitor( { struct xfs_health_monitor hmo; struct xfs_healthmon *hm; + struct xfs_healthmon_event *event; int fd; int ret; @@ -137,6 +830,23 @@ xfs_ioc_health_monitor( if (!hm) return -ENOMEM; 
hm->mp = mp; + hm->format = hmo.format; + + seq_buf_init(&hm->outbuf, NULL, 0); + mutex_init(&hm->lock); + init_waitqueue_head(&hm->wait); + + if (hmo.flags & XFS_HEALTH_MONITOR_VERBOSE) + hm->verbose = true; + + /* Queue up the first event that lets the client know we're running. */ + event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_RUNNING, + XFS_HEALTHMON_MOUNT); + if (!event) { + ret = -ENOMEM; + goto out_mutex; + } + __xfs_healthmon_push(hm, event); /* * Create the anonymous file. If it succeeds, the file owns hm and @@ -146,12 +856,16 @@ xfs_ioc_health_monitor( O_CLOEXEC | O_RDONLY); if (fd < 0) { ret = fd; - goto out_hm; + goto out_mutex; } + trace_xfs_healthmon_create(mp, hmo.flags, hmo.format); + return fd; -out_hm: +out_mutex: + mutex_destroy(&hm->lock); + xfs_healthmon_free_events(hm); kfree(hm); return ret; } diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index a60556dbd172ee..d42b864a3837a2 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -51,6 +51,8 @@ #include "xfs_rtgroup.h" #include "xfs_zone_alloc.h" #include "xfs_zone_priv.h" +#include "xfs_health.h" +#include "xfs_healthmon.h" /* * We include this last to have the helpers above available for the trace diff --git a/lib/seq_buf.c b/lib/seq_buf.c index f3f3436d60a940..f6a1fb46a1d6c9 100644 --- a/lib/seq_buf.c +++ b/lib/seq_buf.c @@ -245,6 +245,7 @@ int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len) seq_buf_set_overflow(s); return -1; } +EXPORT_SYMBOL_GPL(seq_buf_putmem); #define MAX_MEMHEX_BYTES 8U #define HEX_CHARS (MAX_MEMHEX_BYTES*2 + 1) ^ permalink raw reply related [flat|nested] 80+ messages in thread
* Re: [PATCH 11/19] xfs: create event queuing, formatting, and discovery infrastructure
2025-10-23 0:03 ` [PATCH 11/19] xfs: create event queuing, formatting, and discovery infrastructure Darrick J. Wong
@ 2025-10-30 16:54 ` Darrick J. Wong
0 siblings, 0 replies; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-30 16:54 UTC (permalink / raw)
To: cem; +Cc: linux-fsdevel, linux-xfs
On Wed, Oct 22, 2025 at 05:03:27PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Create the basic infrastructure that we need to report health events to
> userspace. We need a compact form for recording critical information
> about an event and queueing them; a means to notice that we've lost some
> events; and a means to format the events into something that userspace
> can handle.
>
> Here, we've chosen json to export information to userspace. The
> structured key-value nature of json gives us enormous flexibility to
> modify the schema of what we'll send to userspace because we can add new
> keys at any time. Userspace can use whatever json parsers are available
> to consume the events and will not be confused by keys they don't
> recognize.

Self-review: originally, when I started designing this new subsystem, I wanted to explore data exchange formats that are more flexible and easier for humans to read than C structures. The thought was that when we want to rev (or worse, enlarge) the event format, it ought to be trivially easy to do that in a way that doesn't break old userspace.

I looked at formats such as protobufs and capnproto. These look really nice in that extending the wire format is fairly easy: you can give it a data schema and it generates the serialization code for you, handles endianness problems, etc. The huge downside is that neither supports C all that well. Too hard; I didn't want to port either of those huge sprawling libraries first to the kernel and then again to xfsprogs.

Then I thought, how about JSON?
Javascript objects are human-readable, the kernel can emit json without much fuss (it's all just strings!), and there are plenty of interpreters for python/rust/c/etc. There's a proposed schema format for json, which means that xfs can publish a description of the events that the kernel will emit. Userspace consumers (e.g. xfsprogs/xfs_healer) can embed the same schema document and use it to validate the incoming events from the kernel, which means they can discard events that they don't understand, or garbage emitted due to bugs.

However, json has a huge weakness -- javascript is well known for its vague definition of what a number is. This makes expressing a large number rather fraught, because the runtime is free to represent a number in nearly any way it wants. Stupider ones will truncate values to word size; others will roll out doubles for uint52_t (yes, fifty-two) with the resulting loss of precision. Not good when you're dealing with discrete units. It just so happens that python's json library is smart enough to see a sequence of digits and put them in a u64 (at least on x86_64/aarch64), but an actual javascript interpreter (pasting into Firefox) isn't necessarily so clever.

It turns out that none of the proposed json schemas were ever ratified even in an open-consensus way, so json documents are still just loosely structured blobs. The parsing in userspace was also noticeably slow and memory-hungry.

As a result, I'm dropping all the json stuff from the codebase and leaving only the C structure event format. Since this is a mostly private interface, we can always rev the format in the traditional ways if we ever have to; there are 254 remaining unused format values. I wanted to document the outcome of this experiment for posterity in a public place, and now I have done so.

--D

> Note that we do NOT allow sending json back to the kernel, nor is there
> any intent to do that.
>
> Signed-off-by: "Darrick J.
Wong" <djwong@kernel.org> > --- > fs/xfs/libxfs/xfs_fs.h | 50 ++ > fs/xfs/xfs_healthmon.h | 29 + > fs/xfs/xfs_linux.h | 3 > fs/xfs/xfs_trace.h | 171 +++++++ > fs/xfs/libxfs/xfs_healthmon.schema.json | 129 +++++ > fs/xfs/xfs_healthmon.c | 728 +++++++++++++++++++++++++++++++ > fs/xfs/xfs_trace.c | 2 > lib/seq_buf.c | 1 > 8 files changed, 1106 insertions(+), 7 deletions(-) > create mode 100644 fs/xfs/libxfs/xfs_healthmon.schema.json > > > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h > index dba7896f716092..4b642eea18b5ca 100644 > --- a/fs/xfs/libxfs/xfs_fs.h > +++ b/fs/xfs/libxfs/xfs_fs.h > @@ -1003,6 +1003,45 @@ struct xfs_rtgroup_geometry { > #define XFS_RTGROUP_GEOM_SICK_RMAPBT (1U << 3) /* reverse mappings */ > #define XFS_RTGROUP_GEOM_SICK_REFCNTBT (1U << 4) /* reference counts */ > > +/* Health monitor event domains */ > + > +/* affects the whole fs */ > +#define XFS_HEALTH_MONITOR_DOMAIN_MOUNT (0) > + > +/* Health monitor event types */ > + > +/* status of the monitor itself */ > +#define XFS_HEALTH_MONITOR_TYPE_RUNNING (0) > +#define XFS_HEALTH_MONITOR_TYPE_LOST (1) > + > +/* lost events */ > +struct xfs_health_monitor_lost { > + __u64 count; > +}; > + > +struct xfs_health_monitor_event { > + /* XFS_HEALTH_MONITOR_DOMAIN_* */ > + __u32 domain; > + > + /* XFS_HEALTH_MONITOR_TYPE_* */ > + __u32 type; > + > + /* Timestamp of the event, in nanoseconds since the Unix epoch */ > + __u64 time_ns; > + > + /* > + * Details of the event. The primary clients are written in python > + * and rust, so break this up because bindgen hates anonymous structs > + * and unions. 
> + */ > + union { > + struct xfs_health_monitor_lost lost; > + } e; > + > + /* zeroes */ > + __u64 pad[2]; > +}; > + > struct xfs_health_monitor { > __u64 flags; /* flags */ > __u8 format; /* output format */ > @@ -1010,6 +1049,17 @@ struct xfs_health_monitor { > __u64 pad2[2]; /* zeroes */ > }; > > +/* Return all health status events, not just deltas */ > +#define XFS_HEALTH_MONITOR_VERBOSE (1ULL << 0) > + > +#define XFS_HEALTH_MONITOR_ALL (XFS_HEALTH_MONITOR_VERBOSE) > + > +/* Return events in a C structure */ > +#define XFS_HEALTH_MONITOR_FMT_CSTRUCT (0) > + > +/* Return events in JSON format */ > +#define XFS_HEALTH_MONITOR_FMT_JSON (1) > + > /* > * ioctl commands that are used by Linux filesystems > */ > diff --git a/fs/xfs/xfs_healthmon.h b/fs/xfs/xfs_healthmon.h > index 07126e39281a0c..ea2d6a327dfb16 100644 > --- a/fs/xfs/xfs_healthmon.h > +++ b/fs/xfs/xfs_healthmon.h > @@ -6,6 +6,35 @@ > #ifndef __XFS_HEALTHMON_H__ > #define __XFS_HEALTHMON_H__ > > +enum xfs_healthmon_type { > + XFS_HEALTHMON_RUNNING, /* monitor running */ > + XFS_HEALTHMON_LOST, /* message lost */ > +}; > + > +enum xfs_healthmon_domain { > + XFS_HEALTHMON_MOUNT, /* affects the whole fs */ > +}; > + > +struct xfs_healthmon_event { > + struct xfs_healthmon_event *next; > + > + enum xfs_healthmon_type type; > + enum xfs_healthmon_domain domain; > + > + uint64_t time_ns; > + > + union { > + /* lost events */ > + struct { > + uint64_t lostcount; > + }; > + /* mount */ > + struct { > + unsigned int flags; > + }; > + }; > +}; > + > #ifdef CONFIG_XFS_HEALTH_MONITOR > long xfs_ioc_health_monitor(struct xfs_mount *mp, > struct xfs_health_monitor __user *arg); > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h > index 4dd747bdbccab2..e122db938cc06b 100644 > --- a/fs/xfs/xfs_linux.h > +++ b/fs/xfs/xfs_linux.h > @@ -63,6 +63,9 @@ typedef __u32 xfs_nlink_t; > #include <linux/xattr.h> > #include <linux/mnt_idmapping.h> > #include <linux/debugfs.h> > +#ifdef CONFIG_XFS_HEALTH_MONITOR > +# include 
<linux/seq_buf.h> > +#endif > > #include <asm/page.h> > #include <asm/div64.h> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h > index 79b8641880ab9d..17af5efee026c9 100644 > --- a/fs/xfs/xfs_trace.h > +++ b/fs/xfs/xfs_trace.h > @@ -103,6 +103,8 @@ struct xfs_refcount_intent; > struct xfs_metadir_update; > struct xfs_rtgroup; > struct xfs_open_zone; > +struct xfs_healthmon_event; > +struct xfs_health_update_params; > > #define XFS_ATTR_FILTER_FLAGS \ > { XFS_ATTR_ROOT, "ROOT" }, \ > @@ -5908,6 +5910,175 @@ DEFINE_EVENT(xfs_freeblocks_resv_class, name, \ > DEFINE_FREEBLOCKS_RESV_EVENT(xfs_freecounter_reserved); > DEFINE_FREEBLOCKS_RESV_EVENT(xfs_freecounter_enospc); > > +#ifdef CONFIG_XFS_HEALTH_MONITOR > +TRACE_EVENT(xfs_healthmon_lost_event, > + TP_PROTO(const struct xfs_mount *mp, unsigned long long lost_prev), > + TP_ARGS(mp, lost_prev), > + TP_STRUCT__entry( > + __field(dev_t, dev) > + __field(unsigned long long, lost_prev) > + ), > + TP_fast_assign( > + __entry->dev = mp ? mp->m_super->s_dev : 0; > + __entry->lost_prev = lost_prev; > + ), > + TP_printk("dev %d:%d lost_prev %llu", > + MAJOR(__entry->dev), MINOR(__entry->dev), > + __entry->lost_prev) > +); > + > +#define XFS_HEALTHMON_FLAGS_STRINGS \ > + { XFS_HEALTH_MONITOR_VERBOSE, "verbose" } > +#define XFS_HEALTHMON_FMT_STRINGS \ > + { XFS_HEALTH_MONITOR_FMT_JSON, "json" }, \ > + { XFS_HEALTH_MONITOR_FMT_CSTRUCT, "cstruct" } > + > +TRACE_EVENT(xfs_healthmon_create, > + TP_PROTO(const struct xfs_mount *mp, u64 flags, u8 format), > + TP_ARGS(mp, flags, format), > + TP_STRUCT__entry( > + __field(dev_t, dev) > + __field(u64, flags) > + __field(u8, format) > + ), > + TP_fast_assign( > + __entry->dev = mp ? 
mp->m_super->s_dev : 0; > + __entry->flags = flags; > + __entry->format = format; > + ), > + TP_printk("dev %d:%d flags %s format %s", > + MAJOR(__entry->dev), MINOR(__entry->dev), > + __print_flags(__entry->flags, "|", XFS_HEALTHMON_FLAGS_STRINGS), > + __print_symbolic(__entry->format, XFS_HEALTHMON_FMT_STRINGS)) > +); > + > +TRACE_EVENT(xfs_healthmon_copybuf, > + TP_PROTO(const struct xfs_mount *mp, const struct iov_iter *iov, > + const struct seq_buf *seqbuf, size_t outpos), > + TP_ARGS(mp, iov, seqbuf, outpos), > + TP_STRUCT__entry( > + __field(dev_t, dev) > + __field(size_t, seqbuf_size) > + __field(size_t, seqbuf_len) > + __field(size_t, outpos) > + __field(size_t, to_copy) > + __field(size_t, iter_count) > + ), > + TP_fast_assign( > + __entry->dev = mp ? mp->m_super->s_dev : 0; > + __entry->seqbuf_size = seqbuf->size; > + __entry->seqbuf_len = seqbuf->len; > + __entry->outpos = outpos; > + __entry->to_copy = seqbuf->len - outpos; > + __entry->iter_count = iov_iter_count(iov); > + ), > + TP_printk("dev %d:%d seqsize %zu seqlen %zu out_pos %zu to_copy %zu iter_count %zu", > + MAJOR(__entry->dev), MINOR(__entry->dev), > + __entry->seqbuf_size, > + __entry->seqbuf_len, > + __entry->outpos, > + __entry->to_copy, > + __entry->iter_count) > +); > + > +DECLARE_EVENT_CLASS(xfs_healthmon_class, > + TP_PROTO(const struct xfs_mount *mp, unsigned int events, > + unsigned long long lost_prev), > + TP_ARGS(mp, events, lost_prev), > + TP_STRUCT__entry( > + __field(dev_t, dev) > + __field(unsigned int, events) > + __field(unsigned long long, lost_prev) > + ), > + TP_fast_assign( > + __entry->dev = mp ? mp->m_super->s_dev : 0; > + __entry->events = events; > + __entry->lost_prev = lost_prev; > + ), > + TP_printk("dev %d:%d events %u lost_prev? 
%llu", > + MAJOR(__entry->dev), MINOR(__entry->dev), > + __entry->events, > + __entry->lost_prev) > +); > +#define DEFINE_HEALTHMON_EVENT(name) \ > +DEFINE_EVENT(xfs_healthmon_class, name, \ > + TP_PROTO(const struct xfs_mount *mp, unsigned int events, \ > + unsigned long long lost_prev), \ > + TP_ARGS(mp, events, lost_prev)) > +DEFINE_HEALTHMON_EVENT(xfs_healthmon_read_start); > +DEFINE_HEALTHMON_EVENT(xfs_healthmon_read_finish); > +DEFINE_HEALTHMON_EVENT(xfs_healthmon_release); > +DEFINE_HEALTHMON_EVENT(xfs_healthmon_unmount); > + > +#define XFS_HEALTHMON_TYPE_STRINGS \ > + { XFS_HEALTHMON_LOST, "lost" } > + > +#define XFS_HEALTHMON_DOMAIN_STRINGS \ > + { XFS_HEALTHMON_MOUNT, "mount" } > + > +TRACE_DEFINE_ENUM(XFS_HEALTHMON_LOST); > + > +TRACE_DEFINE_ENUM(XFS_HEALTHMON_MOUNT); > + > +DECLARE_EVENT_CLASS(xfs_healthmon_event_class, > + TP_PROTO(const struct xfs_mount *mp, const struct xfs_healthmon_event *event), > + TP_ARGS(mp, event), > + TP_STRUCT__entry( > + __field(dev_t, dev) > + __field(unsigned int, type) > + __field(unsigned int, domain) > + __field(unsigned int, mask) > + __field(unsigned long long, ino) > + __field(unsigned int, gen) > + __field(unsigned int, group) > + __field(unsigned long long, offset) > + __field(unsigned long long, length) > + __field(unsigned long long, lostcount) > + ), > + TP_fast_assign( > + __entry->dev = mp ? 
mp->m_super->s_dev : 0; > + __entry->type = event->type; > + __entry->domain = event->domain; > + __entry->mask = 0; > + __entry->group = 0; > + __entry->ino = 0; > + __entry->gen = 0; > + __entry->offset = 0; > + __entry->length = 0; > + __entry->lostcount = 0; > + switch (__entry->domain) { > + case XFS_HEALTHMON_MOUNT: > + switch (__entry->type) { > + case XFS_HEALTHMON_LOST: > + __entry->lostcount = event->lostcount; > + break; > + } > + break; > + } > + ), > + TP_printk("dev %d:%d type %s domain %s mask 0x%x ino 0x%llx gen 0x%x offset 0x%llx len 0x%llx group 0x%x lost %llu", > + MAJOR(__entry->dev), MINOR(__entry->dev), > + __print_symbolic(__entry->type, XFS_HEALTHMON_TYPE_STRINGS), > + __print_symbolic(__entry->domain, XFS_HEALTHMON_DOMAIN_STRINGS), > + __entry->mask, > + __entry->ino, > + __entry->gen, > + __entry->offset, > + __entry->length, > + __entry->group, > + __entry->lostcount) > +); > +#define DEFINE_HEALTHMONEVENT_EVENT(name) \ > +DEFINE_EVENT(xfs_healthmon_event_class, name, \ > + TP_PROTO(const struct xfs_mount *mp, const struct xfs_healthmon_event *event), \ > + TP_ARGS(mp, event)) > +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_push); > +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_pop); > +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_format); > +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_format_overflow); > +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_drop); > +#endif /* CONFIG_XFS_HEALTH_MONITOR */ > + > #endif /* _TRACE_XFS_H */ > > #undef TRACE_INCLUDE_PATH > diff --git a/fs/xfs/libxfs/xfs_healthmon.schema.json b/fs/xfs/libxfs/xfs_healthmon.schema.json > new file mode 100644 > index 00000000000000..68762738b04191 > --- /dev/null > +++ b/fs/xfs/libxfs/xfs_healthmon.schema.json > @@ -0,0 +1,129 @@ > +{ > + "$comment": [ > + "SPDX-License-Identifier: GPL-2.0-or-later", > + "Copyright (c) 2024-2025 Oracle. All Rights Reserved.", > + "Author: Darrick J. 
Wong <djwong@kernel.org>", > + "", > + "This schema file describes the format of the json objects", > + "readable from the fd returned by the XFS_IOC_HEALTHMON", > + "ioctl." > + ], > + > + "$schema": "https://json-schema.org/draft/2020-12/schema", > + "$id": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/fs/xfs/libxfs/xfs_healthmon.schema.json", > + > + "title": "XFS Health Monitoring Events", > + > + "$comment": "Events must be one of the following types:", > + "oneOf": [ > + { > + "$ref": "#/$events/running" > + }, > + { > + "$ref": "#/$events/unmount" > + }, > + { > + "$ref": "#/$events/lost" > + } > + ], > + > + "$comment": "Simple data types are defined here.", > + "$defs": { > + "time_ns": { > + "title": "Time of Event", > + "description": "Timestamp of the event, in nanoseconds since the Unix epoch.", > + "type": "integer" > + }, > + "count": { > + "title": "Count of events", > + "description": "Number of events.", > + "type": "integer", > + "minimum": 1 > + } > + }, > + > + "$comment": "Event types are defined here.", > + "$events": { > + "running": { > + "title": "Health Monitoring Running", > + "$comment": [ > + "The health monitor is actually running." > + ], > + "type": "object", > + > + "properties": { > + "type": { > + "const": "running" > + }, > + "time_ns": { > + "$ref": "#/$defs/time_ns" > + }, > + "domain": { > + "const": "mount" > + } > + }, > + > + "required": [ > + "type", > + "time_ns", > + "domain" > + ] > + }, > + "unmount": { > + "title": "Filesystem Unmounted", > + "$comment": [ > + "The filesystem was unmounted." 
> + ], > + "type": "object", > + > + "properties": { > + "type": { > + "const": "unmount" > + }, > + "time_ns": { > + "$ref": "#/$defs/time_ns" > + }, > + "domain": { > + "const": "mount" > + } > + }, > + > + "required": [ > + "type", > + "time_ns", > + "domain" > + ] > + }, > + "lost": { > + "title": "Health Monitoring Events Lost", > + "$comment": [ > + "Previous health monitoring events were", > + "dropped due to memory allocation failures", > + "or queue limits." > + ], > + "type": "object", > + > + "properties": { > + "type": { > + "const": "lost" > + }, > + "count": { > + "$ref": "#/$defs/count" > + }, > + "time_ns": { > + "$ref": "#/$defs/time_ns" > + }, > + "domain": { > + "const": "mount" > + } > + }, > + > + "required": [ > + "type", > + "count", > + "time_ns", > + "domain" > + ] > + } > + } > +} > diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c > index 7b0d9f78b0a402..d5ca6ef8015c0e 100644 > --- a/fs/xfs/xfs_healthmon.c > +++ b/fs/xfs/xfs_healthmon.c > @@ -40,12 +40,558 @@ > * so that the queueing and processing of the events do not pin the mount and > * cannot slow down the main filesystem. The healthmon object can exist past > * the end of the filesystem mount. > + * > + * Please see the xfs_healthmon.schema.json file for a description of the > + * format of the json events that are conveyed to userspace. > */ > > +/* Allow this many events to build up in memory per healthmon fd. 
*/ > +#define XFS_HEALTHMON_MAX_EVENTS \ > + (32768 / sizeof(struct xfs_healthmon_event)) > + > +struct flag_string { > + unsigned int mask; > + const char *str; > +}; > + > struct xfs_healthmon { > + /* lock for mp and eventlist */ > + struct mutex lock; > + > + /* waiter for signalling the arrival of events */ > + struct wait_queue_head wait; > + > + /* list of event objects */ > + struct xfs_healthmon_event *first_event; > + struct xfs_healthmon_event *last_event; > + > struct xfs_mount *mp; > + > + /* number of events */ > + unsigned int events; > + > + /* > + * Buffer for formatting events. New buffer data are appended to the > + * end of the seqbuf, and outpos is used to determine where to start > + * a copy_iter. Both are protected by inode_lock. > + */ > + struct seq_buf outbuf; > + size_t outpos; > + > + /* XFS_HEALTH_MONITOR_FMT_* */ > + uint8_t format; > + > + /* do we want all events? */ > + bool verbose; > + > + /* did we lose previous events? */ > + unsigned long long lost_prev_event; > + > + /* total counts of events observed and lost events */ > + unsigned long long total_events; > + unsigned long long total_lost; > }; > > +static inline void xfs_healthmon_bump_events(struct xfs_healthmon *hm) > +{ > + hm->events++; > + hm->total_events++; > +} > + > +static inline void xfs_healthmon_bump_lost(struct xfs_healthmon *hm) > +{ > + hm->lost_prev_event++; > + hm->total_lost++; > +} > + > +/* Remove an event from the head of the list. 
*/ > +static inline int > +xfs_healthmon_free_head( > + struct xfs_healthmon *hm, > + struct xfs_healthmon_event *event) > +{ > + struct xfs_healthmon_event *head; > + > + mutex_lock(&hm->lock); > + head = hm->first_event; > + if (head != event) { > + ASSERT(hm->first_event == event); > + mutex_unlock(&hm->lock); > + return -EFSCORRUPTED; > + } > + > + if (hm->last_event == head) > + hm->last_event = NULL; > + hm->first_event = head->next; > + hm->events--; > + mutex_unlock(&hm->lock); > + > + trace_xfs_healthmon_pop(hm->mp, head); > + kfree(event); > + return 0; > +} > + > +/* Push an event onto the end of the list. */ > +static inline void > +__xfs_healthmon_push( > + struct xfs_healthmon *hm, > + struct xfs_healthmon_event *event) > +{ > + if (!hm->first_event) > + hm->first_event = event; > + if (hm->last_event) > + hm->last_event->next = event; > + hm->last_event = event; > + event->next = NULL; > + xfs_healthmon_bump_events(hm); > + wake_up(&hm->wait); > + > + trace_xfs_healthmon_push(hm->mp, event); > +} > + > +/* Push an event onto the end of the list if we're not full. */ > +static inline int > +xfs_healthmon_push( > + struct xfs_healthmon *hm, > + struct xfs_healthmon_event *event) > +{ > + if (hm->events >= XFS_HEALTHMON_MAX_EVENTS) { > + trace_xfs_healthmon_lost_event(hm->mp, hm->lost_prev_event); > + > + xfs_healthmon_bump_lost(hm); > + return -ENOMEM; > + } > + > + __xfs_healthmon_push(hm, event); > + return 0; > +} > + > +/* Create a new event or record that we failed. 
*/ > +static struct xfs_healthmon_event * > +xfs_healthmon_alloc( > + struct xfs_healthmon *hm, > + enum xfs_healthmon_type type, > + enum xfs_healthmon_domain domain) > +{ > + struct timespec64 now; > + struct xfs_healthmon_event *event; > + > + event = kzalloc(sizeof(*event), GFP_NOFS); > + if (!event) { > + trace_xfs_healthmon_lost_event(hm->mp, hm->lost_prev_event); > + > + xfs_healthmon_bump_lost(hm); > + return NULL; > + } > + > + event->type = type; > + event->domain = domain; > + ktime_get_coarse_real_ts64(&now); > + event->time_ns = (now.tv_sec * NSEC_PER_SEC) + now.tv_nsec; > + > + return event; > +} > + > +/* > + * Before we accept an event notification from a live update hook, we need to > + * clear out any previously lost events. > + */ > +static inline int > +xfs_healthmon_start_live_update( > + struct xfs_healthmon *hm) > +{ > + struct xfs_healthmon_event *event; > + > + /* If the queue is already full.... */ > + if (hm->events >= XFS_HEALTHMON_MAX_EVENTS) { > + trace_xfs_healthmon_lost_event(hm->mp, hm->lost_prev_event); > + > + if (hm->last_event && > + hm->last_event->type == XFS_HEALTHMON_LOST) { > + /* > + * ...and the last event notes lost events, then add > + * the number of events we already lost, plus one for > + * this event that we're about to lose. > + */ > + hm->last_event->lostcount += hm->lost_prev_event + 1; > + hm->lost_prev_event = 0; > + } else { > + /* > + * ...try to create a new lost event. Add the number > + * of events we previously lost, plus one for this > + * event. > + */ > + event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_LOST, > + XFS_HEALTHMON_MOUNT); > + if (!event) { > + xfs_healthmon_bump_lost(hm); > + return -ENOMEM; > + } > + event->lostcount = hm->lost_prev_event + 1; > + hm->lost_prev_event = 0; > + > + __xfs_healthmon_push(hm, event); > + } > + > + return -ENOSPC; > + } > + > + /* If we lost an event in the past, but the queue isn't yet full... 
*/ > + if (hm->lost_prev_event) { > + /* > + * ...try to create a new lost event. Add the number of events > + * we previously lost, plus one for this event. > + */ > + event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_LOST, > + XFS_HEALTHMON_MOUNT); > + if (!event) { > + xfs_healthmon_bump_lost(hm); > + return -ENOMEM; > + } > + event->lostcount = hm->lost_prev_event; > + hm->lost_prev_event = 0; > + > + /* > + * If adding this lost event pushes us over the limit, we're > + * going to lose the current event. Note that in the lost > + * event count too. > + */ > + if (hm->events == XFS_HEALTHMON_MAX_EVENTS - 1) > + event->lostcount++; > + > + __xfs_healthmon_push(hm, event); > + if (hm->events >= XFS_HEALTHMON_MAX_EVENTS) { > + trace_xfs_healthmon_lost_event(hm->mp, > + hm->lost_prev_event); > + return -ENOSPC; > + } > + } > + > + /* > + * The queue is not full and it is not currently the case that events > + * were lost. > + */ > + return 0; > +} > + > +/* Render the health update type as a string. */ > +STATIC const char * > +xfs_healthmon_typestring( > + const struct xfs_healthmon_event *event) > +{ > + static const char *type_strings[] = { > + [XFS_HEALTHMON_RUNNING] = "running", > + [XFS_HEALTHMON_LOST] = "lost", > + }; > + > + if (event->type >= ARRAY_SIZE(type_strings)) > + return "?"; > + > + return type_strings[event->type]; > +} > + > +/* Render the health domain as a string. */ > +STATIC const char * > +xfs_healthmon_domstring( > + const struct xfs_healthmon_event *event) > +{ > + static const char *dom_strings[] = { > + [XFS_HEALTHMON_MOUNT] = "mount", > + }; > + > + if (event->domain >= ARRAY_SIZE(dom_strings)) > + return "?"; > + > + return dom_strings[event->domain]; > +} > + > +/* Convert a flags bitmap into a jsonable string. 
*/ > +static inline int > +xfs_healthmon_format_flags( > + struct seq_buf *outbuf, > + const struct flag_string *strings, > + size_t nr_strings, > + unsigned int flags) > +{ > + const struct flag_string *p; > + ssize_t ret; > + unsigned int i; > + bool first = true; > + > + for (i = 0, p = strings; i < nr_strings; i++, p++) { > + if (!(p->mask & flags)) > + continue; > + > + ret = seq_buf_printf(outbuf, "%s\"%s\"", > + first ? "" : ", ", p->str); > + if (ret < 0) > + return ret; > + > + first = false; > + flags &= ~p->mask; > + } > + > + for (i = 0; flags != 0 && i < sizeof(flags) * NBBY; i++) { > + if (!(flags & (1U << i))) > + continue; > + > + /* json doesn't support hexadecimal notation */ > + ret = seq_buf_printf(outbuf, "%s%u", > + first ? "" : ", ", (1U << i)); > + if (ret < 0) > + return ret; > + > + first = false; > + } > + > + return 0; > +} > + > +/* Convert the event mask into a jsonable string. */ > +static inline int > +__xfs_healthmon_format_mask( > + struct seq_buf *outbuf, > + const char *descr, > + const struct flag_string *strings, > + size_t nr_strings, > + unsigned int mask) > +{ > + ssize_t ret; > + > + ret = seq_buf_printf(outbuf, " \"%s\": [", descr); > + if (ret < 0) > + return ret; > + > + ret = xfs_healthmon_format_flags(outbuf, strings, nr_strings, mask); > + if (ret < 0) > + return ret; > + > + return seq_buf_printf(outbuf, "],\n"); > +} > + > +#define xfs_healthmon_format_mask(o, d, s, m) \ > + __xfs_healthmon_format_mask((o), (d), (s), ARRAY_SIZE(s), (m)) > + > +static inline void > +xfs_healthmon_reset_outbuf( > + struct xfs_healthmon *hm) > +{ > + hm->outpos = 0; > + seq_buf_clear(&hm->outbuf); > +} > + > +/* Render lost event mask as a string set */ > +static int > +xfs_healthmon_format_lost( > + struct seq_buf *outbuf, > + const struct xfs_healthmon_event *event) > +{ > + return seq_buf_printf(outbuf, " \"count\": %llu,\n", > + event->lostcount); > +} > + > +/* > + * Format an event into json. Returns 0 if we formatted the event. 
> + * If
> + * formatting the event overflows the buffer, returns -1 with the seqbuf len
> + * unchanged.
> + */
> +STATIC int
> +xfs_healthmon_format_json(
> +	struct xfs_healthmon	*hm,
> +	const struct xfs_healthmon_event *event)
> +{
> +	struct seq_buf		*outbuf = &hm->outbuf;
> +	size_t			old_seqlen = outbuf->len;
> +	int			ret;
> +
> +	trace_xfs_healthmon_format(hm->mp, event);
> +
> +	ret = seq_buf_printf(outbuf, "{\n");
> +	if (ret < 0)
> +		goto overrun;
> +
> +	ret = seq_buf_printf(outbuf, " \"domain\": \"%s\",\n",
> +			xfs_healthmon_domstring(event));
> +	if (ret < 0)
> +		goto overrun;
> +
> +	ret = seq_buf_printf(outbuf, " \"type\": \"%s\",\n",
> +			xfs_healthmon_typestring(event));
> +	if (ret < 0)
> +		goto overrun;
> +
> +	switch (event->domain) {
> +	case XFS_HEALTHMON_MOUNT:
> +		switch (event->type) {
> +		case XFS_HEALTHMON_RUNNING:
> +			/* nothing to format */
> +			break;
> +		case XFS_HEALTHMON_LOST:
> +			ret = xfs_healthmon_format_lost(outbuf, event);
> +			break;
> +		default:
> +			break;
> +		}
> +		break;
> +	}
> +	if (ret < 0)
> +		goto overrun;
> +
> +	/* The last element in the json must not have a trailing comma. */
> +	ret = seq_buf_printf(outbuf, " \"time_ns\": %llu\n",
> +			event->time_ns);
> +	if (ret < 0)
> +		goto overrun;
> +
> +	ret = seq_buf_printf(outbuf, "}\n");
> +	if (ret < 0)
> +		goto overrun;
> +
> +	ASSERT(!seq_buf_has_overflowed(outbuf));
> +	return 0;
> +overrun:
> +	/*
> +	 * We overflowed the buffer and could not format the event.  Reset the
> +	 * seqbuf and tell the caller not to delete the event.
> +	 */
> +	trace_xfs_healthmon_format_overflow(hm->mp, event);
> +	outbuf->len = old_seqlen;
> +	return -1;
> +}
> +
> +static const unsigned int domain_map[] = {
> +	[XFS_HEALTHMON_MOUNT]	= XFS_HEALTH_MONITOR_DOMAIN_MOUNT,
> +};
> +
> +static const unsigned int type_map[] = {
> +	[XFS_HEALTHMON_RUNNING]	= XFS_HEALTH_MONITOR_TYPE_RUNNING,
> +	[XFS_HEALTHMON_LOST]	= XFS_HEALTH_MONITOR_TYPE_LOST,
> +};
> +
> +/* Render event as a C structure */
> +STATIC int
> +xfs_healthmon_format_cstruct(
> +	struct xfs_healthmon	*hm,
> +	const struct xfs_healthmon_event *event)
> +{
> +	struct xfs_health_monitor_event hme = {
> +		.time_ns	= event->time_ns,
> +	};
> +	struct seq_buf		*outbuf = &hm->outbuf;
> +	size_t			old_seqlen = outbuf->len;
> +	int			ret;
> +
> +	trace_xfs_healthmon_format(hm->mp, event);
> +
> +	if (event->domain < 0 || event->domain >= ARRAY_SIZE(domain_map) ||
> +	    event->type < 0 || event->type >= ARRAY_SIZE(type_map))
> +		return -EFSCORRUPTED;
> +
> +	hme.domain = domain_map[event->domain];
> +	hme.type = type_map[event->type];
> +
> +	/* fill in the event-specific details */
> +	switch (event->domain) {
> +	case XFS_HEALTHMON_MOUNT:
> +		switch (event->type) {
> +		case XFS_HEALTHMON_LOST:
> +			hme.e.lost.count = event->lostcount;
> +			break;
> +		default:
> +			break;
> +		}
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	ret = seq_buf_putmem(outbuf, &hme, sizeof(hme));
> +	if (ret < 0) {
> +		/*
> +		 * We overflowed the buffer and could not format the event.
> +		 * Reset the seqbuf and tell the caller not to delete the
> +		 * event.
> +		 */
> +		trace_xfs_healthmon_format_overflow(hm->mp, event);
> +		outbuf->len = old_seqlen;
> +		return -1;
> +	}
> +
> +	ASSERT(!seq_buf_has_overflowed(outbuf));
> +	return 0;
> +}
> +
> +/* How many bytes are waiting in the outbuf to be copied? */
> +static inline size_t
> +xfs_healthmon_outbuf_bytes(
> +	struct xfs_healthmon	*hm)
> +{
> +	unsigned int		used = seq_buf_used(&hm->outbuf);
> +
> +	if (used > hm->outpos)
> +		return used - hm->outpos;
> +	return 0;
> +}
> +
> +/*
> + * Do we have something for userspace to do?  This can mean unmount events,
> + * events pending in the queue, or pending bytes in the outbuf.
> + */
> +static inline bool
> +xfs_healthmon_has_eventdata(
> +	struct xfs_healthmon	*hm)
> +{
> +	return hm->events > 0 || xfs_healthmon_outbuf_bytes(hm) > 0;
> +}
> +
> +/* Try to copy the rest of the outbuf to the iov iter. */
> +STATIC ssize_t
> +xfs_healthmon_copybuf(
> +	struct xfs_healthmon	*hm,
> +	struct iov_iter		*to)
> +{
> +	size_t			to_copy;
> +	size_t			w = 0;
> +
> +	trace_xfs_healthmon_copybuf(hm->mp, to, &hm->outbuf, hm->outpos);
> +
> +	to_copy = xfs_healthmon_outbuf_bytes(hm);
> +	if (to_copy) {
> +		w = copy_to_iter(hm->outbuf.buffer + hm->outpos, to_copy, to);
> +		if (!w)
> +			return -EFAULT;
> +
> +		hm->outpos += w;
> +	}
> +
> +	/*
> +	 * Nothing left to copy?  Reset the seqbuf pointers and outbuf to the
> +	 * start since there's no live data in the buffer.
> +	 */
> +	if (xfs_healthmon_outbuf_bytes(hm) == 0)
> +		xfs_healthmon_reset_outbuf(hm);
> +	return w;
> +}
> +
> +/*
> + * See if there's an event waiting for us.  If the fs is no longer mounted,
> + * don't bother sending any more events.
> + */
> +static inline struct xfs_healthmon_event *
> +xfs_healthmon_peek(
> +	struct xfs_healthmon	*hm)
> +{
> +	struct xfs_healthmon_event *event;
> +
> +	mutex_lock(&hm->lock);
> +	if (hm->mp)
> +		event = hm->first_event;
> +	else
> +		event = NULL;
> +	mutex_unlock(&hm->lock);
> +	return event;
> +}
> +
>  /*
>   * Convey queued event data to userspace.  First copy any remaining bytes in
>   * the outbuf, then format the oldest event into the outbuf and copy that too.
>   */
> @@ -55,7 +601,125 @@ xfs_healthmon_read_iter(
>  	struct kiocb		*iocb,
>  	struct iov_iter		*to)
>  {
> -	return -EIO;
> +	struct file		*file = iocb->ki_filp;
> +	struct inode		*inode = file_inode(file);
> +	struct xfs_healthmon	*hm = file->private_data;
> +	struct xfs_healthmon_event *event;
> +	size_t			copied = 0;
> +	ssize_t			ret = 0;
> +
> +	/* Wait for data to become available */
> +	if (!(file->f_flags & O_NONBLOCK)) {
> +		ret = wait_event_interruptible(hm->wait,
> +				xfs_healthmon_has_eventdata(hm));
> +		if (ret)
> +			return ret;
> +	} else if (!xfs_healthmon_has_eventdata(hm)) {
> +		return -EAGAIN;
> +	}
> +
> +	/* Allocate formatting buffer up to 64k if necessary */
> +	if (hm->outbuf.size == 0) {
> +		void		*outbuf;
> +		size_t		bufsize = min(65536, max(PAGE_SIZE,
> +						iov_iter_count(to)));
> +
> +		outbuf = kzalloc(bufsize, GFP_KERNEL);
> +		if (!outbuf) {
> +			bufsize = PAGE_SIZE;
> +			outbuf = kzalloc(bufsize, GFP_KERNEL);
> +			if (!outbuf)
> +				return -ENOMEM;
> +		}
> +
> +		inode_lock(inode);
> +		if (hm->outbuf.size == 0) {
> +			seq_buf_init(&hm->outbuf, outbuf, bufsize);
> +			hm->outpos = 0;
> +		} else {
> +			kfree(outbuf);
> +		}
> +	} else {
> +		inode_lock(inode);
> +	}
> +
> +	trace_xfs_healthmon_read_start(hm->mp, hm->events, hm->lost_prev_event);
> +
> +	/*
> +	 * If there's anything left in the seqbuf, copy that before formatting
> +	 * more events.
> +	 */
> +	ret = xfs_healthmon_copybuf(hm, to);
> +	if (ret < 0)
> +		goto out_unlock;
> +	copied += ret;
> +
> +	while (iov_iter_count(to) > 0) {
> +		/* Format the next events into the outbuf until it's full. */
> +		while ((event = xfs_healthmon_peek(hm)) != NULL) {
> +			switch (hm->format) {
> +			case XFS_HEALTH_MONITOR_FMT_JSON:
> +				ret = xfs_healthmon_format_json(hm, event);
> +				break;
> +			case XFS_HEALTH_MONITOR_FMT_CSTRUCT:
> +				ret = xfs_healthmon_format_cstruct(hm, event);
> +				break;
> +			default:
> +				ret = -EINVAL;
> +				goto out_unlock;
> +			}
> +			if (ret < 0)
> +				break;
> +			ret = xfs_healthmon_free_head(hm, event);
> +			if (ret)
> +				goto out_unlock;
> +		}
> +
> +		/* Copy it to userspace */
> +		ret = xfs_healthmon_copybuf(hm, to);
> +		if (ret <= 0)
> +			break;
> +
> +		copied += ret;
> +	}
> +
> +out_unlock:
> +	trace_xfs_healthmon_read_finish(hm->mp, hm->events, hm->lost_prev_event);
> +	inode_unlock(inode);
> +	return copied ?: ret;
> +}
> +
> +/* Poll for available events. */
> +STATIC __poll_t
> +xfs_healthmon_poll(
> +	struct file		*file,
> +	struct poll_table_struct *wait)
> +{
> +	struct xfs_healthmon	*hm = file->private_data;
> +	__poll_t		mask = 0;
> +
> +	poll_wait(file, &hm->wait, wait);
> +
> +	if (xfs_healthmon_has_eventdata(hm))
> +		mask |= EPOLLIN;
> +	return mask;
> +}
> +
> +/* Free all events */
> +STATIC void
> +xfs_healthmon_free_events(
> +	struct xfs_healthmon	*hm)
> +{
> +	struct xfs_healthmon_event *event, *next;
> +
> +	event = hm->first_event;
> +	while (event != NULL) {
> +		trace_xfs_healthmon_drop(hm->mp, event);
> +		next = event->next;
> +		kfree(event);
> +		event = next;
> +	}
> +	hm->first_event = hm->last_event = NULL;
>  }
>  
>  /* Free the health monitoring information. */
> @@ -66,6 +730,14 @@ xfs_healthmon_release(
>  {
>  	struct xfs_healthmon	*hm = file->private_data;
>  
> +	trace_xfs_healthmon_release(hm->mp, hm->events, hm->lost_prev_event);
> +
> +	wake_up_all(&hm->wait);
> +
> +	mutex_destroy(&hm->lock);
> +	xfs_healthmon_free_events(hm);
> +	if (hm->outbuf.size)
> +		kfree(hm->outbuf.buffer);
>  	kfree(hm);
>  
>  	return 0;
> @@ -76,9 +748,10 @@ static inline bool
>  xfs_healthmon_validate(
>  	const struct xfs_health_monitor	*hmo)
>  {
> -	if (hmo->flags)
> +	if (hmo->flags & ~XFS_HEALTH_MONITOR_ALL)
>  		return false;
> -	if (hmo->format)
> +	if (hmo->format != XFS_HEALTH_MONITOR_FMT_JSON &&
> +	    hmo->format != XFS_HEALTH_MONITOR_FMT_CSTRUCT)
>  		return false;
>  	if (memchr_inv(&hmo->pad1, 0, sizeof(hmo->pad1)))
>  		return false;
> @@ -89,6 +762,19 @@ xfs_healthmon_validate(
>  
>  /* Emit some data about the health monitoring fd. */
>  #ifdef CONFIG_PROC_FS
> +static const char *
> +xfs_healthmon_format_string(const struct xfs_healthmon *hm)
> +{
> +	switch (hm->format) {
> +	case XFS_HEALTH_MONITOR_FMT_JSON:
> +		return "json";
> +	case XFS_HEALTH_MONITOR_FMT_CSTRUCT:
> +		return "blob";
> +	}
> +
> +	return "";
> +}
> +
>  static void
>  xfs_healthmon_show_fdinfo(
>  	struct seq_file		*m,
> @@ -96,8 +782,13 @@ xfs_healthmon_show_fdinfo(
>  {
>  	struct xfs_healthmon	*hm = file->private_data;
>  
> -	seq_printf(m, "state:\talive\ndev:\t%s\n",
> -			hm->mp->m_super->s_id);
> +	mutex_lock(&hm->lock);
> +	seq_printf(m, "state:\talive\ndev:\t%s\nformat:\t%s\nevents:\t%llu\nlost:\t%llu\n",
> +			hm->mp->m_super->s_id,
> +			xfs_healthmon_format_string(hm),
> +			hm->total_events,
> +			hm->total_lost);
> +	mutex_unlock(&hm->lock);
>  }
>  #endif
>  
> @@ -107,6 +798,7 @@ static const struct file_operations xfs_healthmon_fops = {
>  	.show_fdinfo		= xfs_healthmon_show_fdinfo,
>  #endif
>  	.read_iter		= xfs_healthmon_read_iter,
> +	.poll			= xfs_healthmon_poll,
>  	.release		= xfs_healthmon_release,
>  };
>  
> @@ -121,6 +813,7 @@ xfs_ioc_health_monitor(
>  {
>  	struct xfs_health_monitor	hmo;
>  	struct xfs_healthmon	*hm;
> +	struct xfs_healthmon_event *event;
>  	int			fd;
>  	int			ret;
>  
> @@ -137,6 +830,23 @@ xfs_ioc_health_monitor(
>  	if (!hm)
>  		return -ENOMEM;
>  	hm->mp = mp;
> +	hm->format = hmo.format;
> +
> +	seq_buf_init(&hm->outbuf, NULL, 0);
> +	mutex_init(&hm->lock);
> +	init_waitqueue_head(&hm->wait);
> +
> +	if (hmo.flags & XFS_HEALTH_MONITOR_VERBOSE)
> +		hm->verbose = true;
> +
> +	/* Queue up the first event that lets the client know we're running. */
> +	event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_RUNNING,
> +			XFS_HEALTHMON_MOUNT);
> +	if (!event) {
> +		ret = -ENOMEM;
> +		goto out_mutex;
> +	}
> +	__xfs_healthmon_push(hm, event);
>  
>  	/*
>  	 * Create the anonymous file.  If it succeeds, the file owns hm and
> @@ -146,12 +856,16 @@ xfs_ioc_health_monitor(
>  			O_CLOEXEC | O_RDONLY);
>  	if (fd < 0) {
>  		ret = fd;
> -		goto out_hm;
> +		goto out_mutex;
>  	}
>  
> +	trace_xfs_healthmon_create(mp, hmo.flags, hmo.format);
> +
>  	return fd;
>  
> -out_hm:
> +out_mutex:
> +	mutex_destroy(&hm->lock);
> +	xfs_healthmon_free_events(hm);
>  	kfree(hm);
>  	return ret;
>  }
> diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
> index a60556dbd172ee..d42b864a3837a2 100644
> --- a/fs/xfs/xfs_trace.c
> +++ b/fs/xfs/xfs_trace.c
> @@ -51,6 +51,8 @@
>  #include "xfs_rtgroup.h"
>  #include "xfs_zone_alloc.h"
>  #include "xfs_zone_priv.h"
> +#include "xfs_health.h"
> +#include "xfs_healthmon.h"
>  
>  /*
>   * We include this last to have the helpers above available for the trace
> diff --git a/lib/seq_buf.c b/lib/seq_buf.c
> index f3f3436d60a940..f6a1fb46a1d6c9 100644
> --- a/lib/seq_buf.c
> +++ b/lib/seq_buf.c
> @@ -245,6 +245,7 @@ int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len)
>  	seq_buf_set_overflow(s);
>  	return -1;
>  }
> +EXPORT_SYMBOL_GPL(seq_buf_putmem);
>  
>  #define MAX_MEMHEX_BYTES	8U
>  #define HEX_CHARS		(MAX_MEMHEX_BYTES*2 + 1)
> 

^ permalink raw reply	[flat|nested] 80+ messages in thread
* [PATCH 12/19] xfs: report metadata health events through healthmon
  2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong
                   ` (10 preceding siblings ...)
  2025-10-23  0:03 ` [PATCH 11/19] xfs: create event queuing, formatting, and discovery infrastructure Darrick J. Wong
@ 2025-10-23  0:03 ` Darrick J. Wong
  2025-10-23  0:04 ` [PATCH 13/19] xfs: report shutdown " Darrick J. Wong
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-23  0:03 UTC (permalink / raw)
To: cem, djwong; +Cc: linux-fsdevel, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Set up a metadata health event hook so that we can send events to
userspace as we collect information.  The unmount hook severs the weak
reference between the health monitor and the filesystem it's monitoring;
when this happens, we stop reporting events because there's no longer
any point.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_fs.h                  |   38 +++
 fs/xfs/libxfs/xfs_health.h              |    5 
 fs/xfs/xfs_healthmon.h                  |   31 ++
 fs/xfs/xfs_trace.h                      |   98 ++++++-
 fs/xfs/libxfs/xfs_healthmon.schema.json |  315 +++++++++++++++++++++
 fs/xfs/xfs_health.c                     |   67 ++++
 fs/xfs/xfs_healthmon.c                  |  465 +++++++++++++++++++++++++++++++
 7 files changed, 1010 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 4b642eea18b5ca..358abe98776d69 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -1008,17 +1008,52 @@ struct xfs_rtgroup_geometry {
 /* affects the whole fs */
 #define XFS_HEALTH_MONITOR_DOMAIN_MOUNT	(0)
 
+/* metadata health events */
+#define XFS_HEALTH_MONITOR_DOMAIN_FS	(1)
+#define XFS_HEALTH_MONITOR_DOMAIN_AG	(2)
+#define XFS_HEALTH_MONITOR_DOMAIN_INODE	(3)
+#define XFS_HEALTH_MONITOR_DOMAIN_RTGROUP (4)
+
 /* Health monitor event types */
 
 /* status of the monitor itself */
 #define XFS_HEALTH_MONITOR_TYPE_RUNNING	(0)
 #define XFS_HEALTH_MONITOR_TYPE_LOST	(1)
 
+/* metadata health events */
+#define XFS_HEALTH_MONITOR_TYPE_SICK	(2)
+#define XFS_HEALTH_MONITOR_TYPE_CORRUPT	(3)
+#define XFS_HEALTH_MONITOR_TYPE_HEALTHY	(4)
+
+/* filesystem was unmounted */
+#define XFS_HEALTH_MONITOR_TYPE_UNMOUNT	(5)
+
 /* lost events */
 struct xfs_health_monitor_lost {
	__u64	count;
 };
 
+/* fs/rt metadata */
+struct xfs_health_monitor_fs {
+	/* XFS_FSOP_GEOM_SICK_* flags */
+	__u32	mask;
+};
+
+/* ag/rtgroup metadata */
+struct xfs_health_monitor_group {
+	/* XFS_{AG,RTGROUP}_SICK_* flags */
+	__u32	mask;
+	__u32	gno;
+};
+
+/* inode metadata */
+struct xfs_health_monitor_inode {
+	/* XFS_BS_SICK_* flags */
+	__u32	mask;
+	__u32	gen;
+	__u64	ino;
+};
+
 struct xfs_health_monitor_event {
	/* XFS_HEALTH_MONITOR_DOMAIN_* */
	__u32	domain;
@@ -1036,6 +1071,9 @@ struct xfs_health_monitor_event {
	 */
	union {
		struct xfs_health_monitor_lost	lost;
+		struct xfs_health_monitor_fs	fs;
+		struct xfs_health_monitor_group	group;
+		struct xfs_health_monitor_inode	inode;
	} e;
 
	/* zeroes */
diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index 39fef33dedc6a8..9ff3bf8ba4ed8f 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -336,4 +336,9 @@ void xfs_health_hook_del(struct xfs_mount *mp, struct xfs_health_hook *hook);
 void xfs_health_hook_setup(struct xfs_health_hook *hook, notifier_fn_t mod_fn);
 #endif /* CONFIG_XFS_LIVE_HOOKS */
 
+unsigned int xfs_healthmon_inode_mask(unsigned int sick_mask);
+unsigned int xfs_healthmon_rtgroup_mask(unsigned int sick_mask);
+unsigned int xfs_healthmon_perag_mask(unsigned int sick_mask);
+unsigned int xfs_healthmon_fs_mask(unsigned int sick_mask);
+
 #endif /* __XFS_HEALTH_H__ */
diff --git a/fs/xfs/xfs_healthmon.h b/fs/xfs/xfs_healthmon.h
index ea2d6a327dfb16..3f3ba16d5af56a 100644
--- a/fs/xfs/xfs_healthmon.h
+++ b/fs/xfs/xfs_healthmon.h
@@ -9,10 +9,23 @@
 enum xfs_healthmon_type {
	XFS_HEALTHMON_RUNNING,	/* monitor running */
	XFS_HEALTHMON_LOST,	/* message lost */
+	XFS_HEALTHMON_UNMOUNT,	/* filesystem is unmounting */
+
+	/* metadata health events */
+	XFS_HEALTHMON_SICK,	/* runtime corruption observed */
+	XFS_HEALTHMON_CORRUPT,	/* fsck reported corruption */
+	XFS_HEALTHMON_HEALTHY,	/* fsck reported healthy structure */
+
 };
 
 enum xfs_healthmon_domain {
	XFS_HEALTHMON_MOUNT,	/* affects the whole fs */
+
+	/* metadata health events */
+	XFS_HEALTHMON_FS,	/* main filesystem metadata */
+	XFS_HEALTHMON_AG,	/* allocation group metadata */
+	XFS_HEALTHMON_INODE,	/* inode metadata */
+	XFS_HEALTHMON_RTGROUP,	/* realtime group metadata */
 };
 
 struct xfs_healthmon_event {
@@ -32,6 +45,24 @@ struct xfs_healthmon_event {
		struct {
			unsigned int	flags;
		};
+		/* fs/rt metadata */
+		struct {
+			/* XFS_SICK_* flags */
+			unsigned int	fsmask;
+		};
+		/* ag/rtgroup metadata */
+		struct {
+			/* XFS_SICK_* flags */
+			unsigned int	grpmask;
+			unsigned int	group;
+		};
+		/* inode metadata */
+		struct {
+			/* XFS_SICK_INO_* flags */
+			unsigned int	imask;
+			uint32_t	gen;
+			xfs_ino_t	ino;
+		};
	};
 };
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 17af5efee026c9..df09c225e13c2e 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -6011,14 +6011,30 @@ DEFINE_HEALTHMON_EVENT(xfs_healthmon_release);
 DEFINE_HEALTHMON_EVENT(xfs_healthmon_unmount);
 
 #define XFS_HEALTHMON_TYPE_STRINGS \
-	{ XFS_HEALTHMON_LOST,		"lost" }
+	{ XFS_HEALTHMON_LOST,		"lost" }, \
+	{ XFS_HEALTHMON_UNMOUNT,	"unmount" }, \
+	{ XFS_HEALTHMON_SICK,		"sick" }, \
+	{ XFS_HEALTHMON_CORRUPT,	"corrupt" }, \
+	{ XFS_HEALTHMON_HEALTHY,	"healthy" }
 
 #define XFS_HEALTHMON_DOMAIN_STRINGS \
-	{ XFS_HEALTHMON_MOUNT,		"mount" }
+	{ XFS_HEALTHMON_MOUNT,		"mount" }, \
+	{ XFS_HEALTHMON_FS,		"fs" }, \
+	{ XFS_HEALTHMON_AG,		"ag" }, \
+	{ XFS_HEALTHMON_INODE,		"inode" }, \
+	{ XFS_HEALTHMON_RTGROUP,	"rtgroup" }
 
 TRACE_DEFINE_ENUM(XFS_HEALTHMON_LOST);
+TRACE_DEFINE_ENUM(XFS_HEALTHMON_UNMOUNT);
+TRACE_DEFINE_ENUM(XFS_HEALTHMON_SICK);
+TRACE_DEFINE_ENUM(XFS_HEALTHMON_CORRUPT);
+TRACE_DEFINE_ENUM(XFS_HEALTHMON_HEALTHY);
 
 TRACE_DEFINE_ENUM(XFS_HEALTHMON_MOUNT);
+TRACE_DEFINE_ENUM(XFS_HEALTHMON_FS);
+TRACE_DEFINE_ENUM(XFS_HEALTHMON_AG);
+TRACE_DEFINE_ENUM(XFS_HEALTHMON_INODE);
+TRACE_DEFINE_ENUM(XFS_HEALTHMON_RTGROUP);
 
 DECLARE_EVENT_CLASS(xfs_healthmon_event_class,
	TP_PROTO(const struct xfs_mount *mp,
		 const struct xfs_healthmon_event *event),
@@ -6054,6 +6070,19 @@ DECLARE_EVENT_CLASS(xfs_healthmon_event_class,
			break;
		}
		break;
+	case XFS_HEALTHMON_FS:
+		__entry->mask = event->fsmask;
+		break;
+	case XFS_HEALTHMON_AG:
+	case XFS_HEALTHMON_RTGROUP:
+		__entry->mask = event->grpmask;
+		__entry->group = event->group;
+		break;
+	case XFS_HEALTHMON_INODE:
+		__entry->mask = event->imask;
+		__entry->ino = event->ino;
+		__entry->gen = event->gen;
+		break;
	}
	),
	TP_printk("dev %d:%d type %s domain %s mask 0x%x ino 0x%llx gen 0x%x offset 0x%llx len 0x%llx group 0x%x lost %llu",
@@ -6072,11 +6101,76 @@ DECLARE_EVENT_CLASS(xfs_healthmon_event_class,
 DEFINE_EVENT(xfs_healthmon_event_class, name, \
	TP_PROTO(const struct xfs_mount *mp, const struct xfs_healthmon_event *event), \
	TP_ARGS(mp, event))
+DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_insert);
 DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_push);
 DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_pop);
 DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_format);
 DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_format_overflow);
 DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_drop);
+
+#define XFS_HEALTHUP_TYPE_STRINGS \
+	{ XFS_HEALTHUP_UNMOUNT,		"unmount" }, \
+	{ XFS_HEALTHUP_SICK,		"sick" }, \
+	{ XFS_HEALTHUP_CORRUPT,		"corrupt" }, \
+	{ XFS_HEALTHUP_HEALTHY,		"healthy" }
+
+#define XFS_HEALTHUP_DOMAIN_STRINGS \
+	{ XFS_HEALTHUP_FS,		"fs" }, \
+	{ XFS_HEALTHUP_AG,		"ag" }, \
+	{ XFS_HEALTHUP_INODE,		"inode" }, \
+	{ XFS_HEALTHUP_RTGROUP,		"rtgroup" }
+
+TRACE_DEFINE_ENUM(XFS_HEALTHUP_UNMOUNT);
+TRACE_DEFINE_ENUM(XFS_HEALTHUP_SICK);
+TRACE_DEFINE_ENUM(XFS_HEALTHUP_CORRUPT);
+TRACE_DEFINE_ENUM(XFS_HEALTHUP_HEALTHY);
+
+TRACE_DEFINE_ENUM(XFS_HEALTHUP_FS);
+TRACE_DEFINE_ENUM(XFS_HEALTHUP_AG);
+TRACE_DEFINE_ENUM(XFS_HEALTHUP_INODE);
+TRACE_DEFINE_ENUM(XFS_HEALTHUP_RTGROUP);
+
+TRACE_EVENT(xfs_healthmon_metadata_hook,
+	TP_PROTO(const struct xfs_mount *mp, unsigned long type,
+		 const struct xfs_health_update_params *update,
+		 unsigned int events, unsigned long long lost_prev),
+	TP_ARGS(mp, type, update, events, lost_prev),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long, type)
+		__field(unsigned int, domain)
+		__field(unsigned int, old_mask)
+		__field(unsigned int, new_mask)
+		__field(unsigned long long, ino)
+		__field(unsigned int, gen)
+		__field(unsigned int, group)
+		__field(unsigned int, events)
+		__field(unsigned long long, lost_prev)
+	),
+	TP_fast_assign(
+		__entry->dev = mp ? mp->m_super->s_dev : 0;
+		__entry->type = type;
+		__entry->domain = update->domain;
+		__entry->old_mask = update->old_mask;
+		__entry->new_mask = update->new_mask;
+		__entry->ino = update->ino;
+		__entry->gen = update->gen;
+		__entry->group = update->group;
+		__entry->events = events;
+		__entry->lost_prev = lost_prev;
+	),
+	TP_printk("dev %d:%d type %s domain %s oldmask 0x%x newmask 0x%x ino 0x%llx gen 0x%x group 0x%x events %u lost_prev %llu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __print_symbolic(__entry->type, XFS_HEALTHUP_TYPE_STRINGS),
+		  __print_symbolic(__entry->domain, XFS_HEALTHUP_DOMAIN_STRINGS),
+		  __entry->old_mask,
+		  __entry->new_mask,
+		  __entry->ino,
+		  __entry->gen,
+		  __entry->group,
+		  __entry->events,
+		  __entry->lost_prev)
+);
 #endif /* CONFIG_XFS_HEALTH_MONITOR */
 
 #endif /* _TRACE_XFS_H */
diff --git a/fs/xfs/libxfs/xfs_healthmon.schema.json b/fs/xfs/libxfs/xfs_healthmon.schema.json
index 68762738b04191..dd78f1b71d587b 100644
--- a/fs/xfs/libxfs/xfs_healthmon.schema.json
+++ b/fs/xfs/libxfs/xfs_healthmon.schema.json
@@ -24,6 +24,18 @@
		},
		{
			"$ref": "#/$events/lost"
+		},
+		{
+			"$ref": "#/$events/fs_metadata"
+		},
+		{
+			"$ref": "#/$events/rtgroup_metadata"
+		},
+		{
+			"$ref": "#/$events/perag_metadata"
+		},
+		{
+			"$ref": "#/$events/inode_metadata"
		}
	],
 
@@ -39,6 +51,156 @@
			"description": "Number of events.",
			"type": "integer",
			"minimum": 1
+		},
+		"xfs_agnumber_t": {
+			"description": "Allocation group number",
+			"type": "integer",
+			"minimum": 0,
+			"maximum": 2147483647
+		},
+		"xfs_rgnumber_t": {
+			"description": "Realtime allocation group number",
+			"type": "integer",
+			"minimum": 0,
+			"maximum": 2147483647
+		},
+		"xfs_ino_t": {
+			"description": "Inode number",
+			"type": "integer",
+			"minimum": 1
+		},
+		"i_generation": {
+			"description": "Inode generation number",
+			"type": "integer"
+		}
+	},
+
+	"$comment": "Filesystem metadata event data are defined here.",
+	"$metadata": {
+		"status": {
+			"description": "Metadata health status",
+			"$comment": [
+				"One of:",
+				"",
+				" * sick: metadata corruption discovered",
+				"   during a runtime operation.",
+				" * corrupt: corruption discovered during",
+				"   an xfs_scrub run.",
+				" * healthy: metadata object was found to be",
+				"   ok by xfs_scrub."
+			],
+			"enum": [
+				"sick",
+				"corrupt",
+				"healthy"
+			]
+		},
+		"fs": {
+			"description": [
+				"Metadata structures that affect the entire",
+				"filesystem.  Options include:",
+				"",
+				" * fscounters: summary counters",
+				" * usrquota: user quota records",
+				" * grpquota: group quota records",
+				" * prjquota: project quota records",
+				" * quotacheck: quota counters",
+				" * nlinks: file link counts",
+				" * metadir: metadata directory",
+				" * metapath: metadata inode paths"
+			],
+			"enum": [
+				"fscounters",
+				"grpquota",
+				"metadir",
+				"metapath",
+				"nlinks",
+				"prjquota",
+				"quotacheck",
+				"usrquota"
+			]
+		},
+		"perag": {
+			"description": [
+				"Metadata structures owned by allocation",
+				"groups on the data device.  Options include:",
+				"",
+				" * agf: group space header",
+				" * agfl: per-group free block list",
+				" * agi: group inode header",
+				" * bnobt: free space by position btree",
+				" * cntbt: free space by length btree",
+				" * finobt: free inode btree",
+				" * inobt: inode btree",
+				" * rmapbt: reverse mapping btree",
+				" * refcountbt: reference count btree",
+				" * inodes: problems were recorded for",
+				"   this group's inodes, but the",
+				"   inodes themselves had to be",
+				"   reclaimed.",
+				" * super: superblock"
+			],
+			"enum": [
+				"agf",
+				"agfl",
+				"agi",
+				"bnobt",
+				"cntbt",
+				"finobt",
+				"inobt",
+				"inodes",
+				"refcountbt",
+				"rmapbt",
+				"super"
+			]
+		},
+		"rtgroup": {
+			"description": [
+				"Metadata structures owned by allocation",
+				"groups on the realtime volume.  Options",
+				"include:",
+				"",
+				" * bitmap: free space bitmap contents",
+				"   for this group",
+				" * summary: realtime free space summary file",
+				" * rmapbt: reverse mapping btree",
+				" * refcountbt: reference count btree",
+				" * super: group superblock"
+			],
+			"enum": [
+				"bitmap",
+				"summary",
+				"refcountbt",
+				"rmapbt",
+				"super"
+			]
+		},
+		"inode": {
+			"description": [
+				"Metadata structures owned by file inodes.",
+				"Options include:",
+				"",
+				" * bmapbta: attr fork",
+				" * bmapbtc: cow fork",
+				" * bmapbtd: data fork",
+				" * core: inode record",
+				" * directory: directory entries",
+				" * dirtree: directory tree problems detected",
+				" * parent: directory parent pointer",
+				" * symlink: symbolic link target",
+				" * xattr: extended attributes"
+			],
+			"enum": [
+				"bmapbta",
+				"bmapbtc",
+				"bmapbtd",
+				"core",
+				"directory",
+				"dirtree",
+				"parent",
+				"symlink",
+				"xattr"
+			]
+		}
	},
 
@@ -124,6 +286,159 @@
				"time_ns",
				"domain"
			]
+		},
+		"fs_metadata": {
+			"title": "Filesystem-wide metadata event",
+			"description": [
+				"Health status updates for filesystem-wide",
+				"metadata objects."
+			],
+			"type": "object",
+
+			"properties": {
+				"type": {
+					"$ref": "#/$metadata/status"
+				},
+				"time_ns": {
+					"$ref": "#/$defs/time_ns"
+				},
+				"domain": {
+					"const": "fs"
+				},
+				"structures": {
+					"type": "array",
+					"items": {
+						"$ref": "#/$metadata/fs"
+					},
+					"minItems": 1
+				}
+			},
+
+			"required": [
+				"type",
+				"time_ns",
+				"domain",
+				"structures"
+			]
+		},
+		"perag_metadata": {
+			"title": "Data device allocation group metadata event",
+			"description": [
+				"Health status updates for data device ",
+				"allocation group metadata."
+			],
+			"type": "object",
+
+			"properties": {
+				"type": {
+					"$ref": "#/$metadata/status"
+				},
+				"time_ns": {
+					"$ref": "#/$defs/time_ns"
+				},
+				"domain": {
+					"const": "perag"
+				},
+				"group": {
+					"$ref": "#/$defs/xfs_agnumber_t"
+				},
+				"structures": {
+					"type": "array",
+					"items": {
+						"$ref": "#/$metadata/perag"
+					},
+					"minItems": 1
+				}
+			},
+
+			"required": [
+				"type",
+				"time_ns",
+				"domain",
+				"group",
+				"structures"
+			]
+		},
+		"rtgroup_metadata": {
+			"title": "Realtime allocation group metadata event",
+			"description": [
+				"Health status updates for realtime allocation",
+				"group metadata."
+			],
+			"type": "object",
+
+			"properties": {
+				"type": {
+					"$ref": "#/$metadata/status"
+				},
+				"time_ns": {
+					"$ref": "#/$defs/time_ns"
+				},
+				"domain": {
+					"const": "rtgroup"
+				},
+				"group": {
+					"$ref": "#/$defs/xfs_rgnumber_t"
+				},
+				"structures": {
+					"type": "array",
+					"items": {
+						"$ref": "#/$metadata/rtgroup"
+					},
+					"minItems": 1
+				}
+			},
+
+			"required": [
+				"type",
+				"time_ns",
+				"domain",
+				"group",
+				"structures"
+			]
+		},
+		"inode_metadata": {
+			"title": "Inode metadata event",
+			"description": [
+				"Health status updates for inode metadata.",
+				"The inode and generation number describe the",
+				"file that is affected by the change."
+			],
+			"type": "object",
+
+			"properties": {
+				"type": {
+					"$ref": "#/$metadata/status"
+				},
+				"time_ns": {
+					"$ref": "#/$defs/time_ns"
+				},
+				"domain": {
+					"const": "inode"
+				},
+				"inumber": {
+					"$ref": "#/$defs/xfs_ino_t"
+				},
+				"generation": {
+					"$ref": "#/$defs/i_generation"
+				},
+				"structures": {
+					"type": "array",
+					"items": {
+						"$ref": "#/$metadata/inode"
+					},
+					"minItems": 1
+				}
+			},
+
+			"required": [
+				"type",
+				"time_ns",
+				"domain",
+				"inumber",
+				"generation",
+				"structures"
+			]
+		}
	}
 }
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index abf9460ae79953..70e1b098c8b449 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -607,6 +607,25 @@ xfs_fsop_geom_health(
	}
 }
 
+/*
+ * Translate XFS_SICK_FS_* into XFS_FSOP_GEOM_SICK_* except for the rt free
+ * space codes, which are sent via the rtgroup events.
+ */
+unsigned int
+xfs_healthmon_fs_mask(
+	unsigned int		sick_mask)
+{
+	const struct ioctl_sick_map *m;
+	unsigned int		ioctl_mask = 0;
+
+	for_each_sick_map(fs_map, m) {
+		if (sick_mask & m->sick_mask)
+			ioctl_mask |= m->ioctl_mask;
+	}
+
+	return ioctl_mask;
+}
+
 static const struct ioctl_sick_map ag_map[] = {
	{ XFS_SICK_AG_SB,	XFS_AG_GEOM_SICK_SB },
	{ XFS_SICK_AG_AGF,	XFS_AG_GEOM_SICK_AGF },
@@ -643,6 +662,22 @@ xfs_ag_geom_health(
	}
 }
 
+/* Translate XFS_SICK_AG_* into XFS_AG_GEOM_SICK_*. */
+unsigned int
+xfs_healthmon_perag_mask(
+	unsigned int		sick_mask)
+{
+	const struct ioctl_sick_map *m;
+	unsigned int		ioctl_mask = 0;
+
+	for_each_sick_map(ag_map, m) {
+		if (sick_mask & m->sick_mask)
+			ioctl_mask |= m->ioctl_mask;
+	}
+
+	return ioctl_mask;
+}
+
 static const struct ioctl_sick_map rtgroup_map[] = {
	{ XFS_SICK_RG_SUPER,	XFS_RTGROUP_GEOM_SICK_SUPER },
	{ XFS_SICK_RG_BITMAP,	XFS_RTGROUP_GEOM_SICK_BITMAP },
@@ -673,6 +708,22 @@ xfs_rtgroup_geom_health(
	}
 }
 
+/* Translate XFS_SICK_RG_* into XFS_RTGROUP_GEOM_SICK_*. */
+unsigned int
+xfs_healthmon_rtgroup_mask(
+	unsigned int		sick_mask)
+{
+	const struct ioctl_sick_map *m;
+	unsigned int		ioctl_mask = 0;
+
+	for_each_sick_map(rtgroup_map, m) {
+		if (sick_mask & m->sick_mask)
+			ioctl_mask |= m->ioctl_mask;
+	}
+
+	return ioctl_mask;
+}
+
 static const struct ioctl_sick_map ino_map[] = {
	{ XFS_SICK_INO_CORE,	XFS_BS_SICK_INODE },
	{ XFS_SICK_INO_BMBTD,	XFS_BS_SICK_BMBTD },
@@ -711,6 +762,22 @@ xfs_bulkstat_health(
	}
 }
 
+/* Translate XFS_SICK_INO_* into XFS_BS_SICK_*. */
+unsigned int
+xfs_healthmon_inode_mask(
+	unsigned int		sick_mask)
+{
+	const struct ioctl_sick_map *m;
+	unsigned int		ioctl_mask = 0;
+
+	for_each_sick_map(ino_map, m) {
+		if (sick_mask & m->sick_mask)
+			ioctl_mask |= m->ioctl_mask;
+	}
+
+	return ioctl_mask;
+}
+
 /* Mark a block mapping sick. */
 void
 xfs_bmap_mark_sick(
diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c
index d5ca6ef8015c0e..05c67fe40f2bac 100644
--- a/fs/xfs/xfs_healthmon.c
+++ b/fs/xfs/xfs_healthmon.c
@@ -18,6 +18,7 @@
 #include "xfs_da_btree.h"
 #include "xfs_quota_defs.h"
 #include "xfs_rtgroup.h"
+#include "xfs_health.h"
 #include "xfs_healthmon.h"
 
 #include <linux/anon_inodes.h>
@@ -65,8 +66,15 @@ struct xfs_healthmon {
	struct xfs_healthmon_event *first_event;
	struct xfs_healthmon_event *last_event;
 
+	/* live update hooks */
+	struct xfs_health_hook	hhook;
+
+	/* filesystem mount, or NULL if we've unmounted */
	struct xfs_mount	*mp;
 
+	/* filesystem type for safe cleanup of hooks; requires module_get */
+	struct file_system_type	*fstyp;
+
	/* number of events */
	unsigned int		events;
 
@@ -131,6 +139,23 @@ xfs_healthmon_free_head(
	return 0;
 }
 
+/* Insert an event onto the start of the list. */
+static inline void
+__xfs_healthmon_insert(
+	struct xfs_healthmon	*hm,
+	struct xfs_healthmon_event *event)
+{
+	event->next = hm->first_event;
+	if (!hm->first_event)
+		hm->first_event = event;
+	if (!hm->last_event)
+		hm->last_event = event;
+	xfs_healthmon_bump_events(hm);
+	wake_up(&hm->wait);
+
+	trace_xfs_healthmon_insert(hm->mp, event);
+}
+
 /* Push an event onto the end of the list. */
 static inline void
 __xfs_healthmon_push(
@@ -202,6 +227,10 @@ xfs_healthmon_start_live_update(
 {
	struct xfs_healthmon_event *event;
 
+	/* Filesystem already unmounted, do nothing. */
+	if (!hm->mp)
+		return -ESHUTDOWN;
+
	/* If the queue is already full.... */
	if (hm->events >= XFS_HEALTHMON_MAX_EVENTS) {
		trace_xfs_healthmon_lost_event(hm->mp, hm->lost_prev_event);
@@ -274,6 +303,185 @@ xfs_healthmon_start_live_update(
	return 0;
 }
 
+/* Compute the reporting mask. */
+static inline bool
+xfs_healthmon_event_mask(
+	struct xfs_healthmon	*hm,
+	enum xfs_health_update_type type,
+	const struct xfs_health_update_params *hup,
+	unsigned int		*mask)
+{
+	/* Always report unmounts. */
+	if (type == XFS_HEALTHUP_UNMOUNT)
+		return true;
+
+	/* If we want all events, return all events. */
+	if (hm->verbose) {
+		*mask = hup->new_mask;
+		return true;
+	}
+
+	switch (type) {
+	case XFS_HEALTHUP_SICK:
+		/* Always report runtime corruptions */
+		*mask = hup->new_mask;
+		break;
+	case XFS_HEALTHUP_CORRUPT:
+		/* Only report new fsck errors */
+		*mask = hup->new_mask & ~hup->old_mask;
+		break;
+	case XFS_HEALTHUP_HEALTHY:
+		/* Only report healthy metadata that got fixed */
+		*mask = hup->new_mask & hup->old_mask;
+		break;
+	case XFS_HEALTHUP_UNMOUNT:
+		/* This is here for static enum checking */
+		break;
+	}
+
+	/* If not in verbose mode, mask state has to change. */
+	return *mask != 0;
+}
+
+static inline enum xfs_healthmon_type
+health_update_to_type(
+	enum xfs_health_update_type type)
+{
+	switch (type) {
+	case XFS_HEALTHUP_SICK:
+		return XFS_HEALTHMON_SICK;
+	case XFS_HEALTHUP_CORRUPT:
+		return XFS_HEALTHMON_CORRUPT;
+	case XFS_HEALTHUP_HEALTHY:
+		return XFS_HEALTHMON_HEALTHY;
+	case XFS_HEALTHUP_UNMOUNT:
+		/* static checking */
+		break;
+	}
+	return XFS_HEALTHMON_UNMOUNT;
+}
+
+static inline enum xfs_healthmon_domain
+health_update_to_domain(
+	enum xfs_health_update_domain domain)
+{
+	switch (domain) {
+	case XFS_HEALTHUP_FS:
+		return XFS_HEALTHMON_FS;
+	case XFS_HEALTHUP_AG:
+		return XFS_HEALTHMON_AG;
+	case XFS_HEALTHUP_RTGROUP:
+		return XFS_HEALTHMON_RTGROUP;
+	case XFS_HEALTHUP_INODE:
+		/* static checking */
+		break;
+	}
+	return XFS_HEALTHMON_INODE;
+}
+
+/* Add a health event to the reporting queue. */
+STATIC int
+xfs_healthmon_metadata_hook(
+	struct notifier_block	*nb,
+	unsigned long		action,
+	void			*data)
+{
+	struct xfs_health_update_params *hup = data;
+	struct xfs_healthmon	*hm;
+	struct xfs_healthmon_event *event;
+	enum xfs_health_update_type type = action;
+	unsigned int		mask = 0;
+	int			error;
+
+	hm = container_of(nb, struct xfs_healthmon, hhook.health_hook.nb);
+
+	/* Decode event mask and skip events we don't care about. */
+	if (!xfs_healthmon_event_mask(hm, type, hup, &mask))
+		return NOTIFY_DONE;
+
+	mutex_lock(&hm->lock);
+
+	trace_xfs_healthmon_metadata_hook(hm->mp, action, hup, hm->events,
+			hm->lost_prev_event);
+
+	error = xfs_healthmon_start_live_update(hm);
+	if (error)
+		goto out_unlock;
+
+	if (type == XFS_HEALTHUP_UNMOUNT) {
+		/*
+		 * The filesystem is unmounting, so we must detach from the
+		 * mount.  After this point, the healthmon thread has no
+		 * connection to the mounted filesystem and must not touch its
+		 * hooks.
+		 */
+		trace_xfs_healthmon_unmount(hm->mp, hm->events,
+				hm->lost_prev_event);
+
+		hm->mp = NULL;
+
+		/*
+		 * Try to add an unmount message to the head of the list so
+		 * that userspace will notice the unmount.  If we can't add
+		 * the event, wake up the reader directly.
+		 */
+		event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_UNMOUNT,
+				XFS_HEALTHMON_MOUNT);
+		if (event)
+			__xfs_healthmon_insert(hm, event);
+		else
+			wake_up(&hm->wait);
+
+		goto out_unlock;
+	}
+
+	event = xfs_healthmon_alloc(hm, health_update_to_type(type),
+			health_update_to_domain(hup->domain));
+	if (!event)
+		goto out_unlock;
+
+	/* Ignore the event if it's only reporting a secondary health state. */
+	switch (event->domain) {
+	case XFS_HEALTHMON_FS:
+		event->fsmask = mask & ~XFS_SICK_FS_SECONDARY;
+		if (!event->fsmask)
+			goto out_event;
+		break;
+	case XFS_HEALTHMON_AG:
+		event->grpmask = mask & ~XFS_SICK_AG_SECONDARY;
+		if (!event->grpmask)
+			goto out_event;
+		event->group = hup->group;
+		break;
+	case XFS_HEALTHMON_RTGROUP:
+		event->grpmask = mask & ~XFS_SICK_RG_SECONDARY;
+		if (!event->grpmask)
+			goto out_event;
+		event->group = hup->group;
+		break;
+	case XFS_HEALTHMON_INODE:
+		event->imask = mask & ~XFS_SICK_INO_SECONDARY;
+		if (!event->imask)
+			goto out_event;
+		event->ino = hup->ino;
+		event->gen = hup->gen;
+		break;
+	default:
+		ASSERT(0);
+		break;
+	}
+	error = xfs_healthmon_push(hm, event);
+	if (error)
+		goto out_event;
+
+out_unlock:
+	mutex_unlock(&hm->lock);
+	return NOTIFY_DONE;
+out_event:
+	kfree(event);
+	goto out_unlock;
+}
+
 /* Render the health update type as a string. */
 STATIC const char *
 xfs_healthmon_typestring(
@@ -282,6 +490,10 @@ xfs_healthmon_typestring(
	static const char *type_strings[] = {
		[XFS_HEALTHMON_RUNNING]	= "running",
		[XFS_HEALTHMON_LOST]	= "lost",
+		[XFS_HEALTHMON_UNMOUNT]	= "unmount",
+		[XFS_HEALTHMON_SICK]	= "sick",
+		[XFS_HEALTHMON_CORRUPT]	= "corrupt",
+		[XFS_HEALTHMON_HEALTHY]	= "healthy",
	};
 
	if (event->type >= ARRAY_SIZE(type_strings))
@@ -297,6 +509,10 @@ xfs_healthmon_domstring(
 {
	static const char *dom_strings[] = {
		[XFS_HEALTHMON_MOUNT]	= "mount",
+		[XFS_HEALTHMON_FS]	= "fs",
+		[XFS_HEALTHMON_AG]	= "perag",
+		[XFS_HEALTHMON_INODE]	= "inode",
+		[XFS_HEALTHMON_RTGROUP]	= "rtgroup",
	};
 
	if (event->domain >= ARRAY_SIZE(dom_strings))
@@ -322,6 +538,11 @@ xfs_healthmon_format_flags(
		if (!(p->mask & flags))
			continue;
 
+		if (!p->str) {
+			flags &= ~p->mask;
+			continue;
+		}
+
		ret = seq_buf_printf(outbuf, "%s\"%s\"",
				first ? "" : ", ", p->str);
		if (ret < 0)
@@ -372,6 +593,113 @@ __xfs_healthmon_format_mask(
 
 #define xfs_healthmon_format_mask(o, d, s, m) \
	__xfs_healthmon_format_mask((o), (d), (s), ARRAY_SIZE(s), (m))
 
+/* Render fs sickness mask as a string set */
+static int
+xfs_healthmon_format_fs(
+	struct seq_buf		*outbuf,
+	const struct xfs_healthmon_event *event)
+{
+	static const struct flag_string mask_strings[] = {
+		{ XFS_FSOP_GEOM_SICK_COUNTERS,	"fscounters" },
+		{ XFS_FSOP_GEOM_SICK_UQUOTA,	"usrquota" },
+		{ XFS_FSOP_GEOM_SICK_GQUOTA,	"grpquota" },
+		{ XFS_FSOP_GEOM_SICK_PQUOTA,	"prjquota" },
+		{ XFS_FSOP_GEOM_SICK_QUOTACHECK, "quotacheck" },
+		{ XFS_FSOP_GEOM_SICK_NLINKS,	"nlinks" },
+		{ XFS_FSOP_GEOM_SICK_METADIR,	"metadir" },
+		{ XFS_FSOP_GEOM_SICK_METAPATH,	"metapath" },
+	};
+
+	return xfs_healthmon_format_mask(outbuf, "structures", mask_strings,
+			xfs_healthmon_fs_mask(event->fsmask));
+}
+
+/* Render rtgroup sickness mask as a string set */
+static int
+xfs_healthmon_format_rtgroup(
+	struct seq_buf		*outbuf,
+	const struct xfs_healthmon_event *event)
+{
+	static const struct flag_string mask_strings[] = {
+		{ XFS_RTGROUP_GEOM_SICK_SUPER,	"super" },
+		{ XFS_RTGROUP_GEOM_SICK_BITMAP,	"bitmap" },
+		{ XFS_RTGROUP_GEOM_SICK_SUMMARY, "summary" },
+		{ XFS_RTGROUP_GEOM_SICK_RMAPBT,	"rmapbt" },
+		{ XFS_RTGROUP_GEOM_SICK_REFCNTBT, "refcountbt" },
+	};
+	ssize_t			ret;
+
+	ret = xfs_healthmon_format_mask(outbuf, "structures", mask_strings,
+			xfs_healthmon_rtgroup_mask(event->grpmask));
+	if (ret < 0)
+		return ret;
+
+	return seq_buf_printf(outbuf, " \"group\": %u,\n",
+			event->group);
+}
+
+/* Render perag sickness mask as a string set */
+static int
+xfs_healthmon_format_ag(
+	struct seq_buf		*outbuf,
+	const struct xfs_healthmon_event *event)
+{
+	static const struct flag_string mask_strings[] = {
+		{ XFS_AG_GEOM_SICK_SB,		"super" },
+		{ XFS_AG_GEOM_SICK_AGF,		"agf" },
+		{ XFS_AG_GEOM_SICK_AGFL,	"agfl" },
+		{ XFS_AG_GEOM_SICK_AGI,		"agi" },
+		{ XFS_AG_GEOM_SICK_BNOBT,	"bnobt" },
+		{ XFS_AG_GEOM_SICK_CNTBT,	"cntbt" },
+		{ XFS_AG_GEOM_SICK_INOBT,	"inobt" },
+		{ XFS_AG_GEOM_SICK_FINOBT,	"finobt" },
+		{ XFS_AG_GEOM_SICK_RMAPBT,	"rmapbt" },
+		{ XFS_AG_GEOM_SICK_REFCNTBT,	"refcountbt" },
+		{ XFS_AG_GEOM_SICK_INODES,	"inodes" },
+	};
+	ssize_t			ret;
+
+	ret = xfs_healthmon_format_mask(outbuf, "structures", mask_strings,
+			xfs_healthmon_perag_mask(event->grpmask));
+	if (ret < 0)
+		return ret;
+
+	return seq_buf_printf(outbuf, " \"group\": %u,\n",
+			event->group);
+}
+
+/* Render inode sickness mask as a string set */
+static int
+xfs_healthmon_format_inode(
+	struct seq_buf		*outbuf,
+	const struct xfs_healthmon_event *event)
+{
+	static const struct flag_string mask_strings[] = {
+		{ XFS_BS_SICK_INODE,	"core" },
+		{ XFS_BS_SICK_BMBTD,	"bmapbtd" },
+		{ XFS_BS_SICK_BMBTA,	"bmapbta" },
+		{ XFS_BS_SICK_BMBTC,	"bmapbtc" },
+		{ XFS_BS_SICK_DIR,	"directory" },
+		{ XFS_BS_SICK_XATTR,	"xattr" },
+		{ XFS_BS_SICK_SYMLINK,	"symlink" },
+		{ XFS_BS_SICK_PARENT,	"parent" },
+		{ XFS_BS_SICK_DIRTREE,	"dirtree" },
+	};
+	ssize_t			ret;
+
+	ret = xfs_healthmon_format_mask(outbuf,
"structures", mask_strings, + xfs_healthmon_inode_mask(event->imask)); + if (ret < 0) + return ret; + + ret = seq_buf_printf(outbuf, " \"inumber\": %llu,\n", + event->ino); + if (ret < 0) + return ret; + return seq_buf_printf(outbuf, " \"generation\": %u,\n", + event->gen); +} + static inline void xfs_healthmon_reset_outbuf( struct xfs_healthmon *hm) @@ -433,6 +761,18 @@ xfs_healthmon_format_json( break; } break; + case XFS_HEALTHMON_FS: + ret = xfs_healthmon_format_fs(outbuf, event); + break; + case XFS_HEALTHMON_RTGROUP: + ret = xfs_healthmon_format_rtgroup(outbuf, event); + break; + case XFS_HEALTHMON_AG: + ret = xfs_healthmon_format_ag(outbuf, event); + break; + case XFS_HEALTHMON_INODE: + ret = xfs_healthmon_format_inode(outbuf, event); + break; } if (ret < 0) goto overrun; @@ -461,11 +801,19 @@ xfs_healthmon_format_json( static const unsigned int domain_map[] = { [XFS_HEALTHMON_MOUNT] = XFS_HEALTH_MONITOR_DOMAIN_MOUNT, + [XFS_HEALTHMON_FS] = XFS_HEALTH_MONITOR_DOMAIN_FS, + [XFS_HEALTHMON_AG] = XFS_HEALTH_MONITOR_DOMAIN_AG, + [XFS_HEALTHMON_INODE] = XFS_HEALTH_MONITOR_DOMAIN_INODE, + [XFS_HEALTHMON_RTGROUP] = XFS_HEALTH_MONITOR_DOMAIN_RTGROUP, }; static const unsigned int type_map[] = { [XFS_HEALTHMON_RUNNING] = XFS_HEALTH_MONITOR_TYPE_RUNNING, [XFS_HEALTHMON_LOST] = XFS_HEALTH_MONITOR_TYPE_LOST, + [XFS_HEALTHMON_SICK] = XFS_HEALTH_MONITOR_TYPE_SICK, + [XFS_HEALTHMON_CORRUPT] = XFS_HEALTH_MONITOR_TYPE_CORRUPT, + [XFS_HEALTHMON_HEALTHY] = XFS_HEALTH_MONITOR_TYPE_HEALTHY, + [XFS_HEALTHMON_UNMOUNT] = XFS_HEALTH_MONITOR_TYPE_UNMOUNT, }; /* Render event as a C structure */ @@ -501,6 +849,22 @@ xfs_healthmon_format_cstruct( break; } break; + case XFS_HEALTHMON_FS: + hme.e.fs.mask = xfs_healthmon_fs_mask(event->fsmask); + break; + case XFS_HEALTHMON_RTGROUP: + hme.e.group.mask = xfs_healthmon_rtgroup_mask(event->grpmask); + hme.e.group.gno = event->group; + break; + case XFS_HEALTHMON_AG: + hme.e.group.mask = xfs_healthmon_perag_mask(event->grpmask); + 
hme.e.group.gno = event->group; + break; + case XFS_HEALTHMON_INODE: + hme.e.inode.mask = xfs_healthmon_inode_mask(event->imask); + hme.e.inode.ino = event->ino; + hme.e.inode.gen = event->gen; + break; default: break; } @@ -541,7 +905,7 @@ static inline bool xfs_healthmon_has_eventdata( struct xfs_healthmon *hm) { - return hm->events > 0 || xfs_healthmon_outbuf_bytes(hm) > 0; + return !hm->mp || hm->events > 0 || xfs_healthmon_outbuf_bytes(hm) > 0; } /* Try to copy the rest of the outbuf to the iov iter. */ @@ -584,10 +948,16 @@ xfs_healthmon_peek( struct xfs_healthmon_event *event; mutex_lock(&hm->lock); + event = hm->first_event; if (hm->mp) - event = hm->first_event; - else - event = NULL; + goto done; + + /* If the filesystem is unmounted, only return the unmount event */ + if (event && event->type == XFS_HEALTHMON_UNMOUNT) + goto done; + event = NULL; + +done: mutex_unlock(&hm->lock); return event; } @@ -722,6 +1092,58 @@ xfs_healthmon_free_events( hm->first_event = hm->last_event = NULL; } +/* + * Detach all filesystem hooks that were set up for a health monitor. Only + * call this from iterate_super*. + */ +STATIC void +xfs_healthmon_detach_hooks( + struct super_block *sb, + void *arg) +{ + struct xfs_healthmon *hm = arg; + + mutex_lock(&hm->lock); + + /* + * Because health monitors have a weak reference to the filesystem + * they're monitoring, the hook deletions below must not race against + * that filesystem being unmounted because that could lead to UAF + * errors. + * + * If hm->mp is NULL, the health unmount hook already ran and the hook + * chain head (contained within the xfs_mount structure) is gone. Do + * not detach any hooks; just let them get freed when the healthmon + * object is torn down. + */ + if (!hm->mp) + goto out_unlock; + + /* + * Otherwise, the caller gave us a non-dying @sb with s_umount held in + * shared mode, which means that @sb cannot be running through + * deactivate_locked_super and cannot be freed. 
It's safe to compare + * @sb against the super that we snapshotted when we set up the health + * monitor. + */ + if (hm->mp->m_super != sb) + goto out_unlock; + + mutex_unlock(&hm->lock); + + /* + * Now we know that the filesystem @hm->mp is active and cannot be + * deactivated until this function returns. Unmount events are sent + * through the health monitoring subsystem from xfs_fs_put_super, so + * it is now time to detach the hooks. + */ + xfs_health_hook_del(hm->mp, &hm->hhook); + return; + +out_unlock: + mutex_unlock(&hm->lock); +} + /* Free the health monitoring information. */ STATIC int xfs_healthmon_release( @@ -734,6 +1156,9 @@ xfs_healthmon_release( wake_up_all(&hm->wait); + iterate_supers_type(hm->fstyp, xfs_healthmon_detach_hooks, hm); + xfs_health_hook_disable(); + mutex_destroy(&hm->lock); xfs_healthmon_free_events(hm); if (hm->outbuf.size) @@ -783,11 +1208,18 @@ xfs_healthmon_show_fdinfo( struct xfs_healthmon *hm = file->private_data; mutex_lock(&hm->lock); + if (!hm->mp) { + seq_printf(m, "state:\tdead\n"); + goto out_unlock; + } + seq_printf(m, "state:\talive\ndev:\t%s\nformat:\t%s\nevents:\t%llu\nlost:\t%llu\n", hm->mp->m_super->s_id, xfs_healthmon_format_string(hm), hm->total_events, hm->total_lost); + +out_unlock: mutex_unlock(&hm->lock); } #endif @@ -832,6 +1264,13 @@ xfs_ioc_health_monitor( hm->mp = mp; hm->format = hmo.format; + /* + * Since we already got a ref to the module, take a reference to the + * fstype to make it easier to detach the hooks when we tear things + * down later. + */ + hm->fstyp = mp->m_super->s_type; + seq_buf_init(&hm->outbuf, NULL, 0); mutex_init(&hm->lock); init_waitqueue_head(&hm->wait); @@ -839,12 +1278,21 @@ xfs_ioc_health_monitor( if (hmo.flags & XFS_HEALTH_MONITOR_VERBOSE) hm->verbose = true; + /* Enable hooks to receive events, generally. */ + xfs_health_hook_enable(); + + /* Attach specific event hooks to this monitor. 
*/ + xfs_health_hook_setup(&hm->hhook, xfs_healthmon_metadata_hook); + ret = xfs_health_hook_add(mp, &hm->hhook); + if (ret) + goto out_hooks; + /* Queue up the first event that lets the client know we're running. */ event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_RUNNING, XFS_HEALTHMON_MOUNT); if (!event) { ret = -ENOMEM; - goto out_mutex; + goto out_healthhook; } __xfs_healthmon_push(hm, event); @@ -856,14 +1304,17 @@ xfs_ioc_health_monitor( O_CLOEXEC | O_RDONLY); if (fd < 0) { ret = fd; - goto out_mutex; + goto out_healthhook; } trace_xfs_healthmon_create(mp, hmo.flags, hmo.format); return fd; -out_mutex: +out_healthhook: + xfs_health_hook_del(mp, &hm->hhook); +out_hooks: + xfs_health_hook_disable(); mutex_destroy(&hm->lock); xfs_healthmon_free_events(hm); kfree(hm); ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 13/19] xfs: report shutdown events through healthmon
  2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong
                     ` (11 preceding siblings ...)
  2025-10-23  0:03 ` [PATCH 12/19] xfs: report metadata health events through healthmon Darrick J. Wong
@ 2025-10-23  0:04 ` Darrick J. Wong
  2025-10-23  0:04 ` [PATCH 14/19] xfs: report media errors " Darrick J. Wong
                     ` (5 subsequent siblings)
  18 siblings, 0 replies; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-23  0:04 UTC (permalink / raw)
  To: cem, djwong; +Cc: linux-fsdevel, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Set up a shutdown hook so that we can send notifications to userspace.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_fs.h                  |   18 +++++
 fs/xfs/xfs_healthmon.h                  |    5 +
 fs/xfs/xfs_trace.h                      |   28 +++++++
 fs/xfs/libxfs/xfs_healthmon.schema.json |   62 ++++++++++++++++
 fs/xfs/xfs_healthmon.c                  |  119 ++++++++++++++++++++++++++++++-
 5 files changed, 229 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 358abe98776d69..918362a7294f27 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -1028,6 +1028,9 @@ struct xfs_rtgroup_geometry {
 /* filesystem was unmounted */
 #define XFS_HEALTH_MONITOR_TYPE_UNMOUNT		(5)
 
+/* filesystem shutdown */
+#define XFS_HEALTH_MONITOR_TYPE_SHUTDOWN	(6)
+
 /* lost events */
 struct xfs_health_monitor_lost {
 	__u64	count;
@@ -1054,6 +1057,20 @@ struct xfs_health_monitor_inode {
 	__u64	ino;
 };
 
+/* shutdown reasons */
+#define XFS_HEALTH_SHUTDOWN_META_IO_ERROR	(1u << 0)
+#define XFS_HEALTH_SHUTDOWN_LOG_IO_ERROR	(1u << 1)
+#define XFS_HEALTH_SHUTDOWN_FORCE_UMOUNT	(1u << 2)
+#define XFS_HEALTH_SHUTDOWN_CORRUPT_INCORE	(1u << 3)
+#define XFS_HEALTH_SHUTDOWN_CORRUPT_ONDISK	(1u << 4)
+#define XFS_HEALTH_SHUTDOWN_DEVICE_REMOVED	(1u << 5)
+
+/* shutdown */
+struct xfs_health_monitor_shutdown {
+	/* XFS_HEALTH_SHUTDOWN_* flags */
+	__u32	reasons;
+};
+
 struct xfs_health_monitor_event {
 	/* XFS_HEALTH_MONITOR_DOMAIN_* */
 	__u32	domain;
@@ -1074,6 +1091,7 @@ struct xfs_health_monitor_event {
 		struct xfs_health_monitor_fs	fs;
 		struct xfs_health_monitor_group	group;
 		struct xfs_health_monitor_inode	inode;
+		struct xfs_health_monitor_shutdown shutdown;
 	} e;
 
 	/* zeroes */
diff --git a/fs/xfs/xfs_healthmon.h b/fs/xfs/xfs_healthmon.h
index 3f3ba16d5af56a..a82a684bbc0e03 100644
--- a/fs/xfs/xfs_healthmon.h
+++ b/fs/xfs/xfs_healthmon.h
@@ -11,6 +11,9 @@ enum xfs_healthmon_type {
 	XFS_HEALTHMON_LOST,	/* message lost */
 	XFS_HEALTHMON_UNMOUNT,	/* filesystem is unmounting */
 
+	/* filesystem shutdown */
+	XFS_HEALTHMON_SHUTDOWN,
+
 	/* metadata health events */
 	XFS_HEALTHMON_SICK,	/* runtime corruption observed */
 	XFS_HEALTHMON_CORRUPT,	/* fsck reported corruption */
@@ -41,7 +44,7 @@ struct xfs_healthmon_event {
 		struct {
 			uint64_t	lostcount;
 		};
-		/* mount */
+		/* shutdown */
 		struct {
 			unsigned int	flags;
 		};
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index df09c225e13c2e..e39138293c2782 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -6010,8 +6010,32 @@ DEFINE_HEALTHMON_EVENT(xfs_healthmon_read_finish);
 DEFINE_HEALTHMON_EVENT(xfs_healthmon_release);
 DEFINE_HEALTHMON_EVENT(xfs_healthmon_unmount);
 
+TRACE_EVENT(xfs_healthmon_shutdown_hook,
+	TP_PROTO(const struct xfs_mount *mp, uint32_t shutdown_flags,
+		 unsigned int events, unsigned long long lost_prev),
+	TP_ARGS(mp, shutdown_flags, events, lost_prev),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(uint32_t, shutdown_flags)
+		__field(unsigned int, events)
+		__field(unsigned long long, lost_prev)
+	),
+	TP_fast_assign(
+		__entry->dev = mp ? mp->m_super->s_dev : 0;
+		__entry->shutdown_flags = shutdown_flags;
+		__entry->events = events;
+		__entry->lost_prev = lost_prev;
+	),
+	TP_printk("dev %d:%d shutdown_flags %s events %u lost_prev? %llu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __print_flags(__entry->shutdown_flags, "|", XFS_SHUTDOWN_STRINGS),
+		  __entry->events,
+		  __entry->lost_prev)
+);
+
 #define XFS_HEALTHMON_TYPE_STRINGS \
 	{ XFS_HEALTHMON_LOST,		"lost" }, \
+	{ XFS_HEALTHMON_SHUTDOWN,	"shutdown" }, \
 	{ XFS_HEALTHMON_UNMOUNT,	"unmount" }, \
 	{ XFS_HEALTHMON_SICK,		"sick" }, \
 	{ XFS_HEALTHMON_CORRUPT,	"corrupt" }, \
@@ -6025,6 +6049,7 @@ DEFINE_HEALTHMON_EVENT(xfs_healthmon_unmount);
 	{ XFS_HEALTHMON_RTGROUP,	"rtgroup" }
 
 TRACE_DEFINE_ENUM(XFS_HEALTHMON_LOST);
+TRACE_DEFINE_ENUM(XFS_HEALTHMON_SHUTDOWN);
 TRACE_DEFINE_ENUM(XFS_HEALTHMON_UNMOUNT);
 TRACE_DEFINE_ENUM(XFS_HEALTHMON_SICK);
 TRACE_DEFINE_ENUM(XFS_HEALTHMON_CORRUPT);
@@ -6065,6 +6090,9 @@ DECLARE_EVENT_CLASS(xfs_healthmon_event_class,
 		switch (__entry->domain) {
 		case XFS_HEALTHMON_MOUNT:
 			switch (__entry->type) {
+			case XFS_HEALTHMON_SHUTDOWN:
+				__entry->mask = event->flags;
+				break;
 			case XFS_HEALTHMON_LOST:
 				__entry->lostcount = event->lostcount;
 				break;
diff --git a/fs/xfs/libxfs/xfs_healthmon.schema.json b/fs/xfs/libxfs/xfs_healthmon.schema.json
index dd78f1b71d587b..1657ccc482edff 100644
--- a/fs/xfs/libxfs/xfs_healthmon.schema.json
+++ b/fs/xfs/libxfs/xfs_healthmon.schema.json
@@ -36,6 +36,9 @@
 		},
 		{
 			"$ref": "#/$events/inode_metadata"
+		},
+		{
+			"$ref": "#/$events/shutdown"
 		}
 	],
 
@@ -204,6 +207,31 @@
 		}
 	},
 
+	"$comment": "Shutdown event data are defined here.",
+	"$shutdown": {
+		"reason": {
+			"description": [
+				"Reason for a filesystem to shut down.",
+				"Options include:",
+				"",
+				" * corrupt_incore: in-memory corruption",
+				" * corrupt_ondisk: on-disk corruption",
+				" * device_removed: device removed",
+				" * force_umount: userspace asked for it",
+				" * log_ioerr: log write IO error",
+				" * meta_ioerr: metadata writeback IO error"
+			],
+			"enum": [
+				"corrupt_incore",
+				"corrupt_ondisk",
+				"device_removed",
+				"force_umount",
+				"log_ioerr",
+				"meta_ioerr"
+			]
+		}
+	},
+
 	"$comment": "Event types are defined here.",
 	"$events": {
 		"running": {
@@ -439,6 +467,40 @@
 				"generation",
 				"structures"
 			]
+		},
+		"shutdown": {
+			"title": "Abnormal Shutdown Event",
+			"description": [
+				"The filesystem went offline due to",
+				"unrecoverable errors."
+			],
+			"type": "object",
+
+			"properties": {
+				"type": {
+					"const": "shutdown"
+				},
+				"time_ns": {
+					"$ref": "#/$defs/time_ns"
+				},
+				"domain": {
+					"const": "mount"
+				},
+				"reasons": {
+					"type": "array",
+					"items": {
+						"$ref": "#/$shutdown/reason"
+					},
+					"minItems": 1
+				}
+			},
+
+			"required": [
+				"type",
+				"time_ns",
+				"domain",
+				"reasons"
+			]
+		}
 	}
 }
diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c
index 05c67fe40f2bac..76de516708e8f9 100644
--- a/fs/xfs/xfs_healthmon.c
+++ b/fs/xfs/xfs_healthmon.c
@@ -20,6 +20,7 @@
 #include "xfs_rtgroup.h"
 #include "xfs_health.h"
 #include "xfs_healthmon.h"
+#include "xfs_fsops.h"
 
 #include <linux/anon_inodes.h>
 #include <linux/eventpoll.h>
@@ -67,6 +68,7 @@ struct xfs_healthmon {
 	struct xfs_healthmon_event *last_event;
 
 	/* live update hooks */
+	struct xfs_shutdown_hook shook;
 	struct xfs_health_hook	hhook;
 
 	/* filesystem mount, or NULL if we've unmounted */
@@ -482,6 +484,43 @@ xfs_healthmon_metadata_hook(
 	goto out_unlock;
 }
 
+/* Add a shutdown event to the reporting queue. */
+STATIC int
+xfs_healthmon_shutdown_hook(
+	struct notifier_block	*nb,
+	unsigned long		action,
+	void			*data)
+{
+	struct xfs_healthmon	*hm;
+	struct xfs_healthmon_event *event;
+	int			error;
+
+	hm = container_of(nb, struct xfs_healthmon, shook.shutdown_hook.nb);
+
+	mutex_lock(&hm->lock);
+
+	trace_xfs_healthmon_shutdown_hook(hm->mp, action, hm->events,
+			hm->lost_prev_event);
+
+	error = xfs_healthmon_start_live_update(hm);
+	if (error)
+		goto out_unlock;
+
+	event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_SHUTDOWN,
+			XFS_HEALTHMON_MOUNT);
+	if (!event)
+		goto out_unlock;
+
+	event->flags = action;
+	error = xfs_healthmon_push(hm, event);
+	if (error)
+		kfree(event);
+
+out_unlock:
+	mutex_unlock(&hm->lock);
+	return NOTIFY_DONE;
+}
+
 /* Render the health update type as a string. */
 STATIC const char *
 xfs_healthmon_typestring(
@@ -490,6 +529,7 @@ xfs_healthmon_typestring(
 	static const char *type_strings[] = {
 		[XFS_HEALTHMON_RUNNING]		= "running",
 		[XFS_HEALTHMON_LOST]		= "lost",
+		[XFS_HEALTHMON_SHUTDOWN]	= "shutdown",
 		[XFS_HEALTHMON_UNMOUNT]		= "unmount",
 		[XFS_HEALTHMON_SICK]		= "sick",
 		[XFS_HEALTHMON_CORRUPT]		= "corrupt",
@@ -700,6 +740,25 @@ xfs_healthmon_format_inode(
 		event->gen);
 }
 
+/* Render shutdown mask as a string set */
+static int
+xfs_healthmon_format_shutdown(
+	struct seq_buf		*outbuf,
+	const struct xfs_healthmon_event *event)
+{
+	static const struct flag_string mask_strings[] = {
+		{ SHUTDOWN_META_IO_ERROR,	"meta_ioerr" },
+		{ SHUTDOWN_LOG_IO_ERROR,	"log_ioerr" },
+		{ SHUTDOWN_FORCE_UMOUNT,	"force_umount" },
+		{ SHUTDOWN_CORRUPT_INCORE,	"corrupt_incore" },
+		{ SHUTDOWN_CORRUPT_ONDISK,	"corrupt_ondisk" },
+		{ SHUTDOWN_DEVICE_REMOVED,	"device_removed" },
+	};
+
+	return xfs_healthmon_format_mask(outbuf, "reasons", mask_strings,
+			event->flags);
+}
+
 static inline void
 xfs_healthmon_reset_outbuf(
 	struct xfs_healthmon	*hm)
@@ -757,6 +816,9 @@ xfs_healthmon_format_json(
 	case XFS_HEALTHMON_LOST:
 		ret = xfs_healthmon_format_lost(outbuf, event);
 		break;
+	case XFS_HEALTHMON_SHUTDOWN:
+		ret = xfs_healthmon_format_shutdown(outbuf, event);
+		break;
 	default:
 		break;
 	}
@@ -799,6 +861,44 @@ xfs_healthmon_format_json(
 	return -1;
 }
 
+struct flags_map {
+	unsigned int	in_mask;
+	unsigned int	out_mask;
+};
+
+static const struct flags_map shutdown_map[] = {
+	{ SHUTDOWN_META_IO_ERROR,	XFS_HEALTH_SHUTDOWN_META_IO_ERROR },
+	{ SHUTDOWN_LOG_IO_ERROR,	XFS_HEALTH_SHUTDOWN_LOG_IO_ERROR },
+	{ SHUTDOWN_FORCE_UMOUNT,	XFS_HEALTH_SHUTDOWN_FORCE_UMOUNT },
+	{ SHUTDOWN_CORRUPT_INCORE,	XFS_HEALTH_SHUTDOWN_CORRUPT_INCORE },
+	{ SHUTDOWN_CORRUPT_ONDISK,	XFS_HEALTH_SHUTDOWN_CORRUPT_ONDISK },
+	{ SHUTDOWN_DEVICE_REMOVED,	XFS_HEALTH_SHUTDOWN_DEVICE_REMOVED },
+};
+
+static inline unsigned int
+__map_flags(
+	const struct flags_map	*map,
+	size_t			array_len,
+	unsigned int		flags)
+{
+	const struct flags_map	*m;
+	unsigned int		ret = 0;
+
+	for (m = map; m < map + array_len; m++) {
+		if (flags & m->in_mask)
+			ret |= m->out_mask;
+	}
+
+	return ret;
+}
+
+#define map_flags(map, flags) __map_flags((map), ARRAY_SIZE(map), (flags))
+
+static inline unsigned int shutdown_mask(unsigned int in)
+{
+	return map_flags(shutdown_map, in);
+}
+
 static const unsigned int domain_map[] = {
 	[XFS_HEALTHMON_MOUNT]	= XFS_HEALTH_MONITOR_DOMAIN_MOUNT,
 	[XFS_HEALTHMON_FS]	= XFS_HEALTH_MONITOR_DOMAIN_FS,
@@ -814,6 +914,7 @@ static const unsigned int type_map[] = {
 	[XFS_HEALTHMON_CORRUPT]	= XFS_HEALTH_MONITOR_TYPE_CORRUPT,
 	[XFS_HEALTHMON_HEALTHY]	= XFS_HEALTH_MONITOR_TYPE_HEALTHY,
 	[XFS_HEALTHMON_UNMOUNT]	= XFS_HEALTH_MONITOR_TYPE_UNMOUNT,
+	[XFS_HEALTHMON_SHUTDOWN] = XFS_HEALTH_MONITOR_TYPE_SHUTDOWN,
 };
 
 /* Render event as a C structure */
@@ -845,6 +946,9 @@ xfs_healthmon_format_cstruct(
 	case XFS_HEALTHMON_LOST:
 		hme.e.lost.count = event->lostcount;
 		break;
+	case XFS_HEALTHMON_SHUTDOWN:
+		hme.e.shutdown.reasons = shutdown_mask(event->flags);
+		break;
 	default:
 		break;
 	}
@@ -1137,6 +1241,7 @@ xfs_healthmon_detach_hooks(
 	 * through the health monitoring subsystem from xfs_fs_put_super, so
 	 * it is now time to detach the hooks.
 	 */
+	xfs_shutdown_hook_del(hm->mp, &hm->shook);
 	xfs_health_hook_del(hm->mp, &hm->hhook);
 	return;
 
@@ -1157,6 +1262,7 @@ xfs_healthmon_release(
 	wake_up_all(&hm->wait);
 
 	iterate_supers_type(hm->fstyp, xfs_healthmon_detach_hooks, hm);
+	xfs_shutdown_hook_disable();
 	xfs_health_hook_disable();
 
 	mutex_destroy(&hm->lock);
@@ -1280,6 +1386,7 @@ xfs_ioc_health_monitor(
 
 	/* Enable hooks to receive events, generally. */
 	xfs_health_hook_enable();
+	xfs_shutdown_hook_enable();
 
 	/* Attach specific event hooks to this monitor. */
 	xfs_health_hook_setup(&hm->hhook, xfs_healthmon_metadata_hook);
@@ -1287,12 +1394,17 @@ xfs_ioc_health_monitor(
 	if (ret)
 		goto out_hooks;
 
+	xfs_shutdown_hook_setup(&hm->shook, xfs_healthmon_shutdown_hook);
+	ret = xfs_shutdown_hook_add(mp, &hm->shook);
+	if (ret)
+		goto out_healthhook;
+
 	/* Queue up the first event that lets the client know we're running. */
 	event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_RUNNING,
 			XFS_HEALTHMON_MOUNT);
 	if (!event) {
 		ret = -ENOMEM;
-		goto out_healthhook;
+		goto out_shutdownhook;
 	}
 	__xfs_healthmon_push(hm, event);
@@ -1304,17 +1416,20 @@ xfs_ioc_health_monitor(
 			O_CLOEXEC | O_RDONLY);
 	if (fd < 0) {
 		ret = fd;
-		goto out_healthhook;
+		goto out_shutdownhook;
 	}
 
 	trace_xfs_healthmon_create(mp, hmo.flags, hmo.format);
 	return fd;
 
+out_shutdownhook:
+	xfs_shutdown_hook_del(mp, &hm->shook);
 out_healthhook:
 	xfs_health_hook_del(mp, &hm->hhook);
 out_hooks:
 	xfs_health_hook_disable();
+	xfs_shutdown_hook_disable();
 	mutex_destroy(&hm->lock);
 	xfs_healthmon_free_events(hm);
 	kfree(hm);

^ permalink raw reply related	[flat|nested] 80+ messages in thread
* [PATCH 14/19] xfs: report media errors through healthmon
  2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong
                     ` (12 preceding siblings ...)
  2025-10-23  0:04 ` [PATCH 13/19] xfs: report shutdown " Darrick J. Wong
@ 2025-10-23  0:04 ` Darrick J. Wong
  2025-10-23  0:04 ` [PATCH 15/19] xfs: report file io " Darrick J. Wong
                     ` (4 subsequent siblings)
  18 siblings, 0 replies; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-23  0:04 UTC (permalink / raw)
  To: cem, djwong; +Cc: linux-fsdevel, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that we have hooks to report media errors, connect this to the
health monitor as well.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_fs.h                  |   15 ++++
 fs/xfs/xfs_healthmon.h                  |   12 ++++
 fs/xfs/xfs_trace.h                      |   57 +++++++++++++++++
 fs/xfs/libxfs/xfs_healthmon.schema.json |   65 +++++++++++++++++++
 fs/xfs/xfs_healthmon.c                  |  106 ++++++++++++++++++++++++++++++-
 fs/xfs/xfs_trace.c                      |    1 
 6 files changed, 254 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 918362a7294f27..a551b1d5d0db58 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -1014,6 +1014,11 @@ struct xfs_rtgroup_geometry {
 #define XFS_HEALTH_MONITOR_DOMAIN_INODE		(3)
 #define XFS_HEALTH_MONITOR_DOMAIN_RTGROUP	(4)
 
+/* disk events */
+#define XFS_HEALTH_MONITOR_DOMAIN_DATADEV	(5)
+#define XFS_HEALTH_MONITOR_DOMAIN_RTDEV		(6)
+#define XFS_HEALTH_MONITOR_DOMAIN_LOGDEV	(7)
+
 /* Health monitor event types */
 
 /* status of the monitor itself */
@@ -1031,6 +1036,9 @@ struct xfs_rtgroup_geometry {
 /* filesystem shutdown */
 #define XFS_HEALTH_MONITOR_TYPE_SHUTDOWN	(6)
 
+/* media errors */
+#define XFS_HEALTH_MONITOR_TYPE_MEDIA_ERROR	(7)
+
 /* lost events */
 struct xfs_health_monitor_lost {
 	__u64	count;
@@ -1071,6 +1079,12 @@ struct xfs_health_monitor_shutdown {
 	__u32	reasons;
 };
 
+/* disk media errors */
+struct xfs_health_monitor_media {
+	__u64	daddr;
+	__u64	bbcount;
+};
+
 struct xfs_health_monitor_event {
 	/* XFS_HEALTH_MONITOR_DOMAIN_* */
 	__u32	domain;
@@ -1092,6 +1106,7 @@ struct xfs_health_monitor_event {
 		struct xfs_health_monitor_group	group;
 		struct xfs_health_monitor_inode	inode;
 		struct xfs_health_monitor_shutdown shutdown;
+		struct xfs_health_monitor_media	media;
 	} e;
 
 	/* zeroes */
diff --git a/fs/xfs/xfs_healthmon.h b/fs/xfs/xfs_healthmon.h
index a82a684bbc0e03..407c5e1f466726 100644
--- a/fs/xfs/xfs_healthmon.h
+++ b/fs/xfs/xfs_healthmon.h
@@ -19,6 +19,8 @@ enum xfs_healthmon_type {
 	XFS_HEALTHMON_CORRUPT,	/* fsck reported corruption */
 	XFS_HEALTHMON_HEALTHY,	/* fsck reported healthy structure */
 
+	/* media errors */
+	XFS_HEALTHMON_MEDIA_ERROR,
 };
 
 enum xfs_healthmon_domain {
@@ -29,6 +31,11 @@ enum xfs_healthmon_domain {
 	XFS_HEALTHMON_AG,	/* allocation group metadata */
 	XFS_HEALTHMON_INODE,	/* inode metadata */
 	XFS_HEALTHMON_RTGROUP,	/* realtime group metadata */
+
+	/* media errors */
+	XFS_HEALTHMON_DATADEV,
+	XFS_HEALTHMON_RTDEV,
+	XFS_HEALTHMON_LOGDEV,
 };
 
 struct xfs_healthmon_event {
@@ -66,6 +73,11 @@ struct xfs_healthmon_event {
 			uint32_t	gen;
 			xfs_ino_t	ino;
 		};
+		/* media errors */
+		struct {
+			xfs_daddr_t	daddr;
+			uint64_t	bbcount;
+		};
 	};
 };
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index e39138293c2782..11d70e3792493a 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -105,6 +105,7 @@ struct xfs_rtgroup;
 struct xfs_open_zone;
 struct xfs_healthmon_event;
 struct xfs_health_update_params;
+struct xfs_media_error_params;
 
 #define XFS_ATTR_FILTER_FLAGS \
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
@@ -6111,6 +6112,12 @@ DECLARE_EVENT_CLASS(xfs_healthmon_event_class,
 			__entry->ino = event->ino;
 			__entry->gen = event->gen;
 			break;
+		case XFS_HEALTHMON_DATADEV:
+		case XFS_HEALTHMON_LOGDEV:
+		case XFS_HEALTHMON_RTDEV:
+			__entry->offset = event->daddr;
+			__entry->length = event->bbcount;
+			break;
 		}
 	),
 	TP_printk("dev %d:%d type %s domain %s mask 0x%x ino 0x%llx gen 0x%x offset 0x%llx len 0x%llx group 0x%x lost %llu",
@@ -6199,6 +6206,56 @@ TRACE_EVENT(xfs_healthmon_metadata_hook,
 		  __entry->events,
 		  __entry->lost_prev)
 );
+
+#if defined(CONFIG_XFS_LIVE_HOOKS) && defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX)
+TRACE_EVENT(xfs_healthmon_media_error_hook,
+	TP_PROTO(const struct xfs_media_error_params *p,
+		 unsigned int events, unsigned long long lost_prev),
+	TP_ARGS(p, events, lost_prev),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(dev_t, error_dev)
+		__field(uint64_t, daddr)
+		__field(uint64_t, bbcount)
+		__field(int, pre_remove)
+		__field(unsigned int, events)
+		__field(unsigned long long, lost_prev)
+	),
+	TP_fast_assign(
+		struct xfs_mount *mp = p->mp;
+		struct xfs_buftarg *btp = NULL;
+
+		switch (p->fdev) {
+		case XFS_FAILED_DATADEV:
+			btp = mp->m_ddev_targp;
+			break;
+		case XFS_FAILED_LOGDEV:
+			btp = mp->m_logdev_targp;
+			break;
+		case XFS_FAILED_RTDEV:
+			btp = mp->m_rtdev_targp;
+			break;
+		}
+
+		__entry->dev = mp->m_super->s_dev;
+		if (btp)
+			__entry->error_dev = btp->bt_dev;
+		__entry->daddr = p->daddr;
+		__entry->bbcount = p->bbcount;
+		__entry->pre_remove = p->pre_remove;
+		__entry->events = events;
+		__entry->lost_prev = lost_prev;
+	),
+	TP_printk("dev %d:%d error_dev %d:%d daddr 0x%llx bbcount 0x%llx pre_remove? %d events %u lost_prev? %llu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  MAJOR(__entry->error_dev), MINOR(__entry->error_dev),
+		  __entry->daddr,
+		  __entry->bbcount,
+		  __entry->pre_remove,
+		  __entry->events,
+		  __entry->lost_prev)
+);
+#endif
 #endif /* CONFIG_XFS_HEALTH_MONITOR */
 
 #endif /* _TRACE_XFS_H */
diff --git a/fs/xfs/libxfs/xfs_healthmon.schema.json b/fs/xfs/libxfs/xfs_healthmon.schema.json
index 1657ccc482edff..d3b537a040cb83 100644
--- a/fs/xfs/libxfs/xfs_healthmon.schema.json
+++ b/fs/xfs/libxfs/xfs_healthmon.schema.json
@@ -39,6 +39,9 @@
 		},
 		{
 			"$ref": "#/$events/shutdown"
+		},
+		{
+			"$ref": "#/$events/media_error"
 		}
 	],
 
@@ -75,6 +78,31 @@
 		"i_generation": {
 			"description": "Inode generation number",
 			"type": "integer"
+		},
+		"storage_devs": {
+			"description": "Storage devices in a filesystem",
+			"_comment": [
+				"One of:",
+				"",
+				" * datadev: filesystem device",
+				" * logdev: external log device",
+				" * rtdev: realtime volume"
+			],
+			"enum": [
+				"datadev",
+				"logdev",
+				"rtdev"
+			]
+		},
+		"xfs_daddr_t": {
+			"description": "Storage device address, in units of 512-byte blocks",
+			"type": "integer",
+			"minimum": 0
+		},
+		"bbcount": {
+			"description": "Storage space length, in units of 512-byte blocks",
+			"type": "integer",
+			"minimum": 1
 		}
 	},
 
@@ -501,6 +529,43 @@
 			"domain",
 			"reasons"
 		]
+		},
+		"media_error": {
+			"title": "Media Error",
+			"description": [
+				"A storage device reported a media error.",
+				"The domain element tells us which storage",
+				"device reported the media failure.  The",
+				"daddr and bbcount elements tell us where",
+				"inside that device the failure was observed."
+			],
+			"type": "object",
+
+			"properties": {
+				"type": {
+					"const": "media"
+				},
+				"time_ns": {
+					"$ref": "#/$defs/time_ns"
+				},
+				"domain": {
+					"$ref": "#/$defs/storage_devs"
+				},
+				"daddr": {
+					"$ref": "#/$defs/xfs_daddr_t"
+				},
+				"bbcount": {
+					"$ref": "#/$defs/bbcount"
+				}
+			},
+
+			"required": [
+				"type",
+				"time_ns",
+				"domain",
+				"daddr",
+				"bbcount"
+			]
 		}
 	}
 }
diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c
index 76de516708e8f9..52b8be8eb7a11b 100644
--- a/fs/xfs/xfs_healthmon.c
+++ b/fs/xfs/xfs_healthmon.c
@@ -21,6 +21,7 @@
 #include "xfs_health.h"
 #include "xfs_healthmon.h"
 #include "xfs_fsops.h"
+#include "xfs_notify_failure.h"
 
 #include <linux/anon_inodes.h>
 #include <linux/eventpoll.h>
@@ -70,6 +71,7 @@ struct xfs_healthmon {
 	/* live update hooks */
 	struct xfs_shutdown_hook shook;
 	struct xfs_health_hook	hhook;
+	struct xfs_media_error_hook mhook;
 
 	/* filesystem mount, or NULL if we've unmounted */
 	struct xfs_mount	*mp;
@@ -521,6 +523,59 @@ xfs_healthmon_shutdown_hook(
 	return NOTIFY_DONE;
 }
 
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX)
+/* Add a media error event to the reporting queue. */
+STATIC int
+xfs_healthmon_media_error_hook(
+	struct notifier_block	*nb,
+	unsigned long		action,
+	void			*data)
+{
+	struct xfs_healthmon	*hm;
+	struct xfs_healthmon_event *event;
+	struct xfs_media_error_params *p = data;
+	enum xfs_healthmon_domain domain = 0;	/* shut up gcc */
+	int			error;
+
+	hm = container_of(nb, struct xfs_healthmon, mhook.error_hook.nb);
+
+	mutex_lock(&hm->lock);
+
+	trace_xfs_healthmon_media_error_hook(p, hm->events,
+			hm->lost_prev_event);
+
+	error = xfs_healthmon_start_live_update(hm);
+	if (error)
+		goto out_unlock;
+
+	switch (p->fdev) {
+	case XFS_FAILED_LOGDEV:
+		domain = XFS_HEALTHMON_LOGDEV;
+		break;
+	case XFS_FAILED_RTDEV:
+		domain = XFS_HEALTHMON_RTDEV;
+		break;
+	case XFS_FAILED_DATADEV:
+		domain = XFS_HEALTHMON_DATADEV;
+		break;
+	}
+
+	event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_MEDIA_ERROR, domain);
+	if (!event)
+		goto out_unlock;
+
+	event->daddr = p->daddr;
+	event->bbcount = p->bbcount;
+	error = xfs_healthmon_push(hm, event);
+	if (error)
+		kfree(event);
+
+out_unlock:
+	mutex_unlock(&hm->lock);
+	return NOTIFY_DONE;
+}
+#endif
+
 /* Render the health update type as a string. */
 STATIC const char *
 xfs_healthmon_typestring(
@@ -534,6 +589,7 @@ xfs_healthmon_typestring(
 		[XFS_HEALTHMON_SICK]		= "sick",
 		[XFS_HEALTHMON_CORRUPT]		= "corrupt",
 		[XFS_HEALTHMON_HEALTHY]		= "healthy",
+		[XFS_HEALTHMON_MEDIA_ERROR]	= "media",
 	};
 
 	if (event->type >= ARRAY_SIZE(type_strings))
@@ -553,6 +609,9 @@ xfs_healthmon_domstring(
 		[XFS_HEALTHMON_AG]		= "perag",
 		[XFS_HEALTHMON_INODE]		= "inode",
 		[XFS_HEALTHMON_RTGROUP]		= "rtgroup",
+		[XFS_HEALTHMON_DATADEV]		= "datadev",
+		[XFS_HEALTHMON_LOGDEV]		= "logdev",
+		[XFS_HEALTHMON_RTDEV]		= "rtdev",
 	};
 
 	if (event->domain >= ARRAY_SIZE(dom_strings))
@@ -759,6 +818,23 @@ xfs_healthmon_format_shutdown(
 		event->flags);
 }
 
+/* Render media error as a string set */
+static int
+xfs_healthmon_format_media_error(
+	struct seq_buf		*outbuf,
+	const struct xfs_healthmon_event *event)
+{
+	ssize_t			ret;
+
+	ret = seq_buf_printf(outbuf, "  \"daddr\": %llu,\n",
+			event->daddr);
+	if (ret < 0)
+		return ret;
+
+	return seq_buf_printf(outbuf, "  \"bbcount\": %llu,\n",
+			event->bbcount);
+}
+
 static inline void
 xfs_healthmon_reset_outbuf(
 	struct xfs_healthmon	*hm)
@@ -835,6 +911,9 @@ xfs_healthmon_format_json(
 	case XFS_HEALTHMON_INODE:
 		ret = xfs_healthmon_format_inode(outbuf, event);
 		break;
+	case XFS_HEALTHMON_DATADEV:
+	case XFS_HEALTHMON_LOGDEV:
+	case XFS_HEALTHMON_RTDEV:
+		ret = xfs_healthmon_format_media_error(outbuf, event);
+		break;
 	}
 	if (ret < 0)
 		goto overrun;
@@ -905,6 +986,9 @@ static const unsigned int domain_map[] = {
 	[XFS_HEALTHMON_AG]	= XFS_HEALTH_MONITOR_DOMAIN_AG,
 	[XFS_HEALTHMON_INODE]	= XFS_HEALTH_MONITOR_DOMAIN_INODE,
 	[XFS_HEALTHMON_RTGROUP]	= XFS_HEALTH_MONITOR_DOMAIN_RTGROUP,
+	[XFS_HEALTHMON_DATADEV]	= XFS_HEALTH_MONITOR_DOMAIN_DATADEV,
+	[XFS_HEALTHMON_RTDEV]	= XFS_HEALTH_MONITOR_DOMAIN_RTDEV,
+	[XFS_HEALTHMON_LOGDEV]	= XFS_HEALTH_MONITOR_DOMAIN_LOGDEV,
 };
 
 static const unsigned int type_map[] = {
@@ -915,6 +999,7 @@ static const unsigned int type_map[] = {
 	[XFS_HEALTHMON_HEALTHY]	= XFS_HEALTH_MONITOR_TYPE_HEALTHY,
 	[XFS_HEALTHMON_UNMOUNT]	= XFS_HEALTH_MONITOR_TYPE_UNMOUNT,
 	[XFS_HEALTHMON_SHUTDOWN] = XFS_HEALTH_MONITOR_TYPE_SHUTDOWN,
+	[XFS_HEALTHMON_MEDIA_ERROR] = XFS_HEALTH_MONITOR_TYPE_MEDIA_ERROR,
 };
 
 /* Render event as a C structure */
@@ -969,6 +1054,12 @@ xfs_healthmon_format_cstruct(
 		hme.e.inode.ino = event->ino;
 		hme.e.inode.gen = event->gen;
 		break;
+	case XFS_HEALTHMON_DATADEV:
+	case XFS_HEALTHMON_LOGDEV:
+	case XFS_HEALTHMON_RTDEV:
+		hme.e.media.daddr = event->daddr;
+		hme.e.media.bbcount = event->bbcount;
+		break;
 	default:
 		break;
 	}
@@ -1241,6 +1332,7 @@ xfs_healthmon_detach_hooks(
 	 * through the health monitoring subsystem from xfs_fs_put_super, so
 	 * it is now time to detach the hooks.
 	 */
+	xfs_media_error_hook_del(hm->mp, &hm->mhook);
 	xfs_shutdown_hook_del(hm->mp, &hm->shook);
 	xfs_health_hook_del(hm->mp, &hm->hhook);
 	return;
@@ -1262,6 +1354,7 @@ xfs_healthmon_release(
 	wake_up_all(&hm->wait);
 
 	iterate_supers_type(hm->fstyp, xfs_healthmon_detach_hooks, hm);
+	xfs_media_error_hook_disable();
 	xfs_shutdown_hook_disable();
 	xfs_health_hook_disable();
 
@@ -1387,6 +1480,7 @@ xfs_ioc_health_monitor(
 	/* Enable hooks to receive events, generally. */
 	xfs_health_hook_enable();
 	xfs_shutdown_hook_enable();
+	xfs_media_error_hook_enable();
 
 	/* Attach specific event hooks to this monitor. */
 	xfs_health_hook_setup(&hm->hhook, xfs_healthmon_metadata_hook);
@@ -1399,12 +1493,17 @@ xfs_ioc_health_monitor(
 	if (ret)
 		goto out_healthhook;
 
+	xfs_media_error_hook_setup(&hm->mhook, xfs_healthmon_media_error_hook);
+	ret = xfs_media_error_hook_add(mp, &hm->mhook);
+	if (ret)
+		goto out_shutdownhook;
+
 	/* Queue up the first event that lets the client know we're running.
*/ event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_RUNNING, XFS_HEALTHMON_MOUNT); if (!event) { ret = -ENOMEM; - goto out_shutdownhook; + goto out_mediahook; } __xfs_healthmon_push(hm, event); @@ -1416,18 +1515,21 @@ xfs_ioc_health_monitor( O_CLOEXEC | O_RDONLY); if (fd < 0) { ret = fd; - goto out_shutdownhook; + goto out_mediahook; } trace_xfs_healthmon_create(mp, hmo.flags, hmo.format); return fd; +out_mediahook: + xfs_media_error_hook_del(mp, &hm->mhook); out_shutdownhook: xfs_shutdown_hook_del(mp, &hm->shook); out_healthhook: xfs_health_hook_del(mp, &hm->hhook); out_hooks: + xfs_media_error_hook_disable(); xfs_health_hook_disable(); xfs_shutdown_hook_disable(); mutex_destroy(&hm->lock); diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index d42b864a3837a2..08ddab700a6cd3 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -53,6 +53,7 @@ #include "xfs_zone_priv.h" #include "xfs_health.h" #include "xfs_healthmon.h" +#include "xfs_notify_failure.h" /* * We include this last to have the helpers above available for the trace ^ permalink raw reply related [flat|nested] 80+ messages in thread
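The media events above carry their payload as a (daddr, bbcount) pair measured in 512-byte basic blocks. As a hedged consumer-side sketch: the two-field payload mirrors struct xfs_health_monitor_media from the patch, but the helper name and byte-range conversion below are illustrative, not part of the ABI.

```c
#include <assert.h>
#include <stdint.h>

#define BBSHIFT 9	/* XFS basic blocks are 512 bytes */

/* Mirrors the payload of struct xfs_health_monitor_media. */
struct media_event {
	uint64_t daddr;		/* disk address, in 512b basic blocks */
	uint64_t bbcount;	/* length, in 512b basic blocks */
};

/* Convert the event payload into a byte range on the failed device. */
static inline void
media_event_byte_range(const struct media_event *me,
		       uint64_t *start, uint64_t *len)
{
	*start = me->daddr << BBSHIFT;
	*len = me->bbcount << BBSHIFT;
}
```

A repair daemon would feed that byte range into whatever intersection logic it uses to decide which metadata or files overlap the damaged region.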
* [PATCH 15/19] xfs: report file io errors through healthmon 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong ` (13 preceding siblings ...) 2025-10-23 0:04 ` [PATCH 14/19] xfs: report media errors " Darrick J. Wong @ 2025-10-23 0:04 ` Darrick J. Wong 2025-10-23 0:04 ` [PATCH 16/19] xfs: allow reconfiguration of the health monitoring device Darrick J. Wong ` (3 subsequent siblings) 18 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:04 UTC (permalink / raw) To: cem, djwong; +Cc: linux-fsdevel, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Set up a file io error event hook so that we can send events about read errors, writeback errors, and directio errors to userspace. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- fs/xfs/libxfs/xfs_fs.h | 18 ++++ fs/xfs/xfs_healthmon.h | 16 ++++ fs/xfs/xfs_trace.h | 56 +++++++++++++ fs/xfs/libxfs/xfs_healthmon.schema.json | 77 ++++++++++++++++++ fs/xfs/xfs_healthmon.c | 131 +++++++++++++++++++++++++++++++ fs/xfs/xfs_trace.c | 1 6 files changed, 297 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index a551b1d5d0db58..87e915baa875d6 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -1019,6 +1019,9 @@ struct xfs_rtgroup_geometry { #define XFS_HEALTH_MONITOR_DOMAIN_RTDEV (6) #define XFS_HEALTH_MONITOR_DOMAIN_LOGDEV (7) +/* file range events */ +#define XFS_HEALTH_MONITOR_DOMAIN_FILERANGE (8) + /* Health monitor event types */ /* status of the monitor itself */ @@ -1039,6 +1042,12 @@ struct xfs_rtgroup_geometry { /* media errors */ #define XFS_HEALTH_MONITOR_TYPE_MEDIA_ERROR (7) +/* file range events */ +#define XFS_HEALTH_MONITOR_TYPE_BUFREAD (8) +#define XFS_HEALTH_MONITOR_TYPE_BUFWRITE (9) +#define XFS_HEALTH_MONITOR_TYPE_DIOREAD (10) +#define XFS_HEALTH_MONITOR_TYPE_DIOWRITE (11) + /* lost events */ struct xfs_health_monitor_lost { __u64 count; @@ -1079,6 +1088,14 @@ struct 
xfs_health_monitor_shutdown { __u32 reasons; }; +/* file range events */ +struct xfs_health_monitor_filerange { + __u64 pos; + __u64 len; + __u64 ino; + __u32 gen; +}; + /* disk media errors */ struct xfs_health_monitor_media { __u64 daddr; @@ -1107,6 +1124,7 @@ struct xfs_health_monitor_event { struct xfs_health_monitor_inode inode; struct xfs_health_monitor_shutdown shutdown; struct xfs_health_monitor_media media; + struct xfs_health_monitor_filerange filerange; } e; /* zeroes */ diff --git a/fs/xfs/xfs_healthmon.h b/fs/xfs/xfs_healthmon.h index 407c5e1f466726..421b46f97df482 100644 --- a/fs/xfs/xfs_healthmon.h +++ b/fs/xfs/xfs_healthmon.h @@ -21,6 +21,12 @@ enum xfs_healthmon_type { /* media errors */ XFS_HEALTHMON_MEDIA_ERROR, + + /* file range events */ + XFS_HEALTHMON_BUFREAD, + XFS_HEALTHMON_BUFWRITE, + XFS_HEALTHMON_DIOREAD, + XFS_HEALTHMON_DIOWRITE, }; enum xfs_healthmon_domain { @@ -36,6 +42,9 @@ enum xfs_healthmon_domain { XFS_HEALTHMON_DATADEV, XFS_HEALTHMON_RTDEV, XFS_HEALTHMON_LOGDEV, + + /* file range events */ + XFS_HEALTHMON_FILERANGE, }; struct xfs_healthmon_event { @@ -78,6 +87,13 @@ struct xfs_healthmon_event { xfs_daddr_t daddr; uint64_t bbcount; }; + /* file range events */ + struct { + xfs_ino_t fino; + loff_t fpos; + uint64_t flen; + uint32_t fgen; + }; }; }; diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 11d70e3792493a..b23f3c41db1c03 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -106,6 +106,7 @@ struct xfs_open_zone; struct xfs_healthmon_event; struct xfs_health_update_params; struct xfs_media_error_params; +struct xfs_file_ioerror_params; #define XFS_ATTR_FILTER_FLAGS \ { XFS_ATTR_ROOT, "ROOT" }, \ @@ -6118,6 +6119,12 @@ DECLARE_EVENT_CLASS(xfs_healthmon_event_class, __entry->offset = event->daddr; __entry->length = event->bbcount; break; + case XFS_HEALTHMON_FILERANGE: + __entry->ino = event->fino; + __entry->gen = event->fgen; + __entry->offset = event->fpos; + __entry->length = event->flen; + break; } ), 
TP_printk("dev %d:%d type %s domain %s mask 0x%x ino 0x%llx gen 0x%x offset 0x%llx len 0x%llx group 0x%x lost %llu", @@ -6256,6 +6263,55 @@ TRACE_EVENT(xfs_healthmon_media_error_hook, __entry->lost_prev) ); #endif + +#define XFS_FILE_IOERROR_STRINGS \ + { XFS_FILE_IOERROR_BUFFERED_READ, "readahead" }, \ + { XFS_FILE_IOERROR_BUFFERED_WRITE, "writeback" }, \ + { XFS_FILE_IOERROR_DIRECT_READ, "directio_read" }, \ + { XFS_FILE_IOERROR_DIRECT_WRITE, "directio_write" } + +TRACE_DEFINE_ENUM(XFS_FILE_IOERROR_BUFFERED_READ); +TRACE_DEFINE_ENUM(XFS_FILE_IOERROR_BUFFERED_WRITE); +TRACE_DEFINE_ENUM(XFS_FILE_IOERROR_DIRECT_READ); +TRACE_DEFINE_ENUM(XFS_FILE_IOERROR_DIRECT_WRITE); + +TRACE_EVENT(xfs_healthmon_file_ioerror_hook, + TP_PROTO(const struct xfs_mount *mp, + unsigned long action, + const struct xfs_file_ioerror_params *p, + unsigned int events, unsigned long long lost_prev), + TP_ARGS(mp, action, p, events, lost_prev), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, error_dev) + __field(unsigned long, action) + __field(unsigned long long, ino) + __field(unsigned int, gen) + __field(long long, pos) + __field(unsigned long long, len) + __field(unsigned int, events) + __field(unsigned long long, lost_prev) + ), + TP_fast_assign( + __entry->dev = mp ? mp->m_super->s_dev : 0; + __entry->action = action; + __entry->ino = p->ino; + __entry->gen = p->gen; + __entry->pos = p->pos; + __entry->len = p->len; + __entry->events = events; + __entry->lost_prev = lost_prev; + ), + TP_printk("dev %d:%d ino 0x%llx gen 0x%x op %s pos 0x%llx bytecount 0x%llx events %u lost_prev? 
%llu", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino, + __entry->gen, + __print_symbolic(__entry->action, XFS_FILE_IOERROR_STRINGS), + __entry->pos, + __entry->len, + __entry->events, + __entry->lost_prev) +); #endif /* CONFIG_XFS_HEALTH_MONITOR */ #endif /* _TRACE_XFS_H */ diff --git a/fs/xfs/libxfs/xfs_healthmon.schema.json b/fs/xfs/libxfs/xfs_healthmon.schema.json index d3b537a040cb83..fb696dfbbfd044 100644 --- a/fs/xfs/libxfs/xfs_healthmon.schema.json +++ b/fs/xfs/libxfs/xfs_healthmon.schema.json @@ -42,6 +42,9 @@ }, { "$ref": "#/$events/media_error" + }, + { + "$ref": "#/$events/file_ioerror" } ], @@ -79,6 +82,16 @@ "description": "Inode generation number", "type": "integer" }, + "off_t": { + "description": "File position, in bytes", + "type": "integer", + "minimum": 0 + }, + "size_t": { + "description": "File operation length, in bytes", + "type": "integer", + "minimum": 1 + }, "storage_devs": { "description": "Storage devices in a filesystem", "_comment": [ @@ -260,6 +273,26 @@ } }, + "$comment": "File IO event data are defined here.", + "$fileio": { + "types": { + "description": [ + "File I/O operations. One of:", + "", + " * readahead: reads into the page cache.", + " * writeback: writeback of dirty page cache.", + " * directio_read: O_DIRECT reads.", + " * directio_write: O_DIRECT writes." + ], + "enum": [ + "readahead", + "writeback", + "directio_read", + "directio_write" + ] + } + }, + + "$comment": "Event types are defined here.", "$events": { "running": { @@ -566,6 +599,50 @@ "daddr", "bbcount" ] + }, + "file_ioerror": { + "title": "File I/O error", + "description": [ + "A read or a write to a file failed. The", + "inode, generation, pos, and len fields", + "describe the range of the file that is", + "affected."
+ ], + "type": "object", + + "properties": { + "type": { + "$ref": "#/$fileio/types" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "filerange" + }, + "inumber": { + "$ref": "#/$defs/xfs_ino_t" + }, + "generation": { + "$ref": "#/$defs/i_generation" + }, + "pos": { + "$ref": "#/$defs/off_t" + }, + "length": { + "$ref": "#/$defs/size_t" + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "inumber", + "generation", + "pos", + "length" + ] } } } diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c index 52b8be8eb7a11b..74ffb7c4af078c 100644 --- a/fs/xfs/xfs_healthmon.c +++ b/fs/xfs/xfs_healthmon.c @@ -22,6 +22,7 @@ #include "xfs_healthmon.h" #include "xfs_fsops.h" #include "xfs_notify_failure.h" +#include "xfs_file.h" #include <linux/anon_inodes.h> #include <linux/eventpoll.h> @@ -72,6 +73,7 @@ struct xfs_healthmon { struct xfs_shutdown_hook shook; struct xfs_health_hook hhook; struct xfs_media_error_hook mhook; + struct xfs_file_ioerror_hook fhook; /* filesystem mount, or NULL if we've unmounted */ struct xfs_mount *mp; @@ -576,6 +578,73 @@ xfs_healthmon_media_error_hook( } #endif +/* Add a file io error event to the reporting queue. 
*/ +STATIC int +xfs_healthmon_file_ioerror_hook( + struct notifier_block *nb, + unsigned long action, + void *data) +{ + struct xfs_healthmon *hm; + struct xfs_healthmon_event *event; + struct xfs_file_ioerror_params *p = data; + enum xfs_healthmon_type type = 0; + int error; + + hm = container_of(nb, struct xfs_healthmon, fhook.ioerror_hook.nb); + + switch (action) { + case XFS_FILE_IOERROR_BUFFERED_READ: + case XFS_FILE_IOERROR_BUFFERED_WRITE: + case XFS_FILE_IOERROR_DIRECT_READ: + case XFS_FILE_IOERROR_DIRECT_WRITE: + break; + default: + ASSERT(0); + return NOTIFY_DONE; + } + + mutex_lock(&hm->lock); + + trace_xfs_healthmon_file_ioerror_hook(hm->mp, action, p, hm->events, + hm->lost_prev_event); + + error = xfs_healthmon_start_live_update(hm); + if (error) + goto out_unlock; + + switch (action) { + case XFS_FILE_IOERROR_BUFFERED_READ: + type = XFS_HEALTHMON_BUFREAD; + break; + case XFS_FILE_IOERROR_BUFFERED_WRITE: + type = XFS_HEALTHMON_BUFWRITE; + break; + case XFS_FILE_IOERROR_DIRECT_READ: + type = XFS_HEALTHMON_DIOREAD; + break; + case XFS_FILE_IOERROR_DIRECT_WRITE: + type = XFS_HEALTHMON_DIOWRITE; + break; + } + + event = xfs_healthmon_alloc(hm, type, XFS_HEALTHMON_FILERANGE); + if (!event) + goto out_unlock; + + event->fino = p->ino; + event->fgen = p->gen; + event->fpos = p->pos; + event->flen = p->len; + error = xfs_healthmon_push(hm, event); + if (error) + kfree(event); + +out_unlock: + mutex_unlock(&hm->lock); + return NOTIFY_DONE; +} + /* Render the health update type as a string. 
*/ STATIC const char * xfs_healthmon_typestring( @@ -590,6 +659,10 @@ xfs_healthmon_typestring( [XFS_HEALTHMON_CORRUPT] = "corrupt", [XFS_HEALTHMON_HEALTHY] = "healthy", [XFS_HEALTHMON_MEDIA_ERROR] = "media", + [XFS_HEALTHMON_BUFREAD] = "readahead", + [XFS_HEALTHMON_BUFWRITE] = "writeback", + [XFS_HEALTHMON_DIOREAD] = "directio_read", + [XFS_HEALTHMON_DIOWRITE] = "directio_write", }; if (event->type >= ARRAY_SIZE(type_strings)) @@ -612,6 +685,7 @@ xfs_healthmon_domstring( [XFS_HEALTHMON_DATADEV] = "datadev", [XFS_HEALTHMON_LOGDEV] = "logdev", [XFS_HEALTHMON_RTDEV] = "rtdev", + [XFS_HEALTHMON_FILERANGE] = "filerange", }; if (event->domain >= ARRAY_SIZE(dom_strings)) @@ -835,6 +909,33 @@ xfs_healthmon_format_media_error( event->bbcount); } +/* Render file range events as a string set */ +static int +xfs_healthmon_format_filerange( + struct seq_buf *outbuf, + const struct xfs_healthmon_event *event) +{ + ssize_t ret; + + ret = seq_buf_printf(outbuf, " \"inumber\": %llu,\n", + event->fino); + if (ret < 0) + return ret; + + ret = seq_buf_printf(outbuf, " \"generation\": %u,\n", + event->fgen); + if (ret < 0) + return ret; + + ret = seq_buf_printf(outbuf, " \"pos\": %llu,\n", + event->fpos); + if (ret < 0) + return ret; + + return seq_buf_printf(outbuf, " \"length\": %llu,\n", + event->flen); +} + static inline void xfs_healthmon_reset_outbuf( struct xfs_healthmon *hm) @@ -916,6 +1017,9 @@ xfs_healthmon_format_json( case XFS_HEALTHMON_RTDEV: ret = xfs_healthmon_format_media_error(outbuf, event); break; + case XFS_HEALTHMON_FILERANGE: + ret = xfs_healthmon_format_filerange(outbuf, event); + break; } if (ret < 0) goto overrun; @@ -989,6 +1093,7 @@ static const unsigned int domain_map[] = { [XFS_HEALTHMON_DATADEV] = XFS_HEALTH_MONITOR_DOMAIN_DATADEV, [XFS_HEALTHMON_RTDEV] = XFS_HEALTH_MONITOR_DOMAIN_RTDEV, [XFS_HEALTHMON_LOGDEV] = XFS_HEALTH_MONITOR_DOMAIN_LOGDEV, + [XFS_HEALTHMON_FILERANGE] = XFS_HEALTH_MONITOR_DOMAIN_FILERANGE, }; static const unsigned int type_map[] = { 
@@ -1000,6 +1105,10 @@ static const unsigned int type_map[] = { [XFS_HEALTHMON_UNMOUNT] = XFS_HEALTH_MONITOR_TYPE_UNMOUNT, [XFS_HEALTHMON_SHUTDOWN] = XFS_HEALTH_MONITOR_TYPE_SHUTDOWN, [XFS_HEALTHMON_MEDIA_ERROR] = XFS_HEALTH_MONITOR_TYPE_MEDIA_ERROR, + [XFS_HEALTHMON_BUFREAD] = XFS_HEALTH_MONITOR_TYPE_BUFREAD, + [XFS_HEALTHMON_BUFWRITE] = XFS_HEALTH_MONITOR_TYPE_BUFWRITE, + [XFS_HEALTHMON_DIOREAD] = XFS_HEALTH_MONITOR_TYPE_DIOREAD, + [XFS_HEALTHMON_DIOWRITE] = XFS_HEALTH_MONITOR_TYPE_DIOWRITE, }; /* Render event as a C structure */ @@ -1060,6 +1169,12 @@ xfs_healthmon_format_cstruct( hme.e.media.daddr = event->daddr; hme.e.media.bbcount = event->bbcount; break; + case XFS_HEALTHMON_FILERANGE: + hme.e.filerange.ino = event->fino; + hme.e.filerange.gen = event->fgen; + hme.e.filerange.pos = event->fpos; + hme.e.filerange.len = event->flen; + break; default: break; } @@ -1332,6 +1447,7 @@ xfs_healthmon_detach_hooks( * through the health monitoring subsystem from xfs_fs_put_super, so * it is now time to detach the hooks. */ + xfs_file_ioerror_hook_del(hm->mp, &hm->fhook); xfs_media_error_hook_del(hm->mp, &hm->mhook); xfs_shutdown_hook_del(hm->mp, &hm->shook); xfs_health_hook_del(hm->mp, &hm->hhook); @@ -1354,6 +1470,7 @@ xfs_healthmon_release( wake_up_all(&hm->wait); iterate_supers_type(hm->fstyp, xfs_healthmon_detach_hooks, hm); + xfs_file_ioerror_hook_disable(); xfs_media_error_hook_disable(); xfs_shutdown_hook_disable(); xfs_health_hook_disable(); @@ -1481,6 +1598,7 @@ xfs_ioc_health_monitor( xfs_health_hook_enable(); xfs_shutdown_hook_enable(); xfs_media_error_hook_enable(); + xfs_file_ioerror_hook_enable(); /* Attach specific event hooks to this monitor. 
*/ xfs_health_hook_setup(&hm->hhook, xfs_healthmon_metadata_hook); @@ -1498,12 +1616,18 @@ xfs_ioc_health_monitor( if (ret) goto out_shutdownhook; + xfs_file_ioerror_hook_setup(&hm->fhook, + xfs_healthmon_file_ioerror_hook); + ret = xfs_file_ioerror_hook_add(mp, &hm->fhook); + if (ret) + goto out_mediahook; + /* Queue up the first event that lets the client know we're running. */ event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_RUNNING, XFS_HEALTHMON_MOUNT); if (!event) { ret = -ENOMEM; - goto out_mediahook; + goto out_ioerrhook; } __xfs_healthmon_push(hm, event); @@ -1515,13 +1639,15 @@ xfs_ioc_health_monitor( O_CLOEXEC | O_RDONLY); if (fd < 0) { ret = fd; - goto out_mediahook; + goto out_ioerrhook; } trace_xfs_healthmon_create(mp, hmo.flags, hmo.format); return fd; +out_ioerrhook: + xfs_file_ioerror_hook_del(mp, &hm->fhook); out_mediahook: xfs_media_error_hook_del(mp, &hm->mhook); out_shutdownhook: @@ -1529,6 +1655,7 @@ xfs_ioc_health_monitor( out_healthhook: xfs_health_hook_del(mp, &hm->hhook); out_hooks: + xfs_file_ioerror_hook_disable(); xfs_media_error_hook_disable(); xfs_health_hook_disable(); xfs_shutdown_hook_disable(); diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index 08ddab700a6cd3..eb35015c091570 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -54,6 +54,7 @@ #include "xfs_health.h" #include "xfs_healthmon.h" #include "xfs_notify_failure.h" +#include "xfs_file.h" /* * We include this last to have the helpers above available for the trace ^ permalink raw reply related [flat|nested] 80+ messages in thread
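The four new filerange event types map one-to-one onto the operation names that the JSON formatter emits. A small consumer-side sketch of that mapping (the type codes and strings are taken from this patch; the helper function itself is illustrative):

```c
#include <assert.h>
#include <string.h>

/* Event type codes from xfs_fs.h in this patch. */
#define XFS_HEALTH_MONITOR_TYPE_BUFREAD		(8)
#define XFS_HEALTH_MONITOR_TYPE_BUFWRITE	(9)
#define XFS_HEALTH_MONITOR_TYPE_DIOREAD		(10)
#define XFS_HEALTH_MONITOR_TYPE_DIOWRITE	(11)

/*
 * Map a filerange event type to the operation name used by the JSON
 * format, mirroring xfs_healthmon_typestring().
 */
static const char *
filerange_type_name(unsigned int type)
{
	switch (type) {
	case XFS_HEALTH_MONITOR_TYPE_BUFREAD:
		return "readahead";
	case XFS_HEALTH_MONITOR_TYPE_BUFWRITE:
		return "writeback";
	case XFS_HEALTH_MONITOR_TYPE_DIOREAD:
		return "directio_read";
	case XFS_HEALTH_MONITOR_TYPE_DIOWRITE:
		return "directio_write";
	}
	return NULL;	/* not a filerange event */
}
```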
* [PATCH 16/19] xfs: allow reconfiguration of the health monitoring device 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong ` (14 preceding siblings ...) 2025-10-23 0:04 ` [PATCH 15/19] xfs: report file io " Darrick J. Wong @ 2025-10-23 0:04 ` Darrick J. Wong 2025-10-23 0:05 ` [PATCH 17/19] xfs: validate fds against running healthmon Darrick J. Wong ` (2 subsequent siblings) 18 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:04 UTC (permalink / raw) To: cem, djwong; +Cc: linux-fsdevel, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make it so that we can reconfigure the health monitoring device by calling the XFS_IOC_HEALTH_MONITOR ioctl on it. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- fs/xfs/xfs_healthmon.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c index 74ffb7c4af078c..ce84cd90df2379 100644 --- a/fs/xfs/xfs_healthmon.c +++ b/fs/xfs/xfs_healthmon.c @@ -23,6 +23,8 @@ #include "xfs_fsops.h" #include "xfs_notify_failure.h" #include "xfs_file.h" +#include "xfs_fs.h" +#include "xfs_ioctl.h" #include <linux/anon_inodes.h> #include <linux/eventpoll.h> @@ -1540,6 +1542,48 @@ xfs_healthmon_show_fdinfo( } #endif +/* Reconfigure the health monitor. */ +STATIC long +xfs_healthmon_reconfigure( + struct file *file, + unsigned int cmd, + void __user *arg) +{ + struct xfs_health_monitor hmo; + struct xfs_healthmon *hm = file->private_data; + + if (copy_from_user(&hmo, arg, sizeof(hmo))) + return -EFAULT; + + if (!xfs_healthmon_validate(&hmo)) + return -EINVAL; + + mutex_lock(&hm->lock); + hm->format = hmo.format; + hm->verbose = !!(hmo.flags & XFS_HEALTH_MONITOR_VERBOSE); + mutex_unlock(&hm->lock); + return 0; +} + +/* Handle ioctls for the health monitoring thread. 
*/ +STATIC long +xfs_healthmon_ioctl( + struct file *file, + unsigned int cmd, + unsigned long p) +{ + void __user *arg = (void __user *)p; + + switch (cmd) { + case XFS_IOC_HEALTH_MONITOR: + return xfs_healthmon_reconfigure(file, cmd, arg); + default: + break; + } + + return -ENOTTY; +} + static const struct file_operations xfs_healthmon_fops = { .owner = THIS_MODULE, #ifdef CONFIG_PROC_FS @@ -1548,6 +1592,7 @@ static const struct file_operations xfs_healthmon_fops = { .read_iter = xfs_healthmon_read_iter, .poll = xfs_healthmon_poll, .release = xfs_healthmon_release, + .unlocked_ioctl = xfs_healthmon_ioctl, }; /* ^ permalink raw reply related [flat|nested] 80+ messages in thread
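This patch lets a running monitor be retuned by reissuing XFS_IOC_HEALTH_MONITOR on the anonfd itself. Only the flags and format members of struct xfs_health_monitor appear in this hunk, so the layout below is a guess for illustration only (field order, widths, and the VERBOSE bit value are assumptions); the point is simply what a daemon would populate before the ioctl.

```c
#include <assert.h>
#include <stdint.h>

#define HM_FMT_JSON	(1)		/* XFS_HEALTH_MONITOR_FMT_JSON */
#define HM_VERBOSE	(1ULL << 0)	/* assumed value of XFS_HEALTH_MONITOR_VERBOSE */

/* Hypothetical mirror of struct xfs_health_monitor; not the real ABI. */
struct hm_request {
	uint64_t flags;
	uint8_t  format;
	uint8_t  pad[7];
};

/* Build a request that switches an open monitor to verbose JSON output. */
static struct hm_request
hm_reconfig_json(void)
{
	struct hm_request req = {
		.flags = HM_VERBOSE,
		.format = HM_FMT_JSON,
	};
	/* a daemon would then: ioctl(healthmon_fd, XFS_IOC_HEALTH_MONITOR, &req) */
	return req;
}
```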
* [PATCH 17/19] xfs: validate fds against running healthmon 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong ` (15 preceding siblings ...) 2025-10-23 0:04 ` [PATCH 16/19] xfs: allow reconfiguration of the health monitoring device Darrick J. Wong @ 2025-10-23 0:05 ` Darrick J. Wong 2025-10-23 0:05 ` [PATCH 18/19] xfs: add media error reporting ioctl Darrick J. Wong 2025-10-23 0:05 ` [PATCH 19/19] xfs: send uevents when major filesystem events happen Darrick J. Wong 18 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:05 UTC (permalink / raw) To: cem, djwong; +Cc: linux-fsdevel, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a new ioctl for the healthmon file that checks that a given fd points to the same filesystem that the healthmon file is monitoring. This allows xfs_healer to check that when it reopens a mountpoint to perform repairs, the file that it gets matches the filesystem that generated the corruption report. (Note that xfs_healer doesn't maintain an open fd to a filesystem that it's monitoring so that it doesn't pin the mount.) Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- fs/xfs/libxfs/xfs_fs.h | 10 ++++++++++ fs/xfs/xfs_healthmon.c | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 42 insertions(+) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 87e915baa875d6..b5a00ef6ce5fb9 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -1149,6 +1149,15 @@ struct xfs_health_monitor { /* Return events in JSON format */ #define XFS_HEALTH_MONITOR_FMT_JSON (1) +/* + * Check that a given fd points to the same filesystem that the health monitor + * is monitoring. 
+ */ +struct xfs_health_samefs { + __s32 fd; + __u32 flags; /* zero for now */ +}; + /* * ioctl commands that are used by Linux filesystems */ @@ -1189,6 +1198,7 @@ struct xfs_health_monitor { #define XFS_IOC_SCRUBV_METADATA _IOWR('X', 64, struct xfs_scrub_vec_head) #define XFS_IOC_RTGROUP_GEOMETRY _IOWR('X', 65, struct xfs_rtgroup_geometry) #define XFS_IOC_HEALTH_MONITOR _IOW ('X', 68, struct xfs_health_monitor) +#define XFS_IOC_HEALTH_SAMEFS _IOW ('X', 69, struct xfs_health_samefs) /* * ioctl commands that replace IRIX syssgi()'s diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c index ce84cd90df2379..666c27d73efbdc 100644 --- a/fs/xfs/xfs_healthmon.c +++ b/fs/xfs/xfs_healthmon.c @@ -1565,6 +1565,36 @@ xfs_healthmon_reconfigure( return 0; } +/* Does the fd point to the same filesystem as the one we're monitoring? */ +STATIC long +xfs_healthmon_samefs( + struct file *file, + unsigned int cmd, + void __user *arg) +{ + struct xfs_health_samefs hms; + struct xfs_healthmon *hm = file->private_data; + struct inode *hms_inode; + int ret = 0; + + if (copy_from_user(&hms, arg, sizeof(hms))) + return -EFAULT; + + if (hms.flags) + return -EINVAL; + + CLASS(fd, hms_fd)(hms.fd); + if (fd_empty(hms_fd)) + return -EBADF; + + hms_inode = file_inode(fd_file(hms_fd)); + mutex_lock(&hm->lock); + if (!hm->mp || hm->mp->m_super != hms_inode->i_sb) + ret = -ESTALE; + mutex_unlock(&hm->lock); + return ret; +} + /* Handle ioctls for the health monitoring thread. */ STATIC long xfs_healthmon_ioctl( @@ -1577,6 +1607,8 @@ xfs_healthmon_ioctl( switch (cmd) { case XFS_IOC_HEALTH_MONITOR: return xfs_healthmon_reconfigure(file, cmd, arg); + case XFS_IOC_HEALTH_SAMEFS: + return xfs_healthmon_samefs(file, cmd, arg); default: break; } ^ permalink raw reply related [flat|nested] 80+ messages in thread
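struct xfs_health_samefs is small enough to quote in full: an fd plus a must-be-zero flags word. Below is a hedged sketch of the xfs_healer side described in the commit message (reopen the root by handle, then ask the monitor whether it is the same filesystem); the struct mirrors the patch, while the helper name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors struct xfs_health_samefs from the patch. */
struct xfs_health_samefs {
	int32_t  fd;
	uint32_t flags;		/* zero for now */
};

/*
 * Build the validation request for a freshly reopened root directory.
 * The kernel returns 0 on a match and -ESTALE if the fd belongs to a
 * different (or already unmounted) filesystem.
 */
static struct xfs_health_samefs
samefs_request(int rootfd)
{
	struct xfs_health_samefs req = { .fd = rootfd, .flags = 0 };
	/* then: ioctl(healthmon_fd, XFS_IOC_HEALTH_SAMEFS, &req) */
	return req;
}
```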
* [PATCH 18/19] xfs: add media error reporting ioctl 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong ` (16 preceding siblings ...) 2025-10-23 0:05 ` [PATCH 17/19] xfs: validate fds against running healthmon Darrick J. Wong @ 2025-10-23 0:05 ` Darrick J. Wong 2025-10-23 0:05 ` [PATCH 19/19] xfs: send uevents when major filesystem events happen Darrick J. Wong 18 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:05 UTC (permalink / raw) To: cem, djwong; +Cc: linux-fsdevel, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a new privileged ioctl so that xfs_scrub can report media errors to the kernel for further processing. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- fs/xfs/libxfs/xfs_fs.h | 16 +++++++++++++ fs/xfs/xfs_notify_failure.h | 8 ++++++ fs/xfs/xfs_trace.h | 2 -- fs/xfs/Makefile | 6 +---- fs/xfs/xfs_healthmon.c | 2 -- fs/xfs/xfs_ioctl.c | 3 ++ fs/xfs/xfs_notify_failure.c | 53 ++++++++++++++++++++++++++++++++++++++++++- 7 files changed, 79 insertions(+), 11 deletions(-) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index b5a00ef6ce5fb9..5d35d67b10e153 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -1158,6 +1158,21 @@ struct xfs_health_samefs { __u32 flags; /* zero for now */ }; +/* Report a media error */ +struct xfs_media_error { + __u64 flags; /* flags */ + __u64 daddr; /* disk address of range */ + __u64 bbcount; /* length, in 512b blocks */ + __u64 pad; /* zero */ +}; + +#define XFS_MEDIA_ERROR_DATADEV (1) /* data device */ +#define XFS_MEDIA_ERROR_LOGDEV (2) /* external log device */ +#define XFS_MEDIA_ERROR_RTDEV (3) /* realtime device */ + +/* bottom byte of flags is the device code */ +#define XFS_MEDIA_ERROR_DEVMASK (0xFF) + /* * ioctl commands that are used by Linux filesystems */ @@ -1199,6 +1214,7 @@ struct xfs_health_samefs { #define XFS_IOC_RTGROUP_GEOMETRY _IOWR('X', 65, struct xfs_rtgroup_geometry) #define 
XFS_IOC_HEALTH_MONITOR _IOW ('X', 68, struct xfs_health_monitor) #define XFS_IOC_HEALTH_SAMEFS _IOW ('X', 69, struct xfs_health_samefs) +#define XFS_IOC_MEDIA_ERROR _IOW ('X', 70, struct xfs_media_error) /* * ioctl commands that replace IRIX syssgi()'s diff --git a/fs/xfs/xfs_notify_failure.h b/fs/xfs/xfs_notify_failure.h index 528317ff24320a..e9ee74aa540bff 100644 --- a/fs/xfs/xfs_notify_failure.h +++ b/fs/xfs/xfs_notify_failure.h @@ -6,7 +6,9 @@ #ifndef __XFS_NOTIFY_FAILURE_H__ #define __XFS_NOTIFY_FAILURE_H__ +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX) extern const struct dax_holder_operations xfs_dax_holder_operations; +#endif enum xfs_failed_device { XFS_FAILED_DATADEV, @@ -14,7 +16,7 @@ enum xfs_failed_device { XFS_FAILED_RTDEV, }; -#if defined(CONFIG_XFS_LIVE_HOOKS) && defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX) +#if defined(CONFIG_XFS_LIVE_HOOKS) struct xfs_media_error_params { struct xfs_mount *mp; enum xfs_failed_device fdev; @@ -46,4 +48,8 @@ struct xfs_media_error_hook { }; # define xfs_media_error_hook_setup(...) 
((void)0) #endif /* CONFIG_XFS_LIVE_HOOKS */ +struct xfs_media_error; +int xfs_ioc_media_error(struct xfs_mount *mp, + struct xfs_media_error __user *arg); + #endif /* __XFS_NOTIFY_FAILURE_H__ */ diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index b23f3c41db1c03..10b1ef735a7c9c 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -6214,7 +6214,6 @@ TRACE_EVENT(xfs_healthmon_metadata_hook, __entry->lost_prev) ); -#if defined(CONFIG_XFS_LIVE_HOOKS) && defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX) TRACE_EVENT(xfs_healthmon_media_error_hook, TP_PROTO(const struct xfs_media_error_params *p, unsigned int events, unsigned long long lost_prev), @@ -6262,7 +6261,6 @@ TRACE_EVENT(xfs_healthmon_media_error_hook, __entry->events, __entry->lost_prev) ); -#endif #define XFS_FILE_IOERROR_STRINGS \ { XFS_FILE_IOERROR_BUFFERED_READ, "readahead" }, \ diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index d4e9070a9326ba..2279cb0b874814 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -98,6 +98,7 @@ xfs-y += xfs_aops.o \ xfs_message.o \ xfs_mount.o \ xfs_mru_cache.o \ + xfs_notify_failure.o \ xfs_pwork.o \ xfs_reflink.o \ xfs_stats.o \ @@ -148,11 +149,6 @@ xfs-$(CONFIG_SYSCTL) += xfs_sysctl.o xfs-$(CONFIG_COMPAT) += xfs_ioctl32.o xfs-$(CONFIG_EXPORTFS_BLOCK_OPS) += xfs_pnfs.o -# notify failure -ifeq ($(CONFIG_MEMORY_FAILURE),y) -xfs-$(CONFIG_FS_DAX) += xfs_notify_failure.o -endif - xfs-$(CONFIG_XFS_DRAIN_INTENTS) += xfs_drain.o xfs-$(CONFIG_XFS_LIVE_HOOKS) += xfs_hooks.o xfs-$(CONFIG_XFS_MEMORY_BUFS) += xfs_buf_mem.o diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c index 666c27d73efbdc..3053b2da6b3109 100644 --- a/fs/xfs/xfs_healthmon.c +++ b/fs/xfs/xfs_healthmon.c @@ -527,7 +527,6 @@ xfs_healthmon_shutdown_hook( return NOTIFY_DONE; } -#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX) /* Add a media error event to the reporting queue. 
*/ STATIC int xfs_healthmon_media_error_hook( @@ -578,7 +577,6 @@ xfs_healthmon_media_error_hook( mutex_unlock(&hm->lock); return NOTIFY_DONE; } -#endif /* Add a file io error event to the reporting queue. */ STATIC int diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 08998d84554f09..7a80a6ad4b2d99 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -42,6 +42,7 @@ #include "xfs_handle.h" #include "xfs_rtgroup.h" #include "xfs_healthmon.h" +#include "xfs_notify_failure.h" #include <linux/mount.h> #include <linux/fileattr.h> @@ -1424,6 +1425,8 @@ xfs_file_ioctl( case XFS_IOC_HEALTH_MONITOR: return xfs_ioc_health_monitor(mp, arg); + case XFS_IOC_MEDIA_ERROR: + return xfs_ioc_media_error(mp, arg); default: return -ENOTTY; diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c index 2098ff452a3b87..00120dd1ddefbd 100644 --- a/fs/xfs/xfs_notify_failure.c +++ b/fs/xfs/xfs_notify_failure.c @@ -91,9 +91,19 @@ xfs_media_error_hook_setup( xfs_hook_setup(&hook->error_hook, mod_fn); } #else -# define xfs_media_error_hook(...) 
((void)0) +static inline void +xfs_media_error_hook( + struct xfs_mount *mp, + enum xfs_failed_device fdev, + xfs_daddr_t daddr, + uint64_t bbcount, + bool pre_remove) +{ + /* empty */ +} #endif /* CONFIG_XFS_LIVE_HOOKS */ +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX) struct xfs_failure_info { xfs_agblock_t startblock; xfs_extlen_t blockcount; @@ -458,3 +468,44 @@ xfs_dax_notify_failure( const struct dax_holder_operations xfs_dax_holder_operations = { .notify_failure = xfs_dax_notify_failure, }; +#endif /* CONFIG_MEMORY_FAILURE && CONFIG_FS_DAX */ + +#define XFS_VALID_MEDIA_ERROR_FLAGS (XFS_MEDIA_ERROR_DATADEV | \ + XFS_MEDIA_ERROR_LOGDEV | \ + XFS_MEDIA_ERROR_RTDEV) +int +xfs_ioc_media_error( + struct xfs_mount *mp, + struct xfs_media_error __user *arg) +{ + struct xfs_media_error me; + enum xfs_failed_device fdev; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (copy_from_user(&me, arg, sizeof(me))) + return -EFAULT; + + if (me.pad) + return -EINVAL; + if (me.flags & ~XFS_VALID_MEDIA_ERROR_FLAGS) + return -EINVAL; + + switch (me.flags & XFS_MEDIA_ERROR_DEVMASK) { + case XFS_MEDIA_ERROR_DATADEV: + fdev = XFS_FAILED_DATADEV; + break; + case XFS_MEDIA_ERROR_LOGDEV: + fdev = XFS_FAILED_LOGDEV; + break; + case XFS_MEDIA_ERROR_RTDEV: + fdev = XFS_FAILED_RTDEV; + break; + default: + return -EINVAL; + } + + xfs_media_error_hook(mp, fdev, me.daddr, me.bbcount, false); + return 0; +} ^ permalink raw reply related [flat|nested] 80+ messages in thread
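[Editorial note] The flag handling in xfs_ioc_media_error() above is easy to model in userspace. The sketch below mirrors the kernel's switch statement: exactly one device bit may be set and no unknown bits are allowed. The bit values here are assumptions made for the sketch; the authoritative XFS_MEDIA_ERROR_* definitions live in the xfs_fs.h uapi header added by this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Illustrative userspace model of the flag checks in xfs_ioc_media_error().
 * The bit positions below are assumptions; the real values come from the
 * xfs_fs.h uapi header.
 */
#define XFS_MEDIA_ERROR_DATADEV	(1u << 0)
#define XFS_MEDIA_ERROR_LOGDEV	(1u << 1)
#define XFS_MEDIA_ERROR_RTDEV	(1u << 2)
#define XFS_MEDIA_ERROR_DEVMASK	(XFS_MEDIA_ERROR_DATADEV | \
				 XFS_MEDIA_ERROR_LOGDEV | \
				 XFS_MEDIA_ERROR_RTDEV)

enum xfs_failed_device {
	XFS_FAILED_DATADEV,
	XFS_FAILED_LOGDEV,
	XFS_FAILED_RTDEV,
};

/*
 * Exactly one device bit must be set and no unknown bits may be present,
 * matching the kernel switch.  Returns 0 or -EINVAL (errno-style).
 */
static int validate_media_error_flags(uint64_t flags,
		enum xfs_failed_device *fdev)
{
	if (flags & ~(uint64_t)XFS_MEDIA_ERROR_DEVMASK)
		return -EINVAL;

	switch (flags & XFS_MEDIA_ERROR_DEVMASK) {
	case XFS_MEDIA_ERROR_DATADEV:
		*fdev = XFS_FAILED_DATADEV;
		return 0;
	case XFS_MEDIA_ERROR_LOGDEV:
		*fdev = XFS_FAILED_LOGDEV;
		return 0;
	case XFS_MEDIA_ERROR_RTDEV:
		*fdev = XFS_FAILED_RTDEV;
		return 0;
	default:
		/* zero device bits, or more than one, set */
		return -EINVAL;
	}
}
```

Note that the `default:` arm also catches the "two device bits set" case, since that combination matches none of the single-bit case labels even though it passes the unknown-bits check.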
* [PATCH 19/19] xfs: send uevents when major filesystem events happen 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong ` (17 preceding siblings ...) 2025-10-23 0:05 ` [PATCH 18/19] xfs: add media error reporting ioctl Darrick J. Wong @ 2025-10-23 0:05 ` Darrick J. Wong 18 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:05 UTC (permalink / raw) To: cem, djwong; +Cc: linux-fsdevel, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Send uevents when we mount, unmount, and shut down the filesystem, so that we can trigger systemd services when major events happen. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- fs/xfs/xfs_super.h | 13 +++++++ fs/xfs/xfs_fsops.c | 18 ++++++++++ fs/xfs/xfs_super.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 125 insertions(+) diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h index c0e85c1e42f27d..6d428bd04a0248 100644 --- a/fs/xfs/xfs_super.h +++ b/fs/xfs/xfs_super.h @@ -101,4 +101,17 @@ extern struct workqueue_struct *xfs_discard_wq; struct dentry *xfs_debugfs_mkdir(const char *name, struct dentry *parent); +#define XFS_UEVENT_BUFLEN ( \ + sizeof("SID=") + sizeof_field(struct super_block, s_id) + \ + sizeof("UUID=") + UUID_STRING_LEN + \ + sizeof("META_UUID=") + UUID_STRING_LEN) + +#define XFS_UEVENT_STR_PTRS \ + NULL, /* sid */ \ + NULL, /* uuid */ \ + NULL /* metauuid */ + +int xfs_format_uevent_strings(struct xfs_mount *mp, char *buf, ssize_t buflen, + char **env); + #endif /* __XFS_SUPER_H__ */ diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 69918cd1ba1dbc..b3a01361318320 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -537,6 +537,23 @@ xfs_shutdown_hook_setup( # define xfs_shutdown_hook(...) 
((void)0)
 #endif	/* CONFIG_XFS_LIVE_HOOKS */
 
+static void
+xfs_send_shutdown_uevent(
+	struct xfs_mount	*mp)
+{
+	char			buf[XFS_UEVENT_BUFLEN];
+	char			*env[] = {
+		"TYPE=shutdown",
+		XFS_UEVENT_STR_PTRS,
+		NULL,
+	};
+	int			error;
+
+	error = xfs_format_uevent_strings(mp, buf, sizeof(buf), &env[1]);
+	if (!error)
+		kobject_uevent_env(&mp->m_kobj.kobject, KOBJ_OFFLINE, env);
+}
+
 /*
  * Force a shutdown of the filesystem instantly while keeping the filesystem
  * consistent. We don't do an unmount here; just shutdown the shop, make sure
@@ -587,6 +604,7 @@ xfs_do_force_shutdown(
 	}
 
 	trace_xfs_force_shutdown(mp, tag, flags, fname, lnnum);
+	xfs_send_shutdown_uevent(mp);
 	xfs_alert_tag(mp, tag,
 "%s (0x%x) detected at %pS (%s:%d). Shutting down filesystem.",
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index b6a6027b4df8d8..5137f4cb8640b8 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -53,6 +53,7 @@
 #include <linux/magic.h>
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
+#include <linux/uuid.h>
 
 static const struct super_operations xfs_super_operations;
 
@@ -1238,12 +1239,73 @@ xfs_inodegc_free_percpu(
 	free_percpu(mp->m_inodegc);
 }
 
+int
+xfs_format_uevent_strings(
+	struct xfs_mount	*mp,
+	char			*buf,
+	ssize_t			buflen,
+	char			**env)
+{
+	ssize_t			written;
+
+	ASSERT(buflen >= XFS_UEVENT_BUFLEN);
+
+	written = snprintf(buf, buflen, "SID=%s", mp->m_super->s_id);
+	if (written >= buflen)
+		return -EINVAL;
+
+	*env = buf;
+	env++;
+	buf += written + 1;
+	buflen -= written + 1;
+
+	written = snprintf(buf, buflen, "UUID=%pU", &mp->m_sb.sb_uuid);
+	if (written >= buflen)
+		return -EINVAL;
+
+	*env = buf;
+	env++;
+	buf += written + 1;
+	buflen -= written + 1;
+
+	written = snprintf(buf, buflen, "META_UUID=%pU",
+			&mp->m_sb.sb_meta_uuid);
+	if (written >= buflen)
+		return -EINVAL;
+
+	*env = buf;
+	env++;
+	buf += written + 1;
+	buflen -= written + 1;
+
+	ASSERT(buflen >= 0);
+	return 0;
+}
+
+static void
+xfs_send_unmount_uevent(
+	struct xfs_mount	*mp)
+{
+
	char			buf[XFS_UEVENT_BUFLEN];
+	char			*env[] = {
+		"TYPE=unmount",
+		XFS_UEVENT_STR_PTRS,
+		NULL,
+	};
+	int			error;
+
+	error = xfs_format_uevent_strings(mp, buf, sizeof(buf), &env[1]);
+	if (!error)
+		kobject_uevent_env(&mp->m_kobj.kobject, KOBJ_REMOVE, env);
+}
+
 static void
 xfs_fs_put_super(
 	struct super_block	*sb)
 {
 	struct xfs_mount	*mp = XFS_M(sb);
 
+	xfs_send_unmount_uevent(mp);
 	xfs_notice(mp, "Unmounting Filesystem %pU", &mp->m_sb.sb_uuid);
 
 	xfs_filestream_unmount(mp);
 	xfs_unmountfs(mp);
@@ -1661,6 +1723,37 @@ xfs_debugfs_mkdir(
 	return child;
 }
 
+/*
+ * Send a uevent signalling that the mount succeeded so we can use udev rules
+ * to start background services.
+ */
+static void
+xfs_send_mount_uevent(
+	struct fs_context	*fc,
+	struct xfs_mount	*mp)
+{
+	char			*source;
+	char			buf[XFS_UEVENT_BUFLEN];
+	char			*env[] = {
+		"TYPE=mount",
+		NULL, /* source */
+		XFS_UEVENT_STR_PTRS,
+		NULL,
+	};
+	int			error;
+
+	source = kasprintf(GFP_KERNEL, "SOURCE=%s", fc->source);
+	if (!source)
+		return;
+	env[1] = source;
+
+	error = xfs_format_uevent_strings(mp, buf, sizeof(buf), &env[2]);
+	if (!error)
+		kobject_uevent_env(&mp->m_kobj.kobject, KOBJ_ADD, env);
+
+	kfree(source);
+}
+
 static int
 xfs_fs_fill_super(
 	struct super_block	*sb,
@@ -1974,6 +2067,7 @@ xfs_fs_fill_super(
 		mp->m_debugfs_uuid = NULL;
 	}
 
+	xfs_send_mount_uevent(fc, mp);
 	return 0;
 
  out_filestream_unmount:
^ permalink raw reply related	[flat|nested] 80+ messages in thread
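[Editorial note] The uevent helpers in this patch pack several KEY=value strings into one stack buffer, NUL-separated, and point the env[] slots that follow the fixed "TYPE=" entry at them. A standalone userspace sketch of that packing scheme (same idea with a simplified signature; this is not the kernel function):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch of the packing scheme used by xfs_format_uevent_strings():
 * format each string into the next free chunk of one buffer, reuse the
 * terminating NUL as the separator, and point successive env[] slots at
 * the strings.  Simplified signature for illustration only.
 */
static int format_uevent_strings(const char *sid, const char *uuid,
		const char *meta_uuid, char *buf, size_t buflen, char **env)
{
	const char *keys[] = { "SID=", "UUID=", "META_UUID=" };
	const char *vals[] = { sid, uuid, meta_uuid };

	for (int i = 0; i < 3; i++) {
		int written = snprintf(buf, buflen, "%s%s", keys[i], vals[i]);

		if (written < 0 || (size_t)written >= buflen)
			return -EINVAL;	/* buffer too small */

		*env++ = buf;		/* fill the next env[] slot */
		buf += written + 1;	/* keep the NUL as separator */
		buflen -= (size_t)written + 1;
	}
	return 0;
}
```

A caller builds env[] with the "TYPE=" string in slot 0, the three placeholder slots next, and a NULL terminator last, then passes the address of the first placeholder slot, which is why the helper is handed a pointer into the middle of the array rather than its start.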
* [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems 2025-10-22 23:56 [PATCHBOMB 6.19] xfs: autonomous self healing Darrick J. Wong 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong @ 2025-10-23 0:00 ` Darrick J. Wong 2025-10-23 0:05 ` [PATCH 01/26] xfs: create hooks for monitoring health updates Darrick J. Wong ` (25 more replies) 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong 2025-10-23 0:00 ` [PATCHSET V2] fstests: autonomous self healing of filesystems Darrick J. Wong 3 siblings, 26 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:00 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs Hi all, This patchset builds new functionality to deliver live information about filesystem health events to userspace. This is done by creating an anonymous file that can be read() for events by userspace programs. Events are captured by hooking various parts of XFS and iomap so that metadata health failures, file I/O errors, and major changes in filesystem state (unmounts, shutdowns, etc.) can be observed by programs. When an event occurs, the hook functions queue an event object to each event anonfd for later processing. Programs must have CAP_SYS_ADMIN to open the anonfd and there's a maximum event lag to prevent resource overconsumption. The events themselves can be read() from the anonfd either as json objects for human readability, or as C structs for daemons. In userspace, we create a new daemon program that will read the event objects and initiate repairs automatically. This daemon is managed entirely by systemd and will not block unmounting of the filesystem unless repairs are ongoing. It is autostarted via some udev rules. If you're going to start using this code, I strongly recommend pulling from my git trees, which are linked below. This has been running on the djcloud for months with no problems. Enjoy! 
Comments and questions are, as always, welcome. --D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=health-monitoring xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=health-monitoring fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=health-monitoring --- Commits in this patchset: * xfs: create hooks for monitoring health updates * xfs: create a special file to pass filesystem health to userspace * xfs: create event queuing, formatting, and discovery infrastructure * xfs: report metadata health events through healthmon * xfs: report shutdown events through healthmon * xfs: report media errors through healthmon * xfs: report file io errors through healthmon * xfs: validate fds against running healthmon * xfs: add media error reporting ioctl * xfs_io: monitor filesystem health events * xfs_io: add a media error reporting command * xfs_healer: create daemon to listen for health events * xfs_healer: check events against schema * xfs_healer: enable repairing filesystems * xfs_healer: check for fs features needed for effective repairs * xfs_healer: use getparents to look up file names * builddefs: refactor udev directory specification * xfs_healer: create a background monitoring service * xfs_healer: don't start service if kernel support unavailable * xfs_healer: use the autofsck fsproperty to select mode * xfs_healer: run full scrub after lost corruption events or targeted repair failure * xfs_healer: use getmntent to find moved filesystems * xfs_healer: validate that repair fds point to the monitored fs * xfs_healer: add a manual page * xfs_scrub: report media scrub failures to the kernel * debian: enable xfs_healer on the root filesystem by default --- io/io.h | 1 libxfs/xfs_fs.h | 173 +++++ libxfs/xfs_health.h | 52 + Makefile | 5 configure.ac | 8 debian/control | 2 debian/postinst | 8 debian/prerm | 13 debian/rules | 2 healer/Makefile | 
68 ++ healer/system-xfs_healer.slice | 31 + healer/xfs_healer.py.in | 1432 ++++++++++++++++++++++++++++++++++++++ healer/xfs_healer.rules | 7 healer/xfs_healer@.service.in | 108 +++ healer/xfs_healer_start | 17 include/builddefs.in | 5 io/Makefile | 1 io/healthmon.c | 183 +++++ io/init.c | 1 io/shutdown.c | 113 +++ libxfs/Makefile | 10 libxfs/xfs_healthmon.schema.json | 648 +++++++++++++++++ m4/package_services.m4 | 30 - man/man8/xfs_healer.8 | 85 ++ man/man8/xfs_io.8 | 46 + scrub/Makefile | 7 scrub/phase6.c | 25 + 27 files changed, 3054 insertions(+), 27 deletions(-) create mode 100644 debian/prerm create mode 100644 healer/Makefile create mode 100644 healer/system-xfs_healer.slice create mode 100644 healer/xfs_healer.py.in create mode 100644 healer/xfs_healer.rules create mode 100644 healer/xfs_healer@.service.in create mode 100755 healer/xfs_healer_start create mode 100644 io/healthmon.c create mode 100644 libxfs/xfs_healthmon.schema.json create mode 100644 man/man8/xfs_healer.8 ^ permalink raw reply [flat|nested] 80+ messages in thread
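[Editorial note] The cover letter above says events can be read as JSON objects for human readability. As a toy illustration of consuming that form, the helper below pulls the "type" member out of an event object so a dispatcher could branch on it. A real consumer (such as the xfs_healer prototype) would use a proper JSON parser; the sample event text in the test is fabricated to match the schema's shape.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy extraction of the "type" member from a healthmon JSON event.
 * Purely illustrative -- a real daemon should use a JSON parser.
 * Returns 0 and copies the value into 'out', or -1 on failure.
 */
static int event_type(const char *json, char *out, size_t outlen)
{
	const char *key = strstr(json, "\"type\"");
	const char *start, *end;

	if (!key)
		return -1;
	/* find the opening quote of the (string) value after the key */
	start = strchr(key + strlen("\"type\""), '"');
	if (!start)
		return -1;
	start++;
	end = strchr(start, '"');	/* closing quote of the value */
	if (!end || (size_t)(end - start) >= outlen)
		return -1;
	memcpy(out, start, (size_t)(end - start));
	out[end - start] = '\0';
	return 0;
}
```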
* [PATCH 01/26] xfs: create hooks for monitoring health updates 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong @ 2025-10-23 0:05 ` Darrick J. Wong 2025-10-23 0:06 ` [PATCH 02/26] xfs: create a special file to pass filesystem health to userspace Darrick J. Wong ` (24 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:05 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create hooks for monitoring health events. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- libxfs/xfs_health.h | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h index b31000f7190ce5..39fef33dedc6a8 100644 --- a/libxfs/xfs_health.h +++ b/libxfs/xfs_health.h @@ -289,4 +289,51 @@ void xfs_bulkstat_health(struct xfs_inode *ip, struct xfs_bulkstat *bs); #define xfs_metadata_is_sick(error) \ (unlikely((error) == -EFSCORRUPTED || (error) == -EFSBADCRC)) +/* + * Parameters for tracking health updates. The enum below is passed as the + * hook function argument. + */ +enum xfs_health_update_type { + XFS_HEALTHUP_SICK = 1, /* runtime corruption observed */ + XFS_HEALTHUP_CORRUPT, /* fsck reported corruption */ + XFS_HEALTHUP_HEALTHY, /* fsck reported healthy structure */ + XFS_HEALTHUP_UNMOUNT, /* filesystem is unmounting */ +}; + +/* Where in the filesystem was the event observed? 
*/ +enum xfs_health_update_domain { + XFS_HEALTHUP_FS = 1, /* main filesystem */ + XFS_HEALTHUP_AG, /* allocation group */ + XFS_HEALTHUP_INODE, /* inode */ + XFS_HEALTHUP_RTGROUP, /* realtime group */ +}; + +struct xfs_health_update_params { + /* XFS_HEALTHUP_INODE */ + xfs_ino_t ino; + uint32_t gen; + + /* XFS_HEALTHUP_AG/RTGROUP */ + uint32_t group; + + /* XFS_SICK_* flags */ + unsigned int old_mask; + unsigned int new_mask; + + enum xfs_health_update_domain domain; +}; + +#ifdef CONFIG_XFS_LIVE_HOOKS +struct xfs_health_hook { + struct xfs_hook health_hook; +}; + +void xfs_health_hook_disable(void); +void xfs_health_hook_enable(void); + +int xfs_health_hook_add(struct xfs_mount *mp, struct xfs_health_hook *hook); +void xfs_health_hook_del(struct xfs_mount *mp, struct xfs_health_hook *hook); +void xfs_health_hook_setup(struct xfs_health_hook *hook, notifier_fn_t mod_fn); +#endif /* CONFIG_XFS_LIVE_HOOKS */ + #endif /* __XFS_HEALTH_H__ */ ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 02/26] xfs: create a special file to pass filesystem health to userspace 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong 2025-10-23 0:05 ` [PATCH 01/26] xfs: create hooks for monitoring health updates Darrick J. Wong @ 2025-10-23 0:06 ` Darrick J. Wong 2025-10-23 0:06 ` [PATCH 03/26] xfs: create event queuing, formatting, and discovery infrastructure Darrick J. Wong ` (23 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:06 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create an ioctl that installs a file descriptor backed by an anon_inode file that will convey filesystem health events to userspace. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- libxfs/xfs_fs.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index 12463ba766da05..dba7896f716092 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -1003,6 +1003,13 @@ struct xfs_rtgroup_geometry { #define XFS_RTGROUP_GEOM_SICK_RMAPBT (1U << 3) /* reverse mappings */ #define XFS_RTGROUP_GEOM_SICK_REFCNTBT (1U << 4) /* reference counts */ +struct xfs_health_monitor { + __u64 flags; /* flags */ + __u8 format; /* output format */ + __u8 pad1[7]; /* zeroes */ + __u64 pad2[2]; /* zeroes */ +}; + /* * ioctl commands that are used by Linux filesystems */ @@ -1042,6 +1049,7 @@ struct xfs_rtgroup_geometry { #define XFS_IOC_GETPARENTS_BY_HANDLE _IOWR('X', 63, struct xfs_getparents_by_handle) #define XFS_IOC_SCRUBV_METADATA _IOWR('X', 64, struct xfs_scrub_vec_head) #define XFS_IOC_RTGROUP_GEOMETRY _IOWR('X', 65, struct xfs_rtgroup_geometry) +#define XFS_IOC_HEALTH_MONITOR _IOW ('X', 68, struct xfs_health_monitor) /* * ioctl commands that replace IRIX syssgi()'s ^ permalink raw reply related [flat|nested] 80+ messages in thread
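[Editorial note] Since the ioctl argument above crosses the kernel/userspace boundary, its layout must be stable and its padding zeroed (the struct comments mark the pad fields as "zeroes"). A quick userspace mirror can sanity-check the size and the _IOW encoding; this assumes a Linux build environment with the uapi ioctl macros and is only a layout check, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/ioctl.h>

/* Userspace mirror of struct xfs_health_monitor from this patch. */
struct xfs_health_monitor {
	uint64_t	flags;		/* flags */
	uint8_t		format;		/* output format */
	uint8_t		pad1[7];	/* zeroes */
	uint64_t	pad2[2];	/* zeroes */
};

#define XFS_IOC_HEALTH_MONITOR	_IOW('X', 68, struct xfs_health_monitor)

/* Prepare a fully-zeroed argument with only the format field set. */
static struct xfs_health_monitor hmon_prepare(uint8_t format)
{
	struct xfs_health_monitor hmo;

	memset(&hmo, 0, sizeof(hmo));	/* padding must be zeroes */
	hmo.format = format;
	return hmo;
}
```

The struct is 8 + 1 + 7 + 16 = 32 bytes with no compiler-inserted padding, and that size is baked into the _IOW request number, so any layout drift would change the ioctl value itself.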
* [PATCH 03/26] xfs: create event queuing, formatting, and discovery infrastructure 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong 2025-10-23 0:05 ` [PATCH 01/26] xfs: create hooks for monitoring health updates Darrick J. Wong 2025-10-23 0:06 ` [PATCH 02/26] xfs: create a special file to pass filesystem health to userspace Darrick J. Wong @ 2025-10-23 0:06 ` Darrick J. Wong 2025-10-23 0:06 ` [PATCH 04/26] xfs: report metadata health events through healthmon Darrick J. Wong ` (22 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:06 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create the basic infrastructure that we need to report health events to userspace. We need a compact form for recording critical information about an event and queueing them; a means to notice that we've lost some events; and a means to format the events into something that userspace can handle. Here, we've chosen json to export information to userspace. The structured key-value nature of json gives us enormous flexibility to modify the schema of what we'll send to userspace because we can add new keys at any time. Userspace can use whatever json parsers are available to consume the events and will not be confused by keys they don't recognize. Note that we do NOT allow sending json back to the kernel, nor is there any intent to do that. Signed-off-by: "Darrick J. 
Wong" <djwong@kernel.org> --- libxfs/xfs_fs.h | 50 +++++++++++++++ libxfs/xfs_healthmon.schema.json | 129 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 179 insertions(+) create mode 100644 libxfs/xfs_healthmon.schema.json diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index dba7896f716092..4b642eea18b5ca 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -1003,6 +1003,45 @@ struct xfs_rtgroup_geometry { #define XFS_RTGROUP_GEOM_SICK_RMAPBT (1U << 3) /* reverse mappings */ #define XFS_RTGROUP_GEOM_SICK_REFCNTBT (1U << 4) /* reference counts */ +/* Health monitor event domains */ + +/* affects the whole fs */ +#define XFS_HEALTH_MONITOR_DOMAIN_MOUNT (0) + +/* Health monitor event types */ + +/* status of the monitor itself */ +#define XFS_HEALTH_MONITOR_TYPE_RUNNING (0) +#define XFS_HEALTH_MONITOR_TYPE_LOST (1) + +/* lost events */ +struct xfs_health_monitor_lost { + __u64 count; +}; + +struct xfs_health_monitor_event { + /* XFS_HEALTH_MONITOR_DOMAIN_* */ + __u32 domain; + + /* XFS_HEALTH_MONITOR_TYPE_* */ + __u32 type; + + /* Timestamp of the event, in nanoseconds since the Unix epoch */ + __u64 time_ns; + + /* + * Details of the event. The primary clients are written in python + * and rust, so break this up because bindgen hates anonymous structs + * and unions. 
+ */ + union { + struct xfs_health_monitor_lost lost; + } e; + + /* zeroes */ + __u64 pad[2]; +}; + struct xfs_health_monitor { __u64 flags; /* flags */ __u8 format; /* output format */ @@ -1010,6 +1049,17 @@ struct xfs_health_monitor { __u64 pad2[2]; /* zeroes */ }; +/* Return all health status events, not just deltas */ +#define XFS_HEALTH_MONITOR_VERBOSE (1ULL << 0) + +#define XFS_HEALTH_MONITOR_ALL (XFS_HEALTH_MONITOR_VERBOSE) + +/* Return events in a C structure */ +#define XFS_HEALTH_MONITOR_FMT_CSTRUCT (0) + +/* Return events in JSON format */ +#define XFS_HEALTH_MONITOR_FMT_JSON (1) + /* * ioctl commands that are used by Linux filesystems */ diff --git a/libxfs/xfs_healthmon.schema.json b/libxfs/xfs_healthmon.schema.json new file mode 100644 index 00000000000000..68762738b04191 --- /dev/null +++ b/libxfs/xfs_healthmon.schema.json @@ -0,0 +1,129 @@ +{ + "$comment": [ + "SPDX-License-Identifier: GPL-2.0-or-later", + "Copyright (c) 2024-2025 Oracle. All Rights Reserved.", + "Author: Darrick J. Wong <djwong@kernel.org>", + "", + "This schema file describes the format of the json objects", + "readable from the fd returned by the XFS_IOC_HEALTHMON", + "ioctl." 
+ ], + + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/fs/xfs/libxfs/xfs_healthmon.schema.json", + + "title": "XFS Health Monitoring Events", + + "$comment": "Events must be one of the following types:", + "oneOf": [ + { + "$ref": "#/$events/running" + }, + { + "$ref": "#/$events/unmount" + }, + { + "$ref": "#/$events/lost" + } + ], + + "$comment": "Simple data types are defined here.", + "$defs": { + "time_ns": { + "title": "Time of Event", + "description": "Timestamp of the event, in nanoseconds since the Unix epoch.", + "type": "integer" + }, + "count": { + "title": "Count of events", + "description": "Number of events.", + "type": "integer", + "minimum": 1 + } + }, + + "$comment": "Event types are defined here.", + "$events": { + "running": { + "title": "Health Monitoring Running", + "$comment": [ + "The health monitor is actually running." + ], + "type": "object", + + "properties": { + "type": { + "const": "running" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "mount" + } + }, + + "required": [ + "type", + "time_ns", + "domain" + ] + }, + "unmount": { + "title": "Filesystem Unmounted", + "$comment": [ + "The filesystem was unmounted." + ], + "type": "object", + + "properties": { + "type": { + "const": "unmount" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "mount" + } + }, + + "required": [ + "type", + "time_ns", + "domain" + ] + }, + "lost": { + "title": "Health Monitoring Events Lost", + "$comment": [ + "Previous health monitoring events were", + "dropped due to memory allocation failures", + "or queue limits." 
+ ], + "type": "object", + + "properties": { + "type": { + "const": "lost" + }, + "count": { + "$ref": "#/$defs/count" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "mount" + } + }, + + "required": [ + "type", + "count", + "time_ns", + "domain" + ] + } + } +} ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 04/26] xfs: report metadata health events through healthmon 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (2 preceding siblings ...) 2025-10-23 0:06 ` [PATCH 03/26] xfs: create event queuing, formatting, and discovery infrastructure Darrick J. Wong @ 2025-10-23 0:06 ` Darrick J. Wong 2025-10-23 0:06 ` [PATCH 05/26] xfs: report shutdown " Darrick J. Wong ` (21 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:06 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Set up a metadata health event hook so that we can send events to userspace as we collect information. The unmount hook severs the weak reference between the health monitor and the filesystem it's monitoring; when this happens, we stop reporting events because there's no longer any point. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- libxfs/xfs_fs.h | 38 +++++ libxfs/xfs_health.h | 5 + libxfs/xfs_healthmon.schema.json | 315 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 358 insertions(+) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index 4b642eea18b5ca..358abe98776d69 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -1008,17 +1008,52 @@ struct xfs_rtgroup_geometry { /* affects the whole fs */ #define XFS_HEALTH_MONITOR_DOMAIN_MOUNT (0) +/* metadata health events */ +#define XFS_HEALTH_MONITOR_DOMAIN_FS (1) +#define XFS_HEALTH_MONITOR_DOMAIN_AG (2) +#define XFS_HEALTH_MONITOR_DOMAIN_INODE (3) +#define XFS_HEALTH_MONITOR_DOMAIN_RTGROUP (4) + /* Health monitor event types */ /* status of the monitor itself */ #define XFS_HEALTH_MONITOR_TYPE_RUNNING (0) #define XFS_HEALTH_MONITOR_TYPE_LOST (1) +/* metadata health events */ +#define XFS_HEALTH_MONITOR_TYPE_SICK (2) +#define XFS_HEALTH_MONITOR_TYPE_CORRUPT (3) +#define XFS_HEALTH_MONITOR_TYPE_HEALTHY (4) + +/* filesystem was unmounted */ +#define 
XFS_HEALTH_MONITOR_TYPE_UNMOUNT (5) + /* lost events */ struct xfs_health_monitor_lost { __u64 count; }; +/* fs/rt metadata */ +struct xfs_health_monitor_fs { + /* XFS_FSOP_GEOM_SICK_* flags */ + __u32 mask; +}; + +/* ag/rtgroup metadata */ +struct xfs_health_monitor_group { + /* XFS_{AG,RTGROUP}_SICK_* flags */ + __u32 mask; + __u32 gno; +}; + +/* inode metadata */ +struct xfs_health_monitor_inode { + /* XFS_BS_SICK_* flags */ + __u32 mask; + __u32 gen; + __u64 ino; +}; + struct xfs_health_monitor_event { /* XFS_HEALTH_MONITOR_DOMAIN_* */ __u32 domain; @@ -1036,6 +1071,9 @@ struct xfs_health_monitor_event { */ union { struct xfs_health_monitor_lost lost; + struct xfs_health_monitor_fs fs; + struct xfs_health_monitor_group group; + struct xfs_health_monitor_inode inode; } e; /* zeroes */ diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h index 39fef33dedc6a8..9ff3bf8ba4ed8f 100644 --- a/libxfs/xfs_health.h +++ b/libxfs/xfs_health.h @@ -336,4 +336,9 @@ void xfs_health_hook_del(struct xfs_mount *mp, struct xfs_health_hook *hook); void xfs_health_hook_setup(struct xfs_health_hook *hook, notifier_fn_t mod_fn); #endif /* CONFIG_XFS_LIVE_HOOKS */ +unsigned int xfs_healthmon_inode_mask(unsigned int sick_mask); +unsigned int xfs_healthmon_rtgroup_mask(unsigned int sick_mask); +unsigned int xfs_healthmon_perag_mask(unsigned int sick_mask); +unsigned int xfs_healthmon_fs_mask(unsigned int sick_mask); + #endif /* __XFS_HEALTH_H__ */ diff --git a/libxfs/xfs_healthmon.schema.json b/libxfs/xfs_healthmon.schema.json index 68762738b04191..dd78f1b71d587b 100644 --- a/libxfs/xfs_healthmon.schema.json +++ b/libxfs/xfs_healthmon.schema.json @@ -24,6 +24,18 @@ }, { "$ref": "#/$events/lost" + }, + { + "$ref": "#/$events/fs_metadata" + }, + { + "$ref": "#/$events/rtgroup_metadata" + }, + { + "$ref": "#/$events/perag_metadata" + }, + { + "$ref": "#/$events/inode_metadata" } ], @@ -39,6 +51,156 @@ "description": "Number of events.", "type": "integer", "minimum": 1 + }, + 
"xfs_agnumber_t": { + "description": "Allocation group number", + "type": "integer", + "minimum": 0, + "maximum": 2147483647 + }, + "xfs_rgnumber_t": { + "description": "Realtime allocation group number", + "type": "integer", + "minimum": 0, + "maximum": 2147483647 + }, + "xfs_ino_t": { + "description": "Inode number", + "type": "integer", + "minimum": 1 + }, + "i_generation": { + "description": "Inode generation number", + "type": "integer" + } + }, + + "$comment": "Filesystem metadata event data are defined here.", + "$metadata": { + "status": { + "description": "Metadata health status", + "$comment": [ + "One of:", + "", + " * sick: metadata corruption discovered", + " during a runtime operation.", + " * corrupt: corruption discovered during", + " an xfs_scrub run.", + " * healthy: metadata object was found to be", + " ok by xfs_scrub." + ], + "enum": [ + "sick", + "corrupt", + "healthy" + ] + }, + "fs": { + "description": [ + "Metadata structures that affect the entire", + "filesystem. Options include:", + "", + " * fscounters: summary counters", + " * usrquota: user quota records", + " * grpquota: group quota records", + " * prjquota: project quota records", + " * quotacheck: quota counters", + " * nlinks: file link counts", + " * metadir: metadata directory", + " * metapath: metadata inode paths" + ], + "enum": [ + "fscounters", + "grpquota", + "metadir", + "metapath", + "nlinks", + "prjquota", + "quotacheck", + "usrquota" + ] + }, + "perag": { + "description": [ + "Metadata structures owned by allocation", + "groups on the data device. 
Options include:", + "", + " * agf: group space header", + " * agfl: per-group free block list", + " * agi: group inode header", + " * bnobt: free space by position btree", + " * cntbt: free space by length btree", + " * finobt: free inode btree", + " * inobt: inode btree", + " * rmapbt: reverse mapping btree", + " * refcountbt: reference count btree", + " * inodes: problems were recorded for", + " this group's inodes, but the", + " inodes themselves had to be", + " reclaimed.", + " * super: superblock" + ], + "enum": [ + "agf", + "agfl", + "agi", + "bnobt", + "cntbt", + "finobt", + "inobt", + "inodes", + "refcountbt", + "rmapbt", + "super" + ] + }, + "rtgroup": { + "description": [ + "Metadata structures owned by allocation", + "groups on the realtime volume. Options", + "include:", + "", + " * bitmap: free space bitmap contents", + " for this group", + " * summary: realtime free space summary file", + " * rmapbt: reverse mapping btree", + " * refcountbt: reference count btree", + " * super: group superblock" + ], + "enum": [ + "bitmap", + "summary", + "refcountbt", + "rmapbt", + "super" + ] + }, + "inode": { + "description": [ + "Metadata structures owned by file inodes.", + "Options include:", + "", + " * bmapbta: attr fork", + " * bmapbtc: cow fork", + " * bmapbtd: data fork", + " * core: inode record", + " * directory: directory entries", + " * dirtree: directory tree problems detected", + " * parent: directory parent pointer", + " * symlink: symbolic link target", + " * xattr: extended attributes" + ], + "enum": [ + "bmapbta", + "bmapbtc", + "bmapbtd", + "core", + "directory", + "dirtree", + "parent", + "symlink", + "xattr" + ] } }, @@ -124,6 +286,159 @@ "time_ns", "domain" ] + }, + "fs_metadata": { + "title": "Filesystem-wide metadata event", + "description": [ + "Health status updates for filesystem-wide", + "metadata objects." 
+ ], + "type": "object", + + "properties": { + "type": { + "$ref": "#/$metadata/status" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "fs" + }, + "structures": { + "type": "array", + "items": { + "$ref": "#/$metadata/fs" + }, + "minItems": 1 + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "structures" + ] + }, + "perag_metadata": { + "title": "Data device allocation group metadata event", + "description": [ + "Health status updates for data device ", + "allocation group metadata." + ], + "type": "object", + + "properties": { + "type": { + "$ref": "#/$metadata/status" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "perag" + }, + "group": { + "$ref": "#/$defs/xfs_agnumber_t" + }, + "structures": { + "type": "array", + "items": { + "$ref": "#/$metadata/perag" + }, + "minItems": 1 + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "group", + "structures" + ] + }, + "rtgroup_metadata": { + "title": "Realtime allocation group metadata event", + "description": [ + "Health status updates for realtime allocation", + "group metadata." + ], + "type": "object", + + "properties": { + "type": { + "$ref": "#/$metadata/status" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "rtgroup" + }, + "group": { + "$ref": "#/$defs/xfs_rgnumber_t" + }, + "structures": { + "type": "array", + "items": { + "$ref": "#/$metadata/rtgroup" + }, + "minItems": 1 + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "group", + "structures" + ] + }, + "inode_metadata": { + "title": "Inode metadata event", + "description": [ + "Health status updates for inode metadata.", + "The inode and generation number describe the", + "file that is affected by the change." 
+ ], + "type": "object", + + "properties": { + "type": { + "$ref": "#/$metadata/status" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "inode" + }, + "inumber": { + "$ref": "#/$defs/xfs_ino_t" + }, + "generation": { + "$ref": "#/$defs/i_generation" + }, + "structures": { + "type": "array", + "items": { + "$ref": "#/$metadata/inode" + }, + "minItems": 1 + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "inumber", + "generation", + "structures" + ] } } } ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 05/26] xfs: report shutdown events through healthmon 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (3 preceding siblings ...) 2025-10-23 0:06 ` [PATCH 04/26] xfs: report metadata health events through healthmon Darrick J. Wong @ 2025-10-23 0:06 ` Darrick J. Wong 2025-10-23 0:07 ` [PATCH 06/26] xfs: report media errors " Darrick J. Wong ` (20 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:06 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Set up a shutdown hook so that we can send notifications to userspace. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- libxfs/xfs_fs.h | 18 +++++++++++ libxfs/xfs_healthmon.schema.json | 62 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 80 insertions(+) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index 358abe98776d69..918362a7294f27 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -1028,6 +1028,9 @@ struct xfs_rtgroup_geometry { /* filesystem was unmounted */ #define XFS_HEALTH_MONITOR_TYPE_UNMOUNT (5) +/* filesystem shutdown */ +#define XFS_HEALTH_MONITOR_TYPE_SHUTDOWN (6) + /* lost events */ struct xfs_health_monitor_lost { __u64 count; @@ -1054,6 +1057,20 @@ struct xfs_health_monitor_inode { __u64 ino; }; +/* shutdown reasons */ +#define XFS_HEALTH_SHUTDOWN_META_IO_ERROR (1u << 0) +#define XFS_HEALTH_SHUTDOWN_LOG_IO_ERROR (1u << 1) +#define XFS_HEALTH_SHUTDOWN_FORCE_UMOUNT (1u << 2) +#define XFS_HEALTH_SHUTDOWN_CORRUPT_INCORE (1u << 3) +#define XFS_HEALTH_SHUTDOWN_CORRUPT_ONDISK (1u << 4) +#define XFS_HEALTH_SHUTDOWN_DEVICE_REMOVED (1u << 5) + +/* shutdown */ +struct xfs_health_monitor_shutdown { + /* XFS_HEALTH_SHUTDOWN_* flags */ + __u32 reasons; +}; + struct xfs_health_monitor_event { /* XFS_HEALTH_MONITOR_DOMAIN_* */ __u32 domain; @@ -1074,6 +1091,7 @@ struct xfs_health_monitor_event { struct xfs_health_monitor_fs fs; struct 
xfs_health_monitor_group group; struct xfs_health_monitor_inode inode; + struct xfs_health_monitor_shutdown shutdown; } e; /* zeroes */ diff --git a/libxfs/xfs_healthmon.schema.json b/libxfs/xfs_healthmon.schema.json index dd78f1b71d587b..1657ccc482edff 100644 --- a/libxfs/xfs_healthmon.schema.json +++ b/libxfs/xfs_healthmon.schema.json @@ -36,6 +36,9 @@ }, { "$ref": "#/$events/inode_metadata" + }, + { + "$ref": "#/$events/shutdown" } ], @@ -204,6 +207,31 @@ } }, + "$comment": "Shutdown event data are defined here.", + "$shutdown": { + "reason": { + "description": [ + "Reason for a filesystem to shut down.", + "Options include:", + "", + " * corrupt_incore: in-memory corruption", + " * corrupt_ondisk: on-disk corruption", + " * device_removed: device removed", + " * force_umount: userspace asked for it", + " * log_ioerr: log write IO error", + " * meta_ioerr: metadata writeback IO error" + ], + "enum": [ + "corrupt_incore", + "corrupt_ondisk", + "device_removed", + "force_umount", + "log_ioerr", + "meta_ioerr" + ] + } + }, + "$comment": "Event types are defined here.", "$events": { "running": { @@ -439,6 +467,40 @@ "generation", "structures" ] + }, + "shutdown": { + "title": "Abnormal Shutdown Event", + "description": [ + "The filesystem went offline due to", + "unrecoverable errors." + ], + "type": "object", + + "properties": { + "type": { + "const": "shutdown" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "mount" + }, + "reasons": { + "type": "array", + "items": { + "$ref": "#/$shutdown/reason" + }, + "minItems": 1 + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "reasons" + ] } } } ^ permalink raw reply related [flat|nested] 80+ messages in thread
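[Editorial note] The new XFS_HEALTH_SHUTDOWN_* bits and the schema's reason strings appear to correspond by name (META_IO_ERROR to meta_ioerr, and so on), though the patch does not spell the mapping out. A minimal Python sketch of that decoding, under that assumption:

```python
# XFS_HEALTH_SHUTDOWN_* bits from xfs_fs.h above, paired with the
# schema's reason strings; the pairing follows the flag names and is an
# assumption, not something the patch states explicitly.
SHUTDOWN_REASONS = {
    1 << 0: "meta_ioerr",      # XFS_HEALTH_SHUTDOWN_META_IO_ERROR
    1 << 1: "log_ioerr",       # XFS_HEALTH_SHUTDOWN_LOG_IO_ERROR
    1 << 2: "force_umount",    # XFS_HEALTH_SHUTDOWN_FORCE_UMOUNT
    1 << 3: "corrupt_incore",  # XFS_HEALTH_SHUTDOWN_CORRUPT_INCORE
    1 << 4: "corrupt_ondisk",  # XFS_HEALTH_SHUTDOWN_CORRUPT_ONDISK
    1 << 5: "device_removed",  # XFS_HEALTH_SHUTDOWN_DEVICE_REMOVED
}

def decode_shutdown_reasons(mask):
    '''Expand the __u32 reasons bitmask into the schema's string array.'''
    return [name for bit, name in sorted(SHUTDOWN_REASONS.items())
            if mask & bit]
```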
* [PATCH 06/26] xfs: report media errors through healthmon 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (4 preceding siblings ...) 2025-10-23 0:06 ` [PATCH 05/26] xfs: report shutdown " Darrick J. Wong @ 2025-10-23 0:07 ` Darrick J. Wong 2025-10-23 0:07 ` [PATCH 07/26] xfs: report file io " Darrick J. Wong ` (19 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:07 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Now that we have hooks to report media errors, connect this to the health monitor as well. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- libxfs/xfs_fs.h | 15 +++++++++ libxfs/xfs_healthmon.schema.json | 65 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 80 insertions(+) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index 918362a7294f27..a551b1d5d0db58 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -1014,6 +1014,11 @@ struct xfs_rtgroup_geometry { #define XFS_HEALTH_MONITOR_DOMAIN_INODE (3) #define XFS_HEALTH_MONITOR_DOMAIN_RTGROUP (4) +/* disk events */ +#define XFS_HEALTH_MONITOR_DOMAIN_DATADEV (5) +#define XFS_HEALTH_MONITOR_DOMAIN_RTDEV (6) +#define XFS_HEALTH_MONITOR_DOMAIN_LOGDEV (7) + /* Health monitor event types */ /* status of the monitor itself */ @@ -1031,6 +1036,9 @@ struct xfs_rtgroup_geometry { /* filesystem shutdown */ #define XFS_HEALTH_MONITOR_TYPE_SHUTDOWN (6) +/* media errors */ +#define XFS_HEALTH_MONITOR_TYPE_MEDIA_ERROR (7) + /* lost events */ struct xfs_health_monitor_lost { __u64 count; @@ -1071,6 +1079,12 @@ struct xfs_health_monitor_shutdown { __u32 reasons; }; +/* disk media errors */ +struct xfs_health_monitor_media { + __u64 daddr; + __u64 bbcount; +}; + struct xfs_health_monitor_event { /* XFS_HEALTH_MONITOR_DOMAIN_* */ __u32 domain; @@ -1092,6 +1106,7 @@ struct xfs_health_monitor_event { struct xfs_health_monitor_group group; struct 
xfs_health_monitor_inode inode; struct xfs_health_monitor_shutdown shutdown; + struct xfs_health_monitor_media media; } e; /* zeroes */ diff --git a/libxfs/xfs_healthmon.schema.json b/libxfs/xfs_healthmon.schema.json index 1657ccc482edff..d3b537a040cb83 100644 --- a/libxfs/xfs_healthmon.schema.json +++ b/libxfs/xfs_healthmon.schema.json @@ -39,6 +39,9 @@ }, { "$ref": "#/$events/shutdown" + }, + { + "$ref": "#/$events/media_error" } ], @@ -75,6 +78,31 @@ "i_generation": { "description": "Inode generation number", "type": "integer" + }, + "storage_devs": { + "description": "Storage devices in a filesystem", + "_comment": [ + "One of:", + "", + " * datadev: filesystem device", + " * logdev: external log device", + " * rtdev: realtime volume" + ], + "enum": [ + "datadev", + "logdev", + "rtdev" + ] + }, + "xfs_daddr_t": { + "description": "Storage device address, in units of 512-byte blocks", + "type": "integer", + "minimum": 0 + }, + "bbcount": { + "description": "Storage space length, in units of 512-byte blocks", + "type": "integer", + "minimum": 1 } }, @@ -501,6 +529,43 @@ "domain", "reasons" ] + }, + "media_error": { + "title": "Media Error", + "description": [ + "A storage device reported a media error.", + "The domain element tells us which storage", + "device reported the media failure. The", + "daddr and bbcount elements tell us where", + "inside that device the failure was observed." + ], + "type": "object", + + "properties": { + "type": { + "const": "media" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "$ref": "#/$defs/storage_devs" + }, + "daddr": { + "$ref": "#/$defs/xfs_daddr_t" + }, + "bbcount": { + "$ref": "#/$defs/bbcount" + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "daddr", + "bbcount" + ] } } } ^ permalink raw reply related [flat|nested] 80+ messages in thread
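[Editorial note] Since both daddr and bbcount are expressed in 512-byte basic blocks, a consumer that wants to compare a media_error event against byte-based file mappings has to scale them. A trivial helper, as a sketch:

```python
BBSIZE = 512  # basic block size used by xfs_daddr_t / bbcount

def media_error_byte_range(event):
    # Convert a media_error event's (daddr, bbcount) pair into a
    # half-open byte range [start, end) on the named device.
    start = event["daddr"] * BBSIZE
    end = start + event["bbcount"] * BBSIZE
    return start, end
```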
* [PATCH 07/26] xfs: report file io errors through healthmon 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (5 preceding siblings ...) 2025-10-23 0:07 ` [PATCH 06/26] xfs: report media errors " Darrick J. Wong @ 2025-10-23 0:07 ` Darrick J. Wong 2025-10-23 0:07 ` [PATCH 08/26] xfs: validate fds against running healthmon Darrick J. Wong ` (18 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:07 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Set up a file io error event hook so that we can send events about read errors, writeback errors, and directio errors to userspace. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- libxfs/xfs_fs.h | 18 +++++++++ libxfs/xfs_healthmon.schema.json | 77 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 95 insertions(+) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index a551b1d5d0db58..87e915baa875d6 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -1019,6 +1019,9 @@ struct xfs_rtgroup_geometry { #define XFS_HEALTH_MONITOR_DOMAIN_RTDEV (6) #define XFS_HEALTH_MONITOR_DOMAIN_LOGDEV (7) +/* file range events */ +#define XFS_HEALTH_MONITOR_DOMAIN_FILERANGE (8) + /* Health monitor event types */ /* status of the monitor itself */ @@ -1039,6 +1042,12 @@ struct xfs_rtgroup_geometry { /* media errors */ #define XFS_HEALTH_MONITOR_TYPE_MEDIA_ERROR (7) +/* file range events */ +#define XFS_HEALTH_MONITOR_TYPE_BUFREAD (8) +#define XFS_HEALTH_MONITOR_TYPE_BUFWRITE (9) +#define XFS_HEALTH_MONITOR_TYPE_DIOREAD (10) +#define XFS_HEALTH_MONITOR_TYPE_DIOWRITE (11) + /* lost events */ struct xfs_health_monitor_lost { __u64 count; @@ -1079,6 +1088,14 @@ struct xfs_health_monitor_shutdown { __u32 reasons; }; +/* file range events */ +struct xfs_health_monitor_filerange { + __u64 pos; + __u64 len; + __u64 ino; + __u32 gen; +}; + /* disk media errors */ struct 
xfs_health_monitor_media { __u64 daddr; @@ -1107,6 +1124,7 @@ struct xfs_health_monitor_event { struct xfs_health_monitor_inode inode; struct xfs_health_monitor_shutdown shutdown; struct xfs_health_monitor_media media; + struct xfs_health_monitor_filerange filerange; } e; /* zeroes */ diff --git a/libxfs/xfs_healthmon.schema.json b/libxfs/xfs_healthmon.schema.json index d3b537a040cb83..fb696dfbbfd044 100644 --- a/libxfs/xfs_healthmon.schema.json +++ b/libxfs/xfs_healthmon.schema.json @@ -42,6 +42,9 @@ }, { "$ref": "#/$events/media_error" + }, + { + "$ref": "#/$events/file_ioerror" } ], @@ -79,6 +82,16 @@ "description": "Inode generation number", "type": "integer" }, + "off_t": { + "description": "File position, in bytes", + "type": "integer", + "minimum": 0 + }, + "size_t": { + "description": "File operation length, in bytes", + "type": "integer", + "minimum": 1 + }, "storage_devs": { "description": "Storage devices in a filesystem", "_comment": [ @@ -260,6 +273,26 @@ } }, + "$comment": "File IO event data are defined here.", + "$fileio": { + "types": { + "description": [ + "File I/O operations. One of:", + "", + " * readahead: reads into the page cache.", + " * writeback: writeback of dirty page cache.", + " * directio_read: O_DIRECT reads.", + " * directio_write: O_DIRECT writes." + ], + "enum": [ + "readahead", + "writeback", + "directio_read", + "directio_write" + ] + } + }, + "$comment": "Event types are defined here.", "$events": { "running": { @@ -566,6 +599,50 @@ "daddr", "bbcount" ] + }, + "file_ioerror": { + "title": "File I/O error", + "description": [ + "A read or a write to a file failed. The", + "inode, generation, pos, and len fields", + "describe the range of the file that is", + "affected."
+ ], + "type": "object", + + "properties": { + "type": { + "$ref": "#/$fileio/types" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "filerange" + }, + "inumber": { + "$ref": "#/$defs/xfs_ino_t" + }, + "generation": { + "$ref": "#/$defs/i_generation" + }, + "pos": { + "$ref": "#/$defs/off_t" + }, + "length": { + "$ref": "#/$defs/size_t" + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "inumber", + "generation", + "pos", + "length" + ] } } } ^ permalink raw reply related [flat|nested] 80+ messages in thread
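[Editorial note] As a quick illustration of how the file_ioerror fields fit together, the sketch below renders an event as a one-line log message. Only the field names come from the schema; the phrasing is mine.

```python
def describe_file_ioerror(event):
    # Render a file_ioerror event using the schema's field names
    # (type, inumber, generation, pos, length); wording is illustrative.
    return ("%s error on inode %d (gen %d), pos %d, len %d bytes"
            % (event["type"], event["inumber"], event["generation"],
               event["pos"], event["length"]))
```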
* [PATCH 08/26] xfs: validate fds against running healthmon 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (6 preceding siblings ...) 2025-10-23 0:07 ` [PATCH 07/26] xfs: report file io " Darrick J. Wong @ 2025-10-23 0:07 ` Darrick J. Wong 2025-10-23 0:07 ` [PATCH 09/26] xfs: add media error reporting ioctl Darrick J. Wong ` (17 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:07 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a new ioctl for the healthmon file that checks that a given fd points to the same filesystem that the healthmon file is monitoring. This allows xfs_healer to check that when it reopens a mountpoint to perform repairs, the file that it gets matches the filesystem that generated the corruption report. (Note that xfs_healer doesn't maintain an open fd to a filesystem that it's monitoring so that it doesn't pin the mount.) Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- libxfs/xfs_fs.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index 87e915baa875d6..b5a00ef6ce5fb9 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -1149,6 +1149,15 @@ struct xfs_health_monitor { /* Return events in JSON format */ #define XFS_HEALTH_MONITOR_FMT_JSON (1) +/* + * Check that a given fd points to the same filesystem that the health monitor + * is monitoring. 
+ */ +struct xfs_health_samefs { + __s32 fd; + __u32 flags; /* zero for now */ +}; + /* * ioctl commands that are used by Linux filesystems */ @@ -1189,6 +1198,7 @@ struct xfs_health_monitor { #define XFS_IOC_SCRUBV_METADATA _IOWR('X', 64, struct xfs_scrub_vec_head) #define XFS_IOC_RTGROUP_GEOMETRY _IOWR('X', 65, struct xfs_rtgroup_geometry) #define XFS_IOC_HEALTH_MONITOR _IOW ('X', 68, struct xfs_health_monitor) +#define XFS_IOC_HEALTH_SAMEFS _IOW ('X', 69, struct xfs_health_samefs) /* * ioctl commands that replace IRIX syssgi()'s ^ permalink raw reply related [flat|nested] 80+ messages in thread
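[Editorial note] Following the same ctypes/fcntl pattern that the Python healer prototype uses later in this series, a daemon might issue the new ioctl roughly as below. The samefs() helper is a hypothetical wrapper: it needs a real healthmon fd and a freshly reopened mountpoint fd to do anything useful, and the exact return convention on a match is not shown in this excerpt.

```python
import ctypes
import fcntl

_IOC_WRITE = 1

def _IOW(typ, nr, t):
    # Mirrors the kernel's _IOW() encoding:
    # dir << 30 | size << 16 | type << 8 | nr
    return ((_IOC_WRITE << 30) | (ctypes.sizeof(t) << 16) |
            (typ << 8) | nr)

class xfs_health_samefs(ctypes.Structure):
    _fields_ = [
        ('fd', ctypes.c_int),      # __s32 fd to compare
        ('flags', ctypes.c_uint),  # __u32 flags, zero for now
    ]

XFS_IOC_HEALTH_SAMEFS = _IOW(0x58, 69, xfs_health_samefs)

def samefs(mon_fd, dir_fd):
    '''Ask the healthmon fd whether dir_fd is on the monitored fs.'''
    arg = xfs_health_samefs(fd = dir_fd, flags = 0)
    return fcntl.ioctl(mon_fd, XFS_IOC_HEALTH_SAMEFS, arg)
```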
* [PATCH 09/26] xfs: add media error reporting ioctl 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (7 preceding siblings ...) 2025-10-23 0:07 ` [PATCH 08/26] xfs: validate fds against running healthmon Darrick J. Wong @ 2025-10-23 0:07 ` Darrick J. Wong 2025-10-23 0:08 ` [PATCH 10/26] xfs_io: monitor filesystem health events Darrick J. Wong ` (16 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:07 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a new privileged ioctl so that xfs_scrub can report media errors to the kernel for further processing. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- libxfs/xfs_fs.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index b5a00ef6ce5fb9..5d35d67b10e153 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -1158,6 +1158,21 @@ struct xfs_health_samefs { __u32 flags; /* zero for now */ }; +/* Report a media error */ +struct xfs_media_error { + __u64 flags; /* flags */ + __u64 daddr; /* disk address of range */ + __u64 bbcount; /* length, in 512b blocks */ + __u64 pad; /* zero */ +}; + +#define XFS_MEDIA_ERROR_DATADEV (1) /* data device */ +#define XFS_MEDIA_ERROR_LOGDEV (2) /* external log device */ +#define XFS_MEDIA_ERROR_RTDEV (3) /* realtime device */ + +/* bottom byte of flags is the device code */ +#define XFS_MEDIA_ERROR_DEVMASK (0xFF) + /* * ioctl commands that are used by Linux filesystems */ @@ -1199,6 +1214,7 @@ struct xfs_health_samefs { #define XFS_IOC_RTGROUP_GEOMETRY _IOWR('X', 65, struct xfs_rtgroup_geometry) #define XFS_IOC_HEALTH_MONITOR _IOW ('X', 68, struct xfs_health_monitor) #define XFS_IOC_HEALTH_SAMEFS _IOW ('X', 69, struct xfs_health_samefs) +#define XFS_IOC_MEDIA_ERROR _IOW ('X', 70, struct xfs_media_error) /* * ioctl commands that replace IRIX syssgi()'s ^ permalink raw 
reply related [flat|nested] 80+ messages in thread
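[Editorial note] A caller such as xfs_scrub might invoke the new media error ioctl roughly as sketched below, again using the ctypes layout conventions of the Python healer prototype later in this series. The report_media_error() wrapper name is hypothetical, and the call only succeeds with an fd on a real XFS filesystem and sufficient privilege.

```python
import ctypes
import fcntl

_IOC_WRITE = 1

def _IOW(typ, nr, t):
    # Same _IOW() encoding as the kernel: dir | size | type | nr fields.
    return ((_IOC_WRITE << 30) | (ctypes.sizeof(t) << 16) |
            (typ << 8) | nr)

XFS_MEDIA_ERROR_DATADEV = 1  # device code in the bottom byte of flags

class xfs_media_error(ctypes.Structure):
    _fields_ = [
        ('flags', ctypes.c_ulonglong),    # device code + flags
        ('daddr', ctypes.c_ulonglong),    # start, in 512-byte blocks
        ('bbcount', ctypes.c_ulonglong),  # length, in 512-byte blocks
        ('pad', ctypes.c_ulonglong),      # must be zero
    ]

XFS_IOC_MEDIA_ERROR = _IOW(0x58, 70, xfs_media_error)

def report_media_error(fs_fd, daddr, bbcount):
    '''Report a data device media error to the filesystem backing fs_fd.'''
    arg = xfs_media_error(flags = XFS_MEDIA_ERROR_DATADEV,
                          daddr = daddr, bbcount = bbcount, pad = 0)
    return fcntl.ioctl(fs_fd, XFS_IOC_MEDIA_ERROR, arg)
```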
* [PATCH 10/26] xfs_io: monitor filesystem health events 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (8 preceding siblings ...) 2025-10-23 0:07 ` [PATCH 09/26] xfs: add media error reporting ioctl Darrick J. Wong @ 2025-10-23 0:08 ` Darrick J. Wong 2025-10-23 0:08 ` [PATCH 11/26] xfs_io: add a media error reporting command Darrick J. Wong ` (15 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:08 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a subcommand to monitor for health events generated by the kernel. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- io/io.h | 1 io/Makefile | 1 io/healthmon.c | 183 +++++++++++++++++++++++++++++++++++++++++++++++++++++ io/init.c | 1 man/man8/xfs_io.8 | 25 +++++++ 5 files changed, 211 insertions(+) create mode 100644 io/healthmon.c diff --git a/io/io.h b/io/io.h index 35fb8339eeb5aa..2f5262bce6acbb 100644 --- a/io/io.h +++ b/io/io.h @@ -162,3 +162,4 @@ extern void bulkstat_init(void); void exchangerange_init(void); void fsprops_init(void); void aginfo_init(void); +void healthmon_init(void); diff --git a/io/Makefile b/io/Makefile index 444e2d6a557d5d..8e3783353a52b5 100644 --- a/io/Makefile +++ b/io/Makefile @@ -25,6 +25,7 @@ CFILES = \ fsuuid.c \ fsync.c \ getrusage.c \ + healthmon.c \ imap.c \ init.c \ inject.c \ diff --git a/io/healthmon.c b/io/healthmon.c new file mode 100644 index 00000000000000..7d372d7d8c532b --- /dev/null +++ b/io/healthmon.c @@ -0,0 +1,183 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2024-2025 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +#include "libxfs.h" +#include "libfrog/fsgeom.h" +#include "libfrog/paths.h" +#include "command.h" +#include "init.h" +#include "io.h" + +static void +healthmon_help(void) +{ + printf(_( +"Monitor filesystem health events" +"\n" +"-c Replace the open file with the monitor file.\n" +"-d delay_ms Sleep this many milliseconds between reads.\n" +"-p Only probe for the existence of the ioctl.\n" +"-v Request all events.\n" +"\n")); +} + +static inline int +monitor_sleep( + int delay_ms) +{ + struct timespec ts; + + if (!delay_ms) + return 0; + + ts.tv_sec = delay_ms / 1000; + ts.tv_nsec = (delay_ms % 1000) * 1000000; + + return nanosleep(&ts, NULL); +} + +static int +monitor( + size_t bufsize, + bool consume, + int delay_ms, + bool verbose, + bool only_probe) +{ + struct xfs_health_monitor hmo = { + .format = XFS_HEALTH_MONITOR_FMT_JSON, + }; + char *buf; + ssize_t bytes_read; + int mon_fd; + int ret = 1; + + if (verbose) + hmo.flags |= XFS_HEALTH_MONITOR_ALL; + + mon_fd = ioctl(file->fd, XFS_IOC_HEALTH_MONITOR, &hmo); + if (mon_fd < 0) { + perror("XFS_IOC_HEALTH_MONITOR"); + return 1; + } + + if (only_probe) { + ret = 0; + goto out_mon; + } + + buf = malloc(bufsize); + if (!buf) { + perror("malloc"); + goto out_mon; + } + + if (consume) { + close(file->fd); + file->fd = mon_fd; + } + + monitor_sleep(delay_ms); + while ((bytes_read = read(mon_fd, buf, bufsize)) > 0) { + char *write_ptr = buf; + ssize_t bytes_written; + size_t to_write = bytes_read; + + while ((bytes_written = write(STDOUT_FILENO, write_ptr, to_write)) > 0) { + write_ptr += bytes_written; + to_write -= bytes_written; + } + if (bytes_written < 0) { + perror("healthdump"); + goto out_buf; + } + + monitor_sleep(delay_ms); + } + if (bytes_read < 0) { + perror("healthmon"); + goto out_buf; + } + + ret = 0; + +out_buf: + free(buf); +out_mon: + close(mon_fd); + return ret; +} + +static int +healthmon_f( + int argc, + char **argv) +{ + size_t bufsize = 4096; + bool consume = false; 
+ bool verbose = false; + bool only_probe = false; + int delay_ms = 0; + int c; + + while ((c = getopt(argc, argv, "b:cd:pv")) != EOF) { + switch (c) { + case 'b': + errno = 0; + c = atoi(optarg); + if (c < 0 || errno) { + printf("%s: bufsize must be positive\n", + optarg); + exitcode = 1; + return 0; + } + bufsize = c; + break; + case 'c': + consume = true; + break; + case 'd': + errno = 0; + delay_ms = atoi(optarg); + if (delay_ms < 0 || errno) { + printf("%s: delay must be positive msecs\n", + optarg); + exitcode = 1; + return 0; + } + break; + case 'p': + only_probe = true; + break; + case 'v': + verbose = true; + break; + default: + exitcode = 1; + healthmon_help(); + return 0; + } + } + + return monitor(bufsize, consume, delay_ms, verbose, only_probe); +} + +static struct cmdinfo healthmon_cmd = { + .name = "healthmon", + .cfunc = healthmon_f, + .argmin = 0, + .argmax = -1, + .flags = CMD_FLAG_ONESHOT | CMD_NOMAP_OK, + .args = "[-b bufsize] [-c] [-d delay_ms] [-p] [-v]", + .help = healthmon_help, +}; + +void +healthmon_init(void) +{ + healthmon_cmd.oneline = _("monitor filesystem health events"); + + add_command(&healthmon_cmd); +} diff --git a/io/init.c b/io/init.c index 49e9e7cb88214b..cb5573f45ccfbc 100644 --- a/io/init.c +++ b/io/init.c @@ -92,6 +92,7 @@ init_commands(void) crc32cselftest_init(); exchangerange_init(); fsprops_init(); + healthmon_init(); } /* diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8 index 0a673322fde3a1..f7f2956a54a7aa 100644 --- a/man/man8/xfs_io.8 +++ b/man/man8/xfs_io.8 @@ -1356,6 +1356,31 @@ .SH FILESYSTEM COMMANDS .B thaw Undo the effects of a filesystem freeze operation. Only available in expert mode and requires privileges. +.TP +.BI "healthmon [ \-b " bufsize " ] [ \-c ] [ \-d " delay_ms " ] [ \-p ] [ \-v ]" +Watch for filesystem health events and write them to the console. +.RE +.RS 1.0i +.PD 0 +.TP +.BI "\-b " bufsize +Use a buffer of this size to read events from the kernel.
+.TP +.BI \-c +Close the open file and replace it with the monitor file. +.TP +.BI "\-d " delay_ms +Sleep for this long between read attempts. +.TP +.B \-p +Probe for the existence of the functionality by opening the monitoring fd and +closing it immediately. +.TP +.BI \-v +Request all health events, even if nothing changed. +.PD +.RE + .TP .BI "inject [ " tag " ]" Inject errors into a filesystem to observe filesystem behavior at ^ permalink raw reply related [flat|nested] 80+ messages in thread
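[Editorial note] The -p probe behavior can also be reproduced directly from a script: ask for the monitor fd via the ioctl and close it immediately. A sketch reusing the exact struct layout from the healer prototype later in this series:

```python
import ctypes
import fcntl
import os

_IOC_WRITE = 1

def _IOW(typ, nr, t):
    # Kernel _IOW() encoding: dir << 30 | size << 16 | type << 8 | nr
    return ((_IOC_WRITE << 30) | (ctypes.sizeof(t) << 16) |
            (typ << 8) | nr)

XFS_HEALTH_MONITOR_FMT_JSON = 1

class xfs_health_monitor(ctypes.Structure):
    _fields_ = [
        ('flags', ctypes.c_ulonglong),
        ('format', ctypes.c_ubyte),
        ('_pad0', ctypes.c_ubyte * 7),
        ('_pad1', ctypes.c_ulonglong * 2),
    ]

XFS_IOC_HEALTH_MONITOR = _IOW(0x58, 68, xfs_health_monitor)

def probe_healthmon(fs_fd):
    '''Return True if fs_fd's filesystem supports health monitoring.'''
    arg = xfs_health_monitor(format = XFS_HEALTH_MONITOR_FMT_JSON)
    try:
        mon_fd = fcntl.ioctl(fs_fd, XFS_IOC_HEALTH_MONITOR, arg)
    except OSError:
        return False
    os.close(mon_fd)
    return True
```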
* [PATCH 11/26] xfs_io: add a media error reporting command 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (9 preceding siblings ...) 2025-10-23 0:08 ` [PATCH 10/26] xfs_io: monitor filesystem health events Darrick J. Wong @ 2025-10-23 0:08 ` Darrick J. Wong 2025-10-23 0:08 ` [PATCH 12/26] xfs_healer: create daemon to listen for health events Darrick J. Wong ` (14 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:08 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a subcommand to invoke the media error ioctl to make sure it works. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- io/shutdown.c | 113 +++++++++++++++++++++++++++++++++++++++++++++++++++++ man/man8/xfs_io.8 | 21 ++++++++++ 2 files changed, 133 insertions(+), 1 deletion(-) diff --git a/io/shutdown.c b/io/shutdown.c index 3c29ea790643f8..b4fba7d78ba83b 100644 --- a/io/shutdown.c +++ b/io/shutdown.c @@ -53,6 +53,115 @@ shutdown_help(void) "\n")); } +static void +mediaerror_help(void) +{ + printf(_( +"\n" +" Report a media error on the data device to the filesystem.\n" +"\n" +" -l -- Report against the log device.\n" +" -r -- Report against the realtime device.\n" +"\n" +" offset is the byte offset of the start of the failed range. If offset is\n" +" specified, mapping length may (optionally) be specified as well." +"\n" +" length is the byte length of the failed range.\n" +"\n" +" If neither offset nor length are specified, the media error report will\n" +" be made against the entire device." 
+"\n")); +} + +static int +mediaerror_f( + int argc, + char **argv) +{ + struct xfs_media_error me = { + .daddr = 0, + .bbcount = -1ULL, + .flags = XFS_MEDIA_ERROR_DATADEV, + }; + long long l; + size_t fsblocksize, fssectsize; + int c, ret; + + init_cvtnum(&fsblocksize, &fssectsize); + + while ((c = getopt(argc, argv, "lr")) != EOF) { + switch (c) { + case 'l': + me.flags = (me.flags & ~XFS_MEDIA_ERROR_DEVMASK) | + XFS_MEDIA_ERROR_LOGDEV; + break; + case 'r': + me.flags = (me.flags & ~XFS_MEDIA_ERROR_DEVMASK) | + XFS_MEDIA_ERROR_RTDEV; + break; + default: + mediaerror_help(); + exitcode = 1; + return 0; + } + } + + /* Range start (optional) */ + if (optind < argc) { + l = cvtnum(fsblocksize, fssectsize, argv[optind]); + if (l < 0) { + printf("non-numeric offset argument -- %s\n", + argv[optind]); + exitcode = 1; + return 0; + } + + me.daddr = l / 512; + optind++; + } + + /* Range length (optional if range start was specified) */ + if (optind < argc) { + l = cvtnum(fsblocksize, fssectsize, argv[optind]); + if (l < 0) { + printf("non-numeric len argument -- %s\n", + argv[optind]); + exitcode = 1; + return 0; + } + + me.bbcount = howmany(l, 512); + optind++; + } + + if (optind < argc) { + printf("too many arguments -- %s\n", argv[optind]); + exitcode = 1; + return 0; + } + + ret = ioctl(file->fd, XFS_IOC_MEDIA_ERROR, &me); + if (ret) { + fprintf(stderr, + "%s: ioctl(XFS_IOC_MEDIA_ERROR) [\"%s\"]: %s\n", + progname, file->name, strerror(errno)); + exitcode = 1; + return 0; + } + + return 0; +} + +static struct cmdinfo mediaerror_cmd = { + .name = "mediaerror", + .cfunc = mediaerror_f, + .argmin = 0, + .argmax = -1, + .flags = CMD_FLAG_ONESHOT | CMD_NOMAP_OK, + .args = "[-lr] [offset [length]]", + .help = mediaerror_help, +}; + void shutdown_init(void) { @@ -66,6 +175,8 @@ shutdown_init(void) shutdown_cmd.oneline = _("shuts down the filesystem where the current file resides"); - if (expert) + if (expert) { add_command(&shutdown_cmd); + add_command(&mediaerror_cmd); + } 
} diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8 index f7f2956a54a7aa..aa22db4150ac24 100644 --- a/man/man8/xfs_io.8 +++ b/man/man8/xfs_io.8 @@ -1389,6 +1389,27 @@ .SH FILESYSTEM COMMANDS argument, displays the list of error tags available. Only available in expert mode and requires privileges. +.TP +.BI "mediaerror [ \-lr ] [ " offset " [ " length " ]]" +Report a media error against the data device of an XFS filesystem. +The +.I offset +and +.I length +parameters are specified in units of bytes. +If neither are specified, the entire device will be reported. +.RE +.RS 1.0i +.PD 0 +.TP +.BI \-l +Report against the log device instead of the data device. +.TP +.BI \-r +Report against the realtime device instead of the data device. +.PD +.RE + .TP .BI "rginfo [ \-r " rgno " ]" Show information about or update the state of realtime allocation groups. ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 12/26] xfs_healer: create daemon to listen for health events 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (10 preceding siblings ...) 2025-10-23 0:08 ` [PATCH 11/26] xfs_io: add a media error reporting command Darrick J. Wong @ 2025-10-23 0:08 ` Darrick J. Wong 2025-10-23 0:08 ` [PATCH 13/26] xfs_healer: check events against schema Darrick J. Wong ` (13 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:08 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a daemon program that can listen for and log health events. Eventually this will be used to self-heal filesystems in real time. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- Makefile | 5 + configure.ac | 6 + healer/Makefile | 34 ++++ healer/xfs_healer.py.in | 368 +++++++++++++++++++++++++++++++++++++++++++++++ include/builddefs.in | 1 5 files changed, 414 insertions(+) create mode 100644 healer/Makefile create mode 100644 healer/xfs_healer.py.in diff --git a/Makefile b/Makefile index c73aa391bc5f43..6056723b21348a 100644 --- a/Makefile +++ b/Makefile @@ -69,6 +69,10 @@ ifeq ("$(ENABLE_SCRUB)","yes") TOOL_SUBDIRS += scrub endif +ifeq ("$(ENABLE_HEALER)","yes") +TOOL_SUBDIRS += healer +endif + ifneq ("$(XGETTEXT)","") TOOL_SUBDIRS += po endif @@ -100,6 +104,7 @@ mkfs: libxcmd spaceman: libxcmd libhandle scrub: libhandle libxcmd rtcp: libfrog +healer: ifeq ($(HAVE_BUILDDEFS), yes) include $(BUILDRULES) diff --git a/configure.ac b/configure.ac index 1f6c11e5e78ebb..369cdd1696380a 100644 --- a/configure.ac +++ b/configure.ac @@ -110,6 +110,12 @@ AC_ARG_ENABLE(libicu, [ --enable-libicu=[yes/no] Enable Unicode name scanning in xfs_scrub (libicu) [default=probe]],, enable_libicu=probe) +# Enable xfs_healer build +AC_ARG_ENABLE(healer, +[ --enable-healer=[yes/no] Enable build of xfs_healer utility [[default=yes]]],, + enable_healer=yes)
+AC_SUBST(enable_healer) + # # If the user specified a libdir ending in lib64 do not append another # 64 to the library names. diff --git a/healer/Makefile b/healer/Makefile new file mode 100644 index 00000000000000..f0b3a62cd068b6 --- /dev/null +++ b/healer/Makefile @@ -0,0 +1,34 @@ +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2024-2025 Oracle. All Rights Reserved. +# + +TOPDIR = .. +builddefs=$(TOPDIR)/include/builddefs +include $(builddefs) + +XFS_HEALER_PROG = xfs_healer.py +INSTALL_HEALER = install-healer + +LDIRT = $(XFS_HEALER_PROG) + +default: $(XFS_HEALER_PROG) + +$(XFS_HEALER_PROG): $(XFS_HEALER_PROG).in $(builddefs) $(TOPDIR)/libfrog/gettext.py + @echo " [SED] $@" + $(Q)$(SED) -e "s|@pkg_version@|$(PKG_VERSION)|g" \ + -e '/@INIT_GETTEXT@/r $(TOPDIR)/libfrog/gettext.py' \ + -e '/@INIT_GETTEXT@/d' \ + < $< > $@ + $(Q)chmod a+x $@ + +include $(BUILDRULES) + +install: $(INSTALL_HEALER) + +install-healer: default + $(INSTALL) -m 755 -d $(PKG_LIBEXEC_DIR) + $(INSTALL) -m 755 $(XFS_HEALER_PROG) $(PKG_LIBEXEC_DIR)/xfs_healer + +install-dev: + +-include .dep diff --git a/healer/xfs_healer.py.in b/healer/xfs_healer.py.in new file mode 100644 index 00000000000000..507f537aea0d9a --- /dev/null +++ b/healer/xfs_healer.py.in @@ -0,0 +1,368 @@ +#!/usr/bin/python3 + +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2024-2025 Oracle. All rights reserved. +# +# Author: Darrick J. 
Wong <djwong@kernel.org> + +# Daemon to listen for and react to filesystem health events + +@INIT_GETTEXT@ +import sys +import os +import argparse +import fcntl +import json +import datetime +import errno +import ctypes +import pathlib +import gc +from concurrent.futures import ProcessPoolExecutor + +debug = False +log = False +everything = False +debug_fast = False +printf_prefix = '' + +def printlogln(*args, **kwargs): + '''Print a log message to stdout and flush it.''' + print(*args, **kwargs) + sys.stdout.flush() + +def eprintln(*args, **kwargs): + '''Print an error message to stderr.''' + print(*args, file = sys.stderr, **kwargs) + +# ioctl encoding stuff +_IOC_NRBITS = 8 +_IOC_TYPEBITS = 8 +_IOC_SIZEBITS = 14 +_IOC_DIRBITS = 2 + +_IOC_NRMASK = (1 << _IOC_NRBITS) - 1 +_IOC_TYPEMASK = (1 << _IOC_TYPEBITS) - 1 +_IOC_SIZEMASK = (1 << _IOC_SIZEBITS) - 1 +_IOC_DIRMASK = (1 << _IOC_DIRBITS) - 1 + +_IOC_NRSHIFT = 0 +_IOC_TYPESHIFT = (_IOC_NRSHIFT + _IOC_NRBITS) +_IOC_SIZESHIFT = (_IOC_TYPESHIFT + _IOC_TYPEBITS) +_IOC_DIRSHIFT = (_IOC_SIZESHIFT + _IOC_SIZEBITS) + +_IOC_NONE = 0 +_IOC_WRITE = 1 +_IOC_READ = 2 + +def _IOC(direction, type, nr, t): + assert direction <= _IOC_DIRMASK, direction + assert type <= _IOC_TYPEMASK, type + assert nr <= _IOC_NRMASK, nr + + size = ctypes.sizeof(t) + assert size <= _IOC_SIZEMASK, size + + return (((direction) << _IOC_DIRSHIFT) | + ((type) << _IOC_TYPESHIFT) | + ((nr) << _IOC_NRSHIFT) | + ((size) << _IOC_SIZESHIFT)) + +def _IOR(type, number, size): + return _IOC(_IOC_READ, type, number, size) + +def _IOW(type, number, size): + return _IOC(_IOC_WRITE, type, number, size) + +def _IOWR(type, number, size): + return _IOC(_IOC_READ | _IOC_WRITE, type, number, size) + +# xfs health monitoring ioctl stuff +XFS_HEALTH_MONITOR_FMT_JSON = 1 +XFS_HEALTH_MONITOR_VERBOSE = 1 << 0 + +class xfs_health_monitor(ctypes.Structure): + _fields_ = [ + ('flags', ctypes.c_ulonglong), + ('format', ctypes.c_ubyte), + ('_pad0', ctypes.c_ubyte * 7), + ('_pad1', 
ctypes.c_ulonglong * 2) + ] +assert ctypes.sizeof(xfs_health_monitor) == 32 + +XFS_IOC_HEALTH_MONITOR = _IOW(0x58, 68, xfs_health_monitor) + +def open_health_monitor(fd, verbose = False): + '''Return a health monitoring fd.''' + + arg = xfs_health_monitor() + arg.format = XFS_HEALTH_MONITOR_FMT_JSON + + if verbose: + arg.flags |= XFS_HEALTH_MONITOR_VERBOSE + + ret = fcntl.ioctl(fd, XFS_IOC_HEALTH_MONITOR, arg) + return ret + +# main program + +def health_reports(mon_fp): + '''Generate python objects describing health events.''' + global debug + global printf_prefix + + lines = [] + buf = mon_fp.readline() + while buf != '': + for line in buf.split('\0'): + line = line.strip() + if debug: + n = _("new line") + printlogln(f'{n}: {line}') + if line == '': + continue + + lines.append(line) + if line == '}': + yield lines + lines = [] + buf = mon_fp.readline() + +def report_event(event): + '''Log a monitoring event to stdout.''' + global printf_prefix + + if event['domain'] == 'inode': + structures = ', '.join([_(x) for x in event['structures']]) + status = _(event['type']) + printlogln(f"{printf_prefix}: {structures} {status}") + + elif event['domain'] == 'perag': + structures = ', '.join([_(x) for x in event['structures']]) + status = _(event['type']) + group = event['group'] + agnom = _("agno") + printlogln(f"{printf_prefix}: {agnom} {group} {structures} {status}") + + elif event['domain'] == 'fs': + structures = ', '.join([_(x) for x in event['structures']]) + status = _(event['type']) + printlogln(f"{printf_prefix}: {structures} {status}") + + elif event['domain'] == 'rtgroup': + structures = ', '.join([_(x) for x in event['structures']]) + status = _(event['type']) + group = event['group'] + rgnom = _("rgno") + printlogln(f"{printf_prefix}: {rgnom} {group} {structures} {status}") + + elif event['domain'] in ('datadev', 'logdev', 'rtdev'): + device = _(event['domain']) + daddr = event['daddr'] + bbcount = event['bbcount'] + msg = _("media error on") + daddrm = 
_("daddr") + bbcountm = _("bbcount") + printlogln(f"{printf_prefix}: {msg} {device} {daddrm} {daddr:#x} {bbcountm} {bbcount:#x}") + + elif event['domain'] == 'filerange': + event_type = _(event['type']) + pos = event['pos'] + length = event['length'] + posm = _("pos") + lenm = _("len") + printlogln(f"{printf_prefix}: {event_type} {posm} {pos} {lenm} {length}") + +def report_lost(event): + '''Report that the kernel lost events.''' + global printf_prefix + + msg = _("events lost") + printlogln(f"{printf_prefix}: {msg}") + +def report_running(event): + '''Report that the monitor is running.''' + global printf_prefix + + msg = _("monitoring started") + printlogln(f"{printf_prefix}: {msg}") + +def report_unmount(event): + '''Report that the filesystem was unmounted.''' + global printf_prefix + + msg = _("filesystem unmounted") + printlogln(f"{printf_prefix}: {msg}") + +def report_shutdown(event): + '''Report an abortive shutdown of the filesystem.''' + global printf_prefix + REASONS = { + "meta_ioerr": _("metadata IO error"), + "log_ioerr": _("log IO error"), + "force_umount": _("forced unmount"), + "corrupt_incore": _("in-memory state corruption"), + "corrupt_ondisk": _("ondisk metadata corruption"), + "device_removed": _("device removal"), + } + + reasons = [] + for reason in event['reasons']: + if reason in REASONS: + reasons.append(REASONS[reason]) + else: + reasons.append(reason) + + some_reasons = ', '.join([_(x) for x in reasons]) + msg = _("filesystem shut down due to") + printlogln(f"{printf_prefix}: {msg} {some_reasons}") + +def handle_event(lines): + '''Handle an event asynchronously.''' + global log + + # Convert array of strings into a json object + try: + event = json.loads(''.join(lines)) + except json.decoder.JSONDecodeError as e: + fromm = _("from") + eprintln(f"{printf_prefix}: {e} {fromm} {s}") + return + + # Deal with reporting-only events; these should always generate log + # messages. 
+ if event['type'] == 'lost': + report_lost(event) + return + + if event['type'] == 'running': + report_running(event) + return + + if event['type'] == 'unmount': + report_unmount(event) + return + + if event['type'] == 'shutdown': + report_shutdown(event) + return + + # Deal with everything else. + if log: + try: + report_event(event) + except Exception as e: + eprintln(f"event reporting: {e}") + +def monitor(mountpoint, event_queue, **kwargs): + '''Monitor the given mountpoint for health events.''' + global everything + + def event_loop(mon_fd, event_queue): + # Ownership of mon_fd (and hence responsibility for closing it) + # is transferred to the mon_fp object. + with os.fdopen(mon_fd) as mon_fp: + nr = 0 + for lines in health_reports(mon_fp): + event_queue.submit(handle_event, lines) + + # Periodically run the garbage collector to + # constrain memory usage in the main thread. + # If only there was a way to submit to a queue + # without everything being tied up in a Future + if nr % 5355 == 0: + gc.collect() + nr += 1 + + try: + fd = os.open(mountpoint, os.O_RDONLY) + except Exception as e: + eprintln(f"{mountpoint}: {e}") + return 1 + + try: + mon_fd = open_health_monitor(fd, verbose = everything) + except OSError as e: + if e.errno == errno.ENOTTY or e.errno == errno.EOPNOTSUPP: + msg = _("XFS health monitoring not supported.") + eprintln(f"{mountpoint}: {msg}") + else: + eprintln(f"{mountpoint}: {e}") + return 1 + except Exception as e: + eprintln(f"{mountpoint}: {e}") + return 1 + finally: + # Close the mountpoint if opening the health monitor fails + os.close(fd) + + try: + # mon_fd is consumed by this function + event_loop(mon_fd, event_queue) + except Exception as e: + eprintln(f"{mountpoint}: {e}") + return 1 + + return 0 + +def main(): + global debug + global log + global printf_prefix + global everything + global debug_fast + + parser = argparse.ArgumentParser( \ + description = _("Automatically heal damage to XFS filesystem metadata")) + 
parser.add_argument("-V", action = "store_true", \ + help = _("Print version")) + parser.add_argument("--debug", action = "store_true", \ + help = _("Enable debugging messages")) + parser.add_argument("--log", action = "store_true", \ + help = _("Log health events to stdout")) + parser.add_argument("--everything", action = "store_true", \ + help = _("Capture all events")) + parser.add_argument('mountpoint', default = None, nargs = '?', metavar = _('PATH'), type = pathlib.Path, \ + help = _('XFS filesystem mountpoint to monitor')) + parser.add_argument('--debug-fast', action = 'store_true', \ + help = argparse.SUPPRESS) + args = parser.parse_args() + + if args.V: + vs = _("xfs_healer version") + pkgver = "@pkg_version@" + printlogln(f"{vs} {pkgver}") + return 0 + + if args.mountpoint is None: + parser.error(_("The following arguments are required: mountpoint")) + return 1 + + if args.debug: + debug = True + if args.log: + log = True + if args.everything: + everything = True + if args.debug_fast: + debug_fast = True + + # Use a separate subprocess to handle the events so that the main event + # reading process does not block on the GIL of the event handling + # subprocess. The downside is that we cannot pass function pointers + # and all data must be pickleable; the upside is not losing events. + args.event_queue = ProcessPoolExecutor() + + printf_prefix = args.mountpoint + ret = 0 + try: + ret = monitor(**vars(args)) + except KeyboardInterrupt: + # Consider SIGINT to be a clean exit. 
+ pass + + args.event_queue.shutdown() + return ret + +if __name__ == '__main__': + sys.exit(main()) diff --git a/include/builddefs.in b/include/builddefs.in index b38a099b7d525a..cb43029dc1f4c1 100644 --- a/include/builddefs.in +++ b/include/builddefs.in @@ -91,6 +91,7 @@ ENABLE_SHARED = @enable_shared@ ENABLE_GETTEXT = @enable_gettext@ ENABLE_EDITLINE = @enable_editline@ ENABLE_SCRUB = @enable_scrub@ +ENABLE_HEALER = @enable_healer@ HAVE_ZIPPED_MANPAGES = @have_zipped_manpages@ ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 13/26] xfs_healer: check events against schema 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (11 preceding siblings ...) 2025-10-23 0:08 ` [PATCH 12/26] xfs_healer: create daemon to listen for health events Darrick J. Wong @ 2025-10-23 0:08 ` Darrick J. Wong 2025-10-23 0:09 ` [PATCH 14/26] xfs_healer: enable repairing filesystems Darrick J. Wong ` (12 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:08 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Validate that the event objects that we get from the kernel actually obey the schema that the kernel publishes. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/Makefile | 1 + healer/xfs_healer.py.in | 62 +++++++++++++++++++++++++++++++++++++++++++++++ libxfs/Makefile | 10 +++++--- 3 files changed, 70 insertions(+), 3 deletions(-) diff --git a/healer/Makefile b/healer/Makefile index f0b3a62cd068b6..100e99cc9ef0a2 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -18,6 +18,7 @@ $(XFS_HEALER_PROG): $(XFS_HEALER_PROG).in $(builddefs) $(TOPDIR)/libfrog/gettext $(Q)$(SED) -e "s|@pkg_version@|$(PKG_VERSION)|g" \ -e '/@INIT_GETTEXT@/r $(TOPDIR)/libfrog/gettext.py' \ -e '/@INIT_GETTEXT@/d' \ + -e "s|@pkg_data_dir@|$(PKG_DATA_DIR)|g" \ < $< > $@ $(Q)chmod a+x $@ diff --git a/healer/xfs_healer.py.in b/healer/xfs_healer.py.in index 507f537aea0d9a..459a07d3804ab5 100644 --- a/healer/xfs_healer.py.in +++ b/healer/xfs_healer.py.in @@ -20,6 +20,52 @@ import pathlib import gc from concurrent.futures import ProcessPoolExecutor +try: + # Not all systems will have this json schema validation library, + # so we make it optional.
+ import jsonschema + + def init_validation(args): + '''Initialize event json validation.''' + schema_file = args.event_schema + try: + with open(schema_file) as fp: + schema_js = json.load(fp) + except Exception as e: + eprintln(f"{schema_file}: {e}") + return + + try: + vcls = jsonschema.validators.validator_for(schema_js) + vcls.check_schema(schema_js) + validator = vcls(schema_js) + except jsonschema.exceptions.SchemaError as e: + msg = _("invalid event schema") + eprintln(f"{schema_file}: {msg}: {e.message}") + return + except Exception as e: + eprintln(f"{schema_file}: {e}") + return + + def v(i): + e = jsonschema.exceptions.best_match( + validator.iter_errors(i)) + if e: + eprintln(f"{printf_prefix}: {e.message}") + return False + return True + + return v + +except ImportError: + def init_validation(args): + if args.require_validation: + eprintln(_("JSON schema validation not available.")) + return + + return lambda instance: True + +validator_fn = None debug = False log = False everything = False @@ -229,6 +275,12 @@ def handle_event(lines): eprintln(f"{printf_prefix}: {e} {fromm} {''.join(lines)}") return + # Ignore any event that doesn't pass our schema. This program must + # not try to handle a newer kernel that says things that it is not + # prepared to handle. + if not validator_fn(event): + return + # Deal with reporting-only events; these should always generate log # messages.
if event['type'] == 'lost': @@ -311,6 +363,7 @@ def main(): global printf_prefix global everything global debug_fast + global validator_fn parser = argparse.ArgumentParser( \ description = _("Automatically heal damage to XFS filesystem metadata")) @@ -326,6 +379,11 @@ def main(): help = _('XFS filesystem mountpoint to monitor')) parser.add_argument('--debug-fast', action = 'store_true', \ help = argparse.SUPPRESS) + parser.add_argument('--require-validation', action = 'store_true', \ + help = argparse.SUPPRESS) + parser.add_argument('--event-schema', type = str, \ + default = '@pkg_data_dir@/xfs_healthmon.schema.json', \ + help = argparse.SUPPRESS) args = parser.parse_args() if args.V: @@ -338,6 +396,10 @@ def main(): parser.error(_("The following arguments are required: mountpoint")) return 1 + validator_fn = init_validation(args) + if not validator_fn: + return 1 + if args.debug: debug = True if args.log: diff --git a/libxfs/Makefile b/libxfs/Makefile index 61c43529b532b6..f84eb5b43cdddd 100644 --- a/libxfs/Makefile +++ b/libxfs/Makefile @@ -151,6 +151,8 @@ EXTRA_OBJECTS=\ LDIRT += $(EXTRA_OBJECTS) +JSON_SCHEMAS=xfs_healthmon.schema.json + # # Tracing flags: # -DMEM_DEBUG all zone memory use @@ -174,7 +176,7 @@ LTLIBS = $(LIBPTHREAD) $(LIBRT) # don't try linking xfs_repair with a debug libxfs. 
DEBUG = -DNDEBUG -default: ltdepend $(LTLIBRARY) $(EXTRA_OBJECTS) +default: ltdepend $(LTLIBRARY) $(EXTRA_OBJECTS) $(JSON_SCHEMAS) %dummy.o: %dummy.cpp @echo " [CXXD] $@" @@ -196,14 +198,16 @@ MAKECXXDEP := $(MAKEDEPEND) $(CXXFLAGS) include $(BUILDRULES) install: default - $(INSTALL) -m 755 -d $(PKG_INC_DIR) + $(INSTALL) -m 755 -d $(PKG_DATA_DIR) + $(INSTALL) -m 644 $(JSON_SCHEMAS) $(PKG_DATA_DIR) install-headers: $(addsuffix -hdrs, $(PKGHFILES)) %-hdrs: $(Q)$(LN_S) -f $(CURDIR)/$* $(TOPDIR)/include/xfs/$* -install-dev: install +install-dev: default + $(INSTALL) -m 755 -d $(PKG_INC_DIR) $(INSTALL) -m 644 $(PKGHFILES) $(PKG_INC_DIR) # We need to install the headers before building the dependencies. If we ^ permalink raw reply related [flat|nested] 80+ messages in thread
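For anyone packaging this: the optional-dependency pattern in this patch can be exercised without the shipped schema file. The snippet below is a reduced sketch of patch 13's approach (graceful fallback when the `jsonschema` module is absent); the two-key schema here is made up purely for illustration and is not the real xfs_healthmon.schema.json.

```python
import json

try:
    import jsonschema

    def make_validator(schema):
        '''Build a callable that validates one instance against schema.'''
        cls = jsonschema.validators.validator_for(schema)
        cls.check_schema(schema)
        v = cls(schema)

        def validate(instance):
            # best_match() returns None when no validation errors exist.
            return jsonschema.exceptions.best_match(
                v.iter_errors(instance)) is None
        return validate
except ImportError:
    # Fallback mirroring the patch: accept every event when the
    # validation library is unavailable.
    def make_validator(schema):
        return lambda instance: True

# Hypothetical stand-in for the kernel-published schema.
schema = json.loads('{"type": "object", "required": ["type", "domain"]}')
validate = make_validator(schema)
print(validate({"type": "sick", "domain": "inode"}))  # True
```

The daemon's stricter `--require-validation` mode simply refuses to start in the fallback branch instead of returning the permissive lambda.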
* [PATCH 14/26] xfs_healer: enable repairing filesystems 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (12 preceding siblings ...) 2025-10-23 0:08 ` [PATCH 13/26] xfs_healer: check events against schema Darrick J. Wong @ 2025-10-23 0:09 ` Darrick J. Wong 2025-10-23 0:09 ` [PATCH 15/26] xfs_healer: check for fs features needed for effective repairs Darrick J. Wong ` (11 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:09 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make it so that our health monitoring daemon can initiate repairs. Repairs can take a while to run, so we don't actually want to be doing that work in the event thread, because the kernel queue can drop events if userspace doesn't respond in time. Therefore, create a subprocess executor to run the repairs in the background, and do the repairs from there. The subprocess executor is similar in concept to what a libfrog workqueue does, but the workers do not share address space, which eliminates GIL contention. Signed-off-by: "Darrick J.
Wong" <djwong@kernel.org> --- healer/xfs_healer.py.in | 395 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 390 insertions(+), 5 deletions(-) diff --git a/healer/xfs_healer.py.in b/healer/xfs_healer.py.in index 459a07d3804ab5..f12e84aff8d177 100644 --- a/healer/xfs_healer.py.in +++ b/healer/xfs_healer.py.in @@ -19,6 +19,8 @@ import ctypes import pathlib import gc from concurrent.futures import ProcessPoolExecutor +import ctypes.util +from enum import Enum try: # Not all systems will have this json schema validation libarary, @@ -71,6 +73,8 @@ log = False everything = False debug_fast = False printf_prefix = '' +want_repair = False +libhandle = None def printlogln(*args, **kwargs): '''Print a log message to stdout and flush it.''' @@ -123,6 +127,9 @@ def _IOW(type, number, size): def _IOWR(type, number, size): return _IOC(_IOC_READ | _IOC_WRITE, type, number, size) +def _IOWR(type, number, size): + return _IOC(_IOC_READ | _IOC_WRITE, type, number, size) + # xfs health monitoring ioctl stuff XFS_HEALTH_MONITOR_FMT_JSON = 1 XFS_HEALTH_MONITOR_VERBOSE = 1 << 0 @@ -150,6 +157,238 @@ def open_health_monitor(fd, verbose = False): ret = fcntl.ioctl(fd, XFS_IOC_HEALTH_MONITOR, arg) return ret +# libhandle stuff +class xfs_fsid(ctypes.Structure): + _fields_ = [ + ("_val0", ctypes.c_uint), + ("_val1", ctypes.c_uint) + ] + +class xfs_fid(ctypes.Structure): + _fields_ = [ + ("fid_len", ctypes.c_ushort), + ("fid_pad", ctypes.c_ushort), + ("fid_gen", ctypes.c_uint), + ("fid_ino", ctypes.c_ulonglong) + ] + +class xfs_handle(ctypes.Structure): + _fields_ = [ + ("_ha_fsid", xfs_fsid), + ("ha_fid", xfs_fid) + ] +assert ctypes.sizeof(xfs_handle) == 24 + +class fshandle(object): + def __init__(self, fd, mountpoint): + global libhandle + global printf_prefix + + self.handle = xfs_handle() + + if mountpoint is None: + raise Exception(_('fshandle needs a mountpoint')) + + self.mountpoint = mountpoint + + # Create the file and fs handles for the open mountpoint + # so that 
we can compare them later + buf = ctypes.c_void_p() + buflen = ctypes.c_size_t() + ret = libhandle.fd_to_handle(fd, buf, buflen) + if ret < 0: + errcode = ctypes.get_errno() + errstr = os.strerror(errcode) + msg = _("cannot create handle") + raise OSError(errcode, f'{msg}: {errstr}', + printf_prefix) + expected_size = ctypes.sizeof(xfs_handle) + if buflen.value != expected_size: + libhandle.free_handle(buf, buflen.value) + msg = _("Bad file handle size") + raise Exception(f"{msg}: {buflen.value}") + + hanp = ctypes.cast(buf, ctypes.POINTER(xfs_handle)) + self.handle = hanp.contents + + def reopen(self): + '''Reopen a file handle obtained via weak reference.''' + global libhandle + global printf_prefix + + buf = ctypes.c_void_p() + buflen = ctypes.c_size_t() + + fd = os.open(self.mountpoint, os.O_RDONLY) + + # Create the file and fs handles for the open mountpoint + # so that we can compare them later + ret = libhandle.fd_to_handle(fd, buf, buflen) + if ret < 0: + errcode = ctypes.get_errno() + errstr = os.strerror(errcode) + os.close(fd) + msg = _("resampling handle") + raise OSError(errcode, f'{msg}: {errstr}', + printf_prefix) + + hanp = ctypes.cast(buf, ctypes.POINTER(xfs_handle)) + + # Did we get the same handle? 
+ if buflen.value != ctypes.sizeof(xfs_handle) or \ + bytes(hanp.contents) != bytes(self.handle): + os.close(fd) + libhandle.free_handle(buf, buflen) + msg = _("reopening") + errstr = os.strerror(errno.ESTALE) + raise OSError(errno.ESTALE, f'{msg}: {errstr}', + printf_prefix) + + libhandle.free_handle(buf, buflen) + return fd + +def libhandle_load(): + '''Load libhandle and set things up.''' + global libhandle + + soname = ctypes.util.find_library('handle') + if soname is None: + errstr = os.strerror(errno.ENOENT) + msg = _("while finding library") + raise OSError(errno.ENOENT, f'{msg}: {errstr}', 'libhandle') + + libhandle = ctypes.CDLL(soname, use_errno = True) + libhandle.fd_to_handle.argtypes = ( + ctypes.c_int, + ctypes.POINTER(ctypes.c_void_p), + ctypes.POINTER(ctypes.c_size_t)) + libhandle.handle_to_fshandle.argtypes = ( + ctypes.c_void_p, + ctypes.c_size_t, + ctypes.POINTER(ctypes.c_void_p), + ctypes.POINTER(ctypes.c_size_t)) + libhandle.path_to_fshandle.argtypes = ( + ctypes.c_char_p, + ctypes.c_void_p, + ctypes.c_size_t) + libhandle.free_handle.argtypes = ( + ctypes.c_void_p, + ctypes.c_size_t) + +# metadata scrubbing stuff +XFS_SCRUB_TYPE_PROBE = 0 +XFS_SCRUB_TYPE_SB = 1 +XFS_SCRUB_TYPE_AGF = 2 +XFS_SCRUB_TYPE_AGFL = 3 +XFS_SCRUB_TYPE_AGI = 4 +XFS_SCRUB_TYPE_BNOBT = 5 +XFS_SCRUB_TYPE_CNTBT = 6 +XFS_SCRUB_TYPE_INOBT = 7 +XFS_SCRUB_TYPE_FINOBT = 8 +XFS_SCRUB_TYPE_RMAPBT = 9 +XFS_SCRUB_TYPE_REFCNTBT = 10 +XFS_SCRUB_TYPE_INODE = 11 +XFS_SCRUB_TYPE_BMBTD = 12 +XFS_SCRUB_TYPE_BMBTA = 13 +XFS_SCRUB_TYPE_BMBTC = 14 +XFS_SCRUB_TYPE_DIR = 15 +XFS_SCRUB_TYPE_XATTR = 16 +XFS_SCRUB_TYPE_SYMLINK = 17 +XFS_SCRUB_TYPE_PARENT = 18 +XFS_SCRUB_TYPE_RTBITMAP = 19 +XFS_SCRUB_TYPE_RTSUM = 20 +XFS_SCRUB_TYPE_UQUOTA = 21 +XFS_SCRUB_TYPE_GQUOTA = 22 +XFS_SCRUB_TYPE_PQUOTA = 23 +XFS_SCRUB_TYPE_FSCOUNTERS = 24 +XFS_SCRUB_TYPE_QUOTACHECK = 25 +XFS_SCRUB_TYPE_NLINKS = 26 +XFS_SCRUB_TYPE_HEALTHY = 27 +XFS_SCRUB_TYPE_DIRTREE = 28 +XFS_SCRUB_TYPE_METAPATH = 29 +XFS_SCRUB_TYPE_RGSUPER = 
30 +XFS_SCRUB_TYPE_RGBITMAP = 31 +XFS_SCRUB_TYPE_RTRMAPBT = 32 +XFS_SCRUB_TYPE_RTREFCBT = 33 + +XFS_SCRUB_IFLAG_REPAIR = 1 << 0 +XFS_SCRUB_OFLAG_CORRUPT = 1 << 1 +XFS_SCRUB_OFLAG_PREEN = 1 << 2 +XFS_SCRUB_OFLAG_XFAIL = 1 << 3 +XFS_SCRUB_OFLAG_XCORRUPT = 1 << 4 +XFS_SCRUB_OFLAG_INCOMPLETE = 1 << 5 +XFS_SCRUB_OFLAG_WARNING = 1 << 6 +XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED = 1 << 7 +XFS_SCRUB_IFLAG_FORCE_REBUILD = 1 << 8 + +class xfs_scrub_metadata(ctypes.Structure): + _fields_ = [ + ('sm_type', ctypes.c_uint), + ('sm_flags', ctypes.c_uint), + ('sm_ino', ctypes.c_ulonglong), + ('sm_gen', ctypes.c_uint), + ('sm_agno', ctypes.c_uint), + ('_pad', ctypes.c_ulonglong * 5), + ] +assert ctypes.sizeof(xfs_scrub_metadata) == 64 + +XFS_IOC_SCRUB_METADATA = _IOWR(0x58, 60, xfs_scrub_metadata) + +def __xfs_repair_metadata(fd, type, group, ino, gen): + '''Call the kernel to repair some inode metadata.''' + + arg = xfs_scrub_metadata() + arg.sm_type = type + arg.sm_flags = XFS_SCRUB_IFLAG_REPAIR + arg.sm_ino = ino + arg.sm_gen = gen + arg.sm_agno = group + + fcntl.ioctl(fd, XFS_IOC_SCRUB_METADATA, arg) + return arg.sm_flags + +def xfs_repair_fs_metadata(fd, type): + '''Call the kernel to repair some whole-fs metadata.''' + return __xfs_repair_metadata(fd, type, 0, 0, 0) + +def xfs_repair_group_metadata(fd, type, group): + '''Call the kernel to repair some group metadata.''' + return __xfs_repair_metadata(fd, type, group, 0, 0) + +def xfs_repair_inode_metadata(fd, type, ino, gen): + '''Call the kernel to repair some inode metadata.''' + return __xfs_repair_metadata(fd, type, 0, ino, gen) + +class RepairOutcome(Enum): + Success = 1, + Unnecessary = 2, + MightBeOk = 3, + Failed = 4, + + def from_oflags(oflags): + '''Translate scrub output flags to outcome.''' + if oflags & (XFS_SCRUB_OFLAG_CORRUPT | \ + XFS_SCRUB_OFLAG_INCOMPLETE): + return RepairOutcome.Failed + + if oflags & XFS_SCRUB_OFLAG_XFAIL: + return RepairOutcome.MightBeOk + + if oflags & XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED: + return 
RepairOutcome.Unnecessary + + return RepairOutcome.Success + + def report(self): + if self == RepairOutcome.Failed: + return _("Repair unsuccessful; offline repair required.") + if self == RepairOutcome.MightBeOk: + return _("Seems correct but cross-referencing failed; offline repair recommended.") + if self == RepairOutcome.Unnecessary: + return _("No modification needed.") + if self == RepairOutcome.Success: + return _("Repairs successful.") + # main program def health_reports(mon_fp): @@ -263,7 +502,7 @@ def report_shutdown(event): msg = _("filesystem shut down due to") printlogln(f"{printf_prefix}: {msg} {some_reasons}") -def handle_event(lines): +def handle_event(lines, fh): '''Handle an event asynchronously.''' global log @@ -306,17 +545,23 @@ def handle_event(lines): except Exception as e: eprintln(f"event reporting: {e}") + if want_repair and event['type'] == 'sick': + repair_metadata(event, fh) + def monitor(mountpoint, event_queue, **kwargs): '''Monitor the given mountpoint for health events.''' global everything + global log + global printf_prefix + global want_repair - def event_loop(mon_fd, event_queue): + def event_loop(mon_fd, event_queue, fh): # Ownership of mon_fd (and hence responsibility for closing it) # is transferred to the mon_fp object. with os.fdopen(mon_fd) as mon_fp: nr = 0 for lines in health_reports(mon_fp): - event_queue.submit(handle_event, lines) + event_queue.submit(handle_event, lines, fh) # Periodically run the garbage collector to # constrain memory usage in the main thread. 
@@ -332,6 +577,13 @@ def monitor(mountpoint, event_queue, **kwargs): eprintln(f"{mountpoint}: {e}") return 1 + try: + fh = fshandle(fd, mountpoint) if want_repair else None + except Exception as e: + eprintln(f"{mountpoint}: {e}") + os.close(fd) + return 1 + try: mon_fd = open_health_monitor(fd, verbose = everything) except OSError as e: @@ -345,18 +597,140 @@ def monitor(mountpoint, event_queue, **kwargs): eprintln(f"{mountpoint}: {e}") return 1 finally: - # Close the mountpoint if opening the health monitor fails + # Close the mountpoint if opening the health monitor fails; + # the handle object will free its own memory. os.close(fd) try: # mon_fd is consumed by this function - event_loop(mon_fd, event_queue) + event_loop(mon_fd, event_queue, fh) except Exception as e: eprintln(f"{mountpoint}: {e}") return 1 return 0 +def __scrub_type(code): + '''Convert a "structures" json list to a scrub type code.''' + SCRUB_TYPES = { + "probe": XFS_SCRUB_TYPE_PROBE, + "sb": XFS_SCRUB_TYPE_SB, + "agf": XFS_SCRUB_TYPE_AGF, + "agfl": XFS_SCRUB_TYPE_AGFL, + "agi": XFS_SCRUB_TYPE_AGI, + "bnobt": XFS_SCRUB_TYPE_BNOBT, + "cntbt": XFS_SCRUB_TYPE_CNTBT, + "inobt": XFS_SCRUB_TYPE_INOBT, + "finobt": XFS_SCRUB_TYPE_FINOBT, + "rmapbt": XFS_SCRUB_TYPE_RMAPBT, + "refcountbt": XFS_SCRUB_TYPE_REFCNTBT, + "inode": XFS_SCRUB_TYPE_INODE, + "bmapbtd": XFS_SCRUB_TYPE_BMBTD, + "bmapbta": XFS_SCRUB_TYPE_BMBTA, + "bmapbtc": XFS_SCRUB_TYPE_BMBTC, + "directory": XFS_SCRUB_TYPE_DIR, + "xattr": XFS_SCRUB_TYPE_XATTR, + "symlink": XFS_SCRUB_TYPE_SYMLINK, + "parent": XFS_SCRUB_TYPE_PARENT, + "rtbitmap": XFS_SCRUB_TYPE_RTBITMAP, + "rtsummary": XFS_SCRUB_TYPE_RTSUM, + "usrquota": XFS_SCRUB_TYPE_UQUOTA, + "grpquota": XFS_SCRUB_TYPE_GQUOTA, + "prjquota": XFS_SCRUB_TYPE_PQUOTA, + "fscounters": XFS_SCRUB_TYPE_FSCOUNTERS, + "quotacheck": XFS_SCRUB_TYPE_QUOTACHECK, + "nlinks": XFS_SCRUB_TYPE_NLINKS, + "healthy": XFS_SCRUB_TYPE_HEALTHY, + "dirtree": XFS_SCRUB_TYPE_DIRTREE, + "metapath": XFS_SCRUB_TYPE_METAPATH, + 
"rgsuper": XFS_SCRUB_TYPE_RGSUPER, + "rgbitmap": XFS_SCRUB_TYPE_RGBITMAP, + "rtrmapbt": XFS_SCRUB_TYPE_RTRMAPBT, + "rtrefcountbt": XFS_SCRUB_TYPE_RTREFCBT, + } + + if code not in SCRUB_TYPES: + return None + + return SCRUB_TYPES[code] + +def repair_wholefs(event, fd): + '''React to a fs-domain corruption event by repairing it.''' + for structure in event['structures']: + struct = _(structure) + scrub_type = __scrub_type(structure) + if scrub_type is None: + continue + try: + oflags = xfs_repair_fs_metadata(fd, scrub_type) + outcome = RepairOutcome.from_oflags(oflags) + report = outcome.report() + printlogln(f"{printf_prefix}: {struct}: {report}") + except Exception as e: + eprintln(f"{printf_prefix}: {struct}: {e}") + +def repair_group(event, fd, group_type): + '''React to a group-domain corruption event by repairing it.''' + for structure in event['structures']: + struct = _(structure) + scrub_type = __scrub_type(structure) + if scrub_type is None: + continue + try: + oflags = xfs_repair_group_metadata(fd, scrub_type, + event['group']) + outcome = RepairOutcome.from_oflags(oflags) + report = outcome.report() + printlogln(f"{printf_prefix}: {struct}: {report}") + except Exception as e: + eprintln(f"{printf_prefix}: {struct}: {e}") + +def repair_inode(event, fd): + '''React to a inode-domain corruption event by repairing it.''' + for structure in event['structures']: + struct = _(structure) + scrub_type = __scrub_type(structure) + if scrub_type is None: + continue + try: + oflags = xfs_repair_inode_metadata(fd, scrub_type, + event['inumber'], event['generation']) + outcome = RepairOutcome.from_oflags(oflags) + report = outcome.report() + printlogln(f"{printf_prefix}: {struct}: {report}") + except Exception as e: + eprintln(f"{printf_prefix}: {struct}: {e}") + +def repair_metadata(event, fh): + '''Repair a metadata corruption.''' + global debug + global printf_prefix + + if debug: + printlogln(f'repair {event}') + + try: + fd = fh.reopen() + except Exception as e: + 
eprintln(f"{printf_prefix}: {e}") + return + + try: + if event['domain'] in ['fs', 'realtime']: + repair_wholefs(event, fd) + elif event['domain'] in ['perag', 'rtgroup']: + repair_group(event, fd, event['domain']) + elif event['domain'] == 'inode': + repair_inode(event, fd) + else: + domain = event['domain'] + msg = _("Unknown metadata domain") + raise Exception(f"{msg} \"{domain}\".") + except Exception as e: + eprintln(f"{printf_prefix}: {e}") + finally: + os.close(fd) + def main(): global debug global log @@ -364,6 +738,7 @@ def main(): global everything global debug_fast global validator_fn + global want_repair parser = argparse.ArgumentParser( \ description = _("Automatically heal damage to XFS filesystem metadata")) @@ -384,6 +759,8 @@ def main(): parser.add_argument('--event-schema', type = str, \ default = '@pkg_data_dir@/xfs_healthmon.schema.json', \ help = argparse.SUPPRESS) + parser.add_argument("--repair", action = "store_true", \ + help = _("Always repair corrupt metadata")) args = parser.parse_args() if args.V: @@ -400,6 +777,12 @@ def main(): if not validator_fn: return 1 + try: + libhandle_load() + except OSError as e: + eprintln(f"libhandle: {e}") + return 1 + if args.debug: debug = True if args.log: @@ -408,6 +791,8 @@ def main(): everything = True if args.debug_fast: debug_fast = True + if args.repair: + want_repair = True # Use a separate subprocess to handle the events so that the main event # reading process does not block on the GIL of the event handling ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 15/26] xfs_healer: check for fs features needed for effective repairs 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (13 preceding siblings ...) 2025-10-23 0:09 ` [PATCH 14/26] xfs_healer: enable repairing filesystems Darrick J. Wong @ 2025-10-23 0:09 ` Darrick J. Wong 2025-10-23 0:09 ` [PATCH 16/26] xfs_healer: use getparents to look up file names Darrick J. Wong ` (10 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:09 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Online repair relies heavily on back references such as reverse mappings and directory parent pointers to add redundancy to the filesystem. Check for these two features and whine a bit if they are missing. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/xfs_healer.py.in | 92 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 92 insertions(+) diff --git a/healer/xfs_healer.py.in b/healer/xfs_healer.py.in index f12e84aff8d177..5098193bb86ac9 100644 --- a/healer/xfs_healer.py.in +++ b/healer/xfs_healer.py.in @@ -74,6 +74,8 @@ everything = False debug_fast = False printf_prefix = '' want_repair = False +has_parent = False +has_rmapbt = False libhandle = None def printlogln(*args, **kwargs): @@ -347,6 +349,19 @@ def __xfs_repair_metadata(fd, type, group, ino, gen): fcntl.ioctl(fd, XFS_IOC_SCRUB_METADATA, arg) return arg.sm_flags +def xfs_repair_is_supported(fd): + '''Ask the kernel if it supports repairs.''' + + arg = xfs_scrub_metadata() + arg.sm_type = XFS_SCRUB_TYPE_PROBE + arg.sm_flags = XFS_SCRUB_IFLAG_REPAIR + + try: + fcntl.ioctl(fd, XFS_IOC_SCRUB_METADATA, arg) + except OSError: + return False + return True + def xfs_repair_fs_metadata(fd, type): '''Call the kernel to repair some whole-fs metadata.''' return __xfs_repair_metadata(fd, type, 0, 0, 0) @@ -389,6 +404,57 @@ class RepairOutcome(Enum): if self ==
RepairOutcome.Success: return _("Repairs successful.") +# fsgeometry ioctl +class xfs_fsop_geom(ctypes.Structure): + _fields_ = [ + ("blocksize", ctypes.c_uint), + ("rtextsize", ctypes.c_uint), + ("agblocks", ctypes.c_uint), + ("agcount", ctypes.c_uint), + ("logblocks", ctypes.c_uint), + ("sectsize", ctypes.c_uint), + ("inodesize", ctypes.c_uint), + ("imaxpct", ctypes.c_uint), + ("datablocks", ctypes.c_ulonglong), + ("rtblocks", ctypes.c_ulonglong), + ("rtextents", ctypes.c_ulonglong), + ("logstart", ctypes.c_ulonglong), + ("uuid", ctypes.c_ubyte * 16), + ("sunit", ctypes.c_uint), + ("swidth", ctypes.c_uint), + ("version", ctypes.c_uint), + ("flags", ctypes.c_uint), + ("logsectsize", ctypes.c_uint), + ("rtsectsize", ctypes.c_uint), + ("dirblocksize", ctypes.c_uint), + ("logsunit", ctypes.c_uint), + ("sick", ctypes.c_uint), + ("checked", ctypes.c_uint), + ("rgblocks", ctypes.c_uint), + ("rgcount", ctypes.c_uint), + ("_pad", ctypes.c_ulonglong * 16), + ] +assert ctypes.sizeof(xfs_fsop_geom) == 256 + +XFS_FSOP_GEOM_FLAGS_RMAPBT = 1 << 19 +XFS_FSOP_GEOM_FLAGS_PARENT = 1 << 25 + +XFS_IOC_FSGEOMETRY = _IOR(0x58, 126, xfs_fsop_geom) + +def xfs_has_parent(fd): + '''Does this filesystem have parent pointers?''' + + arg = xfs_fsop_geom() + fcntl.ioctl(fd, XFS_IOC_FSGEOMETRY, arg) + return arg.flags & XFS_FSOP_GEOM_FLAGS_PARENT != 0 + +def xfs_has_rmapbt(fd): + '''Does this filesystem have reverse mapping?''' + + arg = xfs_fsop_geom() + fcntl.ioctl(fd, XFS_IOC_FSGEOMETRY, arg) + return arg.flags & XFS_FSOP_GEOM_FLAGS_RMAPBT != 0 + # main program def health_reports(mon_fp): @@ -554,6 +620,8 @@ def monitor(mountpoint, event_queue, **kwargs): global log global printf_prefix global want_repair + global has_parent + global has_rmapbt def event_loop(mon_fd, event_queue, fh): # Ownership of mon_fd (and hence responsibility for closing it) @@ -577,6 +645,30 @@ def monitor(mountpoint, event_queue, **kwargs): eprintln(f"{mountpoint}: {e}") return 1 + try: + has_parent =
xfs_has_parent(fd) + has_rmapbt = xfs_has_rmapbt(fd) + except Exception as e: + # Don't care if we can't detect parent pointers or rmap + msg = _("detecting fs features") + eprintln(_(f'{printf_prefix}: {msg}: {e}')) + + # Check that the kernel supports repairs at all. + if want_repair and not xfs_repair_is_supported(fd): + msg = _("XFS online repair is not supported, exiting") + printlogln(f"{mountpoint}: {msg}") + os.close(fd) + return 1 + + # Check for the backref metadata that makes repair effective. + if want_repair: + if not has_rmapbt: + msg = _("XFS online repair is less effective without rmap btrees.") + printlogln(f"{mountpoint}: {msg}") + if not has_parent: + msg = _("XFS online repair is less effective without parent pointers.") + printlogln(f"{mountpoint}: {msg}") + try: fh = fshandle(fd, mountpoint) if want_repair else None except Exception as e: ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 16/26] xfs_healer: use getparents to look up file names 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (14 preceding siblings ...) 2025-10-23 0:09 ` [PATCH 15/26] xfs_healer: check for fs features needed for effective repairs Darrick J. Wong @ 2025-10-23 0:09 ` Darrick J. Wong 2025-10-23 0:09 ` [PATCH 17/26] builddefs: refactor udev directory specification Darrick J. Wong ` (9 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:09 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> If the kernel tells about something that happened to a file, use the GETPARENTS ioctl to try to look up the path to that file for more ergonomic reporting. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/xfs_healer.py.in | 248 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 242 insertions(+), 6 deletions(-) diff --git a/healer/xfs_healer.py.in b/healer/xfs_healer.py.in index 5098193bb86ac9..5ed2198a0c1687 100644 --- a/healer/xfs_healer.py.in +++ b/healer/xfs_healer.py.in @@ -21,6 +21,7 @@ import gc from concurrent.futures import ProcessPoolExecutor import ctypes.util from enum import Enum +import collections try: # Not all systems will have this json schema validation libarary, @@ -182,12 +183,18 @@ class xfs_handle(ctypes.Structure): assert ctypes.sizeof(xfs_handle) == 24 class fshandle(object): - def __init__(self, fd, mountpoint): + def __init__(self, fd, mountpoint = None): global libhandle global printf_prefix self.handle = xfs_handle() + if isinstance(fd, fshandle): + # copy an existing fshandle + self.mountpoint = fd.mountpoint + ctypes.pointer(self.handle)[0] = fd.handle + return + if mountpoint is None: raise Exception(_('fshandle needs a mountpoint')) @@ -249,6 +256,11 @@ class fshandle(object): libhandle.free_handle(buf, buflen) return fd + def subst(self, ino, gen): + 
'''Substitute the inode and generation components of a handle.''' + self.handle.ha_fid.fid_ino = ino + self.handle.ha_fid.fid_gen = gen + def libhandle_load(): '''Load libhandle and set things up.''' global libhandle @@ -455,6 +467,170 @@ def xfs_has_rmapbt(fd): fcntl.ioctl(fd, XFS_IOC_FSGEOMETRY, arg) return arg.flags & XFS_FSOP_GEOM_FLAGS_RMAPBT != 0 +# getparents ioctl +class xfs_attrlist_cursor(ctypes.Structure): + _fields_ = [ + ("_opaque0", ctypes.c_uint), + ("_opaque1", ctypes.c_uint), + ("_opaque2", ctypes.c_uint), + ("_opaque3", ctypes.c_uint) + ] + +class xfs_getparents_rec(ctypes.Structure): + _fields_ = [ + ("gpr_parent", xfs_handle), + ("gpr_reclen", ctypes.c_uint), + ("_gpr_reserved", ctypes.c_uint), + ] + +xfs_getparents_tuple = collections.namedtuple('xfs_getparents_tuple', \ + ['gpr_parent', 'gpr_reclen', 'gpr_name']) + +class xfs_getparents_rec_array(object): + def __init__(self, nr_bytes): + self.nr_bytes = nr_bytes + self.bytearray = (ctypes.c_byte * int(nr_bytes))() + + def __slice_to_record(self, bufslice): + '''Compute the number of bytes in a getparents record that contain a null-terminated directory entry name.''' + rec = ctypes.cast(bytes(bufslice), \ + ctypes.POINTER(xfs_getparents_rec)) + fixedlen = ctypes.sizeof(xfs_getparents_rec) + namelen = rec.contents.gpr_reclen - fixedlen + + for i in range(0, namelen): + if bufslice[fixedlen + i] == 0: + namelen = i + break + + if namelen == 0: + return + + return xfs_getparents_tuple( + gpr_parent = rec.contents.gpr_parent, + gpr_reclen = rec.contents.gpr_reclen, + gpr_name = bufslice[fixedlen:fixedlen + namelen]) + + def get_buffer(self): + '''Return a pointer to the bytearray masquerading as an int.''' + return ctypes.addressof(self.bytearray) + + def __iter__(self): + '''Walk the getparents records in this array.''' + off = 0 + nr = 0 + buf = bytes(self.bytearray) + while off < self.nr_bytes: + bufslice = buf[off:] + t = self.__slice_to_record(bufslice) + if t is None: + break + yield t + off 
+= t.gpr_reclen + nr += 1 + +class xfs_getparents(ctypes.Structure): + _fields_ = [ + ("_gp_cursor", xfs_attrlist_cursor), + ("gp_iflags", ctypes.c_ushort), + ("gp_oflags", ctypes.c_ushort), + ("gp_bufsize", ctypes.c_uint), + ("_pad", ctypes.c_ulonglong), + ("gp_buffer", ctypes.c_ulonglong) + ] + + def __init__(self, fd, nr_bytes): + self.fd = fd + self.records = xfs_getparents_rec_array(nr_bytes) + self.gp_buffer = self.records.get_buffer() + self.gp_bufsize = nr_bytes + + def __call_kernel(self): + if self.gp_oflags & XFS_GETPARENTS_OFLAG_DONE: + return False + + ret = fcntl.ioctl(self.fd, XFS_IOC_GETPARENTS, self) + if ret != 0: + return False + + return self.gp_oflags & XFS_GETPARENTS_OFLAG_ROOT == 0 + + def __iter__(self): + ctypes.memset(ctypes.pointer(self._gp_cursor), 0, \ + ctypes.sizeof(xfs_attrlist_cursor)) + + while self.__call_kernel(): + for i in self.records: + yield i + +class xfs_getparents_by_handle(ctypes.Structure): + _fields_ = [ + ("gph_handle", xfs_handle), + ("gph_request", xfs_getparents) + ] + + def __init__(self, fd, fh, nr_bytes): + self.fd = fd + self.records = xfs_getparents_rec_array(nr_bytes) + self.gph_request.gp_buffer = self.records.get_buffer() + self.gph_request.gp_bufsize = nr_bytes + self.gph_handle = fh.handle + + def __call_kernel(self): + if self.gph_request.gp_oflags & XFS_GETPARENTS_OFLAG_DONE: + return False + + ret = fcntl.ioctl(self.fd, XFS_IOC_GETPARENTS_BY_HANDLE, self) + if ret != 0: + return False + + return self.gph_request.gp_oflags & XFS_GETPARENTS_OFLAG_ROOT == 0 + + def __iter__(self): + ctypes.memset(ctypes.pointer(self.gph_request._gp_cursor), 0, \ + ctypes.sizeof(xfs_attrlist_cursor)) + while self.__call_kernel(): + for i in self.records: + yield i + +assert ctypes.sizeof(xfs_getparents) == 40 +assert ctypes.sizeof(xfs_getparents_by_handle) == 64 +assert ctypes.sizeof(xfs_getparents_rec) == 32 + +XFS_GETPARENTS_OFLAG_ROOT = 1 << 0 +XFS_GETPARENTS_OFLAG_DONE = 1 << 1 + +XFS_IOC_GETPARENTS = _IOWR(0x58, 62, 
xfs_getparents) +XFS_IOC_GETPARENTS_BY_HANDLE = _IOWR(0x58, 63, xfs_getparents_by_handle) + +def fgetparents(fd, fh = None, bufsize = 1024): + '''Return all the parent pointers for a given fd and/or handle.''' + + if fh is not None: + return xfs_getparents_by_handle(fd, fh, bufsize) + return xfs_getparents(fd, bufsize) + +def fgetpath(fd, fh = None, mountpoint = None): + '''Return a list of path components up to the root dir of the filesystem for a given fd.''' + ret = [] + if fh is None: + nfh = fshandle(fd, mountpoint) + else: + # Don't subst into the caller's handle + nfh = fshandle(fh) + + while True: + added = False + for pptr in fgetparents(fd, nfh): + ret.insert(0, pptr.gpr_name) + nfh.subst(pptr.gpr_parent.ha_fid.fid_ino, \ + pptr.gpr_parent.ha_fid.fid_gen) + added = True + break + if not added: + break + return ret + # main program def health_reports(mon_fp): @@ -479,14 +655,29 @@ def health_reports(mon_fp): lines = [] buf = mon_fp.readline() +def file_event_to_prefix(event): + '''Compute the logging prefix for this event.''' + global printf_prefix + + if 'path' in event: + path = event['path'] + return f"{printf_prefix}{os.sep}{path}:" + + inumber = event['inumber'] + igen = event['generation'] + inom = _("ino") + igenm = _("gen") + return f"{printf_prefix}: {inom} {inumber} {igenm} {igen:#x}" + def report_event(event): '''Log a monitoring event to stdout.''' global printf_prefix if event['domain'] == 'inode': + prefix = file_event_to_prefix(event) structures = ', '.join([_(x) for x in event['structures']]) status = _(event['type']) - printlogln(f"{printf_prefix}: {structures} {status}") + printlogln(f"{prefix} {structures} {status}") elif event['domain'] == 'perag': structures = ', '.join([_(x) for x in event['structures']]) @@ -517,12 +708,13 @@ def report_event(event): printlogln(f"{printf_prefix}: {msg} {device} {daddrm} {daddr:#x} {bbcountm} {bbcount:#x}") elif event['domain'] == 'filerange': + prefix = file_event_to_prefix(event) event_type = 
_(event['type']) pos = event['pos'] length = event['length'] posm = _("pos") lenm = _("len") - printlogln(f"{printf_prefix}: {event_type} {posm} {pos} {lenm} {length}") + printlogln(f"{prefix} {event_type} pos {pos} len {length}") def report_lost(event): '''Report that the kernel lost events.''' @@ -571,6 +763,40 @@ def report_shutdown(event): def handle_event(lines, fh): '''Handle an event asynchronously.''' global log + global has_parent + + def pathify_event(event, fh): + '''Come up with a directory tree path for a file event.''' + try: + path_fd = fh.reopen() + except Exception as e: + # Not the end of the world if we get nothing + return + + try: + fh2 = fshandle(fh) + except OSError as e: + if e.errno != errno.EOPNOTSUPP: + msg = _("making new file handle") + eprintln(f'{printf_prefix}: {msg}: {e}') + os.close(path_fd) + return + except Exception as e: + # Not the end of the world if we get nothing + os.close(path_fd) + return + + try: + fh2.subst(event['inumber'], event['generation']) + components = [x.decode('utf-8') for x in fgetpath(path_fd, fh2)] + event['path'] = os.sep.join(components) + except Exception as e: + # Path walking might be unavailable if the directory + # tree is corrupt. Since this is optional, we don't + # report anything. + pass + finally: + os.close(path_fd) # Convert array of strings into a json object try: @@ -605,13 +831,21 @@ def handle_event(lines, fh): return # Deal with everything else. 
+ maybe_pathify = event['domain'] in ('inode', 'filerange') and has_parent if log: + if maybe_pathify and not debug_fast: + pathify_event(event, fh) + maybe_pathify = False + try: report_event(event) except Exception as e: eprintln(f"event reporting: {e}") if want_repair and event['type'] == 'sick': + if maybe_pathify: + pathify_event(event, fh) + repair_metadata(event, fh) def monitor(mountpoint, event_queue, **kwargs): @@ -670,7 +904,7 @@ def monitor(mountpoint, event_queue, **kwargs): printlogln(f"{mountpoint}: {msg}") try: - fh = fshandle(fd, mountpoint) if want_repair else None + fh = fshandle(fd, mountpoint) if want_repair or has_parent else None except Exception as e: eprintln(f"{mountpoint}: {e}") os.close(fd) @@ -779,6 +1013,8 @@ def repair_group(event, fd, group_type): def repair_inode(event, fd): '''React to a inode-domain corruption event by repairing it.''' + prefix = file_event_to_prefix(event) + for structure in event['structures']: struct = _(structure) scrub_type = __scrub_type(structure) @@ -789,9 +1025,9 @@ def repair_inode(event, fd): event['inumber'], event['generation']) outcome = RepairOutcome.from_oflags(oflags) report = outcome.report() - printlogln(f"{printf_prefix}: {struct}: {report}") + printlogln(f"{prefix} {structure}: {report}") except Exception as e: - eprintln(f"{printf_prefix}: {struct}: {e}") + eprintln(f"{prefix} {structure}: {e}") def repair_metadata(event, fh): '''Repair a metadata corruption.''' ^ permalink raw reply related [flat|nested] 80+ messages in thread
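[Editorial note] The `fgetpath()` loop in this patch rebuilds a path by repeatedly substituting the child's handle with the first parent returned by GETPARENTS until no parent record comes back. Stripped of the ioctl and file-handle plumbing, the walk has this shape (the parent table and names below are made up for illustration):

```python
# Hypothetical parent table: child inode -> (parent inode, dirent name).
# Inode 128 stands in for the root directory, which has no entry here.
PARENTS = {
    512: (256, "file.txt"),
    256: (131, "b"),
    131: (128, "a"),
}

def fake_getpath(ino):
    """Walk toward the root, prepending each dirent name -- the same shape
    as fgetpath(), minus error handling and generation-number substitution."""
    components = []
    while ino in PARENTS:
        parent, name = PARENTS[ino]
        components.insert(0, name)
        ino = parent
    return "/".join(components)

print(fake_getpath(512))   # a/b/file.txt
```

Note that, like the real code, this follows only the *first* parent at each level; a hardlinked file with several parents still yields a single (valid) path, which is all the log-prefix reporting needs.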
* [PATCH 17/26] builddefs: refactor udev directory specification 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (15 preceding siblings ...) 2025-10-23 0:09 ` [PATCH 16/26] xfs_healer: use getparents to look up file names Darrick J. Wong @ 2025-10-23 0:09 ` Darrick J. Wong 2025-10-23 0:10 ` [PATCH 18/26] xfs_healer: create a background monitoring service Darrick J. Wong ` (8 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:09 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Refactor the code that finds the udev rules directory to detect the location of the parent udev directory instead. IOWs, we go from: UDEV_RULE_DIR=/foo/bar/rules.d to: UDEV_DIR=/foo/bar UDEV_RULE_DIR=/foo/bar/rules.d This is needed by the next patch, which adds a helper script. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- configure.ac | 2 +- include/builddefs.in | 3 ++- m4/package_services.m4 | 30 +++++++++++++++--------------- 3 files changed, 18 insertions(+), 17 deletions(-) diff --git a/configure.ac b/configure.ac index 369cdd1696380a..4e7b917c38ae7c 100644 --- a/configure.ac +++ b/configure.ac @@ -184,7 +184,7 @@ if test "$enable_scrub" = "yes"; then fi AC_CONFIG_SYSTEMD_SYSTEM_UNIT_DIR AC_CONFIG_CROND_DIR -AC_CONFIG_UDEV_RULE_DIR +AC_CONFIG_UDEV_DIR AC_HAVE_BLKID_TOPO AC_HAVE_TRIVIAL_AUTO_VAR_INIT AC_STRERROR_R_RETURNS_STRING diff --git a/include/builddefs.in b/include/builddefs.in index cb43029dc1f4c1..ddcc784361f0b9 100644 --- a/include/builddefs.in +++ b/include/builddefs.in @@ -116,7 +116,8 @@ SYSTEMD_SYSTEM_UNIT_DIR = @systemd_system_unit_dir@ HAVE_CROND = @have_crond@ CROND_DIR = @crond_dir@ HAVE_UDEV = @have_udev@ -UDEV_RULE_DIR = @udev_rule_dir@ +UDEV_DIR = @udev_dir@ +UDEV_RULE_DIR = @udev_dir@/rules.d HAVE_LIBURCU_ATOMIC64 = @have_liburcu_atomic64@ STRERROR_R_RETURNS_STRING = @strerror_r_returns_string@ diff 
--git a/m4/package_services.m4 b/m4/package_services.m4 index a683ddb93e0e91..de0504df0c206f 100644 --- a/m4/package_services.m4 +++ b/m4/package_services.m4 @@ -77,33 +77,33 @@ AC_DEFUN([AC_CONFIG_CROND_DIR], ]) # -# Figure out where to put udev rule files +# Figure out where to put udev files # -AC_DEFUN([AC_CONFIG_UDEV_RULE_DIR], +AC_DEFUN([AC_CONFIG_UDEV_DIR], [ AC_REQUIRE([PKG_PROG_PKG_CONFIG]) - AC_ARG_WITH([udev_rule_dir], - [AS_HELP_STRING([--with-udev-rule-dir@<:@=DIR@:>@], - [Install udev rules into DIR.])], + AC_ARG_WITH([udev_dir], + [AS_HELP_STRING([--with-udev-dir@<:@=DIR@:>@], + [Install udev files underneath DIR.])], [], - [with_udev_rule_dir=yes]) - AS_IF([test "x${with_udev_rule_dir}" != "xno"], + [with_udev_dir=yes]) + AS_IF([test "x${with_udev_dir}" != "xno"], [ - AS_IF([test "x${with_udev_rule_dir}" = "xyes"], + AS_IF([test "x${with_udev_dir}" = "xyes"], [ PKG_CHECK_MODULES([udev], [udev], [ - with_udev_rule_dir="$($PKG_CONFIG --variable=udev_dir udev)/rules.d" + with_udev_dir="$($PKG_CONFIG --variable=udev_dir udev)" ], [ - with_udev_rule_dir="" + with_udev_dir="" ]) m4_pattern_allow([^PKG_(MAJOR|MINOR|BUILD|REVISION)$]) ]) - AC_MSG_CHECKING([for udev rule dir]) - udev_rule_dir="${with_udev_rule_dir}" - AS_IF([test -n "${udev_rule_dir}"], + AC_MSG_CHECKING([for udev dir]) + udev_dir="${with_udev_dir}" + AS_IF([test -n "${udev_dir}"], [ - AC_MSG_RESULT(${udev_rule_dir}) + AC_MSG_RESULT(${udev_dir}) have_udev="yes" ], [ @@ -115,5 +115,5 @@ AC_DEFUN([AC_CONFIG_UDEV_RULE_DIR], have_udev="disabled" ]) AC_SUBST(have_udev) - AC_SUBST(udev_rule_dir) + AC_SUBST(udev_dir) ]) ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 18/26] xfs_healer: create a background monitoring service 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (16 preceding siblings ...) 2025-10-23 0:09 ` [PATCH 17/26] builddefs: refactor udev directory specification Darrick J. Wong @ 2025-10-23 0:10 ` Darrick J. Wong 2025-10-23 0:10 ` [PATCH 19/26] xfs_healer: don't start service if kernel support unavailable Darrick J. Wong ` (7 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:10 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a systemd service for our self-healing service and activate it automatically. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/Makefile | 34 ++++++++++++- healer/system-xfs_healer.slice | 31 ++++++++++++ healer/xfs_healer.py.in | 9 +++ healer/xfs_healer.rules | 7 +++ healer/xfs_healer@.service.in | 107 ++++++++++++++++++++++++++++++++++++++++ healer/xfs_healer_start | 17 ++++++ 6 files changed, 204 insertions(+), 1 deletion(-) create mode 100644 healer/system-xfs_healer.slice create mode 100644 healer/xfs_healer.rules create mode 100644 healer/xfs_healer@.service.in create mode 100755 healer/xfs_healer_start diff --git a/healer/Makefile b/healer/Makefile index 100e99cc9ef0a2..a30e0714309295 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -9,9 +9,24 @@ include $(builddefs) XFS_HEALER_PROG = xfs_healer.py INSTALL_HEALER = install-healer +ifeq ($(HAVE_SYSTEMD),yes) +INSTALL_HEALER += install-systemd +SYSTEMD_SERVICES=\ + system-xfs_healer.slice \ + xfs_healer@.service +OPTIONAL_TARGETS += $(SYSTEMD_SERVICES) + +ifeq ($(HAVE_UDEV),yes) + UDEV_RULES = xfs_healer.rules + XFS_HEALER_HELPER = xfs_healer_start + INSTALL_HEALER += install-udev + OPTIONAL_TARGETS += $(XFS_HEALER_HELPER) +endif +endif + LDIRT = $(XFS_HEALER_PROG) -default: $(XFS_HEALER_PROG) +default: $(XFS_HEALER_PROG) $(SYSTEMD_SERVICES) 
$(UDEV_RULES) $(XFS_HEALER_HELPER) $(XFS_HEALER_PROG): $(XFS_HEALER_PROG).in $(builddefs) $(TOPDIR)/libfrog/gettext.py @echo " [SED] $@" @@ -22,6 +37,11 @@ $(XFS_HEALER_PROG): $(XFS_HEALER_PROG).in $(builddefs) $(TOPDIR)/libfrog/gettext < $< > $@ $(Q)chmod a+x $@ +%.service: %.service.in $(builddefs) + @echo " [SED] $@" + $(Q)$(SED) -e "s|@pkg_libexec_dir@|$(PKG_LIBEXEC_DIR)|g" \ + < $< > $@ + include $(BUILDRULES) install: $(INSTALL_HEALER) @@ -30,6 +50,18 @@ install-healer: default $(INSTALL) -m 755 -d $(PKG_LIBEXEC_DIR) $(INSTALL) -m 755 $(XFS_HEALER_PROG) $(PKG_LIBEXEC_DIR)/xfs_healer +install-systemd: default + $(INSTALL) -m 755 -d $(SYSTEMD_SYSTEM_UNIT_DIR) + $(INSTALL) -m 644 $(SYSTEMD_SERVICES) $(SYSTEMD_SYSTEM_UNIT_DIR) + +install-udev: default + $(INSTALL) -m 755 -d $(UDEV_DIR) + $(INSTALL) -m 755 $(XFS_HEALER_HELPER) $(UDEV_DIR) + $(INSTALL) -m 755 -d $(UDEV_RULE_DIR) + for i in $(UDEV_RULES); do \ + $(INSTALL) -m 644 $$i $(UDEV_RULE_DIR)/64-$$i; \ + done + install-dev: -include .dep diff --git a/healer/system-xfs_healer.slice b/healer/system-xfs_healer.slice new file mode 100644 index 00000000000000..c58d6813549e50 --- /dev/null +++ b/healer/system-xfs_healer.slice @@ -0,0 +1,31 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (c) 2024-2025 Oracle. All Rights Reserved. +# Author: Darrick J. Wong <djwong@kernel.org> + +[Unit] +Description=xfs_healer background service slice +Before=slices.target + +[Slice] + +# If the CPU usage cgroup controller is available, don't use more than 2 cores +# for all background processes. One thread to read events, another to run +# repairs. +CPUQuota=200% +CPUAccounting=true + +[Install] +# As of systemd 249, the systemd cgroupv2 configuration code will drop resource +# controllers from the root and system.slice cgroups at startup if it doesn't +# find any direct dependencies that require a given controller. 
Newly +# activated units with resource control directives are created under the system +# slice but do not cause a reconfiguration of the slice's resource controllers. +# Hence we cannot put CPUQuota= into the xfs_healer service units directly. +# +# For the CPUQuota directive to have any effect, we must therefore create an +# explicit definition file for the slice that systemd creates to contain the +# xfs_healer instance units (e.g. xfs_healer@.service) and we must configure +# this slice as a dependency of the system slice to establish the direct +# dependency relation. +WantedBy=system.slice diff --git a/healer/xfs_healer.py.in b/healer/xfs_healer.py.in index 5ed2198a0c1687..e594ad4fc2c53e 100644 --- a/healer/xfs_healer.py.in +++ b/healer/xfs_healer.py.in @@ -22,6 +22,7 @@ from concurrent.futures import ProcessPoolExecutor import ctypes.util from enum import Enum import collections +import time try: # Not all systems will have this json schema validation libarary, @@ -1137,6 +1138,14 @@ def main(): pass args.event_queue.shutdown() + + # See the service mode comments in xfs_scrub.c for why we sleep and + # compress all nonzero exit codes to 1. + if 'SERVICE_MODE' in os.environ: + time.sleep(2) + if ret != 0: + ret = 1 + return ret if __name__ == '__main__': diff --git a/healer/xfs_healer.rules b/healer/xfs_healer.rules new file mode 100644 index 00000000000000..c9bbb4c9f28186 --- /dev/null +++ b/healer/xfs_healer.rules @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0-or-later +# +# Copyright (c) 2024-2025 Oracle. All rights reserved. +# Author: Darrick J. 
Wong <djwong@kernel.org> +# +# Start autonomous self healing automatically +ACTION=="add", SUBSYSTEM=="xfs", ENV{TYPE}=="mount", RUN+="xfs_healer_start" diff --git a/healer/xfs_healer@.service.in b/healer/xfs_healer@.service.in new file mode 100644 index 00000000000000..1f74fc000ce490 --- /dev/null +++ b/healer/xfs_healer@.service.in @@ -0,0 +1,107 @@ +# SPDX-License-Identifier: GPL-2.0-or-later +# +# Copyright (c) 2024-2025 Oracle. All Rights Reserved. +# Author: Darrick J. Wong <djwong@kernel.org> + +[Unit] +Description=Self Healing of XFS Metadata for %f + +# Explicitly require the capabilities that this program needs +ConditionCapability=CAP_SYS_ADMIN +ConditionCapability=CAP_DAC_OVERRIDE + +# Must be a mountpoint +ConditionPathIsMountPoint=%f +RequiresMountsFor=%f + +[Service] +Type=exec +Environment=SERVICE_MODE=1 +ExecStart=@pkg_libexec_dir@/xfs_healer --log %f +SyslogIdentifier=%N + +# Create the service underneath the healer background service slice so that we +# can control resource usage. +Slice=system-xfs_healer.slice + +# No realtime CPU scheduling +RestrictRealtime=true + +# xfs_healer avoids pinning mounted filesystems by recording the file handle +# for the provided mountpoint (%f) before opening the health monitor, after +# which it closes the fd for the mountpoint. If repairs are needed, it will +# reopen the mountpoint, resample the file handle, and proceed only if the +# handles match. If the filesystem is unmounted, the daemon exits. If the +# mountpoint moves, repairs will not be attempted against the wrong filesystem. +# +# Due to this resampling behavior, xfs_healer must see the same filesystem +# mount tree inside the service container as outside, with the same ro/rw +# state. BindPaths doesn't work on the paths that are made readonly by +# ProtectSystem and ProtectHome, so it is not possible to set either option. +# DynamicUser sets ProtectSystem, so that also cannot be used. 
We cannot use +# BindPaths to bind the desired mountpoint somewhere under /tmp like xfs_scrub +# does because that pins the mount. +# +# Regrettably, this leaves xfs_healer less hardened than xfs_scrub. +# Surprisingly, this doesn't affect xfs_healer's score dramatically. +DynamicUser=false +ProtectSystem=false +ProtectHome=no +PrivateTmp=true +PrivateDevices=true + +# Don't let healer complain about paths in /etc/projects that have been hidden +# by our sandboxing. healer doesn't care about project ids anyway. +InaccessiblePaths=-/etc/projects + +# No network access +PrivateNetwork=true +ProtectHostname=true +RestrictAddressFamilies=none +IPAddressDeny=any + +# Don't let the program mess with the kernel configuration at all +ProtectKernelLogs=true +ProtectKernelModules=true +ProtectKernelTunables=true +ProtectControlGroups=true +ProtectProc=invisible +RestrictNamespaces=true + +# Hide everything in /proc, even /proc/mounts +ProcSubset=pid + +# Only allow the default personality Linux +LockPersonality=true + +# No writable memory pages +MemoryDenyWriteExecute=true + +# Don't let our mounts leak out to the host +PrivateMounts=true + +# Restrict system calls to the native arch and only enough to get things going +SystemCallArchitectures=native +SystemCallFilter=@system-service +SystemCallFilter=~@privileged +SystemCallFilter=~@resources +SystemCallFilter=~@mount + +# xfs_healer needs these privileges to open the rootdir and monitor +CapabilityBoundingSet=CAP_SYS_ADMIN CAP_DAC_OVERRIDE +AmbientCapabilities=CAP_SYS_ADMIN CAP_DAC_OVERRIDE +NoNewPrivileges=true + +# xfs_healer doesn't create files +UMask=7777 + +# No access to hardware /dev files except for block devices +ProtectClock=true +DevicePolicy=closed + +[Install] +WantedBy=multi-user.target +# If someone tries to enable the template itself, translate that into enabling +# this service on the root directory at systemd startup time. 
In the +# initramfs, the udev rules in xfs_healer.rules run before systemd starts. +DefaultInstance=- diff --git a/healer/xfs_healer_start b/healer/xfs_healer_start new file mode 100755 index 00000000000000..6f2d23318828d6 --- /dev/null +++ b/healer/xfs_healer_start @@ -0,0 +1,17 @@ +#!/bin/sh + +# SPDX-License-Identifier: GPL-2.0-or-later +# +# Copyright (c) 2024-2025 Oracle. All Rights Reserved. +# Author: Darrick J. Wong <djwong@kernel.org> + +# Start the xfs_healer service when the filesystem is mounted + +command -v systemctl || exit 0 + +grep "^$SOURCE[[:space:]]" /proc/mounts | while read source mntpt therest; do + inst="$(systemd-escape --path "$mntpt")" + systemctl restart --no-block "xfs_healer@$inst" && break +done + +exit 0 ^ permalink raw reply related [flat|nested] 80+ messages in thread
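[Editorial note] The `xfs_healer_start` helper relies on `systemd-escape --path` to turn a mountpoint into the instance name for `xfs_healer@.service`. An approximate Python reimplementation of that escaping, based on systemd's documented unit-name rules — treat the corner cases (path simplification, non-ASCII) as assumptions; the real tool is authoritative:

```python
def systemd_escape_path(path: str) -> str:
    # "/" maps to "-"; otherwise strip surrounding slashes, turn the
    # remaining "/" separators into "-", and \xXX-escape anything that is
    # not ASCII-alphanumeric, "_", or a non-leading "." (literal "-" is
    # escaped too, so the mapping stays reversible).
    trimmed = path.strip("/")
    if not trimmed:
        return "-"
    out = []
    for i, ch in enumerate(trimmed):
        if ch == "/":
            out.append("-")
        elif ch.isalnum() or ch == "_" or (ch == "." and i != 0):
            out.append(ch)
        else:
            out.append("\\x%02x" % ord(ch))
    return "".join(out)

print(systemd_escape_path("/mnt/test"))   # mnt-test
```

So a mount of /mnt/test makes the udev rule kick off `systemctl restart --no-block xfs_healer@mnt-test.service`, and systemd hands `%f` back to the service as the original path.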
* [PATCH 19/26] xfs_healer: don't start service if kernel support unavailable 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (17 preceding siblings ...) 2025-10-23 0:10 ` [PATCH 18/26] xfs_healer: create a background monitoring service Darrick J. Wong @ 2025-10-23 0:10 ` Darrick J. Wong 2025-10-23 0:10 ` [PATCH 20/26] xfs_healer: use the autofsck fsproperty to select mode Darrick J. Wong ` (6 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:10 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Use ExecCondition= in the system service to check if kernel support for the health monitor is available. If not, we don't want to run the service, have it fail, and generate a bunch of silly log messages. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/xfs_healer.py.in | 30 +++++++++++++++++++++++++++++- healer/xfs_healer@.service.in | 1 + 2 files changed, 30 insertions(+), 1 deletion(-) diff --git a/healer/xfs_healer.py.in b/healer/xfs_healer.py.in index e594ad4fc2c53e..df4bd906d530fc 100644 --- a/healer/xfs_healer.py.in +++ b/healer/xfs_healer.py.in @@ -849,7 +849,27 @@ def handle_event(lines, fh): repair_metadata(event, fh) -def monitor(mountpoint, event_queue, **kwargs): +def check_monitor(mountpoint, fd): + '''Check if the kernel can send us health events for the given mountpoint.''' + + try: + mon_fd = open_health_monitor(fd, verbose = everything) + except OSError as e: + # Error opening monitor (or it's simply not there); monitor + # not available. + if e.errno == errno.ENOTTY or e.errno == errno.EOPNOTSUPP: + msg = _("XFS health monitoring not supported.") + eprintln(f"{mountpoint}: {msg}") + os.close(fd) + return 1 + + os.close(mon_fd) + os.close(fd) + + # Monitor available; success! 
+ return 0 + +def monitor(mountpoint, event_queue, check, **kwargs): '''Monitor the given mountpoint for health events.''' global everything global log @@ -904,6 +924,12 @@ def monitor(mountpoint, event_queue, **kwargs): msg = _("XFS online repair is less effective without parent pointers.") printlogln(f"{mountpoint}: {msg}") + # Now that we know that we can repair if the user wanted to, make sure + # that the kernel supports reporting events if that was as far as the + # user wanted us to go. + if check: + return check_monitor(mountpoint, fd) + try: fh = fshandle(fd, mountpoint) if want_repair or has_parent else None except Exception as e: @@ -1090,6 +1116,8 @@ def main(): help = argparse.SUPPRESS) parser.add_argument("--repair", action = "store_true", \ help = _("Always repair corrupt metadata")) + parser.add_argument("--check", action = "store_true", \ + help = _("Check that health monitoring is supported")) args = parser.parse_args() if args.V: diff --git a/healer/xfs_healer@.service.in b/healer/xfs_healer@.service.in index 1f74fc000ce490..5660050a1aa3e4 100644 --- a/healer/xfs_healer@.service.in +++ b/healer/xfs_healer@.service.in @@ -17,6 +17,7 @@ RequiresMountsFor=%f [Service] Type=exec Environment=SERVICE_MODE=1 +ExecCondition=@pkg_libexec_dir@/xfs_healer --check %f ExecStart=@pkg_libexec_dir@/xfs_healer --log %f SyslogIdentifier=%N ^ permalink raw reply related [flat|nested] 80+ messages in thread
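[Editorial note] `check_monitor()` returns 1 (not some larger code) because of how systemd interprets `ExecCondition=` results. Per systemd.service(5), the mapping works out to the following — a sketch of the documented behaviour, not code from the patch:

```python
def execcondition_outcome(exit_status: int) -> str:
    """Map an ExecCondition= exit status onto systemd's documented behaviour."""
    if exit_status == 0:
        return "start"    # condition passed; ExecStart= (the healer) runs
    if 1 <= exit_status <= 254:
        return "skip"     # condition failed; unit is skipped, not failed
    return "fail"         # 255 or abnormal exit marks the unit as failed

# Missing kernel support therefore skips the unit quietly instead of
# leaving a failed xfs_healer@ instance behind in `systemctl --failed`.
print(execcondition_outcome(1))   # skip
```

That is exactly the "don't run the service, have it fail, and generate a bunch of silly log messages" outcome the commit message wants to avoid.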
* [PATCH 20/26] xfs_healer: use the autofsck fsproperty to select mode 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (18 preceding siblings ...) 2025-10-23 0:10 ` [PATCH 19/26] xfs_healer: don't start service if kernel support unavailable Darrick J. Wong @ 2025-10-23 0:10 ` Darrick J. Wong 2025-10-23 0:11 ` [PATCH 21/26] xfs_healer: run full scrub after lost corruption events or targeted repair failure Darrick J. Wong ` (5 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:10 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make the xfs_healer background service query the autofsck filesystem property to figure out which operating mode it should use. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/xfs_healer.py.in | 75 +++++++++++++++++++++++++++++++++++++++-- healer/xfs_healer@.service.in | 2 + 2 files changed, 72 insertions(+), 5 deletions(-) diff --git a/healer/xfs_healer.py.in b/healer/xfs_healer.py.in index df4bd906d530fc..4c6ab2662f6f50 100644 --- a/healer/xfs_healer.py.in +++ b/healer/xfs_healer.py.in @@ -632,6 +632,21 @@ def fgetpath(fd, fh = None, mountpoint = None): break return ret +# Filesystem properties + +FSPROP_NAMESPACE = "trusted." +FSPROP_NAME_PREFIX = "xfs:" +FSPROP_AUTOFSCK_NAME = "autofsck" + +def fsprop_attrname(n): + '''Construct the xattr name for a filesystem property.''' + return f"{FSPROP_NAMESPACE}{FSPROP_NAME_PREFIX}{n}" + +def fsprop_getstr(fd, n): + '''Return the value of a filesystem property as a string.''' + attrname = fsprop_attrname(n) + return os.getxattr(fd, attrname).decode('utf-8') + # main program def health_reports(mon_fp): @@ -869,6 +884,31 @@ def check_monitor(mountpoint, fd): # Monitor available; success! 
return 0 +def want_repair_from_autofsck(fd): + '''Determine want_repair from the autofsck filesystem property.''' + global has_parent + global has_rmapbt + + try: + advice = fsprop_getstr(fd, FSPROP_AUTOFSCK_NAME) + if advice == "repair": + return True + if advice == "check" or advice == "optimize": + return False + if advice == "none": + return None + except: + # Any OS error (including ENODATA) or string parsing error is + # treated the same as an unrecognized value. + pass + + # For an unrecognized value, log but do not fix runtime corruption if + # backref metadata are enabled. If no backref metadata are available, + # the fs is too old so don't run at all. + if has_rmapbt or has_parent: + return False + return None + def monitor(mountpoint, event_queue, check, **kwargs): '''Monitor the given mountpoint for health events.''' global everything @@ -877,6 +917,7 @@ def monitor(mountpoint, event_queue, check, **kwargs): global want_repair global has_parent global has_rmapbt + use_autofsck = want_repair is None def event_loop(mon_fd, event_queue, fh): # Ownership of mon_fd (and hence responsibility for closing it) @@ -908,12 +949,33 @@ def monitor(mountpoint, event_queue, check, **kwargs): msg = _("detecting fs features") eprintln(_(f'{printf_prefix}: {msg}: {e}')) + # Does the sysadmin have any advice for us about whether or not to + # background scrub? + if use_autofsck: + want_repair = want_repair_from_autofsck(fd) + if want_repair is None: + msg = _("Disabling daemon per autofsck directive.") + printlogln(f"{mountpoint}: {msg}") + os.close(fd) + return 0 + elif want_repair: + msg = _("Automatically repairing per autofsck directive.") + printlogln(f"{mountpoint}: {msg}") + else: + msg = _("Only logging errors per autofsck directive.") + printlogln(f"{mountpoint}: {msg}") + # Check that the kernel supports repairs at all. 
if want_repair and not xfs_repair_is_supported(fd): - msg = _("XFS online repair is not supported, exiting") + if not use_autofsck: + msg = _("XFS online repair is not supported, exiting") + printlogln(f"{mountpoint}: {msg}") + os.close(fd) + return 1 + + msg = _("XFS online repair is not supported, will report only") printlogln(f"{mountpoint}: {msg}") - os.close(fd) - return 1 + want_repair = False # Check for the backref metadata that makes repair effective. if want_repair: @@ -1114,8 +1176,11 @@ def main(): parser.add_argument('--event-schema', type = str, \ default = '@pkg_data_dir@/xfs_healthmon.schema.json', \ help = argparse.SUPPRESS) - parser.add_argument("--repair", action = "store_true", \ + action_group = parser.add_mutually_exclusive_group() + action_group.add_argument("--repair", action = "store_true", \ help = _("Always repair corrupt metadata")) + action_group.add_argument("--autofsck", action = "store_true", \ + help = _("Use the \"autofsck\" fs property to decide to repair")) parser.add_argument("--check", action = "store_true", \ help = _("Check that health monitoring is supported")) args = parser.parse_args() @@ -1148,6 +1213,8 @@ def main(): everything = True if args.debug_fast: debug_fast = True + if args.autofsck: + want_repair = None if args.repair: want_repair = True diff --git a/healer/xfs_healer@.service.in b/healer/xfs_healer@.service.in index 5660050a1aa3e4..e12135b3c808c5 100644 --- a/healer/xfs_healer@.service.in +++ b/healer/xfs_healer@.service.in @@ -18,7 +18,7 @@ RequiresMountsFor=%f Type=exec Environment=SERVICE_MODE=1 ExecCondition=@pkg_libexec_dir@/xfs_healer --check %f -ExecStart=@pkg_libexec_dir@/xfs_healer --log %f +ExecStart=@pkg_libexec_dir@/xfs_healer --autofsck --log %f SyslogIdentifier=%N # Create the service underneath the healer background service slice so that we ^ permalink raw reply related [flat|nested] 80+ messages in thread
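The decision table implemented by patch 20 above can be summarized in a small standalone sketch. The property names and the fallback rules come from the patch; `decide_want_repair` is a hypothetical helper written only for illustration, not a function in xfs_healer:

```python
# Illustrative sketch of the autofsck property handling in xfs_healer.
# decide_want_repair is a made-up name; the real daemon reads the xattr
# from an open fd and folds errors into the "unrecognized value" path.

FSPROP_NAMESPACE = "trusted."
FSPROP_NAME_PREFIX = "xfs:"

def fsprop_attrname(name):
    """Construct the xattr name that stores a filesystem property."""
    return f"{FSPROP_NAMESPACE}{FSPROP_NAME_PREFIX}{name}"

def decide_want_repair(advice, has_rmapbt=False, has_parent=False):
    """Map the autofsck property value to the daemon's want_repair state.

    Returns True (repair), False (report only), or None (disable the
    daemon).  Unrecognized values, like read errors in the real code,
    fall back to report-only mode when backref metadata (rmapbt or
    parent pointers) is available, and disable the daemon otherwise.
    """
    if advice == "repair":
        return True
    if advice in ("check", "optimize"):
        return False
    if advice == "none":
        return None
    return False if (has_rmapbt or has_parent) else None
```

Note how "none" disables the daemon unconditionally, while an unrecognized value only disables it on filesystems too old to carry backref metadata.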
* [PATCH 21/26] xfs_healer: run full scrub after lost corruption events or targeted repair failure 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (19 preceding siblings ...) 2025-10-23 0:10 ` [PATCH 20/26] xfs_healer: use the autofsck fsproperty to select mode Darrick J. Wong @ 2025-10-23 0:11 ` Darrick J. Wong 2025-10-23 0:11 ` [PATCH 22/26] xfs_healer: use getmntent to find moved filesystems Darrick J. Wong ` (4 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:11 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> If we fail to perform a spot repair of metadata or the kernel tells us that it lost corruption events due to queue limits, initiate a full run of the online fsck service to try to fix the error. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/Makefile | 1 + healer/xfs_healer.py.in | 59 +++++++++++++++++++++++++++++++++++++++++------ include/builddefs.in | 1 + scrub/Makefile | 7 ++---- 4 files changed, 57 insertions(+), 11 deletions(-) diff --git a/healer/Makefile b/healer/Makefile index a30e0714309295..798c6f2c8a58e0 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -34,6 +34,7 @@ $(XFS_HEALER_PROG): $(XFS_HEALER_PROG).in $(builddefs) $(TOPDIR)/libfrog/gettext -e '/@INIT_GETTEXT@/r $(TOPDIR)/libfrog/gettext.py' \ -e '/@INIT_GETTEXT@/d' \ -e "s|@pkg_data_dir@|$(PKG_DATA_DIR)|g" \ + -e "s|@scrub_svcname@|$(XFS_SCRUB_SVCNAME)|g" \ < $< > $@ $(Q)chmod a+x $@ diff --git a/healer/xfs_healer.py.in b/healer/xfs_healer.py.in index 4c6ab2662f6f50..a96c9e812f5791 100644 --- a/healer/xfs_healer.py.in +++ b/healer/xfs_healer.py.in @@ -23,6 +23,7 @@ import ctypes.util from enum import Enum import collections import time +import subprocess try: # Not all systems will have this json schema validation libarary, @@ -262,6 +263,17 @@ class fshandle(object): self.handle.ha_fid.fid_ino = ino 
self.handle.ha_fid.fid_gen = gen + def instance_unit_name(self, service_template): + '''Compute the systemd instance unit name for this mountpoint.''' + + cmd = ['systemd-escape', '--template', service_template, + '--path', self.mountpoint] + + proc = subprocess.Popen(cmd, stdout = subprocess.PIPE) + proc.wait() + for line in proc.stdout: + return line.decode(sys.stdout.encoding).strip() + def libhandle_load(): '''Load libhandle and set things up.''' global libhandle @@ -776,7 +788,7 @@ def report_shutdown(event): msg = _("filesystem shut down due to") printlogln(f"{printf_prefix}: {msg} {some_reasons}") -def handle_event(lines, fh): +def handle_event(lines, fh, everything): '''Handle an event asynchronously.''' global log global has_parent @@ -832,6 +844,8 @@ def handle_event(lines, fh): # messages. if event['type'] == 'lost': report_lost(event) + if want_repair and not everything: + run_full_repair(fh) return if event['type'] == 'running': @@ -919,13 +933,14 @@ def monitor(mountpoint, event_queue, check, **kwargs): global has_rmapbt use_autofsck = want_repair is None - def event_loop(mon_fd, event_queue, fh): + def event_loop(mon_fd, event_queue, fh, everything): # Ownership of mon_fd (and hence responsibility for closing it) # is transferred to the mon_fp object. with os.fdopen(mon_fd) as mon_fp: nr = 0 for lines in health_reports(mon_fp): - event_queue.submit(handle_event, lines, fh) + event_queue.submit(handle_event, lines, fh, + everything) # Periodically run the garbage collector to # constrain memory usage in the main thread. 
@@ -1018,7 +1033,7 @@ def monitor(mountpoint, event_queue, check, **kwargs): try: # mon_fd is consumed by this function - event_loop(mon_fd, event_queue, fh) + event_loop(mon_fd, event_queue, fh, everything) except Exception as e: eprintln(f"{mountpoint}: {e}") return 1 @@ -1081,6 +1096,8 @@ def repair_wholefs(event, fd): outcome = RepairOutcome.from_oflags(oflags) report = outcome.report() printlogln(f"{printf_prefix}: {struct}: {report}") + if outcome == RepairOutcome.Failed: + return outcome except Exception as e: eprintln(f"{printf_prefix}: {struct}: {e}") @@ -1097,6 +1114,8 @@ def repair_group(event, fd, group_type): outcome = RepairOutcome.from_oflags(oflags) report = outcome.report() printlogln(f"{printf_prefix}: {struct}: {report}") + if outcome == RepairOutcome.Failed: + return outcome except Exception as e: eprintln(f"{printf_prefix}: {struct}: {e}") @@ -1115,6 +1134,8 @@ def repair_inode(event, fd): outcome = RepairOutcome.from_oflags(oflags) report = outcome.report() printlogln(f"{prefix} {structure}: {report}") + if outcome == RepairOutcome.Failed: + return outcome except Exception as e: eprintln(f"{prefix} {structure}: {e}") @@ -1134,20 +1155,44 @@ def repair_metadata(event, fh): try: if event['domain'] in ['fs', 'realtime']: - repair_wholefs(event, fd) + outcome = repair_wholefs(event, fd) elif event['domain'] in ['perag', 'rtgroup']: - repair_group(event, fd, event['domain']) + outcome = repair_group(event, fd, event['domain']) elif event['domain'] == 'inode': - repair_inode(event, fd) + outcome = repair_inode(event, fd) else: domain = event['domain'] msg = _("Unknown metadata domain") raise Exception(f"{msg} \"{domain}\".") + + # Transform into a full repair if we failed to fix this item. 
+ if outcome == RepairOutcome.Failed: + run_full_repair(fh) except Exception as e: eprintln(f"{printf_prefix}: {e}") finally: os.close(fd) +def run_full_repair(fh): + '''Run a full repair of the filesystem using the background fsck.''' + global printf_prefix + + try: + unit_name = fh.instance_unit_name("@scrub_svcname@") + cmd = ['systemctl', 'start', '--no-block', unit_name] + + proc = subprocess.Popen(cmd) + proc.wait() + if proc.returncode == 0: + msg = _("Full repair: Repairs in progress.") + printlogln(f"{printf_prefix}: {msg}") + else: + msg = _("Could not start xfs_scrub service.") + eprintln(f"{printf_prefix}: {msg}"); + except Exception as e: + eprintln(f"{printf_prefix}: {e}") + + def main(): global debug global log diff --git a/include/builddefs.in b/include/builddefs.in index ddcc784361f0b9..7cf6e0782788ca 100644 --- a/include/builddefs.in +++ b/include/builddefs.in @@ -62,6 +62,7 @@ MKFS_CFG_DIR = @datadir@/@pkg_name@/mkfs PKG_STATE_DIR = @localstatedir@/lib/@pkg_name@ XFS_SCRUB_ALL_AUTO_MEDIA_SCAN_STAMP=$(PKG_STATE_DIR)/xfs_scrub_all_media.stamp +XFS_SCRUB_SVCNAME=xfs_scrub@.service CC = @cc@ BUILD_CC = @BUILD_CC@ diff --git a/scrub/Makefile b/scrub/Makefile index 6375d77a291bcb..18f476de24252b 100644 --- a/scrub/Makefile +++ b/scrub/Makefile @@ -8,7 +8,6 @@ include $(builddefs) SCRUB_PREREQS=$(HAVE_GETFSMAP) -scrub_svcname=xfs_scrub@.service scrub_media_svcname=xfs_scrub_media@.service ifeq ($(SCRUB_PREREQS),yes) @@ -21,7 +20,7 @@ XFS_SCRUB_SERVICE_ARGS = -b -o autofsck ifeq ($(HAVE_SYSTEMD),yes) INSTALL_SCRUB += install-systemd SYSTEMD_SERVICES=\ - $(scrub_svcname) \ + $(XFS_SCRUB_SVCNAME) \ xfs_scrub_fail@.service \ $(scrub_media_svcname) \ xfs_scrub_media_fail@.service \ @@ -123,7 +122,7 @@ xfs_scrub_all.timer: xfs_scrub_all.timer.in $(builddefs) $(XFS_SCRUB_ALL_PROG): $(XFS_SCRUB_ALL_PROG).in $(builddefs) $(TOPDIR)/libfrog/gettext.py @echo " [SED] $@" $(Q)$(SED) -e "s|@sbindir@|$(PKG_SBIN_DIR)|g" \ - -e "s|@scrub_svcname@|$(scrub_svcname)|g" \ + 
-e "s|@scrub_svcname@|$(XFS_SCRUB_SVCNAME)|g" \ -e "s|@scrub_media_svcname@|$(scrub_media_svcname)|g" \ -e "s|@pkg_version@|$(PKG_VERSION)|g" \ -e "s|@stampfile@|$(XFS_SCRUB_ALL_AUTO_MEDIA_SCAN_STAMP)|g" \ @@ -137,7 +136,7 @@ $(XFS_SCRUB_ALL_PROG): $(XFS_SCRUB_ALL_PROG).in $(builddefs) $(TOPDIR)/libfrog/g xfs_scrub_fail: xfs_scrub_fail.in $(builddefs) @echo " [SED] $@" $(Q)$(SED) -e "s|@sbindir@|$(PKG_SBIN_DIR)|g" \ - -e "s|@scrub_svcname@|$(scrub_svcname)|g" \ + -e "s|@scrub_svcname@|$(XFS_SCRUB_SVCNAME)|g" \ -e "s|@pkg_version@|$(PKG_VERSION)|g" < $< > $@ $(Q)chmod a+x $@ ^ permalink raw reply related [flat|nested] 80+ messages in thread
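Patch 21 above shells out to `systemd-escape --template --path` to compute the instance unit name for the mount point. A rough pure-Python approximation of that transformation can make the naming scheme concrete; this is a simplified sketch (plain ASCII paths, no special-casing of a leading dot), not what xfs_healer does — the daemon correctly delegates to the real systemd-escape binary:

```python
# Approximate `systemd-escape --path` and template instantiation.
# Simplified: handles "/" -> "-", strips redundant slashes, and
# \xXX-escapes bytes outside systemd's safe set.  The real tool also
# escapes a leading "." and handles non-UTF-8 paths.

SAFE = set("abcdefghijklmnopqrstuvwxyz"
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           "0123456789:_.")

def systemd_escape_path(path):
    """Escape a mount point the way systemd names path-based units."""
    trimmed = "/".join(p for p in path.split("/") if p)
    if not trimmed:
        return "-"   # the root directory escapes to "-"
    return "".join(
        "-" if ch == "/" else (ch if ch in SAFE else "\\x%02x" % ord(ch))
        for ch in trimmed
    )

def instance_unit_name(template, mountpoint):
    """Fill a systemd template unit ('name@.service') with an escaped path."""
    prefix, suffix = template.split("@", 1)
    return f"{prefix}@{systemd_escape_path(mountpoint)}{suffix}"
```

So a lost-event or failed spot repair on /mnt/data would start the unit xfs_scrub@mnt-data.service.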
* [PATCH 22/26] xfs_healer: use getmntent to find moved filesystems 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (20 preceding siblings ...) 2025-10-23 0:11 ` [PATCH 21/26] xfs_healer: run full scrub after lost corruption events or targeted repair failure Darrick J. Wong @ 2025-10-23 0:11 ` Darrick J. Wong 2025-10-23 0:11 ` [PATCH 23/26] xfs_healer: validate that repair fds point to the monitored fs Darrick J. Wong ` (3 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:11 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Wrap the libc getmntent function in an iterator. This enables xfs_healer to record the fsname (or fs spec) of the mountpoint that it's running against, and use that fsname to walk /proc/mounts to re-find the filesystem when it needs to open the fs to perform repairs, in case the mount has moved elsewhere. Signed-off-by: "Darrick J.
Wong" <djwong@kernel.org> --- healer/xfs_healer.py.in | 117 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 114 insertions(+), 3 deletions(-) diff --git a/healer/xfs_healer.py.in b/healer/xfs_healer.py.in index a96c9e812f5791..fac7df9d741cb0 100644 --- a/healer/xfs_healer.py.in +++ b/healer/xfs_healer.py.in @@ -80,6 +80,7 @@ want_repair = False has_parent = False has_rmapbt = False libhandle = None +libc = None def printlogln(*args, **kwargs): '''Print a log message to stdout and flush it.''' @@ -184,6 +185,16 @@ class xfs_handle(ctypes.Structure): ] assert ctypes.sizeof(xfs_handle) == 24 +def find_xfs_dev(mountpoint): + '''Find the xfs device for a particular mount point or raise exception.''' + + for mnt in some_mount_entries( + lambda mnt: mnt.type == 'xfs' and \ + mnt.dir == mountpoint): + return mnt.fsname + + raise Exception(_('Cannot find xfs device')) + class fshandle(object): def __init__(self, fd, mountpoint = None): global libhandle @@ -193,6 +204,7 @@ class fshandle(object): if isinstance(fd, fshandle): # copy an existing fshandle + self.fsname = fd.fsname self.mountpoint = fd.mountpoint ctypes.pointer(self.handle)[0] = fd.handle return @@ -201,6 +213,7 @@ class fshandle(object): raise Exception(_('fshandle needs a mountpoint')) self.mountpoint = mountpoint + self.fsname = find_xfs_dev(mountpoint) # Create the file and fs handles for the open mountpoint # so that we can compare them later @@ -222,15 +235,16 @@ class fshandle(object): hanp = ctypes.cast(buf, ctypes.POINTER(xfs_handle)) self.handle = hanp.contents - def reopen(self): - '''Reopen a file handle obtained via weak reference.''' + def reopen_from(self, mountpoint): + '''Reopen a file handle obtained via weak reference, using + a specific mount point.''' global libhandle global printf_prefix buf = ctypes.c_void_p() buflen = ctypes.c_size_t() - fd = os.open(self.mountpoint, os.O_RDONLY) + fd = os.open(mountpoint, os.O_RDONLY) # Create the file and fs handles for the open 
mountpoint # so that we can compare them later @@ -258,6 +272,26 @@ class fshandle(object): libhandle.free_handle(buf, buflen) return fd + def reopen(self): + '''Reopen a file handle obtained via weak reference.''' + + # First try the original mountpoint + try: + return self.reopen_from(self.mountpoint) + except Exception as e: + # Now scan /proc/self/mounts for any other bind mounts + # of this filesystem + for mnt in some_mount_entries( + lambda mnt: mnt.type == 'xfs' and \ + mnt.fsname == self.fsname): + try: + return self.reopen_from(mnt.dir) + except: + pass + + # Return original error + raise e + def subst(self, ino, gen): '''Substitute the inode and generation components of a handle.''' self.handle.ha_fid.fid_ino = ino @@ -302,6 +336,77 @@ def libhandle_load(): ctypes.c_void_p, ctypes.c_size_t) +class libc_mntent(ctypes.Structure): + _fields_ = [ + ("mnt_fsname", ctypes.c_char_p), + ("mnt_dir", ctypes.c_char_p), + ("mnt_type", ctypes.c_char_p), + ("mnt_opts", ctypes.c_char_p), + ("mnt_freq", ctypes.c_int), + ("mnt_passno", ctypes.c_int), + ] + +class MountEntry(object): + '''Description of a mounted filesystem.''' + def __init__(self, fsname, dir, type, opts): + self.fsname = fsname + self.dir = pathlib.Path(dir.decode('utf-8')) + self.type = type.decode('utf-8') + self.opts = opts + +def mount_entries(): + '''Iterate all mounted filesystems in the system.''' + try: + fp = libc.setmntent(b"/proc/self/mounts", b"r") + mntbuf = libc_mntent() + namebuflen = 4096 + namebuf = ctypes.create_string_buffer(namebuflen) + mnt = libc.getmntent_r(fp, mntbuf, namebuf, namebuflen); + while mnt: + yield MountEntry(mnt.contents.mnt_fsname, + mnt.contents.mnt_dir, + mnt.contents.mnt_type, + mnt.contents.mnt_opts) + mnt = libc.getmntent_r(fp, mntbuf, namebuf, namebuflen); + finally: + libc.endmntent(fp); + +def some_mount_entries(filter_fn): + '''Iterate some of the mounted filesystems in the system.''' + return filter(filter_fn, mount_entries()) + +def libc_load(): + '''Load 
libc and set things up.''' + global libc + + soname = ctypes.util.find_library('c') + if soname is None: + errstr = os.strerror(errno.ENOENT) + msg = _("while finding library") + raise OSError(errno.ENOENT, f'{msg}: {errstr}', 'libc') + + libc = ctypes.CDLL(soname, use_errno = True) + + libc.setmntent.argtypes = ( + ctypes.c_char_p, + ctypes.c_char_p) + libc.setmntent.restype = ctypes.c_void_p + + libc.getmntent_r.argtypes = ( + ctypes.c_void_p, + ctypes.POINTER(libc_mntent), + ctypes.POINTER(ctypes.c_char), + ctypes.c_int) + libc.getmntent_r.restype = ctypes.POINTER(libc_mntent) + + libc.getmntent.argtypes = ( + ctypes.c_void_p,) + libc.getmntent.restype = ctypes.POINTER(libc_mntent) + + libc.endmntent.argtypes = ( + ctypes.c_void_p,) + libc.endmntent.restype = ctypes.c_int + # metadata scrubbing stuff XFS_SCRUB_TYPE_PROBE = 0 XFS_SCRUB_TYPE_SB = 1 @@ -1244,6 +1349,12 @@ def main(): if not validator_fn: return 1 + try: + libc_load() + except OSError as e: + eprintln(f"libc: {e}") + return 1 + try: libhandle_load() except OSError as e: ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 23/26] xfs_healer: validate that repair fds point to the monitored fs 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (21 preceding siblings ...) 2025-10-23 0:11 ` [PATCH 22/26] xfs_healer: use getmntent to find moved filesystems Darrick J. Wong @ 2025-10-23 0:11 ` Darrick J. Wong 2025-10-23 0:11 ` [PATCH 24/26] xfs_healer: add a manual page Darrick J. Wong ` (2 subsequent siblings) 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:11 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When xfs_healer reopens a mountpoint to perform a repair, it should validate that the opened fd points to a file on the same filesystem as the one being monitored. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/xfs_healer.py.in | 53 ++++++++++++++++++++++++++++++++++++----------- 1 file changed, 41 insertions(+), 12 deletions(-) diff --git a/healer/xfs_healer.py.in b/healer/xfs_healer.py.in index fac7df9d741cb0..ea7f1bb5ab45bc 100644 --- a/healer/xfs_healer.py.in +++ b/healer/xfs_healer.py.in @@ -163,6 +163,28 @@ def open_health_monitor(fd, verbose = False): ret = fcntl.ioctl(fd, XFS_IOC_HEALTH_MONITOR, arg) return ret +class xfs_health_samefs(ctypes.Structure): + _fields_ = [ + ('fd', ctypes.c_int), + ('flags', ctypes.c_uint), + ] +assert ctypes.sizeof(xfs_health_samefs) == 8 + +XFS_IOC_HEALTH_SAMEFS = _IOW(0x58, 69, xfs_health_samefs) + +def is_same_fs(mon_fd, fd): + '''Does fd point to the same filesystem as the monitor fd?''' + arg = xfs_health_samefs() + arg.fd = fd + arg.flags = 0 + + try: + fcntl.ioctl(mon_fd, XFS_IOC_HEALTH_SAMEFS, arg) + except OSError as e: + # any error means this isn't the right fs mount + return False + return True + # libhandle stuff class xfs_fsid(ctypes.Structure): _fields_ = [ @@ -235,7 +257,7 @@ class fshandle(object): hanp = ctypes.cast(buf, ctypes.POINTER(xfs_handle)) self.handle = 
hanp.contents - def reopen_from(self, mountpoint): + def reopen_from(self, mountpoint, is_acceptable): '''Reopen a file handle obtained via weak reference, using a specific mount point.''' global libhandle @@ -261,7 +283,8 @@ class fshandle(object): # Did we get the same handle? if buflen.value != ctypes.sizeof(xfs_handle) or \ - bytes(hanp.contents) != bytes(self.handle): + bytes(hanp.contents) != bytes(self.handle) or \ + not is_acceptable(fd): os.close(fd) libhandle.free_handle(buf, buflen) msg = _("reopening") @@ -270,14 +293,15 @@ class fshandle(object): printf_prefix) libhandle.free_handle(buf, buflen) + return fd - def reopen(self): + def reopen(self, is_acceptable = lambda x: True): '''Reopen a file handle obtained via weak reference.''' # First try the original mountpoint try: - return self.reopen_from(self.mountpoint) + return self.reopen_from(self.mountpoint, is_acceptable) except Exception as e: # Now scan /proc/self/mounts for any other bind mounts # of this filesystem @@ -285,7 +309,8 @@ class fshandle(object): lambda mnt: mnt.type == 'xfs' and \ mnt.fsname == self.fsname): try: - return self.reopen_from(mnt.dir) + return self.reopen_from(mnt.dir, + is_acceptable) except: pass @@ -893,7 +918,7 @@ def report_shutdown(event): msg = _("filesystem shut down due to") printlogln(f"{printf_prefix}: {msg} {some_reasons}") -def handle_event(lines, fh, everything): +def handle_event(lines, fh, everything, mon_fd): '''Handle an event asynchronously.''' global log global has_parent @@ -981,7 +1006,7 @@ def handle_event(lines, fh, everything): if maybe_pathify: pathify_event(event, fh) - repair_metadata(event, fh) + repair_metadata(event, fh, mon_fd) def check_monitor(mountpoint, fd): '''Check if the kernel can send us health events for the given mountpoint.''' @@ -1045,7 +1070,7 @@ def monitor(mountpoint, event_queue, check, **kwargs): nr = 0 for lines in health_reports(mon_fp): event_queue.submit(handle_event, lines, fh, - everything) + everything, mon_fd) # 
Periodically run the garbage collector to # constrain memory usage in the main thread. @@ -1055,6 +1080,12 @@ def monitor(mountpoint, event_queue, check, **kwargs): gc.collect() nr += 1 + # Once we run out of events to process, shut down all + # the workers and wait for them to complete before we + # close mon_fp so that repair reopen can't walk off + # freed mon_fd. + event_queue.shutdown(wait = True, cancel_futures = True) + try: fd = os.open(mountpoint, os.O_RDONLY) except Exception as e: @@ -1244,7 +1275,7 @@ def repair_inode(event, fd): except Exception as e: eprintln(f"{prefix} {structure}: {e}") -def repair_metadata(event, fh): +def repair_metadata(event, fh, mon_fd): '''Repair a metadata corruption.''' global debug global printf_prefix @@ -1253,7 +1284,7 @@ def repair_metadata(event, fh): printlogln(f'repair {event}') try: - fd = fh.reopen() + fd = fh.reopen(lambda fdx: is_same_fs(mon_fd, fdx)) except Exception as e: eprintln(f"{printf_prefix}: {e}") return @@ -1388,8 +1419,6 @@ def main(): # Consider SIGINT to be a clean exit. pass - args.event_queue.shutdown() - # See the service mode comments in xfs_scrub.c for why we sleep and # compress all nonzero exit codes to 1. if 'SERVICE_MODE' in os.environ: ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 24/26] xfs_healer: add a manual page 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (22 preceding siblings ...) 2025-10-23 0:11 ` [PATCH 23/26] xfs_healer: validate that repair fds point to the monitored fs Darrick J. Wong @ 2025-10-23 0:11 ` Darrick J. Wong 2025-10-23 0:12 ` [PATCH 25/26] xfs_scrub: report media scrub failures to the kernel Darrick J. Wong 2025-10-23 0:12 ` [PATCH 26/26] debian: enable xfs_healer on the root filesystem by default Darrick J. Wong 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:11 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a new section 8 manpage for this service daemon so others can read about what this program is supposed to do. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- man/man8/xfs_healer.8 | 85 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 85 insertions(+) create mode 100644 man/man8/xfs_healer.8 diff --git a/man/man8/xfs_healer.8 b/man/man8/xfs_healer.8 new file mode 100644 index 00000000000000..0e6d8e7cd0e658 --- /dev/null +++ b/man/man8/xfs_healer.8 @@ -0,0 +1,85 @@ +.TH xfs_healer 8 +.SH NAME +xfs_healer \- automatically heal damage to XFS filesystem metadata +.SH SYNOPSIS +.B xfs_healer +[ +.B OPTIONS +] +.I mount-point +.br +.B xfs_healer \-V +.SH DESCRIPTION +.B xfs_healer +is a daemon that tries to automatically repair damaged XFS filesystem metadata. +.PP +.B WARNING! +This program is +.BR EXPERIMENTAL "," +which means that its behavior and interface +could change at any time! +.PP +.B xfs_healer +asks the kernel to report all observations of corrupt metadata, media errors, +filesystem shutdowns, and file I/O errors. +The program can respond to runtime metadata corruption errors by initiating +targeted repairs of the suspect metadata or a full online fsck of the +filesystem. + +Normally this program runs as a systemd service. 
+The service is activated through udev whenever a filesystem is mounted. +Only systemd is supported. + +The kernel may not support repairing or optimizing the filesystem. +If this is the case, the filesystem must be unmounted and +.BR xfs_repair (8) +run on the filesystem to fix the problems. +.SH OPTIONS +.TP +.B \-\-autofsck +Use the +.I autofsck +filesystem property to decide whether or not to repair corrupt metadata. +See the +.B \-\-repair +option for more details. +If this option is specified but the kernel does not support repairs, the +program will report but not act upon corruptions. +.TP +.B \-\-check +Check if the filesystem supports sending health events. +Exits with 0 if it does, and non-zero if not. +.TP +.BI \-\-everything +Ask the kernel to send us good metadata health events, not only events related +to metadata corruption, media errors, shutdowns, and I/O errors. +.TP +.B \-\-log +Print every event to standard output. +.TP +.B \-\-repair +Always try to repair each piece of corrupt metadata when the kernel tells us +about it. +If an individual repair fails or the kernel tells us that health events were +lost, the +.I xfs_scrub +service for this mount point will be launched. +The default is not to try to repair anything. +If this option is specified but the kernel does not support repairs, the +program will exit. +.TP +.BI \-V +Prints the version number and exit. +.SH CAVEATS +.B xfs_healer +is an immature utility! +Do not run this program unless you have backups of your data! +This program takes advantage of in-kernel scrubbing to verify a given +data structure with locks held and can keep the filesystem busy for a +long time. +The kernel must be new enough to support the SCRUB_METADATA ioctl. +.PP +If errors are found and cannot be repaired, the filesystem must be +unmounted and repaired. +.SH SEE ALSO +.BR xfs_repair (8). ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 25/26] xfs_scrub: report media scrub failures to the kernel 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (23 preceding siblings ...) 2025-10-23 0:11 ` [PATCH 24/26] xfs_healer: add a manual page Darrick J. Wong @ 2025-10-23 0:12 ` Darrick J. Wong 2025-10-23 0:12 ` [PATCH 26/26] debian: enable xfs_healer on the root filesystem by default Darrick J. Wong 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:12 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> If the media scan finds that media have been lost, report this to the kernel so that the healthmon code can pass that along to xfs_healer. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- scrub/phase6.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/scrub/phase6.c b/scrub/phase6.c index abf6f9713f1a4d..345a461af0d9e2 100644 --- a/scrub/phase6.c +++ b/scrub/phase6.c @@ -673,6 +673,29 @@ clean_pool( return ret; } +static void +report_ioerr_to_kernel( + struct scrub_ctx *ctx, + struct disk *disk, + uint64_t start, + uint64_t length) +{ + struct xfs_media_error me = { + .daddr = start, + .bbcount = length, + }; + dev_t dev = disk_to_dev(ctx, disk); + + if (dev == ctx->fsinfo.fs_datadev) + me.flags |= XFS_MEDIA_ERROR_DATADEV; + else if (dev == ctx->fsinfo.fs_rtdev) + me.flags |= XFS_MEDIA_ERROR_RTDEV; + else if (dev == ctx->fsinfo.fs_logdev) + me.flags |= XFS_MEDIA_ERROR_LOGDEV; + + ioctl(ctx->mnt.fd, XFS_IOC_MEDIA_ERROR, &me); +} + /* Remember a media error for later. */ static void remember_ioerr( @@ -697,6 +720,8 @@ remember_ioerr( return; } + report_ioerr_to_kernel(ctx, disk, start, length); + tree = bitmap_for_disk(ctx, disk, vs); if (!tree) { str_liberror(ctx, ENOENT, _("finding bad block bitmap")); ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 26/26] debian: enable xfs_healer on the root filesystem by default 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong ` (24 preceding siblings ...) 2025-10-23 0:12 ` [PATCH 25/26] xfs_scrub: report media scrub failures to the kernel Darrick J. Wong @ 2025-10-23 0:12 ` Darrick J. Wong 25 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:12 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Now that we're finished building autonomous repair, enable the service on the root filesystem by default. The root filesystem is mounted by the initrd prior to starting systemd, which is why the udev rule cannot autostart the service for the root filesystem. dh_installsystemd won't activate a template service (aka one with an at-sign in the name) even if it provides a DefaultInstance directive to make that possible. Use a fugly shim for this. Signed-off-by: "Darrick J. 
Wong" <djwong@kernel.org> --- debian/control | 2 +- debian/postinst | 8 ++++++++ debian/prerm | 13 +++++++++++++ debian/rules | 2 +- 4 files changed, 23 insertions(+), 2 deletions(-) create mode 100644 debian/prerm diff --git a/debian/control b/debian/control index 66b0a47a36ee24..31ea1e988f66be 100644 --- a/debian/control +++ b/debian/control @@ -10,7 +10,7 @@ Homepage: https://xfs.wiki.kernel.org/ Package: xfsprogs Depends: ${shlibs:Depends}, ${misc:Depends}, python3-dbus, python3:any Provides: fsck-backend -Suggests: xfsdump, acl, attr, quota +Suggests: xfsdump, acl, attr, quota, python3-jsonschema Breaks: xfsdump (<< 3.0.0) Replaces: xfsdump (<< 3.0.0) Architecture: linux-any diff --git a/debian/postinst b/debian/postinst index 2ad9174658ceb4..e9ca1c22c43d25 100644 --- a/debian/postinst +++ b/debian/postinst @@ -24,5 +24,13 @@ case "${1}" in esac #DEBHELPER# +# +# dh_installsystemd doesn't handle template services even if we supply a +# default instance, so we'll install it here. +if [ -z "${DPKG_ROOT:-}" ] && [ -d /run/systemd/system ] ; then + if [ "$1" = "configure" ] || [ "$1" = "abort-upgrade" ] || [ "$1" = "abort-deconfigure" ] || [ "$1" = "abort-remove" ] ; then + /bin/systemctl enable xfs_healer@.service || true + fi +fi exit 0 diff --git a/debian/prerm b/debian/prerm new file mode 100644 index 00000000000000..c526dcdd1d7103 --- /dev/null +++ b/debian/prerm @@ -0,0 +1,13 @@ +#!/bin/sh + +set -e + +# dh_installsystemd doesn't handle template services even if we supply a +# default instance, so we'll install it here. 
+if [ -z "${DPKG_ROOT:-}" ] && [ "$1" = remove ] && [ -d /run/systemd/system ] ; then + /bin/systemctl disable xfs_healer@.service || true +fi + +#DEBHELPER# + +exit 0 diff --git a/debian/rules b/debian/rules index 7c9f90e6c483ff..2bf736f340c53d 100755 --- a/debian/rules +++ b/debian/rules @@ -97,4 +97,4 @@ override_dh_installdocs: dh_installdocs -XCHANGES override_dh_installsystemd: - dh_installsystemd -p xfsprogs --no-restart-after-upgrade --no-stop-on-upgrade system-xfs_scrub.slice xfs_scrub_all.timer + dh_installsystemd -p xfsprogs --no-restart-after-upgrade --no-stop-on-upgrade system-xfs_scrub.slice xfs_scrub_all.timer system-xfs_healer.slice ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust 2025-10-22 23:56 [PATCHBOMB 6.19] xfs: autonomous self healing Darrick J. Wong 2025-10-22 23:59 ` [PATCHSET V2] xfs: autonomous self healing of filesystems Darrick J. Wong 2025-10-23 0:00 ` [PATCHSET V2 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong @ 2025-10-23 0:00 ` Darrick J. Wong 2025-10-23 0:12 ` [PATCH 01/19] xfs_healer: start building a Rust version Darrick J. Wong ` (19 more replies) 2025-10-23 0:00 ` [PATCHSET V2] fstests: autonomous self healing of filesystems Darrick J. Wong 3 siblings, 20 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:00 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs Hi all, The initial implementation of the self healing daemon is written in Python. This was useful for rapid prototyping, but a more performant and typechecked codebase is valuable. Write a second implementation in Rust to reduce event processing overhead and library dependence. This could have been done in C, but I decided to use an environment with somewhat fewer footguns. If you're going to start using this code, I strongly recommend pulling from my git trees, which are linked below. This has been running on the djcloud for months with no problems. Enjoy! Comments and questions are, as always, welcome. 
--D xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=health-monitoring-rust --- Commits in this patchset: * xfs_healer: start building a Rust version * xfs_healer: enable gettext for localization * xfs_healer: bindgen xfs_fs.h * xfs_healer: define Rust objects for health events and kernel interface * xfs_healer: read binary health events from the kernel * xfs_healer: read json health events from the kernel * xfs_healer: create a weak file handle so we don't pin the mount * xfs_healer: fix broken filesystem metadata * xfs_healer: check for fs features needed for effective repairs * xfs_healer: use getparents to look up file names * xfs_healer: make the rust program check if kernel support available * xfs_healer: use the autofsck fsproperty to select mode * xfs_healer: use rc on the mountpoint instead of lifetime annotations * xfs_healer: use thread pools * xfs_healer: run full scrub after lost corruption events or targeted repair failure * xfs_healer: use getmntent in Rust to find moved filesystems * xfs_healer: validate that repair fds point to the monitored fs in Rust * debian/control: listify the build dependencies * debian/control: pull in build dependencies for xfs_healer --- healer/bindgen_xfs_fs.h | 6 + configure.ac | 84 ++++++++ debian/control | 30 +++ debian/rules | 3 healer/.cargo/config.toml.system | 6 + healer/Cargo.toml.in | 37 +++ healer/Makefile | 143 +++++++++++++ healer/rbindgen | 57 +++++ healer/src/fsgeom.rs | 41 ++++ healer/src/fsprops.rs | 101 +++++++++ healer/src/getmntent.rs | 117 +++++++++++ healer/src/getparents.rs | 210 ++++++++++++++++++++ healer/src/healthmon/cstruct.rs | 354 +++++++++++++++++++++++++++++++++ healer/src/healthmon/event.rs | 122 +++++++++++ healer/src/healthmon/fs.rs | 163 +++++++++++++++ healer/src/healthmon/groups.rs | 160 +++++++++++++++ healer/src/healthmon/inodes.rs | 142 +++++++++++++ healer/src/healthmon/json.rs | 409 ++++++++++++++++++++++++++++++++++++++ 
healer/src/healthmon/mod.rs | 47 ++++ healer/src/healthmon/samefs.rs | 33 +++ healer/src/lib.rs | 17 ++ healer/src/main.rs | 390 ++++++++++++++++++++++++++++++++++++ healer/src/repair.rs | 390 ++++++++++++++++++++++++++++++++++++ healer/src/util.rs | 81 ++++++++ healer/src/weakhandle.rs | 209 +++++++++++++++++++ healer/src/xfs_types.rs | 292 +++++++++++++++++++++++++++ healer/src/xfsprogs.rs.in | 33 +++ include/builddefs.in | 13 + include/buildrules | 1 m4/Makefile | 1 m4/package_rust.m4 | 163 +++++++++++++++ 31 files changed, 3851 insertions(+), 4 deletions(-) create mode 100644 healer/bindgen_xfs_fs.h create mode 100644 healer/.cargo/config.toml.system create mode 100644 healer/Cargo.toml.in create mode 100755 healer/rbindgen create mode 100644 healer/src/fsgeom.rs create mode 100644 healer/src/fsprops.rs create mode 100644 healer/src/getmntent.rs create mode 100644 healer/src/getparents.rs create mode 100644 healer/src/healthmon/cstruct.rs create mode 100644 healer/src/healthmon/event.rs create mode 100644 healer/src/healthmon/fs.rs create mode 100644 healer/src/healthmon/groups.rs create mode 100644 healer/src/healthmon/inodes.rs create mode 100644 healer/src/healthmon/json.rs create mode 100644 healer/src/healthmon/mod.rs create mode 100644 healer/src/healthmon/samefs.rs create mode 100644 healer/src/lib.rs create mode 100644 healer/src/main.rs create mode 100644 healer/src/repair.rs create mode 100644 healer/src/util.rs create mode 100644 healer/src/weakhandle.rs create mode 100644 healer/src/xfs_types.rs create mode 100644 healer/src/xfsprogs.rs.in create mode 100644 m4/package_rust.m4 ^ permalink raw reply [flat|nested] 80+ messages in thread
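The cover letter notes that health events can be read() from the anonfd either as JSON objects or as C structs. As a rough sketch of what the binary read path involves, here is a minimal Rust example that decodes a fixed-size little-endian event header out of a byte buffer. Every field name, width, and value below is invented for illustration; this is not the real kernel event ABI.

```rust
// Hypothetical event header layout; the real ABI lives in xfs_fs.h.
#[derive(Debug, PartialEq)]
struct EventHeader {
    magic: u32,
    event_type: u32,
    payload_len: u32,
}

fn parse_header(buf: &[u8]) -> Option<EventHeader> {
    // Refuse short reads instead of panicking on a slice index.
    if buf.len() < 12 {
        return None;
    }
    Some(EventHeader {
        magic: u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]),
        event_type: u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]),
        payload_len: u32::from_le_bytes([buf[8], buf[9], buf[10], buf[11]]),
    })
}

fn main() {
    let mut buf = Vec::new();
    buf.extend_from_slice(&0x584d4831u32.to_le_bytes()); // made-up magic
    buf.extend_from_slice(&2u32.to_le_bytes()); // made-up event type
    buf.extend_from_slice(&0u32.to_le_bytes()); // no payload
    println!("{:?}", parse_header(&buf));
}
```

A daemon doing this for real would loop on read() against the anonfd and hand each decoded event to a repair dispatcher.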
* [PATCH 01/19] xfs_healer: start building a Rust version 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong @ 2025-10-23 0:12 ` Darrick J. Wong 2025-10-23 0:12 ` [PATCH 02/19] xfs_healer: enable gettext for localization Darrick J. Wong ` (18 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:12 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Update the build infrastructure to support building rust programs and add a stub xfs_healer program. This is a little gross because cargo install doesn't handle packaging, according to its maintainers. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- configure.ac | 59 ++++++++++++++++++ debian/rules | 2 + healer/.cargo/config.toml.system | 6 ++ healer/Cargo.toml.in | 15 ++++ healer/Makefile | 96 ++++++++++++++++++++++++++++- healer/src/lib.rs | 7 ++ healer/src/main.rs | 110 +++++++++++++++++++++++++++++++++ healer/src/xfsprogs.rs.in | 12 ++++ include/builddefs.in | 9 +++ m4/Makefile | 1 m4/package_rust.m4 | 128 ++++++++++++++++++++++++++++++++++++++ 11 files changed, 443 insertions(+), 2 deletions(-) create mode 100644 healer/.cargo/config.toml.system create mode 100644 healer/Cargo.toml.in create mode 100644 healer/src/lib.rs create mode 100644 healer/src/main.rs create mode 100644 healer/src/xfsprogs.rs.in create mode 100644 m4/package_rust.m4 diff --git a/configure.ac b/configure.ac index 4e7b917c38ae7c..2902fe9ca2227e 100644 --- a/configure.ac +++ b/configure.ac @@ -13,6 +13,10 @@ fi if test "${CXXFLAGS+set}" != "set"; then CXXFLAGS="-g -O2 -std=gnu++11" fi +if test "${CARGOFLAGS+set}" != "set"; then + CARGOFLAGS="" +fi +AC_SUBST(CARGOFLAGS) AC_PROG_INSTALL LT_INIT @@ -116,6 +120,29 @@ AC_ARG_ENABLE(healer, enable_healer=yes) AC_SUBST(enable_healer) +# Is this a release build? Mostly important for cargo/rustc. 
+AC_ARG_ENABLE(release, +[ --enable-release=[yes/no] This is a release build [[default=no]]],, + enable_release=no) +AC_SUBST(enable_release) + +# Should we build Rust with the system crates? "yes" means it's required, +# "no" means use crates.io, "probe" means figure it out from the distro. +AC_ARG_WITH([system-crates], + [AS_HELP_STRING([--with-system-crates=[yes/no/probe]], +[Build Rust programs with system crates instead of downloading from crates.io. [default=no]])], + [], + [with_system_crates=no]) +AC_SUBST(with_system_crates) + +# Should we check for Rust crates and skip builds if they are not installed? +# Distros that package crates themselves and establish build dependencies on +# those packages can skip the checks. +AC_ARG_ENABLE(crate-checks, +[ --enable-crate-checks=[yes/no] Check for Rust crates before building [[default=yes]]],, + enable_crate_checks=yes) +AC_SUBST(enable_crate_checks) + # # If the user specified a libdir ending in lib64 do not append another # 64 to the library names. 
@@ -224,5 +251,37 @@ fi AC_MANUAL_FORMAT AC_HAVE_LIBURCU_ATOMIC64 +# Check for a Rust compiler +# XXX: I don't know how to cross compile Rust yet +if test "$host" = "$build"; then + AC_HAVE_RUSTC +fi + +# If we have rustc, check if LTO is supported (it should be) +if test "$have_rustc" = "yes"; then + if test "$enable_lto" = "yes" || test "$enable_lto" = "probe"; then + AC_RUSTC_CHECK_LTO + fi +fi +if test "$enable_lto" = "yes" && test "$have_rustc_lto" != "yes"; then + AC_MSG_ERROR([LTO not supported by Rust compiler.]) +fi + +# If we still have rustc, check that we have cargo for crate management +if test "$have_rustc" = "yes"; then + AC_HAVE_CARGO +fi + +# If we have cargo, check that our crate dependencies are present +if test "$have_cargo" = "yes"; then + if test "$with_system_crates" = "yes"; then + AC_USE_SYSTEM_CRATES + elif test "$with_system_crates" = "probe"; then + AC_MAYBE_USE_SYSTEM_CRATES + fi + AC_HAVE_CLIPPY + AC_HAVE_HEALER_CRATES +fi + AC_CONFIG_FILES([include/builddefs]) AC_OUTPUT diff --git a/debian/rules b/debian/rules index 2bf736f340c53d..d13ff5cf954cd2 100755 --- a/debian/rules +++ b/debian/rules @@ -39,6 +39,8 @@ configure_options = \ --disable-addrsan \ --disable-threadsan \ --enable-lto \ + --enable-release \ + --with-system-crates \ --localstatedir=/var options = export DEBUG=-DNDEBUG DISTRIBUTION=debian \ diff --git a/healer/.cargo/config.toml.system b/healer/.cargo/config.toml.system new file mode 100644 index 00000000000000..83e5cb05d0d22a --- /dev/null +++ b/healer/.cargo/config.toml.system @@ -0,0 +1,6 @@ +# XXX gross hack so that we don't download crates from the internet +[source] +[source.system-packages] +directory = "/usr/share/cargo/registry" +[source.crates-io] +replace-with = "system-packages" diff --git a/healer/Cargo.toml.in b/healer/Cargo.toml.in new file mode 100644 index 00000000000000..bbd6f930510059 --- /dev/null +++ b/healer/Cargo.toml.in @@ -0,0 +1,15 @@ +[package] +name = "xfs_healer" +version = "@pkg_version@" 
+edition = "2021" + +[profile.dev] +lto = @cargo_lto@ + +[profile.release] +lto = @cargo_lto@ + +# Be sure to update AC_HAVE_HEALER_CRATES if you update the dependency list. +[dependencies] +clap = { version = "4.0.32", features = ["derive"] } +anyhow = { version = "1.0.69" } diff --git a/healer/Makefile b/healer/Makefile index 798c6f2c8a58e0..4c97430b26bd42 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -6,9 +6,55 @@ TOPDIR = .. builddefs=$(TOPDIR)/include/builddefs include $(builddefs) +# Python implementation XFS_HEALER_PROG = xfs_healer.py INSTALL_HEALER = install-healer +# Rust implementation +ifeq ($(HAVE_HEALER_CRATES),yes) + +RUSTFILES = \ + src/lib.rs \ + src/main.rs \ + src/xfsprogs.rs + +BUILT_RUSTFILES = \ + src/xfsprogs.rs + +CARGO_MANIFEST=Cargo.toml +CARGO_CONFIG=.cargo/config.toml + +XFS_HEALER_RUST += bin/xfs_healer +INSTALL_HEALER += install-rust-healer +CLEAN_HEALER += clean-rust-healer + +ifeq ($(HAVE_CLIPPY),yes) + CLIPPY=$(Q)cargo clippy +else + CLIPPY=@true +endif + +ifeq ($(ENABLE_RELEASE),yes) + CARGO_CLIPPY_FLAGS=--no-deps + CARGO_BUILD_FLAGS=--release + CARGO_INSTALL_FLAGS= +else + CARGO_CLIPPY_FLAGS=--no-deps + CARGO_BUILD_FLAGS= + CARGO_INSTALL_FLAGS=--debug +endif + +# Assume that if rustc supports LTO then cargo knows how to configure it. +# rustc and cargo support LTO as of Rust-2021. 
+ifeq ($(HAVE_RUSTC_LTO),yes) + CARGO_LTO=true +else + CARGO_LTO=false +endif + +RUST_DIRT=$(CARGO_MANIFEST) $(CARGO_CONFIG) $(XFS_HEALER_RUST) $(BUILT_RUSTFILES) +endif # HAVE_HEALER_CRATES + ifeq ($(HAVE_SYSTEMD),yes) INSTALL_HEALER += install-systemd SYSTEMD_SERVICES=\ @@ -24,9 +70,14 @@ ifeq ($(HAVE_UDEV),yes) endif endif -LDIRT = $(XFS_HEALER_PROG) +LDIRT = $(XFS_HEALER_PROG) $(RUST_DIRT) -default: $(XFS_HEALER_PROG) $(SYSTEMD_SERVICES) $(UDEV_RULES) $(XFS_HEALER_HELPER) +default: $(XFS_HEALER_PROG) $(XFS_HEALER_RUST) $(SYSTEMD_SERVICES) $(UDEV_RULES) $(XFS_HEALER_HELPER) + +clean: $(CLEAN_HEALER) + +clean-rust-healer: + -cargo clean $(XFS_HEALER_PROG): $(XFS_HEALER_PROG).in $(builddefs) $(TOPDIR)/libfrog/gettext.py @echo " [SED] $@" @@ -43,6 +94,41 @@ $(XFS_HEALER_PROG): $(XFS_HEALER_PROG).in $(builddefs) $(TOPDIR)/libfrog/gettext $(Q)$(SED) -e "s|@pkg_libexec_dir@|$(PKG_LIBEXEC_DIR)|g" \ < $< > $@ +src/xfsprogs.rs: src/xfsprogs.rs.in $(builddefs) + @echo " [SED] $@" + $(Q)$(SED) -e "s|@pkg_version@|$(PKG_VERSION)|g" \ + < $< > $@ + +$(CARGO_MANIFEST): $(CARGO_MANIFEST).in $(builddefs) + @echo " [TOML] $@" + $(Q)$(SED) -e "s|@pkg_version@|$(PKG_VERSION)|g" \ + -e "s|@cargo_lto@|$(CARGO_LTO)|g" \ + < $< | sed -e 's|\~WIP.*|"|g' > $@ + +ifeq ($(USE_SYSTEM_CRATES),yes) +$(CARGO_CONFIG): $(CARGO_CONFIG).system + @echo " [TOML] $@" + $(Q)cp $< $@ +else +$(CARGO_CONFIG): + touch $@ +endif + +docs: + @echo " [CARGO] doc $@" + $(Q)cargo doc --no-deps + +# cargo install only knows how to build a binary and install it to $root/bin, +# so we install it to ./rust/bin/ and let the install-rust target move it to +# $prefix/usr/libexec/xfsprogs like we want. 
+$(XFS_HEALER_RUST): $(RUSTFILES) $(CARGO_MANIFEST) $(CARGO_CONFIG) + @echo " [CARGO] clippy $@ $(CARGO_CLIPPY_FLAGS)" + $(CLIPPY) $(CARGO_CLIPPY_FLAGS) + @echo " [CARGO] build $@ $(CARGO_BUILD_FLAGS)" + $(Q)cargo build $(CARGOFLAGS) $(CARGO_BUILD_FLAGS) + @echo " [CARGO] install $@ $(CARGO_INSTALL_FLAGS)" + $(Q)cargo install --path . --root . $(CARGOFLAGS) $(CARGO_INSTALL_FLAGS) &>/dev/null + include $(BUILDRULES) install: $(INSTALL_HEALER) @@ -63,6 +149,12 @@ install-udev: default $(INSTALL) -m 644 $$i $(UDEV_RULE_DIR)/64-$$i; \ done +# Leave the python version in the installed system for now +install-rust-healer: default install-healer + $(INSTALL) -m 755 -d $(PKG_LIBEXEC_DIR) + $(INSTALL) -m 755 $(XFS_HEALER_PROG) $(PKG_LIBEXEC_DIR) + $(INSTALL) -m 755 $(XFS_HEALER_RUST) $(PKG_LIBEXEC_DIR) + install-dev: -include .dep diff --git a/healer/src/lib.rs b/healer/src/lib.rs new file mode 100644 index 00000000000000..34ab19e07de82f --- /dev/null +++ b/healer/src/lib.rs @@ -0,0 +1,7 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ + +pub mod xfsprogs; diff --git a/healer/src/main.rs b/healer/src/main.rs new file mode 100644 index 00000000000000..e58ffdb3eca5e3 --- /dev/null +++ b/healer/src/main.rs @@ -0,0 +1,110 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +use anyhow::{Context, Result}; +use clap::{value_parser, Arg, ArgAction, ArgMatches, Command}; +use std::fs::File; +use std::path::PathBuf; +use std::process::ExitCode; +use xfs_healer::xfsprogs; +use xfs_healer::xfsprogs::M_; + +/// Contains command line arguments +#[derive(Debug)] +struct Cli(ArgMatches); + +impl Cli { + pub fn new() -> Self { + Cli(Command::new("xfs_healer") + .disable_version_flag(true) + .about(M_("Automatically heal damage to XFS filesystem metadata")) + .arg( + Arg::new("version") + .short('V') + .help(M_("Print version")) + .action(ArgAction::SetTrue), + ) + .arg( + Arg::new("debug") + .long("debug") + .help(M_("Enable debugging messages")) + .action(ArgAction::SetTrue), + ) + .arg( + Arg::new("log") + .long("log") + .help(M_("Log health events to stdout")) + .action(ArgAction::SetTrue), + ) + .arg( + Arg::new("everything") + .long("everything") + .help(M_("Capture all events")) + .action(ArgAction::SetTrue), + ) + .arg( + Arg::new("path") + .help(M_("XFS filesystem mountpoint to monitor")) + .value_parser(value_parser!(PathBuf)) + .required_unless_present("version"), + ) + .get_matches()) + } +} + +/// Contains all the global program state but allows more flexibility. 
+#[derive(Debug)] +struct App { + debug: bool, + log: bool, + everything: bool, + path: PathBuf, +} + +impl App { + /// Return mountpoint as string, for printing messages + fn mountpoint(&self) -> String { + self.path.display().to_string() + } + + /// Main app method + fn main(&self) -> Result<ExitCode> { + let _fp = File::open(&self.path).with_context(|| "Opening filesystem failed")?; + + Ok(ExitCode::SUCCESS) + } +} + +impl From<Cli> for App { + fn from(cli: Cli) -> Self { + App { + debug: cli.0.get_flag("debug"), + log: cli.0.get_flag("log"), + everything: cli.0.get_flag("everything"), + path: cli.0.get_one::<PathBuf>("path").unwrap().to_path_buf(), + } + } +} + +fn main() -> ExitCode { + let args = Cli::new(); + if args.0.get_flag("version") { + println!("{} {}", M_("xfs_healer version"), xfsprogs::VERSION); + return ExitCode::SUCCESS; + } + + if args.0.get_flag("debug") { + println!("args: {:?}", args); + } + + let app: App = args.into(); + match app.main() { + Ok(f) => f, + Err(e) => { + eprintln!("{}: {:#}", app.mountpoint(), e); + ExitCode::FAILURE + } + } +} diff --git a/healer/src/xfsprogs.rs.in b/healer/src/xfsprogs.rs.in new file mode 100644 index 00000000000000..bc5a9b227d26f0 --- /dev/null +++ b/healer/src/xfsprogs.rs.in @@ -0,0 +1,12 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +pub const VERSION: &str = "@pkg_version@"; + +/// Dummy function to simulate looking up a string in a message catalog +#[allow(non_snake_case)] +pub fn M_<T: Into<String>>(msgid: T) -> String { + msgid.into() +} diff --git a/include/builddefs.in b/include/builddefs.in index 7cf6e0782788ca..e477a77f753a22 100644 --- a/include/builddefs.in +++ b/include/builddefs.in @@ -17,6 +17,15 @@ CFLAGS = @CFLAGS@ -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 -Wno-address-of-packed- CXXFLAGS = @CXXFLAGS@ -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 -Wno-address-of-packed-member BUILD_CFLAGS = @BUILD_CFLAGS@ -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 +ENABLE_RELEASE = @enable_release@ +HAVE_RUSTC = @have_rustc@ +HAVE_RUSTC_LTO = @have_rustc_lto@ +HAVE_CARGO = @have_cargo@ +HAVE_HEALER_CRATES = @have_healer_crates@ +CARGOFLAGS = @CARGOFLAGS@ +USE_SYSTEM_CRATES = @use_system_crates@ +HAVE_CLIPPY = @have_clippy@ + # make sure we don't pick up whacky LDFLAGS from the make environment and # only use what we calculate from the configured options above. LDFLAGS = diff --git a/m4/Makefile b/m4/Makefile index 84174c3d3e3023..715d35d592cbe3 100644 --- a/m4/Makefile +++ b/m4/Makefile @@ -23,6 +23,7 @@ LSRCFILES = \ package_sanitizer.m4 \ package_services.m4 \ package_icu.m4 \ + package_rust.m4 \ package_urcu.m4 \ package_utilies.m4 \ package_uuiddev.m4 \ diff --git a/m4/package_rust.m4 b/m4/package_rust.m4 new file mode 100644 index 00000000000000..7c2504b3390941 --- /dev/null +++ b/m4/package_rust.m4 @@ -0,0 +1,128 @@ +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2025 Oracle. All Rights Reserved. 
+ +## Check if the platform has rust tools such as cargo + +# Check if rustc is installed +AC_DEFUN([AC_HAVE_RUSTC], +[ + AC_CHECK_PROG([have_rustc], [rustc], [yes], [no]) + AC_SUBST(have_rustc) +]) + +# Check if cargo is installed +AC_DEFUN([AC_HAVE_CARGO], +[ + AC_CHECK_PROG([have_cargo], [cargo], [yes], [no]) + AC_SUBST(have_cargo) +]) + +# Check if cargo-clippy (aka the linter) is installed +AC_DEFUN([AC_HAVE_CLIPPY], +[ + AC_CHECK_PROG([have_clippy], [cargo-clippy], [yes], [no]) + AC_SUBST(have_clippy) +]) + +# Require that we use the system crates +AC_DEFUN([AC_USE_SYSTEM_CRATES], +[ + use_system_crates=yes + AC_SUBST(use_system_crates) +]) + +# Check if we're building Rust under one of those distributions that provides +# stabilized Rust crates (e.g. Debian, EPEL) and should therefore use them. +AC_DEFUN([AC_MAYBE_USE_SYSTEM_CRATES], +[ + AC_MSG_CHECKING([if we use system Rust crates]) + if test -f /etc/debian_version || test -f /etc/redhat-release; then + use_system_crates=yes + AC_MSG_RESULT(yes) + else + AC_MSG_RESULT(no) + fi + AC_SUBST(use_system_crates) +]) + +# Check if rustc knows about the LTO option +AC_DEFUN([AC_RUSTC_CHECK_LTO], +[ + AC_MSG_CHECKING([if Rust compiler supports LTO]) + rm -f /tmp/enoent.rs + # check that rustc fails because it can't find enoent.rs, not + # because codegen doesn't recognize lto. + if LANG=C rustc -C lto /tmp/enoent.rs 2>&1 | grep -q -i 'enoent.rs.*no.*such'; then + have_rustc_lto=yes + AC_MSG_RESULT(yes) + else + AC_MSG_RESULT(no) + fi + AC_SUBST(have_rustc_lto) +]) + +# Check if we have a particular crate configuration. The arguments are: +# +# 1. Name of variable to set. +# 2. User-friendly description of what we're checking. +# 3. List of crates in Cargo.toml dependencies format. +# 4. Value if the test build succeeds. +# 5. Value if the test build fails. +# +# The variable will be AC_SUBST'd automatically. Be careful to escape the +# brackets that rustc/cargo want. 
+AC_DEFUN([AC_CHECK_CRATES], +[ + if test "$enable_crate_checks" = "no"; then + $1=$4 + AC_SUBST([$1]) + else + AC_MSG_CHECKING([for Rust crates for $2]) + rm -r -f .havecrate + mkdir -p .havecrate/src/ + cat > .havecrate/Cargo.toml << ENDL +[[package]] +name = "havecrate" +version = "0.1.0" +edition = "2021" + +[[dependencies]] +$3 +ENDL + cat > .havecrate/src/main.rs << ENDL +fn main() { } +ENDL + if test -n "$use_system_crates"; then + mkdir -p .havecrate/.cargo + cat > .havecrate/.cargo/config.toml << ENDL +[[source]] +[[source.system-packages]] +directory = "/usr/share/cargo/registry" +[[source.crates-io]] +replace-with = "system-packages" +ENDL + fi + cat .havecrate/Cargo.toml >> config.log + # Is there a faster way to check crate presence than this? + if (cd .havecrate && cargo check) >>config.log 2>&1; then + AC_MSG_RESULT([$4]) + $1=$4 + else + AC_MSG_RESULT([$5]) + $1=$5 + fi + AC_SUBST([$1]) + rm -r -f .havecrate + fi +]) + +# Do we have all the crates we need for xfs_healer? +AC_DEFUN([AC_HAVE_HEALER_CRATES], +[ + AC_CHECK_CRATES([have_healer_crates], [xfs_healer], + [ +clap = { version = "4.0.32", features = [["derive"]] } +anyhow = { version = "1.0.69" } +], + [yes], [no]) +]) ^ permalink raw reply related [flat|nested] 80+ messages in thread
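The Makefile's [TOML] rule above generates Cargo.toml in two sed passes: the first substitutes the configure results into Cargo.toml.in, and the second trims any "~WIP..." packaging suffix back to a plain quoted version so cargo accepts it. A standalone sketch of that pipeline, using made-up sample values:

```shell
#!/bin/sh
# Sketch of the [TOML] rule: substitute configure results, then trim the
# "~WIP..." suffix.  PKG_VERSION and CARGO_LTO values here are invented.
PKG_VERSION='6.9.0~WIP.2025-10-22'
CARGO_LTO=true

printf '%s\n' 'version = "@pkg_version@"' 'lto = @cargo_lto@' |
sed -e "s|@pkg_version@|${PKG_VERSION}|g" \
    -e "s|@cargo_lto@|${CARGO_LTO}|g" |
sed -e 's|~WIP.*|"|g'
# prints:
# version = "6.9.0"
# lto = true
```

The second pass works because the `~WIP` marker always sits inside the quoted version string, so replacing everything from `~WIP` to end of line with a single `"` re-closes the quote.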
* [PATCH 02/19] xfs_healer: enable gettext for localization 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong 2025-10-23 0:12 ` [PATCH 01/19] xfs_healer: start building a Rust version Darrick J. Wong @ 2025-10-23 0:12 ` Darrick J. Wong 2025-10-23 0:13 ` [PATCH 03/19] xfs_healer: bindgen xfs_fs.h Darrick J. Wong ` (17 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:12 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Include gettext-rs in our Rust application if the builder wants localization. It's not clear to me what we're really supposed to use to localize Rust programs, but xfsprogs uses gettext so let's just plug into that for now. Note that xgettext prior to 0.24 doesn't technically support Rust, but it matches patterns well enough to extract simple format strings (e.g. M_("hello")) despite the warnings. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- configure.ac | 13 ++++++++++++- healer/Cargo.toml.in | 15 +++++++++++++++ healer/Makefile | 16 ++++++++++++++++ healer/src/main.rs | 4 +++- healer/src/xfsprogs.rs.in | 20 ++++++++++++++++++++ include/builddefs.in | 1 + include/buildrules | 1 + m4/package_rust.m4 | 6 ++++++ 8 files changed, 74 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index 2902fe9ca2227e..4cb253592ce09b 100644 --- a/configure.ac +++ b/configure.ac @@ -143,6 +143,17 @@ AC_ARG_ENABLE(crate-checks, enable_crate_checks=yes) AC_SUBST(enable_crate_checks) +# Some distributions do not package gettext-rs; provide a way to disable it +# for Rust without disabling it for the C programs. +AC_ARG_ENABLE(gettext-rs, +[ --enable-gettext-rs=[yes/no] Enable gettext-rs support if gettext is enabled],, + enable_gettext_rs="$enable_gettext") +# If the main gettext is not enabled, then we don't want the Rust version. 
+if test "$enable_gettext" = "no"; then + enable_gettext_rs="no" +fi +AC_SUBST(enable_gettext_rs) + # # If the user specified a libdir ending in lib64 do not append another # 64 to the library names. @@ -163,7 +174,7 @@ test -n "$multiarch" && enable_lib64=no # to "find" is required, to avoid including such directories in the # list. LOCALIZED_FILES="" -for lfile in `find ${srcdir} -path './.??*' -prune -o -name '*.c' -print -o -name '*.py.in' -print || exit 1`; do +for lfile in `find ${srcdir} -path './.??*' -prune -o -name '*.c' -print -o -name '*.py.in' -print -o -name '*.rs' -print || exit 1`; do LOCALIZED_FILES="$LOCALIZED_FILES \$(TOPDIR)/$lfile" done AC_SUBST(LOCALIZED_FILES) diff --git a/healer/Cargo.toml.in b/healer/Cargo.toml.in index bbd6f930510059..e62480ff17d58e 100644 --- a/healer/Cargo.toml.in +++ b/healer/Cargo.toml.in @@ -9,7 +9,22 @@ lto = @cargo_lto@ [profile.release] lto = @cargo_lto@ +[features] +@cargo_cmt_gettext@gettext = ["dep:gettext-rs"] + # Be sure to update AC_HAVE_HEALER_CRATES if you update the dependency list. [dependencies] clap = { version = "4.0.32", features = ["derive"] } anyhow = { version = "1.0.69" } + +# XXX: Crates with major version 0 are not considered ABI-stable, so the minor +# version is treated as if it were the major version. This creates problems +# pulling in distro-packaged crates, so we only specify the dependency as major +# version 0. Until these crates reach 1.0.0, we'll have to patch when things +# break. Ref: +# https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html + +# Dynamically comment out all the gettextrs related dependency information in +# Cargo.toml becuse cargo requires the crate to be present so that it can +# generate a Cargo.lock file even if the build never uses it. 
+@cargo_cmt_gettext@gettext-rs = { version = "0", optional = true } # 0.7.0 diff --git a/healer/Makefile b/healer/Makefile index 4c97430b26bd42..ae248bc984b178 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -44,6 +44,19 @@ else CARGO_INSTALL_FLAGS=--debug endif +# Enable gettext if it's available +ifeq ($(ENABLE_GETTEXT_RS),yes) +CARGO_BUILD_FLAGS+=--features gettext +CARGO_CLIPPY_FLAGS+=--features gettext +CARGO_INSTALL_FLAGS+=--features gettext +CARGO_CMT_GETTEXT= +else + # This is what you have to do to define a variable to an octothorpe. + define CARGO_CMT_GETTEXT +# +endef +endif + # Assume that if rustc supports LTO then cargo knows how to configure it. # rustc and cargo support LTO as of Rust-2021. ifeq ($(HAVE_RUSTC_LTO),yes) @@ -97,12 +110,15 @@ $(XFS_HEALER_PROG): $(XFS_HEALER_PROG).in $(builddefs) $(TOPDIR)/libfrog/gettext src/xfsprogs.rs: src/xfsprogs.rs.in $(builddefs) @echo " [SED] $@" $(Q)$(SED) -e "s|@pkg_version@|$(PKG_VERSION)|g" \ + -e "s|@PACKAGE@|$(PKG_NAME)|g" \ + -e "s|@LOCALEDIR@|$(PKG_LOCALE_DIR)|g" \ < $< > $@ $(CARGO_MANIFEST): $(CARGO_MANIFEST).in $(builddefs) @echo " [TOML] $@" $(Q)$(SED) -e "s|@pkg_version@|$(PKG_VERSION)|g" \ -e "s|@cargo_lto@|$(CARGO_LTO)|g" \ + -e "s|@cargo_cmt_gettext@|$(CARGO_CMT_GETTEXT)|g" \ < $< | sed -e 's|\~WIP.*|"|g' > $@ ifeq ($(USE_SYSTEM_CRATES),yes) diff --git a/healer/src/main.rs b/healer/src/main.rs index e58ffdb3eca5e3..d43640e140d46c 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -71,7 +71,7 @@ impl App { /// Main app method fn main(&self) -> Result<ExitCode> { - let _fp = File::open(&self.path).with_context(|| "Opening filesystem failed")?; + let _fp = File::open(&self.path).with_context(|| M_("Opening filesystem failed"))?; Ok(ExitCode::SUCCESS) } @@ -89,6 +89,8 @@ impl From<Cli> for App { } fn main() -> ExitCode { + xfsprogs::init_localization(); + let args = Cli::new(); if args.0.get_flag("version") { println!("{} {}", M_("xfs_healer version"), xfsprogs::VERSION); diff --git 
a/healer/src/xfsprogs.rs.in b/healer/src/xfsprogs.rs.in index bc5a9b227d26f0..0c5cd2d00f7c26 100644 --- a/healer/src/xfsprogs.rs.in +++ b/healer/src/xfsprogs.rs.in @@ -3,10 +3,30 @@ * Copyright (C) 2025 Oracle. All Rights Reserved. * Author: Darrick J. Wong <djwong@kernel.org> */ +#![allow(unexpected_cfgs)] + pub const VERSION: &str = "@pkg_version@"; +/// Try to initialize a localization library. Like the other xfsprogs utilities, we don't care +/// if this fails. +#[cfg(feature = "gettext")] +pub fn init_localization() { + let _ = gettextrs::setlocale(gettextrs::LocaleCategory::LcAll, ""); + let _ = gettextrs::bindtextdomain("@PACKAGE@", "@LOCALEDIR@"); + let _ = gettextrs::textdomain("@PACKAGE@"); +} + +/// Look up a string in a message catalog +#[cfg(feature = "gettext")] +pub use gettextrs::gettext as M_; + +/// Pretend to initialize localization library +#[cfg(not(feature = "gettext"))] +pub fn init_localization() {} + /// Dummy function to simulate looking up a string in a message catalog #[allow(non_snake_case)] +#[cfg(not(feature = "gettext"))] pub fn M_<T: Into<String>>(msgid: T) -> String { msgid.into() } diff --git a/include/builddefs.in b/include/builddefs.in index e477a77f753a22..3ac4147de8c815 100644 --- a/include/builddefs.in +++ b/include/builddefs.in @@ -102,6 +102,7 @@ ENABLE_GETTEXT = @enable_gettext@ ENABLE_EDITLINE = @enable_editline@ ENABLE_SCRUB = @enable_scrub@ ENABLE_HEALER = @enable_healer@ +ENABLE_GETTEXT_RS = @enable_gettext_rs@ HAVE_ZIPPED_MANPAGES = @have_zipped_manpages@ diff --git a/include/buildrules b/include/buildrules index 871e92db02de14..814e0b79ffb8ae 100644 --- a/include/buildrules +++ b/include/buildrules @@ -88,6 +88,7 @@ ifdef POTHEAD XGETTEXT_FLAGS=\ --keyword=_ \ --keyword=N_ \ + --keyword=M_ \ --package-name=$(PKG_NAME) \ --package-version=$(PKG_VERSION) \ --msgid-bugs-address=$(PKG_BUGREPORT) diff --git a/m4/package_rust.m4 b/m4/package_rust.m4 index 7c2504b3390941..a596ec0740f51e 100644 --- a/m4/package_rust.m4 +++ 
b/m4/package_rust.m4 @@ -119,10 +119,16 @@ ENDL # Do we have all the crates we need for xfs_healer? AC_DEFUN([AC_HAVE_HEALER_CRATES], [ + if test "$enable_gettext_rs" = "yes"; then + gettext_dep='gettext-rs = { version = "0", optional = true }' # 0.7.0 + else + gettext_dep="" + fi AC_CHECK_CRATES([have_healer_crates], [xfs_healer], [ clap = { version = "4.0.32", features = [["derive"]] } anyhow = { version = "1.0.69" } +$gettext_dep ], [yes], [no]) ]) ^ permalink raw reply related [flat|nested] 80+ messages in thread
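The cfg gating in xfsprogs.rs.in means call sites invoke M_() the same way whether or not gettext support was compiled in. Here is a minimal standalone sketch of the fallback half of that pattern; built without the "gettext" feature, M_ is just an identity function:

```rust
// Fallback path from the patch above: with the "gettext" feature off, the
// cfg(not(...)) arm is compiled and M_() returns the message id unchanged.
#[allow(non_snake_case)]
#[cfg(not(feature = "gettext"))]
pub fn M_<T: Into<String>>(msgid: T) -> String {
    msgid.into()
}

fn main() {
    // Without gettext, the message id comes back untranslated.
    println!("{}", M_("xfs_healer version"));
}
```

Because both arms export the same signature, swapping in the gettext-backed version is purely a build-time decision and never touches the call sites.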
* [PATCH 03/19] xfs_healer: bindgen xfs_fs.h 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong 2025-10-23 0:12 ` [PATCH 01/19] xfs_healer: start building a Rust version Darrick J. Wong 2025-10-23 0:12 ` [PATCH 02/19] xfs_healer: enable gettext for localization Darrick J. Wong @ 2025-10-23 0:13 ` Darrick J. Wong 2025-10-23 0:13 ` [PATCH 04/19] xfs_healer: define Rust objects for health events and kernel interface Darrick J. Wong ` (16 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:13 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create Rust bindings for all the stuff in xfs_fs.h so that we can call the health monitor and online fsck ioctls later. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/bindgen_xfs_fs.h | 6 +++++ configure.ac | 14 +++++++++++- healer/Makefile | 14 ++++++++++++ healer/rbindgen | 57 +++++++++++++++++++++++++++++++++++++++++++++++ healer/src/lib.rs | 1 + include/builddefs.in | 3 ++ m4/package_rust.m4 | 22 ++++++++++++++++++ 7 files changed, 116 insertions(+), 1 deletion(-) create mode 100644 healer/bindgen_xfs_fs.h create mode 100755 healer/rbindgen diff --git a/healer/bindgen_xfs_fs.h b/healer/bindgen_xfs_fs.h new file mode 100644 index 00000000000000..82d11182bd11f3 --- /dev/null +++ b/healer/bindgen_xfs_fs.h @@ -0,0 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (c) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +#include "xfs.h" diff --git a/configure.ac b/configure.ac index 4cb253592ce09b..8b9aad143c2cec 100644 --- a/configure.ac +++ b/configure.ac @@ -283,7 +283,7 @@ if test "$have_rustc" = "yes"; then AC_HAVE_CARGO fi -# If we have cargo, check that our crate dependencies are present +# If we have cargo, check that we have the first two dependencies for bindgen if test "$have_cargo" = "yes"; then if test "$with_system_crates" = "yes"; then AC_USE_SYSTEM_CRATES @@ -291,6 +291,18 @@ if test "$have_cargo" = "yes"; then AC_MAYBE_USE_SYSTEM_CRATES fi AC_HAVE_CLIPPY + AC_HAVE_CLANG + AC_HAVE_RUSTFMT +fi + +# If we have the first two deps for bindgen, check that we have bindgen +if test "$have_clang:$have_rustfmt" = "yes:yes"; then + AC_HAVE_BINDGEN +fi + +# If we have rustc, cargo, clang, rustfmt, and bindgen, check that our crate +# dependencies are present +if test "$have_bindgen" = "yes"; then AC_HAVE_HEALER_CRATES fi diff --git a/healer/Makefile b/healer/Makefile index ae248bc984b178..407e49ad868f4d 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -13,12 +13,17 @@ INSTALL_HEALER = install-healer # Rust implementation ifeq ($(HAVE_HEALER_CRATES),yes) +HFILES = \ + bindgen_xfs_fs.h + RUSTFILES = \ src/lib.rs \ src/main.rs \ + src/xfs_fs.rs \ src/xfsprogs.rs BUILT_RUSTFILES = \ + src/xfs_fs.rs \ src/xfsprogs.rs CARGO_MANIFEST=Cargo.toml @@ -130,10 +135,19 @@ $(CARGO_CONFIG): touch $@ endif +ifeq ($(HAVE_RUSTFMT),yes) +rustfmt: $(RUSTFILES) + rustfmt $^ +endif + docs: @echo " [CARGO] doc $@" $(Q)cargo doc --no-deps +src/xfs_fs.rs: bindgen_xfs_fs.h ../libxfs/xfs_fs.h rbindgen + @echo " [RBIND] $@" + $(Q)./rbindgen $< $@ '*xfs_fs.h*' + # cargo install only knows how to build a binary and install it to $root/bin, # so we install it to ./rust/bin/ and let the install-rust target move it to # $prefix/usr/libexec/xfsprogs like we want. 
diff --git a/healer/rbindgen b/healer/rbindgen new file mode 100755 index 00000000000000..8f31678e845606 --- /dev/null +++ b/healer/rbindgen @@ -0,0 +1,57 @@ +#!/bin/bash + +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2025 Oracle. All Rights Reserved. + +# Wrap bindgen so that it gives us what we want + +input="$1" +shift +output="$1" +shift + +if [ -z "$1" ] || [ -z "${input}" ] || [ -z "${output}" ] || [ "$1" = "--help" ]; then + echo "Usage: $0 src dest" + exit 1 +fi + +bindgen_args=("${input}" -o "${output}") + +# Try to generate inline functions +bindgen_args+=(--generate-inline-functions) + +# Implement Debug on all generated structures so we can debug C FFI issues +bindgen_args+=(--impl-debug) + +# Implement PartialEq when possible so that we can compare file handles +bindgen_args+=(--with-derive-partialeq) + +# Implement Default when possible so that we can zero-init things +bindgen_args+=(--with-derive-default) + +# Don't complain about unsafe code and improperly cased typenames being wrong +# in this file, we want the C versions as-is +bindgen_args+=(--raw-line '#![allow(non_camel_case_types)]') +bindgen_args+=(--raw-line '#![allow(non_snake_case)]') +bindgen_args+=(--raw-line '#![allow(non_upper_case_globals)]') + +# aarch64 libc defines va_args as an opaque u64 array which causes rustc to +# complain about passing arrays by reference. We don't call out to va_args +# functions from Rust, so this is irrelevant. +bindgen_args+=(--raw-line '#![allow(improper_ctypes)]') + +# Don't complain about unsafe code missing safety docs because we implicitly +# trust the C programmers +bindgen_args+=(--raw-line '#![allow(clippy::missing_safety_doc)]') + +# Older versions of bindgen (e.g. 0.60) required us to request promotion of +# size_t to usize. This seems to be the default as of 0.69, so force it here. 
+if bindgen --help | grep -q -w -- '--size_t-is-usize'; then + bindgen_args+=(--size_t-is-usize) +fi + +# Include xfsprogs C headers; if bindgen can't find stddef.h then you need to +# install clang +clang_args=(-I ../include/ -I ../libxfs/ -I ../) + +exec bindgen "${bindgen_args[@]}" -- "${clang_args[@]}" diff --git a/healer/src/lib.rs b/healer/src/lib.rs index 34ab19e07de82f..9455ed840b3ab0 100644 --- a/healer/src/lib.rs +++ b/healer/src/lib.rs @@ -5,3 +5,4 @@ */ pub mod xfsprogs; +pub mod xfs_fs; diff --git a/include/builddefs.in b/include/builddefs.in index 3ac4147de8c815..20bd2d85b755e0 100644 --- a/include/builddefs.in +++ b/include/builddefs.in @@ -25,6 +25,9 @@ HAVE_HEALER_CRATES = @have_healer_crates@ CARGOFLAGS = @CARGOFLAGS@ USE_SYSTEM_CRATES = @use_system_crates@ HAVE_CLIPPY = @have_clippy@ +HAVE_CLANG = @have_clang@ +HAVE_RUSTFMT = @have_rustfmt@ +HAVE_BINDGEN = @have_bindgen@ # make sure we don't pick up whacky LDFLAGS from the make environment and # only use what we calculate from the configured options above. diff --git a/m4/package_rust.m4 b/m4/package_rust.m4 index a596ec0740f51e..0c25d7fba02243 100644 --- a/m4/package_rust.m4 +++ b/m4/package_rust.m4 @@ -132,3 +132,25 @@ $gettext_dep ], [yes], [no]) ]) + +# Check if clang is installed so that bindgen can find system headers. +AC_DEFUN([AC_HAVE_CLANG], +[ + AC_CHECK_PROG([have_clang], [clang], [yes], [no]) + AC_SUBST(have_clang) +]) + +# Check if rustfmt is installed; bindgen needs this to produce readable source +# code. +AC_DEFUN([AC_HAVE_RUSTFMT], +[ + AC_CHECK_PROG([have_rustfmt], [rustfmt], [yes], [no]) + AC_SUBST(have_rustfmt) +]) + +# Check if bindgen (aka the C FFI generator) is installed +AC_DEFUN([AC_HAVE_BINDGEN], +[ + AC_CHECK_PROG([have_bindgen], [bindgen], [yes], [no]) + AC_SUBST(have_bindgen) +]) ^ permalink raw reply related [flat|nested] 80+ messages in thread
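The rbindgen script above probes `bindgen --help` for a flag before passing it, so the wrapper keeps working across bindgen releases that add or drop options. That feature-probing idiom generalizes to any CLI; a small standalone sketch (using grep itself as the probed tool, since bindgen may not be installed on the machine running this):

```shell
# Pass an optional flag to a tool only if its --help text advertises it.
# This is the same version-compatibility trick rbindgen plays with
# bindgen's --size_t-is-usize option.
probe_flag() {
    "$1" --help 2>&1 | grep -q -w -- "$2"
}

extra=""
# GNU grep advertises --line-buffered; append it only when present so the
# pipeline still works with minimal grep implementations
if probe_flag grep --line-buffered; then
    extra="--line-buffered"
fi

# $extra is deliberately unquoted: when empty it expands to nothing
out=$(printf 'hello\n' | grep $extra hello)
echo "$out"
```

Probing `--help` output is cruder than a version comparison but needs no knowledge of which release introduced the flag, which is exactly why rbindgen uses it.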
* [PATCH 04/19] xfs_healer: define Rust objects for health events and kernel interface 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (2 preceding siblings ...) 2025-10-23 0:13 ` [PATCH 03/19] xfs_healer: bindgen xfs_fs.h Darrick J. Wong @ 2025-10-23 0:13 ` Darrick J. Wong 2025-10-23 0:13 ` [PATCH 05/19] xfs_healer: read binary health events from the kernel Darrick J. Wong ` (15 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:13 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create our own Rust types for health events; these objects will know how to report themselves and (later) how to initiate repairs. Create the Rust binding to the kernel ioctl interface. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/Cargo.toml.in | 2 healer/Makefile | 9 +- healer/src/healthmon/event.rs | 82 ++++++++++++++ healer/src/healthmon/fs.rs | 146 ++++++++++++++++++++++++ healer/src/healthmon/groups.rs | 139 +++++++++++++++++++++++ healer/src/healthmon/inodes.rs | 119 ++++++++++++++++++++ healer/src/healthmon/mod.rs | 14 ++ healer/src/lib.rs | 3 + healer/src/main.rs | 5 + healer/src/util.rs | 72 ++++++++++++ healer/src/xfs_types.rs | 240 ++++++++++++++++++++++++++++++++++++++++ m4/package_rust.m4 | 2 12 files changed, 830 insertions(+), 3 deletions(-) create mode 100644 healer/src/healthmon/event.rs create mode 100644 healer/src/healthmon/fs.rs create mode 100644 healer/src/healthmon/groups.rs create mode 100644 healer/src/healthmon/inodes.rs create mode 100644 healer/src/healthmon/mod.rs create mode 100644 healer/src/util.rs create mode 100644 healer/src/xfs_types.rs diff --git a/healer/Cargo.toml.in b/healer/Cargo.toml.in index e62480ff17d58e..04e9df5c1a2a70 100644 --- a/healer/Cargo.toml.in +++ b/healer/Cargo.toml.in @@ -16,6 +16,7 @@ lto = @cargo_lto@ [dependencies] clap = { version = "4.0.32", 
features = ["derive"] } anyhow = { version = "1.0.69" } +enumset = { version = "1.0.12" } # XXX: Crates with major version 0 are not considered ABI-stable, so the minor # version is treated as if it were the major version. This creates problems @@ -23,6 +24,7 @@ anyhow = { version = "1.0.69" } # version 0. Until these crates reach 1.0.0, we'll have to patch when things # break. Ref: # https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html +nix = { version = "0", features = ["ioctl"] } # 0.26.1 # Dynamically comment out all the gettextrs related dependency information in # Cargo.toml because cargo requires the crate to be present so that it can diff --git a/healer/Makefile b/healer/Makefile index 407e49ad868f4d..5df3ca105e143a 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -19,8 +19,15 @@ HFILES = \ RUSTFILES = \ src/lib.rs \ src/main.rs \ + src/util.rs \ src/xfs_fs.rs \ - src/xfsprogs.rs + src/xfsprogs.rs \ + src/xfs_types.rs \ + src/healthmon/event.rs \ + src/healthmon/fs.rs \ + src/healthmon/groups.rs \ + src/healthmon/inodes.rs \ + src/healthmon/mod.rs BUILT_RUSTFILES = \ src/xfs_fs.rs \ diff --git a/healer/src/healthmon/event.rs b/healer/src/healthmon/event.rs new file mode 100644 index 00000000000000..fe15156ca9e95a --- /dev/null +++ b/healer/src/healthmon/event.rs @@ -0,0 +1,82 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J.
Wong <djwong@kernel.org> + */ +use crate::display_for_enum; +use crate::xfsprogs::M_; + +/// Common behaviors of all health events +pub trait XfsHealthEvent { + /// Return true if this event should always be logged + fn must_log(&self) -> bool { + false + } + + /// Format this event as something we can display + fn format(&self) -> String; +} + +/// Health status for metadata events +#[derive(Debug)] +pub enum XfsHealthStatus { + /// Problems have been observed at runtime + Sick, + + /// Problems have been observed by xfs_scrub + Corrupt, + + /// No problems at all + Healthy, +} + +display_for_enum!(XfsHealthStatus, { + Sick => M_("sick"), + Corrupt => M_("corrupt"), + Healthy => M_("healthy"), +}); + +/// Event for the kernel losing events due to us being slow +pub struct LostEvent { + /// Number of events lost + count: u64, +} + +impl LostEvent { + /// Create a new lost event object + pub fn new(count: u64) -> LostEvent { + LostEvent { count } + } +} + +impl XfsHealthEvent for LostEvent { + fn must_log(&self) -> bool { + true + } + + fn format(&self) -> String { + format!("{} {}", self.count, M_("events lost")) + } +} + +/// Event for the monitor starting up +pub struct RunningEvent {} + +impl XfsHealthEvent for RunningEvent { + fn format(&self) -> String { + M_("monitoring started") + } +} + +/// Event for the program losing events due to unrecognized inputs +pub struct UnknownEvent {} + +impl XfsHealthEvent for UnknownEvent { + fn must_log(&self) -> bool { + true + } + + fn format(&self) -> String { + M_("unrecognized event") + } +} diff --git a/healer/src/healthmon/fs.rs b/healer/src/healthmon/fs.rs new file mode 100644 index 00000000000000..ca50683dce7f04 --- /dev/null +++ b/healer/src/healthmon/fs.rs @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +use crate::display_for_enum; +use crate::healthmon::event::XfsHealthEvent; +use crate::healthmon::event::XfsHealthStatus; +use crate::util::format_set; +use crate::xfs_types::XfsPhysRange; +use crate::xfsprogs::M_; +use enumset::EnumSet; +use enumset::EnumSetType; + +/// Metadata types for an XFS whole-fs metadata +#[derive(EnumSetType, Debug)] +pub enum XfsWholeFsMetadata { + FsCounters, + GrpQuota, + MetaDir, + MetaPath, + NLinks, + PrjQuota, + QuotaCheck, + UsrQuota, +} + +display_for_enum!(XfsWholeFsMetadata, { + FsCounters => M_("fscounters"), + GrpQuota => M_("grpquota"), + MetaDir => M_("metadir"), + MetaPath => M_("metapath"), + NLinks => M_("nlinks"), + PrjQuota => M_("prjquota"), + QuotaCheck => M_("quotacheck"), + UsrQuota => M_("usrquota"), +}); + +/// XFS whole-fs health event +#[derive(Debug)] +pub struct XfsWholeFsEvent { + /// What is being reported on? + metadata: EnumSet<XfsWholeFsMetadata>, + + /// Reported state + status: XfsHealthStatus, +} + +impl XfsWholeFsEvent { + /// Create a new whole-fs event object + pub fn new(metadata: EnumSet<XfsWholeFsMetadata>, status: XfsHealthStatus) -> XfsWholeFsEvent { + XfsWholeFsEvent { metadata, status } + } +} + +impl XfsHealthEvent for XfsWholeFsEvent { + fn format(&self) -> String { + format!( + "{} {} {}", + format_set(self.metadata), + M_("status"), + self.status + ) + } +} + +/// Reasons for a filesystem shutdown event +#[derive(EnumSetType, Debug)] +pub enum XfsShutdownReason { + CorruptIncore, + CorruptOndisk, + DeviceRemoved, + ForceUmount, + LogIoerr, + MetaIoerr, +} + +display_for_enum!(XfsShutdownReason, { + CorruptIncore => M_("in-memory state corruption"), + CorruptOndisk => M_("ondisk metadata corruption"), + DeviceRemoved => M_("device removed"), + ForceUmount => M_("forced unmount"), + LogIoerr => M_("log I/O error"), + MetaIoerr => M_("metadata I/O error"), +}); + +/// XFS shutdown health event +#[derive(Debug)] +pub struct XfsShutdownEvent { + /// Why did the 
filesystem shut down? + reasons: EnumSet<XfsShutdownReason>, +} + +impl XfsShutdownEvent { + /// Create a new shutdown event object + pub fn new(reasons: EnumSet<XfsShutdownReason>) -> XfsShutdownEvent { + XfsShutdownEvent { reasons } + } +} + +impl XfsHealthEvent for XfsShutdownEvent { + fn must_log(&self) -> bool { + true + } + + fn format(&self) -> String { + format!( + "{} {}", + M_("filesystem shut down due to"), + format_set(self.reasons) + ) + } +} + +/// Event for the filesystem being unmounted +pub struct XfsUnmountEvent {} + +impl XfsHealthEvent for XfsUnmountEvent { + fn must_log(&self) -> bool { + true + } + + fn format(&self) -> String { + M_("filesystem unmounted") + } +} + +/// Media error event +#[derive(Debug)] +pub struct XfsMediaErrorEvent { + /// Where was the media error? + range: XfsPhysRange, +} + +impl XfsMediaErrorEvent { + /// Create a new media error event object + pub fn new(range: XfsPhysRange) -> XfsMediaErrorEvent { + XfsMediaErrorEvent { range } + } +} + +impl XfsHealthEvent for XfsMediaErrorEvent { + fn format(&self) -> String { + format!("{} {}", M_("media error on"), self.range) + } +} diff --git a/healer/src/healthmon/groups.rs b/healer/src/healthmon/groups.rs new file mode 100644 index 00000000000000..0c3719fc5099eb --- /dev/null +++ b/healer/src/healthmon/groups.rs @@ -0,0 +1,139 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J.
Wong <djwong@kernel.org> + */ +use crate::display_for_enum; +use crate::healthmon::event::XfsHealthEvent; +use crate::healthmon::event::XfsHealthStatus; +use crate::util::format_set; +use crate::xfs_types::{XfsAgNumber, XfsRgNumber}; +use crate::xfsprogs::M_; +use enumset::EnumSet; +use enumset::EnumSetType; + +/// Metadata types for an allocation group on the data device +#[derive(EnumSetType, Debug)] +pub enum XfsPeragMetadata { + Agf, + Agfl, + Agi, + Bnobt, + Cntbt, + Finobt, + Inobt, + Inodes, + Refcountbt, + Rmapbt, + Super, +} + +display_for_enum!(XfsPeragMetadata, { + Agf => M_("agf"), + Agfl => M_("agfl"), + Agi => M_("agi"), + Bnobt => M_("bnobt"), + Cntbt => M_("cntbt"), + Finobt => M_("finobt"), + Inobt => M_("inobt"), + Inodes => M_("inodes"), + Refcountbt => M_("refcountbt"), + Rmapbt => M_("rmapbt"), + Super => M_("super"), +}); + +/// XFS perag health event +#[derive(Debug)] +pub struct XfsPeragEvent { + /// Allocation group number + group: XfsAgNumber, + + /// What is being reported on? 
+ metadata: EnumSet<XfsPeragMetadata>, + + /// Reported state + status: XfsHealthStatus, +} + +impl XfsPeragEvent { + /// Create a new perag event object + pub fn new( + group: XfsAgNumber, + metadata: EnumSet<XfsPeragMetadata>, + status: XfsHealthStatus, + ) -> XfsPeragEvent { + XfsPeragEvent { + group, + metadata, + status, + } + } +} + +impl XfsHealthEvent for XfsPeragEvent { + fn format(&self) -> String { + format!( + "{} {} {}", + self.group, + format_set(self.metadata), + self.status + ) + } +} + +/// Metadata types for an allocation group on the realtime device +#[derive(EnumSetType, Debug)] +pub enum XfsRtgroupMetadata { + Bitmap, + Summary, + Refcountbt, + Rmapbt, + Super, +} + +display_for_enum!(XfsRtgroupMetadata, { + Bitmap => M_("bitmap"), + Summary => M_("summary"), + Refcountbt => M_("refcountbt"), + Rmapbt => M_("rmapbt"), + Super => M_("super"), +}); + +/// XFS rtgroup health event +#[derive(Debug)] +pub struct XfsRtgroupEvent { + /// Allocation group number + group: XfsRgNumber, + + /// What is being reported on? + metadata: EnumSet<XfsRtgroupMetadata>, + + /// Reported state + status: XfsHealthStatus, +} + +impl XfsRtgroupEvent { + /// Create a new rtgroup event object + pub fn new( + group: XfsRgNumber, + metadata: EnumSet<XfsRtgroupMetadata>, + status: XfsHealthStatus, + ) -> XfsRtgroupEvent { + XfsRtgroupEvent { + group, + metadata, + status, + } + } +} + +impl XfsHealthEvent for XfsRtgroupEvent { + fn format(&self) -> String { + format!( + "{} {} {}", + self.group, + format_set(self.metadata), + self.status + ) + } +} diff --git a/healer/src/healthmon/inodes.rs b/healer/src/healthmon/inodes.rs new file mode 100644 index 00000000000000..5fac02a9d9cbe7 --- /dev/null +++ b/healer/src/healthmon/inodes.rs @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +use crate::display_for_enum; +use crate::healthmon::event::XfsHealthEvent; +use crate::healthmon::event::XfsHealthStatus; +use crate::util::format_set; +use crate::xfs_types::{XfsFid, XfsFileRange}; +use crate::xfsprogs::M_; +use enumset::EnumSet; +use enumset::EnumSetType; + +/// Metadata types for an XFS inode +#[derive(EnumSetType, Debug)] +pub enum XfsInodeMetadata { + Bmapbta, + Bmapbtc, + Bmapbtd, + Core, + Directory, + Dirtree, + Parent, + Symlink, + Xattr, +} + +display_for_enum!(XfsInodeMetadata, { + Bmapbta => M_("attrfork"), + Bmapbtc => M_("cowfork"), + Bmapbtd => M_("datafork"), + Core => M_("core"), + Directory => M_("directory"), + Dirtree => M_("dirtree"), + Parent => M_("parent"), + Symlink => M_("symlink"), + Xattr => M_("xattr"), +}); + +/// XFS inode health event +#[derive(Debug)] +pub struct XfsInodeEvent { + /// File information + fid: XfsFid, + + /// What is being reported on? + metadata: EnumSet<XfsInodeMetadata>, + + /// Reported state + status: XfsHealthStatus, +} + +impl XfsInodeEvent { + /// Create a new inode metadata event object + pub fn new( + fid: XfsFid, + metadata: EnumSet<XfsInodeMetadata>, + status: XfsHealthStatus, + ) -> XfsInodeEvent { + XfsInodeEvent { + fid, + metadata, + status, + } + } +} + +impl XfsHealthEvent for XfsInodeEvent { + fn format(&self) -> String { + format!("{} {} {}", self.fid, format_set(self.metadata), self.status) + } +} + +/// File I/O types +#[derive(Debug)] +pub enum XfsFileIoErrorType { + Readahead, + Writeback, + DirectioRead, + DirectioWrite, +} + +display_for_enum!(XfsFileIoErrorType, { + Readahead => M_("readahead"), + Writeback => M_("writeback"), + DirectioRead => M_("directio_read"), + DirectioWrite => M_("directio_write"), +}); + +/// XFS file I/O error event +#[derive(Debug)] +pub struct XfsFileIoErrorEvent { + /// What file I/O went wrong? + iotype: XfsFileIoErrorType, + + /// Which file? + fid: XfsFid, + + /// Which file and where? 
+ range: XfsFileRange, +} + +impl XfsFileIoErrorEvent { + /// Create a new file IO error event object + pub fn new( + iotype: XfsFileIoErrorType, + fid: XfsFid, + range: XfsFileRange, + ) -> XfsFileIoErrorEvent { + XfsFileIoErrorEvent { iotype, fid, range } + } +} + +impl XfsHealthEvent for XfsFileIoErrorEvent { + fn format(&self) -> String { + format!("{} {} {}", self.fid, self.iotype, self.range) + } +} diff --git a/healer/src/healthmon/mod.rs b/healer/src/healthmon/mod.rs new file mode 100644 index 00000000000000..a22248398a53a7 --- /dev/null +++ b/healer/src/healthmon/mod.rs @@ -0,0 +1,14 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +use crate::xfs_fs::xfs_health_monitor; +use nix::ioctl_write_ptr; + +pub mod event; +pub mod fs; +pub mod groups; +pub mod inodes; + +ioctl_write_ptr!(xfs_ioc_health_monitor, 'X', 68, xfs_health_monitor); diff --git a/healer/src/lib.rs b/healer/src/lib.rs index 9455ed840b3ab0..e9b4795be00904 100644 --- a/healer/src/lib.rs +++ b/healer/src/lib.rs @@ -6,3 +6,6 @@ pub mod xfsprogs; pub mod xfs_fs; +pub mod xfs_types; +pub mod util; +pub mod healthmon; diff --git a/healer/src/main.rs b/healer/src/main.rs index d43640e140d46c..3908dcd23922da 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -8,6 +8,7 @@ use clap::{value_parser, Arg, ArgAction, ArgMatches, Command}; use std::fs::File; use std::path::PathBuf; use std::process::ExitCode; +use xfs_healer::printlogln; use xfs_healer::xfsprogs; use xfs_healer::xfsprogs::M_; @@ -93,12 +94,12 @@ fn main() -> ExitCode { let args = Cli::new(); if args.0.get_flag("version") { - println!("{} {}", M_("xfs_healer version"), xfsprogs::VERSION); + printlogln!("{} {}", M_("xfs_healer version"), xfsprogs::VERSION); return ExitCode::SUCCESS; } if args.0.get_flag("debug") { - println!("args: {:?}", args); + printlogln!("args: {:?}", args); } let app: App = args.into(); diff --git 
a/healer/src/util.rs b/healer/src/util.rs new file mode 100644 index 00000000000000..bce48f83b01da0 --- /dev/null +++ b/healer/src/util.rs @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +use enumset::EnumSet; +use enumset::EnumSetType; +use std::fmt::Display; + +/// Simple macro for creating errors for badly formatted event data. The first parameter describes +/// why the data is bad; the second is the target type; and the third is value provided. +#[macro_export] +macro_rules! baddata { + ($message:expr , $type:tt , $value:expr) => {{ + match (&$message, &$value) { + (message_val, value_val) => { + let s = format!( + "{}: {} {} {}.", + value_val, + message_val, + $crate::xfsprogs::M_("for"), + std::any::type_name::<$type>() + ); + std::io::Error::new(std::io::ErrorKind::InvalidData, s) + } + } + }}; +} + +/// Write a line to standard output and flush it. +#[macro_export] +macro_rules! printlogln { + ( $($t:tt)* ) => { + { + use std::io::Write; + let mut h = std::io::stdout().lock(); + write!(h, $($t)* ).unwrap(); + write!(h, "\n").unwrap(); + h.flush().unwrap(); + } + } +} + +/// Boilerplate to stamp out functions to convert an enum to some sort of pretty string. +// XXX: This could have been a derive macro +#[macro_export] +macro_rules! 
display_for_enum { + ($enum_type:ty , { $($a:ident => $b:expr,)+ } ) => { + impl std::fmt::Display for $enum_type { + /// Convert from an enum to a string + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + write!(f, "{}", match self { $(<$enum_type>::$a => $b,)+ }) + } + } + }; +} + +/// Format an enum set into a string +pub fn format_set<T: EnumSetType + Display>(f: EnumSet<T>) -> String { + let mut ret = "".to_string(); + let mut is_first = true; + + for v in f { + if !is_first { + ret.push_str(", "); + } + is_first = false; + ret.push_str(&format!("{}", v)); + } + + ret +} diff --git a/healer/src/xfs_types.rs b/healer/src/xfs_types.rs new file mode 100644 index 00000000000000..5ce1d73d8e9342 --- /dev/null +++ b/healer/src/xfs_types.rs @@ -0,0 +1,240 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +use crate::baddata; +use crate::display_for_enum; +use crate::xfsprogs::M_; +use anyhow::{Error, Result}; +use std::fmt::Display; +use std::fmt::Formatter; + +/// Allocation group number on the data device +#[derive(Debug)] +pub struct XfsAgNumber(u32); + +impl TryFrom<u64> for XfsAgNumber { + type Error = Error; + + fn try_from(v: u64) -> Result<Self> { + if v > i32::MAX as u64 { + Err(baddata!(M_("AG number too large"), Self, v).into()) + } else { + Ok(XfsAgNumber(v as u32)) + } + } +} + +impl Display for XfsAgNumber { + fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { + write!(f, "{} {}", M_("agno"), self.0) + } +} + +/// Realtime group number on the realtime device +#[derive(Debug)] +pub struct XfsRgNumber(u32); + +impl Display for XfsRgNumber { + fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { + write!(f, "{} {}", M_("rgno"), self.0) + } +} + +impl TryFrom<u64> for XfsRgNumber { + type Error = Error; + + fn try_from(v: u64) -> Result<Self> { + if v > i32::MAX as u64 { + Err(baddata!(M_("rtgroup number too large"), 
Self, v).into()) + } else { + Ok(XfsRgNumber(v as u32)) + } + } +} + +/// Disk devices +#[derive(Debug)] +pub enum XfsDevice { + Data, + Log, + Realtime, +} + +display_for_enum!(XfsDevice, { + Data => M_("datadev"), + Log => M_("logdev"), + Realtime => M_("rtdev"), +}); + +/// Disk address, in 512b units +#[derive(Debug)] +pub struct XfsDaddr(u64); + +impl Display for XfsDaddr { + fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { + write!(f, "{} {:#x}", M_("daddr"), self.0) + } +} + +impl From<u64> for XfsDaddr { + fn from(v: u64) -> XfsDaddr { + XfsDaddr(v) + } +} + +/// Disk space length, in 512b units +#[derive(Debug)] +pub struct XfsBbcount(u64); + +impl Display for XfsBbcount { + fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { + write!(f, "{} {:#x}", M_("bbcount"), self.0) + } +} + +impl From<u64> for XfsBbcount { + fn from(v: u64) -> XfsBbcount { + XfsBbcount(v) + } +} + +/// Range of physical storage +#[derive(Debug)] +pub struct XfsPhysRange { + /// Which device is this? 
+ pub device: XfsDevice, + + /// Start of the range, in 512b units + pub daddr: XfsDaddr, + + /// Size of the range, in 512b units + pub bbcount: XfsBbcount, +} + +impl Display for XfsPhysRange { + fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { + write!(f, "{} {} {}", self.device, self.daddr, self.bbcount) + } +} + +/// Inode number +#[derive(Debug)] +pub struct XfsIno(u64); + +impl Display for XfsIno { + fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { + write!(f, "{} {}", M_("ino"), self.0) + } +} + +impl TryFrom<u64> for XfsIno { + type Error = Error; + + fn try_from(v: u64) -> Result<Self> { + if v > i64::MAX as u64 { + Err(baddata!(M_("inode number too large"), Self, v).into()) + } else { + Ok(XfsIno(v)) + } + } +} + +/// Inode generation number +#[derive(Debug)] +pub struct XfsIgeneration(u32); + +impl Display for XfsIgeneration { + fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { + write!(f, "{} {:#x}", M_("gen"), self.0) + } +} + +impl TryFrom<u64> for XfsIgeneration { + type Error = Error; + + fn try_from(v: u64) -> Result<Self> { + if v > u32::MAX as u64 { + Err(baddata!(M_("inode generation number too large"), Self, v).into()) + } else { + Ok(XfsIgeneration(v as u32)) + } + } +} + +/// Miniature FID for a handle +#[derive(Debug)] +pub struct XfsFid { + /// Inode number + pub ino: XfsIno, + + /// Inode generation + pub gen: XfsIgeneration, +} + +impl Display for XfsFid { + fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { + write!(f, "{} {}", self.ino, self.gen) + } +} + +/// File position +#[derive(Debug)] +pub struct XfsPos(u64); + +impl Display for XfsPos { + fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { + write!(f, "{} {}", M_("pos"), self.0) + } +} + +impl TryFrom<u64> for XfsPos { + type Error = Error; + + fn try_from(v: u64) -> Result<Self> { + if v > i64::MAX as u64 { + Err(baddata!(M_("file position too large"), Self, v).into()) + } else { + Ok(XfsPos(v)) + } + } +} + +/// File IO length 
+#[derive(Debug)] +pub struct XfsIoLen(i64); + +impl Display for XfsIoLen { + fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { + write!(f, "{} {}", M_("len"), self.0) + } +} + +impl TryFrom<u64> for XfsIoLen { + type Error = Error; + + fn try_from(v: u64) -> Result<Self> { + if v > i64::MAX as u64 { + Err(baddata!(M_("file IO length too large"), Self, v).into()) + } else { + Ok(XfsIoLen(v as i64)) + } + } +} + +/// Range of a file's bytes +#[derive(Debug)] +pub struct XfsFileRange { + /// Start of range, in bytes + pub pos: XfsPos, + + /// Length of range, in bytes + pub len: XfsIoLen, +} + +impl Display for XfsFileRange { + fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { + write!(f, "{} {}", self.pos, self.len) + } +} diff --git a/m4/package_rust.m4 b/m4/package_rust.m4 index 0c25d7fba02243..4b426f968c263c 100644 --- a/m4/package_rust.m4 +++ b/m4/package_rust.m4 @@ -129,6 +129,8 @@ AC_DEFUN([AC_HAVE_HEALER_CRATES], clap = { version = "4.0.32", features = [["derive"]] } anyhow = { version = "1.0.69" } $gettext_dep +nix = { version = "0", features = [["ioctl"]] } # 0.26.1 +enumset = { version = "1.0.12" } ], [yes], [no]) ]) ^ permalink raw reply related [flat|nested] 80+ messages in thread
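The xfs_types.rs newtypes above all follow one pattern: wrap a raw kernel integer, validate its range in a `TryFrom<u64>` conversion, and pretty-print it through `Display`. A minimal standalone sketch of that pattern, with no xfsprogs dependencies (the `AgNumber` name, `String` error type, and error text here are illustrative stand-ins for `XfsAgNumber` and its `baddata!`-based error, not code from the patch):

```rust
use std::convert::TryFrom;
use std::fmt;

/// Newtype over a raw u64 read from the kernel; only values that fit the
/// on-disk field are accepted, mirroring the XfsAgNumber bound check.
#[derive(Debug, PartialEq)]
struct AgNumber(u32);

impl TryFrom<u64> for AgNumber {
    type Error = String;

    fn try_from(v: u64) -> Result<Self, Self::Error> {
        // Same bound as the patch: AG numbers never exceed i32::MAX
        if v > i32::MAX as u64 {
            Err(format!("AG number too large: {}", v))
        } else {
            Ok(AgNumber(v as u32))
        }
    }
}

impl fmt::Display for AgNumber {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "agno {}", self.0)
    }
}

fn main() {
    // In-range values convert; out-of-range values are rejected at the
    // FFI boundary instead of propagating a bogus number through the daemon.
    assert_eq!(AgNumber::try_from(7u64).unwrap(), AgNumber(7));
    assert!(AgNumber::try_from(u64::MAX).is_err());
    println!("{}", AgNumber(7));
}
```

Validating at construction means every later consumer of an `AgNumber` (or `XfsIno`, `XfsPos`, and friends in the patch) can trust the value without re-checking it.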
* [PATCH 05/19] xfs_healer: read binary health events from the kernel 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (3 preceding siblings ...) 2025-10-23 0:13 ` [PATCH 04/19] xfs_healer: define Rust objects for health events and kernel interface Darrick J. Wong @ 2025-10-23 0:13 ` Darrick J. Wong 2025-10-23 0:13 ` [PATCH 06/19] xfs_healer: read json " Darrick J. Wong ` (14 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:13 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Decode binary health events objects read from the kernel into the corresponding Rust objects so that we can deal with the events. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/Makefile | 1 healer/src/healthmon/cstruct.rs | 343 +++++++++++++++++++++++++++++++++++++++ healer/src/healthmon/mod.rs | 1 healer/src/main.rs | 25 +++ 4 files changed, 369 insertions(+), 1 deletion(-) create mode 100644 healer/src/healthmon/cstruct.rs diff --git a/healer/Makefile b/healer/Makefile index 5df3ca105e143a..c40663bcc79075 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -23,6 +23,7 @@ RUSTFILES = \ src/xfs_fs.rs \ src/xfsprogs.rs \ src/xfs_types.rs \ + src/healthmon/cstruct.rs \ src/healthmon/event.rs \ src/healthmon/fs.rs \ src/healthmon/groups.rs \ diff --git a/healer/src/healthmon/cstruct.rs b/healer/src/healthmon/cstruct.rs new file mode 100644 index 00000000000000..58463b0f6fa5b9 --- /dev/null +++ b/healer/src/healthmon/cstruct.rs @@ -0,0 +1,343 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +use crate::baddata; +use crate::healthmon::event::LostEvent; +use crate::healthmon::event::RunningEvent; +use crate::healthmon::event::UnknownEvent; +use crate::healthmon::event::XfsHealthEvent; +use crate::healthmon::event::XfsHealthStatus; +use crate::healthmon::fs::XfsMediaErrorEvent; +use crate::healthmon::fs::XfsUnmountEvent; +use crate::healthmon::fs::{XfsShutdownEvent, XfsShutdownReason}; +use crate::healthmon::fs::{XfsWholeFsEvent, XfsWholeFsMetadata}; +use crate::healthmon::groups::{XfsPeragEvent, XfsPeragMetadata}; +use crate::healthmon::groups::{XfsRtgroupEvent, XfsRtgroupMetadata}; +use crate::healthmon::inodes::{XfsFileIoErrorEvent, XfsFileIoErrorType}; +use crate::healthmon::inodes::{XfsInodeEvent, XfsInodeMetadata}; +use crate::healthmon::xfs_ioc_health_monitor; +use crate::xfs_fs; +use crate::xfs_fs::xfs_health_monitor; +use crate::xfs_fs::xfs_health_monitor_event; +use crate::xfs_types::{XfsAgNumber, XfsRgNumber}; +use crate::xfs_types::{XfsDevice, XfsPhysRange}; +use crate::xfs_types::{XfsFid, XfsFileRange, XfsIgeneration, XfsIno, XfsIoLen, XfsPos}; +use crate::xfsprogs::M_; +use anyhow::{Context, Result}; +use std::fs::File; +use std::io::BufReader; +use std::io::ErrorKind; +use std::io::Read; +use std::os::fd::AsRawFd; +use std::os::fd::FromRawFd; +use std::path::Path; + +/// Boilerplate to stamp out functions to convert a u32 mask to an enumset +/// of the given enum type. +macro_rules! 
enum_set_from_mask { + ($enum_type:ty , $err_msg:expr , { $($a:ident => $b:ident,)+ } ) => { + impl $enum_type { + /// Convert from a bitmask to an enum set + pub fn from_mask(mask: u32) -> std::io::Result<enumset::EnumSet<$enum_type>> { + let mut ret = enumset::EnumSet::new(); + let validmask = 0 | + $($crate::xfs_fs::$a | )+ + 0; + if mask & !validmask != 0 { return Err(baddata!($err_msg, $enum_type, mask)); } + $(if mask & $crate::xfs_fs::$a != 0 { ret |= <$enum_type>::$b; })+ + Ok(ret) + } + } + }; +} + +/// Boilerplate to stamp out functions to convert a u32 field to the given enum +/// type. +macro_rules! enum_from_field { + ($enum_type:ty , { $($a:ident => $b:ident,)+ } ) => { + impl $enum_type { + /// Convert from a u32 field to an enum + pub fn from_value(value: u32) -> std::io::Result<$enum_type> { + $(if value == $crate::xfs_fs::$a { return Ok(<$enum_type>::$b); })+ + Err(baddata!($crate::xfsprogs::M_("Unknown value"), $enum_type, value)) + } + } + }; +} + +/// Iterator object that returns health events in binary +pub struct CStructMonitor<'a> { + /// Buffered reader over the health monitor fd + objiter: BufReader<File>, + + /// path to the filesystem mountpoint + mountpoint: &'a Path, +} + +impl CStructMonitor<'_> { + /// Open a health monitor for an open file on an XFS filesystem + pub fn try_new(fp: File, mountpoint: &Path, everything: bool) -> Result<CStructMonitor> { + let mut hminfo = xfs_health_monitor { + format: xfs_fs::XFS_HEALTH_MONITOR_FMT_CSTRUCT as u8, + ..Default::default() + }; + + if everything { + hminfo.flags |= xfs_fs::XFS_HEALTH_MONITOR_VERBOSE as u64; + } + + // SAFETY: Trusting the kernel ioctl not to corrupt stack contents, and to return us a valid + // file descriptor number.
+ let health_fp = unsafe { + let health_fd = xfs_ioc_health_monitor(fp.as_raw_fd(), &hminfo)?; + File::from_raw_fd(health_fd) + }; + drop(fp); + + Ok(CStructMonitor { + objiter: BufReader::new(health_fp), + mountpoint, + }) + } +} + +enum_from_field!(XfsHealthStatus, { + XFS_HEALTH_MONITOR_TYPE_SICK => Sick, + XFS_HEALTH_MONITOR_TYPE_CORRUPT => Corrupt, + XFS_HEALTH_MONITOR_TYPE_HEALTHY => Healthy, +}); + +enum_set_from_mask!(XfsPeragMetadata, M_("Unknown per-AG metadata"), { + XFS_AG_GEOM_SICK_AGF => Agf, + XFS_AG_GEOM_SICK_AGFL => Agfl, + XFS_AG_GEOM_SICK_AGI => Agi, + XFS_AG_GEOM_SICK_BNOBT => Bnobt, + XFS_AG_GEOM_SICK_CNTBT => Cntbt, + XFS_AG_GEOM_SICK_FINOBT => Finobt, + XFS_AG_GEOM_SICK_INOBT => Inobt, + XFS_AG_GEOM_SICK_INODES => Inodes, + XFS_AG_GEOM_SICK_REFCNTBT => Refcountbt, + XFS_AG_GEOM_SICK_RMAPBT => Rmapbt, + XFS_AG_GEOM_SICK_SB => Super, +}); + +/// Create a per-AG health event from C structure +fn perag_event_from_cstruct(v: xfs_health_monitor_event) -> Result<Box<dyn XfsHealthEvent>> { + // SAFETY: Union access checked by caller + let ge = unsafe { v.e.group }; + + Ok(Box::new(XfsPeragEvent::new( + XfsAgNumber::try_from(ge.gno as u64).with_context(|| M_("Reading per-AG event"))?, + XfsPeragMetadata::from_mask(ge.mask).with_context(|| M_("Reading per-AG event"))?, + XfsHealthStatus::from_value(v.type_).with_context(|| M_("Reading per-AG event"))?, + ))) +} + +/// Create a rtgroup health event from C structure +fn rtgroup_event_from_cstruct(v: xfs_health_monitor_event) -> Result<Box<dyn XfsHealthEvent>> { + // SAFETY: Union access checked by caller + let ge = unsafe { v.e.group }; + + Ok(Box::new(XfsRtgroupEvent::new( + XfsRgNumber::try_from(ge.gno as u64).with_context(|| M_("Reading rtgroup event"))?, + XfsRtgroupMetadata::from_mask(ge.mask).with_context(|| M_("Reading rtgroup event"))?, + XfsHealthStatus::from_value(v.type_).with_context(|| M_("Reading rtgroup event"))?, + ))) +} + +enum_set_from_mask!(XfsRtgroupMetadata, M_("Unknown rtgroup 
metadata"), { + XFS_RTGROUP_GEOM_SICK_BITMAP => Bitmap, + XFS_RTGROUP_GEOM_SICK_SUMMARY => Summary, + XFS_RTGROUP_GEOM_SICK_REFCNTBT => Refcountbt, + XFS_RTGROUP_GEOM_SICK_RMAPBT => Rmapbt, + XFS_RTGROUP_GEOM_SICK_SUPER => Super, +}); + +enum_set_from_mask!(XfsInodeMetadata, M_("Unknown inode metadata"), { + XFS_BS_SICK_BMBTA => Bmapbta, + XFS_BS_SICK_BMBTC => Bmapbtc, + XFS_BS_SICK_BMBTD => Bmapbtd, + XFS_BS_SICK_INODE => Core, + XFS_BS_SICK_DIR => Directory, + XFS_BS_SICK_DIRTREE => Dirtree, + XFS_BS_SICK_PARENT => Parent, + XFS_BS_SICK_SYMLINK => Symlink, + XFS_BS_SICK_XATTR => Xattr, +}); + +/// Create an inode health event from C structure +fn inode_event_from_cstruct(v: xfs_health_monitor_event) -> Result<Box<dyn XfsHealthEvent>> { + // SAFETY: Union access checked by caller + let ie = unsafe { v.e.inode }; + + Ok(Box::new(XfsInodeEvent::new( + XfsFid { + ino: XfsIno::try_from(ie.ino).with_context(|| M_("Reading inode event"))?, + gen: XfsIgeneration::try_from(ie.gen as u64) + .with_context(|| M_("Reading inode event"))?, + }, + XfsInodeMetadata::from_mask(ie.mask).with_context(|| M_("Reading inode event"))?, + XfsHealthStatus::from_value(v.type_).with_context(|| M_("Reading inode event"))?, + ))) +} + +enum_from_field!(XfsFileIoErrorType, { + XFS_HEALTH_MONITOR_TYPE_BUFREAD => Readahead, + XFS_HEALTH_MONITOR_TYPE_BUFWRITE => Writeback, + XFS_HEALTH_MONITOR_TYPE_DIOREAD => DirectioRead, + XFS_HEALTH_MONITOR_TYPE_DIOWRITE => DirectioWrite, +}); + +/// Create a file I/O error event from a C struct +fn file_io_error_event_from_cstruct( + v: xfs_health_monitor_event, +) -> Result<Box<dyn XfsHealthEvent>> { + // SAFETY: Union access checked by caller + let fe = unsafe { v.e.filerange }; + + Ok(Box::new(XfsFileIoErrorEvent::new( + XfsFileIoErrorType::from_value(v.type_).with_context(|| M_("Reading file I/O event"))?, + XfsFid { + ino: XfsIno::try_from(fe.ino).with_context(|| M_("Reading file I/O event"))?, + gen: XfsIgeneration::try_from(fe.gen as u64) + 
.with_context(|| M_("Reading file I/O event"))?, + }, + XfsFileRange { + pos: XfsPos::try_from(fe.pos).with_context(|| M_("Reading file I/O event"))?, + len: XfsIoLen::try_from(fe.len).with_context(|| M_("Reading file I/O event"))?, + }, + ))) +} + +enum_set_from_mask!(XfsWholeFsMetadata, M_("Unknown whole-fs metadata"), { + XFS_FSOP_GEOM_SICK_COUNTERS => FsCounters, + XFS_FSOP_GEOM_SICK_GQUOTA => GrpQuota, + XFS_FSOP_GEOM_SICK_NLINKS => NLinks, + XFS_FSOP_GEOM_SICK_PQUOTA => PrjQuota, + XFS_FSOP_GEOM_SICK_QUOTACHECK => QuotaCheck, + XFS_FSOP_GEOM_SICK_UQUOTA => UsrQuota, + XFS_FSOP_GEOM_SICK_METADIR => MetaDir, + XFS_FSOP_GEOM_SICK_METAPATH => MetaPath, +}); + +/// Create a whole-fs health event from a C struct +fn wholefs_event_from_cstruct(v: xfs_health_monitor_event) -> Result<Box<dyn XfsHealthEvent>> { + // SAFETY: Union access checked by caller + let fe = unsafe { v.e.fs }; + + Ok(Box::new(XfsWholeFsEvent::new( + XfsWholeFsMetadata::from_mask(fe.mask).with_context(|| M_("Reading whole-fs event"))?, + XfsHealthStatus::from_value(v.type_).with_context(|| M_("Reading whole-fs event"))?, + ))) +} + +enum_set_from_mask!(XfsShutdownReason, M_("Unknown fs shutdown reason"), { + XFS_HEALTH_SHUTDOWN_META_IO_ERROR => MetaIoerr, + XFS_HEALTH_SHUTDOWN_LOG_IO_ERROR => LogIoerr, + XFS_HEALTH_SHUTDOWN_FORCE_UMOUNT => ForceUmount, + XFS_HEALTH_SHUTDOWN_CORRUPT_INCORE => CorruptIncore, + XFS_HEALTH_SHUTDOWN_CORRUPT_ONDISK => CorruptOndisk, + XFS_HEALTH_SHUTDOWN_DEVICE_REMOVED => DeviceRemoved, +}); + +/// Create a shutdown event from a C struct +fn shutdown_event_from_cstruct(v: xfs_health_monitor_event) -> Result<Box<dyn XfsHealthEvent>> { + // SAFETY: Union access checked by caller + let se = unsafe { v.e.shutdown }; + + Ok(Box::new(XfsShutdownEvent::new( + XfsShutdownReason::from_mask(se.reasons) + .with_context(|| M_("Reading fs shutdown event"))?, + ))) +} + +enum_from_field!(XfsDevice, { + XFS_HEALTH_MONITOR_DOMAIN_DATADEV => Data, + XFS_HEALTH_MONITOR_DOMAIN_RTDEV => 
Realtime, + XFS_HEALTH_MONITOR_DOMAIN_LOGDEV => Log, +}); + +/// Create a media error event from a C struct +fn media_error_event_from_cstruct(v: xfs_health_monitor_event) -> Result<Box<dyn XfsHealthEvent>> { + // SAFETY: Union access checked by caller + let me = unsafe { v.e.media }; + + Ok(Box::new(XfsMediaErrorEvent::new(XfsPhysRange { + device: XfsDevice::from_value(v.domain).with_context(|| M_("Reading media error event"))?, + daddr: me.daddr.into(), + bbcount: me.bbcount.into(), + }))) +} + +/// Create event for the kernel telling us that it lost an event +fn lost_event_from_cstruct(v: xfs_health_monitor_event) -> Result<Box<dyn XfsHealthEvent>> { + // SAFETY: Union access checked by caller + let le = unsafe { v.e.lost }; + + Ok(Box::new(LostEvent::new(le.count))) +} + +impl xfs_health_monitor_event { + /// Return an event object that can react to a health event. + pub fn cook(self) -> Result<Box<dyn XfsHealthEvent>> { + match self.domain { + xfs_fs::XFS_HEALTH_MONITOR_DOMAIN_RTGROUP => rtgroup_event_from_cstruct(self), + + xfs_fs::XFS_HEALTH_MONITOR_DOMAIN_AG => perag_event_from_cstruct(self), + + xfs_fs::XFS_HEALTH_MONITOR_DOMAIN_INODE => inode_event_from_cstruct(self), + + xfs_fs::XFS_HEALTH_MONITOR_DOMAIN_FS => wholefs_event_from_cstruct(self), + + xfs_fs::XFS_HEALTH_MONITOR_DOMAIN_MOUNT => match self.type_ { + xfs_fs::XFS_HEALTH_MONITOR_TYPE_LOST => lost_event_from_cstruct(self), + + xfs_fs::XFS_HEALTH_MONITOR_TYPE_SHUTDOWN => shutdown_event_from_cstruct(self), + + xfs_fs::XFS_HEALTH_MONITOR_TYPE_UNMOUNT => Ok(Box::new(XfsUnmountEvent {})), + + xfs_fs::XFS_HEALTH_MONITOR_TYPE_RUNNING => Ok(Box::new(RunningEvent {})), + + _ => Ok(Box::new(UnknownEvent {})), + }, + + xfs_fs::XFS_HEALTH_MONITOR_DOMAIN_DATADEV + | xfs_fs::XFS_HEALTH_MONITOR_DOMAIN_LOGDEV + | xfs_fs::XFS_HEALTH_MONITOR_DOMAIN_RTDEV => media_error_event_from_cstruct(self), + + xfs_fs::XFS_HEALTH_MONITOR_DOMAIN_FILERANGE => file_io_error_event_from_cstruct(self), + + _ => 
Ok(Box::new(UnknownEvent {})), + } + } +} + +impl Iterator for CStructMonitor<'_> { + type Item = xfs_health_monitor_event; + + /// Return health monitoring events + fn next(&mut self) -> Option<Self::Item> { + let sz = std::mem::size_of::<xfs_health_monitor_event>(); + let mut buf: Vec<u8> = vec![0; sz]; + if let Err(e) = self.objiter.read_exact(&mut buf) { + if e.kind() != ErrorKind::UnexpectedEof { + eprintln!( + "{}: {}: {:#}", + self.mountpoint.display(), + M_("Reading event blob"), + e + ); + } + return None; + }; + + let hme: *const xfs_health_monitor_event = buf.as_ptr() as *const xfs_health_monitor_event; + + // SAFETY: Copying from a Vec that we sized to fit one xfs_health_monitor_event into an + // object of that type. + let ret: xfs_health_monitor_event = unsafe { *hme }; + Some(ret) + } +} diff --git a/healer/src/healthmon/mod.rs b/healer/src/healthmon/mod.rs index a22248398a53a7..ebafd767452349 100644 --- a/healer/src/healthmon/mod.rs +++ b/healer/src/healthmon/mod.rs @@ -6,6 +6,7 @@ use crate::xfs_fs::xfs_health_monitor; use nix::ioctl_write_ptr; +pub mod cstruct; pub mod event; pub mod fs; pub mod groups; diff --git a/healer/src/main.rs b/healer/src/main.rs index 3908dcd23922da..3d4d91b17708dd 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -8,6 +8,8 @@ use clap::{value_parser, Arg, ArgAction, ArgMatches, Command}; use std::fs::File; use std::path::PathBuf; use std::process::ExitCode; +use xfs_healer::healthmon::cstruct::CStructMonitor; +use xfs_healer::healthmon::event::XfsHealthEvent; use xfs_healer::printlogln; use xfs_healer::xfsprogs; use xfs_healer::xfsprogs::M_; @@ -70,9 +72,30 @@ impl App { self.path.display().to_string() } + /// Handle a health event that has been decoded into real objects + fn process_event(&self, cooked: Result<Box<dyn XfsHealthEvent>>) { + match cooked { + Err(e) => { + eprintln!("{}: {:#}", self.path.display(), e) + } + Ok(event) => { + if self.log || event.must_log() { + printlogln!("{}: {}", 
self.path.display(), event.format()); + } + } + } + } + /// Main app method fn main(&self) -> Result<ExitCode> { - let _fp = File::open(&self.path).with_context(|| M_("Opening filesystem failed"))?; + let fp = File::open(&self.path).with_context(|| M_("Opening filesystem failed"))?; + + let hmon = CStructMonitor::try_new(fp, &self.path, self.everything) + .with_context(|| M_("Opening health monitor file"))?; + + for raw_event in hmon { + self.process_event(raw_event.cook()); + } Ok(ExitCode::SUCCESS) } ^ permalink raw reply related [flat|nested] 80+ messages in thread
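The `CStructMonitor::next` implementation above pulls one fixed-size `xfs_health_monitor_event` record off the buffered fd per iteration and casts the bytes back into the struct. The same framing can be sketched with only the standard library; `Event` below is a hypothetical stand-in for the kernel struct, not the real layout:

```rust
use std::io::{Cursor, Read};

// Hypothetical fixed-size, C-layout record standing in for xfs_health_monitor_event.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Event {
    domain: u32,
    type_: u32,
    payload: u64,
}

// Read exactly one Event from a byte stream, mirroring CStructMonitor::next:
// EOF (or any read error) ends the iteration by returning None.
fn next_event<R: Read>(r: &mut R) -> Option<Event> {
    let mut buf = [0u8; std::mem::size_of::<Event>()];
    r.read_exact(&mut buf).ok()?;
    // SAFETY: buf is exactly size_of::<Event>() bytes; read_unaligned avoids
    // depending on the byte buffer's 1-byte alignment.
    Some(unsafe { std::ptr::read_unaligned(buf.as_ptr() as *const Event) })
}

fn main() {
    let ev = Event { domain: 3, type_: 1, payload: 42 };
    // Serialize the event the way the kernel would: as its raw in-memory bytes.
    let bytes: [u8; std::mem::size_of::<Event>()] = unsafe { std::mem::transmute(ev) };
    let mut stream = Cursor::new(bytes.to_vec());
    assert_eq!(next_event(&mut stream), Some(ev));
    assert_eq!(next_event(&mut stream), None); // stream exhausted
    println!("ok");
}
```

Using `std::ptr::read_unaligned` sidesteps the alignment question that the patch's `*const xfs_health_monitor_event` pointer cast leaves to the `Vec` allocator.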
* [PATCH 06/19] xfs_healer: read json health events from the kernel 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (4 preceding siblings ...) 2025-10-23 0:13 ` [PATCH 05/19] xfs_healer: read binary health events from the kernel Darrick J. Wong @ 2025-10-23 0:13 ` Darrick J. Wong 2025-10-23 0:14 ` [PATCH 07/19] xfs_healer: create a weak file handle so we don't pin the mount Darrick J. Wong ` (13 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:13 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> The kernel can give us filesystem health events in json, so let's use the json deserializer to turn them into Rust associative arrays and return them from our iterator. This isn't totally necessary since we have the C structure variant, but it'll help us test the other interface. Note that we use a fair amount of EnumString magic to automatically provide translators for the json. Signed-off-by: "Darrick J. 
Wong" <djwong@kernel.org> --- healer/Cargo.toml.in | 3 healer/Makefile | 1 healer/src/healthmon/event.rs | 4 healer/src/healthmon/fs.rs | 7 + healer/src/healthmon/groups.rs | 7 + healer/src/healthmon/inodes.rs | 7 + healer/src/healthmon/json.rs | 398 ++++++++++++++++++++++++++++++++++++++++ healer/src/healthmon/mod.rs | 1 healer/src/main.rs | 26 ++- healer/src/xfs_types.rs | 8 + m4/package_rust.m4 | 3 11 files changed, 453 insertions(+), 12 deletions(-) create mode 100644 healer/src/healthmon/json.rs diff --git a/healer/Cargo.toml.in b/healer/Cargo.toml.in index 04e9df5c1a2a70..fcf7f7a6d9373b 100644 --- a/healer/Cargo.toml.in +++ b/healer/Cargo.toml.in @@ -17,6 +17,7 @@ lto = @cargo_lto@ clap = { version = "4.0.32", features = ["derive"] } anyhow = { version = "1.0.69" } enumset = { version = "1.0.12" } +serde_json = { version = "1.0.87" } # XXX: Crates with major version 0 are not considered ABI-stable, so the minor # version is treated as if it were the major version. This creates problems @@ -25,6 +26,8 @@ enumset = { version = "1.0.12" } # break. 
Ref: # https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html nix = { version = "0", features = ["ioctl"] } # 0.26.1 +strum = { version = "0" } # 0.19.2 +strum_macros = { version = "0" } # 0.19.2 # Dynamically comment out all the gettextrs related dependency information in # Cargo.toml becuse cargo requires the crate to be present so that it can diff --git a/healer/Makefile b/healer/Makefile index c40663bcc79075..515238982aad24 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -28,6 +28,7 @@ RUSTFILES = \ src/healthmon/fs.rs \ src/healthmon/groups.rs \ src/healthmon/inodes.rs \ + src/healthmon/json.rs \ src/healthmon/mod.rs BUILT_RUSTFILES = \ diff --git a/healer/src/healthmon/event.rs b/healer/src/healthmon/event.rs index fe15156ca9e95a..b7a0effab94a3c 100644 --- a/healer/src/healthmon/event.rs +++ b/healer/src/healthmon/event.rs @@ -5,6 +5,7 @@ */ use crate::display_for_enum; use crate::xfsprogs::M_; +use strum_macros::EnumString; /// Common behaviors of all health events pub trait XfsHealthEvent { @@ -18,7 +19,8 @@ pub trait XfsHealthEvent { } /// Health status for metadata events -#[derive(Debug)] +#[derive(Debug, EnumString)] +#[strum(serialize_all = "lowercase")] pub enum XfsHealthStatus { /// Problems have been observed at runtime Sick, diff --git a/healer/src/healthmon/fs.rs b/healer/src/healthmon/fs.rs index ca50683dce7f04..f216867acdf71a 100644 --- a/healer/src/healthmon/fs.rs +++ b/healer/src/healthmon/fs.rs @@ -11,9 +11,11 @@ use crate::xfs_types::XfsPhysRange; use crate::xfsprogs::M_; use enumset::EnumSet; use enumset::EnumSetType; +use strum_macros::EnumString; /// Metadata types for an XFS whole-fs metadata -#[derive(EnumSetType, Debug)] +#[derive(EnumSetType, Debug, EnumString)] +#[strum(serialize_all = "lowercase")] pub enum XfsWholeFsMetadata { FsCounters, GrpQuota, @@ -65,7 +67,8 @@ impl XfsHealthEvent for XfsWholeFsEvent { } /// Reasons for a filesystem shutdown event -#[derive(EnumSetType, Debug)] +#[derive(EnumSetType, 
Debug, EnumString)] +#[strum(serialize_all = "snake_case")] pub enum XfsShutdownReason { CorruptIncore, CorruptOndisk, diff --git a/healer/src/healthmon/groups.rs b/healer/src/healthmon/groups.rs index 0c3719fc5099eb..4384de50b4c63f 100644 --- a/healer/src/healthmon/groups.rs +++ b/healer/src/healthmon/groups.rs @@ -11,9 +11,11 @@ use crate::xfs_types::{XfsAgNumber, XfsRgNumber}; use crate::xfsprogs::M_; use enumset::EnumSet; use enumset::EnumSetType; +use strum_macros::EnumString; /// Metadata types for an allocation group on the data device -#[derive(EnumSetType, Debug)] +#[derive(EnumSetType, Debug, EnumString)] +#[strum(serialize_all = "lowercase")] pub enum XfsPeragMetadata { Agf, Agfl, @@ -82,7 +84,8 @@ impl XfsHealthEvent for XfsPeragEvent { } /// Metadata types for an allocation group on the realtime device -#[derive(EnumSetType, Debug)] +#[derive(EnumSetType, Debug, EnumString)] +#[strum(serialize_all = "lowercase")] pub enum XfsRtgroupMetadata { Bitmap, Summary, diff --git a/healer/src/healthmon/inodes.rs b/healer/src/healthmon/inodes.rs index 5fac02a9d9cbe7..5775f9ffa69b6b 100644 --- a/healer/src/healthmon/inodes.rs +++ b/healer/src/healthmon/inodes.rs @@ -11,9 +11,11 @@ use crate::xfs_types::{XfsFid, XfsFileRange}; use crate::xfsprogs::M_; use enumset::EnumSet; use enumset::EnumSetType; +use strum_macros::EnumString; /// Metadata types for an XFS inode -#[derive(EnumSetType, Debug)] +#[derive(EnumSetType, Debug, EnumString)] +#[strum(serialize_all = "lowercase")] pub enum XfsInodeMetadata { Bmapbta, Bmapbtc, @@ -73,7 +75,8 @@ impl XfsHealthEvent for XfsInodeEvent { } /// File I/O types -#[derive(Debug)] +#[derive(Debug, EnumString)] +#[strum(serialize_all = "snake_case")] pub enum XfsFileIoErrorType { Readahead, Writeback, diff --git a/healer/src/healthmon/json.rs b/healer/src/healthmon/json.rs new file mode 100644 index 00000000000000..2fae6f4b48e68b --- /dev/null +++ b/healer/src/healthmon/json.rs @@ -0,0 +1,398 @@ +// SPDX-License-Identifier: 
GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +use crate::baddata; +use crate::healthmon::event::LostEvent; +use crate::healthmon::event::RunningEvent; +use crate::healthmon::event::UnknownEvent; +use crate::healthmon::event::XfsHealthEvent; +use crate::healthmon::event::XfsHealthStatus; +use crate::healthmon::fs::XfsMediaErrorEvent; +use crate::healthmon::fs::XfsUnmountEvent; +use crate::healthmon::fs::{XfsShutdownEvent, XfsShutdownReason}; +use crate::healthmon::fs::{XfsWholeFsEvent, XfsWholeFsMetadata}; +use crate::healthmon::groups::{XfsPeragEvent, XfsPeragMetadata}; +use crate::healthmon::groups::{XfsRtgroupEvent, XfsRtgroupMetadata}; +use crate::healthmon::inodes::{XfsFileIoErrorEvent, XfsFileIoErrorType}; +use crate::healthmon::inodes::{XfsInodeEvent, XfsInodeMetadata}; +use crate::healthmon::xfs_ioc_health_monitor; +use crate::printlogln; +use crate::xfs_fs; +use crate::xfs_fs::xfs_health_monitor; +use crate::xfs_types::{XfsAgNumber, XfsRgNumber}; +use crate::xfs_types::{XfsDevice, XfsPhysRange}; +use crate::xfs_types::{XfsFid, XfsFileRange, XfsIgeneration, XfsIno, XfsIoLen, XfsPos}; +use crate::xfsprogs::M_; +use anyhow::{Context, Error, Result}; +use serde_json::from_str; +use serde_json::Value; +use std::fmt::Display; +use std::fmt::Formatter; +use std::fs::File; +use std::io::BufRead; +use std::io::BufReader; +use std::io::Lines; +use std::os::fd::AsRawFd; +use std::os::fd::FromRawFd; +use std::path::Path; +use std::str::FromStr; + +/// Boilerplate to stamp out functions to convert json array to an enumset +/// of the given enum type; or return an error with the given message. +// XXX: Not sure how to make this a TryFrom on EnumSet<T>. +macro_rules! 
enum_set_from_json { + ($enum_type:ty , $err_msg:expr) => { + impl $enum_type { + /// Convert from an array of json to a set of enum + pub fn try_set_from( + v: &serde_json::Value, + ) -> std::io::Result<enumset::EnumSet<$enum_type>> { + let array = v.as_array().ok_or(baddata!( + $crate::xfsprogs::M_("Not an array"), + $enum_type, + v + ))?; + let mut set = enumset::EnumSet::new(); + + for jsvalue in array { + let value = jsvalue.as_str().ok_or(baddata!( + $crate::xfsprogs::M_("Not a string"), + $enum_type, + jsvalue + ))?; + set |= match <$enum_type>::from_str(value) { + Ok(o) => o, + Err(_) => return Err(baddata!($err_msg, $enum_type, value)), + }; + } + Ok(set) + } + } + }; +} + +/// Boilerplate to stamp out functions to convert a json value to the given enum +/// type; or return an error with the given message. +macro_rules! enum_from_json { + ($enum_type:ty , $err_msg:expr) => { + impl TryFrom<&serde_json::Value> for $enum_type { + type Error = std::io::Error; + + /// Convert from a json value to an enum + fn try_from(v: &serde_json::Value) -> std::io::Result<$enum_type> { + let value = v.as_str().ok_or(baddata!( + $crate::xfsprogs::M_("Not a string"), + $enum_type, + v + ))?; + match <$enum_type>::from_str(value) { + Ok(o) => Ok(o), + Err(_) => return Err(baddata!($err_msg, $enum_type, value)), + } + } + } + }; +} + +/// Iterator object that returns health events in json +pub struct JsonMonitor<'a> { + /// health monitor fd, but wrapped to iterate lines as they come in + lineiter: Lines<BufReader<File>>, + + /// path to the filesystem mountpoint + mountpoint: &'a Path, + + /// are we debugging? 
+ debug: bool, +} + +impl JsonMonitor<'_> { + /// Open a health monitor for an open file on an XFS filesystem + pub fn try_new( + fp: File, + mountpoint: &Path, + everything: bool, + debug: bool, + ) -> Result<JsonMonitor> { + let mut hminfo = xfs_health_monitor { + format: xfs_fs::XFS_HEALTH_MONITOR_FMT_JSON as u8, + ..Default::default() + }; + + if everything { + hminfo.flags |= xfs_fs::XFS_HEALTH_MONITOR_VERBOSE as u64; + } + + // SAFETY: Trusting the kernel ioctl not to corrupt stack contents, and to return us a valid + // file description number. + let health_fp = unsafe { + let health_fd = xfs_ioc_health_monitor(fp.as_raw_fd(), &hminfo)?; + File::from_raw_fd(health_fd) + }; + drop(fp); + + Ok(JsonMonitor { + lineiter: BufReader::new(health_fp).lines(), + mountpoint, + debug, + }) + } +} + +/// Raw health event, used to create the real objects +pub struct JsonEventWrapper(Vec<String>); + +impl JsonEventWrapper { + /// Push a string into the event string collection + fn push(&mut self, s: String) { + self.0.push(s) + } +} + +impl TryFrom<JsonEventWrapper> for Value { + type Error = serde_json::Error; + + /// Return a json value from this raw event + fn try_from(val: JsonEventWrapper) -> serde_json::Result<Self> { + from_str(&val.0.join("")) + } +} + +impl TryFrom<&Value> for XfsAgNumber { + type Error = Error; + + /// Extract group number from a json value + fn try_from(v: &Value) -> Result<Self> { + let m = v + .as_u64() + .ok_or(baddata!(M_("AG number must be integer"), Self, v))?; + XfsAgNumber::try_from(m) + } +} + +enum_from_json!(XfsHealthStatus, M_("Unknown health event status")); + +enum_set_from_json!(XfsPeragMetadata, M_("Unknown per-AG metadata")); + +/// Create a per-AG health event from json +fn perag_event_from_json(v: Value) -> Result<Box<dyn XfsHealthEvent>> { + Ok(Box::new(XfsPeragEvent::new( + XfsAgNumber::try_from(&v["group"]).with_context(|| M_("Reading per-AG event"))?, + XfsPeragMetadata::try_set_from(&v["structures"]) + .with_context(|| 
M_("Reading per-AG event"))?, + XfsHealthStatus::try_from(&v["type"]).with_context(|| M_("Reading per-AG event"))?, + ))) +} + +impl TryFrom<&Value> for XfsRgNumber { + type Error = Error; + + /// Extract group number from a json value + fn try_from(v: &Value) -> Result<Self> { + let m = v + .as_u64() + .ok_or(baddata!(M_("rtgroup number must be integer"), Self, v))?; + XfsRgNumber::try_from(m) + } +} + +enum_set_from_json!(XfsRtgroupMetadata, M_("Unknown rtgroup metadata")); + +fn rtgroup_event_from_json(v: Value) -> Result<Box<dyn XfsHealthEvent>> { + Ok(Box::new(XfsRtgroupEvent::new( + XfsRgNumber::try_from(&v["group"]).with_context(|| M_("Reading rtgroup event"))?, + XfsRtgroupMetadata::try_set_from(&v["structures"]) + .with_context(|| M_("Reading rtgroup event"))?, + XfsHealthStatus::try_from(&v["type"]).with_context(|| M_("Reading rtgroup event"))?, + ))) +} + +/// Convert json values to a fid +fn to_fid(ino: &Value, gen: &Value) -> Result<XfsFid> { + let i = ino + .as_u64() + .ok_or(baddata!(M_("inode number must be integer"), XfsFid, ino))?; + let g = gen.as_u64().ok_or(baddata!( + M_("inode generation must be integer"), + XfsFid, + gen + ))?; + + Ok(XfsFid { + ino: XfsIno::try_from(i)?, + gen: XfsIgeneration::try_from(g)?, + }) +} + +enum_set_from_json!(XfsInodeMetadata, M_("Unknown inode metadata")); + +/// Create an inode health event from json +fn inode_event_from_json(v: Value) -> Result<Box<dyn XfsHealthEvent>> { + Ok(Box::new(XfsInodeEvent::new( + to_fid(&v["inumber"], &v["generation"]).with_context(|| M_("Reading inode event"))?, + XfsInodeMetadata::try_set_from(&v["structures"]) + .with_context(|| M_("Reading inode event"))?, + XfsHealthStatus::try_from(&v["type"]).with_context(|| M_("Reading inode event"))?, + ))) +} + +/// Convert json values to a file range. 
+fn to_range(pos: &Value, len: &Value) -> Result<XfsFileRange> { + let p = pos.as_u64().ok_or(baddata!( + M_("file position must be integer"), + XfsFileRange, + pos + ))?; + let l = len.as_u64().ok_or(baddata!( + M_("file length must be integer"), + XfsFileRange, + len + ))?; + + Ok(XfsFileRange { + pos: XfsPos::try_from(p)?, + len: XfsIoLen::try_from(l)?, + }) +} + +enum_from_json!(XfsFileIoErrorType, M_("Unknown file I/O error type")); + +/// Create a file I/O error event from json +pub fn file_io_error_event_from_json(v: Value) -> Result<Box<dyn XfsHealthEvent>> { + Ok(Box::new(XfsFileIoErrorEvent::new( + XfsFileIoErrorType::try_from(&v["type"]) + .with_context(|| M_("Reading file I/O error event"))?, + to_fid(&v["inumber"], &v["generation"]) + .with_context(|| M_("Reading file I/O error event"))?, + to_range(&v["pos"], &v["len"]).with_context(|| M_("Reading file I/O error event"))?, + ))) +} + +enum_set_from_json!(XfsWholeFsMetadata, M_("Unknown whole-fs metadata")); + +/// Create a whole-fs health event from json +fn wholefs_event_from_json(v: Value) -> Result<Box<dyn XfsHealthEvent>> { + Ok(Box::new(XfsWholeFsEvent::new( + XfsWholeFsMetadata::try_set_from(&v["structures"]) + .with_context(|| M_("Reading whole-fs event"))?, + XfsHealthStatus::try_from(&v["type"]).with_context(|| M_("Reading whole-fs event"))?, + ))) +} + +enum_set_from_json!(XfsShutdownReason, M_("Unknown fs shutdown reason")); + +/// Create a shutdown event from json +fn shutdown_event_from_json(v: Value) -> Result<Box<dyn XfsHealthEvent>> { + Ok(Box::new(XfsShutdownEvent::new( + XfsShutdownReason::try_set_from(&v["reasons"]) + .with_context(|| M_("Reading fs shutdown event"))?, + ))) +} + +/// Convert json values to a physrange +fn to_phys(dev: &Value, daddr: &Value, bbcount: &Value) -> Result<XfsPhysRange> { + let a = daddr + .as_u64() + .ok_or(baddata!(M_("daddr must be integer"), XfsPhysRange, daddr))?; + let b = bbcount.as_u64().ok_or(baddata!( + M_("bbcount must be integer"), + 
XfsPhysRange, + bbcount + ))?; + + Ok(XfsPhysRange { + device: XfsDevice::try_from(dev)?, + daddr: a.into(), + bbcount: b.into(), + }) +} + +enum_from_json!(XfsDevice, M_("Unknown XFS device")); + +/// Create a media error event from json +fn media_error_event_from_json(v: Value) -> Result<Box<dyn XfsHealthEvent>> { + Ok(Box::new(XfsMediaErrorEvent::new( + to_phys(&v["domain"], &v["daddr"], &v["bbcount"]) + .with_context(|| M_("Reading media error event"))?, + ))) +} + +/// Create event for the kernel telling us that it lost an event +fn lost_event_from_json(v: Value) -> Result<Box<dyn XfsHealthEvent>> { + let r = &v["count"]; + let count = r + .as_u64() + .ok_or(baddata!(M_("Not a count"), LostEvent, r)) + .with_context(|| M_("Reading lost event"))?; + + Ok(Box::new(LostEvent::new(count))) +} + +impl JsonEventWrapper { + /// Return an event object that can react to a health event. + pub fn cook(self) -> Result<Box<dyn XfsHealthEvent>> { + let json = Value::try_from(self).with_context(|| M_("Interpreting json event"))?; + match json["domain"].as_str() { + Some("rtgroup") => rtgroup_event_from_json(json), + Some("perag") => perag_event_from_json(json), + Some("inode") => inode_event_from_json(json), + Some("fs") => wholefs_event_from_json(json), + Some("mount") => match json["type"].as_str() { + Some("lost") => lost_event_from_json(json), + Some("shutdown") => shutdown_event_from_json(json), + Some("unmount") => Ok(Box::new(XfsUnmountEvent {})), + Some("running") => Ok(Box::new(RunningEvent {})), + _ => Ok(Box::new(UnknownEvent {})), + }, + Some("datadev") | Some("rtdev") | Some("logdev") => media_error_event_from_json(json), + + Some("filerange") => file_io_error_event_from_json(json), + + _ => Ok(Box::new(UnknownEvent {})), + } + } +} + +impl Display for JsonEventWrapper { + /// Turn this collection of strings into a single string + fn fmt(&self, f: &mut Formatter) -> std::fmt::Result { + write!(f, "{}", self.0.join("")) + } +} + +impl Iterator for JsonMonitor<'_> 
{ + type Item = JsonEventWrapper; + + /// Return health monitoring events + fn next(&mut self) -> Option<Self::Item> { + let mut ret = JsonEventWrapper(Vec::new()); + loop { + match self.lineiter.next() { + // read lines until we encounter a closing brace by itself + Some(Ok(line)) => { + if self.debug { + printlogln!("{}: \"{}\"", M_("new line"), line); + } + let done = line == "}"; + ret.push(line); + if done { + break; + } + continue; + } + + // ran out of data + None => return None, + + // io error on the monitoring fd, stop reading + Some(Err(e)) => { + eprintln!("{}: {}: {:#}", self.mountpoint.display(), M_("Reading event json object"), e); + return None; + } + } + } + Some(ret) + } +} diff --git a/healer/src/healthmon/mod.rs b/healer/src/healthmon/mod.rs index ebafd767452349..5116361146db18 100644 --- a/healer/src/healthmon/mod.rs +++ b/healer/src/healthmon/mod.rs @@ -11,5 +11,6 @@ pub mod event; pub mod fs; pub mod groups; pub mod inodes; +pub mod json; ioctl_write_ptr!(xfs_ioc_health_monitor, 'X', 68, xfs_health_monitor); diff --git a/healer/src/main.rs b/healer/src/main.rs index 3d4d91b17708dd..456dc44289d534 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -10,6 +10,7 @@ use std::path::PathBuf; use std::process::ExitCode; use xfs_healer::healthmon::cstruct::CStructMonitor; use xfs_healer::healthmon::event::XfsHealthEvent; +use xfs_healer::healthmon::json::JsonMonitor; use xfs_healer::printlogln; use xfs_healer::xfsprogs; use xfs_healer::xfsprogs::M_; @@ -53,6 +54,12 @@ impl Cli { .value_parser(value_parser!(PathBuf)) .required_unless_present("version"), ) + .arg( + Arg::new("json") + .long("json") + .help(M_("Use the JSON kernel interface instead of C")) + .action(ArgAction::SetTrue), + ) .get_matches()) } } @@ -63,6 +70,7 @@ struct App { debug: bool, log: bool, everything: bool, + json: bool, path: PathBuf, } @@ -90,11 +98,20 @@ impl App { fn main(&self) -> Result<ExitCode> { let fp = File::open(&self.path).with_context(|| M_("Opening 
filesystem failed"))?; - let hmon = CStructMonitor::try_new(fp, &self.path, self.everything) - .with_context(|| M_("Opening health monitor file"))?; + if self.json { + let hmon = JsonMonitor::try_new(fp, &self.path, self.everything, self.debug) + .with_context(|| M_("Opening json health monitor file"))?; - for raw_event in hmon { - self.process_event(raw_event.cook()); + for raw_event in hmon { + self.process_event(raw_event.cook()); + } + } else { + let hmon = CStructMonitor::try_new(fp, &self.path, self.everything) + .with_context(|| M_("Opening health monitor file"))?; + + for raw_event in hmon { + self.process_event(raw_event.cook()); + } } Ok(ExitCode::SUCCESS) @@ -108,6 +125,7 @@ impl From<Cli> for App { log: cli.0.get_flag("log"), everything: cli.0.get_flag("everything"), path: cli.0.get_one::<PathBuf>("path").unwrap().to_path_buf(), + json: cli.0.get_flag("json"), } } } diff --git a/healer/src/xfs_types.rs b/healer/src/xfs_types.rs index 5ce1d73d8e9342..37ca1b3e5a3cc0 100644 --- a/healer/src/xfs_types.rs +++ b/healer/src/xfs_types.rs @@ -9,6 +9,7 @@ use crate::xfsprogs::M_; use anyhow::{Error, Result}; use std::fmt::Display; use std::fmt::Formatter; +use strum_macros::EnumString; /// Allocation group number on the data device #[derive(Debug)] @@ -55,10 +56,15 @@ impl TryFrom<u64> for XfsRgNumber { } /// Disk devices -#[derive(Debug)] +#[derive(Debug, EnumString)] pub enum XfsDevice { + #[strum(serialize = "datadev")] Data, + + #[strum(serialize = "logdev")] Log, + + #[strum(serialize = "rtdev")] Realtime, } diff --git a/m4/package_rust.m4 b/m4/package_rust.m4 index 4b426f968c263c..192d84651df909 100644 --- a/m4/package_rust.m4 +++ b/m4/package_rust.m4 @@ -131,6 +131,9 @@ anyhow = { version = "1.0.69" } $gettext_dep nix = { version = "0", features = [["ioctl"]] } # 0.26.1 enumset = { version = "1.0.12" } +strum = { version = "0" } # 0.19.2 +strum_macros = { version = "0" } # 0.19.2 +serde_json = { version = "1.0.87" } ], [yes], [no]) ]) ^ permalink raw reply 
related [flat|nested] 80+ messages in thread
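The `JsonMonitor` iterator in this patch frames events by accumulating lines until it sees a closing brace on a line by itself, then hands the joined string to serde_json. The framing step alone can be sketched with just the standard library; the input literal is made up for illustration, whereas real events arrive on the health monitor fd:

```rust
// Group a stream of pretty-printed JSON lines into one string per object,
// mirroring JsonMonitor::next: lines accumulate until a lone "}" is seen.
fn next_object<I: Iterator<Item = String>>(lines: &mut I) -> Option<String> {
    let mut parts: Vec<String> = Vec::new();
    for line in lines {
        let done = line == "}";
        parts.push(line);
        if done {
            // A complete object; the caller can feed this to a JSON parser.
            return Some(parts.join(""));
        }
    }
    None // ran out of input, either cleanly or mid-object
}

fn main() {
    let text = "{\n\"domain\": \"mount\",\n\"type\": \"running\"\n}\n";
    let mut lines = text.lines().map(String::from);
    let obj = next_object(&mut lines).expect("one complete event");
    assert_eq!(obj, "{\"domain\": \"mount\",\"type\": \"running\"}");
    assert!(next_object(&mut lines).is_none());
    println!("ok");
}
```

This framing assumes the kernel pretty-prints one object per event and never emits a bare `}` line inside a string value, which is the contract the iterator above relies on.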
* [PATCH 07/19] xfs_healer: create a weak file handle so we don't pin the mount 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (5 preceding siblings ...) 2025-10-23 0:13 ` [PATCH 06/19] xfs_healer: read json " Darrick J. Wong @ 2025-10-23 0:14 ` Darrick J. Wong 2025-10-23 0:14 ` [PATCH 08/19] xfs_healer: fix broken filesystem metadata Darrick J. Wong ` (12 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:14 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a custom file handle object that allows us to maintain a "soft" reference to a mounted filesystem. The purpose of this is to avoid pinning the mount while xfs_healer runs: no fd is left open, yet we retain the ability to reconnect to the filesystem later so that we can look up paths for reporting and run repairs. This means that the filesystem must still be available at the same path at reconnect time, which may result in the program exiting if mount --move is used. Note that we open-code the XFS_IOC_FD_TO_HANDLE call to avoid overcomplicating the cargo configuration to link with ../libhandle/libhandle.la as a static import for a single function call. Signed-off-by: "Darrick J. 
Wong" <djwong@kernel.org> --- healer/Makefile | 1 healer/src/lib.rs | 1 healer/src/main.rs | 3 + healer/src/weakhandle.rs | 101 ++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 106 insertions(+) create mode 100644 healer/src/weakhandle.rs diff --git a/healer/Makefile b/healer/Makefile index 515238982aad24..75227820a51e79 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -20,6 +20,7 @@ RUSTFILES = \ src/lib.rs \ src/main.rs \ src/util.rs \ + src/weakhandle.rs \ src/xfs_fs.rs \ src/xfsprogs.rs \ src/xfs_types.rs \ diff --git a/healer/src/lib.rs b/healer/src/lib.rs index e9b4795be00904..bd39f4d47b5068 100644 --- a/healer/src/lib.rs +++ b/healer/src/lib.rs @@ -9,3 +9,4 @@ pub mod xfs_fs; pub mod xfs_types; pub mod util; pub mod healthmon; +pub mod weakhandle; diff --git a/healer/src/main.rs b/healer/src/main.rs index 456dc44289d534..24281ac7f1eeea 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -12,6 +12,7 @@ use xfs_healer::healthmon::cstruct::CStructMonitor; use xfs_healer::healthmon::event::XfsHealthEvent; use xfs_healer::healthmon::json::JsonMonitor; use xfs_healer::printlogln; +use xfs_healer::weakhandle::WeakHandle; use xfs_healer::xfsprogs; use xfs_healer::xfsprogs::M_; @@ -97,6 +98,8 @@ impl App { /// Main app method fn main(&self) -> Result<ExitCode> { let fp = File::open(&self.path).with_context(|| M_("Opening filesystem failed"))?; + let _fh = WeakHandle::try_new(&fp, &self.path) + .with_context(|| M_("Configuring filesystem handle"))?; if self.json { let hmon = JsonMonitor::try_new(fp, &self.path, self.everything, self.debug) diff --git a/healer/src/weakhandle.rs b/healer/src/weakhandle.rs new file mode 100644 index 00000000000000..f532c530d4ff5e --- /dev/null +++ b/healer/src/weakhandle.rs @@ -0,0 +1,101 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +use crate::baddata; +use crate::xfs_fs::xfs_fsop_handlereq; +use crate::xfs_fs::xfs_handle; +use crate::xfsprogs::M_; +use anyhow::{Error, Result}; +use nix::ioctl_readwrite; +use nix::libc::O_LARGEFILE; +use std::fs::File; +use std::io::ErrorKind; +use std::os::fd::AsRawFd; +use std::os::raw::c_void; +use std::path::Path; + +ioctl_readwrite!(xfs_ioc_fd_to_handle, 'X', 106, xfs_fsop_handlereq); + +/* just pick a value we know is more than big enough */ +const MAXHANSIZ: usize = 64; + +impl PartialEq for xfs_handle { + fn eq(&self, other: &Self) -> bool { + // SAFETY: accessing an arm of a union that exists only to force memory alignment + unsafe { self.ha_u._ha_fsid == other.ha_u._ha_fsid && self.ha_fid == other.ha_fid } + } +} + +impl TryFrom<&File> for xfs_handle { + type Error = Error; + + /// Create an xfs_handle for an open file + fn try_from(fp: &File) -> Result<xfs_handle> { + assert!(MAXHANSIZ >= std::mem::size_of::<xfs_handle>()); + + let mut value: Vec<u8> = vec![0; MAXHANSIZ]; + let mut hreq: xfs_fsop_handlereq = Default::default(); + let mut hlen: u32 = 0; + + hreq.fd = fp.as_raw_fd() as u32; + hreq.oflags = O_LARGEFILE as u32; + hreq.ohandle = value.as_mut_ptr() as *mut c_void; + hreq.ohandlen = &mut hlen; + + // SAFETY: Trusting the kernel not to corrupt hreq, value, or anything else. This is wildly + // incorrect because the kernel interface does not require userspace to pass in the size of + // the object ohandle, so it writes blindly to *ohandle. + unsafe { + xfs_ioc_fd_to_handle(fp.as_raw_fd(), &mut hreq)?; + } + if hlen as usize != std::mem::size_of::<xfs_handle>() { + return Err(baddata!(M_("Bad file handle size"), xfs_handle, hlen).into()); + } + + // SAFETY: We asserted above that value is large enough to store an xfs_handle, so we can + // cast and struct copy here. 
+ unsafe { + let hanp: *const xfs_handle = value.as_ptr() as *const xfs_handle; + let ret: xfs_handle = *hanp; + Ok(ret) + } + } +} + +/// Filesystem handle that can be disconnected from any open files +pub struct WeakHandle<'a> { + /// path to the filesystem mountpoint + mountpoint: &'a Path, + + /// Filesystem handle + handle: xfs_handle, +} + +impl WeakHandle<'_> { + /// Try to reopen the filesystem from which we got the handle. + pub fn reopen(&self) -> Result<File> { + let fp = File::open(self.mountpoint)?; + + if xfs_handle::try_from(&fp)? != self.handle { + let s = format!( + "{} {}: {}", + M_("reopening"), + self.mountpoint.display(), + M_("Stale file handle") + ); + return Err(std::io::Error::new(ErrorKind::Other, s).into()); + } + + Ok(fp) + } + + /// Create a soft handle from an open file descriptor and its mount point + pub fn try_new<'a>(fp: &File, mountpoint: &'a Path) -> Result<WeakHandle<'a>> { + Ok(WeakHandle { + mountpoint, + handle: xfs_handle::try_from(fp)?, + }) + } +} ^ permalink raw reply related [flat|nested] 80+ messages in thread
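The weak-handle pattern above -- capture an identity token, hold no file descriptor, then reopen and verify before use -- can be sketched with plain VFS primitives. The following is a hedged analogy, not the patch's implementation: it substitutes the (st_dev, st_ino) pair from stat(2) for the XFS-specific xfs_handle, so it shows the reconnect-and-verify flow without needing an XFS filesystem or the FD_TO_HANDLE ioctl.

```rust
use std::fs::File;
use std::io::{Error, ErrorKind, Result};
use std::os::unix::fs::MetadataExt;
use std::path::{Path, PathBuf};

/// Soft reference to a directory: remembers its identity but holds no open
/// fd, so it does not pin the mount the way a long-lived File would.
struct SoftRef {
    path: PathBuf,
    dev: u64,
    ino: u64,
}

impl SoftRef {
    /// Capture the identity of `path`; no fd survives this call.
    fn try_new(path: &Path) -> Result<SoftRef> {
        let md = std::fs::metadata(path)?;
        Ok(SoftRef {
            path: path.to_path_buf(),
            dev: md.dev(),
            ino: md.ino(),
        })
    }

    /// Reopen the path and check it is still the same object; a different
    /// (dev, ino) pair means the reference went stale, e.g. after an
    /// unmount or a mount --move.
    fn reopen(&self) -> Result<File> {
        let fp = File::open(&self.path)?;
        let md = fp.metadata()?;
        if (md.dev(), md.ino()) != (self.dev, self.ino) {
            return Err(Error::new(ErrorKind::Other, "stale reference"));
        }
        Ok(fp)
    }
}

fn main() -> Result<()> {
    let soft = SoftRef::try_new(Path::new("/"))?;
    // Nothing is held open between try_new() and reopen().
    let _fp = soft.reopen()?;
    println!("reopened ok");
    Ok(())
}
```

The real WeakHandle compares xfs_handle contents rather than (dev, ino), which is stronger: it also catches a different filesystem freshly mounted at the same path with a recycled device number.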
* [PATCH 08/19] xfs_healer: fix broken filesystem metadata 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (6 preceding siblings ...) 2025-10-23 0:14 ` [PATCH 07/19] xfs_healer: create a weak file handle so we don't pin the mount Darrick J. Wong @ 2025-10-23 0:14 ` Darrick J. Wong 2025-10-23 0:14 ` [PATCH 09/19] xfs_healer: check for fs features needed for effective repairs Darrick J. Wong ` (11 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:14 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Use the soft file handle we created in the previous patch to schedule repairs when possible. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/Makefile | 1 healer/src/healthmon/event.rs | 29 ++++ healer/src/healthmon/fs.rs | 6 + healer/src/healthmon/groups.rs | 10 + healer/src/healthmon/inodes.rs | 6 + healer/src/lib.rs | 1 healer/src/main.rs | 21 ++- healer/src/repair.rs | 302 ++++++++++++++++++++++++++++++++++++++++ healer/src/weakhandle.rs | 13 ++ healer/src/xfs_types.rs | 56 +++++++ 10 files changed, 435 insertions(+), 10 deletions(-) create mode 100644 healer/src/repair.rs diff --git a/healer/Makefile b/healer/Makefile index 75227820a51e79..05ea73b8163a49 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -19,6 +19,7 @@ HFILES = \ RUSTFILES = \ src/lib.rs \ src/main.rs \ + src/repair.rs \ src/util.rs \ src/weakhandle.rs \ src/xfs_fs.rs \ diff --git a/healer/src/healthmon/event.rs b/healer/src/healthmon/event.rs index b7a0effab94a3c..0fcd34dee38e4c 100644 --- a/healer/src/healthmon/event.rs +++ b/healer/src/healthmon/event.rs @@ -4,6 +4,7 @@ * Author: Darrick J. 
Wong <djwong@kernel.org> */ use crate::display_for_enum; +use crate::repair::Repair; use crate::xfsprogs::M_; use strum_macros::EnumString; @@ -16,10 +17,36 @@ pub trait XfsHealthEvent { /// Format this event as something we can display fn format(&self) -> String; + + /// Generate the inputs to a kernel scrub ioctl + fn schedule_repairs(&self) -> Vec<Repair> { + vec![] + } } +/// Boilerplate implementation of a schedule_repairs function. Pass a lambda +/// that generates a Repair object from &self and sm_type. +#[macro_export] +macro_rules! schedule_repairs { + ($event_type:ty , $lambda: expr ) => { + fn schedule_repairs(&self) -> Vec<$crate::repair::Repair> { + if self.status != $crate::healthmon::event::XfsHealthStatus::Sick { + return vec![]; + } + let mut ret = Vec::new(); + for f in self.metadata { + if let Some(sm_type) = f.to_scrub() { + ret.push($lambda(self, sm_type)); + } + } + ret + } + }; +} +pub(crate) use schedule_repairs; + /// Health status for metadata events -#[derive(Debug, EnumString)] +#[derive(PartialEq, Debug, EnumString)] #[strum(serialize_all = "lowercase")] pub enum XfsHealthStatus { /// Problems have been observed at runtime diff --git a/healer/src/healthmon/fs.rs b/healer/src/healthmon/fs.rs index f216867acdf71a..7a2307d29e7abd 100644 --- a/healer/src/healthmon/fs.rs +++ b/healer/src/healthmon/fs.rs @@ -4,8 +4,10 @@ * Author: Darrick J. 
Wong <djwong@kernel.org> */ use crate::display_for_enum; +use crate::healthmon::event::schedule_repairs; use crate::healthmon::event::XfsHealthEvent; use crate::healthmon::event::XfsHealthStatus; +use crate::repair::Repair; use crate::util::format_set; use crate::xfs_types::XfsPhysRange; use crate::xfsprogs::M_; @@ -64,6 +66,10 @@ impl XfsHealthEvent for XfsWholeFsEvent { self.status ) } + + schedule_repairs!(XfsWholeFsEvent, |_: &XfsWholeFsEvent, sm_type| { + Repair::from_whole_fs(sm_type) + }); } /// Reasons for a filesystem shutdown event diff --git a/healer/src/healthmon/groups.rs b/healer/src/healthmon/groups.rs index 4384de50b4c63f..60a44defb5d307 100644 --- a/healer/src/healthmon/groups.rs +++ b/healer/src/healthmon/groups.rs @@ -4,8 +4,10 @@ * Author: Darrick J. Wong <djwong@kernel.org> */ use crate::display_for_enum; +use crate::healthmon::event::schedule_repairs; use crate::healthmon::event::XfsHealthEvent; use crate::healthmon::event::XfsHealthStatus; +use crate::repair::Repair; use crate::util::format_set; use crate::xfs_types::{XfsAgNumber, XfsRgNumber}; use crate::xfsprogs::M_; @@ -81,6 +83,10 @@ impl XfsHealthEvent for XfsPeragEvent { self.status ) } + + schedule_repairs!(XfsPeragEvent, |s: &XfsPeragEvent, sm_type| { + Repair::from_perag(sm_type, s.group) + }); } /// Metadata types for an allocation group on the realtime device @@ -139,4 +145,8 @@ impl XfsHealthEvent for XfsRtgroupEvent { self.status ) } + + schedule_repairs!(XfsRtgroupEvent, |s: &XfsRtgroupEvent, sm_type| { + Repair::from_rtgroup(sm_type, s.group) + }); } diff --git a/healer/src/healthmon/inodes.rs b/healer/src/healthmon/inodes.rs index 5775f9ffa69b6b..a4324c7d834b42 100644 --- a/healer/src/healthmon/inodes.rs +++ b/healer/src/healthmon/inodes.rs @@ -4,8 +4,10 @@ * Author: Darrick J. 
Wong <djwong@kernel.org> */ use crate::display_for_enum; +use crate::healthmon::event::schedule_repairs; use crate::healthmon::event::XfsHealthEvent; use crate::healthmon::event::XfsHealthStatus; +use crate::repair::Repair; use crate::util::format_set; use crate::xfs_types::{XfsFid, XfsFileRange}; use crate::xfsprogs::M_; @@ -72,6 +74,10 @@ impl XfsHealthEvent for XfsInodeEvent { fn format(&self) -> String { format!("{} {} {}", self.fid, format_set(self.metadata), self.status) } + + schedule_repairs!(XfsInodeEvent, |s: &XfsInodeEvent, sm_type| { + Repair::from_file(sm_type, s.fid) + }); } /// File I/O types diff --git a/healer/src/lib.rs b/healer/src/lib.rs index bd39f4d47b5068..f08f9a65ced674 100644 --- a/healer/src/lib.rs +++ b/healer/src/lib.rs @@ -10,3 +10,4 @@ pub mod xfs_types; pub mod util; pub mod healthmon; pub mod weakhandle; +pub mod repair; diff --git a/healer/src/main.rs b/healer/src/main.rs index 24281ac7f1eeea..b2a69c388bd8ef 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -61,6 +61,12 @@ impl Cli { .help(M_("Use the JSON kernel interface instead of C")) .action(ArgAction::SetTrue), ) + .arg( + Arg::new("repair") + .long("repair") + .help(M_("Always repair corrupt metadata")) + .action(ArgAction::SetTrue), + ) .get_matches()) } } @@ -72,6 +78,7 @@ struct App { log: bool, everything: bool, json: bool, + repair: bool, path: PathBuf, } @@ -82,7 +89,7 @@ impl App { } /// Handle a health event that has been decoded into real objects - fn process_event(&self, cooked: Result<Box<dyn XfsHealthEvent>>) { + fn process_event(&self, fh: &WeakHandle, cooked: Result<Box<dyn XfsHealthEvent>>) { match cooked { Err(e) => { eprintln!("{}: {:#}", self.path.display(), e) @@ -91,6 +98,11 @@ impl App { if self.log || event.must_log() { printlogln!("{}: {}", self.path.display(), event.format()); } + if self.repair { + for mut repair in event.schedule_repairs() { + repair.perform(fh) + } + } } } } @@ -98,7 +110,7 @@ impl App { /// Main app method fn main(&self) 
-> Result<ExitCode> { let fp = File::open(&self.path).with_context(|| M_("Opening filesystem failed"))?; - let _fh = WeakHandle::try_new(&fp, &self.path) + let fh = WeakHandle::try_new(&fp, &self.path) .with_context(|| M_("Configuring filesystem handle"))?; if self.json { @@ -106,14 +118,14 @@ impl App { .with_context(|| M_("Opening js health monitor file"))?; for raw_event in hmon { - self.process_event(raw_event.cook()); + self.process_event(&fh, raw_event.cook()); } } else { let hmon = CStructMonitor::try_new(fp, &self.path, self.everything) .with_context(|| M_("Opening health monitor file"))?; for raw_event in hmon { - self.process_event(raw_event.cook()); + self.process_event(&fh, raw_event.cook()); } } @@ -129,6 +141,7 @@ impl From<Cli> for App { everything: cli.0.get_flag("everything"), path: cli.0.get_one::<PathBuf>("path").unwrap().to_path_buf(), json: cli.0.get_flag("json"), + repair: cli.0.get_flag("repair"), } } } diff --git a/healer/src/repair.rs b/healer/src/repair.rs new file mode 100644 index 00000000000000..8b9a665d1bcc36 --- /dev/null +++ b/healer/src/repair.rs @@ -0,0 +1,302 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +use crate::display_for_enum; +use crate::healthmon::fs::XfsWholeFsMetadata; +use crate::healthmon::groups::{XfsPeragMetadata, XfsRtgroupMetadata}; +use crate::healthmon::inodes::XfsInodeMetadata; +use crate::printlogln; +use crate::weakhandle::WeakHandle; +use crate::xfs_fs; +use crate::xfs_fs::xfs_scrub_metadata; +use crate::xfs_types::{XfsAgNumber, XfsFid, XfsRgNumber}; +use crate::xfsprogs::M_; +use anyhow::{Context, Result}; +use nix::ioctl_readwrite; +use std::os::fd::AsRawFd; + +ioctl_readwrite!(xfs_ioc_scrub_metadata, 'X', 60, xfs_scrub_metadata); + +/// Classification information for later reporting +#[derive(Debug)] +enum RepairGroup { + WholeFs, + PerAg, + RtGroup, + File, +} + +/// What happened when we tried to repair something? +#[derive(Debug)] +enum RepairOutcome { + Queued, + Success, + Unnecessary, + MightBeOk, + Failed, +} + +display_for_enum!(RepairOutcome, { + Queued => M_("Repair queued."), + Failed => M_("Repair unsuccessful; offline repair required."), + MightBeOk => M_("Seems correct but cross-referencing failed; offline repair recommended."), + Unnecessary => M_("No modification needed."), + Success => M_("Repairs successful."), +}); + +/// Kernel scrub type code +#[derive(Debug)] +pub struct XfsScrubType(pub u32); + +/// Boilerplate to stamp out functions to convert json array to the given enum +/// type; or return an error with the given message. +macro_rules! 
metadata_to_scrub_type { + ($enum_type:ty , { $($a:ident => $b:ident,)+ } ) => { + impl $enum_type { + /// Convert to scrub type + #[allow(unreachable_patterns)] + pub fn to_scrub(self) -> Option<XfsScrubType> { + match self { + $(<$enum_type>::$a => Some($crate::repair::XfsScrubType($crate::xfs_fs::$b)),)+ + _ => None, + } + } + } + }; +} + +metadata_to_scrub_type!(XfsPeragMetadata, { + Agf => XFS_SCRUB_TYPE_AGF, + Agfl => XFS_SCRUB_TYPE_AGFL, + Agi => XFS_SCRUB_TYPE_AGI, + Bnobt => XFS_SCRUB_TYPE_BNOBT, + Cntbt => XFS_SCRUB_TYPE_CNTBT, + Finobt => XFS_SCRUB_TYPE_FINOBT, + Inobt => XFS_SCRUB_TYPE_INOBT, + Refcountbt => XFS_SCRUB_TYPE_REFCNTBT, + Rmapbt => XFS_SCRUB_TYPE_RMAPBT, + Super => XFS_SCRUB_TYPE_SB, +}); + +metadata_to_scrub_type!(XfsRtgroupMetadata, { + Bitmap => XFS_SCRUB_TYPE_RTBITMAP, + Summary => XFS_SCRUB_TYPE_RTSUM, + Refcountbt => XFS_SCRUB_TYPE_RTREFCBT, + Rmapbt => XFS_SCRUB_TYPE_RTRMAPBT, + Super => XFS_SCRUB_TYPE_RGSUPER, +}); + +metadata_to_scrub_type!(XfsInodeMetadata, { + Bmapbta => XFS_SCRUB_TYPE_BMBTA, + Bmapbtc => XFS_SCRUB_TYPE_BMBTC, + Bmapbtd => XFS_SCRUB_TYPE_BMBTD, + Core => XFS_SCRUB_TYPE_INODE, + Directory => XFS_SCRUB_TYPE_DIR, + Dirtree => XFS_SCRUB_TYPE_DIRTREE, + Parent => XFS_SCRUB_TYPE_PARENT, + Symlink => XFS_SCRUB_TYPE_SYMLINK, + Xattr => XFS_SCRUB_TYPE_XATTR, +}); + +metadata_to_scrub_type!(XfsWholeFsMetadata, { + FsCounters => XFS_SCRUB_TYPE_FSCOUNTERS, + GrpQuota => XFS_SCRUB_TYPE_GQUOTA, + NLinks => XFS_SCRUB_TYPE_NLINKS, + PrjQuota => XFS_SCRUB_TYPE_PQUOTA, + QuotaCheck => XFS_SCRUB_TYPE_QUOTACHECK, + UsrQuota => XFS_SCRUB_TYPE_UQUOTA, +}); + +/// Boilerplate to stamp out functions to print the scrub type newtype as a pretty string. +macro_rules! 
display_for_newtype { + ($newtype:ty , { $($a:ident => $b:expr,)+ } ) => { + impl std::fmt::Display for $newtype { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + write!(f, "{}", match self.0 { + $(xfs_fs::$a => $b,)+ + _ => $crate::xfsprogs::M_("unknown") + }) + } + } + }; +} + +display_for_newtype!(XfsScrubType, { + XFS_SCRUB_TYPE_PROBE => M_("probe"), + XFS_SCRUB_TYPE_SB => M_("sb"), + XFS_SCRUB_TYPE_AGF => M_("agf"), + XFS_SCRUB_TYPE_AGFL => M_("agfl"), + XFS_SCRUB_TYPE_AGI => M_("agi"), + XFS_SCRUB_TYPE_BNOBT => M_("bnobt"), + XFS_SCRUB_TYPE_CNTBT => M_("cntbt"), + XFS_SCRUB_TYPE_INOBT => M_("inobt"), + XFS_SCRUB_TYPE_FINOBT => M_("finobt"), + XFS_SCRUB_TYPE_RMAPBT => M_("rmapbt"), + XFS_SCRUB_TYPE_REFCNTBT => M_("refcountbt"), + XFS_SCRUB_TYPE_INODE => M_("inode"), + XFS_SCRUB_TYPE_BMBTD => M_("bmapbtd"), + XFS_SCRUB_TYPE_BMBTA => M_("bmapbta"), + XFS_SCRUB_TYPE_BMBTC => M_("bmapbtc"), + XFS_SCRUB_TYPE_DIR => M_("directory"), + XFS_SCRUB_TYPE_XATTR => M_("xattr"), + XFS_SCRUB_TYPE_SYMLINK => M_("symlink"), + XFS_SCRUB_TYPE_PARENT => M_("parent"), + XFS_SCRUB_TYPE_RTBITMAP => M_("rtbitmap"), + XFS_SCRUB_TYPE_RTSUM => M_("rtsummary"), + XFS_SCRUB_TYPE_UQUOTA => M_("usrquota"), + XFS_SCRUB_TYPE_GQUOTA => M_("grpquota"), + XFS_SCRUB_TYPE_PQUOTA => M_("prjquota"), + XFS_SCRUB_TYPE_FSCOUNTERS => M_("fscounters"), + XFS_SCRUB_TYPE_QUOTACHECK => M_("quotacheck"), + XFS_SCRUB_TYPE_NLINKS => M_("nlinks"), + XFS_SCRUB_TYPE_HEALTHY => M_("healthy"), + XFS_SCRUB_TYPE_DIRTREE => M_("dirtree"), + XFS_SCRUB_TYPE_METAPATH => M_("metapath"), +}); + +/// Information about a repair +pub struct Repair { + /// Actual details of the repair + detail: xfs_scrub_metadata, + + /// What group does this belong to? + group: RepairGroup, + + /// What scrub type did we actually pick? + scrub_type: XfsScrubType, + + /// What happened when repairs were tried? 
+ outcome: RepairOutcome, +} + +impl Repair { + /// Schedule a full-filesystem metadata repair + pub fn from_whole_fs(t: XfsScrubType) -> Repair { + Repair { + group: RepairGroup::WholeFs, + detail: xfs_scrub_metadata { + sm_type: t.0, + sm_flags: xfs_fs::XFS_SCRUB_IFLAG_REPAIR, + ..Default::default() + }, + outcome: RepairOutcome::Queued, + scrub_type: t, + } + } + + /// Schedule a per-AG repair + pub fn from_perag(t: XfsScrubType, group: XfsAgNumber) -> Repair { + Repair { + group: RepairGroup::PerAg, + detail: xfs_scrub_metadata { + sm_type: t.0, + sm_flags: xfs_fs::XFS_SCRUB_IFLAG_REPAIR, + sm_agno: group.into(), + ..Default::default() + }, + outcome: RepairOutcome::Queued, + scrub_type: t, + } + } + + /// Schedule a rtgroup repair + pub fn from_rtgroup(t: XfsScrubType, group: XfsRgNumber) -> Repair { + Repair { + group: RepairGroup::RtGroup, + detail: xfs_scrub_metadata { + sm_type: t.0, + sm_flags: xfs_fs::XFS_SCRUB_IFLAG_REPAIR, + sm_agno: group.into(), + ..Default::default() + }, + outcome: RepairOutcome::Queued, + scrub_type: t, + } + } + + /// Schedule a file metadata repair + pub fn from_file(t: XfsScrubType, fid: XfsFid) -> Repair { + Repair { + group: RepairGroup::File, + detail: xfs_scrub_metadata { + sm_type: t.0, + sm_flags: xfs_fs::XFS_SCRUB_IFLAG_REPAIR, + sm_ino: fid.ino.into(), + sm_gen: fid.gen.into(), + ..Default::default() + }, + outcome: RepairOutcome::Queued, + scrub_type: t, + } + } + + /// Decode what happened when we tried to repair + fn outcome(detail: &xfs_scrub_metadata) -> RepairOutcome { + const REPAIR_FAILED: u32 = + xfs_fs::XFS_SCRUB_OFLAG_CORRUPT | xfs_fs::XFS_SCRUB_OFLAG_INCOMPLETE; + + if detail.sm_flags & REPAIR_FAILED != 0 { + RepairOutcome::Failed + } else if detail.sm_flags & xfs_fs::XFS_SCRUB_OFLAG_XFAIL != 0 { + RepairOutcome::MightBeOk + } else if detail.sm_flags & xfs_fs::XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED != 0 { + RepairOutcome::Unnecessary + } else { + RepairOutcome::Success + } + } + + /// Summarize this repair for 
reporting + fn summary(&self) -> String { + match self.group { + RepairGroup::WholeFs => { + format!("{} {}", M_("Repair of"), self.scrub_type) + } + RepairGroup::PerAg => { + let agno: XfsAgNumber = self.detail.into(); + + format!("{} {} {}", M_("Repair of"), agno, self.scrub_type) + } + RepairGroup::RtGroup => { + let rgno: XfsRgNumber = self.detail.into(); + + format!("{} {} {}", M_("Repair of"), rgno, self.scrub_type) + } + RepairGroup::File => { + let fid: XfsFid = self.detail.into(); + + format!("{} {} {}", M_("Repair of"), fid, self.scrub_type) + } + } + } + + /// Call the kernel to repair things + fn repair(&mut self, fh: &WeakHandle) -> Result<bool> { + let fp = fh + .reopen() + .with_context(|| M_("Reopening filesystem to repair metadata"))?; + + // SAFETY: Trusting the kernel not to corrupt memory. + unsafe { + xfs_ioc_scrub_metadata(fp.as_raw_fd(), &mut self.detail) + .with_context(|| self.summary().to_string())?; + } + + self.outcome = Repair::outcome(&self.detail); + Ok(true) + } + + /// Try to repair something, or log whatever went wrong + pub fn perform(&mut self, fh: &WeakHandle) { + match self.repair(fh) { + Err(e) => { + eprintln!("{}: {:#}", fh.mountpoint(), e); + } + _ => { + printlogln!("{}: {}: {}", fh.mountpoint(), self.summary(), self.outcome); + } + }; + } +} diff --git a/healer/src/weakhandle.rs b/healer/src/weakhandle.rs index f532c530d4ff5e..ccac5d86d3be41 100644 --- a/healer/src/weakhandle.rs +++ b/healer/src/weakhandle.rs @@ -10,6 +10,8 @@ use crate::xfsprogs::M_; use anyhow::{Error, Result}; use nix::ioctl_readwrite; use nix::libc::O_LARGEFILE; +use std::fmt::Display; +use std::fmt::Formatter; use std::fs::File; use std::io::ErrorKind; use std::os::fd::AsRawFd; @@ -91,6 +93,11 @@ impl WeakHandle<'_> { Ok(fp) } + /// Report mountpoint in a displayable manner + pub fn mountpoint(&self) -> String { + self.mountpoint.display().to_string() + } + /// Create a soft handle from an open file descriptor and its mount point pub fn 
try_new<'a>(fp: &File, mountpoint: &'a Path) -> Result<WeakHandle<'a>> { Ok(WeakHandle { @@ -99,3 +106,9 @@ impl WeakHandle<'_> { }) } } + +impl Display for WeakHandle<'_> { + fn fmt(&self, f: &mut Formatter) -> std::fmt::Result { + write!(f, "{}", self.mountpoint.display()) + } +} diff --git a/healer/src/xfs_types.rs b/healer/src/xfs_types.rs index 37ca1b3e5a3cc0..fee284b93264f5 100644 --- a/healer/src/xfs_types.rs +++ b/healer/src/xfs_types.rs @@ -5,6 +5,7 @@ */ use crate::baddata; use crate::display_for_enum; +use crate::xfs_fs::xfs_scrub_metadata; use crate::xfsprogs::M_; use anyhow::{Error, Result}; use std::fmt::Display; @@ -12,7 +13,7 @@ use std::fmt::Formatter; use strum_macros::EnumString; /// Allocation group number on the data device -#[derive(Debug)] +#[derive(Debug, Copy, Clone)] pub struct XfsAgNumber(u32); impl TryFrom<u64> for XfsAgNumber { @@ -33,8 +34,20 @@ impl Display for XfsAgNumber { } } +impl From<XfsAgNumber> for u32 { + fn from(val: XfsAgNumber) -> Self { + val.0 + } +} + +impl From<xfs_scrub_metadata> for XfsAgNumber { + fn from(val: xfs_scrub_metadata) -> Self { + XfsAgNumber(val.sm_agno) + } +} + /// Realtime group number on the realtime device -#[derive(Debug)] +#[derive(Debug, Copy, Clone)] pub struct XfsRgNumber(u32); impl Display for XfsRgNumber { @@ -55,6 +68,18 @@ impl TryFrom<u64> for XfsRgNumber { } } +impl From<XfsRgNumber> for u32 { + fn from(val: XfsRgNumber) -> Self { + val.0 + } +} + +impl From<xfs_scrub_metadata> for XfsRgNumber { + fn from(val: xfs_scrub_metadata) -> Self { + XfsRgNumber(val.sm_agno) + } +} + /// Disk devices #[derive(Debug, EnumString)] pub enum XfsDevice { @@ -126,7 +151,7 @@ impl Display for XfsPhysRange { } /// Inode number -#[derive(Debug)] +#[derive(Debug, Copy, Clone)] pub struct XfsIno(u64); impl Display for XfsIno { @@ -147,8 +172,14 @@ impl TryFrom<u64> for XfsIno { } } +impl From<XfsIno> for u64 { + fn from(val: XfsIno) -> Self { + val.0 + } +} + /// Inode generation number -#[derive(Debug)] 
+#[derive(Debug, Copy, Clone)] pub struct XfsIgeneration(u32); impl Display for XfsIgeneration { @@ -169,8 +200,14 @@ impl TryFrom<u64> for XfsIgeneration { } } +impl From<XfsIgeneration> for u32 { + fn from(val: XfsIgeneration) -> Self { + val.0 + } +} + /// Miniature FID for a handle -#[derive(Debug)] +#[derive(Debug, Copy, Clone)] pub struct XfsFid { /// Inode number pub ino: XfsIno, @@ -185,6 +222,15 @@ impl Display for XfsFid { } } +impl From<xfs_scrub_metadata> for XfsFid { + fn from(val: xfs_scrub_metadata) -> Self { + XfsFid { + ino: XfsIno(val.sm_ino), + gen: XfsIgeneration(val.sm_gen), + } + } +} + /// File position #[derive(Debug)] pub struct XfsPos(u64); ^ permalink raw reply related [flat|nested] 80+ messages in thread
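The outcome decoding in Repair::outcome() is pure bit arithmetic on sm_flags, so it can be exercised in isolation. Here is a minimal, self-contained sketch of that decode; the XFS_SCRUB_OFLAG_* bit positions are written from memory of the kernel uapi header and should be treated as an assumption to check against your kernel's xfs_fs.h rather than authoritative values.

```rust
// Output flags set by the scrub ioctl. Bit positions assumed to match the
// kernel's uapi xfs_fs.h -- verify against your headers (or bindgen output)
// before relying on them.
const XFS_SCRUB_OFLAG_CORRUPT: u32 = 1 << 0;
const XFS_SCRUB_OFLAG_XFAIL: u32 = 1 << 2;
const XFS_SCRUB_OFLAG_INCOMPLETE: u32 = 1 << 4;
const XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED: u32 = 1 << 6;

#[derive(Debug, PartialEq)]
enum RepairOutcome {
    Success,
    Unnecessary,
    MightBeOk,
    Failed,
}

/// Mirror of the decode order in the patch: a hard failure beats the softer
/// verdicts, and an empty flags word means the repair went through cleanly.
fn outcome(sm_flags: u32) -> RepairOutcome {
    const REPAIR_FAILED: u32 = XFS_SCRUB_OFLAG_CORRUPT | XFS_SCRUB_OFLAG_INCOMPLETE;

    if sm_flags & REPAIR_FAILED != 0 {
        RepairOutcome::Failed
    } else if sm_flags & XFS_SCRUB_OFLAG_XFAIL != 0 {
        RepairOutcome::MightBeOk
    } else if sm_flags & XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED != 0 {
        RepairOutcome::Unnecessary
    } else {
        RepairOutcome::Success
    }
}

fn main() {
    assert_eq!(outcome(0), RepairOutcome::Success);
    assert_eq!(outcome(XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED), RepairOutcome::Unnecessary);
    // A corrupt result trumps the "no repair needed" bit.
    assert_eq!(
        outcome(XFS_SCRUB_OFLAG_CORRUPT | XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED),
        RepairOutcome::Failed
    );
    assert_eq!(outcome(XFS_SCRUB_OFLAG_XFAIL), RepairOutcome::MightBeOk);
    println!("outcome decode ok");
}
```

Note the ordering: INCOMPLETE is folded into the failure mask because a scrub that could not finish cross-referencing cannot promise anything about the metadata it skipped.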
* [PATCH 09/19] xfs_healer: check for fs features needed for effective repairs 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (7 preceding siblings ...) 2025-10-23 0:14 ` [PATCH 08/19] xfs_healer: fix broken filesystem metadata Darrick J. Wong @ 2025-10-23 0:14 ` Darrick J. Wong 2025-10-23 0:14 ` [PATCH 10/19] xfs_healer: use getparents to look up file names Darrick J. Wong ` (10 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:14 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Online repair relies heavily on back references such as reverse mappings and directory parent pointers to add redundancy to the filesystem. Make the rust program check for these two features and whine a bit if they are missing. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/Makefile | 1 + healer/src/fsgeom.rs | 41 +++++++++++++++++++++++++++++++++++++++++ healer/src/lib.rs | 1 + healer/src/main.rs | 41 +++++++++++++++++++++++++++++++++++++++++ healer/src/repair.rs | 13 +++++++++++++ 5 files changed, 97 insertions(+) create mode 100644 healer/src/fsgeom.rs diff --git a/healer/Makefile b/healer/Makefile index 05ea73b8163a49..03bfd853a193ee 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -17,6 +17,7 @@ HFILES = \ bindgen_xfs_fs.h RUSTFILES = \ + src/fsgeom.rs \ src/lib.rs \ src/main.rs \ src/repair.rs \ diff --git a/healer/src/fsgeom.rs b/healer/src/fsgeom.rs new file mode 100644 index 00000000000000..cb8e8acc107575 --- /dev/null +++ b/healer/src/fsgeom.rs @@ -0,0 +1,41 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +use crate::xfs_fs; +use crate::xfs_fs::xfs_fsop_geom; +use nix::ioctl_read; +use std::fs::File; +use std::io::Error; +use std::io::Result; +use std::os::fd::AsRawFd; + +ioctl_read!(xfs_ioc_fsgeometry, 'X', 126, xfs_fsop_geom); + +impl TryFrom<&File> for xfs_fsop_geom { + type Error = Error; + + /// Retrieve the XFS geometry of an open file. + fn try_from(fp: &File) -> Result<xfs_fsop_geom> { + let mut ret: xfs_fsop_geom = Default::default(); + + // SAFETY: Trusting the kernel not to corrupt memory. + unsafe { + xfs_ioc_fsgeometry(fp.as_raw_fd(), &mut ret)?; + Ok(ret) + } + } +} + +impl xfs_fsop_geom { + /// Does this filesystem have reverse space mappings? + pub fn has_rmapbt(&self) -> bool { + self.flags & xfs_fs::XFS_FSOP_GEOM_FLAGS_RMAPBT != 0 + } + + /// Does this filesystem have parent pointers? + pub fn has_parent(&self) -> bool { + self.flags & xfs_fs::XFS_FSOP_GEOM_FLAGS_PARENT != 0 + } +} diff --git a/healer/src/lib.rs b/healer/src/lib.rs index f08f9a65ced674..0b5735b7183138 100644 --- a/healer/src/lib.rs +++ b/healer/src/lib.rs @@ -11,3 +11,4 @@ pub mod util; pub mod healthmon; pub mod weakhandle; pub mod repair; +pub mod fsgeom; diff --git a/healer/src/main.rs b/healer/src/main.rs index b2a69c388bd8ef..ed118243dd911b 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -12,7 +12,9 @@ use xfs_healer::healthmon::cstruct::CStructMonitor; use xfs_healer::healthmon::event::XfsHealthEvent; use xfs_healer::healthmon::json::JsonMonitor; use xfs_healer::printlogln; +use xfs_healer::repair::Repair; use xfs_healer::weakhandle::WeakHandle; +use xfs_healer::xfs_fs::xfs_fsop_geom; use xfs_healer::xfsprogs; use xfs_healer::xfsprogs::M_; @@ -107,9 +109,48 @@ impl App { } } + /// Complain if repairs won't be entirely effective. 
+ fn check_repair(&self, fp: &File, fsgeom: &xfs_fsop_geom) -> Option<ExitCode> { + if !Repair::is_supported(fp) { + printlogln!( + "{}: {}", + self.path.display(), + M_("XFS online repair is not supported, exiting") + ); + return Some(ExitCode::FAILURE); + } + + if !fsgeom.has_rmapbt() { + printlogln!( + "{}: {}", + self.path.display(), + M_("XFS online repair is less effective without rmap btrees") + ); + } + if !fsgeom.has_parent() { + printlogln!( + "{}: {}", + self.path.display(), + M_("XFS online repair is less effective without parent pointers") + ); + } + + None + } + /// Main app method fn main(&self) -> Result<ExitCode> { let fp = File::open(&self.path).with_context(|| M_("Opening filesystem failed"))?; + + // Make sure that we can initiate repairs + let fsgeom = + xfs_fsop_geom::try_from(&fp).with_context(|| M_("Querying filesystem geometry"))?; + if self.repair { + if let Some(ret) = self.check_repair(&fp, &fsgeom) { + return Ok(ret); + } + } + let fh = WeakHandle::try_new(&fp, &self.path) .with_context(|| M_("Configuring filesystem handle"))?; diff --git a/healer/src/repair.rs b/healer/src/repair.rs index 8b9a665d1bcc36..1312efd87281dd 100644 --- a/healer/src/repair.rs +++ b/healer/src/repair.rs @@ -15,6 +15,7 @@ use crate::xfs_types::{XfsAgNumber, XfsFid, XfsRgNumber}; use crate::xfsprogs::M_; use anyhow::{Context, Result}; use nix::ioctl_readwrite; +use std::fs::File; use std::os::fd::AsRawFd; ioctl_readwrite!(xfs_ioc_scrub_metadata, 'X', 60, xfs_scrub_metadata); @@ -172,6 +173,18 @@ pub struct Repair { } impl Repair { + /// Determine if repairs are supported by this kernel + pub fn is_supported(fp: &File) -> bool { + let mut detail = xfs_scrub_metadata { + sm_type: xfs_fs::XFS_SCRUB_TYPE_PROBE, + sm_flags: xfs_fs::XFS_SCRUB_IFLAG_REPAIR, + ..Default::default() + }; + + // SAFETY: Trusting the kernel not to corrupt memory. 
+ unsafe { xfs_ioc_scrub_metadata(fp.as_raw_fd(), &mut detail).is_ok() } + } + /// Schedule a full-filesystem metadata repair pub fn from_whole_fs(t: XfsScrubType) -> Repair { Repair { ^ permalink raw reply related [flat|nested] 80+ messages in thread
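The geometry feature test above boils down to masking bits in xfs_fsop_geom.flags and downgrading gracefully when back references are absent. A self-contained sketch of that check follows; the two flag constants here are illustrative placeholders only -- the real XFS_FSOP_GEOM_FLAGS_RMAPBT and XFS_FSOP_GEOM_FLAGS_PARENT encodings come from the kernel uapi header via bindgen and are not reproduced here.

```rust
// Illustrative bit assignments -- the real XFS_FSOP_GEOM_FLAGS_* values live
// in the kernel's uapi xfs_fs.h and should be taken from there.
const GEOM_FLAGS_RMAPBT: u32 = 1 << 0;
const GEOM_FLAGS_PARENT: u32 = 1 << 1;

/// Trimmed-down stand-in for the bindgen-generated xfs_fsop_geom struct.
struct FsGeom {
    flags: u32,
}

impl FsGeom {
    /// Does this filesystem have reverse space mappings?
    fn has_rmapbt(&self) -> bool {
        self.flags & GEOM_FLAGS_RMAPBT != 0
    }

    /// Does this filesystem have directory parent pointers?
    fn has_parent(&self) -> bool {
        self.flags & GEOM_FLAGS_PARENT != 0
    }

    /// Collect warnings the way check_repair() does: a missing back
    /// reference degrades repair quality, but neither is fatal on its own.
    fn repair_warnings(&self) -> Vec<&'static str> {
        let mut w = Vec::new();
        if !self.has_rmapbt() {
            w.push("online repair is less effective without rmap btrees");
        }
        if !self.has_parent() {
            w.push("online repair is less effective without parent pointers");
        }
        w
    }
}

fn main() {
    let full = FsGeom {
        flags: GEOM_FLAGS_RMAPBT | GEOM_FLAGS_PARENT,
    };
    assert!(full.repair_warnings().is_empty());

    let bare = FsGeom { flags: 0 };
    assert_eq!(bare.repair_warnings().len(), 2);
    println!("geometry checks ok");
}
```

This split matches the patch's behavior: only a failed repair-support probe aborts the daemon, while missing rmapbt or parent-pointer features merely log a warning and continue.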
* [PATCH 10/19] xfs_healer: use getparents to look up file names 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (8 preceding siblings ...) 2025-10-23 0:14 ` [PATCH 09/19] xfs_healer: check for fs features needed for effective repairs Darrick J. Wong @ 2025-10-23 0:14 ` Darrick J. Wong 2025-10-23 0:15 ` [PATCH 11/19] xfs_healer: make the rust program check if kernel support available Darrick J. Wong ` (9 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:14 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> If the kernel tells about something that happened to a file, use the GETPARENTS ioctl to try to look up the path to that file for more ergonomic reporting. In Rust. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/Makefile | 1 healer/src/getparents.rs | 210 ++++++++++++++++++++++++++++++++++++++++ healer/src/healthmon/event.rs | 19 ++-- healer/src/healthmon/fs.rs | 38 ++++--- healer/src/healthmon/groups.rs | 32 ++++-- healer/src/healthmon/inodes.rs | 22 +++- healer/src/lib.rs | 1 healer/src/main.rs | 8 +- healer/src/repair.rs | 22 ++++ healer/src/weakhandle.rs | 30 ++++++ 10 files changed, 339 insertions(+), 44 deletions(-) create mode 100644 healer/src/getparents.rs diff --git a/healer/Makefile b/healer/Makefile index 03bfd853a193ee..796bed3e166487 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -18,6 +18,7 @@ HFILES = \ RUSTFILES = \ src/fsgeom.rs \ + src/getparents.rs \ src/lib.rs \ src/main.rs \ src/repair.rs \ diff --git a/healer/src/getparents.rs b/healer/src/getparents.rs new file mode 100644 index 00000000000000..d6d7020e08f9d2 --- /dev/null +++ b/healer/src/getparents.rs @@ -0,0 +1,210 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +use crate::weakhandle::WeakHandle; +use crate::xfs_fs; +use crate::xfs_fs::xfs_getparents; +use crate::xfs_fs::xfs_getparents_by_handle; +use crate::xfs_fs::xfs_getparents_rec; +use crate::xfs_fs::xfs_handle; +use crate::xfs_types::XfsFid; +use nix::ioctl_readwrite; +use std::cmp::min; +use std::ffi::CStr; +use std::ffi::OsStr; +use std::fs::File; +use std::io::Result; +use std::os::fd::AsRawFd; +use std::os::unix::ffi::OsStrExt; +use std::path::Path; +use std::path::PathBuf; + +ioctl_readwrite!( + xfs_ioc_getparents_by_handle, + 'X', + 63, + xfs_getparents_by_handle +); + +const GETPARENTS_BUFSIZE: usize = 65536; + +/// File parent +#[derive(Debug)] +struct XfsParent { + /// Filename within a directory + filename: PathBuf, + + /// Handle to the parent + handle: xfs_handle, +} + +/// Iterator for all parents of this file +struct XfsGetParents<'a> { + /// Open file with which we can call the ioctl + fp: &'a File, + + /// Head object to pass to GETPARENTS call + request: xfs_getparents_by_handle, + + /// Buffer for receiving GETPARENTS information from kernel + buf: Vec<u8>, + + /// Position of next parent record in buffer + bufpos: usize, +} + +impl Iterator for XfsGetParents<'_> { + type Item = Result<XfsParent>; + + /// Return parent pointer objects + fn next(&mut self) -> Option<Self::Item> { + // Ran out of buffer... + if self.bufpos == GETPARENTS_BUFSIZE { + const STOP_FLAGS: u32 = + xfs_fs::XFS_GETPARENTS_OFLAG_DONE | xfs_fs::XFS_GETPARENTS_OFLAG_ROOT; + + // If the last request got all the parent pointers, stop + if self.request.gph_request.gp_oflags & STOP_FLAGS as xfs_fs::__u16 != 0 { + return None; + } + + // SAFETY: Trusting the kernel to give us more parent data without corrupting memory. 
+ match unsafe { xfs_ioc_getparents_by_handle(self.fp.as_raw_fd(), &mut self.request) } { + Err(e) => return Some(Err(e.into())), + Ok(_) => self.bufpos = 0, + } + } + + // If the kernel says this is the root directory, return a parent + // with an empty filename, because errors abort the iterator. + if self.request.gph_request.gp_oflags & xfs_fs::XFS_GETPARENTS_OFLAG_ROOT as xfs_fs::__u16 + != 0 + { + self.bufpos = GETPARENTS_BUFSIZE; + return Some(Ok(XfsParent { + filename: PathBuf::from(""), + handle: self.request.gph_handle, + })); + } + + // Cast the buffer contents to a getparents record + let ret = unsafe { + // SAFETY: Casting a pointer (encoded as a u64 to avoid thunking issues) to a raw + // pointer. getparents.c in libfrog does the same thing. + let gpr: *const xfs_getparents_rec = + self.buf.as_ptr().add(self.bufpos) as *const xfs_getparents_rec; + + // Advance the buffer pointer + self.bufpos += min(GETPARENTS_BUFSIZE - self.bufpos, (*gpr).gpr_reclen as usize); + + // Construct a PathBuf from the raw bytes. Don't use a slice here because the buffer + // contents will change with the next ioctl. SAFETY: gpr_name is defined to be a + // null-terminated sequence, aka a C string. + let slice = CStr::from_ptr((*gpr).gpr_name.as_ptr()); + let osstr = OsStr::from_bytes(slice.to_bytes()); + let filename: &Path = osstr.as_ref(); + + // SAFETY: Copying from a raw pointer to a buffer containing xfs_handle to the + // xfs_handle in our new XfsParent object. + XfsParent { + filename: filename.to_path_buf(), + handle: (*gpr).gpr_parent, + } + }; + + Some(Ok(ret)) + } +} + +/// Create an iterator to walk the parents of a given handle, using the open +/// file. 
+fn from_handle(fp: &File, handle: xfs_handle) -> Result<XfsGetParents> { + let mut value: Vec<u8> = vec![0; GETPARENTS_BUFSIZE]; + + Ok(XfsGetParents { + request: xfs_getparents_by_handle { + gph_request: xfs_getparents { + gp_bufsize: GETPARENTS_BUFSIZE as xfs_fs::__u32, + gp_buffer: value.as_mut_ptr() as xfs_fs::__u64, + ..Default::default() + }, + gph_handle: handle, + }, + fp, + buf: value, + bufpos: GETPARENTS_BUFSIZE, + }) +} + +/// Recursively fill the path component vector. Returns true if we walked up +/// to the root directory and hence have a valid path, false if not, or None +/// if some error occurred. We only use paths for display purposes, so that's +/// why we don't pass back errors. +fn find_path_components( + fp: &File, + handle: xfs_handle, + depth: u32, + components: &mut Vec<PathBuf>, +) -> Option<bool> { + // Don't let us go too deep in the directory hierarchy because this is a + // recursive function. + if depth > 256 { + return Some(false); + } + + let parents = match from_handle(fp, handle) { + Err(_) => return None, + Ok(x) => x, + }; + + for p in parents { + match p { + Err(_) => return None, + Ok(parent) => { + if parent.filename == PathBuf::from("") { + return Some(true); + } + + components.push(parent.filename); + match find_path_components(fp, parent.handle, depth + 1, components) { + None => return None, + Some(true) => return Some(true), + Some(false) => components.pop(), + }; + } + }; + } + + Some(false) +} + +impl WeakHandle<'_> { + /// Return a path to the root for the given soft handle and ino/gen info, + /// or None if errors occurred or we couldn't find the root. 
+ pub fn path_for(&self, fid: XfsFid) -> Option<PathBuf> { + if !self.can_get_parents() { + return None; + } + + let fp = match self.reopen() { + Err(_) => return None, + Ok(x) => x, + }; + let handle = self.subst(fid); + let mut path_components: Vec<PathBuf> = Vec::new(); + + match find_path_components(&fp, handle, 0, &mut path_components) { + None => None, + Some(false) => None, + Some(true) => { + let mut ret: PathBuf = self.mountpoint().into(); + for component in path_components.iter().rev() { + ret.push(component); + } + Some(ret) + } + } + } +} diff --git a/healer/src/healthmon/event.rs b/healer/src/healthmon/event.rs index 0fcd34dee38e4c..ea3a6b21f744df 100644 --- a/healer/src/healthmon/event.rs +++ b/healer/src/healthmon/event.rs @@ -5,7 +5,9 @@ */ use crate::display_for_enum; use crate::repair::Repair; +use crate::weakhandle::WeakHandle; use crate::xfsprogs::M_; +use std::path::PathBuf; use strum_macros::EnumString; /// Common behaviors of all health events @@ -15,8 +17,9 @@ pub trait XfsHealthEvent { false } - /// Format this event as something we can display - fn format(&self) -> String; + /// Format this event as something we can display. Returns an optional + /// pathname string, and the message. 
+ fn format(&self, fh: &WeakHandle) -> (Option<PathBuf>, String); /// Generate the inputs to a kernel scrub ioctl fn schedule_repairs(&self) -> Vec<Repair> { @@ -83,8 +86,8 @@ impl XfsHealthEvent for LostEvent { true } - fn format(&self) -> String { - format!("{} {}", self.count, M_("events lost")) + fn format(&self, _: &WeakHandle) -> (Option<PathBuf>, String) { + (None, format!("{} {}", self.count, M_("events lost"))) } } @@ -92,8 +95,8 @@ impl XfsHealthEvent for LostEvent { pub struct RunningEvent {} impl XfsHealthEvent for RunningEvent { - fn format(&self) -> String { - M_("monitoring started") + fn format(&self, _: &WeakHandle) -> (Option<PathBuf>, String) { + (None, M_("monitoring started")) } } @@ -105,7 +108,7 @@ impl XfsHealthEvent for UnknownEvent { true } - fn format(&self) -> String { - M_("unrecognized event") + fn format(&self, _: &WeakHandle) -> (Option<PathBuf>, String) { + (None, M_("unrecognized event")) } } diff --git a/healer/src/healthmon/fs.rs b/healer/src/healthmon/fs.rs index 7a2307d29e7abd..2145427d1905ca 100644 --- a/healer/src/healthmon/fs.rs +++ b/healer/src/healthmon/fs.rs @@ -9,10 +9,12 @@ use crate::healthmon::event::XfsHealthEvent; use crate::healthmon::event::XfsHealthStatus; use crate::repair::Repair; use crate::util::format_set; +use crate::weakhandle::WeakHandle; use crate::xfs_types::XfsPhysRange; use crate::xfsprogs::M_; use enumset::EnumSet; use enumset::EnumSetType; +use std::path::PathBuf; use strum_macros::EnumString; /// Metadata types for an XFS whole-fs metadata @@ -58,12 +60,15 @@ impl XfsWholeFsEvent { } impl XfsHealthEvent for XfsWholeFsEvent { - fn format(&self) -> String { - format!( - "{} {} {}", - format_set(self.metadata), - M_("status"), - self.status + fn format(&self, _: &WeakHandle) -> (Option<PathBuf>, String) { + ( + None, + format!( + "{} {} {}", + format_set(self.metadata), + M_("status"), + self.status + ), ) } @@ -112,11 +117,14 @@ impl XfsHealthEvent for XfsShutdownEvent { true } - fn format(&self) -> 
String { - format!( - "{} {}", - M_("filesystem shut down due to"), - format_set(self.reasons) + fn format(&self, _: &WeakHandle) -> (Option<PathBuf>, String) { + ( + None, + format!( + "{} {}", + M_("filesystem shut down due to"), + format_set(self.reasons) + ), ) } } @@ -129,8 +137,8 @@ impl XfsHealthEvent for XfsUnmountEvent { true } - fn format(&self) -> String { - M_("filesystem unmounted") + fn format(&self, _: &WeakHandle) -> (Option<PathBuf>, String) { + (None, M_("filesystem unmounted")) } } @@ -149,7 +157,7 @@ impl XfsMediaErrorEvent { } impl XfsHealthEvent for XfsMediaErrorEvent { - fn format(&self) -> String { - format!("{} {}", M_("media error on"), self.range) + fn format(&self, _: &WeakHandle) -> (Option<PathBuf>, String) { + (None, format!("{} {}", M_("media error on"), self.range)) } } diff --git a/healer/src/healthmon/groups.rs b/healer/src/healthmon/groups.rs index 60a44defb5d307..f0b182a632d807 100644 --- a/healer/src/healthmon/groups.rs +++ b/healer/src/healthmon/groups.rs @@ -9,10 +9,12 @@ use crate::healthmon::event::XfsHealthEvent; use crate::healthmon::event::XfsHealthStatus; use crate::repair::Repair; use crate::util::format_set; +use crate::weakhandle::WeakHandle; use crate::xfs_types::{XfsAgNumber, XfsRgNumber}; use crate::xfsprogs::M_; use enumset::EnumSet; use enumset::EnumSetType; +use std::path::PathBuf; use strum_macros::EnumString; /// Metadata types for an allocation group on the data device @@ -75,12 +77,15 @@ impl XfsPeragEvent { } impl XfsHealthEvent for XfsPeragEvent { - fn format(&self) -> String { - format!( - "{} {} {}", - self.group, - format_set(self.metadata), - self.status + fn format(&self, _: &WeakHandle) -> (Option<PathBuf>, String) { + ( + None, + format!( + "{} {} {}", + self.group, + format_set(self.metadata), + self.status + ), ) } @@ -137,12 +142,15 @@ impl XfsRtgroupEvent { } impl XfsHealthEvent for XfsRtgroupEvent { - fn format(&self) -> String { - format!( - "{} {} {}", - self.group, - 
format_set(self.metadata), - self.status + fn format(&self, _: &WeakHandle) -> (Option<PathBuf>, String) { + ( + None, + format!( + "{} {} {}", + self.group, + format_set(self.metadata), + self.status + ), ) } diff --git a/healer/src/healthmon/inodes.rs b/healer/src/healthmon/inodes.rs index a4324c7d834b42..b01205cedbfb4d 100644 --- a/healer/src/healthmon/inodes.rs +++ b/healer/src/healthmon/inodes.rs @@ -9,10 +9,12 @@ use crate::healthmon::event::XfsHealthEvent; use crate::healthmon::event::XfsHealthStatus; use crate::repair::Repair; use crate::util::format_set; +use crate::weakhandle::WeakHandle; use crate::xfs_types::{XfsFid, XfsFileRange}; use crate::xfsprogs::M_; use enumset::EnumSet; use enumset::EnumSetType; +use std::path::PathBuf; use strum_macros::EnumString; /// Metadata types for an XFS inode @@ -71,8 +73,17 @@ impl XfsInodeEvent { } impl XfsHealthEvent for XfsInodeEvent { - fn format(&self) -> String { - format!("{} {} {}", self.fid, format_set(self.metadata), self.status) + fn format(&self, fh: &WeakHandle) -> (Option<PathBuf>, String) { + match fh.path_for(self.fid) { + Some(path) => ( + Some(path), + format!("{} {}", format_set(self.metadata), self.status), + ), + None => ( + None, + format!("{} {} {}", self.fid, format_set(self.metadata), self.status), + ), + } } schedule_repairs!(XfsInodeEvent, |s: &XfsInodeEvent, sm_type| { @@ -122,7 +133,10 @@ impl XfsFileIoErrorEvent { } impl XfsHealthEvent for XfsFileIoErrorEvent { - fn format(&self) -> String { - format!("{} {} {}", self.fid, self.iotype, self.range) + fn format(&self, fh: &WeakHandle) -> (Option<PathBuf>, String) { + match fh.path_for(self.fid) { + Some(path) => (Some(path), format!("{} {}", self.iotype, self.range)), + None => (None, format!("{} {} {}", self.fid, self.iotype, self.range)), + } } } diff --git a/healer/src/lib.rs b/healer/src/lib.rs index 0b5735b7183138..e0e59a5868af75 100644 --- a/healer/src/lib.rs +++ b/healer/src/lib.rs @@ -12,3 +12,4 @@ pub mod healthmon; pub mod 
weakhandle; pub mod repair; pub mod fsgeom; +pub mod getparents; diff --git a/healer/src/main.rs b/healer/src/main.rs index ed118243dd911b..191018779f335d 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -98,7 +98,11 @@ impl App { } Ok(event) => { if self.log || event.must_log() { - printlogln!("{}: {}", self.path.display(), event.format()); + let (maybe_path, message) = event.format(fh); + match maybe_path { + Some(x) => printlogln!("{}: {}", x.display(), message), + None => printlogln!("{}: {}", self.path.display(), message), + }; } if self.repair { for mut repair in event.schedule_repairs() { @@ -151,7 +155,7 @@ impl App { } } - let fh = WeakHandle::try_new(&fp, &self.path) + let fh = WeakHandle::try_new(&fp, &self.path, fsgeom) .with_context(|| M_("Configuring filesystem handle"))?; if self.json { diff --git a/healer/src/repair.rs b/healer/src/repair.rs index 1312efd87281dd..c0cd7d64306536 100644 --- a/healer/src/repair.rs +++ b/healer/src/repair.rs @@ -285,6 +285,19 @@ impl Repair { } } + /// Translate the target of this repair into a filesystem path + fn repair_path(&self, fh: &WeakHandle) -> String { + if let RepairGroup::File = self.group { + let fid: XfsFid = self.detail.into(); + + if let Some(path) = fh.path_for(fid) { + return path.display().to_string(); + } + } + + fh.mountpoint() + } + /// Call the kernel to repair things fn repair(&mut self, fh: &WeakHandle) -> Result<bool> { let fp = fh @@ -305,10 +318,15 @@ impl Repair { pub fn perform(&mut self, fh: &WeakHandle) { match self.repair(fh) { Err(e) => { - eprintln!("{}: {:#}", fh.mountpoint(), e); + eprintln!("{}: {:#}", self.repair_path(fh), e); } _ => { - printlogln!("{}: {}: {}", fh.mountpoint(), self.summary(), self.outcome); + printlogln!( + "{}: {}: {}", + self.repair_path(fh), + self.summary(), + self.outcome + ); } }; } diff --git a/healer/src/weakhandle.rs b/healer/src/weakhandle.rs index ccac5d86d3be41..57cc7602fbd25e 100644 --- a/healer/src/weakhandle.rs +++ 
b/healer/src/weakhandle.rs @@ -4,8 +4,11 @@ * Author: Darrick J. Wong <djwong@kernel.org> */ use crate::baddata; +use crate::xfs_fs::xfs_fid; +use crate::xfs_fs::xfs_fsop_geom; use crate::xfs_fs::xfs_fsop_handlereq; use crate::xfs_fs::xfs_handle; +use crate::xfs_types::XfsFid; use crate::xfsprogs::M_; use anyhow::{Error, Result}; use nix::ioctl_readwrite; @@ -73,6 +76,9 @@ pub struct WeakHandle<'a> { /// Filesystem handle handle: xfs_handle, + + /// Does this filesystem support parent pointers? + has_parent: bool, } impl WeakHandle<'_> { @@ -99,12 +105,34 @@ impl WeakHandle<'_> { } /// Create a soft handle from an open file descriptor and its mount point - pub fn try_new<'a>(fp: &File, mountpoint: &'a Path) -> Result<WeakHandle<'a>> { + pub fn try_new<'a>( + fp: &File, + mountpoint: &'a Path, + fsgeom: xfs_fsop_geom, + ) -> Result<WeakHandle<'a>> { Ok(WeakHandle { mountpoint, handle: xfs_handle::try_from(fp)?, + has_parent: fsgeom.has_parent(), }) } + + /// Create a new file handle from this one + pub fn subst(&self, fid: XfsFid) -> xfs_handle { + xfs_handle { + ha_fid: xfs_fid { + fid_ino: fid.ino.into(), + fid_gen: fid.gen.into(), + ..self.handle.ha_fid + }, + ..self.handle + } + } + + /// Can this filesystem do parent pointer lookups? + pub fn can_get_parents(&self) -> bool { + self.has_parent + } } impl Display for WeakHandle<'_> { ^ permalink raw reply related [flat|nested] 80+ messages in thread
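[Not part of the patch — an illustrative sketch.] The `path_for()` method above rebuilds a display path by walking parent pointers leaf-to-root, collecting name components, and then pushing them onto the mountpoint in reverse order. That final assembly step can be sketched in isolation; `assemble_path` is a hypothetical stand-in for the tail of `path_for()`, not code from the patch:

```rust
use std::path::PathBuf;

// Given the name components collected while walking *up* from the file to
// the filesystem root (leaf first), rebuild a display path by reversing
// them onto the mountpoint, exactly as path_for() does with its
// path_components vector.
fn assemble_path(mountpoint: &str, components_leaf_to_root: &[&str]) -> PathBuf {
    let mut ret = PathBuf::from(mountpoint);
    for component in components_leaf_to_root.iter().rev() {
        ret.push(component);
    }
    ret
}
```

Because the walk starts at the damaged file, the components arrive deepest-first; the `rev()` is what turns `["file.txt", "b", "a"]` under `/mnt` into `/mnt/a/b/file.txt`.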
* [PATCH 11/19] xfs_healer: make the rust program check if kernel support available 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (9 preceding siblings ...) 2025-10-23 0:14 ` [PATCH 10/19] xfs_healer: use getparents to look up file names Darrick J. Wong @ 2025-10-23 0:15 ` Darrick J. Wong 2025-10-23 0:15 ` [PATCH 12/19] xfs_healer: use the autofsck fsproperty to select mode Darrick J. Wong ` (8 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:15 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Teach the rust port program to check if kernel support is available, to duplicate existing python functionality. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/src/healthmon/mod.rs | 28 ++++++++++++++++++++++++++++ healer/src/main.rs | 18 ++++++++++++++++++ 2 files changed, 46 insertions(+) diff --git a/healer/src/healthmon/mod.rs b/healer/src/healthmon/mod.rs index 5116361146db18..f73babbe001154 100644 --- a/healer/src/healthmon/mod.rs +++ b/healer/src/healthmon/mod.rs @@ -3,8 +3,12 @@ * Copyright (C) 2025 Oracle. All Rights Reserved. * Author: Darrick J. Wong <djwong@kernel.org> */ +use crate::xfs_fs; use crate::xfs_fs::xfs_health_monitor; use nix::ioctl_write_ptr; +use std::fs::File; +use std::os::fd::AsRawFd; +use std::os::fd::FromRawFd; pub mod cstruct; pub mod event; @@ -14,3 +18,27 @@ pub mod inodes; pub mod json; ioctl_write_ptr!(xfs_ioc_health_monitor, 'X', 68, xfs_health_monitor); + +/// Check if the open file supports a health monitor. 
+pub fn is_supported(fp: &File, use_json: bool) -> bool { + let hminfo = xfs_health_monitor { + format: if use_json { + xfs_fs::XFS_HEALTH_MONITOR_FMT_JSON as u8 + } else { + xfs_fs::XFS_HEALTH_MONITOR_FMT_CSTRUCT as u8 + }, + ..Default::default() + }; + + // SAFETY: Trusting the kernel not to corrupt our memory, and for it to return a valid file + // description number, which we immediately convert to a File and drop to close the fd. + unsafe { + match xfs_ioc_health_monitor(fp.as_raw_fd(), &hminfo) { + Ok(x) => { + File::from_raw_fd(x); + true + } + Err(_) => false, + } + } +} diff --git a/healer/src/main.rs b/healer/src/main.rs index 191018779f335d..fe125c4c4ee5f3 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -69,6 +69,12 @@ impl Cli { .help(M_("Always repair corrupt metadata")) .action(ArgAction::SetTrue), ) + .arg( + Arg::new("check") + .long("check") + .help(M_("Check that health monitoring is supported")) + .action(ArgAction::SetTrue), + ) .get_matches()) } } @@ -81,6 +87,7 @@ struct App { everything: bool, json: bool, repair: bool, + check: bool, path: PathBuf, } @@ -155,6 +162,16 @@ impl App { } } + // Now that we know that we can repair if the user wanted to, make sure that the kernel + // supports reporting events if that was as far as the user wanted us to go. + if self.check { + return Ok(if xfs_healer::healthmon::is_supported(&fp, self.json) { + ExitCode::SUCCESS + } else { + ExitCode::FAILURE + }); + } + let fh = WeakHandle::try_new(&fp, &self.path, fsgeom) .with_context(|| M_("Configuring filesystem handle"))?; @@ -187,6 +204,7 @@ impl From<Cli> for App { path: cli.0.get_one::<PathBuf>("path").unwrap().to_path_buf(), json: cli.0.get_flag("json"), repair: cli.0.get_flag("repair"), + check: cli.0.get_flag("check"), } } } ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 12/19] xfs_healer: use the autofsck fsproperty to select mode 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (10 preceding siblings ...) 2025-10-23 0:15 ` [PATCH 11/19] xfs_healer: make the rust program check if kernel support available Darrick J. Wong @ 2025-10-23 0:15 ` Darrick J. Wong 2025-10-23 0:15 ` [PATCH 13/19] xfs_healer: use rc on the mountpoint instead of lifetime annotations Darrick J. Wong ` (7 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:15 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make the rust xfs_healer program query the autofsck filesystem property to figure out which operating mode it should use. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/Cargo.toml.in | 1 healer/Makefile | 1 healer/src/fsprops.rs | 101 +++++++++++++++++++++++++++++++++++++++++++++++++ healer/src/lib.rs | 1 healer/src/main.rs | 93 +++++++++++++++++++++++++++++++++++++++++---- m4/package_rust.m4 | 1 6 files changed, 189 insertions(+), 9 deletions(-) create mode 100644 healer/src/fsprops.rs diff --git a/healer/Cargo.toml.in b/healer/Cargo.toml.in index fcf7f7a6d9373b..dcb356b7772674 100644 --- a/healer/Cargo.toml.in +++ b/healer/Cargo.toml.in @@ -28,6 +28,7 @@ serde_json = { version = "1.0.87" } nix = { version = "0", features = ["ioctl"] } # 0.26.1 strum = { version = "0" } # 0.19.2 strum_macros = { version = "0" } # 0.19.2 +libc = { version = "0" } # 0.2.139 # Dynamically comment out all the gettextrs related dependency information in # Cargo.toml becuse cargo requires the crate to be present so that it can diff --git a/healer/Makefile b/healer/Makefile index 796bed3e166487..76977e527c56e6 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -18,6 +18,7 @@ HFILES = \ RUSTFILES = \ src/fsgeom.rs \ + src/fsprops.rs \ src/getparents.rs \ src/lib.rs \ src/main.rs \ diff 
--git a/healer/src/fsprops.rs b/healer/src/fsprops.rs new file mode 100644 index 00000000000000..652daf33cb1eb1 --- /dev/null +++ b/healer/src/fsprops.rs @@ -0,0 +1,101 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +use libc::fgetxattr; +use std::ffi::CString; +use std::fs::File; +use std::os::fd::AsRawFd; +use std::os::raw::c_void; +use std::str::FromStr; +use strum_macros::EnumString; + +/// Property name for coordinating automatic fsck +const AUTOFSCK_NAME: &str = "autofsck"; + +/// Boilerplate to stamp out functions to convert json array to the given enum +/// type; or return an error with the given message. +macro_rules! fsprop_from_string { + ($enum_type:ty , $default:ident) => { + impl From<Option<String>> for $enum_type { + /// Convert from a json value to an enum + fn from(v: Option<String>) -> $enum_type { + if let Some(value) = v { + match <$enum_type>::from_str(&value) { + Ok(o) => o, + Err(_) => <$enum_type>::$default, + } + } else { + <$enum_type>::$default + } + } + } + }; +} + +/// Values for the autofsck property +#[derive(Debug, strum_macros::Display, EnumString)] +#[strum(serialize_all = "lowercase")] +pub enum XfsAutofsck { + /// No value set + Unset, + + /// Do not do background repairs + None, + + /// Check but do not change anything + Check, + + /// Optimize only, do not repair + Optimize, + + /// Repair and optimize + Repair, +} +fsprop_from_string!(XfsAutofsck, Unset); + +const FSPROP_MAX_VALUELEN: usize = 256; + +fn propname(realname: &str) -> String { + let mut ret: String = "trusted.xfs:".to_owned(); + ret.push_str(realname); + ret +} + +/// Return the value of a filesystem property as a string. Returns None on +/// any kind of error, or an empty value. 
+fn get(fp: &File, property: &str) -> Option<String> { + let cname: CString = match CString::new(propname(property)) { + Ok(x) => x, + Err(_) => return None, + }; + let mut value: Vec<u8> = vec![0; FSPROP_MAX_VALUELEN]; + + // SAFETY: Trusting the kernel not to corrupt either buffer that we pass in. + let ret = unsafe { + fgetxattr( + fp.as_raw_fd(), + cname.as_ptr(), + value.as_mut_ptr() as *mut c_void, + FSPROP_MAX_VALUELEN, + ) + }; + if ret < 1 { + return None; + } + + let cvalue: CString = match CString::new(&value[0..ret as usize]) { + Ok(x) => x, + _ => return None, + }; + match cvalue.into_string() { + Ok(x) => Some(x), + _ => None, + } +} + +/// Return the autofsck filesystem property. +pub fn get_autofsck(fp: &File) -> XfsAutofsck { + get(fp, AUTOFSCK_NAME).into() +} diff --git a/healer/src/lib.rs b/healer/src/lib.rs index e0e59a5868af75..d952b61646114d 100644 --- a/healer/src/lib.rs +++ b/healer/src/lib.rs @@ -13,3 +13,4 @@ pub mod weakhandle; pub mod repair; pub mod fsgeom; pub mod getparents; +pub mod fsprops; diff --git a/healer/src/main.rs b/healer/src/main.rs index fe125c4c4ee5f3..580ca5a0b13508 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -4,10 +4,12 @@ * Author: Darrick J. 
Wong <djwong@kernel.org> */ use anyhow::{Context, Result}; -use clap::{value_parser, Arg, ArgAction, ArgMatches, Command}; +use clap::{value_parser, Arg, ArgAction, ArgGroup, ArgMatches, Command}; use std::fs::File; use std::path::PathBuf; use std::process::ExitCode; +use xfs_healer::fsprops; +use xfs_healer::fsprops::XfsAutofsck; use xfs_healer::healthmon::cstruct::CStructMonitor; use xfs_healer::healthmon::event::XfsHealthEvent; use xfs_healer::healthmon::json::JsonMonitor; @@ -75,6 +77,13 @@ impl Cli { .help(M_("Check that health monitoring is supported")) .action(ArgAction::SetTrue), ) + .arg( + Arg::new("autofsck") + .long("autofsck") + .help(M_("Use the \"autofsck\" fs property to decide to repair")) + .action(ArgAction::SetTrue), + ) + .group(ArgGroup::new("decide_repair").args(["repair", "autofsck"])) .get_matches()) } } @@ -88,9 +97,24 @@ struct App { json: bool, repair: bool, check: bool, + autofsck: bool, path: PathBuf, } +/// Outcome of checking if the kernel supports metadata repair +enum CheckRepair { + ExitWith(ExitCode), + Downgrade, + Proceed, +} + +/// Outcome of looking at the autofsck fsproperty to decide if we will repair metadata +enum CheckAutofsck { + ExitWith(ExitCode), + Upgrade, + Proceed, +} + impl App { /// Return mountpoint as string, for printing messages fn mountpoint(&self) -> String { @@ -121,14 +145,23 @@ impl App { } /// Complain if repairs won't be entirely effective. 
- fn check_repair(&self, fp: &File, fsgeom: &xfs_fsop_geom) -> Option<ExitCode> { + fn check_repair(&self, fp: &File, fsgeom: &xfs_fsop_geom) -> CheckRepair { if !Repair::is_supported(fp) { + if !self.autofsck { + printlogln!( + "{}: {}", + self.path.display(), + M_("XFS online repair is not supported, exiting") + ); + return CheckRepair::ExitWith(ExitCode::FAILURE); + } + printlogln!( "{}: {}", self.path.display(), - M_("XFS online repair is not supported, exiting") + M_("XFS online repair is not supported, will report only") ); - return Some(ExitCode::FAILURE); + return CheckRepair::Downgrade; } if !fsgeom.has_rmapbt() { @@ -146,19 +179,60 @@ impl App { ); } - None + CheckRepair::Proceed + } + + /// Set the behavior of the program from the autofsck fs property. + fn check_autofsck(&self, fp: &File) -> CheckAutofsck { + match fsprops::get_autofsck(fp) { + XfsAutofsck::None => { + printlogln!( + "{}: {}", + self.path.display(), + M_("Disabling healer per autofsck directive.") + ); + return CheckAutofsck::ExitWith(ExitCode::SUCCESS); + } + XfsAutofsck::Check | XfsAutofsck::Optimize | XfsAutofsck::Unset => { + printlogln!( + "{}: {}", + self.path.display(), + M_("Will not automatically heal per autofsck directive.") + ); + } + XfsAutofsck::Repair => { + printlogln!( + "{}: {}", + self.path.display(), + M_("Automatically healing per autofsck directive.") + ); + return CheckAutofsck::Upgrade; + } + } + CheckAutofsck::Proceed } /// Main app method - fn main(&self) -> Result<ExitCode> { + fn main(&mut self) -> Result<ExitCode> { let fp = File::open(&self.path).with_context(|| M_("Opening filesystem failed"))?; + // Decide if we're going to enable repairs, which must come before check_repair. 
+ if self.autofsck { + match self.check_autofsck(&fp) { + CheckAutofsck::ExitWith(ret) => return Ok(ret), + CheckAutofsck::Upgrade => self.repair = true, + CheckAutofsck::Proceed => {} + } + } + // Make sure that we can initiate repairs let fsgeom = xfs_fsop_geom::try_from(&fp).with_context(|| M_("Querying filesystem geometry"))?; if self.repair { - if let Some(ret) = self.check_repair(&fp, &fsgeom) { - return Ok(ret); + match self.check_repair(&fp, &fsgeom) { + CheckRepair::ExitWith(ret) => return Ok(ret), + CheckRepair::Downgrade => self.repair = false, + CheckRepair::Proceed => {} } } @@ -205,6 +279,7 @@ impl From<Cli> for App { json: cli.0.get_flag("json"), repair: cli.0.get_flag("repair"), check: cli.0.get_flag("check"), + autofsck: cli.0.get_flag("autofsck"), } } } @@ -222,7 +297,7 @@ fn main() -> ExitCode { printlogln!("args: {:?}", args); } - let app: App = args.into(); + let mut app: App = args.into(); match app.main() { Ok(f) => f, Err(e) => { diff --git a/m4/package_rust.m4 b/m4/package_rust.m4 index 192d84651df909..109e4ba51d6356 100644 --- a/m4/package_rust.m4 +++ b/m4/package_rust.m4 @@ -134,6 +134,7 @@ enumset = { version = "1.0.12" } strum = { version = "0" } # 0.19.2 strum_macros = { version = "0" } # 0.19.2 serde_json = { version = "1.0.87" } +libc = { version = "0" } # 0.2.139 ], [yes], [no]) ]) ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 13/19] xfs_healer: use rc on the mountpoint instead of lifetime annotations 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (11 preceding siblings ...) 2025-10-23 0:15 ` [PATCH 12/19] xfs_healer: use the autofsck fsproperty to select mode Darrick J. Wong @ 2025-10-23 0:15 ` Darrick J. Wong 2025-10-23 0:15 ` [PATCH 14/19] xfs_healer: use thread pools Darrick J. Wong ` (6 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:15 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Institute a refcount on the mountpoint pathbuf so that we don't have to play around with explicit lifetime rules. That was fine for single threaded operation, but in the next patch we'll move the event processing to another thread to reduce time between read() calls. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/src/getparents.rs | 2 +- healer/src/main.rs | 9 +++++---- healer/src/weakhandle.rs | 19 ++++++++++--------- 3 files changed, 16 insertions(+), 14 deletions(-) diff --git a/healer/src/getparents.rs b/healer/src/getparents.rs index d6d7020e08f9d2..46f9724ee7cf7c 100644 --- a/healer/src/getparents.rs +++ b/healer/src/getparents.rs @@ -180,7 +180,7 @@ fn find_path_components( Some(false) } -impl WeakHandle<'_> { +impl WeakHandle { /// Return a path to the root for the given soft handle and ino/gen info, /// or None if errors occurred or we couldn't find the root. 
pub fn path_for(&self, fid: XfsFid) -> Option<PathBuf> { diff --git a/healer/src/main.rs b/healer/src/main.rs index 580ca5a0b13508..777b5c2804b297 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -8,6 +8,7 @@ use clap::{value_parser, Arg, ArgAction, ArgGroup, ArgMatches, Command}; use std::fs::File; use std::path::PathBuf; use std::process::ExitCode; +use std::rc::Rc; use xfs_healer::fsprops; use xfs_healer::fsprops::XfsAutofsck; use xfs_healer::healthmon::cstruct::CStructMonitor; @@ -98,7 +99,7 @@ struct App { repair: bool, check: bool, autofsck: bool, - path: PathBuf, + path: Rc<PathBuf>, } /// Outcome of checking if the kernel supports metadata repair @@ -214,7 +215,7 @@ impl App { /// Main app method fn main(&mut self) -> Result<ExitCode> { - let fp = File::open(&self.path).with_context(|| M_("Opening filesystem failed"))?; + let fp = File::open(&*self.path).with_context(|| M_("Opening filesystem failed"))?; // Decide if we're going to enable repairs, which must come before check_repair. 
if self.autofsck { @@ -246,7 +247,7 @@ impl App { }); } - let fh = WeakHandle::try_new(&fp, &self.path, fsgeom) + let fh = WeakHandle::try_new(&fp, self.path.clone(), fsgeom) .with_context(|| M_("Configuring filesystem handle"))?; if self.json { @@ -275,7 +276,7 @@ impl From<Cli> for App { debug: cli.0.get_flag("debug"), log: cli.0.get_flag("log"), everything: cli.0.get_flag("everything"), - path: cli.0.get_one::<PathBuf>("path").unwrap().to_path_buf(), + path: Rc::new(cli.0.get_one::<PathBuf>("path").unwrap().to_path_buf()), json: cli.0.get_flag("json"), repair: cli.0.get_flag("repair"), check: cli.0.get_flag("check"), diff --git a/healer/src/weakhandle.rs b/healer/src/weakhandle.rs index 57cc7602fbd25e..8734d421fe5f32 100644 --- a/healer/src/weakhandle.rs +++ b/healer/src/weakhandle.rs @@ -19,7 +19,8 @@ use std::fs::File; use std::io::ErrorKind; use std::os::fd::AsRawFd; use std::os::raw::c_void; -use std::path::Path; +use std::path::PathBuf; +use std::rc::Rc; ioctl_readwrite!(xfs_ioc_fd_to_handle, 'X', 106, xfs_fsop_handlereq); @@ -70,9 +71,9 @@ impl TryFrom<&File> for xfs_handle { } /// Filesystem handle that can be disconnected from any open files -pub struct WeakHandle<'a> { +pub struct WeakHandle { /// path to the filesystem mountpoint - mountpoint: &'a Path, + mountpoint: Rc<PathBuf>, /// Filesystem handle handle: xfs_handle, @@ -81,10 +82,10 @@ pub struct WeakHandle<'a> { has_parent: bool, } -impl WeakHandle<'_> { +impl WeakHandle { /// Try to reopen the filesystem from which we got the handle. pub fn reopen(&self) -> Result<File> { - let fp = File::open(self.mountpoint)?; + let fp = File::open(self.mountpoint.as_path())?; if xfs_handle::try_from(&fp)? 
!= self.handle { let s = format!( @@ -105,11 +106,11 @@ impl WeakHandle<'_> { } /// Create a soft handle from an open file descriptor and its mount point - pub fn try_new<'a>( + pub fn try_new( fp: &File, - mountpoint: &'a Path, + mountpoint: Rc<PathBuf>, fsgeom: xfs_fsop_geom, - ) -> Result<WeakHandle<'a>> { + ) -> Result<WeakHandle> { Ok(WeakHandle { mountpoint, handle: xfs_handle::try_from(fp)?, @@ -135,7 +136,7 @@ impl WeakHandle<'_> { } } -impl Display for WeakHandle<'_> { +impl Display for WeakHandle { fn fmt(&self, f: &mut Formatter) -> std::fmt::Result { write!(f, "{}", self.mountpoint.display()) } ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 14/19] xfs_healer: use thread pools 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (12 preceding siblings ...) 2025-10-23 0:15 ` [PATCH 13/19] xfs_healer: use rc on the mountpoint instead of lifetime annotations Darrick J. Wong @ 2025-10-23 0:15 ` Darrick J. Wong 2025-10-23 0:16 ` [PATCH 15/19] xfs_healer: run full scrub after lost corruption events or targeted repair failure Darrick J. Wong ` (5 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:15 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Use a thread pool so that the kernel event reader thread can spend as much time sleeping in the kernel while other threads actually deal with decoding and processing the events. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/Cargo.toml.in | 1 + healer/src/main.rs | 87 +++++++++++++++++++++++++++++++++++++++------- healer/src/weakhandle.rs | 6 ++- m4/package_rust.m4 | 1 + 4 files changed, 78 insertions(+), 17 deletions(-) diff --git a/healer/Cargo.toml.in b/healer/Cargo.toml.in index dcb356b7772674..433be243e3846c 100644 --- a/healer/Cargo.toml.in +++ b/healer/Cargo.toml.in @@ -18,6 +18,7 @@ clap = { version = "4.0.32", features = ["derive"] } anyhow = { version = "1.0.69" } enumset = { version = "1.0.12" } serde_json = { version = "1.0.87" } +threadpool = { version = "1.8.1" } # XXX: Crates with major version 0 are not considered ABI-stable, so the minor # version is treated as if it were the major version. 
This creates problems diff --git a/healer/src/main.rs b/healer/src/main.rs index 777b5c2804b297..4d0f6021177ac9 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -8,16 +8,19 @@ use clap::{value_parser, Arg, ArgAction, ArgGroup, ArgMatches, Command}; use std::fs::File; use std::path::PathBuf; use std::process::ExitCode; -use std::rc::Rc; +use std::sync::Arc; +use threadpool::ThreadPool; use xfs_healer::fsprops; use xfs_healer::fsprops::XfsAutofsck; use xfs_healer::healthmon::cstruct::CStructMonitor; use xfs_healer::healthmon::event::XfsHealthEvent; +use xfs_healer::healthmon::json::JsonEventWrapper; use xfs_healer::healthmon::json::JsonMonitor; use xfs_healer::printlogln; use xfs_healer::repair::Repair; use xfs_healer::weakhandle::WeakHandle; use xfs_healer::xfs_fs::xfs_fsop_geom; +use xfs_healer::xfs_fs::xfs_health_monitor_event; use xfs_healer::xfsprogs; use xfs_healer::xfsprogs::M_; @@ -99,7 +102,25 @@ struct App { repair: bool, check: bool, autofsck: bool, - path: Rc<PathBuf>, + path: Arc<PathBuf>, +} + +/// Contains all the per-thread state +#[derive(Debug)] +struct EventThread { + log: bool, + repair: bool, +} + +impl EventThread { + /// Create a new thread context from an App reference + // XXX: I don't know how to do From<&App> + fn new(app: &App) -> Self { + EventThread { + log: app.log, + repair: app.repair, + } + } } /// Outcome of checking if the kernel supports metadata repair @@ -123,28 +144,55 @@ impl App { } /// Handle a health event that has been decoded into real objects - fn process_event(&self, fh: &WeakHandle, cooked: Result<Box<dyn XfsHealthEvent>>) { + fn process_event( + et: EventThread, + fh: Arc<WeakHandle>, + cooked: Result<Box<dyn XfsHealthEvent>>, + ) { match cooked { Err(e) => { - eprintln!("{}: {:#}", self.path.display(), e) + eprintln!("{}: {:#}", fh.mountpoint(), e) } Ok(event) => { - if self.log || event.must_log() { - let (maybe_path, message) = event.format(fh); + if et.log || event.must_log() { + let (maybe_path, 
message) = event.format(&fh); match maybe_path { Some(x) => printlogln!("{}: {}", x.display(), message), - None => printlogln!("{}: {}", self.path.display(), message), + None => printlogln!("{}: {}", fh.mountpoint(), message), }; } - if self.repair { + if et.repair { for mut repair in event.schedule_repairs() { - repair.perform(fh) + repair.perform(&fh) } } } } } + // fugly helpers to reduce the scope of the variables moved into the closure + fn dispatch_json_event( + threads: &ThreadPool, + et: EventThread, + fh: Arc<WeakHandle>, + raw_event: JsonEventWrapper, + ) { + threads.execute(move || { + App::process_event(et, fh, raw_event.cook()); + }) + } + + fn dispatch_cstruct_event( + threads: &ThreadPool, + et: EventThread, + fh: Arc<WeakHandle>, + raw_event: xfs_health_monitor_event, + ) { + threads.execute(move || { + App::process_event(et, fh, raw_event.cook()); + }) + } + /// Complain if repairs won't be entirely effective. fn check_repair(&self, fp: &File, fsgeom: &xfs_fsop_geom) -> CheckRepair { if !Repair::is_supported(fp) { @@ -247,24 +295,35 @@ impl App { }); } - let fh = WeakHandle::try_new(&fp, self.path.clone(), fsgeom) - .with_context(|| M_("Configuring filesystem handle"))?; + let fh = Arc::new( + WeakHandle::try_new(&fp, self.path.clone(), fsgeom) + .with_context(|| M_("Configuring filesystem handle"))?, + ); + + // Creates a threadpool with nr_cpus workers. 
+ let threads = threadpool::Builder::new().build(); if self.json { let hmon = JsonMonitor::try_new(fp, &self.path, self.everything, self.debug) .with_context(|| M_("Opening js health monitor file"))?; for raw_event in hmon { - self.process_event(&fh, raw_event.cook()); + App::dispatch_json_event(&threads, EventThread::new(self), fh.clone(), raw_event); } } else { let hmon = CStructMonitor::try_new(fp, &self.path, self.everything) .with_context(|| M_("Opening health monitor file"))?; for raw_event in hmon { - self.process_event(&fh, raw_event.cook()); + App::dispatch_cstruct_event( + &threads, + EventThread::new(self), + fh.clone(), + raw_event, + ); } } + threads.join(); Ok(ExitCode::SUCCESS) } @@ -276,7 +335,7 @@ impl From<Cli> for App { debug: cli.0.get_flag("debug"), log: cli.0.get_flag("log"), everything: cli.0.get_flag("everything"), - path: Rc::new(cli.0.get_one::<PathBuf>("path").unwrap().to_path_buf()), + path: Arc::new(cli.0.get_one::<PathBuf>("path").unwrap().to_path_buf()), json: cli.0.get_flag("json"), repair: cli.0.get_flag("repair"), check: cli.0.get_flag("check"), diff --git a/healer/src/weakhandle.rs b/healer/src/weakhandle.rs index 8734d421fe5f32..8c3dd7e04a64c2 100644 --- a/healer/src/weakhandle.rs +++ b/healer/src/weakhandle.rs @@ -20,7 +20,7 @@ use std::io::ErrorKind; use std::os::fd::AsRawFd; use std::os::raw::c_void; use std::path::PathBuf; -use std::rc::Rc; +use std::sync::Arc; ioctl_readwrite!(xfs_ioc_fd_to_handle, 'X', 106, xfs_fsop_handlereq); @@ -73,7 +73,7 @@ impl TryFrom<&File> for xfs_handle { /// Filesystem handle that can be disconnected from any open files pub struct WeakHandle { /// path to the filesystem mountpoint - mountpoint: Rc<PathBuf>, + mountpoint: Arc<PathBuf>, /// Filesystem handle handle: xfs_handle, @@ -108,7 +108,7 @@ impl WeakHandle { /// Create a soft handle from an open file descriptor and its mount point pub fn try_new( fp: &File, - mountpoint: Rc<PathBuf>, + mountpoint: Arc<PathBuf>, fsgeom: xfs_fsop_geom, ) -> 
Result<WeakHandle> { Ok(WeakHandle { diff --git a/m4/package_rust.m4 b/m4/package_rust.m4 index 109e4ba51d6356..a6fb0b9a8fc50c 100644 --- a/m4/package_rust.m4 +++ b/m4/package_rust.m4 @@ -135,6 +135,7 @@ strum = { version = "0" } # 0.19.2 strum_macros = { version = "0" } # 0.19.2 serde_json = { version = "1.0.87" } libc = { version = "0" } # 0.2.139 +threadpool = { version = "1.8.1" } ], [yes], [no]) ]) ^ permalink raw reply related [flat|nested] 80+ messages in thread
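[Not part of the original posting: a minimal sketch of the reader/worker split that patch 14 introduces, using only the standard library (`mpsc` channel plus spawned threads) rather than the threadpool crate, so it is self-contained. The event type and the `* 2` "cooking" step are placeholders; the structure mirrors the patch — one sending thread, a fixed pool of workers, and a join once the event stream ends.]

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Dispatch events to a fixed pool of worker threads and collect the
// "cooked" results; the reader thread only sends.
fn process_events(events: Vec<u32>, nworkers: usize) -> Vec<u32> {
    let (tx, rx) = mpsc::channel::<u32>();
    let rx = Arc::new(Mutex::new(rx));
    let results = Arc::new(Mutex::new(Vec::new()));

    let workers: Vec<_> = (0..nworkers)
        .map(|_| {
            let rx = rx.clone();
            let results = results.clone();
            thread::spawn(move || loop {
                // Hold the receiver lock only long enough to take one event.
                let event = { rx.lock().unwrap().recv() };
                match event {
                    Ok(e) => results.lock().unwrap().push(e * 2), // "cook" it
                    Err(_) => break, // channel closed: no more events
                }
            })
        })
        .collect();

    // The reader loop; in the daemon this is the thread blocked in read(2).
    for event in events {
        tx.send(event).unwrap();
    }
    drop(tx); // analogous to threads.join() after the monitor iterator ends
    for w in workers {
        w.join().unwrap();
    }

    let mut out = Arc::try_unwrap(results).unwrap().into_inner().unwrap();
    out.sort();
    out
}

fn main() {
    let out = process_events((0..8).collect(), 4);
    assert_eq!(out, vec![0, 2, 4, 6, 8, 10, 12, 14]);
    println!("processed {} events", out.len());
}
```

This also shows why the patch splits per-event state (`EventThread`) from shared state (`Arc<WeakHandle>`): everything moved into a worker closure must be `Send` and either owned or atomically refcounted.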
* [PATCH 15/19] xfs_healer: run full scrub after lost corruption events or targeted repair failure 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (13 preceding siblings ...) 2025-10-23 0:15 ` [PATCH 14/19] xfs_healer: use thread pools Darrick J. Wong @ 2025-10-23 0:16 ` Darrick J. Wong 2025-10-23 0:16 ` [PATCH 16/19] xfs_healer: use getmntent in Rust to find moved filesystems Darrick J. Wong ` (4 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:16 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> If we fail to perform a spot repair of metadata or the kernel tells us that it lost corruption events due to queue limits, initiate a full run of the online fsck service to try to fix the error. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/Makefile | 1 + healer/src/healthmon/event.rs | 12 +++++++- healer/src/main.rs | 4 ++- healer/src/repair.rs | 60 ++++++++++++++++++++++++++++++++++++++++- healer/src/util.rs | 9 ++++++ healer/src/weakhandle.rs | 22 +++++++++++++++ healer/src/xfsprogs.rs.in | 1 + 7 files changed, 104 insertions(+), 5 deletions(-) diff --git a/healer/Makefile b/healer/Makefile index 76977e527c56e6..ae01e30403d0e5 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -131,6 +131,7 @@ src/xfsprogs.rs: src/xfsprogs.rs.in $(builddefs) $(Q)$(SED) -e "s|@pkg_version@|$(PKG_VERSION)|g" \ -e "s|@PACKAGE@|$(PKG_NAME)|g" \ -e "s|@LOCALEDIR@|$(PKG_LOCALE_DIR)|g" \ + -e "s|@scrub_svcname@|$(XFS_SCRUB_SVCNAME)|g" \ < $< > $@ $(CARGO_MANIFEST): $(CARGO_MANIFEST).in $(builddefs) diff --git a/healer/src/healthmon/event.rs b/healer/src/healthmon/event.rs index ea3a6b21f744df..702d460bca2816 100644 --- a/healer/src/healthmon/event.rs +++ b/healer/src/healthmon/event.rs @@ -22,7 +22,7 @@ pub trait XfsHealthEvent { fn format(&self, fh: &WeakHandle) -> (Option<PathBuf>, String); /// Generate 
the inputs to a kernel scrub ioctl - fn schedule_repairs(&self) -> Vec<Repair> { + fn schedule_repairs(&self, _everything: bool) -> Vec<Repair> { vec![] } } @@ -32,7 +32,7 @@ pub trait XfsHealthEvent { #[macro_export] macro_rules! schedule_repairs { ($event_type:ty , $lambda: expr ) => { - fn schedule_repairs(&self) -> Vec<$crate::repair::Repair> { + fn schedule_repairs(&self, _: bool) -> Vec<$crate::repair::Repair> { if self.status != $crate::healthmon::event::XfsHealthStatus::Sick { return vec![]; } @@ -89,6 +89,14 @@ impl XfsHealthEvent for LostEvent { fn format(&self, _: &WeakHandle) -> (Option<PathBuf>, String) { (None, format!("{} {}", self.count, M_("events lost"))) } + + fn schedule_repairs(&self, everything: bool) -> Vec<Repair> { + if everything { + vec![] + } else { + vec![Repair::full_repair()] + } + } } /// Event for the monitor starting up diff --git a/healer/src/main.rs b/healer/src/main.rs index 4d0f6021177ac9..b7d4b0dfc6a083 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -109,6 +109,7 @@ struct App { #[derive(Debug)] struct EventThread { log: bool, + everything: bool, repair: bool, } @@ -118,6 +119,7 @@ impl EventThread { fn new(app: &App) -> Self { EventThread { log: app.log, + everything: app.everything, repair: app.repair, } } @@ -162,7 +164,7 @@ impl App { }; } if et.repair { - for mut repair in event.schedule_repairs() { + for mut repair in event.schedule_repairs(et.everything) { repair.perform(&fh) } } diff --git a/healer/src/repair.rs b/healer/src/repair.rs index c0cd7d64306536..975c3cb9cb412a 100644 --- a/healer/src/repair.rs +++ b/healer/src/repair.rs @@ -3,6 +3,7 @@ * Copyright (C) 2025 Oracle. All Rights Reserved. * Author: Darrick J. 
Wong <djwong@kernel.org> */ +use crate::badness; use crate::display_for_enum; use crate::healthmon::fs::XfsWholeFsMetadata; use crate::healthmon::groups::{XfsPeragMetadata, XfsRtgroupMetadata}; @@ -12,31 +13,35 @@ use crate::weakhandle::WeakHandle; use crate::xfs_fs; use crate::xfs_fs::xfs_scrub_metadata; use crate::xfs_types::{XfsAgNumber, XfsFid, XfsRgNumber}; +use crate::xfsprogs; use crate::xfsprogs::M_; use anyhow::{Context, Result}; use nix::ioctl_readwrite; use std::fs::File; use std::os::fd::AsRawFd; +use std::process::Command; ioctl_readwrite!(xfs_ioc_scrub_metadata, 'X', 60, xfs_scrub_metadata); /// Classification information for later reporting -#[derive(Debug)] +#[derive(Debug, PartialEq)] enum RepairGroup { WholeFs, PerAg, RtGroup, File, + FullRepair, } /// What happened when we tried to repair something? -#[derive(Debug)] +#[derive(Debug, PartialEq)] enum RepairOutcome { Queued, Success, Unnecessary, MightBeOk, Failed, + Running, } display_for_enum!(RepairOutcome, { @@ -45,6 +50,7 @@ display_for_enum!(RepairOutcome, { MightBeOk => M_("Seems correct but cross-referencing failed; offline repair recommended."), Unnecessary => M_("No modification needed."), Success => M_("Repairs successful."), + Running => M_("Repairs in progress."), }); /// Kernel scrub type code @@ -245,6 +251,18 @@ impl Repair { } } + /// Schedule the full online fsck + pub fn full_repair() -> Repair { + Repair { + group: RepairGroup::FullRepair, + detail: xfs_scrub_metadata { + ..Default::default() + }, + outcome: RepairOutcome::Queued, + scrub_type: XfsScrubType(0), + } + } + /// Decode what happened when we tried to repair fn outcome(detail: &xfs_scrub_metadata) -> RepairOutcome { const REPAIR_FAILED: u32 = @@ -282,6 +300,9 @@ impl Repair { format!("{} {} {}", M_("Repair of"), fid, self.scrub_type) } + RepairGroup::FullRepair => { + M_("Full repair") + } } } @@ -298,8 +319,37 @@ impl Repair { fh.mountpoint() } + /// Start the background xfs_scrub service on a filesystem in the 
hopes that its autofsck + /// setting allows repairs. Does not wait for the service to complete. Multiple activations + /// while the service runs will be coalesced into a single service instance. + fn run_full_repair(&self, fh: &WeakHandle) -> Result<bool> { + let unit_name = fh.instance_unit_name(xfsprogs::XFS_SCRUB_SVCNAME)?; + + let output = Command::new("systemctl") + .arg("start") + .arg("--no-block") + .arg(unit_name) + .output()?; + + if !output.status.success() { + return Err(badness!(M_("Could not start xfs_scrub service.")).into()); + } + + Ok(true) + } + /// Call the kernel to repair things fn repair(&mut self, fh: &WeakHandle) -> Result<bool> { + if self.group == RepairGroup::FullRepair { + let started = self + .run_full_repair(fh) + .with_context(|| self.summary().to_string())?; + if started { + self.outcome = RepairOutcome::Running; + } + return Ok(started); + } + let fp = fh .reopen() .with_context(|| M_("Reopening filesystem to repair metadata"))?; @@ -327,6 +377,12 @@ impl Repair { self.summary(), self.outcome ); + + // Transform into a full repair if we failed to fix things. + if self.outcome == RepairOutcome::Failed && self.group != RepairGroup::FullRepair { + self.group = RepairGroup::FullRepair; + self.perform(fh); + } } }; } diff --git a/healer/src/util.rs b/healer/src/util.rs index bce48f83b01da0..5340724654552e 100644 --- a/healer/src/util.rs +++ b/healer/src/util.rs @@ -55,6 +55,15 @@ macro_rules! display_for_enum { }; } +/// Simple macro for creating errors for random badness. The only parameter describes why the +/// badness happened. +#[macro_export] +macro_rules! 
badness { + ($message:expr) => {{ + std::io::Error::new(std::io::ErrorKind::Other, $message) + }}; +} + /// Format an enum set into a string pub fn format_set<T: EnumSetType + Display>(f: EnumSet<T>) -> String { let mut ret = "".to_string(); diff --git a/healer/src/weakhandle.rs b/healer/src/weakhandle.rs index 8c3dd7e04a64c2..8650f6b9633b4d 100644 --- a/healer/src/weakhandle.rs +++ b/healer/src/weakhandle.rs @@ -4,6 +4,7 @@ * Author: Darrick J. Wong <djwong@kernel.org> */ use crate::baddata; +use crate::badness; use crate::xfs_fs::xfs_fid; use crate::xfs_fs::xfs_fsop_geom; use crate::xfs_fs::xfs_fsop_handlereq; @@ -13,13 +14,16 @@ use crate::xfsprogs::M_; use anyhow::{Error, Result}; use nix::ioctl_readwrite; use nix::libc::O_LARGEFILE; +use std::ffi::OsString; use std::fmt::Display; use std::fmt::Formatter; use std::fs::File; use std::io::ErrorKind; use std::os::fd::AsRawFd; use std::os::raw::c_void; +use std::os::unix::ffi::OsStringExt; use std::path::PathBuf; +use std::process::Command; use std::sync::Arc; ioctl_readwrite!(xfs_ioc_fd_to_handle, 'X', 106, xfs_fsop_handlereq); @@ -134,6 +138,24 @@ impl WeakHandle { pub fn can_get_parents(&self) -> bool { self.has_parent } + + /// Compute the systemd instance unit name for this mountpoint. 
+ pub fn instance_unit_name(&self, service_template: &str) -> Result<OsString> { + let output = Command::new("systemd-escape") + .arg("--template") + .arg(service_template) + .arg("--path") + .arg(self.mountpoint.as_ref()) + .output()?; + + if !output.status.success() { + return Err(badness!("Could not format systemd instance unit name.").into()); + } + + // systemd always adds a newline to the end of the output; remove it + let trunc_out = &output.stdout[0..output.stdout.len() - 1]; + Ok(OsString::from_vec(trunc_out.to_vec())) + } } impl Display for WeakHandle { diff --git a/healer/src/xfsprogs.rs.in b/healer/src/xfsprogs.rs.in index 0c5cd2d00f7c26..e57995d5a9c429 100644 --- a/healer/src/xfsprogs.rs.in +++ b/healer/src/xfsprogs.rs.in @@ -6,6 +6,7 @@ #![allow(unexpected_cfgs)] pub const VERSION: &str = "@pkg_version@"; +pub const XFS_SCRUB_SVCNAME: &str = "@scrub_svcname@"; /// Try to initialize a localization library. Like the other xfsprogs utilities, we don't care /// if this fails. ^ permalink raw reply related [flat|nested] 80+ messages in thread
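[Not part of the original posting: a rough sketch of what the `systemd-escape --template … --path …` invocation in patch 15 computes. This is only an approximation — the real tool also `\x`-escapes dots, dashes, and non-ASCII bytes inside path components — and the `xfs_scrub_all@.service` template name is illustrative; the patch takes the real name from the `@scrub_svcname@` build substitution.]

```rust
// Very simplified approximation of `systemd-escape --template --path`:
// trim slashes, map '/' to '-', and splice the result into the template
// at the '@' instance marker.  Does NOT handle \x-escaping.
fn instance_unit_name(template: &str, path: &str) -> String {
    let trimmed = path.trim_matches('/');
    let escaped = if trimmed.is_empty() {
        "-".to_string() // the root path "/" escapes to "-"
    } else {
        trimmed.replace('/', "-")
    };
    match template.split_once('@') {
        Some((name, suffix)) => format!("{}@{}{}", name, escaped, suffix),
        None => template.to_string(),
    }
}

fn main() {
    assert_eq!(
        instance_unit_name("xfs_scrub_all@.service", "/mnt/data"),
        "xfs_scrub_all@mnt-data.service"
    );
    println!("{}", instance_unit_name("xfs_scrub_all@.service", "/opt/scratch"));
}
```

Shelling out to `systemd-escape`, as the patch does, is the right call in production precisely because the full escaping rules are subtle; the sketch only illustrates where the instance name lands in the unit name.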
* [PATCH 16/19] xfs_healer: use getmntent in Rust to find moved filesystems 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (14 preceding siblings ...) 2025-10-23 0:16 ` [PATCH 15/19] xfs_healer: run full scrub after lost corruption events or targeted repair failure Darrick J. Wong @ 2025-10-23 0:16 ` Darrick J. Wong 2025-10-23 0:16 ` [PATCH 17/19] xfs_healer: validate that repair fds point to the monitored fs in Rust Darrick J. Wong ` (3 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:16 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Wrap the libc getmntent function in an iterator. This enables xfs_healer to record the fsname (or fs spec) of the mountpoint that it's running against, and use that fsname to walk /proc/mounts to re-find the filesystem, in case the mount has moved elsewhere, when it needs to open the fs to perform repairs. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/Makefile | 1 healer/src/getmntent.rs | 117 ++++++++++++++++++++++++++++++++++++++++++++++ healer/src/lib.rs | 1 healer/src/weakhandle.rs | 52 ++++++++++++++++++-- 4 files changed, 165 insertions(+), 6 deletions(-) create mode 100644 healer/src/getmntent.rs diff --git a/healer/Makefile b/healer/Makefile index ae01e30403d0e5..b3a9ed579a2a26 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -19,6 +19,7 @@ HFILES = \ RUSTFILES = \ src/fsgeom.rs \ src/fsprops.rs \ + src/getmntent.rs \ src/getparents.rs \ src/lib.rs \ src/main.rs \ diff --git a/healer/src/getmntent.rs b/healer/src/getmntent.rs new file mode 100644 index 00000000000000..86daeb052b8d55 --- /dev/null +++ b/healer/src/getmntent.rs @@ -0,0 +1,117 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J.
Wong <djwong@kernel.org> + */ +use anyhow::Result; +use libc::{endmntent, mntent, setmntent, FILE}; +use std::ffi::{c_char, c_int, CStr, CString}; +use std::io; +use std::mem::MaybeUninit; +use std::path::{Path, PathBuf}; + +/* + * XXX: link directly to getmntent_r because the libc crate in Debian 12 is too old. Note that + * the bindgen'd xfs_fs.rs pulls in a similar but not totally identical version so we need to + * turn off that warning. + */ +#[allow(clashing_extern_declarations)] +extern "C" { + pub fn getmntent_r( + stream: *mut FILE, + mntbuf: *mut mntent, + buf: *mut c_char, + buflen: c_int, + ) -> *mut mntent; +} + +const NAME_BUFSIZE: usize = 4096; + +/// Iterator object that returns mountpoint entries +pub struct MountEntries { + /// mntent file + fp: *mut FILE, + + /// local storage for mtab parsing + mntbuf: mntent, + namebuf: Vec<c_char>, +} + +impl MountEntries { + pub fn try_new_from(mountfile: &Path) -> Result<MountEntries> { + let path = CString::new( + mountfile + .to_str() + .ok_or(io::Error::new(io::ErrorKind::Other, "bad mntent path"))?, + )?; + let mode = CString::new("r")?; + let fp = unsafe { setmntent(path.as_ptr(), mode.as_ptr()) }; + if fp.is_null() { + return Err(io::Error::new(io::ErrorKind::Other, "setmntent failed").into()); + } + + let mntbuf = MaybeUninit::<mntent>::zeroed(); + let namebuf: Vec<c_char> = Vec::with_capacity(NAME_BUFSIZE); + Ok(MountEntries { + fp, + namebuf, + mntbuf: unsafe { mntbuf.assume_init() }, + }) + } + + pub fn try_new() -> Result<MountEntries> { + MountEntries::try_new_from(std::path::Path::new("/proc/self/mounts")) + } +} + +#[derive(Debug)] +pub struct MountEntry { + /// filesystem name + pub fsname: String, + + /// mountpoint + pub dir: PathBuf, + + /// filesystem type + pub fstype: String, +} + +impl Iterator for MountEntries { + type Item = MountEntry; + + /// Return mount points + fn next(&mut self) -> Option<Self::Item> { + let ent = unsafe { + getmntent_r( + self.fp, + &mut self.mntbuf, + 
self.namebuf.as_mut_ptr() as *mut c_char, + NAME_BUFSIZE as i32, + ) + }; + if ent.is_null() { + return None; + } + + let f0 = unsafe { CStr::from_ptr((*ent).mnt_type) }; + let fstype = String::from_utf8_lossy(f0.to_bytes()).to_string(); + + let f0 = unsafe { CStr::from_ptr((*ent).mnt_fsname) }; + let fsname = String::from_utf8_lossy(f0.to_bytes()).to_string(); + + let f0 = unsafe { CStr::from_ptr((*ent).mnt_dir) }; + let dir = PathBuf::from(f0.to_str().unwrap()); + + Some(MountEntry { + fsname, + fstype, + dir, + }) + } +} + +impl Drop for MountEntries { + fn drop(&mut self) { + unsafe { endmntent(self.fp) }; + } +} diff --git a/healer/src/lib.rs b/healer/src/lib.rs index d952b61646114d..f59182c4de41e0 100644 --- a/healer/src/lib.rs +++ b/healer/src/lib.rs @@ -14,3 +14,4 @@ pub mod repair; pub mod fsgeom; pub mod getparents; pub mod fsprops; +pub mod getmntent; diff --git a/healer/src/weakhandle.rs b/healer/src/weakhandle.rs index 8650f6b9633b4d..9f0dc77b822077 100644 --- a/healer/src/weakhandle.rs +++ b/healer/src/weakhandle.rs @@ -5,6 +5,7 @@ */ use crate::baddata; use crate::badness; +use crate::getmntent::MountEntries; use crate::xfs_fs::xfs_fid; use crate::xfs_fs::xfs_fsop_geom; use crate::xfs_fs::xfs_fsop_handlereq; @@ -22,7 +23,7 @@ use std::io::ErrorKind; use std::os::fd::AsRawFd; use std::os::raw::c_void; use std::os::unix::ffi::OsStringExt; -use std::path::PathBuf; +use std::path::{Path, PathBuf}; use std::process::Command; use std::sync::Arc; @@ -76,6 +77,9 @@ impl TryFrom<&File> for xfs_handle { /// Filesystem handle that can be disconnected from any open files pub struct WeakHandle { + /// device for the xfs filesystem + fsname: String, + /// path to the filesystem mountpoint mountpoint: Arc<PathBuf>, @@ -87,15 +91,15 @@ pub struct WeakHandle { } impl WeakHandle { - /// Try to reopen the filesystem from which we got the handle. 
- pub fn reopen(&self) -> Result<File> { - let fp = File::open(self.mountpoint.as_path())?; + /// Try to reopen the filesystem with a given mountpoint + fn reopen_from(&self, mountpoint: &Path) -> Result<File> { + let fp = File::open(mountpoint)?; if xfs_handle::try_from(&fp)? != self.handle { let s = format!( "{} {}: {}", M_("reopening"), - self.mountpoint.display(), + mountpoint.display(), M_("Stale file handle") ); return Err(std::io::Error::new(ErrorKind::Other, s).into()); @@ -104,19 +108,55 @@ impl WeakHandle { Ok(fp) } + /// Try to reopen the filesystem from which we got the handle. + pub fn reopen(&self) -> Result<File> { + // First try the original mountpoint + let orig_result = self.reopen_from(&self.mountpoint); + if let Ok(x) = orig_result { + return Ok(x); + } + + // Now scan /proc/self/mounts for any other bind mounts of this filesystem + let entries = MountEntries::try_new()?; + for mntent in entries.filter(|x| x.fstype == "xfs" && x.fsname == self.fsname) { + if let Ok(x) = self.reopen_from(&mntent.dir) { + return Ok(x); + } + } + + // Return original error + orig_result + } + /// Report mountpoint in a displayable manner pub fn mountpoint(&self) -> String { self.mountpoint.display().to_string() } + /// Report xfs device in a displayable manner + pub fn fsname(&self) -> String { + self.fsname.clone() + } + /// Create a soft handle from an open file descriptor and its mount point pub fn try_new( fp: &File, mountpoint: Arc<PathBuf>, fsgeom: xfs_fsop_geom, ) -> Result<WeakHandle> { + // Try to find the xfs device name for this mount point + let mut entries = MountEntries::try_new()?; + let fsname = match entries.find(|x| x.fstype == "xfs" && x.dir == *mountpoint) { + None => { + let s = format!("{}: {}", mountpoint.display(), M_("Cannot find xfs device")); + return Err(std::io::Error::new(ErrorKind::Other, s).into()); + } + Some(mntent) => mntent.fsname, + }; + Ok(WeakHandle { mountpoint, + fsname, handle: xfs_handle::try_from(fp)?, has_parent: 
fsgeom.has_parent(), }) @@ -160,6 +200,6 @@ impl WeakHandle { impl Display for WeakHandle { fn fmt(&self, f: &mut Formatter) -> std::fmt::Result { - write!(f, "{}", self.mountpoint.display()) + write!(f, "{} {}", self.fsname, self.mountpoint.display()) } } ^ permalink raw reply related [flat|nested] 80+ messages in thread
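[Not part of the original posting: a pure-Rust sketch of the lookup patch 16 performs through `getmntent_r` — parse mtab-style lines into (fsname, dir, fstype) triples and filter for xfs entries whose device matches, as `reopen()` does after the original mountpoint goes stale. The sample mtab text is made up; note that the real `/proc/self/mounts` octal-escapes embedded spaces (e.g. `\040`), which glibc's getmntent handles and this naive whitespace split does not.]

```rust
// Minimal mtab-style parser: one MountEntry per line, fields split on
// whitespace (fsname, dir, fstype; remaining fields ignored).
#[derive(Debug, PartialEq)]
struct MountEntry {
    fsname: String,
    dir: String,
    fstype: String,
}

fn parse_mounts(data: &str) -> Vec<MountEntry> {
    data.lines()
        .filter_map(|line| {
            let mut f = line.split_whitespace();
            Some(MountEntry {
                fsname: f.next()?.to_string(),
                dir: f.next()?.to_string(),
                fstype: f.next()?.to_string(),
            })
        })
        .collect()
}

fn main() {
    let mtab = "\
/dev/sda1 / ext4 rw 0 0
/dev/sdb1 /mnt/a xfs rw 0 0
/dev/sdb1 /mnt/b xfs rw 0 0
";
    // All candidate mountpoints for the monitored device, in the order
    // reopen() would try them after the original path failed:
    let dirs: Vec<String> = parse_mounts(mtab)
        .into_iter()
        .filter(|e| e.fstype == "xfs" && e.fsname == "/dev/sdb1")
        .map(|e| e.dir)
        .collect();
    assert_eq!(dirs, vec!["/mnt/a", "/mnt/b"]);
    println!("{:?}", dirs);
}
```

The xfs_handle comparison in `reopen_from` is still what actually confirms a match; the fsname filter just narrows the candidate list before the more expensive handle check.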
* [PATCH 17/19] xfs_healer: validate that repair fds point to the monitored fs in Rust 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (15 preceding siblings ...) 2025-10-23 0:16 ` [PATCH 16/19] xfs_healer: use getmntent in Rust to find moved filesystems Darrick J. Wong @ 2025-10-23 0:16 ` Darrick J. Wong 2025-10-23 0:17 ` [PATCH 18/19] debian/control: listify the build dependencies Darrick J. Wong ` (2 subsequent siblings) 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:16 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When xfs_healer reopens a mountpoint to perform a repair, it should validate that the opened fd points to a file on the same filesystem as the one being monitored. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- healer/Makefile | 3 ++- healer/src/getparents.rs | 2 +- healer/src/healthmon/cstruct.rs | 15 +++++++++++++-- healer/src/healthmon/json.rs | 15 +++++++++++++-- healer/src/healthmon/mod.rs | 3 +++ healer/src/healthmon/samefs.rs | 33 +++++++++++++++++++++++++++++++++ healer/src/main.rs | 30 +++++++++++++++++++++++++----- healer/src/repair.rs | 11 ++++++----- healer/src/weakhandle.rs | 14 +++++++++----- 9 files changed, 105 insertions(+), 21 deletions(-) create mode 100644 healer/src/healthmon/samefs.rs diff --git a/healer/Makefile b/healer/Makefile index b3a9ed579a2a26..661f7bb27f02a3 100644 --- a/healer/Makefile +++ b/healer/Makefile @@ -35,7 +35,8 @@ RUSTFILES = \ src/healthmon/groups.rs \ src/healthmon/inodes.rs \ src/healthmon/json.rs \ - src/healthmon/mod.rs + src/healthmon/mod.rs \ + src/healthmon/samefs.rs BUILT_RUSTFILES = \ src/xfs_fs.rs \ diff --git a/healer/src/getparents.rs b/healer/src/getparents.rs index 46f9724ee7cf7c..9ddbd209f8a50c 100644 --- a/healer/src/getparents.rs +++ b/healer/src/getparents.rs @@ -188,7 +188,7 @@ impl WeakHandle { return None; } - let fp = match 
self.reopen() { + let fp = match self.reopen(|_| true) { Err(_) => return None, Ok(x) => x, }; diff --git a/healer/src/healthmon/cstruct.rs b/healer/src/healthmon/cstruct.rs index 58463b0f6fa5b9..c9f77be25db410 100644 --- a/healer/src/healthmon/cstruct.rs +++ b/healer/src/healthmon/cstruct.rs @@ -17,6 +17,7 @@ use crate::healthmon::groups::{XfsPeragEvent, XfsPeragMetadata}; use crate::healthmon::groups::{XfsRtgroupEvent, XfsRtgroupMetadata}; use crate::healthmon::inodes::{XfsFileIoErrorEvent, XfsFileIoErrorType}; use crate::healthmon::inodes::{XfsInodeEvent, XfsInodeMetadata}; +use crate::healthmon::samefs::SameFs; use crate::healthmon::xfs_ioc_health_monitor; use crate::xfs_fs; use crate::xfs_fs::xfs_health_monitor; @@ -33,6 +34,7 @@ use std::io::Read; use std::os::fd::AsRawFd; use std::os::fd::FromRawFd; use std::path::Path; +use std::sync::Arc; /// Boilerplate to stamp out functions to convert a u32 mask to an enumset /// of the given enum type. @@ -74,6 +76,9 @@ pub struct CStructMonitor<'a> { /// path to the filesystem mountpoint mountpoint: &'a Path, + + /// object that repair threads use to check their reopened files against the monitored fs + samefs: Arc<SameFs>, } impl CStructMonitor<'_> { @@ -90,17 +95,23 @@ impl CStructMonitor<'_> { // SAFETY: Trusting the kernel ioctl not to corrupt stack contents, and to return us a valid // file description number. 
- let health_fp = unsafe { + let (health_fp, samefs) = unsafe { let health_fd = xfs_ioc_health_monitor(fp.as_raw_fd(), &hminfo)?; - File::from_raw_fd(health_fd) + (File::from_raw_fd(health_fd), SameFs::new(health_fd)) }; drop(fp); Ok(CStructMonitor { objiter: BufReader::new(health_fp), mountpoint, + samefs: samefs.into(), }) } + + /// Return an object that can be used to check reopened files + pub fn new_samefs(&self) -> Arc<SameFs> { + self.samefs.clone() + } } enum_from_field!(XfsHealthStatus, { diff --git a/healer/src/healthmon/json.rs b/healer/src/healthmon/json.rs index 2fae6f4b48e68b..ae1389aa73dd4b 100644 --- a/healer/src/healthmon/json.rs +++ b/healer/src/healthmon/json.rs @@ -17,6 +17,7 @@ use crate::healthmon::groups::{XfsPeragEvent, XfsPeragMetadata}; use crate::healthmon::groups::{XfsRtgroupEvent, XfsRtgroupMetadata}; use crate::healthmon::inodes::{XfsFileIoErrorEvent, XfsFileIoErrorType}; use crate::healthmon::inodes::{XfsInodeEvent, XfsInodeMetadata}; +use crate::healthmon::samefs::SameFs; use crate::healthmon::xfs_ioc_health_monitor; use crate::printlogln; use crate::xfs_fs; @@ -38,6 +39,7 @@ use std::os::fd::AsRawFd; use std::os::fd::FromRawFd; use std::path::Path; use std::str::FromStr; +use std::sync::Arc; /// Boilerplate to stamp out functions to convert json array to an enumset /// of the given enum type; or return an error with the given message. @@ -106,6 +108,9 @@ pub struct JsonMonitor<'a> { /// are we debugging? debug: bool, + + /// object that repair threads use to check their reopened files against the monitored fs + samefs: Arc<SameFs>, } impl JsonMonitor<'_> { @@ -127,9 +132,9 @@ impl JsonMonitor<'_> { // SAFETY: Trusting the kernel ioctl not to corrupt stack contents, and to return us a valid // file description number. 
- let health_fp = unsafe { + let (health_fp, samefs) = unsafe { let health_fd = xfs_ioc_health_monitor(fp.as_raw_fd(), &hminfo)?; - File::from_raw_fd(health_fd) + (File::from_raw_fd(health_fd), SameFs::new(health_fd)) }; drop(fp); @@ -137,8 +142,14 @@ impl JsonMonitor<'_> { lineiter: BufReader::new(health_fp).lines(), mountpoint, debug, + samefs: samefs.into(), }) } + + /// Return an object that can be used to check reopened files + pub fn new_samefs(&self) -> Arc<SameFs> { + self.samefs.clone() + } } /// Raw health event, used to create the real objects diff --git a/healer/src/healthmon/mod.rs b/healer/src/healthmon/mod.rs index f73babbe001154..624b6f231dd4ce 100644 --- a/healer/src/healthmon/mod.rs +++ b/healer/src/healthmon/mod.rs @@ -5,6 +5,7 @@ */ use crate::xfs_fs; use crate::xfs_fs::xfs_health_monitor; +use crate::xfs_fs::xfs_health_samefs; use nix::ioctl_write_ptr; use std::fs::File; use std::os::fd::AsRawFd; @@ -16,8 +17,10 @@ pub mod fs; pub mod groups; pub mod inodes; pub mod json; +pub mod samefs; ioctl_write_ptr!(xfs_ioc_health_monitor, 'X', 68, xfs_health_monitor); +ioctl_write_ptr!(xfs_ioc_health_samefs, 'X', 69, xfs_health_samefs); /// Check if the open file supports a health monitor. pub fn is_supported(fp: &File, use_json: bool) -> bool { diff --git a/healer/src/healthmon/samefs.rs b/healer/src/healthmon/samefs.rs new file mode 100644 index 00000000000000..7b578f00556eaf --- /dev/null +++ b/healer/src/healthmon/samefs.rs @@ -0,0 +1,33 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2025 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +use crate::healthmon::xfs_ioc_health_samefs; +use crate::xfs_fs::xfs_health_samefs; +use std::fs::File; +use std::os::fd::AsRawFd; +use std::os::raw::c_int; + +pub struct SameFs(c_int); + +/// Predicate object that unsafely borrows the raw fd from a health monitor to check if a reopened +/// file is actually on the same fs. 
+impl SameFs { + /// Create a new predicate from the given raw file descriptor. Caller must ensure that the + /// fd is not closed before this object is destroyed. + pub fn new(fd: c_int) -> SameFs { + SameFs(fd) + } + + /// Does this file point to the same filesystem as the health monitor? + pub fn is_same_fs(&self, fp: &File) -> bool { + let hms = xfs_health_samefs { + fd: fp.as_raw_fd(), + ..Default::default() + }; + + // any error means this isn't the same fs mount + !matches!(unsafe { xfs_ioc_health_samefs(self.0, &hms) }, Err(_e)) + } +} diff --git a/healer/src/main.rs b/healer/src/main.rs index b7d4b0dfc6a083..472da3a579051c 100644 --- a/healer/src/main.rs +++ b/healer/src/main.rs @@ -16,6 +16,7 @@ use xfs_healer::healthmon::cstruct::CStructMonitor; use xfs_healer::healthmon::event::XfsHealthEvent; use xfs_healer::healthmon::json::JsonEventWrapper; use xfs_healer::healthmon::json::JsonMonitor; +use xfs_healer::healthmon::samefs::SameFs; use xfs_healer::printlogln; use xfs_healer::repair::Repair; use xfs_healer::weakhandle::WeakHandle; @@ -148,6 +149,7 @@ impl App { /// Handle a health event that has been decoded into real objects fn process_event( et: EventThread, + samefs: Arc<SameFs>, fh: Arc<WeakHandle>, cooked: Result<Box<dyn XfsHealthEvent>>, ) { @@ -165,7 +167,7 @@ impl App { } if et.repair { for mut repair in event.schedule_repairs(et.everything) { - repair.perform(&fh) + repair.perform(&samefs, &fh) } } } @@ -176,22 +178,24 @@ impl App { fn dispatch_json_event( threads: &ThreadPool, et: EventThread, + samefs: Arc<SameFs>, fh: Arc<WeakHandle>, raw_event: JsonEventWrapper, ) { threads.execute(move || { - App::process_event(et, fh, raw_event.cook()); + App::process_event(et, samefs, fh, raw_event.cook()); }) } fn dispatch_cstruct_event( threads: &ThreadPool, et: EventThread, + samefs: Arc<SameFs>, fh: Arc<WeakHandle>, raw_event: xfs_health_monitor_event, ) { threads.execute(move || { - App::process_event(et, fh, raw_event.cook()); + 
App::process_event(et, samefs, fh, raw_event.cook()); }) } @@ -308,24 +312,40 @@ impl App { if self.json { let hmon = JsonMonitor::try_new(fp, &self.path, self.everything, self.debug) .with_context(|| M_("Opening js health monitor file"))?; + let samefs = hmon.new_samefs(); for raw_event in hmon { - App::dispatch_json_event(&threads, EventThread::new(self), fh.clone(), raw_event); + App::dispatch_json_event( + &threads, + EventThread::new(self), + samefs.clone(), + fh.clone(), + raw_event, + ); } + + // Prohibit hmon from leaving scope (and closing the health mon fd) before the worker + // threads have finished whatever they're doing. + threads.join(); } else { let hmon = CStructMonitor::try_new(fp, &self.path, self.everything) .with_context(|| M_("Opening health monitor file"))?; + let samefs = hmon.new_samefs(); for raw_event in hmon { App::dispatch_cstruct_event( &threads, EventThread::new(self), + samefs.clone(), fh.clone(), raw_event, ); } + + // Prohibit hmon from leaving scope (and closing the health mon fd) before the worker + // threads have finished whatever they're doing. 
+ threads.join(); } - threads.join(); Ok(ExitCode::SUCCESS) } diff --git a/healer/src/repair.rs b/healer/src/repair.rs index 975c3cb9cb412a..bc2dab75a99586 100644 --- a/healer/src/repair.rs +++ b/healer/src/repair.rs @@ -8,6 +8,7 @@ use crate::display_for_enum; use crate::healthmon::fs::XfsWholeFsMetadata; use crate::healthmon::groups::{XfsPeragMetadata, XfsRtgroupMetadata}; use crate::healthmon::inodes::XfsInodeMetadata; +use crate::healthmon::samefs::SameFs; use crate::printlogln; use crate::weakhandle::WeakHandle; use crate::xfs_fs; @@ -339,7 +340,7 @@ impl Repair { } /// Call the kernel to repair things - fn repair(&mut self, fh: &WeakHandle) -> Result<bool> { + fn repair(&mut self, samefs: &SameFs, fh: &WeakHandle) -> Result<bool> { if self.group == RepairGroup::FullRepair { let started = self .run_full_repair(fh) @@ -351,7 +352,7 @@ impl Repair { } let fp = fh - .reopen() + .reopen(|fp| samefs.is_same_fs(fp)) .with_context(|| M_("Reopening filesystem to repair metadata"))?; // SAFETY: Trusting the kernel not to corrupt memory. @@ -365,8 +366,8 @@ impl Repair { } /// Try to repair something, or log whatever went wrong - pub fn perform(&mut self, fh: &WeakHandle) { - match self.repair(fh) { + pub fn perform(&mut self, samefs: &SameFs, fh: &WeakHandle) { + match self.repair(samefs, fh) { Err(e) => { eprintln!("{}: {:#}", self.repair_path(fh), e); } @@ -381,7 +382,7 @@ impl Repair { // Transform into a full repair if we failed to fix things. 
if self.outcome == RepairOutcome::Failed && self.group != RepairGroup::FullRepair { self.group = RepairGroup::FullRepair; - self.perform(fh); + self.perform(samefs, fh); } } }; diff --git a/healer/src/weakhandle.rs b/healer/src/weakhandle.rs index 9f0dc77b822077..b958b3ccbed793 100644 --- a/healer/src/weakhandle.rs +++ b/healer/src/weakhandle.rs @@ -92,10 +92,14 @@ pub struct WeakHandle { impl WeakHandle { /// Try to reopen the filesystem with a given mountpoint - fn reopen_from(&self, mountpoint: &Path) -> Result<File> { + fn reopen_from( + &self, + mountpoint: &Path, + is_acceptable: impl Fn(&File) -> bool, + ) -> Result<File> { let fp = File::open(mountpoint)?; - if xfs_handle::try_from(&fp)? != self.handle { + if xfs_handle::try_from(&fp)? != self.handle || !is_acceptable(&fp) { let s = format!( "{} {}: {}", M_("reopening"), @@ -109,9 +113,9 @@ impl WeakHandle { } /// Try to reopen the filesystem from which we got the handle. - pub fn reopen(&self) -> Result<File> { + pub fn reopen(&self, is_acceptable: impl Fn(&File) -> bool) -> Result<File> { // First try the original mountpoint - let orig_result = self.reopen_from(&self.mountpoint); + let orig_result = self.reopen_from(&self.mountpoint, &is_acceptable); if let Ok(x) = orig_result { return Ok(x); } @@ -119,7 +123,7 @@ impl WeakHandle { // Now scan /proc/self/mounts for any other bind mounts of this filesystem let entries = MountEntries::try_new()?; for mntent in entries.filter(|x| x.fstype == "xfs" && x.fsname == self.fsname) { - if let Ok(x) = self.reopen_from(&mntent.dir) { + if let Ok(x) = self.reopen_from(&mntent.dir, &is_acceptable) { return Ok(x); } } ^ permalink raw reply related [flat|nested] 80+ messages in thread
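The `reopen`/`is_acceptable` plumbing in this patch boils down to a small pattern: the caller injects a predicate closure, so `WeakHandle` never has to know about the health-monitor fd or the samefs ioctl. Here is a standalone sketch of that shape — the path and the rejection predicate are illustrative stand-ins, not the healer's actual code, and the real closure would wrap the `XFS_IOC_HEALTH_SAMEFS` call:

```rust
use std::fs::File;
use std::io;

// Reopen a path, then let a caller-supplied predicate decide whether the
// new handle is acceptable.  In the patch above the closure wraps the
// XFS_IOC_HEALTH_SAMEFS ioctl; here it is just an ordinary function object.
fn reopen_checked(path: &str, is_acceptable: impl Fn(&File) -> bool) -> io::Result<File> {
    let fp = File::open(path)?;
    if !is_acceptable(&fp) {
        // Mirrors the healer's behavior: an unacceptable handle is
        // reported the same way as a failed open.
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "reopened file fails the same-fs check",
        ));
    }
    Ok(fp)
}

fn main() {
    // A permissive predicate behaves like a plain open...
    assert!(reopen_checked("/dev/null", |_| true).is_ok());
    // ...while a rejecting one converts a successful open into an error.
    assert!(reopen_checked("/dev/null", |_| false).is_err());
    println!("ok");
}
```

The benefit of taking `impl Fn(&File) -> bool` rather than a `&SameFs` directly is that `weakhandle.rs` stays ignorant of the health-monitor module, which is exactly the dependency direction the diff above preserves.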
* [PATCH 18/19] debian/control: listify the build dependencies 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (16 preceding siblings ...) 2025-10-23 0:16 ` [PATCH 17/19] xfs_healer: validate that repair fds point to the monitored fs in Rust Darrick J. Wong @ 2025-10-23 0:17 ` Darrick J. Wong 2025-10-23 0:17 ` [PATCH 19/19] debian/control: pull in build dependencies for xfs_healer Darrick J. Wong 2025-11-04 22:48 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:17 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> This will make it less gross to add more build deps later. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- debian/control | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/debian/control b/debian/control index 31ea1e988f66be..01bdefd60f661f 100644 --- a/debian/control +++ b/debian/control @@ -3,7 +3,19 @@ Section: admin Priority: optional Maintainer: XFS Development Team <linux-xfs@vger.kernel.org> Uploaders: Nathan Scott <nathans@debian.org>, Anibal Monsalve Salazar <anibal@debian.org>, Bastian Germann <bage@debian.org> -Build-Depends: libinih-dev (>= 53), uuid-dev, debhelper (>= 12), gettext, libtool, libedit-dev, libblkid-dev (>= 2.17), linux-libc-dev, libdevmapper-dev, libicu-dev, pkg-config, liburcu-dev, systemd-dev | systemd (<< 253-2~) +Build-Depends: debhelper (>= 12), + gettext, + libblkid-dev (>= 2.17), + libdevmapper-dev, + libedit-dev, + libicu-dev, + libinih-dev (>= 53), + libtool, + liburcu-dev, + linux-libc-dev, + pkg-config, + systemd-dev | systemd (<< 253-2~), + uuid-dev Standards-Version: 4.0.0 Homepage: https://xfs.wiki.kernel.org/ ^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH 19/19] debian/control: pull in build dependencies for xfs_healer 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (17 preceding siblings ...) 2025-10-23 0:17 ` [PATCH 18/19] debian/control: listify the build dependencies Darrick J. Wong @ 2025-10-23 0:17 ` Darrick J. Wong 2025-11-04 22:48 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong 19 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:17 UTC (permalink / raw) To: djwong, aalbersh; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Pull in the build dependencies for xfs_healer so that we can build the much nicer Rust version. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- debian/control | 16 ++++++++++++++++ debian/rules | 1 + 2 files changed, 17 insertions(+) diff --git a/debian/control b/debian/control index 01bdefd60f661f..9b8c04c6e86ad5 100644 --- a/debian/control +++ b/debian/control @@ -4,16 +4,32 @@ Priority: optional Maintainer: XFS Development Team <linux-xfs@vger.kernel.org> Uploaders: Nathan Scott <nathans@debian.org>, Anibal Monsalve Salazar <anibal@debian.org>, Bastian Germann <bage@debian.org> Build-Depends: debhelper (>= 12), + cargo-web | cargo, + clang, gettext, libblkid-dev (>= 2.17), libdevmapper-dev, libedit-dev, libicu-dev, libinih-dev (>= 53), + librust-anyhow-dev, + librust-clap-derive-dev, + librust-clap-dev, + librust-derive-more-dev, + librust-enumset-derive-dev, + librust-enumset-dev, + librust-gettext-rs-dev, + librust-nix-dev, + librust-serde-json-dev, + librust-strum-dev, + librust-strum-macros-dev, + librust-threadpool-dev, libtool, liburcu-dev, linux-libc-dev, pkg-config, + rustc-web | rustc, + rustfmt-web | rustfmt, systemd-dev | systemd (<< 253-2~), uuid-dev Standards-Version: 4.0.0 diff --git a/debian/rules b/debian/rules index d13ff5cf954cd2..2f66e92c6532a6 100755 --- a/debian/rules +++ b/debian/rules @@ -41,6 +41,7 
@@ configure_options = \ --enable-lto \ --enable-release \ --with-system-crates \ + --disable-crate-checks \ --localstatedir=/var options = export DEBUG=-DNDEBUG DISTRIBUTION=debian \ ^ permalink raw reply related [flat|nested] 80+ messages in thread
* Re: [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust 2025-10-23 0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong ` (18 preceding siblings ...) 2025-10-23 0:17 ` [PATCH 19/19] debian/control: pull in build dependencies for xfs_healer Darrick J. Wong @ 2025-11-04 22:48 ` Darrick J. Wong 2025-12-01 17:59 ` Andrey Albershteyn 19 siblings, 1 reply; 80+ messages in thread From: Darrick J. Wong @ 2025-11-04 22:48 UTC (permalink / raw) To: aalbersh; +Cc: linux-xfs, Neal Gompa On Wed, Oct 22, 2025 at 05:00:20PM -0700, Darrick J. Wong wrote: > Hi all, > > The initial implementation of the self healing daemon is written in > Python. This was useful for rapid prototyping, but a more performant > and typechecked codebase is valuable. Write a second implementation in > Rust to reduce event processing overhead and library dependence. This > could have been done in C, but I decided to use an environment with > somewhat fewer footguns. Having discarded the json output format last week, I decided to rewrite the Python version of xfs_healer in C partly out of curiosity and partly because I didn't see much advantage to having a Python script to call ioctls and interpret C structs. After removing the json support from the Rust version, the release binary sizes are: -rwxr-xr-x root/root 1051096 2025-11-04 14:25 ./usr/libexec/xfsprogs/xfs_healer -rwxr-xr-x root/root 43904 2025-11-04 14:25 ./usr/libexec/xfsprogs/xfs_healer.orig This is a nearly 24x size increase to have Rust. I'm a n00b Rustacean and a veteran C stuckee, but between that and the difficulties of integrating two languages and two build systems together, I don't think it's worth the trouble to keep the Rust code. 
I've made a final push with the Rust code to my dev repo for the sake of posterity: https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=health-monitoring-rust_2025-11-04 But I'm deleting this from my tree after I send this email. That said, I quite enjoyed using this as an excuse to familiarize myself with how to write bad Rust code. Using traits and the newtype pattern for geometric units (e.g. xfs_fsblock_t) was very helpful in keeping unit conversions understandable; and having to think about object access and lifetimes helped me produce a stable prototype very quickly. It also helps that rustc errors are far more helpful than gcc's. The only thing I didn't particularly like is the forced coordination for shared resources that already coordinate threads -- you can't easily have multiple readers sharing an open fd, even if that magic fd only emits struct-sized objects and takes i_rwsem exclusively to prevent corruption problems. Dealing with cargo for a distro package build was nightmarish -- hermetically sealed build systems (you want this) can't access crates.io, which means that I as the author had to be careful only to use crate packages that are in EPEL or Debian stable, and to tell cargo only to look on the local filesystem. So I guess I now have experience in that, should anyone want to know how to do that. (Also, how do you do i18n in Rust programs? gettext???) --D > If you're going to start using this code, I strongly recommend pulling > from my git trees, which are linked below. > > This has been running on the djcloud for months with no problems. Enjoy! > Comments and questions are, as always, welcome. 
> > --D > > xfsprogs git tree: > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=health-monitoring-rust > --- > Commits in this patchset: > * xfs_healer: start building a Rust version > * xfs_healer: enable gettext for localization > * xfs_healer: bindgen xfs_fs.h > * xfs_healer: define Rust objects for health events and kernel interface > * xfs_healer: read binary health events from the kernel > * xfs_healer: read json health events from the kernel > * xfs_healer: create a weak file handle so we don't pin the mount > * xfs_healer: fix broken filesystem metadata > * xfs_healer: check for fs features needed for effective repairs > * xfs_healer: use getparents to look up file names > * xfs_healer: make the rust program check if kernel support available > * xfs_healer: use the autofsck fsproperty to select mode > * xfs_healer: use rc on the mountpoint instead of lifetime annotations > * xfs_healer: use thread pools > * xfs_healer: run full scrub after lost corruption events or targeted repair failure > * xfs_healer: use getmntent in Rust to find moved filesystems > * xfs_healer: validate that repair fds point to the monitored fs in Rust > * debian/control: listify the build dependencies > * debian/control: pull in build dependencies for xfs_healer > --- > healer/bindgen_xfs_fs.h | 6 + > configure.ac | 84 ++++++++ > debian/control | 30 +++ > debian/rules | 3 > healer/.cargo/config.toml.system | 6 + > healer/Cargo.toml.in | 37 +++ > healer/Makefile | 143 +++++++++++++ > healer/rbindgen | 57 +++++ > healer/src/fsgeom.rs | 41 ++++ > healer/src/fsprops.rs | 101 +++++++++ > healer/src/getmntent.rs | 117 +++++++++++ > healer/src/getparents.rs | 210 ++++++++++++++++++++ > healer/src/healthmon/cstruct.rs | 354 +++++++++++++++++++++++++++++++++ > healer/src/healthmon/event.rs | 122 +++++++++++ > healer/src/healthmon/fs.rs | 163 +++++++++++++++ > healer/src/healthmon/groups.rs | 160 +++++++++++++++ > healer/src/healthmon/inodes.rs | 142 +++++++++++++ 
> healer/src/healthmon/json.rs | 409 ++++++++++++++++++++++++++++++++++++++ > healer/src/healthmon/mod.rs | 47 ++++ > healer/src/healthmon/samefs.rs | 33 +++ > healer/src/lib.rs | 17 ++ > healer/src/main.rs | 390 ++++++++++++++++++++++++++++++++++++ > healer/src/repair.rs | 390 ++++++++++++++++++++++++++++++++++++ > healer/src/util.rs | 81 ++++++++ > healer/src/weakhandle.rs | 209 +++++++++++++++++++ > healer/src/xfs_types.rs | 292 +++++++++++++++++++++++++++ > healer/src/xfsprogs.rs.in | 33 +++ > include/builddefs.in | 13 + > include/buildrules | 1 > m4/Makefile | 1 > m4/package_rust.m4 | 163 +++++++++++++++ > 31 files changed, 3851 insertions(+), 4 deletions(-) > create mode 100644 healer/bindgen_xfs_fs.h > create mode 100644 healer/.cargo/config.toml.system > create mode 100644 healer/Cargo.toml.in > create mode 100755 healer/rbindgen > create mode 100644 healer/src/fsgeom.rs > create mode 100644 healer/src/fsprops.rs > create mode 100644 healer/src/getmntent.rs > create mode 100644 healer/src/getparents.rs > create mode 100644 healer/src/healthmon/cstruct.rs > create mode 100644 healer/src/healthmon/event.rs > create mode 100644 healer/src/healthmon/fs.rs > create mode 100644 healer/src/healthmon/groups.rs > create mode 100644 healer/src/healthmon/inodes.rs > create mode 100644 healer/src/healthmon/json.rs > create mode 100644 healer/src/healthmon/mod.rs > create mode 100644 healer/src/healthmon/samefs.rs > create mode 100644 healer/src/lib.rs > create mode 100644 healer/src/main.rs > create mode 100644 healer/src/repair.rs > create mode 100644 healer/src/util.rs > create mode 100644 healer/src/weakhandle.rs > create mode 100644 healer/src/xfs_types.rs > create mode 100644 healer/src/xfsprogs.rs.in > create mode 100644 m4/package_rust.m4 > > ^ permalink raw reply [flat|nested] 80+ messages in thread
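The newtype-for-units idea mentioned above is worth a concrete sketch. This is a generic illustration, not the healer's actual types: the `FsBlock`/`AgBlock` names, the flat `agno * blocks + offset` encoding, and the geometry numbers are all invented for the example (real XFS fsblock numbers pack the AG number into the high bits instead):

```rust
// Newtype wrappers make it a compile error to mix unit spaces: a
// filesystem block number and an AG block number are both u64 underneath,
// but the type system now refuses to confuse one for the other.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FsBlock(u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AgBlock(u64);

// Conversions live in one place, next to the geometry that defines them.
struct Geometry {
    ag_blocks: u64, // blocks per allocation group (illustrative value)
}

impl Geometry {
    /// Split a fs-wide block number into (AG number, block within the AG).
    fn split(&self, fsb: FsBlock) -> (u64, AgBlock) {
        (fsb.0 / self.ag_blocks, AgBlock(fsb.0 % self.ag_blocks))
    }

    /// Recombine an AG number and an AG-relative block number.
    fn join(&self, agno: u64, agb: AgBlock) -> FsBlock {
        FsBlock(agno * self.ag_blocks + agb.0)
    }
}

fn main() {
    let geo = Geometry { ag_blocks: 1000 };
    let (agno, agb) = geo.split(FsBlock(2500));
    assert_eq!((agno, agb), (2, AgBlock(500)));
    assert_eq!(geo.join(agno, agb), FsBlock(2500));
    println!("ok");
}
```

With plain `u64` everywhere, `geo.join(agb.0, AgBlock(agno))` would compile and silently corrupt the conversion; with the newtypes it does not compile at all, which is the property that makes unit bookkeeping "understandable" in a codebase full of block-number spaces.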
* Re: [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust 2025-11-04 22:48 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong @ 2025-12-01 17:59 ` Andrey Albershteyn 2025-12-01 21:55 ` Darrick J. Wong 0 siblings, 1 reply; 80+ messages in thread From: Andrey Albershteyn @ 2025-12-01 17:59 UTC (permalink / raw) To: Darrick J. Wong; +Cc: aalbersh, linux-xfs, Neal Gompa On 2025-11-04 14:48:06, Darrick J. Wong wrote: > On Wed, Oct 22, 2025 at 05:00:20PM -0700, Darrick J. Wong wrote: > > Hi all, > > > > The initial implementation of the self healing daemon is written in > > Python. This was useful for rapid prototyping, but a more performant > > and typechecked codebase is valuable. Write a second implementation in > > Rust to reduce event processing overhead and library dependence. This > > could have been done in C, but I decided to use an environment with > > somewhat fewer footguns. > > Having discarded the json output format last week, I decided to rewrite > the Python version of xfs_healer in C partly out of curiosity and partly > because I didn't see much advantage to having a Python script to call > ioctls and interpret C structs. After removing the json support from > the Rust version, the release binary sizes are: > > -rwxr-xr-x root/root 1051096 2025-11-04 14:25 ./usr/libexec/xfsprogs/xfs_healer > -rwxr-xr-x root/root 43904 2025-11-04 14:25 ./usr/libexec/xfsprogs/xfs_healer.orig > > This is a nearly 24x size increase to have Rust. I'm a n00b Rustacean cargo build --release? (optimized + no debug info) > and a veteran C stuckee, but between that and the difficulties of > integrating two languages and two build systems together, I don't think > it's worth the trouble to keep the Rust code. 
I've made a final push > with the Rust code to my dev repo for the sake of posterity: > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=health-monitoring-rust_2025-11-04 > > But I'm deleting this from my tree after I send this email. :'( > > That said, I quite enjoyed using this an excuse to familiarize myself > with how to write bad Rust code. Using traits and the newtype pattern > for geometric units (e.g. xfs_fsblock_t) was very helpful in keeping > unit conversions understandable; and having to think about object access > and lifetimes helped me produce a stable prototype very quickly. It > also helps that rustc errors are far more helpful than gcc. > > The only thing I didn't particularly like is the forced coordination for > shared resources that already coordinate threads -- you can't easily > have multiple readers sharing an open fd, even if that magic fd only > emits struct sized objects and takes i_rwsem exclusively to prevent > corruption problems. > > Dealing with cargo for a distro package build was nightmarish -- > hermetically sealed build systems (you want this) can't access crates.io > which means that I as the author had to be careful only to use crate > packages that are in EPEL or Debian stable, and to tell cargo only to > look on the local filesystem. So I guess I now have experience in that, > should anyone want to know how to do that. > > (Also, how do you do i18n in Rust programs? gettext???) > > --D > > > If you're going to start using this code, I strongly recommend pulling > > from my git trees, which are linked below. > > > > This has been running on the djcloud for months with no problems. Enjoy! > > Comments and questions are, as always, welcome. 
> > > > --D > > > > xfsprogs git tree: > > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=health-monitoring-rust > > --- > > Commits in this patchset: > > * xfs_healer: start building a Rust version > > * xfs_healer: enable gettext for localization > > * xfs_healer: bindgen xfs_fs.h > > * xfs_healer: define Rust objects for health events and kernel interface > > * xfs_healer: read binary health events from the kernel > > * xfs_healer: read json health events from the kernel > > * xfs_healer: create a weak file handle so we don't pin the mount > > * xfs_healer: fix broken filesystem metadata > > * xfs_healer: check for fs features needed for effective repairs > > * xfs_healer: use getparents to look up file names > > * xfs_healer: make the rust program check if kernel support available > > * xfs_healer: use the autofsck fsproperty to select mode > > * xfs_healer: use rc on the mountpoint instead of lifetime annotations > > * xfs_healer: use thread pools > > * xfs_healer: run full scrub after lost corruption events or targeted repair failure > > * xfs_healer: use getmntent in Rust to find moved filesystems > > * xfs_healer: validate that repair fds point to the monitored fs in Rust > > * debian/control: listify the build dependencies > > * debian/control: pull in build dependencies for xfs_healer > > --- > > healer/bindgen_xfs_fs.h | 6 + > > configure.ac | 84 ++++++++ > > debian/control | 30 +++ > > debian/rules | 3 > > healer/.cargo/config.toml.system | 6 + > > healer/Cargo.toml.in | 37 +++ > > healer/Makefile | 143 +++++++++++++ > > healer/rbindgen | 57 +++++ > > healer/src/fsgeom.rs | 41 ++++ > > healer/src/fsprops.rs | 101 +++++++++ > > healer/src/getmntent.rs | 117 +++++++++++ > > healer/src/getparents.rs | 210 ++++++++++++++++++++ > > healer/src/healthmon/cstruct.rs | 354 +++++++++++++++++++++++++++++++++ > > healer/src/healthmon/event.rs | 122 +++++++++++ > > healer/src/healthmon/fs.rs | 163 +++++++++++++++ > > 
healer/src/healthmon/groups.rs | 160 +++++++++++++++ > > healer/src/healthmon/inodes.rs | 142 +++++++++++++ > > healer/src/healthmon/json.rs | 409 ++++++++++++++++++++++++++++++++++++++ > > healer/src/healthmon/mod.rs | 47 ++++ > > healer/src/healthmon/samefs.rs | 33 +++ > > healer/src/lib.rs | 17 ++ > > healer/src/main.rs | 390 ++++++++++++++++++++++++++++++++++++ > > healer/src/repair.rs | 390 ++++++++++++++++++++++++++++++++++++ > > healer/src/util.rs | 81 ++++++++ > > healer/src/weakhandle.rs | 209 +++++++++++++++++++ > > healer/src/xfs_types.rs | 292 +++++++++++++++++++++++++++ > > healer/src/xfsprogs.rs.in | 33 +++ > > include/builddefs.in | 13 + > > include/buildrules | 1 > > m4/Makefile | 1 > > m4/package_rust.m4 | 163 +++++++++++++++ > > 31 files changed, 3851 insertions(+), 4 deletions(-) > > create mode 100644 healer/bindgen_xfs_fs.h > > create mode 100644 healer/.cargo/config.toml.system > > create mode 100644 healer/Cargo.toml.in > > create mode 100755 healer/rbindgen > > create mode 100644 healer/src/fsgeom.rs > > create mode 100644 healer/src/fsprops.rs > > create mode 100644 healer/src/getmntent.rs > > create mode 100644 healer/src/getparents.rs > > create mode 100644 healer/src/healthmon/cstruct.rs > > create mode 100644 healer/src/healthmon/event.rs > > create mode 100644 healer/src/healthmon/fs.rs > > create mode 100644 healer/src/healthmon/groups.rs > > create mode 100644 healer/src/healthmon/inodes.rs > > create mode 100644 healer/src/healthmon/json.rs > > create mode 100644 healer/src/healthmon/mod.rs > > create mode 100644 healer/src/healthmon/samefs.rs > > create mode 100644 healer/src/lib.rs > > create mode 100644 healer/src/main.rs > > create mode 100644 healer/src/repair.rs > > create mode 100644 healer/src/util.rs > > create mode 100644 healer/src/weakhandle.rs > > create mode 100644 healer/src/xfs_types.rs > > create mode 100644 healer/src/xfsprogs.rs.in > > create mode 100644 m4/package_rust.m4 > > > > > -- - Andrey ^ permalink raw 
reply [flat|nested] 80+ messages in thread
* Re: [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust 2025-12-01 17:59 ` Andrey Albershteyn @ 2025-12-01 21:55 ` Darrick J. Wong 0 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-12-01 21:55 UTC (permalink / raw) To: Andrey Albershteyn; +Cc: aalbersh, linux-xfs, Neal Gompa On Mon, Dec 01, 2025 at 06:59:53PM +0100, Andrey Albershteyn wrote: > On 2025-11-04 14:48:06, Darrick J. Wong wrote: > > On Wed, Oct 22, 2025 at 05:00:20PM -0700, Darrick J. Wong wrote: > > > Hi all, > > > > > > The initial implementation of the self healing daemon is written in > > > Python. This was useful for rapid prototyping, but a more performant > > > and typechecked codebase is valuable. Write a second implementation in > > > Rust to reduce event processing overhead and library dependence. This > > > could have been done in C, but I decided to use an environment with > > > somewhat fewer footguns. > > > > Having discarded the json output format last week, I decided to rewrite > > the Python version of xfs_healer in C partly out of curiosity and partly > > because I didn't see much advantage to having a Python script to call > > ioctls and interpret C structs. After removing the json support from > > the Rust version, the release binary sizes are: > > > > -rwxr-xr-x root/root 1051096 2025-11-04 14:25 ./usr/libexec/xfsprogs/xfs_healer > > -rwxr-xr-x root/root 43904 2025-11-04 14:25 ./usr/libexec/xfsprogs/xfs_healer.orig > > > > This is a nearly 24x size increase to have Rust. I'm a n00b Rustacean > > cargo build --release? (optimized + no debug info) That (1.05M) is the release build size. The debug build size was 33MB. Not sure how or why Rust binaries get so huge, but there it is. :/ --D > > and a veteran C stuckee, but between that and the difficulties of > > integrating two languages and two build systems together, I don't think > > it's worth the trouble to keep the Rust code. 
I've made a final push > > with the Rust code to my dev repo for the sake of posterity: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=health-monitoring-rust_2025-11-04 > > > > But I'm deleting this from my tree after I send this email. > > :'( > > > > > That said, I quite enjoyed using this an excuse to familiarize myself > > with how to write bad Rust code. Using traits and the newtype pattern > > for geometric units (e.g. xfs_fsblock_t) was very helpful in keeping > > unit conversions understandable; and having to think about object access > > and lifetimes helped me produce a stable prototype very quickly. It > > also helps that rustc errors are far more helpful than gcc. > > > > The only thing I didn't particularly like is the forced coordination for > > shared resources that already coordinate threads -- you can't easily > > have multiple readers sharing an open fd, even if that magic fd only > > emits struct sized objects and takes i_rwsem exclusively to prevent > > corruption problems. > > > > Dealing with cargo for a distro package build was nightmarish -- > > hermetically sealed build systems (you want this) can't access crates.io > > which means that I as the author had to be careful only to use crate > > packages that are in EPEL or Debian stable, and to tell cargo only to > > look on the local filesystem. So I guess I now have experience in that, > > should anyone want to know how to do that. > > > > (Also, how do you do i18n in Rust programs? gettext???) > > > > --D > > > > > If you're going to start using this code, I strongly recommend pulling > > > from my git trees, which are linked below. > > > > > > This has been running on the djcloud for months with no problems. Enjoy! > > > Comments and questions are, as always, welcome. 
> > > > > > --D > > > > > > xfsprogs git tree: > > > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=health-monitoring-rust > > > --- > > > Commits in this patchset: > > > * xfs_healer: start building a Rust version > > > * xfs_healer: enable gettext for localization > > > * xfs_healer: bindgen xfs_fs.h > > > * xfs_healer: define Rust objects for health events and kernel interface > > > * xfs_healer: read binary health events from the kernel > > > * xfs_healer: read json health events from the kernel > > > * xfs_healer: create a weak file handle so we don't pin the mount > > > * xfs_healer: fix broken filesystem metadata > > > * xfs_healer: check for fs features needed for effective repairs > > > * xfs_healer: use getparents to look up file names > > > * xfs_healer: make the rust program check if kernel support available > > > * xfs_healer: use the autofsck fsproperty to select mode > > > * xfs_healer: use rc on the mountpoint instead of lifetime annotations > > > * xfs_healer: use thread pools > > > * xfs_healer: run full scrub after lost corruption events or targeted repair failure > > > * xfs_healer: use getmntent in Rust to find moved filesystems > > > * xfs_healer: validate that repair fds point to the monitored fs in Rust > > > * debian/control: listify the build dependencies > > > * debian/control: pull in build dependencies for xfs_healer > > > --- > > > healer/bindgen_xfs_fs.h | 6 + > > > configure.ac | 84 ++++++++ > > > debian/control | 30 +++ > > > debian/rules | 3 > > > healer/.cargo/config.toml.system | 6 + > > > healer/Cargo.toml.in | 37 +++ > > > healer/Makefile | 143 +++++++++++++ > > > healer/rbindgen | 57 +++++ > > > healer/src/fsgeom.rs | 41 ++++ > > > healer/src/fsprops.rs | 101 +++++++++ > > > healer/src/getmntent.rs | 117 +++++++++++ > > > healer/src/getparents.rs | 210 ++++++++++++++++++++ > > > healer/src/healthmon/cstruct.rs | 354 +++++++++++++++++++++++++++++++++ > > > healer/src/healthmon/event.rs | 122 
+++++++++++ > > > healer/src/healthmon/fs.rs | 163 +++++++++++++++ > > > healer/src/healthmon/groups.rs | 160 +++++++++++++++ > > > healer/src/healthmon/inodes.rs | 142 +++++++++++++ > > > healer/src/healthmon/json.rs | 409 ++++++++++++++++++++++++++++++++++++++ > > > healer/src/healthmon/mod.rs | 47 ++++ > > > healer/src/healthmon/samefs.rs | 33 +++ > > > healer/src/lib.rs | 17 ++ > > > healer/src/main.rs | 390 ++++++++++++++++++++++++++++++++++++ > > > healer/src/repair.rs | 390 ++++++++++++++++++++++++++++++++++++ > > > healer/src/util.rs | 81 ++++++++ > > > healer/src/weakhandle.rs | 209 +++++++++++++++++++ > > > healer/src/xfs_types.rs | 292 +++++++++++++++++++++++++++ > > > healer/src/xfsprogs.rs.in | 33 +++ > > > include/builddefs.in | 13 + > > > include/buildrules | 1 > > > m4/Makefile | 1 > > > m4/package_rust.m4 | 163 +++++++++++++++ > > > 31 files changed, 3851 insertions(+), 4 deletions(-) > > > create mode 100644 healer/bindgen_xfs_fs.h > > > create mode 100644 healer/.cargo/config.toml.system > > > create mode 100644 healer/Cargo.toml.in > > > create mode 100755 healer/rbindgen > > > create mode 100644 healer/src/fsgeom.rs > > > create mode 100644 healer/src/fsprops.rs > > > create mode 100644 healer/src/getmntent.rs > > > create mode 100644 healer/src/getparents.rs > > > create mode 100644 healer/src/healthmon/cstruct.rs > > > create mode 100644 healer/src/healthmon/event.rs > > > create mode 100644 healer/src/healthmon/fs.rs > > > create mode 100644 healer/src/healthmon/groups.rs > > > create mode 100644 healer/src/healthmon/inodes.rs > > > create mode 100644 healer/src/healthmon/json.rs > > > create mode 100644 healer/src/healthmon/mod.rs > > > create mode 100644 healer/src/healthmon/samefs.rs > > > create mode 100644 healer/src/lib.rs > > > create mode 100644 healer/src/main.rs > > > create mode 100644 healer/src/repair.rs > > > create mode 100644 healer/src/util.rs > > > create mode 100644 healer/src/weakhandle.rs > > > create mode 100644 
healer/src/xfs_types.rs > > > create mode 100644 healer/src/xfsprogs.rs.in > > > create mode 100644 m4/package_rust.m4 > > > > > > > > > > -- > - Andrey > > ^ permalink raw reply [flat|nested] 80+ messages in thread
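On the "how do Rust binaries get so huge" question above: a release binary statically links std and carries panic/unwind machinery plus symbols, and the usual size levers are profile settings in Cargo.toml. These are the standard knobs from the community's "min-sized-rust" playbook — typical values, not something tested against xfs_healer:

```toml
# Cargo.toml -- size-oriented release profile (illustrative settings)
[profile.release]
strip = true         # drop symbols; often the single biggest win
opt-level = "z"      # optimize for size instead of speed
lto = true           # whole-program link-time optimization
codegen-units = 1    # better optimization at the cost of build time
panic = "abort"      # drop unwinding tables and landing pads
```

Note the tradeoffs: `strip = true` typically removes the majority of the bloat for free, while `panic = "abort"` changes behavior (no unwinding, so destructors don't run on panic), so it is a deliberate choice rather than a free win. Even with all of these, a statically linked std keeps a Rust binary well above a comparable dynamically linked C program.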
* [PATCHSET V2] fstests: autonomous self healing of filesystems
  2025-10-22 23:56 [PATCHBOMB 6.19] xfs: autonomous self healing Darrick J. Wong
                   ` (2 preceding siblings ...)
  2025-10-23  0:00 ` [PATCHSET V2 2/2] xfsprogs: autonomous self healing of filesystems in Rust Darrick J. Wong
@ 2025-10-23  0:00 ` Darrick J. Wong
  2025-10-23  0:17   ` [PATCH 1/4] xfs: test health monitoring code Darrick J. Wong
                     ` (3 more replies)
  3 siblings, 4 replies; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-23  0:00 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests

Hi all,

This series adds functionality and regression tests for the automated
self healing daemon for xfs.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.
Enjoy! Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=health-monitoring
xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=health-monitoring
fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=health-monitoring
---
Commits in this patchset:
 * xfs: test health monitoring code
 * xfs: test for metadata corruption error reporting via healthmon
 * xfs: test io error reporting via healthmon
 * xfs: test new xfs_healer daemon
---
 common/config       |    6 +
 common/rc           |   15 ++++
 common/systemd      |   21 +++++
 common/xfs          |   86 +++++++++++++++++++++
 doc/group-names.txt |    1
 tests/xfs/1878      |   80 ++++++++++++++++++++
 tests/xfs/1878.out  |   10 ++
 tests/xfs/1879      |   89 ++++++++++++++++++++++
 tests/xfs/1879.out  |   12 +++
 tests/xfs/1882      |   48 ++++++++++++
 tests/xfs/1882.out  |    2
 tests/xfs/1883      |   60 +++++++++++++++
 tests/xfs/1883.out  |    2
 tests/xfs/1884      |   90 ++++++++++++++++++++++
 tests/xfs/1884.out  |    2
 tests/xfs/1885      |   53 +++++++++++++
 tests/xfs/1885.out  |    5 +
 tests/xfs/1896      |  206 +++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/1896.out  |   21 +++++
 tests/xfs/1897      |   98 ++++++++++++++++++++++++
 tests/xfs/1897.out  |    4 +
 tests/xfs/1898      |   36 +++++++++
 tests/xfs/1898.out  |    4 +
 tests/xfs/1899      |  109 +++++++++++++++++++++++++++
 tests/xfs/1899.out  |    3 +
 tests/xfs/1900      |  116 +++++++++++++++++++++++++++++
 tests/xfs/1900.out  |    2
 tests/xfs/1901      |  138 ++++++++++++++++++++++++++++++++++
 tests/xfs/1901.out  |    2
 29 files changed, 1321 insertions(+)
 create mode 100755 tests/xfs/1878
 create mode 100644 tests/xfs/1878.out
 create mode 100755 tests/xfs/1879
 create mode 100644 tests/xfs/1879.out
 create mode 100755 tests/xfs/1882
 create mode 100644 tests/xfs/1882.out
 create mode 100755 tests/xfs/1883
 create mode 100644 tests/xfs/1883.out
 create mode 100755 tests/xfs/1884
 create mode 100644 tests/xfs/1884.out
 create mode 100755 tests/xfs/1885
 create mode 100644 tests/xfs/1885.out
 create mode 100755 tests/xfs/1896
 create mode 100644 tests/xfs/1896.out
 create mode 100755 tests/xfs/1897
 create mode 100755 tests/xfs/1897.out
 create mode 100755 tests/xfs/1898
 create mode 100755 tests/xfs/1898.out
 create mode 100755 tests/xfs/1899
 create mode 100644 tests/xfs/1899.out
 create mode 100755 tests/xfs/1900
 create mode 100755 tests/xfs/1900.out
 create mode 100755 tests/xfs/1901
 create mode 100755 tests/xfs/1901.out
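Stripped to its core, the daemon model this series exercises is a read-classify-react loop over health events. A minimal sketch in shell, using a hypothetical one-object-per-line JSON feed — the event names and framing here are illustrative only; the real anonfd format is defined by the kernel side of the patchset:

```shell
# Hedged sketch: classify incoming health events and decide what a healer
# daemon would do with each one.
handle_event() {
	case "$1" in
	*'"type":"sick"'*)	echo "schedule repair" ;;
	*'"type":"unmount"'*)	echo "exit" ;;
	*)			echo "ignore" ;;
	esac
}

# Feed a couple of made-up events through the loop.
while read -r ev; do
	echo "$ev -> $(handle_event "$ev")"
done <<'EOF'
{"type":"sick","domain":"inode"}
{"type":"unmount"}
EOF
```

A real consumer would reopen the root directory from its saved file handle before initiating a repair, as the cover letter describes.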
* [PATCH 1/4] xfs: test health monitoring code
  2025-10-23  0:00 ` [PATCHSET V2] fstests: autonomous self healing of filesystems Darrick J. Wong
@ 2025-10-23  0:17   ` Darrick J. Wong
  2025-10-23  0:17   ` [PATCH 2/4] xfs: test for metadata corruption error reporting via healthmon Darrick J. Wong
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-23  0:17 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests

From: Darrick J. Wong <djwong@kernel.org>

Add some functionality tests for the new health monitoring code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 doc/group-names.txt |    1 +
 tests/xfs/1885      |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/1885.out  |    5 +++++
 3 files changed, 59 insertions(+)
 create mode 100755 tests/xfs/1885
 create mode 100644 tests/xfs/1885.out

diff --git a/doc/group-names.txt b/doc/group-names.txt
index 10b49e50517797..158f84d36d3154 100644
--- a/doc/group-names.txt
+++ b/doc/group-names.txt
@@ -117,6 +117,7 @@ samefs overlayfs when all layers are on the same fs
 scrub		filesystem metadata scrubbers
 seed		btrfs seeded filesystems
 seek		llseek functionality
+selfhealing	self healing filesystem code
 selftest	tests with fixed results, used to validate testing setup
 send		btrfs send/receive
 shrinkfs	decreasing the size of a filesystem
diff --git a/tests/xfs/1885 b/tests/xfs/1885
new file mode 100755
index 00000000000000..73fd1a5392056e
--- /dev/null
+++ b/tests/xfs/1885
@@ -0,0 +1,53 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024-2025 Oracle.  All Rights Reserved.
+#
+# FS QA Test 1885
+#
+# Make sure that healthmon handles module refcount correctly.
+#
+. ./common/preamble
+_begin_fstest auto selfhealing
+
+. ./common/filter
+. ./common/module
+
+refcount_file="/sys/module/xfs/refcnt"
+test -e "$refcount_file" || _notrun "cannot find xfs module refcount"
+
+_require_test
+_require_xfs_io_command healthmon
+
+# Capture mod refcount without the test fs mounted
+_test_unmount
+init_refcount="$(cat "$refcount_file")"
+
+# Capture mod refcount with the test fs mounted
+_test_mount
+nomon_mount_refcount="$(cat "$refcount_file")"
+
+# Capture mod refcount with test fs mounted and the healthmon fd open.
+# Pause the xfs_io process so that it doesn't actually respond to events.
+$XFS_IO_PROG -c 'healthmon -c -v' $TEST_DIR >> $seqres.full &
+sleep 0.5
+kill -STOP %1
+mon_mount_refcount="$(cat "$refcount_file")"
+
+# Capture mod refcount with only the healthmon fd open.
+_test_unmount
+mon_nomount_refcount="$(cat "$refcount_file")"
+
+# Capture mod refcount after continuing healthmon (which should exit due to the
+# unmount) and killing it.
+kill -CONT %1
+kill %1
+wait
+nomon_nomount_refcount="$(cat "$refcount_file")"
+
+_within_tolerance "mount refcount" "$nomon_mount_refcount" "$((init_refcount + 1))" 0 -v
+_within_tolerance "mount + healthmon refcount" "$mon_mount_refcount" "$((init_refcount + 2))" 0 -v
+_within_tolerance "healthmon refcount" "$mon_nomount_refcount" "$((init_refcount + 1))" 0 -v
+_within_tolerance "end refcount" "$nomon_nomount_refcount" "$init_refcount" 0 -v
+
+status=0
+exit
diff --git a/tests/xfs/1885.out b/tests/xfs/1885.out
new file mode 100644
index 00000000000000..f152cef0525609
--- /dev/null
+++ b/tests/xfs/1885.out
@@ -0,0 +1,5 @@
+QA output created by 1885
+mount refcount is in range
+mount + healthmon refcount is in range
+healthmon refcount is in range
+end refcount is in range
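The refcount lifecycle that test 1885 asserts can be modeled as simple pin/unpin bookkeeping: a mount pins the xfs module once, and an open healthmon fd pins it once more, and that second pin survives the unmount until the fd is closed. A toy shell model of the accounting — the baseline of 0 is arbitrary, not a real kernel refcount:

```shell
# Toy pin/unpin model of /sys/module/xfs/refcnt; values are deltas from
# an arbitrary baseline, for illustration only.
refcnt=0
pin()   { refcnt=$((refcnt + 1)); }
unpin() { refcnt=$((refcnt - 1)); }

pin                      # mount the test fs
mounted=$refcnt          # expect baseline + 1
pin                      # open the healthmon fd
monitored=$refcnt        # expect baseline + 2
unpin                    # unmount; the healthmon fd still pins the module
fd_only=$refcnt          # expect baseline + 1
unpin                    # healthmon exits, dropping the last pin
echo "mounted=$mounted monitored=$monitored fd_only=$fd_only end=$refcnt"
```

These four snapshots correspond one-to-one with the four `_within_tolerance` checks in the test.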
* [PATCH 2/4] xfs: test for metadata corruption error reporting via healthmon
  2025-10-23  0:00 ` [PATCHSET V2] fstests: autonomous self healing of filesystems Darrick J. Wong
  2025-10-23  0:17   ` [PATCH 1/4] xfs: test health monitoring code Darrick J. Wong
@ 2025-10-23  0:17   ` Darrick J. Wong
  2025-10-23  0:18   ` [PATCH 3/4] xfs: test io " Darrick J. Wong
  2025-10-23  0:18   ` [PATCH 4/4] xfs: test new xfs_healer daemon Darrick J. Wong
  3 siblings, 0 replies; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-23  0:17 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests

From: Darrick J. Wong <djwong@kernel.org>

Check if we can detect runtime metadata corruptions via the health
monitor.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 common/rc          |   10 ++++++
 tests/xfs/1879     |   89 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/1879.out |   12 +++++++
 3 files changed, 111 insertions(+)
 create mode 100755 tests/xfs/1879
 create mode 100644 tests/xfs/1879.out

diff --git a/common/rc b/common/rc
index aca4e30f9858d1..9d9b2f441871e2 100644
--- a/common/rc
+++ b/common/rc
@@ -3037,6 +3037,16 @@ _require_xfs_io_command()
 		echo $testio | grep -q "Inappropriate ioctl" && \
 			_notrun "xfs_io $command support is missing"
 		;;
+	"healthmon")
+		testio=`$XFS_IO_PROG -c "$command -p $param" $TEST_DIR 2>&1`
+		echo $testio | grep -q "bad argument count" && \
+			_notrun "xfs_io $command $param support is missing"
+		echo $testio | grep -q "Inappropriate ioctl" && \
+			_notrun "xfs_io $command $param ioctl support is missing"
+		echo $testio | grep -q "Operation not supported" && \
+			_notrun "xfs_io $command $param kernel support is missing"
+		param_checked="$param"
+		;;
 	"label")
 		testio=`$XFS_IO_PROG -c "label" $TEST_DIR 2>&1`
 		;;
diff --git a/tests/xfs/1879 b/tests/xfs/1879
new file mode 100755
index 00000000000000..b5741c286d5835
--- /dev/null
+++ b/tests/xfs/1879
@@ -0,0 +1,89 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024-2025 Oracle.  All Rights Reserved.
+#
+# FS QA Test No. 1879
+#
+# Corrupt some metadata and try to access it with the health monitoring program
+# running.  Check that healthmon observes a metadata error.
+#
+. ./common/preamble
+_begin_fstest auto quick eio selfhealing
+
+_cleanup()
+{
+	cd /
+	rm -rf $tmp.* $testdir
+}
+
+. ./common/filter
+
+_require_scratch_nocheck
+_require_xfs_io_command healthmon
+
+# Disable the scratch rt device to avoid test failures relating to the rt
+# bitmap consuming all the free space in our small data device.
+unset SCRATCH_RTDEV
+
+echo "Format and mount"
+_scratch_mkfs -d agcount=1 | _filter_mkfs 2> $tmp.mkfs >> $seqres.full
+. $tmp.mkfs
+_scratch_mount
+mkdir $SCRATCH_MNT/a/
+# Enough entries to get to a single block directory
+for ((i = 0; i < ( (isize + 255) / 256); i++)); do
+	path="$(printf "%s/a/%0255d" "$SCRATCH_MNT" "$i")"
+	touch "$path"
+done
+inum="$(stat -c %i "$SCRATCH_MNT/a")"
+_scratch_unmount
+
+# Fuzz the directory block so that the touch below will be guaranteed to trip
+# a runtime sickness report in exactly the manner we desire.
+_scratch_xfs_db -x -c "inode $inum" -c "dblock 0" -c 'fuzz bhdr.hdr.owner add' -c print &>> $seqres.full
+
+# Try to allocate space to trigger a metadata corruption event
+echo "Runtime corruption detection"
+_scratch_mount
+$XFS_IO_PROG -c 'healthmon -c -v' $SCRATCH_MNT > $tmp.healthmon &
+sleep 1	# wait for python program to start up
+touch $SCRATCH_MNT/a/farts &>> $seqres.full
+_scratch_unmount
+
+wait	# for healthmon to finish
+
+# Did we get errors?
+filter_healthmon()
+{
+	cat $tmp.healthmon >> $seqres.full
+	grep -B1 -A1 -E '(sick|corrupt)' $tmp.healthmon | grep -v -- '--' | sort | uniq
+}
+filter_healthmon
+
+# Run scrub to trigger a health event from there too.
+echo "Scrub corruption detection"
+_scratch_mount
+if _supports_xfs_scrub $SCRATCH_MNT $SCRATCH_DEV; then
+	$XFS_IO_PROG -c 'healthmon -c -v' $SCRATCH_MNT > $tmp.healthmon &
+	sleep 1	# wait for python program to start up
+	$XFS_SCRUB_PROG -n $SCRATCH_MNT &>> $seqres.full
+	_scratch_unmount
+
+	wait	# for healthmon to finish
+
+	# Did we get errors?
+	filter_healthmon
+else
+	# mock the output since we don't support scrub
+	_scratch_unmount
+	cat << ENDL
+  "domain": "inode",
+  "structures": ["directory"],
+  "structures": ["parent"],
+  "type": "corrupt",
+  "type": "sick",
+ENDL
+fi
+
+status=0
+exit
diff --git a/tests/xfs/1879.out b/tests/xfs/1879.out
new file mode 100644
index 00000000000000..f02eefbf58ad6c
--- /dev/null
+++ b/tests/xfs/1879.out
@@ -0,0 +1,12 @@
+QA output created by 1879
+Format and mount
+Runtime corruption detection
+  "domain": "inode",
+  "structures": ["directory"],
+  "type": "sick",
+Scrub corruption detection
+  "domain": "inode",
+  "structures": ["directory"],
+  "structures": ["parent"],
+  "type": "corrupt",
+  "type": "sick",
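The filter_healthmon pipeline above can be exercised standalone against a canned log to see why the `sort | uniq` step matters: repeated sickness reports for the same structure collapse into the golden-output lines. The sample event text below is invented for illustration; only the field names mirror the test's expected output:

```shell
# Canned healthmon output with the same sickness reported twice; the
# layout mimics the golden output but is otherwise made up.
log='{
  "domain": "inode",
  "type": "sick",
  "structures": ["directory"],
}
{
  "domain": "inode",
  "type": "sick",
  "structures": ["directory"],
}'

# Same pipeline as filter_healthmon: keep one line of context around each
# sick/corrupt line, drop grep's "--" group separators, then dedupe.
result="$(printf '%s\n' "$log" | \
	grep -B1 -A1 -E '(sick|corrupt)' | grep -v -- '--' | sort | uniq)"
printf '%s\n' "$result"
```

The duplicate report reduces to three unique lines, which is why the test's golden output stays stable no matter how many times the kernel re-reports the same sick directory.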
* [PATCH 3/4] xfs: test io error reporting via healthmon
  2025-10-23  0:00 ` [PATCHSET V2] fstests: autonomous self healing of filesystems Darrick J. Wong
  2025-10-23  0:17   ` [PATCH 1/4] xfs: test health monitoring code Darrick J. Wong
  2025-10-23  0:17   ` [PATCH 2/4] xfs: test for metadata corruption error reporting via healthmon Darrick J. Wong
@ 2025-10-23  0:18   ` Darrick J. Wong
  2025-10-23  0:18   ` [PATCH 4/4] xfs: test new xfs_healer daemon Darrick J. Wong
  3 siblings, 0 replies; 80+ messages in thread
From: Darrick J. Wong @ 2025-10-23  0:18 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests

From: Darrick J. Wong <djwong@kernel.org>

Create a new test to make sure the kernel can report IO errors via
health monitoring.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 tests/xfs/1878     |   80 ++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/1878.out |   10 +++++++
 2 files changed, 90 insertions(+)
 create mode 100755 tests/xfs/1878
 create mode 100644 tests/xfs/1878.out

diff --git a/tests/xfs/1878 b/tests/xfs/1878
new file mode 100755
index 00000000000000..235fa9d385ea2b
--- /dev/null
+++ b/tests/xfs/1878
@@ -0,0 +1,80 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024-2025 Oracle.  All Rights Reserved.
+#
+# FS QA Test No. 1878
+#
+# Attempt to read and write a file in buffered and directio mode with the
+# health monitoring program running.  Check that healthmon observes all four
+# types of IO errors.
+#
+. ./common/preamble
+_begin_fstest auto quick eio selfhealing
+
+_cleanup()
+{
+	cd /
+	rm -rf $tmp.* $testdir
+	_dmerror_cleanup
+}
+
+. ./common/filter
+. ./common/dmerror
+
+_require_scratch_nocheck
+_require_xfs_io_command healthmon
+_require_dm_target error
+
+# Disable the scratch rt device to avoid test failures relating to the rt
+# bitmap consuming all the free space in our small data device.
+unset SCRATCH_RTDEV
+
+echo "Format and mount"
+_scratch_mkfs > $seqres.full 2>&1
+_dmerror_init no_log
+_dmerror_mount
+
+_require_fs_space $SCRATCH_MNT 65536
+
+# Create a file with written regions far enough apart that the pagecache can't
+# possibly be caching the regions with a single folio.
+testfile=$SCRATCH_MNT/fsync-err-test
+$XFS_IO_PROG -f \
+	-c 'pwrite -b 1m 0 1m' \
+	-c 'pwrite -b 1m 10g 1m' \
+	-c 'pwrite -b 1m 20g 1m' \
+	-c fsync $testfile >> $seqres.full
+
+# First we check if directio errors get reported
+$XFS_IO_PROG -c 'healthmon -c -v' $SCRATCH_MNT >> $tmp.healthmon &
+sleep 1	# wait for python program to start up
+_dmerror_load_error_table
+$XFS_IO_PROG -d -c 'pwrite -b 256k 12k 16k' $testfile >> $seqres.full
+$XFS_IO_PROG -d -c 'pread -b 256k 10g 16k' $testfile >> $seqres.full
+_dmerror_load_working_table
+
+_dmerror_unmount
+wait	# for healthmon to finish
+_dmerror_mount
+
+# Next we check if buffered io errors get reported.  We have to write something
+# before loading the error table to ensure the dquots get loaded.
+$XFS_IO_PROG -c 'pwrite -b 256k 20g 1k' -c fsync $testfile >> $seqres.full
+$XFS_IO_PROG -c 'healthmon -c -v' $SCRATCH_MNT >> $tmp.healthmon &
+sleep 1	# wait for python program to start up
+_dmerror_load_error_table
+$XFS_IO_PROG -c 'pread -b 256k 12k 16k' $testfile >> $seqres.full
+$XFS_IO_PROG -c 'pwrite -b 256k 20g 16k' -c fsync $testfile >> $seqres.full
+_dmerror_load_working_table
+
+_dmerror_unmount
+wait	# for healthmon to finish
+
+# Did we get errors?
+cat $tmp.healthmon >> $seqres.full
+grep -E '(directio|readahead|writeback)' $tmp.healthmon | sort | uniq
+
+_dmerror_cleanup
+
+status=0
+exit
diff --git a/tests/xfs/1878.out b/tests/xfs/1878.out
new file mode 100644
index 00000000000000..abfa872cd6234d
--- /dev/null
+++ b/tests/xfs/1878.out
@@ -0,0 +1,10 @@
+QA output created by 1878
+Format and mount
+pwrite: Input/output error
+pread: Input/output error
+pread: Input/output error
+fsync: Input/output error
+  "type": "directio_read",
+  "type": "directio_write",
+  "type": "readahead",
+  "type": "writeback",
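For reference, the four IO paths that test 1878 drives map onto four distinct healthmon event types; the names below are taken from the test's golden output, and the table form is just an editor's summary of that correspondence:

```shell
# Mapping of IO path to the healthmon event type expected in 1878.out.
# Buffered reads surface as readahead errors and buffered writes as
# writeback errors, because those are the paths that issue the disk IO.
declare -A event_type=(
	[buffered_read]="readahead"
	[buffered_write]="writeback"
	[direct_read]="directio_read"
	[direct_write]="directio_write"
)

for op in buffered_read buffered_write direct_read direct_write; do
	printf '%-14s -> %s\n' "$op" "${event_type[$op]}"
done
```

This is why the directio section of the test only needs one pread/pwrite pair, while the buffered section needs an explicit fsync to force the writeback failure to surface.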
* [PATCH 4/4] xfs: test new xfs_healer daemon 2025-10-23 0:00 ` [PATCHSET V2] fstests: autonomous self healing of filesystems Darrick J. Wong ` (2 preceding siblings ...) 2025-10-23 0:18 ` [PATCH 3/4] xfs: test io " Darrick J. Wong @ 2025-10-23 0:18 ` Darrick J. Wong 3 siblings, 0 replies; 80+ messages in thread From: Darrick J. Wong @ 2025-10-23 0:18 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests From: Darrick J. Wong <djwong@kernel.org> Make sure the daemon in charge of self healing xfs actually does what it says it does -- emits json blobs that can be validated against the schema, repairs metadata, logs file IO errors, and doesn't get overwhelmed by event floods. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- common/config | 6 ++ common/rc | 5 + common/systemd | 21 +++++ common/xfs | 86 ++++++++++++++++++++++ tests/xfs/1882 | 48 ++++++++++++ tests/xfs/1882.out | 2 + tests/xfs/1883 | 60 +++++++++++++++ tests/xfs/1883.out | 2 + tests/xfs/1884 | 90 +++++++++++++++++++++++ tests/xfs/1884.out | 2 + tests/xfs/1896 | 206 ++++++++++++++++++++++++++++++++++++++++++++++++++++ tests/xfs/1896.out | 21 +++++ tests/xfs/1897 | 98 +++++++++++++++++++++++++ tests/xfs/1897.out | 4 + tests/xfs/1898 | 36 +++++++++ tests/xfs/1898.out | 4 + tests/xfs/1899 | 109 ++++++++++++++++++++++++++++ tests/xfs/1899.out | 3 + tests/xfs/1900 | 116 +++++++++++++++++++++++++++++ tests/xfs/1900.out | 2 + tests/xfs/1901 | 138 +++++++++++++++++++++++++++++++++++ tests/xfs/1901.out | 2 + 22 files changed, 1061 insertions(+) create mode 100755 tests/xfs/1882 create mode 100644 tests/xfs/1882.out create mode 100755 tests/xfs/1883 create mode 100644 tests/xfs/1883.out create mode 100755 tests/xfs/1884 create mode 100644 tests/xfs/1884.out create mode 100755 tests/xfs/1896 create mode 100644 tests/xfs/1896.out create mode 100755 tests/xfs/1897 create mode 100755 tests/xfs/1897.out create mode 100755 tests/xfs/1898 create mode 100755 tests/xfs/1898.out create mode 100755 
tests/xfs/1899
 create mode 100644 tests/xfs/1899.out
 create mode 100755 tests/xfs/1900
 create mode 100755 tests/xfs/1900.out
 create mode 100755 tests/xfs/1901
 create mode 100755 tests/xfs/1901.out

diff --git a/common/config b/common/config
index 1420e35ddfee42..f7b9993284a4ca 100644
--- a/common/config
+++ b/common/config
@@ -161,6 +161,12 @@ export XFS_ADMIN_PROG="$(type -P xfs_admin)"
 export XFS_GROWFS_PROG=$(type -P xfs_growfs)
 export XFS_SPACEMAN_PROG="$(type -P xfs_spaceman)"
 export XFS_SCRUB_PROG="$(type -P xfs_scrub)"
+XFS_HEALER_PROG="$(type -P xfs_healer)"
+# If the healer daemon isn't in the PATH, try the one installed in libexec
+if [ -z "$XFS_HEALER_PROG" ] && [ -e /usr/libexec/xfs_healer ]; then
+	XFS_HEALER_PROG=/usr/libexec/xfs_healer
+fi
+export XFS_HEALER_PROG
 export XFS_PARALLEL_REPAIR_PROG="$(type -P xfs_prepair)"
 export XFS_PARALLEL_REPAIR64_PROG="$(type -P xfs_prepair64)"
 export __XFSDUMP_PROG="$(type -P xfsdump)"
diff --git a/common/rc b/common/rc
index 9d9b2f441871e2..01b6f1d50c856f 100644
--- a/common/rc
+++ b/common/rc
@@ -3050,6 +3050,11 @@ _require_xfs_io_command()
 	"label")
 		testio=`$XFS_IO_PROG -c "label" $TEST_DIR 2>&1`
 		;;
+	"mediaerror")
+		testio=`$XFS_IO_PROG -x -c "mediaerror $* 0 0" 2>&1`
+		echo $testio | grep -q "invalid option" && \
+			_notrun "xfs_io $command support is missing"
+		;;
 	"open")
 		# -c "open $f" is broken in xfs_io <= 4.8.  Along with the fix,
 		# a new -C flag was introduced to execute one shot commands.
diff --git a/common/systemd b/common/systemd
index b2e24f267b2d93..ce752bf02e503e 100644
--- a/common/systemd
+++ b/common/systemd
@@ -44,6 +44,18 @@ _systemd_unit_active() {
 	test "$(systemctl is-active "$1")" = "active"
 }
 
+# Wait for up to a certain number of seconds for a service to reach inactive
+# state.
+_systemd_unit_wait() {
+	local svcname="$1"
+	local timeout="${2:-30}"
+
+	for ((i = 0; i < (timeout * 2); i++)); do
+		test "$(systemctl is-active "$svcname")" = "inactive" && break
+		sleep 0.5
+	done
+}
+
 _require_systemd_unit_active() {
 	_require_systemd_unit_defined "$1"
 	_systemd_unit_active "$1" || \
@@ -71,3 +83,12 @@ _systemd_unit_status() {
 	_systemd_installed || return 1
 	systemctl status "$1"
 }
+
+# Start a systemd unit
+_systemd_unit_start() {
+	systemctl start "$1"
+}
+
+# Stop a running systemd unit
+_systemd_unit_stop() {
+	systemctl stop "$1"
+}
diff --git a/common/xfs b/common/xfs
index ffdb82e6c970ba..ac330f63bd783d 100644
--- a/common/xfs
+++ b/common/xfs
@@ -2298,3 +2298,89 @@ _filter_bmap_gno()
 	if ($ag =~ /\d+/) {print "$ag "} ;
 	'
 }
+
+# Construct CLI arguments for xfs_healer
+_xfs_healer_args() {
+	local healer_args=()
+	local daemon_dir
+	daemon_dir=$(dirname "$XFS_HEALER_PROG")
+
+	# If we're being run from a development branch, we might need to find
+	# the schema file on our own.
+	local maybe_schema="$daemon_dir/../libxfs/xfs_healthmon.schema.json"
+	if [ -f "$maybe_schema" ]; then
+		local path="$(readlink -m "$maybe_schema")"
+		healer_args+=(--event-schema "$path")
+	fi
+
+	echo "${healer_args[@]}"
+}
+
+# Run the xfs_healer program on some filesystem
+_xfs_healer() {
+	$XFS_HEALER_PROG $(_xfs_healer_args) "$@"
+}
+
+# Run the xfs_healer program on the scratch fs
+_scratch_xfs_healer() {
+	_xfs_healer "$@" "$SCRATCH_MNT"
+}
+
+# Turn off the background xfs_healer service if any so that it doesn't fix
+# injected metadata errors; then start a background copy of xfs_healer to
+# capture the events ourselves.
+_invoke_xfs_healer() {
+	local mount="$1"
+	local logfile="$2"
+	shift; shift
+
+	if _systemd_is_running; then
+		local scratch_path=$(systemd-escape --path "$mount")
+		_systemd_unit_stop "xfs_healer@${scratch_path}" &>> $seqres.full
+	fi
+
+	$XFS_HEALER_PROG $(_xfs_healer_args) --log "$mount" "$@" &> "$logfile" &
+	XFS_HEALER_PID=$!
+ + # Wait 30s for the healer program to really start up + for ((i = 0; i < 60; i++)); do + test -e "$logfile" && \ + grep -q 'monitoring started' "$logfile" && \ + break + sleep 0.5 + done +} + +# Run our own copy of xfs_healer against the scratch device. Note that +# unmounting the scratch fs causes the healer daemon to exit, so we don't need +# to kill it explicitly from _cleanup. +_scratch_invoke_xfs_healer() { + _invoke_xfs_healer "$SCRATCH_MNT" "$@" +} + +# Unmount the filesystem to kill the xfs_healer instance started by +# _invoke_xfs_healer, and wait up to a certain amount of time for it to exit. +_kill_xfs_healer() { + local unmount="$1" + local timeout="${2:-30}" + local i + + # Unmount fs to kill healer, then wait for it to finish + for ((i = 0; i < (timeout * 2); i++)); do + $unmount &>> $seqres.full && break + sleep 0.5 + done + + test -n "$XFS_HEALER_PID" && \ + kill $XFS_HEALER_PID &>> $seqres.full + wait + unset XFS_HEALER_PID +} + +# Unmount the scratch fs to kill a _scratch_invoke_xfs_healer instance. +_scratch_kill_xfs_healer() { + local unmount="${1:-_scratch_unmount}" + shift + + _kill_xfs_healer "$unmount" "$@" +} diff --git a/tests/xfs/1882 b/tests/xfs/1882 new file mode 100755 index 00000000000000..a40a43e3da7fc6 --- /dev/null +++ b/tests/xfs/1882 @@ -0,0 +1,48 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2024-2025 Oracle. All Rights Reserved. +# +# FS QA Test 1882 +# +# Make sure that xfs_healer correctly handles all the reports that it gets +# from the kernel. We simulate this by using the --everything mode so we get +# all the events, not just the sickness reports. +# +. ./common/preamble +_begin_fstest auto selfhealing + +. ./common/filter +. ./common/fuzzy +. ./common/systemd +. ./common/populate + +_require_scrub +_require_xfs_io_command "scrub" # online check support +_require_command "$XFS_HEALER_PROG" "xfs_healer" +_require_scratch + +# Does this fs support health monitoring? 
+_scratch_mkfs >> $seqres.full +_scratch_mount + +_scratch_xfs_healer --check &>/dev/null || \ + _notrun "health monitoring not supported on this kernel" +_scratch_xfs_healer --require-validation --check &>/dev/null && \ + _notrun "skipping this test in favor of the one that does json validation" +_scratch_unmount + +# Create a sample fs with all the goodies +_scratch_populate_cached nofill &>> $seqres.full +_scratch_mount + +_scratch_invoke_xfs_healer "$tmp.healer" --everything + +# Run scrub to make some noise +_scratch_scrub -b -n >> $seqres.full + +_scratch_kill_xfs_healer +cat $tmp.healer >> $seqres.full + +echo Silence is golden +status=0 +exit diff --git a/tests/xfs/1882.out b/tests/xfs/1882.out new file mode 100644 index 00000000000000..9b31ccb735cabd --- /dev/null +++ b/tests/xfs/1882.out @@ -0,0 +1,2 @@ +QA output created by 1882 +Silence is golden diff --git a/tests/xfs/1883 b/tests/xfs/1883 new file mode 100755 index 00000000000000..fe797b654ca6ad --- /dev/null +++ b/tests/xfs/1883 @@ -0,0 +1,60 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2024-2025 Oracle. All Rights Reserved. +# +# FS QA Test 1883 +# +# Make sure that xfs_healer correctly validates the json events that it gets +# from the kernel. We simulate this by using the --everything mode so we get +# all the events, not just the sickness reports. +# +. ./common/preamble +_begin_fstest auto selfhealing + +. ./common/filter +. ./common/fuzzy +. ./common/systemd +. ./common/populate + +_require_scrub +_require_xfs_io_command "scrub" # online check support +_require_command "$XFS_HEALER_PROG" "xfs_healer" +_require_scratch + +# Does this fs support health monitoring? 
+_scratch_mkfs >> $seqres.full +_scratch_mount + +_scratch_xfs_healer --require-validation --check 2> /dev/null || \ + _notrun "health monitoring with validation not supported on this system" +_scratch_unmount + +# Create a sample fs with all the goodies +_scratch_populate_cached nofill &>> $seqres.full +_scratch_mount + +_scratch_invoke_xfs_healer "$tmp.healer" --require-validation --everything --debug-fast + +# Run scrub to make some noise +_scratch_scrub -b -n >> $seqres.full + +# Wait for up to 60 seconds for the log file to stop growing +old_logsz= +new_logsz=$(stat -c '%s' $tmp.healer) +for ((i = 0; i < 60; i++)); do + test "$old_logsz" = "$new_logsz" && break + old_logsz="$new_logsz" + sleep 1 + new_logsz=$(stat -c '%s' $tmp.healer) +done + +_scratch_kill_xfs_healer + +# Look for schema validation errors +grep -q 'not valid under any of the given schemas' $tmp.healer && \ + echo "Should not have found schema validation errors" +cat $tmp.healer >> $seqres.full + +echo Silence is golden +status=0 +exit diff --git a/tests/xfs/1883.out b/tests/xfs/1883.out new file mode 100644 index 00000000000000..bc9c390c778b6e --- /dev/null +++ b/tests/xfs/1883.out @@ -0,0 +1,2 @@ +QA output created by 1883 +Silence is golden diff --git a/tests/xfs/1884 b/tests/xfs/1884 new file mode 100755 index 00000000000000..bef185c8d74f95 --- /dev/null +++ b/tests/xfs/1884 @@ -0,0 +1,90 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2024-2025 Oracle. All Rights Reserved. +# +# FS QA Test 1884 +# +# Ensure that autonomous self healing fixes the filesystem correctly. +# +. ./common/preamble +_begin_fstest auto selfhealing + +. ./common/filter +. ./common/fuzzy +. 
./common/systemd + +_require_scrub +_require_xfs_io_command "repair" # online repair support +_require_xfs_db_command "blocktrash" +_require_command "$XFS_HEALER_PROG" "xfs_healer" +_require_command "$XFS_PROPERTY_PROG" "xfs_property" +_require_scratch + +_scratch_mkfs >> $seqres.full +_scratch_mount + +_xfs_has_feature $SCRATCH_MNT rmapbt || \ + _notrun "reverse mapping required to test directory auto-repair" +_xfs_has_feature $SCRATCH_MNT parent || \ + _notrun "parent pointers required to test directory auto-repair" +_scratch_xfs_healer --repair --check || \ + _notrun "health monitoring with repair not supported on this kernel" + +# Configure the filesystem for automatic repair of the filesystem. +$XFS_PROPERTY_PROG $SCRATCH_MNT set autofsck=repair >> $seqres.full + +# Create a largeish directory +dblksz=$(_xfs_get_dir_blocksize "$SCRATCH_MNT") +echo testdata > $SCRATCH_MNT/a +mkdir -p "$SCRATCH_MNT/some/victimdir" +for ((i = 0; i < (dblksz / 255); i++)); do + fname="$(printf "%0255d" "$i")" + ln $SCRATCH_MNT/a $SCRATCH_MNT/some/victimdir/$fname +done + +# Did we get at least two dir blocks? +dirsize=$(stat -c '%s' $SCRATCH_MNT/some/victimdir) +test "$dirsize" -gt "$dblksz" || echo "failed to create two-block directory" + +# Break the directory, remount filesystem +_scratch_unmount +_scratch_xfs_db -x \ + -c 'path /some/victimdir' \ + -c 'bmap' \ + -c 'dblock 1' \ + -c 'blocktrash -z -0 -o 0 -x 2048 -y 2048 -n 2048' >> $seqres.full +_scratch_mount + +_scratch_invoke_xfs_healer "$tmp.healer" --repair + +# Access the broken directory to trigger a repair, then poll the directory +# for 5 seconds to see if it gets fixed without us needing to intervene. 
+ls $SCRATCH_MNT/some/victimdir > /dev/null 2> $tmp.err +_filter_scratch < $tmp.err +try=0 +while [ $try -lt 50 ] && grep -q 'Structure needs cleaning' $tmp.err; do + echo "try $try saw corruption" >> $seqres.full + sleep 0.1 + ls $SCRATCH_MNT/some/victimdir > /dev/null 2> $tmp.err + try=$((try + 1)) +done +echo "try $try no longer saw corruption or gave up" >> $seqres.full +_filter_scratch < $tmp.err + +# List the dirents of /victimdir to see if it stops reporting corruption +ls $SCRATCH_MNT/some/victimdir > /dev/null 2> $tmp.err +try=0 +while [ $try -lt 50 ] && grep -q 'Structure needs cleaning' $tmp.err; do + echo "retry $try still saw corruption" >> $seqres.full + sleep 0.1 + ls $SCRATCH_MNT/some/victimdir > /dev/null 2> $tmp.err + try=$((try + 1)) +done +echo "retry $try no longer saw corruption or gave up" >> $seqres.full + +# Unmount to kill the healer +_scratch_kill_xfs_healer +cat $tmp.healer >> $seqres.full + +status=0 +exit diff --git a/tests/xfs/1884.out b/tests/xfs/1884.out new file mode 100644 index 00000000000000..929e33da01f92c --- /dev/null +++ b/tests/xfs/1884.out @@ -0,0 +1,2 @@ +QA output created by 1884 +ls: reading directory 'SCRATCH_MNT/some/victimdir': Structure needs cleaning diff --git a/tests/xfs/1896 b/tests/xfs/1896 new file mode 100755 index 00000000000000..6533f949c01ebc --- /dev/null +++ b/tests/xfs/1896 @@ -0,0 +1,206 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2024-2025 Oracle. All Rights Reserved. +# +# FS QA Test No. 1896 +# +# Check that xfs_healer can report file IO errors. + +. ./common/preamble +_begin_fstest auto quick scrub eio selfhealing + +# Override the default cleanup function. +_cleanup() +{ + cd / + rm -f $tmp.* + _dmerror_cleanup +} + +# Import common functions. +. ./common/fuzzy +. ./common/filter +. ./common/dmerror +. 
./common/systemd + +_require_scratch +_require_scrub +_require_command "$XFS_HEALER_PROG" "xfs_healer" +_require_dm_target error +_require_no_xfs_always_cow # no out of place writes + +# Ignore everything from the healer except for the four IO error log messages. +# Strip out file handle and range information because the blocksize can vary. +# Writeback and readahead can trigger multiple error messages due to retries, +# hence the uniq. +filter_healer_errors() { + _filter_scratch | \ + grep -E '(readahead|directio|writeback)' | \ + sed -e 's/\s*pos .*$//g' \ + -e 's|SCRATCH_MNT/a|VICTIM|g' \ + -e 's|SCRATCH_MNT: ino [0-9]* gen 0x[0-9a-f]*|VICTIM:|g' | \ + uniq +} + +_scratch_mkfs >> $seqres.full + +# +# The dm-error map added by this test doesn't work on zoned devices because +# table sizes need to be aligned to the zone size, and even for zoned on +# conventional this test will get confused because of the internal RT device. +# +# That check requires a mounted file system, so do a dummy mount before setting +# up DM. +# +_scratch_mount +_require_xfs_scratch_non_zoned +_scratch_xfs_healer --check &>/dev/null || \ + _notrun "health monitoring not supported on this kernel" +_scratch_unmount + +_dmerror_init +_dmerror_mount >> $seqres.full 2>&1 + +# Write a file with 4 file blocks worth of data, figure out the LBA to target +victim=$SCRATCH_MNT/a +file_blksz=$(_get_file_block_size $SCRATCH_MNT) +$XFS_IO_PROG -f -c "pwrite -S 0x58 0 $((4 * file_blksz))" -c "fsync" $victim >> $seqres.full +unset errordev + +awk_len_prog='{print $6}' +if _xfs_is_realtime_file $victim; then + if ! 
_xfs_has_feature $SCRATCH_MNT rtgroups; then + awk_len_prog='{print $4}' + fi + errordev="RT" +fi +bmap_str="$($XFS_IO_PROG -c "bmap -elpv" $victim | grep "^[[:space:]]*0:")" +echo "$errordev:$bmap_str" >> $seqres.full + +phys="$(echo "$bmap_str" | $AWK_PROG '{print $3}')" +len="$(echo "$bmap_str" | $AWK_PROG "$awk_len_prog")" + +fs_blksz=$(_get_block_size $SCRATCH_MNT) +echo "file_blksz:$file_blksz:fs_blksz:$fs_blksz" >> $seqres.full +kernel_sectors_per_fs_block=$((fs_blksz / 512)) + +# Did we get at least 4 fs blocks worth of extent? +min_len_sectors=$(( 4 * kernel_sectors_per_fs_block )) +test "$len" -lt $min_len_sectors && \ + _fail "could not format a long enough extent on an empty fs??" + +phys_start=$(echo "$phys" | sed -e 's/\.\..*//g') + +echo "$errordev:$phys:$len:$fs_blksz:$phys_start" >> $seqres.full +echo "victim file:" >> $seqres.full +od -tx1 -Ad -c $victim >> $seqres.full + +# Set the dmerror table so that all IO will pass through. +_dmerror_reset_table + +cat >> $seqres.full << ENDL +dmerror before: +$DMERROR_TABLE +$DMERROR_RTTABLE +<end table> +ENDL + +# All sector numbers that we feed to the kernel must be in units of 512b, but +# they also must be aligned to the device's logical block size. +logical_block_size=`$here/src/min_dio_alignment $SCRATCH_MNT $SCRATCH_DEV` +kernel_sectors_per_device_lba=$((logical_block_size / 512)) + +# Mark as bad one of the device LBAs in the middle of the extent. Target the +# second LBA of the third block of the four-block file extent that we allocated +# earlier, but without overflowing into the fourth file block. 
+bad_sector=$(( phys_start + (2 * kernel_sectors_per_fs_block) ))
+bad_len=$kernel_sectors_per_device_lba
+if (( kernel_sectors_per_device_lba < kernel_sectors_per_fs_block )); then
+	bad_sector=$((bad_sector + kernel_sectors_per_device_lba))
+fi
+if (( (bad_sector % kernel_sectors_per_device_lba) != 0)); then
+	echo "bad_sector $bad_sector not congruent with device logical block size $logical_block_size"
+fi
+
+# Remount to flush the page cache, start the healer, and make the LBA bad
+_dmerror_unmount
+_dmerror_mount
+
+_scratch_invoke_xfs_healer "$tmp.healer"
+
+_dmerror_mark_range_bad $bad_sector $bad_len $errordev
+
+cat >> $seqres.full << ENDL
+dmerror after marking bad:
+$DMERROR_TABLE
+$DMERROR_RTTABLE
+<end table>
+ENDL
+
+_dmerror_load_error_table
+
+# See if buffered reads pick it up
+echo "Try buffered read"
+$XFS_IO_PROG -c "pread 0 $((4 * file_blksz))" $victim >> $seqres.full
+
+# See if directio reads pick it up
+echo "Try directio read"
+$XFS_IO_PROG -d -c "pread 0 $((4 * file_blksz))" $victim >> $seqres.full
+
+# See if directio writes pick it up
+echo "Try directio write"
+$XFS_IO_PROG -d -c "pwrite -S 0x58 0 $((4 * file_blksz))" -c fsync $victim >> $seqres.full
+
+# See if buffered writes pick it up
+echo "Try buffered write"
+$XFS_IO_PROG -c "pwrite -S 0x58 0 $((4 * file_blksz))" -c fsync $victim >> $seqres.full
+
+# Now mark the bad range good so that unmount won't fail due to IO errors.
+echo "Fix device"
+_dmerror_mark_range_good $bad_sector $bad_len $errordev
+_dmerror_load_error_table
+
+cat >> $seqres.full << ENDL
+dmerror after marking good:
+$DMERROR_TABLE
+$DMERROR_RTTABLE
+<end table>
+ENDL
+
+# Unmount filesystem to start fresh
+echo "Kill healer"
+_scratch_kill_xfs_healer _dmerror_unmount
+cat $tmp.healer >> $seqres.full
+cat $tmp.healer | filter_healer_errors
+
+# Start the healer again so that we can verify that the errors don't persist
+# after we flip back to the good dm table.
+echo "Remount and restart healer"
+_dmerror_mount
+_scratch_invoke_xfs_healer "$tmp.healer"
+
+# See if buffered reads pick it up
+echo "Try buffered read again"
+$XFS_IO_PROG -c "pread 0 $((4 * file_blksz))" $victim >> $seqres.full
+
+# See if directio reads pick it up
+echo "Try directio read again"
+$XFS_IO_PROG -d -c "pread 0 $((4 * file_blksz))" $victim >> $seqres.full
+
+# See if directio writes pick it up
+echo "Try directio write again"
+$XFS_IO_PROG -d -c "pwrite -S 0x58 0 $((4 * file_blksz))" -c fsync $victim >> $seqres.full
+
+# See if buffered writes pick it up
+echo "Try buffered write again"
+$XFS_IO_PROG -c "pwrite -S 0x58 0 $((4 * file_blksz))" -c fsync $victim >> $seqres.full
+
+# Unmount fs to kill healer, then wait for it to finish
+echo "Kill healer again"
+_scratch_kill_xfs_healer _dmerror_unmount
+cat $tmp.healer >> $seqres.full
+cat $tmp.healer | filter_healer_errors
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/1896.out b/tests/xfs/1896.out
new file mode 100644
index 00000000000000..33fd0ce3fc29f0
--- /dev/null
+++ b/tests/xfs/1896.out
@@ -0,0 +1,21 @@
+QA output created by 1896
+Try buffered read
+pread: Input/output error
+Try directio read
+pread: Input/output error
+Try directio write
+pwrite: Input/output error
+Try buffered write
+fsync: Input/output error
+Fix device
+Kill healer
+VICTIM: readahead
+VICTIM: directio_read
+VICTIM: directio_write
+VICTIM: writeback
+Remount and restart healer
+Try buffered read again
+Try directio read again
+Try directio write again
+Try buffered write again
+Kill healer again
diff --git a/tests/xfs/1897 b/tests/xfs/1897
new file mode 100755
index 00000000000000..12f9c284406ce8
--- /dev/null
+++ b/tests/xfs/1897
@@ -0,0 +1,98 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2024-2025 Oracle. All Rights Reserved.
+#
+# FS QA Test No. 1897
+#
+# Check that xfs_healer can report media errors.
+
+. ./common/preamble
+_begin_fstest auto quick scrub eio selfhealing
+
+. ./common/fuzzy
+. ./common/filter
+. ./common/systemd
+
+_require_scratch
+_require_scrub
+_require_command "$XFS_HEALER_PROG" "xfs_healer"
+_require_xfs_io_command mediaerror
+
+filter_healer() {
+	_filter_scratch | \
+		grep 'media error' | \
+		sed -e 's/0x[0-9a-f]*/NUM/g' \
+		    -e 's/datadev/DEVICE/g' \
+		    -e 's/rtdev/DEVICE/g'
+}
+
+_scratch_mkfs >> $seqres.full
+
+_scratch_mount
+_scratch_xfs_healer --check &>/dev/null || \
+	_notrun "health monitoring not supported on this kernel"
+
+# Write a file with 4 file blocks worth of data, figure out the LBA to target
+victim=$SCRATCH_MNT/a
+file_blksz=$(_get_file_block_size $SCRATCH_MNT)
+$XFS_IO_PROG -f -c "pwrite -S 0x58 0 $((4 * file_blksz))" -c "fsync" $victim >> $seqres.full
+unset errordev
+
+awk_len_prog='{print $6}'
+if _xfs_is_realtime_file $victim; then
+	if ! _xfs_has_feature $SCRATCH_MNT rtgroups; then
+		awk_len_prog='{print $4}'
+	fi
+	errordev="-r"
+fi
+bmap_str="$($XFS_IO_PROG -c "bmap -elpv" $victim | grep "^[[:space:]]*0:")"
+echo "$errordev:$bmap_str" >> $seqres.full
+
+phys="$(echo "$bmap_str" | $AWK_PROG '{print $3}')"
+len="$(echo "$bmap_str" | $AWK_PROG "$awk_len_prog")"
+
+fs_blksz=$(_get_block_size $SCRATCH_MNT)
+echo "file_blksz:$file_blksz:fs_blksz:$fs_blksz" >> $seqres.full
+kernel_sectors_per_fs_block=$((fs_blksz / 512))
+
+# Did we get at least 4 fs blocks worth of extent?
+min_len_sectors=$(( 4 * kernel_sectors_per_fs_block ))
+test "$len" -lt $min_len_sectors && \
+	_fail "could not format a long enough extent on an empty fs??"
+
+phys_start=$(echo "$phys" | sed -e 's/\.\..*//g')
+
+echo "$errordev:$phys:$len:$fs_blksz:$phys_start" >> $seqres.full
+echo "victim file:" >> $seqres.full
+od -tx1 -Ad -c $victim >> $seqres.full
+
+# All sector numbers that we feed to the kernel must be in units of 512b, but
+# they also must be aligned to the device's logical block size.
+logical_block_size=`$here/src/min_dio_alignment $SCRATCH_MNT $SCRATCH_DEV`
+kernel_sectors_per_device_lba=$((logical_block_size / 512))
+
+# Mark as bad one of the device LBAs in the middle of the extent. Target the
+# second LBA of the third block of the four-block file extent that we allocated
+# earlier, but without overflowing into the fourth file block.
+bad_sector=$(( phys_start + (2 * kernel_sectors_per_fs_block) ))
+bad_len=$kernel_sectors_per_device_lba
+if (( kernel_sectors_per_device_lba < kernel_sectors_per_fs_block )); then
+	bad_sector=$((bad_sector + kernel_sectors_per_device_lba))
+fi
+if (( (bad_sector % kernel_sectors_per_device_lba) != 0)); then
+	echo "bad_sector $bad_sector not congruent with device logical block size $logical_block_size"
+fi
+
+echo "Simulate media error"
+_scratch_invoke_xfs_healer "$tmp.healer"
+$XFS_IO_PROG -x -c "mediaerror $errordev $bad_sector $bad_len" $SCRATCH_MNT
+
+# Unmount filesystem to start fresh
+echo "Kill healer"
+_scratch_kill_xfs_healer
+cat $tmp.healer >> $seqres.full
+cat $tmp.healer | filter_healer
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/1897.out b/tests/xfs/1897.out
new file mode 100755
index 00000000000000..8bdb4d69b7b84f
--- /dev/null
+++ b/tests/xfs/1897.out
@@ -0,0 +1,4 @@
+QA output created by 1897
+Simulate media error
+Kill healer
+SCRATCH_MNT: media error on DEVICE daddr NUM bbcount NUM
diff --git a/tests/xfs/1898 b/tests/xfs/1898
new file mode 100755
index 00000000000000..590ed1461513eb
--- /dev/null
+++ b/tests/xfs/1898
@@ -0,0 +1,36 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2024-2025 Oracle. All Rights Reserved.
+#
+# FS QA Test No. 1898
+#
+# Check that xfs_healer can report filesystem shutdowns.
+
+. ./common/preamble
+_begin_fstest auto quick scrub eio selfhealing
+
+. ./common/fuzzy
+. ./common/filter
+. ./common/systemd
+
+_require_scratch_nocheck
+_require_scrub
+_require_command "$XFS_HEALER_PROG" "xfs_healer"
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount
+victim=$SCRATCH_MNT/a
+$XFS_IO_PROG -f -c "pwrite -S 0x58 0 500k" -c "fsync" $victim >> $seqres.full
+
+echo "Start healer and shut down"
+_scratch_invoke_xfs_healer "$tmp.healer"
+_scratch_shutdown -f
+
+# Unmount filesystem to start fresh
+echo "Kill healer"
+_scratch_kill_xfs_healer
+cat $tmp.healer >> $seqres.full
+cat $tmp.healer | _filter_scratch | grep 'shut down'
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/1898.out b/tests/xfs/1898.out
new file mode 100755
index 00000000000000..f71f848da810ce
--- /dev/null
+++ b/tests/xfs/1898.out
@@ -0,0 +1,4 @@
+QA output created by 1898
+Start healer and shut down
+Kill healer
+SCRATCH_MNT: filesystem shut down due to forced unmount
diff --git a/tests/xfs/1899 b/tests/xfs/1899
new file mode 100755
index 00000000000000..866c821f4bbb0e
--- /dev/null
+++ b/tests/xfs/1899
@@ -0,0 +1,109 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024-2025 Oracle. All Rights Reserved.
+#
+# FS QA Test 1899
+#
+# Ensure that autonomous self healing fixes the filesystem correctly
+# even if the spot repair doesn't work and it falls back to a full fsck.
+#
+. ./common/preamble
+_begin_fstest auto selfhealing
+
+. ./common/filter
+. ./common/fuzzy
+. ./common/systemd
+
+_require_scrub
+_require_xfs_io_command "repair"	# online repair support
+_require_xfs_db_command "blocktrash"
+_require_command "$XFS_HEALER_PROG" "xfs_healer"
+_require_command "$XFS_PROPERTY_PROG" "xfs_property"
+_require_scratch
+_require_systemd_unit_defined "xfs_scrub@.service"
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount
+
+_xfs_has_feature $SCRATCH_MNT rmapbt || \
+	_notrun "reverse mapping required to test directory auto-repair"
+_xfs_has_feature $SCRATCH_MNT parent || \
+	_notrun "parent pointers required to test directory auto-repair"
+_scratch_xfs_healer --repair --check || \
+	_notrun "health monitoring with repair not supported on this kernel"
+
+filter_healer() {
+	_filter_scratch | \
+		grep 'Full repair:' | \
+		uniq
+}
+
+# Configure the filesystem for automatic repair of the filesystem.
+$XFS_PROPERTY_PROG $SCRATCH_MNT set autofsck=repair >> $seqres.full
+
+# Create a largeish directory
+dblksz=$(_xfs_get_dir_blocksize "$SCRATCH_MNT")
+echo testdata > $SCRATCH_MNT/a
+mkdir -p "$SCRATCH_MNT/some/victimdir"
+for ((i = 0; i < (dblksz / 255); i++)); do
+	fname="$(printf "%0255d" "$i")"
+	ln $SCRATCH_MNT/a $SCRATCH_MNT/some/victimdir/$fname
+done
+
+# Did we get at least two dir blocks?
+dirsize=$(stat -c '%s' $SCRATCH_MNT/some/victimdir)
+test "$dirsize" -gt "$dblksz" || echo "failed to create two-block directory"
+
+# Break the directory, remount filesystem
+_scratch_unmount
+_scratch_xfs_db -x \
+	-c 'path /some/victimdir' \
+	-c 'bmap' \
+	-c 'dblock 1' \
+	-c 'blocktrash -z -0 -o 0 -x 2048 -y 2048 -n 2048' \
+	-c 'path /a' \
+	-c 'bmap -a' \
+	-c 'ablock 1' \
+	-c 'blocktrash -z -0 -o 0 -x 2048 -y 2048 -n 2048' \
+	>> $seqres.full
+_scratch_mount
+
+_scratch_invoke_xfs_healer "$tmp.healer" --repair
+
+# Access the broken directory to trigger a repair, then poll the directory
+# for 5 seconds to see if it gets fixed without us needing to intervene.
+ls $SCRATCH_MNT/some/victimdir > /dev/null 2> $tmp.err
+_filter_scratch < $tmp.err
+try=0
+while [ $try -lt 50 ] && grep -q 'Structure needs cleaning' $tmp.err; do
+	echo "try $try saw corruption" >> $seqres.full
+	sleep 0.1
+	ls $SCRATCH_MNT/some/victimdir > /dev/null 2> $tmp.err
+	try=$((try + 1))
+done
+echo "try $try no longer saw corruption or gave up" >> $seqres.full
+_filter_scratch < $tmp.err
+
+# Wait for the background fixer to finish
+svcname="$(systemd-escape --template 'xfs_scrub@.service' --path "$SCRATCH_MNT")"
+_systemd_unit_wait "$svcname"
+
+# List the dirents of /victimdir and parent pointers of /a to see if they both
+# stop reporting corruption
+(ls $SCRATCH_MNT/some/victimdir ; $XFS_IO_PROG -c 'parent' $SCRATCH_MNT/a) > /dev/null 2> $tmp.err
+try=0
+while [ $try -lt 50 ] && grep -q 'Structure needs cleaning' $tmp.err; do
+	echo "retry $try still saw corruption" >> $seqres.full
+	sleep 0.1
+	(ls $SCRATCH_MNT/some/victimdir ; $XFS_IO_PROG -c 'parent' $SCRATCH_MNT/a) > /dev/null 2> $tmp.err
+	try=$((try + 1))
+done
+echo "retry $try no longer saw corruption or gave up" >> $seqres.full
+
+# Unmount to kill the healer
+_scratch_kill_xfs_healer
+cat $tmp.healer >> $seqres.full
+cat $tmp.healer | filter_healer
+
+status=0
+exit
diff --git a/tests/xfs/1899.out b/tests/xfs/1899.out
new file mode 100644
index 00000000000000..48fd6078388371
--- /dev/null
+++ b/tests/xfs/1899.out
@@ -0,0 +1,3 @@
+QA output created by 1899
+ls: reading directory 'SCRATCH_MNT/some/victimdir': Structure needs cleaning
+SCRATCH_MNT: Full repair: Repairs in progress.
diff --git a/tests/xfs/1900 b/tests/xfs/1900
new file mode 100755
index 00000000000000..d6699c82cb8d0e
--- /dev/null
+++ b/tests/xfs/1900
@@ -0,0 +1,116 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024-2025 Oracle. All Rights Reserved.
+#
+# FS QA Test 1900
+#
+# Ensure that autonomous self healing fixes the filesystem correctly even if
+# the original mount has moved somewhere else.
+#
+. ./common/preamble
+_begin_fstest auto selfhealing
+
+. ./common/filter
+. ./common/fuzzy
+. ./common/systemd
+
+_cleanup()
+{
+	command -v _kill_fsstress &>/dev/null && _kill_fsstress
+	cd /
+	rm -r -f $tmp.*
+	if [ -n "$new_dir" ]; then
+		_unmount "$new_dir" &>/dev/null
+		rm -rf "$new_dir"
+	fi
+}
+
+_require_test
+_require_scrub
+_require_xfs_io_command "repair"	# online repair support
+_require_xfs_db_command "blocktrash"
+_require_command "$XFS_HEALER_PROG" "xfs_healer"
+_require_command "$XFS_PROPERTY_PROG" "xfs_property"
+_require_scratch
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount
+
+_xfs_has_feature $SCRATCH_MNT rmapbt || \
+	_notrun "reverse mapping required to test directory auto-repair"
+_xfs_has_feature $SCRATCH_MNT parent || \
+	_notrun "parent pointers required to test directory auto-repair"
+_scratch_xfs_healer --repair --check || \
+	_notrun "health monitoring with repair not supported on this kernel"
+
+# Configure the filesystem for automatic repair of the filesystem.
+$XFS_PROPERTY_PROG $SCRATCH_MNT set autofsck=repair >> $seqres.full
+
+# Create a largeish directory
+dblksz=$(_xfs_get_dir_blocksize "$SCRATCH_MNT")
+echo testdata > $SCRATCH_MNT/a
+mkdir -p "$SCRATCH_MNT/some/victimdir"
+for ((i = 0; i < (dblksz / 255); i++)); do
+	fname="$(printf "%0255d" "$i")"
+	ln $SCRATCH_MNT/a $SCRATCH_MNT/some/victimdir/$fname
+done
+
+# Did we get at least two dir blocks?
+dirsize=$(stat -c '%s' $SCRATCH_MNT/some/victimdir)
+test "$dirsize" -gt "$dblksz" || echo "failed to create two-block directory"
+
+# Break the directory, remount filesystem
+_scratch_unmount
+_scratch_xfs_db -x \
+	-c 'path /some/victimdir' \
+	-c 'bmap' \
+	-c 'dblock 1' \
+	-c 'blocktrash -z -0 -o 0 -x 2048 -y 2048 -n 2048' >> $seqres.full
+_scratch_mount
+
+_scratch_invoke_xfs_healer "$tmp.healer" --repair
+
+# Move the scratch filesystem to a completely different mountpoint so that
+# we can test if the healer can find it again.
+new_dir=$TEST_DIR/moocow
+mkdir -p $new_dir
+_mount --bind $SCRATCH_MNT $new_dir
+_unmount $SCRATCH_MNT
+
+df -t xfs >> $seqres.full
+
+# Access the broken directory to trigger a repair, then poll the directory
+# for 5 seconds to see if it gets fixed without us needing to intervene.
+ls $new_dir/some/victimdir > /dev/null 2> $tmp.err
+_filter_scratch < $tmp.err | _filter_test_dir
+try=0
+while [ $try -lt 50 ] && grep -q 'Structure needs cleaning' $tmp.err; do
+	echo "try $try saw corruption" >> $seqres.full
+	sleep 0.1
+	ls $new_dir/some/victimdir > /dev/null 2> $tmp.err
+	try=$((try + 1))
+done
+echo "try $try no longer saw corruption or gave up" >> $seqres.full
+_filter_scratch < $tmp.err | _filter_test_dir
+
+# List the dirents of /victimdir to see if it stops reporting corruption
+ls $new_dir/some/victimdir > /dev/null 2> $tmp.err
+try=0
+while [ $try -lt 50 ] && grep -q 'Structure needs cleaning' $tmp.err; do
+	echo "retry $try still saw corruption" >> $seqres.full
+	sleep 0.1
+	ls $new_dir/some/victimdir > /dev/null 2> $tmp.err
+	try=$((try + 1))
+done
+echo "retry $try no longer saw corruption or gave up" >> $seqres.full
+
+new_dir_unmount() {
+	_unmount $new_dir
+}
+
+# Unmount to kill the healer
+_scratch_kill_xfs_healer new_dir_unmount
+cat $tmp.healer >> $seqres.full
+
+status=0
+exit
diff --git a/tests/xfs/1900.out b/tests/xfs/1900.out
new file mode 100755
index 00000000000000..604c9eb5eb10f4
--- /dev/null
+++ b/tests/xfs/1900.out
@@ -0,0 +1,2 @@
+QA output created by 1900
+ls: reading directory 'TEST_DIR/moocow/some/victimdir': Structure needs cleaning
diff --git a/tests/xfs/1901 b/tests/xfs/1901
new file mode 100755
index 00000000000000..163a4e8162d4c7
--- /dev/null
+++ b/tests/xfs/1901
@@ -0,0 +1,138 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2025 Oracle. All Rights Reserved.
+#
+# FS QA Test 1901
+#
+# Ensure that autonomous self healing won't fix the wrong filesystem if a
+# snapshot of the original filesystem is now mounted on the same directory as
+# the original.
+#
+. ./common/preamble
+_begin_fstest auto selfhealing
+
+. ./common/filter
+. ./common/fuzzy
+. ./common/systemd
+
+_cleanup()
+{
+	command -v _kill_fsstress &>/dev/null && _kill_fsstress
+	cd /
+	rm -r -f $tmp.*
+	test -e "$mntpt" && _unmount "$mntpt" &>/dev/null
+	test -e "$mntpt" && _unmount "$mntpt" &>/dev/null
+	test -e "$loop1" && _destroy_loop_device "$loop1"
+	test -e "$loop2" && _destroy_loop_device "$loop2"
+	test -e "$testdir" && rm -r -f "$testdir"
+}
+
+_require_test
+_require_scrub
+_require_xfs_io_command "repair"	# online repair support
+_require_xfs_db_command "blocktrash"
+_require_command "$XFS_HEALER_PROG" "xfs_healer"
+_require_command "$XFS_PROPERTY_PROG" "xfs_property"
+
+testdir=$TEST_DIR/$seq
+mntpt=$testdir/mount
+disk1=$testdir/disk1
+disk2=$testdir/disk2
+
+mkdir -p "$mntpt"
+$XFS_IO_PROG -f -c "truncate 300m" $disk1
+$XFS_IO_PROG -f -c "truncate 300m" $disk2
+loop1="$(_create_loop_device "$disk1")"
+
+filter_mntpt() {
+	sed -e "s|$mntpt|MNTPT|g"
+}
+
+_mkfs_dev "$loop1" >> $seqres.full
+_mount "$loop1" "$mntpt" || _notrun "cannot mount victim filesystem"
+
+_xfs_has_feature $mntpt rmapbt || \
+	_notrun "reverse mapping required to test directory auto-repair"
+_xfs_has_feature $mntpt parent || \
+	_notrun "parent pointers required to test directory auto-repair"
+_xfs_healer "$mntpt" --repair --check || \
+	_notrun "health monitoring with repair not supported on this kernel"
+
+# Configure the filesystem for automatic repair of the filesystem.
+$XFS_PROPERTY_PROG $mntpt set autofsck=repair >> $seqres.full
+
+# Create a largeish directory
+dblksz=$(_xfs_get_dir_blocksize "$mntpt")
+echo testdata > $mntpt/a
+mkdir -p "$mntpt/some/victimdir"
+for ((i = 0; i < (dblksz / 255); i++)); do
+	fname="$(printf "%0255d" "$i")"
+	ln $mntpt/a $mntpt/some/victimdir/$fname
+done
+
+# Did we get at least two dir blocks?
+dirsize=$(stat -c '%s' $mntpt/some/victimdir)
+test "$dirsize" -gt "$dblksz" || echo "failed to create two-block directory"
+
+# Clone the fs, break the directory, remount filesystem
+_unmount "$mntpt"
+
+cp --sparse=always "$disk1" "$disk2" || _fail "cannot copy disk1"
+loop2="$(_create_loop_device_like_bdev "$disk2" "$loop1")"
+
+$XFS_DB_PROG "$loop1" -x \
+	-c 'path /some/victimdir' \
+	-c 'bmap' \
+	-c 'dblock 1' \
+	-c 'blocktrash -z -0 -o 0 -x 2048 -y 2048 -n 2048' >> $seqres.full
+_mount "$loop1" "$mntpt" || _fail "cannot mount broken fs"
+
+_invoke_xfs_healer "$mntpt" "$tmp.healer" --repair
+
+# Stop the healer process so that it can't read error events while we do some
+# shenanigans.
+test -n "$XFS_HEALER_PID" || _fail "nobody set XFS_HEALER_PID?"
+kill -STOP $XFS_HEALER_PID
+
+echo "LOG $XFS_HEALER_PID SO FAR:" >> $seqres.full
+cat $tmp.healer >> $seqres.full
+
+# Access the broken directory to trigger a repair event, which will not yet be
+# processed.
+ls $mntpt/some/victimdir > /dev/null 2> $tmp.err
+filter_mntpt < $tmp.err
+
+ps auxfww | grep xfs_healer >> $seqres.full
+
+echo "LOG AFTER TRYING TO POKE:" >> $seqres.full
+cat $tmp.healer >> $seqres.full
+
+# Mount the clone filesystem to the same mountpoint so that the healer cannot
+# actually reopen it to perform repairs.
+_mount "$loop2" "$mntpt" -o nouuid || _fail "cannot mount decoy fs"
+
+grep -w xfs /proc/mounts >> $seqres.full
+
+# Continue the healer process so it can handle events now. Wait a few seconds
+# while it fails to reopen disk1's mount point to repair things.
+kill -CONT $XFS_HEALER_PID
+sleep 2
+
+new_dir_unmount() {
+	_unmount "$mntpt"
+	_unmount "$mntpt"
+}
+
+# Unmount to kill the healer
+_kill_xfs_healer new_dir_unmount
+echo "LOG AFTER FAILURE" >> $seqres.full
+cat $tmp.healer >> $seqres.full
+
+# Did the healer log complaints about not being able to reopen the mountpoint
+# to enact repairs?
+grep -q 'Stale file handle' $tmp.healer || \
+	echo "Should have seen stale file handle complaints"
+
+status=0
+exit
diff --git a/tests/xfs/1901.out b/tests/xfs/1901.out
new file mode 100755
index 00000000000000..ff83e03725307a
--- /dev/null
+++ b/tests/xfs/1901.out
@@ -0,0 +1,2 @@
+QA output created by 1901
+ls: reading directory 'MNTPT/some/victimdir': Structure needs cleaning
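As an aside for reviewers: the 512-byte-sector arithmetic that tests 1896 and 1897 share (target the second device LBA of the third fs block of the four-block extent, keeping the result aligned to the device's logical block size) can be modeled standalone. This is a hypothetical Python sketch, not part of the patch; the function name is mine, and the parameters mirror the shell variables `phys_start`, `fs_blksz`, and `logical_block_size`:

```python
# Hypothetical model of the bad_sector computation in tests 1896/1897.
# phys_start: first 512b sector of the extent (from xfs_io "bmap -elpv")
# fs_blksz: filesystem block size in bytes
# logical_block_size: device logical block size in bytes

def pick_bad_sector(phys_start: int, fs_blksz: int, logical_block_size: int) -> int:
    sectors_per_fs_block = fs_blksz // 512
    sectors_per_lba = logical_block_size // 512
    # Start at the third fs block of the four-block extent.
    bad_sector = phys_start + 2 * sectors_per_fs_block
    # If a device LBA is smaller than an fs block, step forward to the
    # second LBA so the bad spot sits mid-block without spilling into
    # the fourth file block.
    if sectors_per_lba < sectors_per_fs_block:
        bad_sector += sectors_per_lba
    # The range handed to dm-error / the mediaerror ioctl must be aligned
    # to the device's logical block size.
    assert bad_sector % sectors_per_lba == 0
    return bad_sector

# 4096-byte fs blocks on a 512-byte-sector device: sector 17 of the extent.
print(pick_bad_sector(0, 4096, 512))
```

When the device LBA equals the fs block size, the conditional step is skipped and the bad range lands exactly on the third block's boundary, which is why the shell version only nudges `bad_sector` when `kernel_sectors_per_device_lba` is smaller.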