From: "Darrick J. Wong" <djwong@kernel.org>
To: cem@kernel.org, djwong@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, hch@lst.de
Subject: [PATCH 01/11] docs: discuss autonomous self healing in the xfs online repair design doc
Date: Thu, 15 Jan 2026 21:42:16 -0800 [thread overview]
Message-ID: <176852588561.2137143.321816631942419344.stgit@frogsfrogsfrogs> (raw)
In-Reply-To: <176852588473.2137143.1604994842772101197.stgit@frogsfrogsfrogs>
From: Darrick J. Wong <djwong@kernel.org>
Update the XFS online repair document to describe the motivation and
design of the autonomous filesystem healing agent known as xfs_healer.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
.../filesystems/xfs/xfs-online-fsck-design.rst | 153 ++++++++++++++++++++
1 file changed, 151 insertions(+), 2 deletions(-)
diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 3d9233f403dbb1..fd936d1b7a32a2 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -166,9 +166,12 @@ The current XFS tools leave several problems unsolved:
malicious actors **exploit quirks of Unicode** to place misleading names
in directories.
+8. **Site Reliability and Support Engineers** would like to reduce the
+ frequency of incidents requiring **manual intervention**.
+
Given this definition of the problems to be solved and the actors who would
benefit, the proposed solution is a third fsck tool that acts on a running
-filesystem.
+filesystem, and an autononmous agent that fixes problems as they arise.
This new third program has three components: an in-kernel facility to check
metadata, an in-kernel facility to repair metadata, and a userspace driver
@@ -203,6 +206,13 @@ Even if a piece of filesystem metadata can only be regenerated by scanning the
entire system, the scan can still be done in the background while other file
operations continue.
+The autonomous self healing agent should listen for metadata health impact
+reports coming from the kernel and automatically schedule repairs for the
+damaged metadata.
+If the required repairs are larger in scope than a single metadata structure,
+``xfs_scrub`` should be invoked to perform a full analysis.
+``xfs_healer`` is the name of this program.
+
In summary, online fsck takes advantage of resource sharding and redundant
metadata to enable targeted checking and repair operations while the system
is running.
@@ -850,11 +860,16 @@ variable in the following service files:
* ``xfs_scrub_all_fail.service``
The decision to enable the background scan is left to the system administrator.
-This can be done by enabling either of the following services:
+This can be done system-wide by enabling either of the following services:
* ``xfs_scrub_all.timer`` on systemd systems
* ``xfs_scrub_all.cron`` on non-systemd systems
+To enable online repair for specific filesystems, the ``autofsck``
+filesystem property should be set to ``repair``.
+To enable only scanning, the property should be set to ``check``.
+To disable online fsck entirely, the property should be set to ``none``.
+
This automatic weekly scan is configured out of the box to perform an
additional media scan of all file data once per month.
This is less foolproof than, say, storing file data block checksums, but much
@@ -897,6 +912,36 @@ notifications and initiate a repair?
*Answer*: These questions remain unanswered, but should be a part of the
conversation with early adopters and potential downstream users of XFS.
+Autonomous Self Healing
+-----------------------
+
+The autonomous self healing agent is a background system service that starts
+when the filesystem is mounted and runs until unmount.
+When starting up, the agent opens a special pseudofile under the specific
+mount.
+When the filesystem generates new adverse health events, the events will be
+made available for reading via the special pseudofile.
+The events need not be limited to metadata concerns; they can also reflect
+events outside of the filesystem's direct control such as file I/O errors.
+
+The agent reads these events in a loop and responds to the events
+appropriately.
+For a single trouble report about metadata, the agent initiates a targeted
+repair of the specific structure.
+If that repair fails or the agent observes too many metadata trouble reports
+over a short interval, it should then initiate a full scan of the filesystem
+via the ``xfs_scrub`` service.
+
+The decision to enable the background scan is left to the system administrator.
+This can be done system-wide by enabling the following services:
+
+* ``xfs_healer@.service`` on systemd systems
+
+To enable autonomous healing for specific filesystems, the ``autofsck``
+filesystem property should be set to ``repair``.
+To disable self healing, the property should be set to ``check``,
+``optimize``, or ``none``.
+
5. Kernel Algorithms and Data Structures
========================================
@@ -4780,6 +4825,70 @@ Orphaned files are adopted by the orphanage as follows:
7. If a runtime error happens, call ``xrep_adoption_cancel`` to release all
resources.
+Health Monitoring
+-----------------
+
+A self-correcting filesystem responds to observations of problems by scheduling
+repairs of the affected areas.
+The filesystem must therefore create event objects in response to stimuli
+(metadata corruption, file I/O errors, etc.) and dispatch these events to
+downstream consumers.
+
+However, the decision to translate an adverse metadata health report into a
+repair should be made by userspace, and the actual scheduling done by userspace.
+Some users (e.g. containers) would prefer to fast-fail the container and restart
+it on another node at a previous checkpoint.
+For workloads running in isolation, repairs may be preferable; either way this
+is something the system administrator knows, and not the kernel.
+A userspace agent (``xfs_healer``, described later) will collect events from the
+kernel and dispatch them appropriately.
+
+Exporting health events to userspace requires the creation of a new component,
+known as the health monitor.
+Because the monitor exposes itself to userspace to deliver information, a file
+descriptor is the natural abstraction to use here.
+The health monitor hooks all the relevant sources of metadata health events.
+Upon activation of the hook, a new event object is created and added to a queue.
+When the agent reads from the fd, event objects are pulled from the start of the
+queue and formatted into the user's buffer.
+The events are freed, and the read call returns to userspace to allow the agent
+to perform some work.
+Memory usage is constrained on a per-fd basis to prevent memory exhaustion; if
+an event must be discarded, a special "lost event" event is delivered to the
+agent.
+
+In short, health events are captured, queued, and eventually copied out to
+userspace for dispatching.
+
++----------------------------------------------------------------------+
+| **Sidebar**: |
++----------------------------------------------------------------------+
+| **Question**: Why use a pseudofile and not use existing notification |
+| methods such as fanotify? |
+| |
+| *Answer*: The pseudofile is a private filesystem interface only |
+| available to processes with the CAP_SYS_ADMIN priviledge and the |
+| ability to open the root directory of an XFS filesystem. |
+| Using a pseudofile gives the kernel and ``xfs_healer`` the |
+| flexibility to expose XFS-specific filesystem details to a special |
+| userspace daemon without cluttering up fanotify's userspace ABI. |
+| Normal userpace programs are not expected to subscribe to the XFS |
+| events in this manner. |
+| Instead, they should subscribe to the generic events provided by |
+| fanotify. |
+| |
+| The pseudofile can also accept ioctls, which gives the userspace |
+| ``xfs_healer`` program a programmatic means to validate that prior |
+| to a repair, its reopened mountpoint is actually the same filesystem |
+| that is being monitored. |
+| |
+| Finally, on an implementation level, fsnotify provides rather little |
+| in the way of an actual event queue implementation; it's really more |
+| of an event dispatcher. |
+| This means there's little advantage in terms of the quantity of new |
+| code added since we still have to write our own queuing discipline! |
++----------------------------------------------------------------------+
+
6. Userspace Algorithms and Data Structures
===========================================
@@ -5071,6 +5180,46 @@ and report what has been lost.
For media errors in blocks owned by files, parent pointers can be used to
construct file paths from inode numbers for user-friendly reporting.
+Autonomous Self Healing
+-----------------------
+
+When a filesystem mounts, the Linux kernel initiates a fsnotify event
+describing the mount point and the path to the data device.
+A separate ``xfs_healer_start`` systemd service listens for these mount
+events via fanotify, and starts a mountpoint-specific ``xfs_healer``
+service instance.
+The ``xfs_healer`` service opens the mountpoint and issues the
+XFS_IOC_HEALTH_MONITOR ioctl to open a special health monitoring file.
+After that is set up, the mountpoint is closed to avoid pinning the mount.
+
+The health monitoring file hooks certain points of the filesystem so that it
+may receive events about metadata health, filesystem shutdowns, media errors,
+file I/O errors, and unmounting of the filesystem.
+Events are queued up for each health monitor file and encoded into a
+``struct xfs_health_monitor_event`` object when the agent calls ``read()`` on
+the file.
+All health events are dispatched to a background threadpool to reduce stalls
+in the main event loop.
+Events can be logged into the system log for further analysis.
+
+For metadata health events, the specific details are used to construct a call
+to the scrub ioctl.
+The filesystem mountpoint is reopened, and the kernel is called.
+If events are lost or the repairs fail, a full scan will be initiated by
+starting up an ``xfs_scrub@.service`` for the given mountpoint.
+
+A filesystem shutdown causes all future repair work to cease, and an unmount
+causes the agent to exit.
+
+**Future Work Question**: Should the healer daemon also register a dbus
+listener and publish events there?
+
+*Answer*: This is unclear -- if there's a demand for system monitoring daemons
+to consume this information and make decisions, then yes, this could be wired
+up in ``xfs_healer``.
+On the other hand, systemd is in the middle of a transition to varlink, so
+it makes more sense to wait and see what happens.
+
7. Conclusion and Future Work
=============================
next prev parent reply other threads:[~2026-01-16 5:42 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-16 5:42 [PATCHSET v6] xfs: autonomous self healing of filesystems Darrick J. Wong
2026-01-16 5:42 ` Darrick J. Wong [this message]
2026-01-16 5:42 ` [PATCH 02/11] xfs: start creating infrastructure for health monitoring Darrick J. Wong
2026-02-06 13:07 ` Pankaj Raghav (Samsung)
2026-02-06 17:47 ` Darrick J. Wong
2026-02-06 18:54 ` Pankaj Raghav
2026-02-06 20:41 ` Darrick J. Wong
2026-02-09 6:34 ` Christoph Hellwig
2026-02-10 4:57 ` Darrick J. Wong
2026-01-16 5:42 ` [PATCH 03/11] xfs: create event queuing, formatting, and discovery infrastructure Darrick J. Wong
2026-01-16 5:43 ` [PATCH 04/11] xfs: convey filesystem unmount events to the health monitor Darrick J. Wong
2026-01-19 15:44 ` Christoph Hellwig
2026-01-16 5:43 ` [PATCH 05/11] xfs: convey metadata health " Darrick J. Wong
2026-01-16 5:43 ` [PATCH 06/11] xfs: convey filesystem shutdown " Darrick J. Wong
2026-01-19 15:44 ` Christoph Hellwig
2026-01-16 5:43 ` [PATCH 07/11] xfs: convey externally discovered fsdax media errors " Darrick J. Wong
2026-01-16 5:44 ` [PATCH 08/11] xfs: convey file I/O " Darrick J. Wong
2026-01-16 5:44 ` [PATCH 09/11] xfs: allow toggling verbose logging on the health monitoring file Darrick J. Wong
2026-01-16 5:44 ` [PATCH 10/11] xfs: check if an open file is on the health monitored fs Darrick J. Wong
2026-01-16 5:44 ` [PATCH 11/11] xfs: add media verification ioctl Darrick J. Wong
2026-01-19 15:56 ` Christoph Hellwig
2026-01-19 17:35 ` Darrick J. Wong
2026-01-20 4:12 ` [PATCH v6.1 " Darrick J. Wong
2026-01-20 7:18 ` Christoph Hellwig
2026-01-20 18:00 ` Darrick J. Wong
2026-01-21 7:05 ` Christoph Hellwig
2026-01-21 19:58 ` Darrick J. Wong
2026-02-06 3:01 ` Chris Mason
2026-02-06 4:53 ` Darrick J. Wong
-- strict thread matches above, loose matches on Subject: below --
2026-01-21 6:34 [PATCHSET v7 1/3] xfs: autonomous self healing of filesystems Darrick J. Wong
2026-01-21 6:35 ` [PATCH 01/11] docs: discuss autonomous self healing in the xfs online repair design doc Darrick J. Wong
2026-01-13 0:32 [PATCHSET v5] xfs: autonomous self healing of filesystems Darrick J. Wong
2026-01-13 0:32 ` [PATCH 01/11] docs: discuss autonomous self healing in the xfs online repair design doc Darrick J. Wong
2026-01-13 16:00 ` Christoph Hellwig
2026-01-06 7:10 [PATCHSET V4] xfs: autonomous self healing of filesystems Darrick J. Wong
2026-01-06 7:10 ` [PATCH 01/11] docs: discuss autonomous self healing in the xfs online repair design doc Darrick J. Wong
2026-01-07 9:08 ` Christoph Hellwig
2026-01-07 19:15 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=176852588561.2137143.321816631942419344.stgit@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=cem@kernel.org \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox