public inbox for linux-xfs@vger.kernel.org
* [PATCHBOMB v8] xfsprogs: autonomous self healing of filesystems
@ 2026-03-03  0:25 Darrick J. Wong
  2026-03-03  0:33 ` [PATCHSET " Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: Darrick J. Wong @ 2026-03-03  0:25 UTC
  To: Andrey Albershteyn; +Cc: cem, hch, linux-xfs

Hi all,

This patchset contains the userspace (xfs_healer) and QA changes needed
to make use of the new kernel functionality (xfs_healthmon.c) that
delivers live information about filesystem health events to userspace.

In userspace, we create a new daemon program that will read the event
objects and initiate repairs automatically.  This daemon is managed
entirely by systemd and will not block unmounting of the filesystem
unless repairs are ongoing.  The daemon is auto-started by a starter
service that uses fanotify.

When the patchsets under this cover letter are merged, online fsck for
XFS will at long last be fully feature complete.  The passive scan parts
have been done since mid-2024; this final part adds proactive repair.

v8: clean up userspace for merging now that the kernel part is upstream
v7: more cleanups of the media verification ioctl, improve comments, and
    reuse the bio
v6: fix pi-breaking bugs, make verify failures trigger health reports
    and filter bio status flags better
v5: add verify-media ioctl, collapse small helper funcs with only
    one caller
v4: drop multiple client support so we can make direct calls into
    healthmon instead of chasing pointers and doing indirect calls
v3: drag out of RFC status

--D

* [PATCHSET v10 1/2] xfsprogs: autonomous self healing of filesystems
@ 2026-03-19  4:38 Darrick J. Wong
  2026-03-19  4:39 ` [PATCH 03/26] libfrog: add support code for starting systemd services programmatically Darrick J. Wong
  0 siblings, 1 reply; 112+ messages in thread
From: Darrick J. Wong @ 2026-03-19  4:38 UTC
  To: aalbersh, djwong; +Cc: hch, linux-xfs

Hi all,

This patchset builds new functionality to deliver live information about
filesystem health events to userspace.  This is done by creating an
anonymous file that can be read() for events by userspace programs.
Events are captured by hooking various parts of XFS and iomap so that
metadata health failures, file I/O errors, and major changes in
filesystem state (unmounts, shutdowns, etc.) can be observed by
programs.
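As a rough illustration of the read() side, here is a minimal sketch of
a loop that pulls fixed-size records off a file descriptor and hands
each one to a callback.  The record layout (struct demo_event) and all
demo_* names are hypothetical and are not the kernel's event format; how
the fd is obtained is deliberately left out.  The loop also tolerates
short reads, which a real event fd may never produce.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical record layout for illustration; NOT the kernel's struct. */
struct demo_event {
	uint32_t	type;	/* hypothetical event type code */
	uint32_t	flags;	/* hypothetical flags */
	uint64_t	ino;	/* hypothetical inode number */
};

typedef void (*demo_event_fn)(const struct demo_event *ev, void *priv);

/* Example callback: count the events that were delivered. */
static void
demo_count_event(const struct demo_event *ev, void *priv)
{
	(void)ev;
	(*(unsigned long *)priv)++;
}

/*
 * Read whole records from fd until EOF, calling fn for each one.
 * Returns the number of events handled, or -1 on a read error or if
 * the stream ends in the middle of a record.
 */
static long
demo_drain_events(int fd, demo_event_fn fn, void *priv)
{
	struct demo_event	ev;
	long			nr = 0;

	assert(fn != NULL);

	for (;;) {
		size_t		got = 0;

		/* Keep reading until one full record has been assembled. */
		while (got < sizeof(ev)) {
			ssize_t	ret = read(fd, (char *)&ev + got,
					   sizeof(ev) - got);

			if (ret < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			/* EOF is clean only on a record boundary. */
			if (ret == 0)
				return got == 0 ? nr : -1;
			got += (size_t)ret;
		}

		fn(&ev, priv);
		nr++;
	}
}
```

A caller would pass the event fd plus a callback, for example
demo_drain_events(fd, demo_count_event, &counter).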

When an event occurs, the hook functions queue an event object to each
event anonfd for later processing.  Programs must have CAP_SYS_ADMIN
to open the anonfd, and there's a maximum event lag to prevent resource
overconsumption.  The events themselves can be read() from the anonfd
as C structs by the xfs_healer daemon.
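To make the "maximum event lag" idea concrete, here is a small sketch of
one common way to bound a queue: a fixed-size ring that refuses new
events once full and counts how many were dropped, so the reader can be
told that it missed something.  This is an assumption for illustration
only; the cap (DEMO_MAX_LAG), the demo_* names, and the drop-newest
policy are all hypothetical and are not taken from the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_LAG	4	/* hypothetical cap on queued events */

struct demo_queue {
	uint32_t	events[DEMO_MAX_LAG];	/* ring of event codes */
	size_t		head;			/* index of oldest event */
	size_t		count;			/* events currently queued */
	uint64_t	lost;			/* events dropped while full */
};

/* Queue an event; if the reader lags too far, drop it and count it. */
static bool
demo_queue_push(struct demo_queue *q, uint32_t ev)
{
	assert(q != NULL);

	if (q->count == DEMO_MAX_LAG) {
		q->lost++;
		return false;
	}

	q->events[(q->head + q->count) % DEMO_MAX_LAG] = ev;
	q->count++;
	return true;
}

/* Remove the oldest event; returns false if the queue is empty. */
static bool
demo_queue_pop(struct demo_queue *q, uint32_t *ev)
{
	assert(q != NULL && ev != NULL);

	if (q->count == 0)
		return false;

	*ev = q->events[q->head];
	q->head = (q->head + 1) % DEMO_MAX_LAG;
	q->count--;
	return true;
}
```

A consumer that sees a nonzero lost count knows its view of the
filesystem is incomplete and can fall back to a broader check.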

In userspace, we create a new daemon program that will read the event
objects and initiate repairs automatically.  This daemon is managed
entirely by systemd and will not block unmounting of the filesystem
unless repairs are ongoing.  The daemon is auto-started by a starter
service that uses fanotify.
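The per-mount service is a systemd template unit (xfs_healer@.service).
This letter does not say how instance names are formed, so the
following is only a sketch under the assumption that the instance is the
mount path escaped in the style of `systemd-escape --path`: slashes
become dashes, and anything outside letters, digits, ':', '_' and '.'
becomes \xNN.  The demo_* names are hypothetical, "." and ".." path
components are not normalized, and the buffer check is conservative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Characters that pass through unescaped ('/' is handled separately). */
static int
demo_is_plain(unsigned char c)
{
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
	       (c >= '0' && c <= '9') || c == ':' || c == '_' || c == '.';
}

/*
 * Escape a mount path for use as a template unit instance name.
 * Returns 0 on success or -1 if the output buffer might be too small.
 */
static int
demo_unit_escape_path(const char *path, char *out, size_t outlen)
{
	size_t	len, i, o = 0;

	assert(path != NULL && out != NULL);

	/* Trim leading and trailing slashes. */
	while (*path == '/')
		path++;
	len = strlen(path);
	while (len > 0 && path[len - 1] == '/')
		len--;

	/* The root directory is spelled "-". */
	if (len == 0) {
		if (outlen < 2)
			return -1;
		strcpy(out, "-");
		return 0;
	}

	for (i = 0; i < len; i++) {
		unsigned char	c = (unsigned char)path[i];

		/* Worst case per input byte: four bytes plus the NUL. */
		if (o + 5 > outlen)
			return -1;

		if (c == '/') {
			/* Collapse runs of slashes into one separator. */
			if (path[i + 1] != '/')
				out[o++] = '-';
		} else if (demo_is_plain(c) && !(c == '.' && i == 0)) {
			out[o++] = (char)c;
		} else {
			/* Everything else, including '-', becomes \xNN. */
			snprintf(out + o, 5, "\\x%02x", (unsigned int)c);
			o += 4;
		}
	}

	out[o] = '\0';
	return 0;
}
```

Under that assumption, a filesystem mounted at /mnt/data would map to
the instance name "mnt-data", and the root filesystem to "-".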

v10: move the xfs_scrub cleanups and changes to their own patchset,
     improve the commit messages to explain why we use getmntent and
     statmount
v9: move listmount/statmount to libfrog; improve documentation about why
    we dance with getmntent; enhance getmntent reconnection with
    listmount; move --svcname helpers to xfs_{healer,scrub}; improve
    commit messages; various tweaks to fstests
v8: clean up userspace for merging now that the kernel part is upstream
v7: more cleanups of the media verification ioctl, improve comments, and
    reuse the bio
v6: fix pi-breaking bugs, make verify failures trigger health reports
v5: add verify-media ioctl, collapse small helper funcs with only
    one caller
v4: drop multiple client support so we can make direct calls into
    healthmon instead of chasing pointers and doing indirect calls
v3: drag out of RFC status

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=health-monitoring

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=health-monitoring

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=health-monitoring
---
Commits in this patchset:
 * libfrog: add a function to grab the path from an open fd and a file handle
 * libfrog: create healthmon event log library functions
 * libfrog: add support code for starting systemd services programmatically
 * libfrog: hoist a couple of service helper functions
 * libfrog: add wrappers for listmount and statmount
 * man2: document the healthmon ioctl
 * man2: document the media verification ioctl
 * xfs_io: monitor filesystem health events
 * xfs_io: add a media verify command
 * xfs_healer: create daemon to listen for health events
 * xfs_healer: enable repairing filesystems
 * xfs_healer: use getparents to look up file names
 * xfs_healer: create a per-mount background monitoring service
 * xfs_healer: create a service to start the per-mount healer service
 * xfs_healer: don't start service if kernel support unavailable
 * xfs_healer: use the autofsck fsproperty to select mode
 * xfs_healer: run full scrub after lost corruption events or targeted repair failure
 * xfs_healer: use getmntent to find moved filesystems
 * xfs_healer: use statmount to find moved filesystems even faster
 * xfs_healer: validate that repair fds point to the monitored fs
 * xfs_healer: add a manual page
 * xfs_scrub: print systemd service names
 * xfs_io: add listmount and statmount commands
 * mkfs: enable online repair if all backrefs are enabled
 * debian/control: listify the build dependencies
 * debian: enable xfs_healer on the root filesystem by default
---
 healer/xfs_healer.h                            |   92 +++
 include/linux.h                                |    8 
 io/io.h                                        |    8 
 libfrog/flagmap.h                              |   23 +
 libfrog/fsproperties.h                         |    5 
 libfrog/getparents.h                           |    4 
 libfrog/healthevent.h                          |   55 ++
 libfrog/statmount.h                            |  104 ++++
 libfrog/systemd.h                              |   55 ++
 scrub/xfs_scrub.h                              |    3 
 Makefile                                       |    5 
 configure.ac                                   |   13 
 debian/control                                 |   14 -
 debian/postinst                                |    8 
 debian/prerm                                   |   13 
 debian/rules                                   |    3 
 healer/Makefile                                |   69 ++
 healer/fsrepair.c                              |  342 ++++++++++++
 healer/system-xfs_healer.slice                 |   31 +
 healer/weakhandle.c                            |  296 +++++++++++
 healer/xfs_healer.c                            |  666 ++++++++++++++++++++++++
 healer/xfs_healer@.service.in                  |  108 ++++
 healer/xfs_healer_start.c                      |  368 +++++++++++++
 healer/xfs_healer_start.service.in             |   85 +++
 include/builddefs.in                           |   15 +
 io/Makefile                                    |    9 
 io/healthmon.c                                 |  186 +++++++
 io/init.c                                      |    3 
 io/listmount.c                                 |  361 +++++++++++++
 io/verify_media.c                              |  180 ++++++
 libfrog/Makefile                               |   19 +
 libfrog/flagmap.c                              |   79 +++
 libfrog/getparents.c                           |   93 +++
 libfrog/healthevent.c                          |  477 +++++++++++++++++
 libfrog/statmount.c                            |   76 +++
 libfrog/systemd.c                              |  177 ++++++
 m4/package_libcdev.m4                          |  129 +++++
 man/man2/ioctl_xfs_health_fd_on_monitored_fs.2 |   75 +++
 man/man2/ioctl_xfs_health_monitor.2            |  464 +++++++++++++++++
 man/man2/ioctl_xfs_verify_media.2              |  185 +++++++
 man/man8/Makefile                              |   40 +
 man/man8/xfs_healer.8                          |  109 ++++
 man/man8/xfs_healer_start.8                    |   37 +
 man/man8/xfs_io.8                              |  133 +++++
 mkfs/xfs_mkfs.c                                |    9 
 scrub/Makefile                                 |   14 -
 scrub/xfs_scrub.c                              |   82 ++-
 47 files changed, 5272 insertions(+), 58 deletions(-)
 create mode 100644 healer/xfs_healer.h
 create mode 100644 libfrog/flagmap.h
 create mode 100644 libfrog/healthevent.h
 create mode 100644 libfrog/statmount.h
 create mode 100644 libfrog/systemd.h
 create mode 100644 debian/prerm
 create mode 100644 healer/Makefile
 create mode 100644 healer/fsrepair.c
 create mode 100644 healer/system-xfs_healer.slice
 create mode 100644 healer/weakhandle.c
 create mode 100644 healer/xfs_healer.c
 create mode 100644 healer/xfs_healer@.service.in
 create mode 100644 healer/xfs_healer_start.c
 create mode 100644 healer/xfs_healer_start.service.in
 create mode 100644 io/healthmon.c
 create mode 100644 io/listmount.c
 create mode 100644 io/verify_media.c
 create mode 100644 libfrog/flagmap.c
 create mode 100644 libfrog/healthevent.c
 create mode 100644 libfrog/statmount.c
 create mode 100644 libfrog/systemd.c
 create mode 100644 man/man2/ioctl_xfs_health_fd_on_monitored_fs.2
 create mode 100644 man/man2/ioctl_xfs_health_monitor.2
 create mode 100644 man/man2/ioctl_xfs_verify_media.2
 create mode 100644 man/man8/xfs_healer.8
 create mode 100644 man/man8/xfs_healer_start.8




Thread overview: 112+ messages
2026-03-03  0:25 [PATCHBOMB v8] xfsprogs: autonomous self healing of filesystems Darrick J. Wong
2026-03-03  0:33 ` [PATCHSET " Darrick J. Wong
2026-03-03  0:34   ` [PATCH 01/26] libfrog: add a function to grab the path from an open fd and a file handle Darrick J. Wong
2026-03-03 15:44     ` Christoph Hellwig
2026-03-03  0:34   ` [PATCH 02/26] libfrog: create healthmon event log library functions Darrick J. Wong
2026-03-03 15:44     ` Christoph Hellwig
2026-03-03  0:34   ` [PATCH 03/26] libfrog: add support code for starting systemd services programmatically Darrick J. Wong
2026-03-03 15:45     ` Christoph Hellwig
2026-03-03 15:59       ` Darrick J. Wong
2026-03-05  2:39         ` Darrick J. Wong
2026-03-05 13:57           ` Christoph Hellwig
2026-03-03  0:34   ` [PATCH 04/26] libfrog: hoist a couple of service helper functions Darrick J. Wong
2026-03-03 15:45     ` Christoph Hellwig
2026-03-03  0:35   ` [PATCH 05/26] man2: document the healthmon ioctl Darrick J. Wong
2026-03-03 15:46     ` Christoph Hellwig
2026-03-03  0:35   ` [PATCH 06/26] man2: document the media verification ioctl Darrick J. Wong
2026-03-03 15:46     ` Christoph Hellwig
2026-03-03  0:35   ` [PATCH 07/26] xfs_io: monitor filesystem health events Darrick J. Wong
2026-03-03 15:46     ` Christoph Hellwig
2026-03-03  0:35   ` [PATCH 08/26] xfs_io: add a media verify command Darrick J. Wong
2026-03-03 15:46     ` Christoph Hellwig
2026-03-03  0:36   ` [PATCH 09/26] xfs_healer: create daemon to listen for health events Darrick J. Wong
2026-03-03 15:47     ` Christoph Hellwig
2026-03-03  0:36   ` [PATCH 10/26] xfs_healer: enable repairing filesystems Darrick J. Wong
2026-03-03 15:47     ` Christoph Hellwig
2026-03-03  0:36   ` [PATCH 11/26] xfs_healer: use getparents to look up file names Darrick J. Wong
2026-03-03 15:48     ` Christoph Hellwig
2026-03-03  0:36   ` [PATCH 12/26] xfs_healer: create a per-mount background monitoring service Darrick J. Wong
2026-03-03 15:48     ` Christoph Hellwig
2026-03-03  0:37   ` [PATCH 13/26] xfs_healer: create a service to start the per-mount healer service Darrick J. Wong
2026-03-03 15:49     ` Christoph Hellwig
2026-03-03 16:52       ` Darrick J. Wong
2026-03-03 16:54         ` Christoph Hellwig
2026-03-03 17:06           ` Darrick J. Wong
2026-03-03  0:37   ` [PATCH 14/26] xfs_healer: don't start service if kernel support unavailable Darrick J. Wong
2026-03-03 15:49     ` Christoph Hellwig
2026-03-03  0:37   ` [PATCH 15/26] xfs_healer: use the autofsck fsproperty to select mode Darrick J. Wong
2026-03-03 15:50     ` Christoph Hellwig
2026-03-03  0:38   ` [PATCH 16/26] xfs_healer: run full scrub after lost corruption events or targeted repair failure Darrick J. Wong
2026-03-03 15:50     ` Christoph Hellwig
2026-03-03  0:38   ` [PATCH 17/26] xfs_healer: use getmntent to find moved filesystems Darrick J. Wong
2026-03-03 15:51     ` Christoph Hellwig
2026-03-03 17:26       ` Darrick J. Wong
2026-03-04 13:03         ` Christoph Hellwig
2026-03-04 16:30           ` Darrick J. Wong
2026-03-05 14:00             ` Christoph Hellwig
2026-03-05 17:55               ` Darrick J. Wong
2026-03-03  0:38   ` [PATCH 18/26] xfs_healer: validate that repair fds point to the monitored fs Darrick J. Wong
2026-03-03 15:52     ` Christoph Hellwig
2026-03-03  0:38   ` [PATCH 19/26] xfs_healer: add a manual page Darrick J. Wong
2026-03-03 15:52     ` Christoph Hellwig
2026-03-03  0:39   ` [PATCH 20/26] xfs_scrub: use the verify media ioctl during phase 6 if possible Darrick J. Wong
2026-03-03 15:53     ` Christoph Hellwig
2026-03-03 16:59       ` Darrick J. Wong
2026-03-03  0:39   ` [PATCH 21/26] xfs_scrub: perform media scanning of the log region Darrick J. Wong
2026-03-03 15:54     ` Christoph Hellwig
2026-03-03  0:39   ` [PATCH 22/26] xfs_io: add listmount command Darrick J. Wong
2026-03-03 15:56     ` Christoph Hellwig
2026-03-03 17:08       ` Darrick J. Wong
2026-03-03  0:39   ` [PATCH 23/26] xfs_io: print systemd service names Darrick J. Wong
2026-03-03 15:57     ` Christoph Hellwig
2026-03-03 17:29       ` Darrick J. Wong
2026-03-04 13:04         ` Christoph Hellwig
2026-03-04 16:35           ` Darrick J. Wong
2026-03-05 13:55             ` Christoph Hellwig
2026-03-05 22:00               ` Darrick J. Wong
2026-03-06 14:20                 ` Christoph Hellwig
2026-03-06 15:58                   ` Darrick J. Wong
2026-03-03  0:40   ` [PATCH 24/26] mkfs: enable online repair if all backrefs are enabled Darrick J. Wong
2026-03-03 15:58     ` Christoph Hellwig
2026-03-03 17:32       ` Darrick J. Wong
2026-03-05 22:22         ` Darrick J. Wong
2026-03-03  0:40   ` [PATCH 25/26] debian: enable xfs_healer on the root filesystem by default Darrick J. Wong
2026-03-03 15:58     ` Christoph Hellwig
2026-03-03 17:14       ` Darrick J. Wong
2026-03-04 13:01         ` Christoph Hellwig
2026-03-05 22:10           ` Darrick J. Wong
2026-03-05 22:18             ` Darrick J. Wong
2026-03-03  0:40   ` [PATCH 26/26] debian/control: listify the build dependencies Darrick J. Wong
2026-03-03 15:58     ` Christoph Hellwig
2026-03-03 17:09       ` Darrick J. Wong
2026-03-03  0:33 ` [PATCHSET v8 1/2] fstests: test generic file IO error reporting Darrick J. Wong
2026-03-03  0:40   ` [PATCH 1/1] generic: test fsnotify filesystem " Darrick J. Wong
2026-03-03  9:21     ` Amir Goldstein
2026-03-03 14:51       ` Christoph Hellwig
2026-03-03 14:56         ` Amir Goldstein
2026-03-04 10:10         ` Jan Kara
2026-03-03 14:54     ` Christoph Hellwig
2026-03-03 16:06       ` Gabriel Krisman Bertazi
2026-03-03 16:12         ` Christoph Hellwig
2026-03-03 16:38           ` Darrick J. Wong
2026-03-03 16:49       ` Darrick J. Wong
2026-03-03 16:53         ` Christoph Hellwig
2026-03-03 17:59           ` Darrick J. Wong
2026-03-03  0:33 ` [PATCHSET v8 2/2] fstests: autonomous self healing of filesystems Darrick J. Wong
2026-03-03  0:41   ` [PATCH 01/13] xfs: test health monitoring code Darrick J. Wong
2026-03-09 17:21     ` Zorro Lang
2026-03-09 18:03       ` Darrick J. Wong
2026-03-03  0:41   ` [PATCH 02/13] xfs: test for metadata corruption error reporting via healthmon Darrick J. Wong
2026-03-03  0:41   ` [PATCH 03/13] xfs: test io " Darrick J. Wong
2026-03-03  0:41   ` [PATCH 04/13] xfs: set up common code for testing xfs_healer Darrick J. Wong
2026-03-03  0:42   ` [PATCH 05/13] xfs: test xfs_healer's event handling Darrick J. Wong
2026-03-03  0:42   ` [PATCH 06/13] xfs: test xfs_healer can fix a filesystem Darrick J. Wong
2026-03-03  0:42   ` [PATCH 07/13] xfs: test xfs_healer can report file I/O errors Darrick J. Wong
2026-03-03  0:42   ` [PATCH 08/13] xfs: test xfs_healer can report file media errors Darrick J. Wong
2026-03-03  0:43   ` [PATCH 09/13] xfs: test xfs_healer can report filesystem shutdowns Darrick J. Wong
2026-03-03  0:43   ` [PATCH 10/13] xfs: test xfs_healer can initiate full filesystem repairs Darrick J. Wong
2026-03-03  0:43   ` [PATCH 11/13] xfs: test xfs_healer can follow mount moves Darrick J. Wong
2026-03-03  0:43   ` [PATCH 12/13] xfs: test xfs_healer wont repair the wrong filesystem Darrick J. Wong
2026-03-03  0:44   ` [PATCH 13/13] xfs: test xfs_healer background service Darrick J. Wong
2026-03-03  0:47   ` [PATCH 14/13] xfs: test xfs_healer startup service Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2026-03-19  4:38 [PATCHSET v10 1/2] xfsprogs: autonomous self healing of filesystems Darrick J. Wong
2026-03-19  4:39 ` [PATCH 03/26] libfrog: add support code for starting systemd services programmatically Darrick J. Wong
