public inbox for linux-fsdevel@vger.kernel.org
* [PATCHSET V4] xfs: autonomous self healing of filesystems
@ 2026-01-06  7:10 Darrick J. Wong
From: Darrick J. Wong @ 2026-01-06  7:10 UTC
  To: cem, djwong; +Cc: linux-xfs, hch, linux-fsdevel

Hi all,

This patchset builds new functionality to deliver live information about
filesystem health events to userspace.  This is done by creating an
anonymous file that can be read() for events by userspace programs.
Events are captured by hooking various parts of XFS and iomap so that
metadata health failures, file I/O errors, and major changes in
filesystem state (unmounts, shutdowns, etc.) can be observed by
programs.
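
To illustrate the read() model described above, here is a minimal sketch of
a userspace reader that pulls fixed-size records from a file descriptor.
The struct layout, the event type names, and the demo_read_events() helper
are all hypothetical and chosen only for illustration; the real event
format is defined by the patches in this series (fs/xfs/libxfs/xfs_fs.h),
not by this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical event record; NOT the layout defined by the patchset. */
struct demo_health_event {
	uint32_t type;		/* hypothetical event class, see enum below */
	uint32_t flags;		/* unused in this sketch */
	uint64_t time_ns;	/* hypothetical timestamp */
};

/* Hypothetical event classes mirroring the categories in the cover letter. */
enum {
	DEMO_EVT_SICK = 1,	/* metadata health failure */
	DEMO_EVT_IOERR = 2,	/* file I/O error */
	DEMO_EVT_SHUTDOWN = 3,	/* filesystem shutdown */
	DEMO_EVT_UNMOUNT = 4,	/* filesystem unmount */
};

/*
 * Read up to @max whole events from @fd into @evs.
 *
 * Returns the number of complete events read, 0 at end of file, or -1 on
 * error.  This sketch assumes the fd hands back whole records; a trailing
 * partial record is dropped rather than buffered.
 */
ssize_t demo_read_events(int fd, struct demo_health_event *evs, size_t max)
{
	ssize_t n;

	assert(evs != NULL);

	/* Retry if a signal interrupts the read before any data arrives. */
	do {
		n = read(fd, evs, max * sizeof(*evs));
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return -1;

	return n / (ssize_t)sizeof(*evs);
}
```

A daemon would call this in a loop on the monitoring fd and dispatch on the
type field; a pipe is enough to exercise the helper without a kernel that
carries these patches.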

When an event occurs, the hook functions queue an event object to each
event anonfd for later processing.  Programs must have CAP_SYS_ADMIN
to open the anonfd and there's a maximum event lag to prevent resource
overconsumption.  The events themselves can be read() from the anonfd
as C structs for the xfs_healer daemon.
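
One way to picture the maximum event lag is a bounded queue that refuses
new events once the reader has fallen too far behind.  The sketch below is
a plain userspace ring buffer, not the kernel implementation: the
DEMO_MAX_LAG limit, the struct names, and the drop-and-count overflow
policy are assumptions made for illustration.  The cover letter says only
that a limit exists, not how overflow is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_LAG	4	/* hypothetical cap on unread events */

/* Hypothetical event payload. */
struct demo_event {
	uint32_t type;
};

/* Fixed-size FIFO; memory use is bounded no matter how slow the reader is. */
struct demo_queue {
	struct demo_event ring[DEMO_MAX_LAG];
	size_t head;		/* index of the oldest unread event */
	size_t count;		/* number of unread events */
	uint64_t lost;		/* events dropped because the queue was full */
};

/* Queue an event.  Returns false and counts a loss if the reader lags. */
bool demo_queue_push(struct demo_queue *q, struct demo_event ev)
{
	assert(q != NULL);

	if (q->count == DEMO_MAX_LAG) {
		q->lost++;
		return false;
	}
	q->ring[(q->head + q->count) % DEMO_MAX_LAG] = ev;
	q->count++;
	return true;
}

/* Dequeue the oldest event.  Returns false if the queue is empty. */
bool demo_queue_pop(struct demo_queue *q, struct demo_event *ev)
{
	assert(q != NULL && ev != NULL);

	if (q->count == 0)
		return false;
	*ev = q->ring[q->head];
	q->head = (q->head + 1) % DEMO_MAX_LAG;
	q->count--;
	return true;
}
```

Counting dropped events instead of blocking the producer keeps the hook
path cheap, and the counter lets a reader notice that it missed something.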

In userspace, we create a new daemon program that will read the event
objects and initiate repairs automatically.  This daemon is managed
entirely by systemd and will not block unmounting of the filesystem
unless repairs are ongoing.  Instances of the daemon are auto-started by
a starter service that uses fanotify.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=health-monitoring

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=health-monitoring

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=health-monitoring
---
Commits in this patchset:
 * docs: discuss autonomous self healing in the xfs online repair design doc
 * xfs: start creating infrastructure for health monitoring
 * xfs: create event queuing, formatting, and discovery infrastructure
 * xfs: convey filesystem unmount events to the health monitor
 * xfs: convey metadata health events to the health monitor
 * xfs: convey filesystem shutdown events to the health monitor
 * xfs: convey externally discovered fsdax media errors to the health monitor
 * xfs: convey file I/O errors to the health monitor
 * xfs: allow reconfiguration of the health monitoring device
 * xfs: check if an open file is on the health monitored fs
 * xfs: add media error reporting ioctl
---
 fs/xfs/libxfs/xfs_fs.h                             |  178 +++
 fs/xfs/libxfs/xfs_health.h                         |    5 
 fs/xfs/xfs_healthmon.h                             |  181 +++
 fs/xfs/xfs_mount.h                                 |    4 
 fs/xfs/xfs_notify_failure.h                        |    4 
 fs/xfs/xfs_trace.h                                 |  414 ++++++
 .../filesystems/xfs/xfs-online-fsck-design.rst     |  218 +++
 fs/xfs/Makefile                                    |    7 
 fs/xfs/xfs_fsops.c                                 |   15 
 fs/xfs/xfs_health.c                                |  124 ++
 fs/xfs/xfs_healthmon.c                             | 1305 ++++++++++++++++++++
 fs/xfs/xfs_ioctl.c                                 |    7 
 fs/xfs/xfs_mount.c                                 |    2 
 fs/xfs/xfs_notify_failure.c                        |  195 +++
 fs/xfs/xfs_super.c                                 |   12 
 fs/xfs/xfs_trace.c                                 |    5 
 16 files changed, 2657 insertions(+), 19 deletions(-)
 create mode 100644 fs/xfs/xfs_healthmon.h
 create mode 100644 fs/xfs/xfs_healthmon.c


Thread overview: 28+ messages
2026-01-06  7:10 [PATCHSET V4] xfs: autonomous self healing of filesystems Darrick J. Wong
2026-01-06  7:10 ` [PATCH 01/11] docs: discuss autonomous self healing in the xfs online repair design doc Darrick J. Wong
2026-01-07  9:08   ` Christoph Hellwig
2026-01-07 19:15     ` Darrick J. Wong
2026-01-06  7:11 ` [PATCH 02/11] xfs: start creating infrastructure for health monitoring Darrick J. Wong
2026-01-07  9:17   ` Christoph Hellwig
2026-01-07 18:50     ` Darrick J. Wong
2026-01-08 10:21       ` Christoph Hellwig
2026-01-06  7:11 ` [PATCH 03/11] xfs: create event queuing, formatting, and discovery infrastructure Darrick J. Wong
2026-01-07  9:32   ` Christoph Hellwig
2026-01-07 19:01     ` Darrick J. Wong
2026-01-06  7:11 ` [PATCH 04/11] xfs: convey filesystem unmount events to the health monitor Darrick J. Wong
2026-01-06  7:11 ` [PATCH 05/11] xfs: convey metadata health " Darrick J. Wong
2026-01-06  7:12 ` [PATCH 06/11] xfs: convey filesystem shutdown " Darrick J. Wong
2026-01-06  7:12 ` [PATCH 07/11] xfs: convey externally discovered fsdax media errors " Darrick J. Wong
2026-01-06  7:12 ` [PATCH 08/11] xfs: convey file I/O " Darrick J. Wong
2026-01-06  7:12 ` [PATCH 09/11] xfs: allow reconfiguration of the health monitoring device Darrick J. Wong
2026-01-06  7:13 ` [PATCH 10/11] xfs: check if an open file is on the health monitored fs Darrick J. Wong
2026-01-06  7:13 ` [PATCH 11/11] xfs: add media error reporting ioctl Darrick J. Wong
2026-01-07  9:36   ` Christoph Hellwig
2026-01-07 16:30     ` Darrick J. Wong
2026-01-08 10:25       ` Christoph Hellwig
2026-01-08 16:09         ` Darrick J. Wong
2026-01-08 16:14           ` Christoph Hellwig
2026-01-08 16:18             ` Darrick J. Wong
2026-01-08 16:20               ` Christoph Hellwig
2026-01-08 16:53                 ` Darrick J. Wong
2026-01-12  5:24                   ` Darrick J. Wong
