From: Jan Kara <jack@suse.cz>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
xfs <linux-xfs@vger.kernel.org>,
linux-ext4 <linux-ext4@vger.kernel.org>,
linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [LSF TOPIC] online repair of filesystems: what next?
Date: Wed, 8 Mar 2023 18:12:06 +0100
Message-ID: <20230308171206.zuci3wdd3yg7amw5@quack3>
In-Reply-To: <Y/5ovz6HI2Z47jbk@magnolia>

Hi!

I'm interested in this topic. Some comments below.

On Tue 28-02-23 12:49:03, Darrick J. Wong wrote:
> Five years ago[0], we started a conversation about cross-filesystem
> userspace tooling for online fsck. I think enough time has passed for
> us to have another one, since a few things have happened since then:
>
> 1. ext4 has gained the ability to send corruption reports to a userspace
> monitoring program via fsnotify. Thanks, Collabora!
>
> 2. XFS now tracks successful scrubs and corruptions seen during runtime
> and during scrubs. Userspace can query this information.
>
> 3. Directory parent pointers, which enable online repair of the
> directory tree, are nearing completion.
>
> 4. Dave and I are working on merging online repair of space metadata for
> XFS. Online repair of directory trees is feature complete, but we
> still have one or two unresolved questions in the parent pointer
> code.
>
> 5. I've gotten a bit better[1] at writing systemd service descriptions
> for scheduling and performing background online fsck.
>
> Now that fsnotify_sb_error exists as a result of (1), I think we
> should figure out how to plumb calls into the readahead and writeback
> code so that IO failures can be reported to the fsnotify monitor. I
> suspect there may be a few difficulties here since fsnotify (iirc)
> allocates memory and takes locks.

Well, if you want to generate fsnotify events from an interrupt handler,
you're going to have a hard time; I don't have a good answer for that. But
offloading error event generation to a workqueue should be doable (and
event delivery is async anyway, so from the userspace POV there's no
difference). Otherwise locking shouldn't be a problem AFAICT. WRT memory
allocation, we currently preallocate the error events to avoid losing
events due to ENOMEM. With current usecases (reporting of catastrophic
filesystem errors) we have settled on a mempool with 32 preallocated events
(note that a preallocated event gets used only if the normal kmalloc fails)
for simplicity. If the error reporting mechanism is going to be used
significantly more, we may need to reconsider this, but it should be doable.
And frankly, if you have a storm of fs errors *and* the system is going
ENOMEM at the same time, I have my doubts that losing some error reports is
going to do any more harm ;).

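
To make the allocation fallback described above concrete, here is a minimal
userspace model in C. It is only a sketch of the "try the normal allocator
first, dip into a fixed reserve of 32 events only on failure" behaviour, not
the kernel code: the kernel uses its mempool API for this, and every name
below (err_pool, err_event_alloc, failing_alloc, ...) is hypothetical and
chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RESERVE_EVENTS 32	/* reserve size mentioned in the mail */

/* Hypothetical error event; the real kernel structure differs. */
struct err_event {
	int error;
	bool from_reserve;	/* true if taken from the preallocated reserve */
};

/* Userspace stand-in for a pool with preallocated elements. */
struct err_pool {
	struct err_event reserve[RESERVE_EVENTS];
	bool in_use[RESERVE_EVENTS];
};

typedef void *(*alloc_fn)(size_t size);

static void err_pool_init(struct err_pool *pool)
{
	memset(pool, 0, sizeof(*pool));
}

/*
 * Try the normal allocator first. Only if it fails, hand out one of the
 * preallocated events. Returns NULL when the reserve is exhausted too,
 * which is the case where an error report is lost.
 */
static struct err_event *err_event_alloc(struct err_pool *pool, alloc_fn alloc)
{
	struct err_event *ev = alloc(sizeof(*ev));
	size_t i;

	if (ev) {
		ev->error = 0;
		ev->from_reserve = false;
		return ev;
	}
	for (i = 0; i < RESERVE_EVENTS; i++) {
		if (!pool->in_use[i]) {
			pool->in_use[i] = true;
			ev = &pool->reserve[i];
			ev->error = 0;
			ev->from_reserve = true;
			return ev;
		}
	}
	return NULL;
}

/* Return an event to wherever it came from. */
static void err_event_free(struct err_pool *pool, struct err_event *ev)
{
	if (!ev)
		return;
	if (ev->from_reserve)
		pool->in_use[ev - pool->reserve] = false;
	else
		free(ev);
}

/* Allocator that always fails, to simulate ENOMEM. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```

Passing malloc as the allocator models the normal path; passing
failing_alloc models memory pressure, where at most RESERVE_EVENTS events
can be outstanding before further reports are dropped.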
> As a result of (2), XFS now retains quite a bit of incore state about
> its own health. The structure that fsnotify gives to userspace is very
> generic (superblock, inode, errno, errno count). How might XFS export
> a greater amount of information via this interface? We can provide
> details at finer granularity -- for example, a specific data structure
> under an allocation group or an inode, or specific quota records.

The fsnotify (in fact fanotify) interface is fairly flexible in what can be
passed through it. So if you need to pass some (reasonably short) binary
blob to userspace which knows how to decode it, fanotify can handle that
(with some wrapping). Obviously there's a tradeoff to make between how much
of the event is generic (as that is then easier to process by tools common
to all filesystems) and how much is fs specific (which allows passing more
detailed information). But I guess we need to have concrete examples of
events to discuss this.

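
As an illustration of the generic vs. fs-specific split, the sketch below
walks a sequence of fanotify-style info records (the region that follows
struct fanotify_event_metadata in an event) and decodes two kinds of record.
The header and error record layouts are local mirrors of
struct fanotify_event_info_header and struct fanotify_event_info_error from
<linux/fanotify.h>. The INFO_TYPE_FS_BLOB record is purely hypothetical: no
such record type exists today, and its number and payload are assumptions
made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct fanotify_event_info_header. */
struct info_header {
	uint8_t info_type;
	uint8_t pad;
	uint16_t len;		/* length of the whole record, header included */
};

/* Local mirror of struct fanotify_event_info_error. */
struct info_error {
	struct info_header hdr;
	int32_t error;
	uint32_t error_count;
};

#define INFO_TYPE_ERROR		5	/* value of FAN_EVENT_INFO_TYPE_ERROR */
#define INFO_TYPE_FS_BLOB	200	/* HYPOTHETICAL fs-specific record */

struct decoded {
	int have_error;
	int32_t error;
	uint32_t error_count;
	const unsigned char *blob;	/* opaque fs-specific payload, if any */
	size_t blob_len;
};

/*
 * Walk the records in buf[0..len). The generic error record is decoded,
 * the hypothetical fs-specific record is exposed as an opaque blob, and
 * unknown record types are skipped. Returns 0 on success, -1 if a record
 * is truncated or malformed.
 */
static int decode_info_records(const unsigned char *buf, size_t len,
			       struct decoded *out)
{
	size_t off = 0;

	memset(out, 0, sizeof(*out));
	while (off < len) {
		struct info_header hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			return -1;

		if (hdr.info_type == INFO_TYPE_ERROR) {
			struct info_error err;

			if (hdr.len < sizeof(err))
				return -1;
			memcpy(&err, buf + off, sizeof(err));
			out->have_error = 1;
			out->error = err.error;
			out->error_count = err.error_count;
		} else if (hdr.info_type == INFO_TYPE_FS_BLOB) {
			out->blob = buf + off + sizeof(hdr);
			out->blob_len = hdr.len - sizeof(hdr);
		}
		off += hdr.len;
	}
	return 0;
}
```

A generic tool would stop at the error record and ignore the rest; a
filesystem-aware tool would additionally interpret the blob. Skipping
unknown record types by their length is what keeps the two able to coexist.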
> With (4) on the way, I can envision wanting a system service that would
> watch for these fsnotify events, and transform the error reports into
> targeted repair calls in the kernel. This of course would be very
> filesystem specific, but I would also like to hear from anyone pondering
> other usecases for fsnotify filesystem error monitors.

I think that once we report IO errors (or ENOSPC and EDQUOT errors, for that
matter) through fsnotify, there will be some interesting system-health
monitoring usecases. But I don't know of anybody working on this.

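
For anyone wanting to experiment with such a monitor, the following is a
minimal sketch of how a userspace program might subscribe to the existing
filesystem error events. It assumes a kernel that has FAN_FS_ERROR, a
filesystem that actually reports errors this way, and sufficient privileges;
the fallback #defines only cover older userspace headers. The function name
watch_fs_errors is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Fallbacks for older headers; values as in <linux/fanotify.h>. */
#ifndef FAN_REPORT_FID
#define FAN_REPORT_FID		0x00000200
#endif
#ifndef FAN_MARK_FILESYSTEM
#define FAN_MARK_FILESYSTEM	0x00000100
#endif
#ifndef FAN_FS_ERROR
#define FAN_FS_ERROR		0x00008000
#endif

/*
 * Open a fanotify group and ask for error events on the filesystem that
 * contains 'mountpoint'. FAN_FS_ERROR requires a group created with
 * FAN_REPORT_FID and a filesystem-wide mark. Returns the fanotify fd, or
 * -1 with errno set (e.g. EPERM without privileges, EINVAL on kernels
 * that lack the feature).
 */
static int watch_fs_errors(const char *mountpoint)
{
	int fd;

	assert(mountpoint != NULL);

	fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_FID, O_RDONLY);
	if (fd < 0)
		return -1;

	if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
			  FAN_FS_ERROR, AT_FDCWD, mountpoint) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

A monitor would then read() events from the returned descriptor and walk
the info records that follow each struct fanotify_event_metadata.
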
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR