From: Allison Henderson <allison.henderson@oracle.com>
To: "djwong@kernel.org" <djwong@kernel.org>
Cc: Catherine Hoang <catherine.hoang@oracle.com>,
"david@fromorbit.com" <david@fromorbit.com>,
"willy@infradead.org" <willy@infradead.org>,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
Chandan Babu <chandan.babu@oracle.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"hch@infradead.org" <hch@infradead.org>
Subject: Re: [PATCH 01/14] xfs: document the motivation for online fsck design
Date: Sat, 7 Jan 2023 05:01:54 +0000
Message-ID: <0607e986e96def5ba17bd53ff3f7e775a99d3d94.camel@oracle.com>
In-Reply-To: <167243825174.682859.4770282034026097725.stgit@magnolia>
On Fri, 2022-12-30 at 14:10 -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Start the first chapter of the online fsck design documentation.
> This covers the motivations for creating this in the first place.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> Documentation/filesystems/index.rst | 1
> .../filesystems/xfs-online-fsck-design.rst | 199 ++++++++++++++++++++
> 2 files changed, 200 insertions(+)
> create mode 100644 Documentation/filesystems/xfs-online-fsck-design.rst
>
>
> diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
> index bee63d42e5ec..fbb2b5ada95b 100644
> --- a/Documentation/filesystems/index.rst
> +++ b/Documentation/filesystems/index.rst
> @@ -123,4 +123,5 @@ Documentation for filesystem implementations.
> vfat
> xfs-delayed-logging-design
> xfs-self-describing-metadata
> + xfs-online-fsck-design
> zonefs
> diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs-online-fsck-design.rst
> new file mode 100644
> index 000000000000..25717ebb5f80
> --- /dev/null
> +++ b/Documentation/filesystems/xfs-online-fsck-design.rst
> @@ -0,0 +1,199 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. _xfs_online_fsck_design:
> +
> +..
> + Mapping of heading styles within this document:
> + Heading 1 uses "====" above and below
> + Heading 2 uses "===="
> + Heading 3 uses "----"
> + Heading 4 uses "````"
> + Heading 5 uses "^^^^"
> + Heading 6 uses "~~~~"
> + Heading 7 uses "...."
> +
> + Sections are manually numbered because apparently that's what everyone
> + does in the kernel.
> +
> +======================
> +XFS Online Fsck Design
> +======================
> +
> +This document captures the design of the online filesystem check feature for
> +XFS.
> +The purpose of this document is threefold:
> +
> +- To help kernel distributors understand exactly what the XFS online fsck
> + feature is, and issues about which they should be aware.
> +
> +- To help people reading the code to familiarize themselves with the relevant
> + concepts and design points before they start digging into the code.
> +
> +- To help developers maintaining the system by capturing the reasons
> + supporting higher level decisionmaking.
nit: decision making
> +
> +As the online fsck code is merged, the links in this document to topic branches
> +will be replaced with links to code.
> +
> +This document is licensed under the terms of the GNU Public License, v2.
> +The primary author is Darrick J. Wong.
> +
> +This design document is split into seven parts.
> +Part 1 defines what fsck tools are and the motivations for writing a new one.
> +Parts 2 and 3 present a high level overview of how online fsck process works
> +and how it is tested to ensure correct functionality.
> +Part 4 discusses the user interface and the intended usage modes of the new
> +program.
> +Parts 5 and 6 show off the high level components and how they fit together, and
> +then present case studies of how each repair function actually works.
> +Part 7 sums up what has been discussed so far and speculates about what else
> +might be built atop online fsck.
> +
> +.. contents:: Table of Contents
> + :local:
> +
Something that I've noticed in my training sessions is that oftentimes,
less is more. People really only absorb so much over a particular
duration of time, so sometimes having too much detail in the context is
not as helpful as you might think. A lot of times, paraphrasing
excerpts to reflect the same info in a more compact format will help
you keep the audience on track (a little longer at least).
> +1. What is a Filesystem Check?
> +==============================
> +
> +A Unix filesystem has three main jobs: to provide a hierarchy of names through
> +which application programs can associate arbitrary blobs of data for any
> +length of time, to virtualize physical storage media across those names, and
> +to retrieve the named data blobs at any time.
Consider the following paraphrase:
A Unix filesystem has three main jobs:
* Provide a hierarchy of names by which applications access data for a
length of time.
* Store or retrieve that data at any time.
* Virtualize physical storage media across those names.
Also... I don't think it would be inappropriate to just skip the above
and jump right into fsck. That's a very limited view of a filesystem,
and a reader seeking an fsck doc probably has some idea of what a fs is
otherwise supposed to be doing.
> +The filesystem check (fsck) tool examines all the metadata in a filesystem
> +to look for errors.
> +Simple tools only check for obvious corruptions, but the more sophisticated
> +ones cross-reference metadata records to look for inconsistencies.
> +People do not like losing data, so most fsck tools also contains some ability
> +to deal with any problems found.
While simple tools can detect data corruptions, a filesystem check
(fsck) uses metadata records as a cross-reference to find and correct
more inconsistencies.
?
> +As a word of caution -- the primary goal of most Linux fsck tools is to restore
> +the filesystem metadata to a consistent state, not to maximize the data
> +recovered.
> +That precedent will not be challenged here.
> +
> +Filesystems of the 20th century generally lacked any redundancy in the ondisk
> +format, which means that fsck can only respond to errors by erasing files until
> +errors are no longer detected.
> +System administrators avoid data loss by increasing the number of separate
> +storage systems through the creation of backups; and they avoid downtime by
> +increasing the redundancy of each storage system through the creation of RAID.
Mmm, RAIDs help more for hardware failures, right? They don't really
have a notion of when the fs is corrupted. While an fsck can help
navigate around a corruption possibly caused by a hardware failure, I
think it's really a different kind of redundancy. I think I'd probably
drop the last line and keep the selling point focused on online repair.
> +More recent filesystem designs contain enough redundancy in their metadata that
> +it is now possible to regenerate data structures when non-catastrophic errors
> +occur; this capability aids both strategies.
> +Over the past few years, XFS has added a storage space reverse mapping index to
> +make it easy to find which files or metadata objects think they own a
> +particular range of storage.
> +Efforts are under way to develop a similar reverse mapping index for the naming
> +hierarchy, which will involve storing directory parent pointers in each file.
> +With these two pieces in place, XFS uses secondary information to perform more
> +sophisticated repairs.
This part here I think I would either let go or relocate. The topic of
this section is supposed to discuss roughly what a filesystem check is,
ideally so we can start talking about how ofsck is different. It feels
like a bit of a jump to suddenly hop into rmap and pptrs, and for
"sophisticated repairs" that we haven't really gotten into the details
of yet. So I think it would read easier if we saved this part until we
start talking about how they are used later.
> +
> +TLDR; Show Me the Code!
> +-----------------------
> +
> +Code is posted to the kernel.org git trees as follows:
> +`kernel changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-symlink>`_,
> +`userspace changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_, and
> +`QA test changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=repair-dirs>`_.
> +Each kernel patchset adding an online repair function will use the same branch
> +name across the kernel, xfsprogs, and fstests git repos.
> +
> +Existing Tools
> +--------------
> +
> +The online fsck tool described here will be the third tool in the history of
> +XFS (on Linux) to check and repair filesystems.
> +Two programs precede it:
> +
> +The first program, ``xfs_check``, was created as part of the XFS debugger
> +(``xfs_db``) and can only be used with unmounted filesystems.
> +It walks all metadata in the filesystem looking for inconsistencies in the
> +metadata, though it lacks any ability to repair what it finds.
> +Due to its high memory requirements and inability to repair things, this
> +program is now deprecated and will not be discussed further.
> +
> +The second program, ``xfs_repair``, was created to be faster and more robust
> +than the first program.
> +Like its predecessor, it can only be used with unmounted filesystems.
> +It uses extent-based in-memory data structures to reduce memory consumption,
> +and tries to schedule readahead IO appropriately to reduce I/O waiting time
> +while it scans the metadata of the entire filesystem.
> +The most important feature of this tool is its ability to respond to
> +inconsistencies in file metadata and directory tree by erasing things as needed
> +to eliminate problems.
> +Space usage metadata are rebuilt from the observed file metadata.
> +
> +Problem Statement
> +-----------------
> +
> +The current XFS tools leave several problems unsolved:
> +
> +1. **User programs** suddenly **lose access** to information in the computer
> + when unexpected shutdowns occur as a result of silent corruptions in the
> + filesystem metadata.
> + These occur **unpredictably** and often without warning.
1. **User programs** suddenly **lose access** to the filesystem
when unexpected shutdowns occur as a result of silent corruptions
that could have otherwise been avoided with an online repair
While some of these issues are not untrue, I think it makes sense to
limit them to the issue you plan to solve, and therefore discuss.
> +
> +2. **Users** experience a **total loss of service** during the recovery period
> + after an **unexpected shutdown** occurs.
> +
> +3. **Users** experience a **total loss of service** if the filesystem is taken
> + offline to **look for problems** proactively.
> +
> +4. **Data owners** cannot **check the integrity** of their stored data without
> + reading all of it.
> + This may expose them to substantial billing costs when a linear media scan
> + might suffice.
Ok, I had to re-read this one a few times, but I think this reads a
little cleaner:
Customers that are billed for data egress may incur unnecessary
cost when a background media scan on the host may have sufficed
?
> +
> +5. **System administrators** cannot **schedule** a maintenance window to deal
> + with corruptions if they **lack the means** to assess filesystem health
> + while the filesystem is online.
> +
> +6. **Fleet monitoring tools** cannot **automate periodic checks** of filesystem
> + health when doing so requires **manual intervention** and downtime.
> +
> +7. **Users** can be tricked into **doing things they do not desire** when
> + malicious actors **exploit quirks of Unicode** to place misleading names
> + in directories.
hrmm, I guess I'm not immediately extrapolating what things users are
being tricked into doing, or how ofsck solves this? Otherwise I might
drop the last one here, I think the rest of the bullets are plenty of
motivation.
> +
> +Given this definition of the problems to be solved and the actors who would
> +benefit, the proposed solution is a third fsck tool that acts on a running
> +filesystem.
> +
> +This new third program has three components: an in-kernel facility to check
> +metadata, an in-kernel facility to repair metadata, and a userspace driver
> +program to drive fsck activity on a live filesystem.
> +``xfs_scrub`` is the name of the driver program.
> +The rest of this document presents the goals and use cases of the new fsck
> +tool, describes its major design points in connection to those goals, and
> +discusses the similarities and differences with existing tools.
> +
> ++--------------------------------------------------------------------------+
> +| **Note**:                                                                |
> ++--------------------------------------------------------------------------+
> +| Throughout this document, the existing offline fsck tool can also be     |
> +| referred to by its current name "``xfs_repair``".                        |
> +| The userspace driver program for the new online fsck tool can be         |
> +| referred to as "``xfs_scrub``".                                          |
> +| The kernel portion of online fsck that validates metadata is called      |
> +| "online scrub", and portion of the kernel that fixes metadata is called  |
> +| "online repair".                                                         |
> ++--------------------------------------------------------------------------+
>
Hmm, maybe here might be a good spot to move rmap and pptrs? It's not
otherwise clear to me what "secondary metadata" is. If that is what it
is meant to refer to, I think the reader will more intuitively make the
connection if those two blurbs appear in the same context.
> +
> +Secondary metadata indices enable the reconstruction of parts of a damaged
> +primary metadata object from secondary information.
I would take out this blurb...
> +XFS filesystems shard themselves into multiple primary objects to enable better
> +performance on highly threaded systems and to contain the blast radius when
> +problems happen.
> +The naming hierarchy is broken up into objects known as directories and files;
> +and the physical space is split into pieces known as allocation groups.
And add here:
"This enables better performance on highly threaded systems and helps
to contain corruptions when they occur."
I think that reads cleaner
> +The division of the filesystem into principal objects (allocation groups and
> +inodes) means that there are ample opportunities to perform targeted checks and
> +repairs on a subset of the filesystem.
> +While this is going on, other parts continue processing IO requests.
> +Even if a piece of filesystem metadata can only be regenerated by scanning the
> +entire system, the scan can still be done in the background while other file
> +operations continue.
> +
> +In summary, online fsck takes advantage of resource sharding and redundant
> +metadata to enable targeted checking and repair operations while the system
> +is running.
> +This capability will be coupled to automatic system management so that
> +autonomous self-healing of XFS maximizes service availability.
>
Nits and paraphrases aside, I think this looks pretty good?
Allison