linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Allison Henderson <allison.henderson@oracle.com>
To: "djwong@kernel.org" <djwong@kernel.org>
Cc: Catherine Hoang <catherine.hoang@oracle.com>,
	"david@fromorbit.com" <david@fromorbit.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	Chandan Babu <chandan.babu@oracle.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"hch@infradead.org" <hch@infradead.org>
Subject: Re: [PATCH 04/14] xfs: document the user interface for online fsck
Date: Wed, 18 Jan 2023 00:03:29 +0000	[thread overview]
Message-ID: <4098826a3f69a53fd23df08eb8ffbb733d7f75ce.camel@oracle.com> (raw)
In-Reply-To: <167243825217.682859.14201039734624895373.stgit@magnolia>

On Fri, 2022-12-30 at 14:10 -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Start the fourth chapter of the online fsck design documentation,
> which
> discusses the user interface and the background scrubbing service.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  .../filesystems/xfs-online-fsck-design.rst         |  114
> ++++++++++++++++++++
>  1 file changed, 114 insertions(+)
> 
> 
> diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst
> b/Documentation/filesystems/xfs-online-fsck-design.rst
> index d630b6bdbe4a..42e82971e036 100644
> --- a/Documentation/filesystems/xfs-online-fsck-design.rst
> +++ b/Documentation/filesystems/xfs-online-fsck-design.rst
> @@ -750,3 +750,117 @@ Proposed patchsets include `general stress
> testing
>  <
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.g
> it/log/?h=race-scrub-and-mount-state-changes>`_
>  and the `evolution of existing per-function stress testing
>  <
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.g
> it/log/?h=refactor-scrub-stress>`_.
> +
> +4. User Interface
> +=================
> +
> +The primary user of online fsck is the system administrator, just
> like offline
> +repair.
> +Online fsck presents two modes of operation to administrators:
> +A foreground CLI process for online fsck on demand, and a background
> service
> +that performs autonomous checking and repair.
> +
> +Checking on Demand
> +------------------
> +
> +For administrators who want the absolute freshest information about
> the
> +metadata in a filesystem, ``xfs_scrub`` can be run as a foreground
> process on
> +a command line.
> +The program checks every piece of metadata in the filesystem while
> the
> +administrator waits for the results to be reported, just like the
> existing
> +``xfs_repair`` tool.
> +Both tools share a ``-n`` option to perform a read-only scan, and a
> ``-v``
> +option to increase the verbosity of the information reported.
> +
> +A new feature of ``xfs_scrub`` is the ``-x`` option, which employs
> the error
> +correction capabilities of the hardware to check data file contents.
> +The media scan is not enabled by default because it may dramatically
> increase
> +program runtime and consume a lot of bandwidth on older storage
> hardware.
> +
> +The output of a foreground invocation is captured in the system log.
> +
> +The ``xfs_scrub_all`` program walks the list of mounted filesystems
> and
> +initiates ``xfs_scrub`` for each of them in parallel.
> +It serializes scans for any filesystems that resolve to the same top
> level
> +kernel block device to prevent resource overconsumption.
> +
> +Background Service
> +------------------
> +
I'm assuming the below systemd services are configurable right?
> +To reduce the workload of system administrators, the ``xfs_scrub``
> package
> +provides a suite of `systemd <https://systemd.io/>`_ timers and
> services that
> +run online fsck automatically on weekends.
by default.

> +The background service configures scrub to run with as little
> privilege as
> +possible, the lowest CPU and IO priority, and in a CPU-constrained
> single
> +threaded mode.
"This can be tuned at anytime to best suit the needs of the customer
workload."

Then I think you can drop the below line...
> +It is hoped that this minimizes the amount of load generated on the
> system and
> +avoids starving regular workloads.
> +
> +The output of the background service is also captured in the system
> log.
> +If desired, reports of failures (either due to inconsistencies or
> mere runtime
> +errors) can be emailed automatically by setting the ``EMAIL_ADDR``
> environment
> +variable in the following service files:
> +
> +* ``xfs_scrub_fail@.service``
> +* ``xfs_scrub_media_fail@.service``
> +* ``xfs_scrub_all_fail.service``
> +
> +The decision to enable the background scan is left to the system
> administrator.
> +This can be done by enabling either of the following services:
> +
> +* ``xfs_scrub_all.timer`` on systemd systems
> +* ``xfs_scrub_all.cron`` on non-systemd systems
> +
> +This automatic weekly scan is configured out of the box to perform
> an
> +additional media scan of all file data once per month.
> +This is less foolproof than, say, storing file data block checksums,
> but much
> +more performant if application software provides its own integrity
> checking,
> +redundancy can be provided elsewhere above the filesystem, or the
> storage
> +device's integrity guarantees are deemed sufficient.
> +
> +The systemd unit file definitions have been subjected to a security
> audit
> +(as of systemd 249) to ensure that the xfs_scrub processes have as
> little
> +access to the rest of the system as possible.
> +This was performed via ``systemd-analyze security``, after which
> privileges
> +were restricted to the minimum required, sandboxing was set up to
> the maximal
> +extent possible with sandboxing and system call filtering; and
> access to the
> +filesystem tree was restricted to the minimum needed to start the
> program and
> +access the filesystem being scanned.
> +The service definition files restrict CPU usage to 80% of one CPU
> core, and
> +apply as nice of a priority to IO and CPU scheduling as possible.
> +This measure was taken to minimize delays in the rest of the
> filesystem.
> +No such hardening has been performed for the cron job.
> +
> +Proposed patchset:
> +`Enabling the xfs_scrub background service
> +<
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.g
> it/log/?h=scrub-media-scan-service>`_.
> +
> +Health Reporting
> +----------------
> +
> +XFS caches a summary of each filesystem's health status in memory.
> +The information is updated whenever ``xfs_scrub`` is run, or
> whenever
> +inconsistencies are detected in the filesystem metadata during
> regular
> +operations.
> +System administrators should use the ``health`` command of
> ``xfs_spaceman`` to
> +download this information into a human-readable format.
> +If problems have been observed, the administrator can schedule a
> reduced
> +service window to run the online repair tool to correct the problem.
> +Failing that, the administrator can decide to schedule a maintenance
> window to
> +run the traditional offline repair tool to correct the problem.
> +
> +**Question**: Should the health reporting integrate with the new
> inotify fs
> +error notification system?
> +
> +**Question**: Would it be helpful for sysadmins to have a daemon to
> listen for
> +corruption notifications and initiate a repair?
> +
> +*Answer*: These questions remain unanswered, but should be a part of
> the
> +conversation with early adopters and potential downstream users of
> XFS.
I think if there's been no commentary at this point then likely they
can't be answered at this time.  Perhaps for now it is reasonable to
just let the be a potential improvement in the future if the demand for
it arises. In any case, I think we should probably clean out the Q&A
discussion prompts.

Rest looks good tho
Allison

> +
> +Proposed patchsets include
> +`wiring up health reports to correction returns
> +<
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/
> log/?h=corruption-health-reports>`_
> +and
> +`preservation of sickness info during memory reclaim
> +<
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/
> log/?h=indirect-health-reporting>`_.
> 


  reply	other threads:[~2023-01-18  0:35 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <Y69UceeA2MEpjMJ8@magnolia>
2022-12-30 22:10 ` [PATCHSET v24.0 00/14] xfs: design documentation for online fsck Darrick J. Wong
2022-12-30 22:10   ` [PATCH 02/14] xfs: document the general theory underlying online fsck design Darrick J. Wong
2023-01-11  1:25     ` Allison Henderson
2023-01-11 23:39       ` Darrick J. Wong
2023-01-12  0:29         ` Dave Chinner
2023-01-18  0:03         ` Allison Henderson
2023-01-18  2:35           ` Darrick J. Wong
2022-12-30 22:10   ` [PATCH 01/14] xfs: document the motivation for " Darrick J. Wong
2023-01-07  5:01     ` Allison Henderson
2023-01-11 19:10       ` Darrick J. Wong
2023-01-18  0:03         ` Allison Henderson
2023-01-18  1:29           ` Darrick J. Wong
2023-01-12  0:10       ` Darrick J. Wong
2022-12-30 22:10   ` [PATCH 09/14] xfs: document online file metadata repair code Darrick J. Wong
2022-12-30 22:10   ` [PATCH 05/14] xfs: document the filesystem metadata checking strategy Darrick J. Wong
2023-01-21  1:38     ` Allison Henderson
2023-02-02 19:04       ` Darrick J. Wong
2023-02-09  5:41         ` Allison Henderson
2022-12-30 22:10   ` [PATCH 06/14] xfs: document how online fsck deals with eventual consistency Darrick J. Wong
2023-01-05  9:08     ` Amir Goldstein
2023-01-05 19:40       ` Darrick J. Wong
2023-01-06  3:33         ` Amir Goldstein
2023-01-11 17:54           ` Darrick J. Wong
2023-01-31  6:11     ` Allison Henderson
2023-02-02 19:55       ` Darrick J. Wong
2023-02-09  5:41         ` Allison Henderson
2022-12-30 22:10   ` [PATCH 04/14] xfs: document the user interface for online fsck Darrick J. Wong
2023-01-18  0:03     ` Allison Henderson [this message]
2023-01-18  2:42       ` Darrick J. Wong
2022-12-30 22:10   ` [PATCH 08/14] xfs: document btree bulk loading Darrick J. Wong
2023-02-09  5:47     ` Allison Henderson
2023-02-10  0:24       ` Darrick J. Wong
2023-02-16 15:46         ` Allison Henderson
2023-02-16 21:08           ` Darrick J. Wong
2022-12-30 22:10   ` [PATCH 07/14] xfs: document pageable kernel memory Darrick J. Wong
2023-02-02  7:14     ` Allison Henderson
2023-02-02 23:14       ` Darrick J. Wong
2023-02-09  5:41         ` Allison Henderson
2023-02-09 23:14           ` Darrick J. Wong
2023-02-25  7:32             ` Allison Henderson
2022-12-30 22:10   ` [PATCH 03/14] xfs: document the testing plan for online fsck Darrick J. Wong
2023-01-18  0:03     ` Allison Henderson
2023-01-18  2:38       ` Darrick J. Wong
2022-12-30 22:10   ` [PATCH 10/14] xfs: document full filesystem scans " Darrick J. Wong
2023-02-16 15:47     ` Allison Henderson
2023-02-16 22:48       ` Darrick J. Wong
2023-02-25  7:33         ` Allison Henderson
2023-03-01 22:09           ` Darrick J. Wong
2022-12-30 22:10   ` [PATCH 11/14] xfs: document metadata file repair Darrick J. Wong
2023-02-25  7:33     ` Allison Henderson
2023-03-01  2:42       ` Darrick J. Wong
2022-12-30 22:10   ` [PATCH 14/14] xfs: document future directions of online fsck Darrick J. Wong
2023-03-01  5:37     ` Allison Henderson
2023-03-02  0:39       ` Darrick J. Wong
2023-03-03 23:51         ` Allison Henderson
2023-03-04  2:28           ` Darrick J. Wong
2022-12-30 22:10   ` [PATCH 12/14] xfs: document directory tree repairs Darrick J. Wong
2023-01-14  2:32     ` [PATCH v24.2 " Darrick J. Wong
2023-02-03  2:12     ` [PATCH v24.3 " Darrick J. Wong
2023-02-25  7:33       ` Allison Henderson
2023-03-02  0:14         ` Darrick J. Wong
2023-03-03 23:50           ` Allison Henderson
2023-03-04  2:19             ` Darrick J. Wong
2022-12-30 22:10   ` [PATCH 13/14] xfs: document the userspace fsck driver program Darrick J. Wong
2023-03-01  5:36     ` Allison Henderson
2023-03-02  0:27       ` Darrick J. Wong
2023-03-03 23:51         ` Allison Henderson
2023-03-04  2:25           ` Darrick J. Wong
2023-03-07  1:30   ` [PATCHSET v24.3 00/14] xfs: design documentation for online fsck Darrick J. Wong
2023-03-07  1:30   ` Darrick J. Wong
2023-03-07  1:30     ` [PATCH 01/14] xfs: document the motivation for online fsck design Darrick J. Wong
2023-03-07  1:31     ` [PATCH 02/14] xfs: document the general theory underlying " Darrick J. Wong
2023-03-07  1:31     ` [PATCH 03/14] xfs: document the testing plan for online fsck Darrick J. Wong
2023-03-07  1:31     ` [PATCH 04/14] xfs: document the user interface " Darrick J. Wong
2023-03-07  1:31     ` [PATCH 05/14] xfs: document the filesystem metadata checking strategy Darrick J. Wong
2023-03-07  1:31     ` [PATCH 06/14] xfs: document how online fsck deals with eventual consistency Darrick J. Wong
2023-03-07  1:31     ` [PATCH 07/14] xfs: document pageable kernel memory Darrick J. Wong
2023-03-07  1:31     ` [PATCH 08/14] xfs: document btree bulk loading Darrick J. Wong
2023-03-07  1:31     ` [PATCH 09/14] xfs: document online file metadata repair code Darrick J. Wong
2023-03-07  1:31     ` [PATCH 10/14] xfs: document full filesystem scans for online fsck Darrick J. Wong
2023-03-07  1:31     ` [PATCH 11/14] xfs: document metadata file repair Darrick J. Wong
2023-03-07  1:31     ` [PATCH 12/14] xfs: document directory tree repairs Darrick J. Wong
2023-03-07  1:32     ` [PATCH 13/14] xfs: document the userspace fsck driver program Darrick J. Wong
2023-03-07  1:32     ` [PATCH 14/14] xfs: document future directions of online fsck Darrick J. Wong
2022-10-02 18:19 [PATCHSET v23.3 00/14] xfs: design documentation for " Darrick J. Wong
2022-10-02 18:19 ` [PATCH 04/14] xfs: document the user interface " Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2022-08-07 18:30 [PATCHSET v2 00/14] xfs: design documentation " Darrick J. Wong
2022-08-07 18:30 ` [PATCH 04/14] xfs: document the user interface " Darrick J. Wong
2022-08-11  0:20   ` Dave Chinner
2022-08-16  2:30     ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4098826a3f69a53fd23df08eb8ffbb733d7f75ce.camel@oracle.com \
    --to=allison.henderson@oracle.com \
    --cc=catherine.hoang@oracle.com \
    --cc=chandan.babu@oracle.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).