linux-fsdevel.vger.kernel.org archive mirror
From: "Darrick J. Wong" <djwong@kernel.org>
To: Allison Henderson <allison.henderson@oracle.com>
Cc: Catherine Hoang <catherine.hoang@oracle.com>,
	"david@fromorbit.com" <david@fromorbit.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	Chandan Babu <chandan.babu@oracle.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"hch@infradead.org" <hch@infradead.org>
Subject: Re: [PATCH 04/14] xfs: document the user interface for online fsck
Date: Tue, 17 Jan 2023 18:42:09 -0800
Message-ID: <Y8dcge12A7FP9nrW@magnolia>
In-Reply-To: <4098826a3f69a53fd23df08eb8ffbb733d7f75ce.camel@oracle.com>

On Wed, Jan 18, 2023 at 12:03:29AM +0000, Allison Henderson wrote:
> On Fri, 2022-12-30 at 14:10 -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Start the fourth chapter of the online fsck design documentation, which
> > discusses the user interface and the background scrubbing service.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  .../filesystems/xfs-online-fsck-design.rst         |  114 ++++++++++++++++++++
> >  1 file changed, 114 insertions(+)
> > 
> > 
> > diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs-online-fsck-design.rst
> > index d630b6bdbe4a..42e82971e036 100644
> > --- a/Documentation/filesystems/xfs-online-fsck-design.rst
> > +++ b/Documentation/filesystems/xfs-online-fsck-design.rst
> > @@ -750,3 +750,117 @@ Proposed patchsets include `general stress testing
> >  <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes>`_
> >  and the `evolution of existing per-function stress testing
> >  <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_.
> > +
> > +4. User Interface
> > +=================
> > +
> > +The primary user of online fsck is the system administrator, just like offline
> > +repair.
> > +Online fsck presents two modes of operation to administrators:
> > +A foreground CLI process for online fsck on demand, and a background service
> > +that performs autonomous checking and repair.
> > +
> > +Checking on Demand
> > +------------------
> > +
> > +For administrators who want the absolute freshest information about the
> > +metadata in a filesystem, ``xfs_scrub`` can be run as a foreground process on
> > +a command line.
> > +The program checks every piece of metadata in the filesystem while the
> > +administrator waits for the results to be reported, just like the existing
> > +``xfs_repair`` tool.
> > +Both tools share a ``-n`` option to perform a read-only scan, and a ``-v``
> > +option to increase the verbosity of the information reported.
> > +
> > +A new feature of ``xfs_scrub`` is the ``-x`` option, which employs the error
> > +correction capabilities of the hardware to check data file contents.
> > +The media scan is not enabled by default because it may dramatically increase
> > +program runtime and consume a lot of bandwidth on older storage hardware.
> > +
> > +The output of a foreground invocation is captured in the system log.
> > +
> > +The ``xfs_scrub_all`` program walks the list of mounted filesystems and
> > +initiates ``xfs_scrub`` for each of them in parallel.
> > +It serializes scans for any filesystems that resolve to the same top level
> > +kernel block device to prevent resource overconsumption.
> > +
> > +Background Service
> > +------------------
> > +
> I'm assuming the below systemd services are configurable right?

Yes, through the standard systemd overrides.
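
For illustration only (not part of the patch), here is a minimal sketch of
what such an override could look like. The unit name xfs_scrub@.service, the
helper name write_scrub_override, and the specific values are assumptions
chosen for the example; CPUQuota=, Nice=, and IOSchedulingClass= are standard
systemd directives.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that overrides resource limits for a unit.
# The unit name and the values below are illustrative assumptions, not
# settings taken from the patch.

write_scrub_override() {
    # $1: directory that holds unit drop-ins (normally /etc/systemd/system;
    #     parameterized here so the sketch can be tried in a scratch dir)
    # $2: name of the unit to override
    dropin_dir="$1/$2.d"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
# Cap the scrubber at half of one CPU core.
CPUQuota=50%
# Keep CPU and IO scheduling priority as low as possible.
Nice=19
IOSchedulingClass=idle
EOF
}

# Stage the override in a scratch directory and print the result.
scratch=$(mktemp -d)
write_scrub_override "$scratch" "xfs_scrub@.service"
cat "$scratch/xfs_scrub@.service.d/override.conf"
```

On a live system the same drop-in would normally live under
/etc/systemd/system/<unit>.d/, for example by way of "systemctl edit <unit>",
or by writing the file by hand and then running "systemctl daemon-reload".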

> > +To reduce the workload of system administrators, the ``xfs_scrub`` package
> > +provides a suite of `systemd <https://systemd.io/>`_ timers and services that
> > +run online fsck automatically on weekends.
> by default.

Fixed.

> > +The background service configures scrub to run with as little privilege as
> > +possible, the lowest CPU and IO priority, and in a CPU-constrained single
> > +threaded mode.
> "This can be tuned at any time to best suit the needs of the customer
> workload."

Fixed.

> Then I think you can drop the below line...
> > +It is hoped that this minimizes the amount of load generated on the system and
> > +avoids starving regular workloads.

Done.

> > +The output of the background service is also captured in the system log.
> > +If desired, reports of failures (either due to inconsistencies or mere runtime
> > +errors) can be emailed automatically by setting the ``EMAIL_ADDR`` environment
> > +variable in the following service files:
> > +
> > +* ``xfs_scrub_fail@.service``
> > +* ``xfs_scrub_media_fail@.service``
> > +* ``xfs_scrub_all_fail.service``
> > +
> > +The decision to enable the background scan is left to the system administrator.
> > +This can be done by enabling either of the following services:
> > +
> > +* ``xfs_scrub_all.timer`` on systemd systems
> > +* ``xfs_scrub_all.cron`` on non-systemd systems
> > +
> > +This automatic weekly scan is configured out of the box to perform an
> > +additional media scan of all file data once per month.
> > +This is less foolproof than, say, storing file data block checksums, but much
> > +more performant if application software provides its own integrity checking,
> > +redundancy can be provided elsewhere above the filesystem, or the storage
> > +device's integrity guarantees are deemed sufficient.
> > +
> > +The systemd unit file definitions have been subjected to a security audit
> > +(as of systemd 249) to ensure that the xfs_scrub processes have as little
> > +access to the rest of the system as possible.
> > +This was performed via ``systemd-analyze security``, after which privileges
> > +were restricted to the minimum required, sandboxing was set up to the maximal
> > +extent possible with sandboxing and system call filtering; and access to the
> > +filesystem tree was restricted to the minimum needed to start the program and
> > +access the filesystem being scanned.
> > +The service definition files restrict CPU usage to 80% of one CPU core, and
> > +apply as nice of a priority to IO and CPU scheduling as possible.
> > +This measure was taken to minimize delays in the rest of the filesystem.
> > +No such hardening has been performed for the cron job.
> > +
> > +Proposed patchset:
> > +`Enabling the xfs_scrub background service
> > +<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_.
> > +
> > +Health Reporting
> > +----------------
> > +
> > +XFS caches a summary of each filesystem's health status in memory.
> > +The information is updated whenever ``xfs_scrub`` is run, or whenever
> > +inconsistencies are detected in the filesystem metadata during regular
> > +operations.
> > +System administrators should use the ``health`` command of ``xfs_spaceman`` to
> > +download this information into a human-readable format.
> > +If problems have been observed, the administrator can schedule a reduced
> > +service window to run the online repair tool to correct the problem.
> > +Failing that, the administrator can decide to schedule a maintenance window to
> > +run the traditional offline repair tool to correct the problem.
> > +
> > +**Question**: Should the health reporting integrate with the new inotify fs
> > +error notification system?
> > +
> > +**Question**: Would it be helpful for sysadmins to have a daemon to listen for
> > +corruption notifications and initiate a repair?
> > +
> > +*Answer*: These questions remain unanswered, but should be a part of the
> > +conversation with early adopters and potential downstream users of XFS.
> I think if there's been no commentary at this point then likely they
> can't be answered at this time.  Perhaps for now it is reasonable to
> just let them be a potential improvement in the future if the demand for
> it arises. In any case, I think we should probably clean out the Q&A
> discussion prompts.

I'll change them to "future work Q's" so I don't forget to pursue them
after part 1 is merged.

> Rest looks good tho

:-D  Thanks!

--D

> Allison
> 
> > +
> > +Proposed patchsets include
> > +`wiring up health reports to correction returns
> > +<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=corruption-health-reports>`_
> > +and
> > +`preservation of sickness info during memory reclaim
> > +<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=indirect-health-reporting>`_.
> > 
> 
