From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 11/21] xfs: repair the rmapbt
Date: Thu, 5 Jul 2018 17:03:24 +1000 [thread overview]
Message-ID: <20180705070324.GQ2234@dastard> (raw)
In-Reply-To: <20180705034858.GH32415@magnolia>
On Wed, Jul 04, 2018 at 08:48:58PM -0700, Darrick J. Wong wrote:
> On Thu, Jul 05, 2018 at 09:21:15AM +1000, Dave Chinner wrote:
> > On Tue, Jul 03, 2018 at 04:59:01PM -0700, Darrick J. Wong wrote:
> > > On Tue, Jul 03, 2018 at 03:32:00PM +1000, Dave Chinner wrote:
> > Keep in mind that online repair will never guarantee that it can fix
> > all problems, so we're always going to need offline repair. What
> > we want to achieve is minimising downtime for users when a repair is
> > required. With the above IO limitations in mind, I've always
> > considered that online repair would just be for all the simple,
> > quick, easy to fix stuff, because complex stuff that required huge
> > amounts of RAM and full filesystem scans to resolve would always be
> > done faster offline.
> >
> > That's why I think that offline repair will be a better choice for
> > users for the foreseeable future if repairing the damage requires
> > full filesystem metadata scans.
>
> That might be, but how else can we determine that than to merge it under
> an experimental banner, iterate it for a while, and try to persuade a
> wider band of users than ourselves to try it on a non-production system
> to see if it better solves their problems? :)
I think "deploy an idea to users and see if they have problems" is
a horrible development model. That's where btrfs went off the rails
almost 10 years ago....
That aside, I'd like to make sure we're on the same page - we don't
need online repair to fix /everything/ before it gets merged. Even
without rmap rebuild, it will still be usable and useful to the vast
majority of users wanting the filesystem to self repair the typical
one-off corruption problems filesystems encounter during their
typical production life.
What I'm worried about is that the online repair algorithms have a
fundamental dependence on rmap lookups to avoid the need to build the
global cross-references needed to isolate and repair corruptions.
This use of rmap, from one angle, can be seen as a performance
optimisation, but as I've come to understand more of the algorithms the
online repair code uses, it looks more like an intractable problem
than one of performance optimisation.
That is, if the rmap is corrupt, or we even suspect that it's
corrupt, then we cannot use it to validate the state and contents of
the other on-disk structures. We could propagate rmap corruptions to
those other structures and it would go undetected. And because we
can't determine the validity of all the other structures, the only
way we can correctly repair the filesystem is to build a global
cross-reference, resolve all the inconsistencies, and then rewrite
all the structures. i.e. we need to do what xfs_repair does in
phases 3, 4 and 5.
IOWs, ISTM that the online scrub/repair algorithms break down in the
face of rmap corruption and it is correctable only by building a
global cross-reference which we can then reconcile and use to
rebuild all the per-AG metadata btrees in the filesystem. Trying to
do it online via scanning of unverifiable structures risks making
the problem worse and there is a good possibility that we can't
repair the inconsistencies in a single pass.
So at this point, I'd prefer that we stop, discuss and work out a
solution to the rmap rebuild problem rather than merge a rmap
rebuild algorithm prematurely. This doesn't need to hold up any of
the other online repair functionality - none of that is dependent on
this particular piece of the puzzle being implemented.
> At worst, if future us decide that we will never figure out how to make
> online rmapbt repair robust I'm fully prepared to withdraw it.
Merging new features before sorting out the fundamental principles,
behaviours and algorithms of those features is how btrfs ended up
with a pile of stuff that doesn't quite work which no-one
understands quite well enough to fix.
> > > > This seems like a likely thing to happen given the "no reclaim"
> > > > state of the filesystem and the memory demand a rmapbt rebuild
> > > > can have. If we've got GBs of rmap info in the AG that needs to be
> > > > rebuilt, how much RAM are we going to need to index it all as we
> > > > scan the filesystem?
> > >
> > > More than I'd like -- at least 24 bytes per record (which at least is no
> > > larger than the size of the on-disk btree) plus a list_head until I can
> > > move the repairers away from creating huge lists.
> >
> > Ok, it kinda sounds a bit like we need to be able to create the new
> > btree on the fly, rather than as a single operation at the end. e.g.
> > if the list builds up to, say, 100k records, we push them into the
> > new tree and can free them. e.g. can we iteratively build the new
> > tree on disk as we go, then do a root block swap at the end to
> > switch from the old tree to the new tree?
>
> Seeing as we have a bunch of storage at our disposal, I think we could
> push the records out to disk (or just write them straight into a new
> btree) when we detect low memory conditions or consume more than some
> threshold of memory.
*nod* That's kinda along the lines of what I was thinking of.
> For v0 I was focusing on getting it to work at all
> even with a largeish memory cost. :)
Right, I'm not suggesting that this needs to be done before the
initial merge - I just want to make sure we don't back ourselves
into a corner we can't get out of.
> > If that's a potential direction, then maybe we can look at this as a
> > future direction? It also leads to the possibility of
> > pausing/continuing repair from where the last chunk of records were
> > processed, so if we do run out of memory we don't have to start from
> > the beginning again?
>
> That could be difficult to do in practice -- we'll continue to
> accumulate deferred frees for that AG in the meantime.
I think it depends on the way we record deferred frees, but it's a
bit premature to be discussing this right now :P
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com