Re: [PATCH 16/25] xfs: scrub inodes

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 16/25] xfs: scrub inodes
Date: Thu, 5 Oct 2017 18:13:39 +1100	[thread overview]
Message-ID: <20171005071339.GN3666@dastard> (raw)
In-Reply-To: <20171005052219.GE7122@magnolia>

On Wed, Oct 04, 2017 at 10:22:19PM -0700, Darrick J. Wong wrote:
> On Thu, Oct 05, 2017 at 03:04:52PM +1100, Dave Chinner wrote:
> > On Tue, Oct 03, 2017 at 01:42:30PM -0700, Darrick J. Wong wrote:
> > > +	error = xfs_iget(mp, NULL, sc->sm->sm_ino, XFS_IGET_UNTRUSTED,
> > > +			0, &ips);
> > 
> > I think we also want XFS_IGET_DONTCACHE here, so we don't trash the
> > inode cache with inodes that we use once for scrub and never touch
> > again.
> 
> I thought about adding this, but if we let the inodes fall out of the
> cache now then we'll just have to load them back in for the bmap checks,
> right?

Well, I'm looking at ensuring that we don't blow out the memory
side of things. We've still got the inode buffer in the buffer
cache, so I don't see why we should double cache these things
and then leave both cached copied hanging around after we've
finished with them. Leave the buffer around because we do a fair few
checks with it, but don't use excessive icache memory and trash the
working set if we can avoid it...

> > > +xfs_scrub_checkpoint_log(
> > > +	struct xfs_mount	*mp)
> > > +{
> > > +	int			error;
> > > +
> > > +	error = _xfs_log_force(mp, XFS_LOG_SYNC, NULL);
> > > +	if (error)
> > > +		return error;
> > > +	xfs_ail_push_all_sync(mp->m_ail);
> > > +	return 0;
> > > +}
> > 
> > Oooo, that's a nasty thing to do on busy systems with large dirty
> > logs. I hope this is a "last resort" kind of thing....
> 
> It is; we only do this if the inobt says there's an inode there and the
> inode verifiers fail.

Ok, so why would pushing the log and the AIL make the verifier then
succeed? how likely is this to occur on a busy system?

> > > +/* Set us up with an inode. */
> > 
> > What state are we trying to get the inode into here? We grab all the
> > various locks, but we can still have data changing via mmap pages
> > that are already faulted in and delalloc extents in the incore
> > extent list that aren't reflected on disk...
> > 
> > A comment explaining what we expect here would be nice.
> 
> /* 
>  * Grab total control of the inode metadata.  It doesn't matter here if
>  * the file data is still changing, we just want exclusive access to the
>  * metadata.
>  */

*nod*

> > > +	/* Got the inode, lock it and we're ready to go. */
> > > +	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> > > +	xfs_ilock(sc->ip, sc->ilock_flags);
> > > +	error = xfs_scrub_trans_alloc(sc->sm, mp, &sc->tp);
> > > +	if (error)
> > > +		goto out_unlock;
> > > +	sc->ilock_flags |= XFS_ILOCK_EXCL;
> > > +	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
> > 
> > Should the inode be joined to the transaction so that cancelling the
> > transaction unlocks the inode? Then the need for the ilock_flags
> > variable goes away....
> 
> This is the confluence of two semi-icky things: first, some of the
> scrubbers (particularly the dir and parent pointer scrubbers) will need
> to drop the ILOCK for short periods of time; later on, repair will want
> to keep the inode locked across all the repair transactions, so it makes
> more sense to control the lock and unlock directly.

Ok, I'll pass on this for now, see how the rest of the code falls
out.

> > > +	/* di_size */
> > > +	isize = be64_to_cpu(dip->di_size);
> > > +	if (isize & (1ULL << 63))
> > > +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > 
> > Should we be checking against the on disk format size, or the
> > mounted filesystem maximum size (i.e. mp->m_super->s_maxbytes)?
> > 32 or 64 bit systems are going to have different maximum valid file
> > sizes..
> 
> It's perfectly valid to 'truncate -s $((2 ** 60) foofile' so the only

Ugh. We can't do IO past 16TB on 32 bit systems, so I'm kinda
surprised truncate doesn't have the same s_maxbytes restriction...

> thing we can really check for here is that the upper bit isn't set
> (because the VFS does not check, but barfs on, files with that large of
> a size).

xfs_max_file_offset() sets the max file offset to 2^63 - 1, so it
looks like the lack of checking in truncate is the problem here,
not the IO path.

> > Directories have a maximum bound size, too - the data space, leaf
> > space and freespace space, each of which are 32GB in size, IIRC.
> > 
> > And symlinks have a different maximum size, too.
> 
> Fair enough, I'll expand the i_size checks, though ISTR the verifiers
> now check that for us.

If they do, then just drop a comment in there to say what is checked
by the verifier.

> > > +	if (!S_ISDIR(mode) && !S_ISREG(mode) && !S_ISLNK(mode) && isize != 0)
> > > +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > 
> > > +
> > > +	/* di_nblocks */
> > > +	if (flags2 & XFS_DIFLAG2_REFLINK) {
> > > +		; /* nblocks can exceed dblocks */
> > > +	} else if (flags & XFS_DIFLAG_REALTIME) {
> > > +		if (be64_to_cpu(dip->di_nblocks) >=
> > > +		    mp->m_sb.sb_dblocks + mp->m_sb.sb_rblocks)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > 
> > That doesn't seem right. the file can be on either the data or the
> > rt device, so the maximum file blocks is the size of one device or
> > the other, not both combined.
> 
> di_nblocks is the sum of (data blocks + bmbt blocks + attr blocks),
> right?

Yeah, forgot it was more than just data extents.

> So in theory if you had a rt file with 1000 data blocks, 10 bmbt
> blocks to map the data blocks, and 100 attr blocks then di_nblocks has
> to be 1110.

Yup, but the additional metadata on the data device is not going to
be anywhere near the size of the data device.

/me shrugs

I can't think of an easy way to get a maximum block count, so I
guess that'll have to do...

> > > +		if (nextents > fork_recs)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > > +		break;
> > > +	case XFS_DINODE_FMT_BTREE:
> > > +		if (nextents <= fork_recs)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > > +		break;
> > > +	case XFS_DINODE_FMT_LOCAL:
> > > +	case XFS_DINODE_FMT_DEV:
> > > +	case XFS_DINODE_FMT_UUID:
> > > +	default:
> > > +		if (nextents != 0)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > > +		break;
> > > +	}
> > > +
> > > +	/* di_anextents */
> > > +	nextents = be16_to_cpu(dip->di_anextents);
> > > +	fork_recs =  XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
> > > +	switch (dip->di_aformat) {
> > > +	case XFS_DINODE_FMT_EXTENTS:
> > > +		if (nextents > fork_recs)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > > +		break;
> > > +	case XFS_DINODE_FMT_BTREE:
> > > +		if (nextents <= fork_recs)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > > +		break;
> > > +	case XFS_DINODE_FMT_LOCAL:
> > > +	case XFS_DINODE_FMT_DEV:
> > > +	case XFS_DINODE_FMT_UUID:
> > > +	default:
> > > +		if (nextents != 0)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > > +		break;
> > > +	}
> > 
> > Don't we need a check here first to see whether an attribute fork
> > exists or not?
> 
> Do you mean the xfs_inode_fork, or something else?

SOmething else. :P

> XFS_DFORK_ASIZE returns zero if !XFS_DFORK_Q which in turn is based on
> di_forkoff so we're really only checking that di_aformat makes sense
> given the number of extents and the size of the attr fork area.

Right, but if XFS_DFORK_ASIZE == 0, the dip->di_aformat *must* be
XFS_DINODE_FMT_EXTENTS. That's the only valid configuration when
there is no attribute fork present.

If there is an attribute fork present, then it can be XFS_DINODE_FMT_LOCAL,
EXTENT or BTREE, and then the extent count needs checking.
XFS_DINODE_FMT_DEV and XFS_DINODE_FMT_UUID are both invalid for the
attribute fork.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2017-10-05  7:14 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-03 20:40 [PATCH v11 00/25] xfs: online scrub support Darrick J. Wong
2017-10-03 20:40 ` [PATCH 01/25] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
2017-10-03 20:41 ` [PATCH 02/25] xfs: dispatch metadata scrub subcommands Darrick J. Wong
2017-10-03 20:41 ` [PATCH 03/25] xfs: probe the scrub ioctl Darrick J. Wong
2017-10-03 23:32   ` Dave Chinner
2017-10-04  0:02     ` Darrick J. Wong
2017-10-04  1:56       ` Dave Chinner
2017-10-04  3:14         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 04/25] xfs: create helpers to record and deal with scrub problems Darrick J. Wong
2017-10-03 23:44   ` Dave Chinner
2017-10-04  0:56     ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 05/25] xfs: create helpers to scrub a metadata btree Darrick J. Wong
2017-10-03 23:49   ` Dave Chinner
2017-10-04  0:13     ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 06/25] xfs: scrub the shape of " Darrick J. Wong
2017-10-04  0:15   ` Dave Chinner
2017-10-04  3:51     ` Darrick J. Wong
2017-10-04  5:48       ` Dave Chinner
2017-10-04 17:48         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 07/25] xfs: scrub btree keys and records Darrick J. Wong
2017-10-04 20:52   ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 08/25] xfs: create helpers to scan an allocation group Darrick J. Wong
2017-10-04  0:46   ` Dave Chinner
2017-10-04  3:58     ` Darrick J. Wong
2017-10-04  5:59       ` Dave Chinner
2017-10-04 17:51         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 09/25] xfs: scrub the backup superblocks Darrick J. Wong
2017-10-04  0:57   ` Dave Chinner
2017-10-04  4:06     ` Darrick J. Wong
2017-10-04  6:13       ` Dave Chinner
2017-10-04 17:56         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 10/25] xfs: scrub AGF and AGFL Darrick J. Wong
2017-10-04  1:31   ` Dave Chinner
2017-10-04  4:21     ` Darrick J. Wong
2017-10-04  6:28       ` Dave Chinner
2017-10-04 17:57         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 11/25] xfs: scrub the AGI Darrick J. Wong
2017-10-04  1:43   ` Dave Chinner
2017-10-04  4:25     ` Darrick J. Wong
2017-10-04  6:43       ` Dave Chinner
2017-10-04 18:02         ` Darrick J. Wong
2017-10-04 22:16           ` Dave Chinner
2017-10-04 23:12             ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 12/25] xfs: scrub free space btrees Darrick J. Wong
2017-10-05  0:59   ` Dave Chinner
2017-10-05  1:13     ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 13/25] xfs: scrub inode btrees Darrick J. Wong
2017-10-05  2:08   ` Dave Chinner
2017-10-05  5:47     ` Darrick J. Wong
2017-10-05  7:22       ` Dave Chinner
2017-10-05 18:26         ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 14/25] xfs: scrub rmap btrees Darrick J. Wong
2017-10-05  2:56   ` Dave Chinner
2017-10-05  5:02     ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 15/25] xfs: scrub refcount btrees Darrick J. Wong
2017-10-05  2:59   ` Dave Chinner
2017-10-05  5:02     ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 16/25] xfs: scrub inodes Darrick J. Wong
2017-10-05  4:04   ` Dave Chinner
2017-10-05  5:22     ` Darrick J. Wong
2017-10-05  7:13       ` Dave Chinner [this message]
2017-10-05 19:56         ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 17/25] xfs: scrub inode block mappings Darrick J. Wong
2017-10-06  2:51   ` Dave Chinner
2017-10-06 17:00     ` Darrick J. Wong
2017-10-07 23:10       ` Dave Chinner
2017-10-08  3:54         ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 18/25] xfs: scrub directory/attribute btrees Darrick J. Wong
2017-10-06  5:07   ` Dave Chinner
2017-10-06 18:30     ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 19/25] xfs: scrub directory metadata Darrick J. Wong
2017-10-06  7:07   ` Dave Chinner
2017-10-06 19:45     ` Darrick J. Wong
2017-10-06 22:16       ` Dave Chinner
2017-10-03 20:42 ` [PATCH 20/25] xfs: scrub directory freespace Darrick J. Wong
2017-10-09  1:44   ` Dave Chinner
2017-10-09 22:54     ` Darrick J. Wong
2017-10-03 20:43 ` [PATCH 21/25] xfs: scrub extended attributes Darrick J. Wong
2017-10-09  2:13   ` Dave Chinner
2017-10-09 21:14     ` Darrick J. Wong
2017-10-03 20:43 ` [PATCH 22/25] xfs: scrub symbolic links Darrick J. Wong
2017-10-09  2:17   ` Dave Chinner
2017-10-03 20:43 ` [PATCH 23/25] xfs: scrub parent pointers Darrick J. Wong
2017-10-03 20:43 ` [PATCH 24/25] xfs: scrub realtime bitmap/summary Darrick J. Wong
2017-10-09  2:28   ` Dave Chinner
2017-10-09 20:24     ` Darrick J. Wong
2017-10-03 20:43 ` [PATCH 25/25] xfs: scrub quota information Darrick J. Wong
2017-10-09  2:51   ` Dave Chinner
2017-10-09 20:03     ` Darrick J. Wong
2017-10-09 22:17       ` Dave Chinner
2017-10-09 23:08         ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171005071339.GN3666@dastard \
    --to=david@fromorbit.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox