Re: [PATCH V8 15/19] xfs: Directory's data fork extent counter can never overflow

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Chandan Babu R <chandan.babu@oracle.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH V8 15/19] xfs: Directory's data fork extent counter can never overflow
Date: Tue, 29 Mar 2022 20:43:33 -0700	[thread overview]
Message-ID: <20220330034333.GG27690@magnolia> (raw)
In-Reply-To: <20220329062340.GY1544202@dread.disaster.area>

On Tue, Mar 29, 2022 at 05:23:40PM +1100, Dave Chinner wrote:
> On Tue, Mar 29, 2022 at 10:52:04AM +0530, Chandan Babu R wrote:
> > On 25 Mar 2022 at 03:44, Dave Chinner wrote:
> > > On Mon, Mar 21, 2022 at 10:47:46AM +0530, Chandan Babu R wrote:
> > >> The maximum file size that can be represented by the data fork extent counter
> > >> in the worst case occurs when all extents are 1 block in length and each block
> > >> is 1KB in size.
> > >> 
> > >> With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
> > >> 1KB sized blocks, a file can reach upto,
> > >> (2^31) * 1KB = 2TB
> > >> 
> > >> This is much larger than the theoretical maximum size of a directory
> > >> i.e. 32GB * 3 = 96GB.
> > >> 
> > >> Since a directory's inode can never overflow its data fork extent counter,
> > >> this commit replaces checking the return value of
> > >> xfs_iext_count_may_overflow() with calls to ASSERT(error == 0).
> > >
> > > I'd really prefer that we don't add noise like this to a bunch of
> > > call sites.  If directories can't overflow the extent count in
> > > normal operation, then why are we even calling
> > > xfs_iext_count_may_overflow() in these paths? i.e. an overflow would
> > > be a sign of an inode corruption, and we should have flagged that
> > > long before we do an operation that might overflow the extent count.
> > >
> > > So, really, I think you should document the directory size
> > > constraints at the site where we define all the large extent count
> > > values in xfs_format.h, remove the xfs_iext_count_may_overflow()
> > > checks from the directory code and replace them with a simple inode
> > > verifier check that we haven't got more than 100GB worth of
> > > individual extents in the data fork for directory inodes....
> > 
> > I don't think that we could trivially verify if the extents in a directory's
> > data fork add up to more than 96GB.
> 
> dip->di_nextents tells us how many extents there are in the data
> fork, we know what the block size of the filesystem is, so it should
> be pretty easy to calculate a maximum extent count for 96GB of
> space. i.e. absolute maximum valid dir data fork extent count
> is (96GB / blocksize).
> 
> > 
> > xfs_dinode->di_size tracks the size of XFS_DIR2_DATA_SPACE. This also includes
> > holes that could be created by freeing directory entries in a single directory
> > block. Also, there is no easy method to determine the space occupied by
> > XFS_DIR2_LEAF_SPACE and XFS_DIR2_FREE_SPACE segments of a directory.
> 
> Sure there is. We do this sort of calc for things like transaction
> reservations via definitions like XFS_DA_NODE_MAXDEPTH. That tells us

Hmmm.  Seeing as I just replaced XFS_BTREE_MAXLEVELS with dynamic limits
set for each filesytem, is XFS_DA_NODE_MAXDEPTH even appropriate for
modern filesystems?  We're about to start allowing far more extended
attributes in the form of parent pointers, so we should be careful about
this.

For a directory, there can be at most 32GB of directory entries, so the
maximum number of directory entries is...

32GB / (directory block size) * (max entries per dir block)

The dabtree stores (u32 hash, u32 offset) records, so I guess it
wouldn't be so hard to compute the number of blocks needed for each node
level until we only need one block, and that's our real
XFS_DA_NODE_MAXEPTH.

But then the second question is: what's the maximum height of a dabtree
that indexes an xattr structure?  I don't think there's any maximum
limit within XFS on the number of attrs you can set on a file, is there?
At least until you hit the iext_max_count check.  I think the VFS
institutes its own limit of 64k for the llistxattr buffer, but that's
about all I can think of.

I suppose right now the xattr structure can't grow larger than 2^(16+21)
blocks in size, which is 2^49 bytes, but that's a mix of attr leaves and
dabtree blocks, unlike directories, right?

> immediately how many blocks can be in the XFS_DIR2_LEAF_SPACE
> segement....
> 
> We also know the maximum number of individual directory blocks in
> the 32GB segment (fixed at 32GB / dir block size), so the free space
> array is also a fixed size at (32GB / dir block size / free space
> entries per block).
> 
> It's easy to just use (96GB / block size) and that will catch most
> corruptions with no risk of a false positive detection, but we could
> quite easily refine this to something like:
> 
> data	(32GB +				
> leaf	 btree blocks(XFS_DA_NODE_MAXDEPTH) +
> freesp	 (32GB / free space records per block))
> frags					/ filesystem block size

I think we ought to do a more careful study of XFS_DA_NODE_MAXDEPTH, but
it could become more involved than we think.  In the interest of keeping
this series moving, can we start with a new verifier check that
(di_nextents < that formula from above) and then refine that based on
whatever improvements we may or may not come up with for
XFS_DA_NODE_MAXDEPTH?

> 
> > May be the following can be added to xfs_dinode_verify(),
> > 
> > 	if (S_ISDIR(mode) && ((xfs_dinode->di_size + 2 * 32GB) > 96GB))
> >     		return __this_address
> 
> That doesn't validate that the on disk or in-memory di_nextents
> value is withing the known valid range or not. We can do that
> directly (as per above), so we shouldn't need a hueristic like this.

Indeed, inode size is not a good proxy variable for extent count.

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

next prev parent reply	other threads:[~2022-03-30  3:43 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-21  5:17 [PATCH V8 00/19] xfs: Extend per-inode extent counters Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 01/19] xfs: Move extent count limits to xfs_format.h Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 02/19] xfs: Define max extent length based on on-disk format definition Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 03/19] xfs: Introduce xfs_iext_max_nextents() helper Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 04/19] xfs: Use xfs_extnum_t instead of basic data types Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 05/19] xfs: Introduce xfs_dfork_nextents() helper Chandan Babu R
2022-03-24 21:31   ` Dave Chinner
2022-03-21  5:17 ` [PATCH V8 06/19] xfs: Use basic types to define xfs_log_dinode's di_nextents and di_anextents Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 07/19] xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively Chandan Babu R
2022-03-24 21:33   ` Dave Chinner
2022-03-21  5:17 ` [PATCH V8 08/19] xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit Chandan Babu R
2022-03-24 21:37   ` Dave Chinner
2022-03-24 21:40     ` Darrick J. Wong
2022-03-25  6:01       ` Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 09/19] xfs: Introduce XFS_FSOP_GEOM_FLAGS_NREXT64 Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 10/19] xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers Chandan Babu R
2022-03-24 21:38   ` Dave Chinner
2022-03-21  5:17 ` [PATCH V8 11/19] xfs: Use uint64_t to count maximum blocks that can be used by BMBT Chandan Babu R
2022-03-24 21:42   ` Dave Chinner
2022-03-25  6:01     ` Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 12/19] xfs: Introduce macros to represent new maximum extent counts for data/attr forks Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 13/19] xfs: Replace numbered inode recovery error messages with descriptive ones Chandan Babu R
2022-03-24 21:47   ` Dave Chinner
2022-03-21  5:17 ` [PATCH V8 14/19] xfs: Introduce per-inode 64-bit extent counters Chandan Babu R
2022-03-24 21:53   ` Dave Chinner
2022-03-25  6:02     ` Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 15/19] xfs: Directory's data fork extent counter can never overflow Chandan Babu R
2022-03-24 22:14   ` Dave Chinner
2022-03-25  6:02     ` Chandan Babu R
2022-03-25  6:37       ` Chandan Babu R
2022-03-29  5:22     ` Chandan Babu R
2022-03-29  6:23       ` Dave Chinner
2022-03-30  3:43         ` Darrick J. Wong [this message]
2022-03-30 15:39           ` Chandan Babu R
2022-03-30 15:52             ` Darrick J. Wong
2022-04-01  1:27           ` Dave Chinner
2022-04-01  7:46             ` Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters Chandan Babu R
2022-03-24 22:28   ` Dave Chinner
2022-03-25  6:02     ` Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 17/19] xfs: Decouple XFS_IBULK flags from XFS_IWALK flags Chandan Babu R
2022-03-24 22:28   ` Dave Chinner
2022-03-21  5:17 ` [PATCH V8 18/19] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters Chandan Babu R
2022-03-24 22:33   ` Dave Chinner
2022-03-21  5:17 ` [PATCH V8 19/19] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags Chandan Babu R

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220330034333.GG27690@magnolia \
    --to=djwong@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox