From: Dave Chinner <david@fromorbit.com>
To: Chandan Rajendra <chandan@linux.ibm.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
Chandan Rajendra <chandanrlinux@gmail.com>,
linux-xfs@vger.kernel.org, bfoster@redhat.com
Subject: Re: [PATCH 2/2] xfs: Extend xattr extent counter to 32-bits
Date: Mon, 27 Apr 2020 08:08:05 +1000 [thread overview]
Message-ID: <20200426220805.GE2040@dread.disaster.area> (raw)
In-Reply-To: <2457302.TnqmriUJk8@localhost.localdomain>
On Sat, Apr 25, 2020 at 05:37:39PM +0530, Chandan Rajendra wrote:
> On Thursday, April 23, 2020 4:00 AM Dave Chinner wrote:
> > On Wed, Apr 22, 2020 at 03:08:00PM +0530, Chandan Rajendra wrote:
> > > Attr bmbt tree height (MINABTPTRS == 2)
> > > |-------+------------------------+-------------------------|
> > > | Level | Number of nodes/leaves | Total Nr recs |
> > > | | | (nr nodes/leaves * 125) |
> > > |-------+------------------------+-------------------------|
> > > | 0 | 1 | 2 |
> > > | 1 | 2 | 250 |
> > > | 2 | 250 | 31250 |
> > > | 3 | 31250 | 3906250 |
> > > | 4 | 3906250 | 488281250 |
> > > | 5 | 488281250 | 61035156250 |
> > > |-------+------------------------+-------------------------|
> > >
> > > For xattr extents, (2 ** 32) - 1 = 4294967295 (~ 4 billion extents). So this
> > > will cause the corresponding bmbt's maximum height to go from 3 to 5.
> > > This probably won't cause any regression.
> >
> > We already have the XFS_DA_NODE_MAXDEPTH set to 5, so changing the
> > attr fork extent count makes no difference to the attribute fork
> > bmbt reservations. i.e. the bmbt reservations are defined by the
> > dabtree structure limits, not the maximum extent count the fork can
> > hold.
>
> I think the dabtree structure limits is because of the following ...
>
> How many levels of dabtree would be needed to hold ~100 million xattrs?
> - name len = 16 bytes
> struct xfs_parent_name_rec {
> __be64 p_ino;
> __be32 p_gen;
> __be32 p_diroffset;
> };
> i.e. 64 + 32 + 32 = 128 bits = 16 bytes;
> - Value len = file name length = Assume ~40 bytes
That's quite long for a file name, but lets run with it...
> - Formula for number of node entries (used in column 3 in the table given
> below) at any level of the dabtree,
> nr_blocks * ((block size - sizeof(struct xfs_da3_node_hdr)) / sizeof(struct
> xfs_da_node_entry))
> i.e. nr_blocks * ((block size - 64) / 8)
> - Formula for number of leaf entries (used in column 4 in the table given
> below),
> (block size - sizeof(xfs_attr_leaf_hdr_t)) /
> (sizeof(xfs_attr_leaf_entry_t) + valuelen + namelen + nameval)
> i.e. nr_blocks * ((block size - 32) / (8 + 2 + 1 + 16 + 40))
>
> Here I have assumed block size to be 4k.
>
> |-------+------------------+--------------------------+--------------------------|
> | Level | Number of blocks | Number of entries (node) | Number of entries (leaf) |
> |-------+------------------+--------------------------+--------------------------|
> | 0 | 1.0 | 5e2 | 6.1e1 |
> | 1 | 5e2 | 2.5e5 | 3.0e4 |
> | 2 | 2.5e5 | 1.3e8 | 1.5e7 |
> | 3 | 1.3e8 | 6.6e10 | 7.9e9 |
> |-------+------------------+--------------------------+--------------------------|
I'm not sure what this table actually represents.
>
> Hence we would need a tree of height 3.
> Total number of blocks = 1 + 5e2 + 2.5e5 + 1.3e8 = ~1.3e8
130 million blocks to hold 100 million xattrs? That doesn't pass the
smell test.
I think you are trying to do these calculations from the wrong
direction. Calculate the number of leaf blocks needed to hold the
xattr data first, then work out the height of the pointer tree from
that. e.g:
If we need 100m xattrs, we need this many 100% full 4k blocks to
hold them all:
blocks = 100m / entries per leaf
= 100m / 61
= 1.64m
and if we assume 37% for the least populated (because magic
split/merge number), multiply by 3, so blocks ~= 5m for 100m xattrs
in 4k blocks.
That makes a lot more sense. Now the tree itself:
ptrs per node ^ N = 5m
ptrs per node ^ (N-1) = 5m / 500 = 10k
ptrs per node ^ (N-2) = 10k / 500 = 200
ptrs per node ^ (N-3) = 200 / 500 = 1
So, N-3 = level 0, so we've got a tree of height 4 for 100m xattrs,
and the pointer tree requires ~12000 blocks which is noise compared
to the number of leaf blocks...
As for the bmbt, we've got ~5m extents worst case, which is
ptrs per node ^ N = 5m
ptrs per node ^ (N-1) = 5m / 125 = 40k
ptrs per node ^ (N-2) = 40k / 125 = 320
ptrs per node ^ (N-3) = 320 / 125 = 3
As 3 bmbt records should fit in the inode fork, we'd only need a 4
level bmbt tree to hold this, too. It's at the lower limit of a 4
level tree, but 100m xattrs is the extreme case we are talking about
here...
FWIW, repeat this with a directory data segment size of 32GB w/ 40
byte names, and the numbers aren't much different to a worst case
xattr tree of this shape. You'll see the reason for the dabtree
height being limited to 5, and that neither the directory structure
nor the xattr structure is anywhere near the 2^32 bit extent count
limit...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2020-04-26 22:08 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-04 8:52 [PATCH 0/2] Extend xattr extent counter to 32-bits Chandan Rajendra
2020-04-04 8:52 ` [PATCH 1/2] xfs: Fix log reservation calculation for xattr insert operation Chandan Rajendra
2020-04-06 15:25 ` Brian Foster
2020-04-06 22:57 ` Dave Chinner
2020-04-07 5:11 ` Chandan Rajendra
2020-04-07 12:59 ` Brian Foster
2020-04-07 0:49 ` Dave Chinner
2020-04-08 8:47 ` Chandan Rajendra
2020-04-04 8:52 ` [PATCH 2/2] xfs: Extend xattr extent counter to 32-bits Chandan Rajendra
2020-04-06 16:45 ` Brian Foster
2020-04-08 12:40 ` Chandan Rajendra
2020-04-06 17:06 ` Darrick J. Wong
2020-04-06 23:30 ` Dave Chinner
2020-04-08 12:43 ` Chandan Rajendra
2020-04-08 15:38 ` Darrick J. Wong
2020-04-08 22:43 ` Dave Chinner
2020-04-08 15:45 ` Darrick J. Wong
2020-04-08 22:45 ` Dave Chinner
2020-04-08 12:42 ` Chandan Rajendra
2020-04-07 1:20 ` Dave Chinner
2020-04-08 12:45 ` Chandan Rajendra
2020-04-10 7:46 ` Chandan Rajendra
2020-04-12 6:34 ` Chandan Rajendra
2020-04-13 18:55 ` Darrick J. Wong
2020-04-20 4:38 ` Chandan Rajendra
2020-04-22 9:38 ` Chandan Rajendra
2020-04-22 22:30 ` Dave Chinner
2020-04-25 12:07 ` Chandan Rajendra
2020-04-26 22:08 ` Dave Chinner [this message]
2020-04-29 15:35 ` Chandan Rajendra
2020-05-01 7:08 ` Chandan Rajendra
2020-05-12 23:53 ` Darrick J. Wong
2020-05-13 12:19 ` Chandan Rajendra
2020-04-22 22:51 ` Darrick J. Wong
2020-04-27 7:42 ` Christoph Hellwig
2020-04-27 7:39 ` Christoph Hellwig
2020-04-30 2:29 ` Chandan Rajendra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200426220805.GE2040@dread.disaster.area \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=chandan@linux.ibm.com \
--cc=chandanrlinux@gmail.com \
--cc=darrick.wong@oracle.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.