All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Chandan Rajendra <chandan@linux.ibm.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Chandan Rajendra <chandanrlinux@gmail.com>,
	linux-xfs@vger.kernel.org, bfoster@redhat.com
Subject: Re: [PATCH 2/2] xfs: Extend xattr extent counter to 32-bits
Date: Thu, 23 Apr 2020 08:30:41 +1000	[thread overview]
Message-ID: <20200422223041.GE27860@dread.disaster.area> (raw)
In-Reply-To: <2468041.fvziTNUSPq@localhost.localdomain>

On Wed, Apr 22, 2020 at 03:08:00PM +0530, Chandan Rajendra wrote:
> On Monday, April 20, 2020 10:08 AM Chandan Rajendra wrote: 
> > On Tuesday, April 14, 2020 12:25 AM Darrick J. Wong wrote: 
> > > That said, it was very helpful to point out that the current MAXEXTNUM /
> > > MAXAEXTNUM symbols stop short of using all 32 (or 16) bits.
> > > 
> > > Can we use this new feature flag + inode flag to allow 4294967295
> > > extents in either fork?
> > 
> > Sure.
> > 
> > I have already tested that having 4294967295 as the maximum data extent count
> > does not cause any regressions.
> > 
> > Also, Dave was of the opinion that data extent counter be increased to
> > 64-bit. I think I should include that change along with this feature flag
> > rather than adding a new one in the near future.
> > 
> > 
> 
> Hello Dave & Darrick,
> 
> Can you please look into the following design decision w.r.t using 32-bit and
> 64-bit unsigned counters for xattr and data extents.
> 
> Maximum extent counts.
> |-----------------------+----------------------|
> | Field width (in bits) |          Max extents |
> |-----------------------+----------------------|
> |                    32 |           4294967295 |
> |                    48 |      281474976710655 |
> |                    64 | 18446744073709551615 |
> |-----------------------+----------------------|

These huge numbers are impossible to compare visually.  Once numbers
go beyond 7-9 digits, you need to start condensing them in reports.
Humans are, in general, unable to handle strings of digits longer
than 7-9 digits at all well...

Can you condense them by using scientific representation i.e. XEy,
which gives:

|-----------------------+-------------|
| Field width (in bits) | Max extents |
|-----------------------+-------------|
|                    32 |      4.3E09 |
|                    48 |      2.8E14 |
|                    64 |      1.8E19 |
|-----------------------+-------------|

It's much easier to compare differences visually because it's not
only 4 digits, not 20. The other alternative is to use k,m,g,t,p,e
suffixes to indicate magnitude (4.3g, 280t, 18e), but using
exponentials make the numbers easier to do calculations on
directly...

> |-------------------+-----|
> | Minimum node recs | 125 |
> | Minimum leaf recs | 125 |
> |-------------------+-----|

Please show your working. I'm assuming this is 50% * 4kB /
sizeof(bmbt_rec), so you are working out limits based on 4kB block
size? Realistically, worse case behaviour will be with the minimum
supported block size, which in this case will be 1kB....

> Data bmbt tree height (MINDBTPTRS == 3)
> |-------+------------------------+-------------------------|
> | Level | Number of nodes/leaves |           Total Nr recs |
> |       |                        | (nr nodes/leaves * 125) |
> |-------+------------------------+-------------------------|
> |     0 |                      1 |                       3 |
> |     1 |                      3 |                     375 |
> |     2 |                    375 |                   46875 |
> |     3 |                  46875 |                 5859375 |
> |     4 |                5859375 |               732421875 |
> |     5 |              732421875 |             91552734375 |
> |     6 |            91552734375 |          11444091796875 |
> |     7 |         11444091796875 |        1430511474609375 |
> |     8 |       1430511474609375 |      178813934326171875 |
> |     9 |     178813934326171875 |    22351741790771484375 |
> |-------+------------------------+-------------------------|
> 
> For counting data extents, even though we theoretically have 64 bits at our
> disposal, I think we should have (2 ** 48) - 1 as the maximum number of
> extents. This gives 281474976710655 (i.e. ~281 trillion extents). With this,
> bmbt tree's height grows by just two more levels (i.e. it grows from the
> current maximum height of 5 to 7). Please let me know your opinion on this.

We shouldn't make up arbitrary limits when we can calculate them exactly.
i.e. 2^63 max file size, 1kB block size (2^10), means max fragments
is 2^53 entries. On a 64kB block size (2^16), we have a max extent
count of 2^47....

i.e. 2^48 would be an acceptible limit for 1kB block size, but it is
not correct for 64kB block size filesystems....

> Attr bmbt tree height (MINABTPTRS == 2)
> |-------+------------------------+-------------------------|
> | Level | Number of nodes/leaves |           Total Nr recs |
> |       |                        | (nr nodes/leaves * 125) |
> |-------+------------------------+-------------------------|
> |     0 |                      1 |                       2 |
> |     1 |                      2 |                     250 |
> |     2 |                    250 |                   31250 |
> |     3 |                  31250 |                 3906250 |
> |     4 |                3906250 |               488281250 |
> |     5 |              488281250 |             61035156250 |
> |-------+------------------------+-------------------------|
> 
> For xattr extents, (2 ** 32) - 1 = 4294967295 (~ 4 billion extents). So this
> will cause the corresponding bmbt's maximum height to go from 3 to 5.
> This probably won't cause any regression.

We already have the XFS_DA_NODE_MAXDEPTH set to 5, so changing the
attr fork extent count makes no difference to the attribute fork
bmbt reservations. i.e. the bmbt reservations are defined by the
dabtree structure limits, not the maximum extent count the fork can
hold.

The data fork to 64 bits has no impact on the directory
reservations, either, because the number of extents in the directory
is bound by the directory segment size of 32GB. i.e. a directory can
hold, at most, 32GB of dirent data, which means there's a hard limit
on the number of dabtree entries somewhere in the order of a few
hundred million. That's where XFS_DA_NODE_MAXDEPTH comes from - it's
large enough to index a max sized directory, and the BMBT overhead
is derived from that...

> Meanwhile, I will work on finding the impact of increasing the
> height of these two trees on log reservation.

It should not change it substantially - 2 blocks per bmbt
reservation per transaction is what I'd expect from the numbers
presented...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2020-04-22 22:30 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-04  8:52 [PATCH 0/2] Extend xattr extent counter to 32-bits Chandan Rajendra
2020-04-04  8:52 ` [PATCH 1/2] xfs: Fix log reservation calculation for xattr insert operation Chandan Rajendra
2020-04-06 15:25   ` Brian Foster
2020-04-06 22:57     ` Dave Chinner
2020-04-07  5:11       ` Chandan Rajendra
2020-04-07 12:59       ` Brian Foster
2020-04-07  0:49   ` Dave Chinner
2020-04-08  8:47     ` Chandan Rajendra
2020-04-04  8:52 ` [PATCH 2/2] xfs: Extend xattr extent counter to 32-bits Chandan Rajendra
2020-04-06 16:45   ` Brian Foster
2020-04-08 12:40     ` Chandan Rajendra
2020-04-06 17:06   ` Darrick J. Wong
2020-04-06 23:30     ` Dave Chinner
2020-04-08 12:43       ` Chandan Rajendra
2020-04-08 15:38         ` Darrick J. Wong
2020-04-08 22:43         ` Dave Chinner
2020-04-08 15:45       ` Darrick J. Wong
2020-04-08 22:45         ` Dave Chinner
2020-04-08 12:42     ` Chandan Rajendra
2020-04-07  1:20   ` Dave Chinner
2020-04-08 12:45     ` Chandan Rajendra
2020-04-10  7:46     ` Chandan Rajendra
2020-04-12  6:34       ` Chandan Rajendra
2020-04-13 18:55         ` Darrick J. Wong
2020-04-20  4:38           ` Chandan Rajendra
2020-04-22  9:38             ` Chandan Rajendra
2020-04-22 22:30               ` Dave Chinner [this message]
2020-04-25 12:07                 ` Chandan Rajendra
2020-04-26 22:08                   ` Dave Chinner
2020-04-29 15:35                     ` Chandan Rajendra
2020-05-01  7:08                       ` Chandan Rajendra
2020-05-12 23:53                         ` Darrick J. Wong
2020-05-13 12:19                           ` Chandan Rajendra
2020-04-22 22:51               ` Darrick J. Wong
2020-04-27  7:42     ` Christoph Hellwig
2020-04-27  7:39   ` Christoph Hellwig
2020-04-30  2:29     ` Chandan Rajendra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200422223041.GE27860@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=chandan@linux.ibm.com \
    --cc=chandanrlinux@gmail.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.