From: Chandan Babu R <chandan.babu@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH V5 13/16] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters
Date: Wed, 16 Feb 2022 18:04:22 +0530
Message-ID: <87sfsjhtwx.fsf@debian-BULLSEYE-live-builder-AMD64>
In-Reply-To: <20220216035951.GA59715@dread.disaster.area>

On 16 Feb 2022 at 09:29, Dave Chinner wrote:
> On Tue, Feb 15, 2022 at 05:16:33PM -0800, Darrick J. Wong wrote:
>> On Tue, Feb 15, 2022 at 06:46:16PM +0530, Chandan Babu R wrote:
>> > On 15 Feb 2022 at 17:03, Chandan Babu R wrote:
>> > > On 15 Feb 2022 at 15:03, Dave Chinner wrote:
>> > >> On Tue, Feb 15, 2022 at 12:18:50PM +0530, Chandan Babu R wrote:
>> > >>> On 14 Feb 2022 at 22:37, Darrick J. Wong wrote:
>> > >>> > On Fri, Feb 11, 2022 at 05:40:30PM +0530, Chandan Babu R wrote:
>> > >>> >> On 07 Feb 2022 at 22:41, Darrick J. Wong wrote:
>> > >>> >> > On Mon, Feb 07, 2022 at 10:25:19AM +0530, Chandan Babu R wrote:
>> > >>> >> >> On 02 Feb 2022 at 01:31, Darrick J. Wong wrote:
>> > >>> >> >> > On Fri, Jan 21, 2022 at 10:48:54AM +0530, Chandan Babu R wrote:
>> > >>> >> >> I went through all the call sites of xfs_iext_count_may_overflow() and I think
>> > >>> >> >> that your suggestion can be implemented.
>> > >>> >>
>> > >>> >> Sorry, I missed/overlooked the usage of xfs_iext_count_may_overflow() in
>> > >>> >> xfs_symlink().
>> > >>> >>
>> > >>> >> Just after invoking xfs_iext_count_may_overflow(), we execute the following
>> > >>> >> steps,
>> > >>> >>
>> > >>> >> 1. Allocate inode chunk
>> > >>> >> 2. Initialize inode chunk.
>> > >>> >> 3. Insert record into inobt/finobt.
>> > >>> >> 4. Roll the transaction.
>> > >>> >> 5. Allocate ondisk inode.
>> > >>> >> 6. Add directory inode to transaction.
>> > >>> >> 7. Allocate blocks to store symbolic link path name.
>> > >>> >> 8. Log symlink's inode (data fork contains block mappings).
>> > >>> >> 9. Log data blocks containing symbolic link path name.
>> > >>> >> 10. Add name to directory and log directory's blocks.
>> > >>> >> 11. Log directory inode.
>> > >>> >> 12. Commit transaction.
>> > >>> >>
>> > >>> >> xfs_trans_roll() invoked in step 4 would mean that we cannot move step 6 to
>> > >>> >> occur before step 1 since xfs_trans_roll would unlock the inode by executing
>> > >>> >> xfs_inode_item_committing().
>> > >>> >>
>> > >>> >> xfs_create() has a similar flow.
>> > >>> >>
>> > >>> >> Hence, I think we should retain the current logic of setting
>> > >>> >> XFS_DIFLAG2_NREXT64 just after reading the inode from the disk.
>> > >>> >
>> > >>> > File creation shouldn't ever run into problems with
>> > >>> > xfs_iext_count_may_overflow because (a) only symlinks get created with
>> > >>> > mapped blocks, and never more than two; and (b) we always set NREXT64
>> > >>> > (the inode flag) on new files if NREXT64 (the superblock feature bit) is
>> > >>> > enabled, so a newly created file will never require upgrading.
>> > >>>
>> > >>> The inode representing the symbolic link being created cannot overflow its
>> > >>> data fork extent count field. However, the inode representing the directory
>> > >>> inside which the symbolic link entry is being created, might overflow its data
>> > >>> fork extent count field.
>> > >>
>> > >> I don't think that can happen. A directory is limited in size to 3
>> > >> segments of 32GB each. In reality, only the data segment can ever
>> > >> reach 32GB as both the dabtree and free space segments are just
>> > >> compact indexes of the contents of the 32GB data segment.
>> > >>
>> > >> Hence a directory is never likely to reach more than about 40GB of
>> > >> blocks which is nowhere near large enough to overflow a 32 bit
>> > >> extent count field.
>> > >
>> > > I think you are right.
>> > >
>> > > The maximum file size that can be represented by the data fork extent counter
>> > > in the worst case occurs when all extents are 1 block in size and each block
>> > > is 1k in size.
>> > >
>> > > With 1k byte sized blocks, a file can reach up to,
>> > > 1k * (2^31) = 2048 GB
>> > >
>> > > This is much larger than the asymptotic maximum size of a directory i.e.
>> > > 32GB * 3 = 96GB.
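
For reference, the arithmetic quoted above can be checked with a small
userspace sketch. The helper names are made up for illustration and are not
kernel functions, and 2^31 is the rounded figure used in the quoted text
rather than the exact counter limit.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (1ULL << 30)

/*
 * Largest file, in bytes, that a fork can map in the worst case, i.e. when
 * every extent is exactly one block long. Hypothetical helper.
 */
static uint64_t worst_case_file_bytes(uint64_t block_size, uint64_t max_extents)
{
	/* Guard against wrap-around in the multiplication below. */
	assert(block_size != 0 && max_extents <= UINT64_MAX / block_size);
	return block_size * max_extents;
}

/*
 * Asymptotic upper bound on directory size as stated in the thread:
 * three segments of 32GB each. Hypothetical helper.
 */
static uint64_t max_dir_bytes(void)
{
	return 3 * (32 * GIB);
}
```

Calling worst_case_file_bytes(1024, 1ULL << 31) gives 2048 * GIB, which is
well above the 96 * GIB returned by max_dir_bytes().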
>>
>> The downside of getting rid of the checks for directories is that we
>> won't be able to use the error injection knob that limits all forks to
>> 10 extents max to see what happens when that part of directory expansion
>> fails. But if it makes it easier to handle nrext64, then that's
>> probably a good enough reason to forego that.
>
> If you want error injection to do that, add explicit error injection
> to the directory code.

The transaction might already be dirty before entering the directory code
(e.g. xfs_dir_createname()). In this case, an error return from
xfs_iext_count_may_overflow() will cause the filesystem to be shut down.

On the other hand, removing calls to xfs_iext_count_may_overflow() from the
previously listed directory functions would result in the error injection knob
not working for directories. This would require us to delete the xfs/533 test.

Leaving the current invocations of xfs_iext_count_may_overflow() in their
respective locations would mean that they are essentially no-ops for functions
which manipulate directories. However, with functions like xfs_symlink() and
xfs_create(), I wouldn't be able to add the inode to the transaction before
invoking xfs_iext_count_may_overflow() because this leads to the inode being
unlocked when rolling the transaction.

Therefore I think we should not change the current code flow w.r.t.
functions associated with directory entry manipulation, i.e.

1. Let xfs_iext_count_may_overflow() continue to be a no-op w.r.t. directory
   manipulation.
2. Since xfs_iext_count_may_overflow() is a no-op, there is no need to move
   the "add inode to transaction" code to occur before invoking
   xfs_iext_count_may_overflow().
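
To make the "no-op" argument concrete, here is a simplified userspace model of
the check. It is only a sketch: the struct, the constants and the function
name are hypothetical stand-ins, the boolean parameter stands in for the error
injection knob that limits forks to 10 extents, and the -EFBIG return value is
my assumption about what the real check returns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits for the model; not copied from kernel headers. */
#define MODEL_MAX_DATA_EXTENTS	0x7fffffffULL	/* 32-bit signed counter */
#define MODEL_INJECTED_LIMIT	10ULL		/* error injection knob */

/* Hypothetical stand-in for an inode fork. */
struct model_fork {
	uint64_t nextents;	/* extents currently mapped by the fork */
};

/*
 * Return 0 if nr_to_add more extents still fit in the fork's extent
 * counter, or -EFBIG if they do not.
 */
static int model_iext_count_may_overflow(const struct model_fork *fork,
					 uint32_t nr_to_add,
					 bool reduce_max_iextents)
{
	uint64_t max_exts = reduce_max_iextents ? MODEL_INJECTED_LIMIT
						: MODEL_MAX_DATA_EXTENTS;
	uint64_t nr_exts = fork->nextents + nr_to_add;

	/* Reject on arithmetic wrap-around or when the limit is exceeded. */
	if (nr_exts < fork->nextents || nr_exts > max_exts)
		return -EFBIG;

	return 0;
}
```

With the knob off, a directory fork holding 10 extents passes the check when
one more extent is added. With the knob on, the same call fails, which is the
behaviour xfs/533 depends on.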
>
>> > xfs_bmap_del_extent_real()
>>
>> Not sure about this one, since it actually /can/ result in more extents.
>
> Yup, unlikely to ever trigger, but still necessary for correctness.
>
--
chandan