From: Chandan Babu R <chandan.babu@oracle.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org, david@fromorbit.com
Subject: Re: [PATCH V5 13/16] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters
Date: Fri, 11 Feb 2022 17:40:30 +0530
Message-ID: <87bkzda9jd.fsf@debian-BULLSEYE-live-builder-AMD64>
In-Reply-To: <20220207171106.GB8313@magnolia>
On 07 Feb 2022 at 22:41, Darrick J. Wong wrote:
> On Mon, Feb 07, 2022 at 10:25:19AM +0530, Chandan Babu R wrote:
>> On 02 Feb 2022 at 01:31, Darrick J. Wong wrote:
>> > On Fri, Jan 21, 2022 at 10:48:54AM +0530, Chandan Babu R wrote:
>> >> This commit upgrades inodes to use 64-bit extent counters when they are read
>> >> from disk. Inodes are upgraded only when the filesystem instance has
>> >> XFS_SB_FEAT_INCOMPAT_NREXT64 incompat flag set.
>> >>
>> >> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> >> ---
>> >> fs/xfs/libxfs/xfs_inode_buf.c | 6 ++++++
>> >> 1 file changed, 6 insertions(+)
>> >>
>> >> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
>> >> index 2200526bcee0..767189c7c887 100644
>> >> --- a/fs/xfs/libxfs/xfs_inode_buf.c
>> >> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
>> >> @@ -253,6 +253,12 @@ xfs_inode_from_disk(
>> >>  	}
>> >>  	if (xfs_is_reflink_inode(ip))
>> >>  		xfs_ifork_init_cow(ip);
>> >> +
>> >> +	if ((from->di_version == 3) &&
>> >> +	     xfs_has_nrext64(ip->i_mount) &&
>> >> +	     !xfs_dinode_has_nrext64(from))
>> >> +		ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
>> >
>> > Hmm. Last time around I asked about the oddness of updating the inode
>> > feature flags outside of a transaction, and then never responded. :(
>> > So to quote you from last time:
>> >
>> >> The following is the thought process behind upgrading an inode to
>> >> XFS_DIFLAG2_NREXT64 when it is read from the disk:
>> >>
>> >> 1. With support for dynamic upgrade, the extent count limits of an
>> >> inode need to be determined by checking flags present within the
>> >> inode, i.e. we need to satisfy the self-describing metadata property.
>> >> This helps tools like xfs_repair and scrub to verify an inode's extent
>> >> count limits without having to refer to other metadata objects (e.g.
>> >> superblock feature flags).
>> >
>> > I think this makes an even /stronger/ argument for why this update
>> > needs to be transactional.
>> >
>> >> 2. Upgrade when performed inside xfs_trans_log_inode() may cause
>> >> xfs_iext_count_may_overflow() to return -EFBIG when the inode's
>> >> data/attr extent count is already close to 2^31/2^15 respectively.
>> >> Hence none of the file operations will be able to add new extents to a
>> >> file.
>> >
>> > Aha, there's the reason why! You're right, xfs_iext_count_may_overflow
>> > will abort the operation due to !NREXT64 before we even get a chance to
>> > log the inode.
>> >
>> > I observe, however, that any time we call that function, we also have a
>> > transaction allocated and we hold the ILOCK on the inode being tested.
>> > *Most* of those call sites have also joined the inode to the transaction
>> > already. I wonder, is that a more appropriate place to be upgrading the
>> > inodes? Something like:
>> >
>> > /*
>> >  * Ensure that the inode has the ability to add the specified number of
>> >  * extents.  Caller must hold ILOCK_EXCL and have joined the inode to
>> >  * the transaction.  Upon return, the inode will still be in this state
>> >  * and the transaction will be clean.
>> >  */
>> > int
>> > xfs_trans_inode_ensure_nextents(
>> > 	struct xfs_trans	**tpp,
>> > 	struct xfs_inode	*ip,
>> > 	int			whichfork,
>> > 	int			nr_to_add)
>> > {
>> > 	int			error;
>> >
>> > 	error = xfs_iext_count_may_overflow(ip, whichfork, nr_to_add);
>> > 	if (!error)
>> > 		return 0;
>> >
>> > 	/*
>> > 	 * Try to upgrade if the extent count fields aren't large
>> > 	 * enough.
>> > 	 */
>> > 	if (!xfs_has_nrext64(ip->i_mount) ||
>> > 	    (ip->i_diflags2 & XFS_DIFLAG2_NREXT64))
>> > 		return error;
>> >
>> > 	ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
>> > 	xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
>> >
>> > 	error = xfs_trans_roll(tpp);
>> > 	if (error)
>> > 		return error;
>> >
>> > 	return xfs_iext_count_may_overflow(ip, whichfork, nr_to_add);
>> > }
>> >
>> > and then the current call sites become:
>> >
>> > 	error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write,
>> > 			dblocks, rblocks, false, &tp);
>> > 	if (error)
>> > 		return error;
>> >
>> > 	error = xfs_trans_inode_ensure_nextents(&tp, ip, XFS_DATA_FORK,
>> > 			XFS_IEXT_ADD_NOSPLIT_CNT);
>> > 	if (error)
>> > 		goto out_cancel;
>> >
>> > What do you think about that?
>> >
>>
>> I went through all the call sites of xfs_iext_count_may_overflow() and I think
>> that your suggestion can be implemented.

Sorry, I overlooked the use of xfs_iext_count_may_overflow() in
xfs_symlink().

Just after invoking xfs_iext_count_may_overflow(), we execute the following
steps:

1. Allocate inode chunk.
2. Initialize inode chunk.
3. Insert record into inobt/finobt.
4. Roll the transaction.
5. Allocate ondisk inode.
6. Add directory inode to transaction.
7. Allocate blocks to store symbolic link path name.
8. Log symlink's inode (data fork contains block mappings).
9. Log data blocks containing symbolic link path name.
10. Add name to directory and log directory's blocks.
11. Log directory inode.
12. Commit transaction.

The xfs_trans_roll() call in step 4 means that we cannot move step 6 to
occur before step 1, since xfs_trans_roll() would unlock the inode by
executing xfs_inode_item_committing().

xfs_create() has a similar flow.

Hence, I think we should retain the current logic of setting
XFS_DIFLAG2_NREXT64 just after reading the inode from the disk.
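
To make the ordering problem in the steps above concrete, here is a minimal
user-space sketch. It is not kernel code: every toy_* name is hypothetical,
and the only behaviour it models is the assumption that an inode joined to a
transaction together with its lock is unlocked when that transaction is
committed as part of a roll.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an in-core inode; not a real XFS type. */
struct toy_locked_inode {
	bool	locked;			/* models ILOCK_EXCL being held */
	bool	joined;			/* joined to the current transaction */
	bool	unlock_on_commit;	/* lock ownership handed to the transaction */
};

/* Models joining an inode and handing its lock over to the transaction. */
static void toy_ijoin(struct toy_locked_inode *ip, bool hand_over_lock)
{
	ip->joined = true;
	ip->unlock_on_commit = hand_over_lock;
}

/* Models a roll: the old transaction commits and releases joined inodes. */
static void toy_roll(struct toy_locked_inode *ip)
{
	if (!ip->joined)
		return;
	if (ip->unlock_on_commit)
		ip->locked = false;
	ip->joined = false;
	ip->unlock_on_commit = false;
}

/* Returns true if the directory inode is still locked after step 4. */
static bool toy_dir_locked_after_roll(bool join_before_roll)
{
	struct toy_locked_inode dir = { .locked = true };

	if (join_before_roll)
		toy_ijoin(&dir, true);	/* step 6 moved ahead of step 1 */
	toy_roll(&dir);			/* step 4 */
	if (!join_before_roll)
		toy_ijoin(&dir, true);	/* step 6 in its current position */
	return dir.locked;
}
```

In this model, toy_dir_locked_after_roll(false) reports that the directory is
still locked for steps 7 to 11, while toy_dir_locked_after_roll(true) reports
that the roll has already dropped the lock.
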
>>
>> However, wouldn't the current approach suffice in terms of being functionally
>> and logically correct? XFS_DIFLAG2_NREXT64 is set when inode is read from the
>> disk and the first operation to log the changes made to the inode will make
>> sure to include the new value of ip->i_diflags2. Hence we never end up in a
>> situation where a disk inode has more than 2^31 data fork extents without
>> having XFS_DIFLAG2_NREXT64 flag set.
>>
>> But the approach described above does go against the convention of changing
>> metadata within a transaction. Hence I will try to implement your suggestion
>> and include it in the next version of the patchset.
>
> Ok, that sounds good. :)
>
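
For reference, the overflow concern and the upgrade-then-retry idea quoted
above can be modelled in user space as follows. This is only a sketch: the
toy_* names are hypothetical, the small limits follow the 2^31/2^15 figures
mentioned in the thread, and the large limits (2^48 - 1 and 2^32 - 1) are my
assumption about what the rest of the series defines. The real helper would
also log the inode and roll the transaction where the comment indicates.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; names and layout are illustrative only. */
enum toy_fork { TOY_DATA_FORK = 0, TOY_ATTR_FORK = 1 };

struct toy_ext_inode {
	bool		fs_has_nrext64;		/* models the superblock incompat flag */
	bool		ino_has_nrext64;	/* models XFS_DIFLAG2_NREXT64 */
	uint64_t	nextents[2];		/* current extent count per fork */
};

/* Per-fork limit, chosen from the inode's own flag (self-describing). */
static uint64_t toy_max_nextents(const struct toy_ext_inode *ip, enum toy_fork f)
{
	if (ip->ino_has_nrext64)	/* assumed large limits */
		return f == TOY_DATA_FORK ? (1ULL << 48) - 1 : (1ULL << 32) - 1;
	return f == TOY_DATA_FORK ? (1ULL << 31) - 1 : (1ULL << 15) - 1;
}

/* Returns -EFBIG if adding nr_to_add extents would exceed the limit. */
static int toy_may_overflow(const struct toy_ext_inode *ip, enum toy_fork f,
			    uint64_t nr_to_add)
{
	uint64_t max = toy_max_nextents(ip, f);
	uint64_t cur = ip->nextents[f];

	if (cur > max || nr_to_add > max - cur)
		return -EFBIG;
	return 0;
}

/* Check, upgrade the inode if the filesystem allows it, then check again. */
static int toy_ensure_nextents(struct toy_ext_inode *ip, enum toy_fork f,
			       uint64_t nr_to_add)
{
	int error = toy_may_overflow(ip, f, nr_to_add);

	if (!error)
		return 0;
	if (!ip->fs_has_nrext64 || ip->ino_has_nrext64)
		return error;

	/* The real code would log the inode and roll the transaction here. */
	ip->ino_has_nrext64 = true;
	return toy_may_overflow(ip, f, nr_to_add);
}
```

With a data fork already at 2^31 - 1 extents, toy_may_overflow() fails with
-EFBIG until the per-inode flag is set, which is the situation point 2 above
describes.
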
--
chandan