From: Chandan Babu R <chandanrlinux@gmail.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Chandan Babu R <chandanrlinux@gmail.com>,
linux-xfs@vger.kernel.org, hch@lst.de,
allison.henderson@oracle.com
Subject: Re: [PATCH V12 04/14] xfs: Check for extent overflow when adding/removing dir entries
Date: Sat, 09 Jan 2021 10:56:42 +0530 [thread overview]
Message-ID: <87y2h2r88d.fsf@garuda> (raw)
In-Reply-To: <20210109013351.GS6918@magnolia>
On 09 Jan 2021 at 07:03, Darrick J. Wong wrote:
> On Fri, Jan 08, 2021 at 10:25:35AM +0530, Chandan Babu R wrote:
>> On Thu, 07 Jan 2021 17:17:13 -0800, Darrick J. Wong wrote:
>> > On Mon, Jan 04, 2021 at 04:01:10PM +0530, Chandan Babu R wrote:
>> > > Directory entry addition can cause the following,
>> > > 1. Data block can be added/removed.
>> > > A new extent can cause extent count to increase by 1.
>> > > 2. Free disk block can be added/removed.
>> > > Same behaviour as described above for Data block.
>> > > 3. Dabtree blocks.
>> > > XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these
>> > > can be new extents. Hence extent count can increase by
>> > > XFS_DA_NODE_MAXDEPTH.
>> > >
>> > > Directory entry remove and rename (applicable only to the source
>> > > directory entry) operations are handled specially to allow them to
>> > > succeed in low extent count availability scenarios
>> > > i.e. xfs_bmap_del_extent_real() will now return -ENOSPC when a possible
>> > > extent count overflow is detected. -ENOSPC is already handled by higher
>> > > layers of XFS by letting,
>> > > 1. Empty Data/Free space index blocks to linger around until a future
>> > > remove operation frees them.
>> > > 2. Dabtree blocks would be swapped with the last block in the leaf space
>> > > followed by unmapping of the new last block.
>> > >
>> > > Also, Extent overflow check is performed for the target directory entry
>> > > of the rename operation only when the entry does not exist and a
>> > > non-zero space reservation is obtained successfully.
>> > >
>> > > Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
>> > > ---
>> > > fs/xfs/libxfs/xfs_bmap.c | 15 ++++++++++++
>> > > fs/xfs/libxfs/xfs_inode_fork.h | 13 ++++++++++
>> > > fs/xfs/xfs_inode.c | 45 ++++++++++++++++++++++++++++++++++
>> > > fs/xfs/xfs_symlink.c | 5 ++++
>> > > 4 files changed, 78 insertions(+)
>> > >
>> > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> > > index 32aeacf6f055..5fd804534e67 100644
>> > > --- a/fs/xfs/libxfs/xfs_bmap.c
>> > > +++ b/fs/xfs/libxfs/xfs_bmap.c
>> > > @@ -5151,6 +5151,21 @@ xfs_bmap_del_extent_real(
>> > > /*
>> > > * Deleting the middle of the extent.
>> > > */
>> > > +
>> > > + /*
>> > > + * For directories, -ENOSPC will be handled by higher layers of
>> > > + * XFS by letting the corresponding empty Data/Free blocks to
>> > > + * linger around until a future remove operation. Dabtree blocks
>> > > + * would be swapped with the last block in the leaf space and
>> > > + * then the new last block will be unmapped.
>> > > + */
>> > > + if (S_ISDIR(VFS_I(ip)->i_mode) &&
>> > > + whichfork == XFS_DATA_FORK &&
>> > > + xfs_iext_count_may_overflow(ip, whichfork, 1)) {
>> > > + error = -ENOSPC;
>> > > + goto done;
>> >
>> > Hmm... it strikes me as a little odd that we're checking file mode and
>> > fork type in the middle of the bmap code. However, I think it's the
>> > case that the only place where anyone would punch a hole in the /middle/
>> > of an extent is xattr trees and regular files, right? And both of those
>> > cases are checked before we end up in the bmap code, right?
>>
>> Yes, your observation is correct. I will remove the file mode and fork type
>> checks.
>
> You might want to leave an assert there just in case someone else trips
> over that...
Ok.
>
>>
>> >
>> > So we only really need this check to prevent extent count overflows when
>> > removing dirents from directories, like the comment says, and only
>> > because directories don't have a hard requirement that the bunmapi
>> > succeeds. And I think this logic covers xfs_remove too? That's a bit
>> > subtle, but as there's no extent count check in that function, there's
>> > not much to attach a comment to... :)
>>
>> Yes, To provide more clarity, I should replace the above comment with
>> following,
>>
>> /*
>> * -ENOSPC is returned since a directory entry remove operation must not fail
>> * due to low extent count availability. -ENOSPC will be handled by higher
>> * layers of XFS by letting the corresponding empty Data/Free blocks to linger
>
> grammar nit: "...by letting the corresponding empty Data/Free blocks linger
> until a future remove operation."
>
Ok, I will fix that.
>> * around until a future remove operation. Dabtree blocks would be swapped with
>> * the last block in the leaf space and then the new last block will be
>> * unmapped.
>> *
>> * The above logic also applies to the source directory entry of a rename
>> * operation.
>> */
>>
>> >
>> > Hm. I think I'd like xfs_rename to get a brief comment that we're
>> > protected from extent count overflows in xfs_remove() by virtue of this
>> > "leave the dir block in place if we ENOSPC" capability:
>> >
>> > /*
>> > * NOTE: We don't need to check for extent overflows here
>> > * because the dir removename code will leave the dir block
>> > * in place if the extent count would overflow.
>> > */
>> > error = xfs_dir_removename(...);
>>
>> Sure, I will add that.
>>
>> >
>> > Do xattr trees also have the same ability? I think they do, at least
>> > for the dabtree part...?
>>
>> The following code snippet from xfs_da_shrink_inode() does the special casing
>> only for the data fork i.e. for blocks holding directory entries.
>>
>> for (;;) {
>> /*
>> * Remove extents. If we get ENOSPC for a dir we have to move
>> * the last block to the place we want to kill.
>> */
>> error = xfs_bunmapi(tp, dp, dead_blkno, count,
>> xfs_bmapi_aflag(w), 0, &done);
>> if (error == -ENOSPC) {
>> if (w != XFS_DATA_FORK)
>> break;
>> error = xfs_da3_swap_lastblock(args, &dead_blkno,
>> &dead_buf);
>>
>> So this facility is available only for directory entries.
>>
>> Hence for xattrs, if we ever reach the extent count limit, the only way out is
>> to delete the corresponding file.
>
> Ok. Just checking. :)
>
>> >
>> > I think I would've split this patch into three pieces:
>> >
>> > - create, link, and symlink in one patch (adding dirents),
>> > - the xfs_bmap_del_extent_real change and a comment for xfs_remove
>> > (removing dirents)
>> > - all the xfs_rename changes (adding and removing dirents)
>> >
>> > Though I dunno, this series is already 14 patches, and the part that I
>> > care most about is not leaving that subtlety in xfs_remove(). :)
>>
>> I think you are right about that. I will split this patch according to what
>> you have mentioned above.
>
>
> Thanks!
>
> FWIW I didn't see any regressions in fstests...
Cool. Thanks for the test run.
>
>> >
>> > Other than that, I follow the logic in this patch and will give it a
>> > testrun tonight.
>>
>> Thank you!
>>
>> >
>> > --D
>> >
>> > > + }
>> > > +
>> > > old = got;
>> > >
>> > > got.br_blockcount = del->br_startoff - got.br_startoff;
>> > > diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
>> > > index bcac769a7df6..ea1a9dd8a763 100644
>> > > --- a/fs/xfs/libxfs/xfs_inode_fork.h
>> > > +++ b/fs/xfs/libxfs/xfs_inode_fork.h
>> > > @@ -47,6 +47,19 @@ struct xfs_ifork {
>> > > */
>> > > #define XFS_IEXT_PUNCH_HOLE_CNT (1)
>> > >
>> > > +/*
>> > > + * Directory entry addition can cause the following,
>> > > + * 1. Data block can be added/removed.
>> > > + * A new extent can cause extent count to increase by 1.
>> > > + * 2. Free disk block can be added/removed.
>> > > + * Same behaviour as described above for Data block.
>> > > + * 3. Dabtree blocks.
>> > > + * XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new
>> > > + * extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
>> > > + */
>> > > +#define XFS_IEXT_DIR_MANIP_CNT(mp) \
>> > > + ((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
>> > > +
>> > > /*
>> > > * Fork handling.
>> > > */
>> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> > > index b7352bc4c815..0db21368c7e1 100644
>> > > --- a/fs/xfs/xfs_inode.c
>> > > +++ b/fs/xfs/xfs_inode.c
>> > > @@ -1042,6 +1042,11 @@ xfs_create(
>> > > if (error)
>> > > goto out_trans_cancel;
>> > >
>> > > + error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
>> > > + XFS_IEXT_DIR_MANIP_CNT(mp));
>> > > + if (error)
>> > > + goto out_trans_cancel;
>> > > +
>> > > /*
>> > > * A newly created regular or special file just has one directory
>> > > * entry pointing to them, but a directory also the "." entry
>> > > @@ -1258,6 +1263,11 @@ xfs_link(
>> > > xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL);
>> > > xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL);
>> > >
>> > > + error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
>> > > + XFS_IEXT_DIR_MANIP_CNT(mp));
>> > > + if (error)
>> > > + goto error_return;
>> > > +
>> > > /*
>> > > * If we are using project inheritance, we only allow hard link
>> > > * creation in our tree when the project IDs are the same; else
>> > > @@ -3106,6 +3116,35 @@ xfs_rename(
>> > > /*
>> > > * Check for expected errors before we dirty the transaction
>> > > * so we can return an error without a transaction abort.
>> > > + *
>> > > + * Extent count overflow check:
>> > > + *
>> > > + * From the perspective of src_dp, a rename operation is essentially a
>> > > + * directory entry remove operation. Hence the only place where we check
>> > > + * for extent count overflow for src_dp is in
>> > > + * xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real() returns
>> > > + * -ENOSPC when it detects a possible extent count overflow and in
>> > > + * response, the higher layers of directory handling code do the
>> > > + * following:
>> > > + * 1. Data/Free blocks: XFS lets these blocks linger around until a
>> > > + * future remove operation removes them.
>> > > + * 2. Dabtree blocks: XFS swaps the blocks with the last block in the
>> > > + * Leaf space and unmaps the last block.
>> > > + *
>> > > + * For target_dp, there are two cases depending on whether the
>> > > + * destination directory entry exists or not.
>> > > + *
>> > > + * When destination directory entry does not exist (i.e. target_ip ==
>> > > + * NULL), extent count overflow check is performed only when transaction
>> > > + * has a non-zero sized space reservation associated with it. With a
>> > > + * zero-sized space reservation, XFS allows a rename operation to
>> > > + * continue only when the directory has sufficient free space in its
>> > > + * data/leaf/free space blocks to hold the new entry.
>> > > + *
>> > > + * When destination directory entry exists (i.e. target_ip != NULL), all
>> > > + * we need to do is change the inode number associated with the already
>> > > + * existing entry. Hence there is no need to perform an extent count
>> > > + * overflow check.
>> > > */
>> > > if (target_ip == NULL) {
>> > > /*
>> > > @@ -3116,6 +3155,12 @@ xfs_rename(
>> > > error = xfs_dir_canenter(tp, target_dp, target_name);
>> > > if (error)
>> > > goto out_trans_cancel;
>> > > + } else {
>> > > + error = xfs_iext_count_may_overflow(target_dp,
>> > > + XFS_DATA_FORK,
>> > > + XFS_IEXT_DIR_MANIP_CNT(mp));
>> > > + if (error)
>> > > + goto out_trans_cancel;
>> > > }
>> > > } else {
>> > > /*
>> > > diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
>> > > index 1f43fd7f3209..0b8136a32484 100644
>> > > --- a/fs/xfs/xfs_symlink.c
>> > > +++ b/fs/xfs/xfs_symlink.c
>> > > @@ -220,6 +220,11 @@ xfs_symlink(
>> > > if (error)
>> > > goto out_trans_cancel;
>> > >
>> > > + error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
>> > > + XFS_IEXT_DIR_MANIP_CNT(mp));
>> > > + if (error)
>> > > + goto out_trans_cancel;
>> > > +
>> > > /*
>> > > * Allocate an inode for the symlink.
>> > > */
>> >
>>
--
chandan
next prev parent reply other threads:[~2021-01-09 5:27 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-04 10:31 [PATCH V12 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
2021-01-04 10:31 ` [PATCH V12 01/14] xfs: Add helper for checking per-inode extent count overflow Chandan Babu R
2021-01-04 10:31 ` [PATCH V12 02/14] xfs: Check for extent overflow when trivally adding a new extent Chandan Babu R
2021-01-04 10:31 ` [PATCH V12 03/14] xfs: Check for extent overflow when punching a hole Chandan Babu R
2021-01-04 10:31 ` [PATCH V12 04/14] xfs: Check for extent overflow when adding/removing dir entries Chandan Babu R
2021-01-08 1:17 ` Darrick J. Wong
2021-01-08 4:55 ` Chandan Babu R
2021-01-09 1:33 ` Darrick J. Wong
2021-01-09 5:26 ` Chandan Babu R [this message]
2021-01-04 10:31 ` [PATCH V12 05/14] xfs: Check for extent overflow when adding/removing xattrs Chandan Babu R
2021-01-04 10:31 ` [PATCH V12 06/14] xfs: Check for extent overflow when writing to unwritten extent Chandan Babu R
2021-01-04 10:31 ` [PATCH V12 07/14] xfs: Check for extent overflow when moving extent from cow to data fork Chandan Babu R
2021-01-04 10:31 ` [PATCH V12 08/14] xfs: Check for extent overflow when remapping an extent Chandan Babu R
2021-01-04 10:31 ` [PATCH V12 09/14] xfs: Check for extent overflow when swapping extents Chandan Babu R
2021-01-04 10:31 ` [PATCH V12 10/14] xfs: Introduce error injection to reduce maximum inode fork extent count Chandan Babu R
2021-01-04 10:31 ` [PATCH V12 11/14] xfs: Remove duplicate assert statement in xfs_bmap_btalloc() Chandan Babu R
2021-01-04 10:31 ` [PATCH V12 12/14] xfs: Compute bmap extent alignments in a separate function Chandan Babu R
2021-01-04 10:31 ` [PATCH V12 13/14] xfs: Process allocated extent " Chandan Babu R
2021-01-04 10:31 ` [PATCH V12 14/14] xfs: Introduce error injection to allocate only minlen size extents for files Chandan Babu R
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y2h2r88d.fsf@garuda \
--to=chandanrlinux@gmail.com \
--cc=allison.henderson@oracle.com \
--cc=darrick.wong@oracle.com \
--cc=hch@lst.de \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox