From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH v3 04/11] xfs: update inode allocation/free transaction reservations for finobt
Date: Tue, 18 Feb 2014 12:10:16 -0500 [thread overview]
Message-ID: <530393F8.4070106@redhat.com> (raw)
In-Reply-To: <20140211064609.GE13647@dastard>
On 02/11/2014 01:46 AM, Dave Chinner wrote:
> On Tue, Feb 04, 2014 at 12:49:35PM -0500, Brian Foster wrote:
>> Create the xfs_calc_finobt_res() helper to calculate the finobt log
>> reservation for inode allocation and free. Update
>> XFS_IALLOC_SPACE_RES() to reserve blocks for the additional finobt
>> insertion on inode allocation. Create XFS_IFREE_SPACE_RES() to
>> reserve blocks for the potential finobt record insertion on inode
>> free (i.e., if an inode chunk was previously fully allocated).
>>
>> Signed-off-by: Brian Foster <bfoster@redhat.com>
>> ---
>> fs/xfs/xfs_inode.c | 4 +++-
>> fs/xfs/xfs_trans_resv.c | 47 +++++++++++++++++++++++++++++++++++++++++++----
>> fs/xfs/xfs_trans_space.h | 7 ++++++-
>> 3 files changed, 52 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index 001aa89..57c77ed 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -1730,7 +1730,9 @@ xfs_inactive_ifree(
>> int error;
>>
>> tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
>> - error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ifree, 0, 0);
>> + tp->t_flags |= XFS_TRANS_RESERVE;
>> + error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ifree,
>> + XFS_IFREE_SPACE_RES(mp), 0);
>
> Can you add a comment explaining why the XFS_TRANS_RESERVE flag is
> used here, and why it's use won't lead to accelerated reserve pool
> depletion?
>
So this aspect of things appears to be a bit more interesting than I
originally anticipated. I "reserve enabled" this transaction to
facilitate the ability to free up inodes under ENOSPC conditions.
Without this, the problem of failing out of xfs_inactive_ifree() (and
leaving an inode chained on the unlinked list) is easily reproducible
with generic/083.
The basic argument for why this is reasonable is that releasing an inode
releases used space (i.e., file blocks and potentially directory blocks
and inode chunks over time). That said, I can manufacture situations
where this is not the case. E.g., allocate a bunch of 0-sized files,
consume remaining free space in some separate file, start removing
inodes in a manner that removes a single inode per chunk or so. This
creates a scenario where the inobt can be very large and the finobt very
small (likely a single record). Removing the inodes in this manner
reduces the likelihood of freeing up any space and thus rapidly grows
the finobt towards the size of the inobt without any free space
available. This might or might not qualify as sane use of the fs, but I
don't think the failure scenario is acceptable as things currently stand.
I think there are several ways this can go from here. A couple ideas
that have crossed my mind:
- Find a way to variably reserve the number of blocks that would be
required to grow the finobt to the finobt, based on current state. This
would require the total number of blocks (not just enough for a split),
so this could get complex and somewhat overbearing (i.e., a lot of space
could be quietly reserved, current tracking might not be sufficient and
the allocation paths could get hairy).
- Work to push the ifree transaction allocation and reservation to the
unlink codepath rather than the eviction codepath. Under normal
circumstances, chain the tp to the xfs_inode such that the eviction code
path can grab it and run. This prevents us going into the state where an
inode is unlinked without having enough space to free up. On the flip
side, ENOSPC on unlink isn't very forgiving behavior to the user.
I think the former approach is probably overkill for something that
might be a pathological situation. The latter approach is more simple,
but it feels like a bit of a hack. I've experimented with it a bit, but
I'm not quite sure yet if it introduces any transaction issues by
allocating the unlink and ifree transactions at the same time.
Perhaps another argument could be made that it's rather unlikely we run
into an fs with as many 0-sized (or sub-inode chunk sized) files as
required to deplete the reserve pool without freeing any space, and we
should just touch up the failure handling. E.g.,
1.) Continue to reserve enable the ifree transaction. Consider expanding
the reserve pool on finobt-enabled fs' if appropriate. Note that this is
not guaranteed to provide enough resources to populate the finobt to the
level of the inobt without freeing up more space.
2.) Attempt a !XFS_TRANS_RESERVE tp reservation in xfs_inactive_ifree().
If fails, xfs_warn()/notice() and enable XFS_TRANS_RESERVE.
3.) Attempt XFS_TRANS_RESERVE reservation. If fails, xfs_notice() and
shutdown.
And this could probably be made more intelligent to bail out sooner if
we repeat XFS_TRANS_RESERVE reservations without freeing up any space,
etc. Before going too far in one direction... thoughts?
Brian
>> if (error) {
>> ASSERT(XFS_FORCED_SHUTDOWN(mp));
>> xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES);
>> diff --git a/fs/xfs/xfs_trans_resv.c b/fs/xfs/xfs_trans_resv.c
>> index 2fd59c0..32f35c1 100644
>> --- a/fs/xfs/xfs_trans_resv.c
>> +++ b/fs/xfs/xfs_trans_resv.c
>> @@ -98,6 +98,37 @@ xfs_calc_inode_res(
>> }
>>
>> /*
>> + * The free inode btree is a conditional feature and the log reservation
>> + * requirements differ slightly from that of the traditional inode allocation
>> + * btree. The finobt tracks records for inode chunks with at least one free inode.
>> + * Therefore, a record can be removed from the tree for an inode allocation or
>> + * free and the associated merge reservation is unconditional. This also covers
>> + * the possibility of a split on record insertion.
>
> Slightly wider than 80 columns here. FWIW, if you use vim, add this
> rule to have it add a red line at the textwidth you have set:
>
> " highlight textwidth
> set cc=+1
>
> And that will point out lines that are too long quite obviously ;)
>
>> + *
>> + * the free inode btree: max depth * block size
>> + * the free inode btree entry: block size
>> + *
>> + * TODO: is the modify res really necessary? covered by the merge/split res?
>> + * This seems to be the pattern of ifree, but not create_resv_alloc. Why?
>
> The modify case is for an allocation that only updates an inobt
> record (i.e. chunk already allocated, free inodes in it). Because
> we can remove a finobt record when "modifying" the last free inode
> record in a chunk, "modify" can cause a redcord removal and hence a
> tree merge. In which case it's no different of any of the other
> finobt reservations....
>
>> @@ -267,6 +298,7 @@ xfs_calc_remove_reservation(
>> * the superblock for the nlink flag: sector size
>> * the directory btree: (max depth + v2) * dir block size
>> * the directory inode's bmap btree: (max depth + v2) * block size
>> + * the finobt
>> */
>> STATIC uint
>> xfs_calc_create_resv_modify(
>> @@ -275,7 +307,8 @@ xfs_calc_create_resv_modify(
>> return xfs_calc_inode_res(mp, 2) +
>> xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
>> (uint)XFS_FSB_TO_B(mp, 1) +
>> - xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1));
>> + xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1)) +
>> + xfs_calc_finobt_res(mp, 1);
>> }
>
> And this is where is starts to get complex. The modify operation can
> now cause a finobt merge, when means blocks will be allocated/freed.
> That means we now need to take into account:
>
> * the allocation btrees: 2 trees * (max depth - 1) * block size
>
> and anything else freeing an extent requires.
>
>> /*
>> @@ -285,6 +318,7 @@ xfs_calc_create_resv_modify(
>> * the inode blocks allocated: XFS_IALLOC_BLOCKS * blocksize
>> * the inode btree: max depth * blocksize
>> * the allocation btrees: 2 trees * (max depth - 1) * block size
>> + * the finobt
>> */
>> STATIC uint
>> xfs_calc_create_resv_alloc(
>> @@ -295,7 +329,8 @@ xfs_calc_create_resv_alloc(
>> xfs_calc_buf_res(XFS_IALLOC_BLOCKS(mp), XFS_FSB_TO_B(mp, 1)) +
>> xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
>> xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
>> - XFS_FSB_TO_B(mp, 1));
>> + XFS_FSB_TO_B(mp, 1)) +
>> + xfs_calc_finobt_res(mp, 0);
>> }
>
> This reservation is only for v4 superblocks - the icreate
> transaction reservation is used for v5 superblocks, so that's the
> only one you need to modify.
>
> Cheers,
>
> Dave.
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-02-18 17:10 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-04 17:49 [PATCH v3 00/11] xfs: introduce the free inode btree Brian Foster
2014-02-04 17:49 ` [PATCH v3 01/11] xfs: refactor xfs_ialloc_btree.c to support multiple inobt numbers Brian Foster
2014-02-04 17:49 ` [PATCH v3 02/11] xfs: reserve v5 superblock read-only compat. feature bit for finobt Brian Foster
2014-02-11 6:07 ` Dave Chinner
2014-02-04 17:49 ` [PATCH v3 03/11] xfs: support the XFS_BTNUM_FINOBT free inode btree type Brian Foster
2014-02-11 6:22 ` Dave Chinner
2014-02-04 17:49 ` [PATCH v3 04/11] xfs: update inode allocation/free transaction reservations for finobt Brian Foster
2014-02-11 6:46 ` Dave Chinner
2014-02-11 16:22 ` Brian Foster
2014-02-20 1:00 ` Dave Chinner
2014-02-20 16:04 ` Brian Foster
2014-02-18 17:10 ` Brian Foster [this message]
2014-02-18 20:34 ` Brian Foster
2014-02-20 2:01 ` Dave Chinner
2014-02-20 18:49 ` Brian Foster
2014-02-20 20:50 ` Dave Chinner
2014-02-20 21:14 ` Christoph Hellwig
2014-02-20 23:13 ` Dave Chinner
2014-02-04 17:49 ` [PATCH v3 05/11] xfs: insert newly allocated inode chunks into the finobt Brian Foster
2014-02-11 6:48 ` Dave Chinner
2014-02-04 17:49 ` [PATCH v3 06/11] xfs: use and update the finobt on inode allocation Brian Foster
2014-02-11 7:17 ` Dave Chinner
2014-02-11 16:32 ` Brian Foster
2014-02-14 20:01 ` Brian Foster
2014-02-20 0:38 ` Dave Chinner
2014-02-04 17:49 ` [PATCH v3 07/11] xfs: refactor xfs_difree() inobt bits into xfs_difree_inobt() helper Brian Foster
2014-02-11 7:19 ` Dave Chinner
2014-02-04 17:49 ` [PATCH v3 08/11] xfs: update the finobt on inode free Brian Foster
2014-02-11 7:31 ` Dave Chinner
2014-02-04 17:49 ` [PATCH v3 09/11] xfs: add finobt support to growfs Brian Foster
2014-02-04 17:49 ` [PATCH v3 10/11] xfs: report finobt status in fs geometry Brian Foster
2014-02-11 7:34 ` Dave Chinner
2014-02-04 17:49 ` [PATCH v3 11/11] xfs: enable the finobt feature on v5 superblocks Brian Foster
2014-02-11 7:34 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=530393F8.4070106@redhat.com \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox