public inbox for linux-xfs@vger.kernel.org
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH v3 04/11] xfs: update inode allocation/free transaction reservations for finobt
Date: Tue, 18 Feb 2014 15:34:10 -0500
Message-ID: <5303C3C2.5070501@redhat.com>
In-Reply-To: <530393F8.4070106@redhat.com>

On 02/18/2014 12:10 PM, Brian Foster wrote:
> On 02/11/2014 01:46 AM, Dave Chinner wrote:
>> On Tue, Feb 04, 2014 at 12:49:35PM -0500, Brian Foster wrote:
>>> Create the xfs_calc_finobt_res() helper to calculate the finobt log
>>> reservation for inode allocation and free. Update
>>> XFS_IALLOC_SPACE_RES() to reserve blocks for the additional finobt
>>> insertion on inode allocation. Create XFS_IFREE_SPACE_RES() to
>>> reserve blocks for the potential finobt record insertion on inode
>>> free (i.e., if an inode chunk was previously fully allocated).
>>>
>>> Signed-off-by: Brian Foster <bfoster@redhat.com>
>>> ---
>>>  fs/xfs/xfs_inode.c       |  4 +++-
>>>  fs/xfs/xfs_trans_resv.c  | 47 +++++++++++++++++++++++++++++++++++++++++++----
>>>  fs/xfs/xfs_trans_space.h |  7 ++++++-
>>>  3 files changed, 52 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>>> index 001aa89..57c77ed 100644
>>> --- a/fs/xfs/xfs_inode.c
>>> +++ b/fs/xfs/xfs_inode.c
>>> @@ -1730,7 +1730,9 @@ xfs_inactive_ifree(
>>>  	int			error;
>>>  
>>>  	tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
>>> -	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ifree, 0, 0);
>>> +	tp->t_flags |= XFS_TRANS_RESERVE;
>>> +	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ifree,
>>> +				  XFS_IFREE_SPACE_RES(mp), 0);
>>
>> Can you add a comment explaining why the XFS_TRANS_RESERVE flag is
>> used here, and why its use won't lead to accelerated reserve pool
>> depletion?
>>
> 
> So this aspect of things appears to be a bit more interesting than I
> originally anticipated. I "reserve enabled" this transaction to
> facilitate the ability to free up inodes under ENOSPC conditions.
> Without this, the problem of failing out of xfs_inactive_ifree() (and
> leaving an inode chained on the unlinked list) is easily reproducible
> with generic/083.
> 
> The basic argument for why this is reasonable is that releasing an inode
> releases used space (i.e., file blocks and potentially directory blocks
> and inode chunks over time). That said, I can manufacture situations
> where this is not the case. E.g., allocate a bunch of 0-sized files,
> consume remaining free space in some separate file, start removing
> inodes in a manner that removes a single inode per chunk or so. This
> creates a scenario where the inobt can be very large and the finobt very
> small (likely a single record). Removing the inodes in this manner
> reduces the likelihood of freeing up any space and thus rapidly grows
> the finobt towards the size of the inobt without any free space
> available. This might or might not qualify as sane use of the fs, but I
> don't think the failure scenario is acceptable as things currently stand.
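
To make the scenario above concrete, here is a minimal userspace sketch of
it. It is a toy model, not kernel code: struct toy_chunk and the toy_*()
helpers are hypothetical names, and the model tracks nothing but one
free-inode bitmask per 64-inode chunk. It assumes a chunk can only be
released once every inode in it is free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INODES_PER_CHUNK	64	/* inodes tracked per chunk record */

/* Toy model (hypothetical): one free-inode bitmask per inode chunk. */
struct toy_chunk {
	uint64_t	free_mask;	/* bit set == inode is free */
};

/* A chunk needs a finobt record once it has at least one free inode. */
static size_t
toy_finobt_records(const struct toy_chunk *chunks, size_t nr)
{
	size_t	recs = 0;

	for (size_t i = 0; i < nr; i++)
		if (chunks[i].free_mask != 0)
			recs++;
	return recs;
}

/* Assumption: a chunk only gives space back once all 64 inodes are free. */
static size_t
toy_freeable_chunks(const struct toy_chunk *chunks, size_t nr)
{
	size_t	cnt = 0;

	for (size_t i = 0; i < nr; i++)
		if (chunks[i].free_mask == UINT64_MAX)
			cnt++;
	return cnt;
}

/* Free the same inode slot in every chunk: one inode per chunk. */
static void
toy_free_one_per_chunk(struct toy_chunk *chunks, size_t nr, unsigned int slot)
{
	assert(slot < TOY_INODES_PER_CHUNK);
	for (size_t i = 0; i < nr; i++)
		chunks[i].free_mask |= UINT64_C(1) << slot;
}
```

Starting from fully allocated chunks, a single toy_free_one_per_chunk()
pass leaves every chunk needing a finobt record while
toy_freeable_chunks() still reports zero. That is the pattern where the
finobt grows and no space comes back.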
> 
> I think there are several ways this can go from here. A couple ideas
> that have crossed my mind:
> 
> - Find a way to variably reserve the number of blocks that would be
> required to grow the finobt to the size of the inobt, based on current
> state. This would require the total number of blocks (not just enough
> for a split), so this could get complex and somewhat overbearing (i.e.,
> a lot of space could be quietly reserved, current tracking might not be
> sufficient and the allocation paths could get hairy).
> 
> - Work to push the ifree transaction allocation and reservation to the
> unlink codepath rather than the eviction codepath. Under normal
> circumstances, chain the tp to the xfs_inode such that the eviction
> codepath can grab it and run. This prevents us from getting into a
> state where an inode is unlinked without enough space to free it. On
> the flip side, ENOSPC on unlink isn't very forgiving behavior for the
> user.
> 

- Add some state or flag bits to the finobt and the associated ability
to kill/invalidate it at runtime. Print a warning that indicates
performance might be affected and that a repair is required to re-enable
it.

Brian

> I think the former approach is probably overkill for something that
> might be a pathological situation. The latter approach is simpler,
> but it feels like a bit of a hack. I've experimented with it a bit, but
> I'm not quite sure yet if it introduces any transaction issues by
> allocating the unlink and ifree transactions at the same time.
> 
> Perhaps another argument could be made that it's rather unlikely we run
> into an fs with as many 0-sized (or sub-inode chunk sized) files as
> required to deplete the reserve pool without freeing any space, and we
> should just touch up the failure handling. E.g.,
> 
> 1.) Continue to reserve-enable the ifree transaction. Consider expanding
> the reserve pool on finobt-enabled filesystems if appropriate. Note that
> this is not guaranteed to provide enough resources to populate the
> finobt to the level of the inobt without freeing up more space.
> 2.) Attempt a !XFS_TRANS_RESERVE tp reservation in xfs_inactive_ifree().
> If that fails, xfs_warn()/xfs_notice() and enable XFS_TRANS_RESERVE.
> 3.) Attempt the XFS_TRANS_RESERVE reservation. If that fails,
> xfs_notice() and shut down.
> 
> And this could probably be made more intelligent to bail out sooner if
> we repeat XFS_TRANS_RESERVE reservations without freeing up any space,
> etc. Before going too far in one direction... thoughts?
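
The staged fallback in steps 2 and 3 could look roughly like the sketch
below. This is a userspace toy, not a proposed patch: struct toy_mount,
toy_reserve() and toy_ifree_reserve() are hypothetical stand-ins, the
warning counter takes the place of xfs_warn()/xfs_notice(), and the
reserve pool is modelled as a plain second counter, which is much
simpler than the real accounting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the mount's free space accounting. */
struct toy_mount {
	uint64_t	free_blocks;	/* blocks any transaction may use */
	uint64_t	reserve_blocks;	/* reserve pool */
	bool		shutdown;	/* set when the fs gives up */
	unsigned int	warnings;	/* stands in for xfs_warn()/xfs_notice() */
};

enum toy_resv_result {
	TOY_RESV_NORMAL,	/* satisfied from normal free space */
	TOY_RESV_FROM_POOL,	/* had to dip into the reserve pool */
	TOY_RESV_SHUTDOWN,	/* both attempts failed */
};

/* Toy block reservation; the pool is only touched if use_reserve is set. */
static int
toy_reserve(struct toy_mount *mp, uint64_t blocks, bool use_reserve)
{
	assert(blocks > 0);
	if (mp->free_blocks >= blocks) {
		mp->free_blocks -= blocks;
		return 0;
	}
	if (use_reserve && mp->reserve_blocks >= blocks) {
		mp->reserve_blocks -= blocks;
		return 0;
	}
	return -ENOSPC;
}

static enum toy_resv_result
toy_ifree_reserve(struct toy_mount *mp, uint64_t blocks)
{
	/* Step 2: try without the reserve pool first. */
	if (toy_reserve(mp, blocks, false) == 0)
		return TOY_RESV_NORMAL;
	mp->warnings++;

	/* Step 3: retry with the reserve pool enabled. */
	if (toy_reserve(mp, blocks, true) == 0)
		return TOY_RESV_FROM_POOL;
	mp->warnings++;
	mp->shutdown = true;
	return TOY_RESV_SHUTDOWN;
}
```

A caller would treat TOY_RESV_FROM_POOL as the signal to track repeated
pool use, which is where the "bail out sooner" heuristic mentioned above
would hook in.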
> 
> Brian
> 
>>>  	if (error) {
>>>  		ASSERT(XFS_FORCED_SHUTDOWN(mp));
>>>  		xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES);
>>> diff --git a/fs/xfs/xfs_trans_resv.c b/fs/xfs/xfs_trans_resv.c
>>> index 2fd59c0..32f35c1 100644
>>> --- a/fs/xfs/xfs_trans_resv.c
>>> +++ b/fs/xfs/xfs_trans_resv.c
>>> @@ -98,6 +98,37 @@ xfs_calc_inode_res(
>>>  }
>>>  
>>>  /*
>>> + * The free inode btree is a conditional feature and the log reservation
>>> + * requirements differ slightly from that of the traditional inode allocation
>>> + * btree. The finobt tracks records for inode chunks with at least one free inode.
>>> + * Therefore, a record can be removed from the tree for an inode allocation or
>>> + * free and the associated merge reservation is unconditional. This also covers
>>> + * the possibility of a split on record insertion.
>>
>> Slightly wider than 80 columns here. FWIW, if you use vim, add this
>> rule to have it add a red line at the textwidth you have set:
>>
>> " highlight textwidth
>> set cc=+1
>>
>> And that will point out lines that are too long quite obviously ;)
>>
>>> + *
>>> + * the free inode btree: max depth * block size
>>> + * the free inode btree entry: block size
>>> + *
>>> + * TODO: is the modify res really necessary? covered by the merge/split res?
>>> + * This seems to be the pattern of ifree, but not create_resv_alloc. Why?
>>
>> The modify case is for an allocation that only updates an inobt
>> record (i.e. chunk already allocated, free inodes in it). Because
>> we can remove a finobt record when "modifying" the last free inode
>> record in a chunk, "modify" can cause a record removal and hence a
>> tree merge. In which case it's no different from any of the other
>> finobt reservations....
>>
>>> @@ -267,6 +298,7 @@ xfs_calc_remove_reservation(
>>>   *    the superblock for the nlink flag: sector size
>>>   *    the directory btree: (max depth + v2) * dir block size
>>>   *    the directory inode's bmap btree: (max depth + v2) * block size
>>> + *    the finobt
>>>   */
>>>  STATIC uint
>>>  xfs_calc_create_resv_modify(
>>> @@ -275,7 +307,8 @@ xfs_calc_create_resv_modify(
>>>  	return xfs_calc_inode_res(mp, 2) +
>>>  		xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
>>>  		(uint)XFS_FSB_TO_B(mp, 1) +
>>> -		xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1));
>>> +		xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1)) +
>>> +		xfs_calc_finobt_res(mp, 1);
>>>  }
>>
>> And this is where it starts to get complex. The modify operation can
>> now cause a finobt merge, which means blocks will be allocated/freed.
>> That means we now need to take into account:
>>
>>  *    the allocation btrees: 2 trees * (max depth - 1) * block size
>>
>> and anything else freeing an extent requires.
>>
>>>  /*
>>> @@ -285,6 +318,7 @@ xfs_calc_create_resv_modify(
>>>   *    the inode blocks allocated: XFS_IALLOC_BLOCKS * blocksize
>>>   *    the inode btree: max depth * blocksize
>>>   *    the allocation btrees: 2 trees * (max depth - 1) * block size
>>> + *    the finobt
>>>   */
>>>  STATIC uint
>>>  xfs_calc_create_resv_alloc(
>>> @@ -295,7 +329,8 @@ xfs_calc_create_resv_alloc(
>>>  		xfs_calc_buf_res(XFS_IALLOC_BLOCKS(mp), XFS_FSB_TO_B(mp, 1)) +
>>>  		xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
>>>  		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
>>> -				 XFS_FSB_TO_B(mp, 1));
>>> +				 XFS_FSB_TO_B(mp, 1)) +
>>> +		xfs_calc_finobt_res(mp, 0);
>>>  }
>>
>> This reservation is only for v4 superblocks - the icreate
>> transaction reservation is used for v5 superblocks, so that's the
>> only one you need to modify.
>>
>> Cheers,
>>
>> Dave.
>>
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 