public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Al@disappointment.disaster, Viro@disappointment.disaster,
	viro@ZenIV.linux.org.uk, xfs@oss.sgi.com
Subject: Re: [PATCH 5/6] xfs: splitting delalloc extents can run out of reservation
Date: Fri, 4 Apr 2014 09:37:01 -0400	[thread overview]
Message-ID: <20140404133700.GA12961@laptop.bfoster> (raw)
In-Reply-To: <1395396710-3824-6-git-send-email-david@fromorbit.com>

On Fri, Mar 21, 2014 at 09:11:49PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we punch a hole in a delalloc extent, we split the indirect
> block reservation between the two new extents. If we repeatedly
> punch holes in a large delalloc extent, that reservation will
> eventually run out and we'll assert fail in xfs_bunmapi() because
> the indirect block reservation for the delalloc extent is zero. This
> is caused by doing a large delalloc write, then zeroing multiple
> ranges of that write using fallocate to punch lots of holes in the
> delayed allocation range.
> 
> To avoid this problem, if we split the reservation and require more
> indirect blocks for the two new extents than we had for the old
> reservation, steal the additional blocks from the hole we punched in
> the extent. In most cases we only need a single extra block, so even
> if we punch only single block holes we can still retain sufficient
> indirect block reservations to avoid problems.
> 
> In doing this "stealing", we need to change where we account for the
> delalloc blocks being freed. The block count held on the inode does
> not take into account the indirect block reservation, so we still
> need to do that before we free the extent. However, the accounting
> ofr free space in the superblock need to be done after we've stolen
> the blocks fro the freed extent so that they are accounted for
> correctly.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_bmap.c | 65 ++++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 47 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
> index 5b6092e..4bf6a0e 100644
> --- a/fs/xfs/xfs_bmap.c
> +++ b/fs/xfs/xfs_bmap.c
> @@ -4945,7 +4945,27 @@ xfs_bmap_del_extent(
>  			temp2 = xfs_bmap_worst_indlen(ip, temp2);
>  			new.br_startblock = nullstartblock((int)temp2);
>  			da_new = temp + temp2;
> +
> +			/*
> +			 * Note: if we have an odd number of blocks reserved,
> +			 * then if we keep splitting the delalloc extent like
> +			 * this we end up with a delalloc indlen reservation of
> +			 * zero for one of the two extents. Hence if we end
> +			 * up with the new indlen reservations being larger than
> +			 * the old one, steal blocks from the data reservation
> +			 * we just punched out. Otherwise, just reduce the
> +			 * remaining indlen reservations alternately and hope
> +			 * next time we come here the range getting removed is
> +			 * large enough to fix this all up.
> +			 */
>  			while (da_new > da_old) {
> +				if (del->br_blockcount) {
> +					/* steal a block */
> +					da_new--;
> +					del->br_blockcount--;
> +					continue;
> +				}
> +
>  				if (temp) {
>  					temp--;
>  					da_new--;
> @@ -5255,24 +5275,6 @@ xfs_bunmapi(
>  		}
>  		if (wasdel) {
>  			ASSERT(startblockval(del.br_startblock) > 0);
> -			/* Update realtime/data freespace, unreserve quota */
> -			if (isrt) {
> -				xfs_filblks_t rtexts;
> -
> -				rtexts = XFS_FSB_TO_B(mp, del.br_blockcount);
> -				do_div(rtexts, mp->m_sb.sb_rextsize);
> -				xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS,
> -						(int64_t)rtexts, 0);
> -				(void)xfs_trans_reserve_quota_nblks(NULL,
> -					ip, -((long)del.br_blockcount), 0,
> -					XFS_QMOPT_RES_RTBLKS);
> -			} else {
> -				xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS,
> -						(int64_t)del.br_blockcount, 0);
> -				(void)xfs_trans_reserve_quota_nblks(NULL,
> -					ip, -((long)del.br_blockcount), 0,
> -					XFS_QMOPT_RES_REGBLKS);
> -			}
>  			ip->i_delayed_blks -= del.br_blockcount;
>  			if (cur)
>  				cur->bc_private.b.flags |=
> @@ -5302,6 +5304,33 @@ xfs_bunmapi(
>  		}
>  		error = xfs_bmap_del_extent(ip, tp, &lastx, flist, cur, &del,
>  				&tmp_logflags, whichfork);
> +		/*
> +		 * xfs_bmap_del_extent may hand delayed alloc blocks back to the
> +		 * indirect block reservations to keep extent split reservations
> +		 * sane. Hence we should only decrement the delayed block count
> +		 * on the inode once we know exactly the amount of delalloc
> +		 * space we actually removed from the inode.
> +		 */
> +		if (wasdel && del.br_blockcount) {
> +			/* Update realtime/data freespace, unreserve quota */
> +			if (isrt) {
> +				xfs_filblks_t rtexts;
> +
> +				rtexts = XFS_FSB_TO_B(mp, del.br_blockcount);
> +				do_div(rtexts, mp->m_sb.sb_rextsize);
> +				xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS,
> +						(int64_t)rtexts, 0);
> +				(void)xfs_trans_reserve_quota_nblks(NULL,
> +					ip, -((long)del.br_blockcount), 0,
> +					XFS_QMOPT_RES_RTBLKS);
> +			} else {
> +				xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS,
> +						(int64_t)del.br_blockcount, 0);
> +				(void)xfs_trans_reserve_quota_nblks(NULL,
> +					ip, -((long)del.br_blockcount), 0,
> +					XFS_QMOPT_RES_REGBLKS);
> +			}
> +		}

In xfs_bmapi_reserve_delalloc(), we account alen against the quota,
calculate indlen, account both against the sb counters, add alen to
i_delayed_blks and continue on...

In xfs_bunmap(), we remove br_blockcount from the sb counters and
unreserve from the quota then call into xfs_bmap_del_extent(). The
latter takes care of removing the indlen blocks from the sb counters if
necessary.

With these changes, we move the accounting after the del_extent() call
and allow the latter to steal from br_blockcount for indlen. This seems
like it works for the sb counters. We also adjust i_delayed_blks against
the original extent length before it can be modified. The quota
reservation looks like it could become inconsistent, however. E.g., some
blocks previously accounted against the quota (alen) are now moved over
to indlen. If those indlen blocks happen to be cleaned up through
del_extent(), they'd only be replenished to the sb counters (near the
end of del_extent()). Perhaps we should leave the quota unreserve prior
to the del_extent() call or sample the original br_blockcount and use it
post del_extent()..?

Brian

>  		logflags |= tmp_logflags;
>  		if (error)
>  			goto error0;
> -- 
> 1.9.0
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2014-04-04 13:37 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-21 10:11 [RFC, PATCH 0/6] xfs: delalloc, DIO and corruption Dave Chinner
2014-03-21 10:11 ` [PATCH 1/6] xfs: kill buffers over failed write ranges properly Dave Chinner
2014-03-21 10:11 ` [PATCH 2/6] xfs: write failure beyond EOF truncates too much data Dave Chinner
2014-03-29 15:14   ` Brian Foster
2014-04-04 15:26     ` Brian Foster
2014-04-04 21:26       ` Dave Chinner
2014-03-21 10:11 ` [PATCH 3/6] xfs: xfs_vm_write_end truncates too much on failure Dave Chinner
2014-03-21 10:11 ` [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks Dave Chinner
2014-03-21 10:11 ` [PATCH 5/6] xfs: splitting delalloc extents can run out of reservation Dave Chinner
2014-04-04 13:37   ` Brian Foster [this message]
2014-04-04 21:31     ` Dave Chinner
2014-03-21 10:11 ` [PATCH 6/6] xfs: don't map ranges that span EOF for direct IO Dave Chinner
2014-03-31 17:22 ` [RFC, PATCH 0/6] xfs: delalloc, DIO and corruption Brian Foster
2014-03-31 20:17   ` Dave Chinner
2014-04-01 11:54     ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140404133700.GA12961@laptop.bfoster \
    --to=bfoster@redhat.com \
    --cc=Al@disappointment.disaster \
    --cc=Viro@disappointment.disaster \
    --cc=david@fromorbit.com \
    --cc=viro@ZenIV.linux.org.uk \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox