From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Al@disappointment.disaster, Viro@disappointment.disaster,
viro@ZenIV.linux.org.uk, xfs@oss.sgi.com
Subject: Re: [PATCH 5/6] xfs: splitting delalloc extents can run out of reservation
Date: Sat, 5 Apr 2014 08:31:52 +1100 [thread overview]
Message-ID: <20140404213152.GA17603@dastard> (raw)
In-Reply-To: <20140404133700.GA12961@laptop.bfoster>
On Fri, Apr 04, 2014 at 09:37:01AM -0400, Brian Foster wrote:
> On Fri, Mar 21, 2014 at 09:11:49PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > When we punch a hole in a delalloc extent, we split the indirect
> > block reservation between the two new extents. If we repeatedly
> > punch holes in a large delalloc extent, that reservation will
> > eventually run out and we'll assert fail in xfs_bunmapi() because
> > the indirect block reservation for the delalloc extent is zero. This
> > is caused by doing a large delalloc write, then zeroing multiple
> > ranges of that write using fallocate to punch lots of holes in the
> > delayed allocation range.
> >
> > To avoid this problem, if we split the reservation and require more
> > indirect blocks for the two new extents than we had for the old
> > reservation, steal the additional blocks from the hole we punched in
> > the extent. In most cases we only need a single extra block, so even
> > if we punch only single block holes we can still retain sufficient
> > indirect block reservations to avoid problems.
> >
> > In doing this "stealing", we need to change where we account for the
> > delalloc blocks being freed. The block count held on the inode does
> > not take into account the indirect block reservation, so we still
> > need to do that before we free the extent. However, the accounting
> > ofr free space in the superblock need to be done after we've stolen
> > the blocks fro the freed extent so that they are accounted for
> > correctly.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> > fs/xfs/xfs_bmap.c | 65 ++++++++++++++++++++++++++++++++++++++++---------------
> > 1 file changed, 47 insertions(+), 18 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
> > index 5b6092e..4bf6a0e 100644
> > --- a/fs/xfs/xfs_bmap.c
> > +++ b/fs/xfs/xfs_bmap.c
> > @@ -4945,7 +4945,27 @@ xfs_bmap_del_extent(
> > temp2 = xfs_bmap_worst_indlen(ip, temp2);
> > new.br_startblock = nullstartblock((int)temp2);
> > da_new = temp + temp2;
> > +
> > + /*
> > + * Note: if we have an odd number of blocks reserved,
> > + * then if we keep splitting the delalloc extent like
> > + * this we end up with a delalloc indlen reservation of
> > + * zero for one of the two extents. Hence if we end
> > + * up with the new indlen reservations being larger than
> > + * the old one, steal blocks from the data reservation
> > + * we just punched out. Otherwise, just reduce the
> > + * remaining indlen reservations alternately and hope
> > + * next time we come here the range getting removed is
> > + * large enough to fix this all up.
> > + */
> > while (da_new > da_old) {
> > + if (del->br_blockcount) {
> > + /* steal a block */
> > + da_new--;
> > + del->br_blockcount--;
> > + continue;
> > + }
> > +
> > if (temp) {
> > temp--;
> > da_new--;
> > @@ -5255,24 +5275,6 @@ xfs_bunmapi(
> > }
> > if (wasdel) {
> > ASSERT(startblockval(del.br_startblock) > 0);
> > - /* Update realtime/data freespace, unreserve quota */
> > - if (isrt) {
> > - xfs_filblks_t rtexts;
> > -
> > - rtexts = XFS_FSB_TO_B(mp, del.br_blockcount);
> > - do_div(rtexts, mp->m_sb.sb_rextsize);
> > - xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS,
> > - (int64_t)rtexts, 0);
> > - (void)xfs_trans_reserve_quota_nblks(NULL,
> > - ip, -((long)del.br_blockcount), 0,
> > - XFS_QMOPT_RES_RTBLKS);
> > - } else {
> > - xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS,
> > - (int64_t)del.br_blockcount, 0);
> > - (void)xfs_trans_reserve_quota_nblks(NULL,
> > - ip, -((long)del.br_blockcount), 0,
> > - XFS_QMOPT_RES_REGBLKS);
> > - }
> > ip->i_delayed_blks -= del.br_blockcount;
> > if (cur)
> > cur->bc_private.b.flags |=
> > @@ -5302,6 +5304,33 @@ xfs_bunmapi(
> > }
> > error = xfs_bmap_del_extent(ip, tp, &lastx, flist, cur, &del,
> > &tmp_logflags, whichfork);
> > + /*
> > + * xfs_bmap_del_extent may hand delayed alloc blocks back to the
> > + * indirect block reservations to keep extent split reservations
> > + * sane. Hence we should only decrement the delayed block count
> > + * on the inode once we know exactly the amount of delalloc
> > + * space we actually removed from the inode.
> > + */
> > + if (wasdel && del.br_blockcount) {
> > + /* Update realtime/data freespace, unreserve quota */
> > + if (isrt) {
> > + xfs_filblks_t rtexts;
> > +
> > + rtexts = XFS_FSB_TO_B(mp, del.br_blockcount);
> > + do_div(rtexts, mp->m_sb.sb_rextsize);
> > + xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS,
> > + (int64_t)rtexts, 0);
> > + (void)xfs_trans_reserve_quota_nblks(NULL,
> > + ip, -((long)del.br_blockcount), 0,
> > + XFS_QMOPT_RES_RTBLKS);
> > + } else {
> > + xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS,
> > + (int64_t)del.br_blockcount, 0);
> > + (void)xfs_trans_reserve_quota_nblks(NULL,
> > + ip, -((long)del.br_blockcount), 0,
> > + XFS_QMOPT_RES_REGBLKS);
> > + }
> > + }
>
> In xfs_bmapi_reserve_delalloc(), we account alen against the quota,
> calculate indlen, account both against the sb counters, add alen to
> i_delayed_blks and continue on...
*nod*
> In xfs_bunmap(), we remove br_blockcount from the sb counters and
> unreserve from the quota then call into xfs_bmap_del_extent(). The
> latter takes care of removing the indlen blocks from the sb counters if
> necessary.
*nod*
> With these changes, we move the accounting after the del_extent() call
> and allow the latter to steal from br_blockcount for indlen. This seems
> like it works for the sb counters.
*nod*
> We also adjust i_delayed_blks against
> the original extent length before it can be modified. The quota
> reservation looks like it could become inconsistent, however. E.g., some
> blocks previously accounted against the quota (alen) are now moved over
> to indlen.
Yes, you are right - it will cause inconsistency. That's a lesser
evil than not having a reservation at all, but I can probably fix
that up.
> If those indlen blocks happen to be cleaned up through
> del_extent(), they'd only be replenished to the sb counters (near the
> end of del_extent()). Perhaps we should leave the quota unreserve prior
> to the del_extent() call or sample the original br_blockcount and use it
> post del_extent()..?
That's a possibility. I really only looked at fixing the reservation
issue and keeping the sb counters correct, not the quota aspect of
it. I'll go back and see what I can come up with that solves this
problem, too.
Thanks!
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-04-04 21:32 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-21 10:11 [RFC, PATCH 0/6] xfs: delalloc, DIO and corruption Dave Chinner
2014-03-21 10:11 ` [PATCH 1/6] xfs: kill buffers over failed write ranges properly Dave Chinner
2014-03-21 10:11 ` [PATCH 2/6] xfs: write failure beyond EOF truncates too much data Dave Chinner
2014-03-29 15:14 ` Brian Foster
2014-04-04 15:26 ` Brian Foster
2014-04-04 21:26 ` Dave Chinner
2014-03-21 10:11 ` [PATCH 3/6] xfs: xfs_vm_write_end truncates too much on failure Dave Chinner
2014-03-21 10:11 ` [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks Dave Chinner
2014-03-21 10:11 ` [PATCH 5/6] xfs: splitting delalloc extents can run out of reservation Dave Chinner
2014-04-04 13:37 ` Brian Foster
2014-04-04 21:31 ` Dave Chinner [this message]
2014-03-21 10:11 ` [PATCH 6/6] xfs: don't map ranges that span EOF for direct IO Dave Chinner
2014-03-31 17:22 ` [RFC, PATCH 0/6] xfs: delalloc, DIO and corruption Brian Foster
2014-03-31 20:17 ` Dave Chinner
2014-04-01 11:54 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140404213152.GA17603@dastard \
--to=david@fromorbit.com \
--cc=Al@disappointment.disaster \
--cc=Viro@disappointment.disaster \
--cc=bfoster@redhat.com \
--cc=viro@ZenIV.linux.org.uk \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox