From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: xfs@oss.sgi.com
Subject: Re: block allocations for the refcount btree
Date: Tue, 1 Mar 2016 10:18:09 -0800
Message-ID: <20160301181809.GC27973@birch.djwong.org>
In-Reply-To: <20160212191046.GA28421@infradead.org>

On Fri, Feb 12, 2016 at 11:10:46AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 11, 2016 at 08:40:58AM +1100, Dave Chinner wrote:
> > I run into that from time to time (maybe once a month) on a vanilla
> > kernel.
> > 
> > IIRC, the problem is the delayed allocation extent split runs out of
> > its reserved block count if you split it enough times. The case
> > I've seen is that the indlen calculated in xfs_bmap_worst_indlen()
> > ends up too small for a subsequent allocation after we've called
> > xfs_bmap_del_extent() to delete the middle of a delalloc extent too
> > many times.
> > 
> > Brian had some patches that attempted to solve it - we may have
> > simply dropped the ball on this (again).
> > 
> > http://oss.sgi.com/archives/xfs/2014-09/msg00337.html
> 
> I'm pretty sure that is a separate issue.  With the refcount btree we may
> allocate an extent (or rather just a single block) in xfs_alloc_ag_vextent
> as called from xfs_refcountbt_alloc_block.  The reservation helps us to
> ensure this block is always available, but we still need to account for
> that in xfs_trans_reserve(), which we currently don't do for itruncate
> transactions.  

One side effect of the per-ag block reservation code is that it reserves all
the blocks that the refcountbt will ever need at mount time, which includes
decreasing the incore fdblocks counter at mount and putting it back at unmount
time.  This /should/ eliminate the need for reserving blocks in truncate
transactions, though clearly this isn't being done correctly.  The AGresv code
as of a couple weeks ago tried to monkey with the transaction block reservation
counts after the allocator does its usual accounting, which, as you observe,
doesn't work.

Dave suggested that I embed the AGresv structures directly into xfs_perag, and
I realized that we'll only ever need two of these things -- one to feed the
AGFL (rmapbt) and another to feed the higher level btrees (refcountbt).  At the
same time, I decided that because the AGresv code ultimately knows whether an
allocation request was satisfied from a reservation or from the free space
btree, it should also have a hand in deciding whether or not to update the
transaction's block reservation.
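
To make the accounting decision above concrete, here is a minimal sketch in
plain C. It is not the actual AGresv code: struct ag_resv, struct trans, and
resv_alloc_block() are hypothetical names invented for illustration. The only
point is that a block taken from the mount-time pool leaves the transaction's
block reservation alone, while a block taken from ordinary free space is
charged to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-AG reservation pool; not the real xfs_perag layout. */
struct ag_resv {
	uint64_t reserved;	/* blocks still held back for this pool */
};

/* Hypothetical stand-in for a transaction's block reservation. */
struct trans {
	uint64_t blk_res;	/* blocks reserved when the transaction started */
	uint64_t blk_res_used;	/* blocks charged against that reservation */
};

/*
 * Account for one btree block allocation.
 *
 * Returns true if the block came out of the reservation pool (already
 * subtracted from the free block counter at mount, so the transaction is
 * not charged), or false if the pool was empty and the block was charged
 * to the transaction like any other allocation.
 */
bool resv_alloc_block(struct ag_resv *resv, struct trans *tp)
{
	if (resv->reserved > 0) {
		resv->reserved--;
		return true;
	}

	/*
	 * Pool exhausted: the transaction must have reserved a block up
	 * front.  A transaction with no block reservation would trip
	 * this assertion.
	 */
	assert(tp->blk_res_used < tp->blk_res);
	tp->blk_res_used++;
	return false;
}
```

In this model, a transaction with blk_res == 0 is safe only as long as the
pool never runs dry.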

So what I'm saying is that I think this problem was with the AGresv code not
doing accounting correctly, and that I've fixed it in a subsequent rewrite of
the AGresv code.  I'll post it later, after I figure out why generic/333
regresses with the new code.

However, there's one thing to be aware of -- if the AGresv uses up all the
blocks that were preallocated at mount time, the allocator will grab any free
blocks available and charge the blocks to the transaction, just like before.
If this ever happens (in theory it shouldn't, because we reserve enough blocks
to hold a refcount record for every block in the AG), then we'll still have
this problem.
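
For a rough sense of how large "a refcount record for every block in the AG"
makes the reservation, the sketch below counts the blocks in a btree that
indexes a given number of records. The function name and both fan-out
parameters are hypothetical. The real values depend on the block size and the
on-disk record format, and a worst-case estimate would pass the minimum
per-block counts rather than the maximum.

```c
#include <stdint.h>

/*
 * Number of btree blocks needed to index nrecs records, assuming every
 * leaf holds recs_per_leaf records and every interior block holds
 * ptrs_per_node child pointers.  Both fan-outs are caller-supplied
 * assumptions, not values taken from the XFS on-disk format.
 */
uint64_t btree_max_blocks(uint64_t nrecs, uint64_t recs_per_leaf,
			  uint64_t ptrs_per_node)
{
	uint64_t level_blocks;
	uint64_t total;

	/* A fan-out below 2 would never converge to a single root. */
	if (nrecs == 0 || recs_per_leaf == 0 || ptrs_per_node < 2)
		return 0;

	/* Leaf level: one block per recs_per_leaf records, rounded up. */
	level_blocks = (nrecs + recs_per_leaf - 1) / recs_per_leaf;
	total = level_blocks;

	/* Add interior levels until a single root block remains. */
	while (level_blocks > 1) {
		level_blocks = (level_blocks + ptrs_per_node - 1) /
			       ptrs_per_node;
		total += level_blocks;
	}
	return total;
}
```

With nrecs set to the number of blocks in the AG, the result approximates the
pool that has to be set aside at mount time.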

The most cautious thing to do, I think, is to combine the AGresv fixes with
this patch that adds a block reservation to truncate transactions in case the
AGresv can't supply a block to the refcount btree.  The problem here is that
for most cases we'll have both the AGresv and the transaction reserving blocks
for the same purpose, which seems excessive.  Moreover, it introduces the
possibility of userspace seeing ENOSPC while truncating files even if there's
actually sufficient space to handle a refcountbt split.

<shrug> What does everyone else think?

--D
