From: Dave Chinner <david@fromorbit.com>
To: Lachlan McIlroy <lachlan@sgi.com>
Cc: xfs-dev <xfs-dev@sgi.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH] Prevent extent btree block allocation failures
Date: Mon, 23 Jun 2008 16:21:58 +1000 [thread overview]
Message-ID: <20080623062157.GH29319@disturbed> (raw)
In-Reply-To: <485F339D.7090802@sgi.com>
On Mon, Jun 23, 2008 at 03:24:45PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Thu, Jun 19, 2008 at 05:28:50PM +1000, Lachlan McIlroy wrote:
>>> I see it sets a lowspace indicator which filters back up and into
>>> some btree operations. It appears the purpose of this feature is to
>>> allow allocations to search for space in other AGs as in this example
>>> from xfs_bmap_extents_to_btree():
>>>
>>> if (*firstblock == NULLFSBLOCK) {
>>> args.type = XFS_ALLOCTYPE_START_BNO;
>>> args.fsbno = XFS_INO_TO_FSB(mp, ip->i_ino);
>>> } else if (flist->xbf_low) {
>>> args.type = XFS_ALLOCTYPE_START_BNO;
>>> args.fsbno = *firstblock;
>>> } else {
>>> args.type = XFS_ALLOCTYPE_NEAR_BNO;
>>> args.fsbno = *firstblock;
>>> }
>>
>> Hmmm - the only place xbf_low is used in the extent-to-btree conversion. I
>> don't have access to the revision history anymore, so i can't find out what
>> bug the xbf_low condition was added for. It certainly looks like it is
>> allowing the btree block to be allocated in a different AG to data block.
>
> The lowspace algorithm was added way back in the early '90s and has been
> 'tweaked' many times since.
It's good to have a complete revision history around somewhere. ;)
>>> This is sort of what I was trying to do with my patch but without the
>>> special lowspace condition. This lowspace feature is probably broken
>>> because there was a similar special case in xfs_bmbt_split() that got
>>> removed with the changes that fixed the AG out-of-order locking problem.
>>>
>>> @@ -1569,12 +1569,11 @@
>>> lbno = XFS_DADDR_TO_FSB(args.mp, XFS_BUF_ADDR(lbp));
>>> left = XFS_BUF_TO_BMBT_BLOCK(lbp);
>>> args.fsbno = cur->bc_private.b.firstblock;
>>> + args.firstblock = args.fsbno;
>>> if (args.fsbno == NULLFSBLOCK) {
>>> args.fsbno = lbno;
>>> args.type = XFS_ALLOCTYPE_START_BNO;
>>> - } else if (cur->bc_private.b.flist->xbf_low)
>>> - args.type = XFS_ALLOCTYPE_FIRST_AG;
>>> - else
>>> + } else
>>> args.type = XFS_ALLOCTYPE_NEAR_BNO;
>>> args.mod = args.minleft = args.alignment = args.total = args.isfl =
>>> args.userdata = args.minalignslop = 0;
>>>
>>> This could be why we have allocations failing now.
>>
>> Hmmm - yes, could be. Well found, Lachlan. Was there an equivalent change
>> to the allocation of a new root block?
>
> Yeah but it got dropped 14 years ago in a code cleanup change! Looks like
> it was by mistake too.
Ouch.
> There used to be another special case for converting
> delayed allocations that had the same semantics as this low space trick - it
> used XFS_ALLOCTYPE_FIRST_AG instead - maybe to try a little harder to find
> space for cases where it is too late to return an error to the user.
Interesting. Any idea why that was removed? Or another accident?
[....]
>> Hmmmm - perhaps before allocating with minleft = 0 we need to
>> check if we can allocate the rest of the blocks from another AG and
>> lock both AGs in the correct order first, recheck we can allocate
>> from both of them after they are locked but before modifying anything
>> and only then proceed. If we can't find two AGs to allocate from then
>> we can safely ENOSPC without any problems. In this special case we'd then
>> be able to search the entire FS for space and hence only get an ENOSPC
>> if we are really at ENOSPC. Can you pick holes in this?
>
> Sounds like it should work. We may need to lock more than two AGs at once to
> find all the space we need. Since we can't lock AGs out of order then if we get
> to the last AG and we still don't have enough space then we will need to try
> again but start at an earlier AG (say AG 0 which should work).
In this case, I'd just start from XFS_ALLOCTYPE_FIRST_AG rather than
doing multiple passes and having to undo between them....
> So the code in xfs_alloc_vextent() that uses XFS_ALLOCTYPE_FIRST_AG and sets
> minleft to 0 should work.
Yes, as long as the next selected AG has the original minleft blocks
available in it.
> If we start at AG 0 and we've done the proper reservations
> then we should eventually find all the space we need - as long as everything that
> needs to allocate space obeys the lowspace algorithm and we always kick off each
> search for space from the AG we last allocated from.
Yup. Seems sane to me.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
prev parent reply other threads:[~2008-06-23 6:21 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-13 7:38 [PATCH] Prevent extent btree block allocation failures Lachlan McIlroy
2008-06-13 13:44 ` Christoph Hellwig
2008-06-16 3:57 ` Lachlan McIlroy
2008-06-13 15:57 ` Dave Chinner
2008-06-16 6:11 ` Lachlan McIlroy
2008-06-16 17:10 ` Dave Chinner
2008-06-17 1:58 ` Lachlan McIlroy
2008-06-17 7:39 ` Dave Chinner
2008-06-19 7:28 ` Lachlan McIlroy
2008-06-20 5:21 ` Dave Chinner
2008-06-23 5:20 ` Dave Chinner
2008-06-23 5:57 ` Lachlan McIlroy
2008-06-23 6:14 ` Dave Chinner
2008-06-23 6:40 ` Lachlan McIlroy
2008-06-23 8:05 ` Dave Chinner
2008-06-23 5:24 ` Lachlan McIlroy
2008-06-23 6:21 ` Dave Chinner [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080623062157.GH29319@disturbed \
--to=david@fromorbit.com \
--cc=lachlan@sgi.com \
--cc=xfs-dev@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.