From: Lachlan McIlroy <lachlan@sgi.com>
To: Lachlan McIlroy <lachlan@sgi.com>, xfs-dev <xfs-dev@sgi.com>,
xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH] Prevent extent btree block allocation failures
Date: Mon, 23 Jun 2008 15:57:22 +1000 [thread overview]
Message-ID: <485F3B42.9050300@sgi.com> (raw)
In-Reply-To: <20080623052025.GF29319@disturbed>
Dave Chinner wrote:
> On Fri, Jun 20, 2008 at 03:21:20PM +1000, Dave Chinner wrote:
>> On Thu, Jun 19, 2008 at 05:28:50PM +1000, Lachlan McIlroy wrote:
>>> Dave Chinner wrote:
>>>> On Tue, Jun 17, 2008 at 11:58:47AM +1000, Lachlan McIlroy wrote:
>>>>> Dave Chinner wrote:
>>>>>> On Sunday 15 June 2008 11:11 pm, Lachlan McIlroy wrote:
>>>>>>> I'm well aware of that particular deadlock involving the freelist - I
>>>>>>> hit it while testing. If you look closely at the code that deadlock
>>>>>>> can occur with or without the AG locking avoidance logic. This is
>>>>>>> because the rest of the transaction is unaware that an AG has been
>>>>>>> locked due to a freelist operation.
>>>>>> Yes, which is why you need to prevent freelist modifications occurring
>>>>>> when you can't allocate anything out of the AG.
>>>>> That sounds reasonable but it isn't consistent with the deadlock I saw.
>>>>> One of the threads that was deadlocked had tried to allocate a data extent
>>>>> in AG3 but didn't find the space. It had modified, and hence locked, AG3
>>>>> due to modifying the freelist but since it didn't get the space it needed
>>>>> it had to go on to another AG.
>>>> That sounds like an exact allocation failure - there is enough
>>>> space, a large enough extent but no free space at the exact block
>>>> required. This is exactly the case that occurred with the inode
>>>> allocation - and then allocation in the same AG failed because of
>>>> alignment that wasn't taken into account by the first exact
>>>> allocation attempt. Perhaps the minalignslop calculation in
>>>> xfs_bmap_btalloc() is incorrect...
>>> Okay I'll look into that.
>>>
>>> There's something else that looks suspicious to me - this code in
>>> xfs_bmap_btalloc() is setting minleft to 0. Doesn't this go against
>>> what you were saying about setting minleft to be the space we might
>>> need for the btree operations?
>>>
>>> if (args.fsbno == NULLFSBLOCK && nullfb) {
>>> args.fsbno = 0;
>>> args.type = XFS_ALLOCTYPE_FIRST_AG;
>>> args.total = ap->minlen;
>>> args.minleft = 0;
>>> if ((error = xfs_alloc_vextent(&args)))
>>> return error;
>>> ap->low = 1;
>>> }
>> Hmmm - that looks suspicious. In xfs_bmapi(), when we are doing a
>> write and *firstblock == NULLFSBLOCK (which leads to nullfb being
>> set in the above code), we do:
>>
>> if (wr && *firstblock == NULLFSBLOCK) {
>> if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE)
>> minleft = be16_to_cpu(ifp->if_broot->bb_level) + 1;
>> else
>> minleft = 1;
>> } else
>> minleft = 0;
>>
>> If we are in btree format we set the minleft to the number of blocks needed
>> for a split. If we are in extent or local format, change to extent of btree
>> format requires one extra block.
>>
>> The above code you point out definitely breaks this - we haven't done a
>> previous allocation so we can start from the first AG, but we sure as
>> hell still need minleft set to the number of blocks needed for a
>> format change or btree split.
>
> Just to point out yet another problem in this code (one that's just
> been tripped over @ agami) is unwritten extent conversion.
>
> Basically, we don't do an allocation here, so when we end up in
> xfs_bmap_add_extent_unwritten_real() with a null firstblock. Hence
> the cases where conversion can cause a split - case
> MASK(LEFT_FILLING), MASK(RIGHT_FILLING) and 0 (convert the middle of
> an extent) - we can select an AG that doesn't have enough space for
> the entire split as we've ignored the number of blocks we might
> need to allocate in the split (the minleft parameter) entirely.
>
> I suspect that xfs_bmbt_split() needs to handle the null first block
> case slightly differently - the minleft parameter passed to the
> allocation should not be zero - it should be the number of levels
> above the current level left in the tree. i.e:
>
> minleft = be16_to_cpu(ifp->if_broot->bb_level) + 1;
>
> If we've already got a firstblock set, then this should have already
> been taken into account (i.e. we still need to fix the low space
> case where it got ignored as we were discussing).
>
Funny. I tested the exact same change last week to try to fix the same
problem. Seemed to work okay.
In the case where we convert the middle of an existing unwritten extent
we need to insert two new extents. I might be paranoid here but I'll
assume the worst case scenario and that we'll need space for two complete
tree splits. The first allocation for the first insert will set minleft
correctly but what about the allocations for splits during the second
insert? We could run out of space in the chosen AG because minleft wasn't
enough.
next prev parent reply other threads:[~2008-06-23 5:52 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-13 7:38 [PATCH] Prevent extent btree block allocation failures Lachlan McIlroy
2008-06-13 13:44 ` Christoph Hellwig
2008-06-16 3:57 ` Lachlan McIlroy
2008-06-13 15:57 ` Dave Chinner
2008-06-16 6:11 ` Lachlan McIlroy
2008-06-16 17:10 ` Dave Chinner
2008-06-17 1:58 ` Lachlan McIlroy
2008-06-17 7:39 ` Dave Chinner
2008-06-19 7:28 ` Lachlan McIlroy
2008-06-20 5:21 ` Dave Chinner
2008-06-23 5:20 ` Dave Chinner
2008-06-23 5:57 ` Lachlan McIlroy [this message]
2008-06-23 6:14 ` Dave Chinner
2008-06-23 6:40 ` Lachlan McIlroy
2008-06-23 8:05 ` Dave Chinner
2008-06-23 5:24 ` Lachlan McIlroy
2008-06-23 6:21 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=485F3B42.9050300@sgi.com \
--to=lachlan@sgi.com \
--cc=xfs-dev@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox