All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lachlan McIlroy <lachlan@sgi.com>
To: Lachlan McIlroy <lachlan@sgi.com>, xfs-dev <xfs-dev@sgi.com>,
	xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH] Prevent extent btree block allocation failures
Date: Mon, 23 Jun 2008 16:40:26 +1000	[thread overview]
Message-ID: <485F455A.9060701@sgi.com> (raw)
In-Reply-To: <20080623061421.GG29319@disturbed>

Dave Chinner wrote:
> On Mon, Jun 23, 2008 at 03:57:22PM +1000, Lachlan McIlroy wrote:
>> Dave Chinner wrote:
>>> On Fri, Jun 20, 2008 at 03:21:20PM +1000, Dave Chinner wrote:
>>>> On Thu, Jun 19, 2008 at 05:28:50PM +1000, Lachlan McIlroy wrote:
>>>>> There's something else that looks suspicious to me - this code in
>>>>> xfs_bmap_btalloc() is setting minleft to 0.  Doesn't this go against
>>>>> what you were saying about setting minleft to be the space we might
>>>>> need for the btree operations?
>>>>>
>>>>> 	if (args.fsbno == NULLFSBLOCK && nullfb) {
>>>>> 		args.fsbno = 0;
>>>>> 		args.type = XFS_ALLOCTYPE_FIRST_AG;
>>>>> 		args.total = ap->minlen;
>>>>> 		args.minleft = 0;
>>>>> 		if ((error = xfs_alloc_vextent(&args)))
>>>>> 			return error;
>>>>> 		ap->low = 1;
>>>>> 	}
>>>> Hmmm - that looks suspicious. In xfs_bmapi(), when we are doing a
>>>> write and *firstblock == NULLFSBLOCK (which leads to nullfb being
>>>> set in the above code), we do:
>>>>
>>>>         if (wr && *firstblock == NULLFSBLOCK) {
>>>>                 if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE)
>>>>                         minleft = be16_to_cpu(ifp->if_broot->bb_level) + 1;
>>>>                 else
>>>>                         minleft = 1;
>>>>         } else
>>>>                 minleft = 0;
>>>>
>>>> If we are in btree format we set the minleft to the number of blocks needed
>>>> for a split. If we are in extent or local format, change to extent of btree
>>>> format requires one extra block.
>>>>
>>>> The above code you point out definitely breaks this - we haven't done a
>>>> previous allocation so we can start from the first AG, but we sure as
>>>> hell still need minleft set to the number of blocks needed for a
>>>> format change or btree split.
>>> Just to point out yet another problem in this code (one that's just
>>> been tripped over @ agami) is unwritten extent conversion.
>>>
>>> Basically, we don't do an allocation here, so when we end up in
>>> xfs_bmap_add_extent_unwritten_real() with a null firstblock. Hence
>>> the cases where conversion can cause a split - case
>>> MASK(LEFT_FILLING), MASK(RIGHT_FILLING) and 0 (convert the middle of
>>> an extent) - we can select an AG that doesn't have enough space for
>>> the entire split as we've ignored the number of blocks we might
>>> need to allocate in the split (the minleft parameter) entirely.
>>>
>>> I suspect that xfs_bmbt_split() needs to handle the null first block
>>> case slightly differently - the minleft parameter passed to the
>>> allocation should not be zero - it should be the number of levels
>>> above the current level left in the tree. i.e:
>>>
>>> 	minleft = be16_to_cpu(ifp->if_broot->bb_level) + 1;
>>>
>>> If we've already got a firstblock set, then this should have already
>>> been taken into account (i.e. we still need to fix the low space
>>> case where it got ignored as we were discussing).
>> Funny.  I tested the exact same change last week to try to fix the same
>> problem.  Seemed to work okay.
> 
> Cool. Got a patch for review?

I couldn't find the original patch that calculated minleft as above - instead
here's a variant that addresses the double insert problem by retrieving the
reservation amount from the transaction.  It could very well be overkill though.

--- fs/xfs/xfs_bmap_btree.c_1.169	2008-06-16 17:25:10.000000000 +1000
+++ fs/xfs/xfs_bmap_btree.c	2008-06-16 18:32:45.000000000 +1000
@@ -1496,9 +1496,12 @@ xfs_bmbt_split(
 	if (args.fsbno == NULLFSBLOCK) {
 		args.fsbno = lbno;
 		args.type = XFS_ALLOCTYPE_START_BNO;
-	} else
+		args.minleft = xfs_trans_get_block_res(args.tp);
+	} else {
 		args.type = XFS_ALLOCTYPE_NEAR_BNO;
-	args.mod = args.minleft = args.alignment = args.total = args.isfl =
+		args.minleft = 0;
+	}
+	args.mod = args.alignment = args.total = args.isfl =
 		args.userdata = args.minalignslop = 0;
 	args.minlen = args.maxlen = args.prod = 1;
 	args.wasdel = cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL;


> 
>> In the case where we convert the middle of an existing unwritten extent
>> we need to insert two new extents.  I might be paranoid here but I'll
>> assume the worst case scenario and that we'll need space for two complete
>> tree splits.
> 
> Yes, I think so. Certainly, if you look at the block reservation in
> xfs_iomap_write_unwritten():
> 
> 892         resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0) << 1;
> 
> #define XFS_DIOSTRAT_SPACE_RES(mp, v)   \
>         (XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK) + (v))
> 
> #define XFS_EXTENTADD_SPACE_RES(mp,w)   (XFS_BM_MAXLEVELS(mp,w) - 1)
> 
> It reserves enough blocks for 2 bmbt splits so I think this is
> definitely a possibility we need to handle.
> 
>> The first allocation for the first insert will set minleft
>> correctly but what about the allocations for splits during the second
>> insert?  We could run out of space in the chosen AG because minleft wasn't
>> enough.
> 
> Yeah, so we probably need pass a flag in the cursor to indicate
> it's a double split case when doing the first allocation in
> xfs_bmbt_split....
> 
> Cheers,
> 
> Dave.

  reply	other threads:[~2008-06-23  6:35 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-13  7:38 [PATCH] Prevent extent btree block allocation failures Lachlan McIlroy
2008-06-13 13:44 ` Christoph Hellwig
2008-06-16  3:57   ` Lachlan McIlroy
2008-06-13 15:57 ` Dave Chinner
2008-06-16  6:11   ` Lachlan McIlroy
2008-06-16 17:10     ` Dave Chinner
2008-06-17  1:58       ` Lachlan McIlroy
2008-06-17  7:39         ` Dave Chinner
2008-06-19  7:28           ` Lachlan McIlroy
2008-06-20  5:21             ` Dave Chinner
2008-06-23  5:20               ` Dave Chinner
2008-06-23  5:57                 ` Lachlan McIlroy
2008-06-23  6:14                   ` Dave Chinner
2008-06-23  6:40                     ` Lachlan McIlroy [this message]
2008-06-23  8:05                       ` Dave Chinner
2008-06-23  5:24               ` Lachlan McIlroy
2008-06-23  6:21                 ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=485F455A.9060701@sgi.com \
    --to=lachlan@sgi.com \
    --cc=xfs-dev@sgi.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.