public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Kevin Jamieson <kevin@kevinjamieson.com>
Cc: Mark Goodwin <markgw@sgi.com>, xfs@oss.sgi.com
Subject: Re: XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c
Date: Fri, 26 Sep 2008 20:16:00 +1000
Message-ID: <20080926101600.GO27997@disturbed>
In-Reply-To: <62255.192.168.1.1.1222403942.squirrel@squirrel.kevinjamieson.com>

On Thu, Sep 25, 2008 at 09:39:02PM -0700, Kevin Jamieson wrote:
> On Thu, September 25, 2008 6:27 pm, Dave Chinner wrote:
> > On Thu, Sep 25, 2008 at 03:56:25PM -0700, Kevin Jamieson wrote:
> >> On Tue, September 23, 2008 2:18 am, Dave Chinner wrote:
> >>
> >> > A metadump will tell us what the freespace patterns are....
> >>
> >> Hi Dave,
> >>
> >> A metadump of a file system that triggers this issue is now available on

Cc'ing this back to the open list because there have been several
other occurrences of this problem recently, so I want this to
hit the public archives.

Firstly, Kevin, thank you for the image and the trivial test case.
This is exactly how I found the last one of these problems - 20
minutes with UML and single stepping through gdb....

Breakpoint 5, xfs_dir_createname (tp=0x7f3bba18, dp=0x7f4b90c0,
	name=0x7f273c30, inum=308318065,
	first=0x7f273bc0, flist=0x7f273b90, total=35) at fs/xfs/xfs_dir2.c:207

We have a reservation of 35 blocks for this operation.

	xfs_dir_createname()
	  xfs_dir2_node_addname()
		xfs_dir2_node_addname_int()	- adds new block,
		xfs_dir2_leafn_add() 	- full block, no stale
		xfs_da_split()
		  xfs_dir2_leafn_split() - single block allocated out of AG 9
		    xfs_da_grow_inode()
		xfs_da_root_split()
		    xfs_da_grow_inode() - fails to allocate single block

Allocation fails with AG 9 having 34 free blocks and it does not
try any other AG. Now to trace the second xfs_bmapi call to see why
it fails.

	xfs_bmapi()
	  xfs_bmap_alloc()
(gdb) p *ap
$26 = {firstblock = 83559978, rval = 1612378431, off = 8388610, tp = 0x7f14da18,
  ip = 0x7f4ba0c0, prevp = 0x7f0c7790, gotp = 0x7f0c77b0, alen = 1, total = 35,
  minlen = 1, minleft = 0, eof = 0 '\0', wasdel = 0 '\0', userdata = 0 '\0',
  low = 0 '\0', aeof = 0 '\0', conv = 0 '\0'}

	  xfs_bmap_btalloc()
	    xfs_alloc_vextent()
(gdb) p *args
$30 = {tp = 0x7f14da18, mp = 0x7f12f800, agbp = 0x7f0c77f4, pag = 0x7f595e80, fsbno = 83559979,
  agno = 9, agbno = 0, minlen = 1, maxlen = 1, mod = 0, prod = 1, minleft = 0,
  total = 35, alignment = 1, minalignslop = 0, len = 2131523184, type = XFS_ALLOCTYPE_NEAR_BNO,
  otype = 1612065600, wasdel = 0 '\0', wasfromfl = 0 '\0', isfl = 0 '\0', userdata = 0 '\0',
  firstblock = 83559978}

		xfs_alloc_fix_freelist()

        if (!(flags & XFS_ALLOC_FLAG_FREEING)) {
                need = XFS_MIN_FREELIST_PAG(pag, mp);
                delta = need > pag->pagf_flcount ? need - pag->pagf_flcount : 0;
                /*
                 * If it looks like there isn't a long enough extent, or enough
                 * total blocks, reject it.
                 */
                longest = (pag->pagf_longest > delta) ?
                        (pag->pagf_longest - delta) :
                        (pag->pagf_flcount > 0 || pag->pagf_longest > 0);
                if ((args->minlen + args->alignment + args->minalignslop - 1) >
                                longest ||
>>>>>>>             ((int)(pag->pagf_freeblks + pag->pagf_flcount -
>>>>>>>                    need - args->total) < (int)args->minleft)) {
                        if (agbp)
                                xfs_trans_brelse(tp, agbp);
>>>>>>>                 args->agbp = NULL;
>>>>>>>                 return 0;
                }
        }


We are failing the marked check.

	pag->pagf_freeblks + pag->pagf_flcount - need - args->total = -1.

and

	args->minleft = 0

The problem is that AG 9 has only 34 free blocks left when the root
split occurs.
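
To make the arithmetic concrete, here is a minimal standalone sketch of
the space half of that check. It is not the kernel code: struct ag_state
and ag_has_space() are hypothetical names, and the flcount and need
values used with it are assumptions (picked equal so that they cancel),
because the trace only tells us the 34 free blocks and the -1 result.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-AG counters. */
struct ag_state {
	unsigned int freeblks;	/* free blocks in the AG (pagf_freeblks) */
	unsigned int flcount;	/* blocks currently on the freelist */
	unsigned int need;	/* minimum freelist length required */
};

/*
 * Mirrors the marked comparison: the AG is rejected if, after setting
 * aside the freelist requirement and the whole transaction total, fewer
 * than minleft blocks would remain. The unsigned sum is cast to int so
 * that a shortfall shows up as a negative number, as in the original.
 */
static bool ag_has_space(const struct ag_state *ag,
			 unsigned int total, unsigned int minleft)
{
	int remaining = (int)(ag->freeblks + ag->flcount - ag->need - total);

	return remaining >= (int)minleft;
}
```

With freeblks = 34, total = 35, minleft = 0 and flcount equal to need
(assumed), remaining is -1 and the AG is rejected. With freeblks = 35
the same call succeeds.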

So, what has happened is this:

	- transaction block reservation is for 35 blocks
	- directory located in AG 9
	- AG 9 has 35 free blocks.
	- we've allocated a new block in the directory for the name
		- allocation set up with args->total = 35
		- single block allocated reduces AG 9 to 34 free blocks
	- node is full, so can't add pointer to new free block
		- triggers root split
	- root split tries to allocate new block with:
		- allocation set up with args->total = 35
		- AG 9 only has 34 free blocks now.
		- fails with not enough space for "entire transaction"
		  in the AG.

Hence an ENOSPC with plenty of space left in the AG and huge
amounts of free space in the filesystem, and a shutdown because we
are cancelling a dirty transaction.
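
The sequence above can be replayed with a toy model. This is only a
sketch under stated assumptions: struct sim_ag, sim_alloc_one() and
sim_replay() are hypothetical names, the freelist terms are assumed to
cancel, and minleft is taken as 0 as in the trace.

```c
#include <stdbool.h>

/* Hypothetical model: one AG with a free block count, nothing else. */
struct sim_ag {
	int free_blocks;
};

/*
 * Allocate a single block, but only if the AG could still hold the
 * entire 'total' passed in (simplified form of the space check).
 */
static bool sim_alloc_one(struct sim_ag *ag, int total)
{
	if (ag->free_blocks - total < 0)
		return false;	/* treated as ENOSPC for this AG */
	ag->free_blocks--;
	return true;
}

/* Replays the two allocations; returns how many of them succeeded. */
static int sim_replay(int ag_free, int reservation)
{
	struct sim_ag ag = { .free_blocks = ag_free };
	int done = 0;

	/* New directory block for the name, with the full total. */
	if (sim_alloc_one(&ag, reservation))
		done++;
	/* Root split: total is unchanged, but the AG has one block less. */
	if (sim_alloc_one(&ag, reservation))
		done++;
	return done;
}
```

sim_replay(35, 35) returns 1: the first allocation succeeds and the root
split allocation is refused, even though 34 blocks are still free.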

There are several problems here.

	1. the directory code does not account for blocks that get allocated
	   by reducing args->total as blocks are allocated. That directly
	   causes this particular shutdown.

	2. the xfs bmap code has no way of passing back how many blocks
	   were allocated to the inode. We're going to have to infer it
	   from the number of reserved blocks used in the transaction
	   structure or from the change in the inode block count
	   across the allocation....

	3. we're going to have to audit and fix all the allocation
	   calls in the directory code to ensure the accounting is
	   correct.

	4. the check in xfs_alloc_fix_freelist() is incorrect.
		- it assumes that we can completely empty the AG
		- we must leave 4 blocks behind in the AG so that the
		  first extent free on a full AG can succeed.
		- hence even if we fix 1), this case could still fail
		  once we get to 32 of 35 blocks allocated.

	5. The metadata allocation is an XFS_ALLOCTYPE_NEAR_BNO
	   allocation with no fallback if the AG is ENOSPC - if we
	   can't allocate in that AG, we fail. Why isn't there a
	   fallback in this case? Directory btree blocks are not
	   confined to a single AG, right?
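
As a rough illustration of what points 1 and 2 are asking for, the
sketch below shrinks the remaining total by the number of blocks the
inode gained across each allocation call. It is not a proposed patch:
struct acct_state, acct_alloc_one() and acct_alloc_many() are
hypothetical names, and the space check is the same simplified form
(freelist terms assumed to cancel, minleft = 0).

```c
#include <stdbool.h>

/* Hypothetical model of the state touched by one allocation call. */
struct acct_state {
	int ag_free;		/* free blocks in the AG */
	int inode_nblocks;	/* blocks owned by the directory inode */
};

/* Simplified space check: the AG must be able to hold 'total'. */
static bool acct_alloc_one(struct acct_state *st, int total)
{
	if (st->ag_free - total < 0)
		return false;
	st->ag_free--;
	st->inode_nblocks++;
	return true;
}

/*
 * Attempt 'count' single-block allocations. After each one, reduce the
 * outstanding total by however many blocks the inode gained (point 2),
 * so later checks only ask for what is still needed (point 1).
 * Returns the number of allocations that succeeded.
 */
static int acct_alloc_many(struct acct_state *st, int total, int count)
{
	int done = 0;

	while (done < count) {
		int before = st->inode_nblocks;

		if (!acct_alloc_one(st, total))
			break;
		total -= st->inode_nblocks - before;
		done++;
	}
	return done;
}
```

Starting from 35 free blocks and a total of 35, both allocations from
the failing case now succeed. Note that this model happily runs the AG
down to zero free blocks, which is exactly the assumption point 4 says
is wrong, so fixing the accounting alone does not cover that case.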

This is going to take a bit of work to fix....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
