From: David Chinner <dgc@sgi.com>
To: "Christian Røsnes" <christian.rosnes@gmail.com>
Cc: xfs@oss.sgi.com
Subject: Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
Date: Fri, 14 Mar 2008 01:53:49 +1100
Message-ID: <20080313145349.GJ95344431@sgi.com>
In-Reply-To: <1a4a774c0803130446x609b9cb2mf3da323183c35606@mail.gmail.com>
ok..... loads the metadump...
Looking at the AGF status before the mkdir:
dgc@budgie:/mnt/test$ for i in `seq 0 1 15`; do echo AG $i ; sudo xfs_db -r -c "agf $i" -c 'p flcount longest' -f /mnt/scratch/shutdown; done
AG 0
flcount = 6
longest = 8
AG 1
flcount = 6
longest = 8
AG 2
flcount = 6
longest = 7
AG 3
flcount = 6
longest = 7
AG 4
flcount = 6
longest = 7
AG 5
flcount = 7
longest = 8
....
AG 5 immediately caught my eye:
seqno = 5
length = 4476752
bnoroot = 7
cntroot = 46124
bnolevel = 2
cntlevel = 2
flfirst = 56
fllast = 62
flcount = 7
freeblks = 68797
longest = 8
btreeblks = 0
magicnum = 0x58414746
versionnum = 1
Mainly because at level 2 btrees this:
blocks = XFS_MIN_FREELIST_PAG(pag,mp);
gives blocks = 6 and the freelist count says 7 blocks.
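
A rough standalone sketch of that calculation, as I understand the macro: one block per level of each free space btree plus one spare per tree, each capped at the maximum btree height. The struct, the function names and the max_levels parameter are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-AG btree heights (not xfs_perag_t). */
struct ag_levels {
	unsigned int bno_level;	/* height of the by-block free space btree */
	unsigned int cnt_level;	/* height of the by-size free space btree */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Minimum free list length: (levels + 1) for each btree, each term
 * capped at max_levels (assumed to play the role of XFS_AG_MAXLEVELS).
 */
static unsigned int min_freelist(const struct ag_levels *ag,
				 unsigned int max_levels)
{
	assert(max_levels > 0);
	return min_u(ag->bno_level + 1, max_levels) +
	       min_u(ag->cnt_level + 1, max_levels);
}

/* Blocks the free list would be trimmed by; 0 if it is not too long. */
static unsigned int freelist_excess(unsigned int flcount, unsigned int need)
{
	return flcount > need ? flcount - need : 0;
}
```

With bnolevel = cntlevel = 2 and any cap of 3 or more, min_freelist() comes out at 6, so an flcount of 7 leaves one block to be trimmed.
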
Hence if the alignment check fails in some way, it will
try to reduce the free list down to 6 blocks. Unsurprisingly,
then, this breakpoint (what function does every "log object"
operation call?) eventually tripped:
Stack traceback for pid 2936
0xe000003817440000 2936 2902 1 1 R 0xe0000038174403a0 *mkdir
0xa0000001003c3cc0 xfs_trans_find_item
0xa0000001003c0d10 xfs_trans_log_buf+0x2f0
0xa0000001002f81e0 xfs_alloc_log_agf+0x80
0xa0000001002fa3d0 xfs_alloc_get_freelist+0x3d0
0xa0000001002ffe90 xfs_alloc_fix_freelist+0x770
0xa000000100300a00 xfs_alloc_vextent+0x440
0xa000000100374d70 xfs_ialloc_ag_alloc+0x2d0
0xa000000100375dd0 xfs_dialloc+0x2d0
.......
Which is the first place we dirty a log item. It's the
AGF of AG 5:
[1]kdb> xbuf 0xe0000038143e2e00
buf 0xe0000038143e2e00 agf 0xe000003817550200
magicnum 0x58414746 versionnum 0x1 seqno 0x5 length 0x444f50
roots b 0x7 c 0xb42c levels b 2 c 2
flfirst 57 fllast 62 flcount 6 freeblks 68797 longest 8
And you'll note that flcount = 6 and flfirst = 57 now. In memory:
[1]kdb> xperag 0xe000003802979510
.....
ag 5 f_init 1 i_init 1
f_levels[b,c] 2,2 f_flcount 6 f_freeblks 68797 f_longest 8
f__metadata 0
i_freecount 0 i_inodeok 1
.....
f_flcount = 6 as well. So, we've really modified the AGF here.
Now to find out why the alignment checks failed.
[1]kdb> xalloc 0xe00000381744fc48
tp 0xe000003817450000 mp 0xe000003802979510 agbp 0x0000000000020024 pag 0xe000003802972378 fsbno 42563856[5:97910]
agno 0x5 agbno 0xffffffff minlen 0x8 maxlen 0x8 mod 0x0
prod 0x1 minleft 0x0 total 0x0 alignment 0x1
minalignslop 0x0 len 0x0 type this_bno otype this_bno wasdel 0
wasfromfl 0 isfl 0 userdata 0
Oh - alignment = 1. How did that happen? And why did it fail? I
note: "this_bno" means it wants an exact allocation (fsbno
42563856[5:97910]). Ah, that means we are in the first attempt to
allocate a block in an AG, i.e. here:
          /*
           * First try to allocate inodes contiguous with the last-allocated
           * chunk of inodes. If the filesystem is striped, this will fill
           * an entire stripe unit with inodes.
           */
          agi = XFS_BUF_TO_AGI(agbp);
          newino = be32_to_cpu(agi->agi_newino);
          args.agbno = XFS_AGINO_TO_AGBNO(args.mp, newino) +
                          XFS_IALLOC_BLOCKS(args.mp);
          if (likely(newino != NULLAGINO &&
                    (args.agbno < be32_to_cpu(agi->agi_length)))) {
                  args.fsbno = XFS_AGB_TO_FSB(args.mp,
                                  be32_to_cpu(agi->agi_seqno), args.agbno);
                  args.type = XFS_ALLOCTYPE_THIS_BNO;
                  args.mod = args.total = args.wasdel = args.isfl =
                          args.userdata = args.minalignslop = 0;
>>>>>>>>          args.prod = 1;
>>>>>>>>          args.alignment = 1;
                  /*
                   * Allow space for the inode btree to split.
                   */
                  args.minleft = XFS_IN_MAXLEVELS(args.mp) - 1;
>>>>>>>>          if ((error = xfs_alloc_vextent(&args)))
                          return error;
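
As an aside, the fsbno printed by xalloc can be checked against the AG:block pair with a small sketch of what XFS_AGB_TO_FSB() does: shift the AG number left by sb_agblklog and OR in the AG-relative block. The helper names are made up for illustration, and agblklog = 23 is an inference (the AG length of 4476752 blocks lies between 2^22 and 2^23), not a value shown in the output above. The numbers only work out if the bracketed block number is read as hex.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an AG number and AG-relative block into a filesystem block number. */
static uint64_t agb_to_fsb(uint32_t agno, uint32_t agbno,
			   unsigned int agblklog)
{
	/* agbno must fit in the low agblklog bits */
	assert(agblklog < 32 && (agbno >> agblklog) == 0);
	return ((uint64_t)agno << agblklog) | agbno;
}

/* Recover the AG number from a filesystem block number. */
static uint32_t fsb_to_agno(uint64_t fsbno, unsigned int agblklog)
{
	return (uint32_t)(fsbno >> agblklog);
}

/* Recover the AG-relative block number from a filesystem block number. */
static uint32_t fsb_to_agbno(uint64_t fsbno, unsigned int agblklog)
{
	return (uint32_t)(fsbno & ((UINT64_C(1) << agblklog) - 1));
}
```

Under that assumption, agb_to_fsb(5, 0x97910, 23) gives 42563856, which matches the fsbno in the xalloc output.
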
This now makes sense - at first we attempt an unaligned, exact block
allocation. This gets us to modifying the free list because we have
a free 8 block extent as required. However, the exact extent being
asked for is not free, so the btree lookup fails and we abort the
allocation attempt.
We then fall back to method 2 - try stripe alignment - which now
fails the longest free block checks because alignment is accounted
for and we need ~24 blocks to guarantee an aligned extent.
We fall back to method 3 - cluster alignment - which also fails
because we need an extent of 9 blocks, but we only have extents of
8 blocks available.
We never try again without alignment....
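
The arithmetic behind those three attempts can be sketched as follows. This is a simplified model of the longest-extent check in xfs_alloc_fix_freelist(), assuming the free list does not need refilling (true here, since flcount 7 >= 6), so the longest extent is compared unadjusted. The function names are hypothetical, and the cluster alignment of 2 blocks is inferred from the 9-block figure (8 + 2 - 1) rather than read from the superblock.

```c
#include <assert.h>
#include <stdbool.h>

/* Worst-case extent length needed to carve out an aligned allocation. */
static unsigned int extent_needed(unsigned int minlen,
				  unsigned int alignment,
				  unsigned int minalignslop)
{
	assert(alignment >= 1);
	return minlen + alignment + minalignslop - 1;
}

/* True if the AG's longest free extent could satisfy the request. */
static bool longest_check_passes(unsigned int minlen,
				 unsigned int alignment,
				 unsigned int minalignslop,
				 unsigned int longest)
{
	return extent_needed(minlen, alignment, minalignslop) <= longest;
}
```

With minlen = 8 and longest = 8, alignment 1 passes the check (8 <= 8), which is how the unaligned exact attempt got far enough to touch the free list, while alignment 2 needs 9 blocks and is rejected.
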
Now we fail allocation in that AG having dirtied the AGF, the AGFL,
and a block out of both the by-block and by-size free space btrees.
Hence when we fail to allocate in all other AGs, we return ENOSPC
and the transaction gets cancelled. Because it has dirty items
in it, we get shut down.
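
A toy model of that last step, for illustration only: the types and function names below are invented and are not the kernel's xfs_trans_t, xfs_mount_t or xfs_trans_cancel(). It captures just the rule that cancelling a transaction which has already logged something forces a shutdown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the mount and transaction structures. */
struct toy_mount {
	bool shut_down;
};

struct toy_trans {
	struct toy_mount *mp;
	bool dirty;	/* set once any item in the transaction is logged */
};

/* Logging an item (e.g. the AGF) marks the whole transaction dirty. */
static void toy_log_item(struct toy_trans *tp)
{
	tp->dirty = true;
}

/*
 * Cancelling a clean transaction is harmless. Cancelling a dirty one
 * cannot undo the in-memory changes, so the filesystem is shut down.
 */
static void toy_trans_cancel(struct toy_trans *tp)
{
	assert(tp->mp != NULL);
	if (tp->dirty)
		tp->mp->shut_down = true;
}
```

In this model the ENOSPC return by itself is harmless; it is the earlier toy_log_item() on the AGF that turns the cancel into a shutdown.
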
But no wonder it was so hard to reproduce....
The patch below fixes the shutdown for me. Can you give it a go?
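
The alignment choice the patch makes, and why it should keep the exact allocation attempt from dirtying the AGF, can be sketched standalone. The function and parameter names are hypothetical, and the example alignment of 2 blocks is inferred from the 9-block figure above (8 + 2 - 1), not read from this filesystem's superblock. As far as I can tell, the longest-extent check runs before the free list is trimmed, so a request that fails it leaves the AGF untouched.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Use the inode alignment for the exact allocation when the filesystem
 * has the alignment feature and the alignment is at least one inode
 * cluster (in filesystem blocks); otherwise fall back to 1.
 */
static unsigned int pick_exact_alignment(bool has_align,
					 unsigned int inoalignmt,
					 unsigned int cluster_blocks)
{
	if (has_align && inoalignmt >= cluster_blocks)
		return inoalignmt;
	return 1;
}

/* True if the longest-extent check rejects the request up front. */
static bool rejected_before_freelist_fix(unsigned int minlen,
					 unsigned int alignment,
					 unsigned int longest)
{
	assert(alignment >= 1);
	return minlen + alignment - 1 > longest;
}
```

With alignment 1 the 8-block request is not rejected early against a longest extent of 8; with an alignment of 2 it is, so the exact attempt bails out at the same point the aligned fallbacks do.
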
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
---
fs/xfs/xfs_ialloc.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c 2008-03-13 13:07:24.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c 2008-03-14 01:40:21.926153338 +1100
@@ -167,7 +167,12 @@ xfs_ialloc_ag_alloc(
 		args.mod = args.total = args.wasdel = args.isfl =
 			args.userdata = args.minalignslop = 0;
 		args.prod = 1;
-		args.alignment = 1;
+		if (xfs_sb_version_hasalign(&args.mp->m_sb) &&
+		    args.mp->m_sb.sb_inoalignmt >=
+		    XFS_B_TO_FSBT(args.mp, XFS_INODE_CLUSTER_SIZE(args.mp)))
+			args.alignment = args.mp->m_sb.sb_inoalignmt;
+		else
+			args.alignment = 1;
 		/*
 		 * Allow space for the inode btree to split.
 		 */