* XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c
@ 2008-09-21 19:29 Kevin Jamieson
2008-09-23 7:50 ` Mark Goodwin
2008-09-23 9:18 ` Dave Chinner
0 siblings, 2 replies; 6+ messages in thread
From: Kevin Jamieson @ 2008-09-21 19:29 UTC (permalink / raw)
To: xfs
Hello,
We have encountered the following error several times on some of our
customer systems running SLES 10 SP1 (kernel version 2.6.16.46-0.12):
Sep 12 01:24:41 gn1 kernel: Filesystem "dm-0": XFS internal error
xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xf9429f23
Sep 12 01:24:41 gn1 kernel: [<f942277d>] xfs_trans_cancel+0x4d/0xd6 [xfs]
Sep 12 01:24:41 gn1 kernel: [<f9429f23>] xfs_create+0x4f1/0x52a [xfs]
Sep 12 01:24:41 gn1 kernel: [<f9429f23>] xfs_create+0x4f1/0x52a [xfs]
Sep 12 01:24:41 gn1 kernel: [<f9433657>] xfs_vn_mknod+0x172/0x276 [xfs]
Sep 12 01:24:41 gn1 kernel: [<c016c45b>] vfs_create+0xd1/0x138
Sep 12 01:24:41 gn1 kernel: [<c016e862>] sys_mknodat+0xfa/0x164
Sep 12 01:24:41 gn1 kernel: [<c016fc47>] do_ioctl+0x4f/0x5e
Sep 12 01:24:41 gn1 kernel: [<c016fea2>] vfs_ioctl+0x24c/0x25e
Sep 12 01:24:41 gn1 kernel: [<c016e8df>] sys_mknod+0x13/0x17
Sep 12 01:24:41 gn1 kernel: [<c0103bdb>] sysenter_past_esp+0x54/0x79
Sep 12 01:24:41 gn1 kernel: xfs_force_shutdown(dm-0,0x8) called from
line 1139 of file fs/xfs/xfs_trans.c. Return address = 0xf94359db
Sep 12 01:24:41 gn1 kernel: Filesystem "dm-0": Corruption of in-memory
data detected. Shutting down filesystem: dm-0
From a recent occurrence of this, I was able to obtain a copy of the
file system prior to it being restored from backup.
The file system in question is relatively full -- 800GB with 25GB free
and 70M inodes used (the file system is used with a proprietary
DMAPI-based HSM application, so most of the file data is punched out).
With the file system in this state, the forced shutdown error is
reproducible when attempting to create a file in a particular directory
-- an ENOSPC error is returned to userspace, and the file system shuts down.
The forced shutdown is also reproducible with this file system mounted
on a more recent kernel version -- here is a stack trace from the same
file system mounted on a 2.6.26 kernel built from oss.sgi.com cvs on Sep
19 2008:
Sep 21 06:35:41 gn1 kernel: Filesystem "loop0": XFS internal error
xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c. Caller 0xf93c8195
Sep 21 06:35:41 gn1 kernel: [<f93c2fc0>] xfs_trans_cancel+0x4d/0xd3 [xfs]
Sep 21 06:35:41 gn1 kernel: [<f93c8195>] xfs_create+0x49b/0x4db [xfs]
Sep 21 06:35:41 gn1 kernel: [<f93c8195>] xfs_create+0x49b/0x4db [xfs]
Sep 21 06:35:41 gn1 kernel: [<f93d166b>] xfs_vn_mknod+0x128/0x1e3 [xfs]
Sep 21 06:35:41 gn1 kernel: [<c0170e9d>] vfs_create+0xb4/0x117
Sep 21 06:35:41 gn1 kernel: [<c0172c46>] do_filp_open+0x1a0/0x671
Sep 21 06:35:41 gn1 kernel: [<c01681da>] do_sys_open+0x40/0xb6
Sep 21 06:35:41 gn1 kernel: [<c0168294>] sys_open+0x1e/0x23
Sep 21 06:35:41 gn1 kernel: [<c0104791>] sysenter_past_esp+0x6a/0x99
Sep 21 06:35:41 gn1 kernel: [<c02b0000>] unix_listen+0x8/0xc9
Sep 21 06:35:41 gn1 kernel: =======================
Sep 21 06:35:41 gn1 kernel: xfs_force_shutdown(loop0,0x8) called from
line 1165 of file fs/xfs/xfs_trans.c. Return address = 0xf93c2fd6
Sep 21 06:35:41 gn1 kernel: Filesystem "loop0": Corruption of in-memory
data detected. Shutting down filesystem: loop0
Tracing through the XFS code, the ENOSPC error is returned here from
fs/xfs/xfs_da_btree.c:
xfs_da_grow_inode(xfs_da_args_t *args, xfs_dablk_t *new_blkno)
{
...
if (got != count || mapp[0].br_startoff != bno ||
...
return XFS_ERROR(ENOSPC);
}
...
}
where got = 0 and count = 1 and xfs_da_grow_inode() is called from
xfs_create() -> xfs_dir_createname() -> xfs_dir2_node_addname() ->
xfs_da_split() -> xfs_da_root_split()
xfs_repair -n (the latest version of xfs_repair from cvs, as the SLES 10
SP1 version just runs out of memory) does not report any problems with
the file system, but after running xfs_repair (without -n) on the file
system, the error can no longer be triggered. Based on this, I suspect a
problem with the free space btrees, as I understand that xfs_repair
rebuilds them. I tried running xfs_check (latest cvs version also) as
well but it runs out of memory and dies.
Are there any known issues in 2.6.16 that could lead to this sort of
problem? If there is any additional information that would be helpful in
tracking this down, please let me know. If needed, I can probably make a
xfs_metadump of the file system available to someone from SGI later this
week.
Some additional information that may be useful:
# xfs_info /fs
meta-data=/dev/loop0 isize=1024 agcount=32, agsize=6179712
blks
= sectsz=512 attr=1
data = bsize=4096 blocks=197750784, imaxpct=0
= sunit=0 swidth=0 blks, unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=32768, version=1
= sectsz=512 sunit=0 blks
realtime =none extsz=4096 blocks=0, rtextents=0
# xfs_db -f -r -c 'sb 0' -c p /mnt/scratch3/xfs.fs
magicnum = 0x58465342
blocksize = 4096
dblocks = 197750784
rblocks = 0
rextents = 0
uuid = 18602b1d-e419-4f97-9bd7-37c4b64b884a
logstart = 134217732
rootino = 32
rbmino = 33
rsumino = 34
rextsize = 1
agblocks = 6179712
agcount = 32
rbmblocks = 0
logblocks = 32768
versionnum = 0x3094
sectsize = 512
inodesize = 1024
inopblock = 4
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 9
inodelog = 10
inopblog = 2
agblklog = 23
rextslog = 0
inprogress = 0
imax_pct = 0
icount = 72288384
ifree = 1131
fdblocks = 6101857
frextents = 0
uquotino = 0
gquotino = 0
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 0
width = 0
dirblklog = 0
logsectlog = 0
logsectsize = 0
logsunit = 0
features2 = 0
This is for the directory that triggers the problem:
# xfs_db -f -r -c 'inode 304409506' -c p /mnt/scratch3/xfs.fs
core.magic = 0x494e
core.mode = 040700
core.version = 1
core.format = 2 (extents)
core.nlinkv1 = 2
core.uid = 60000
core.gid = 65534
core.flushiter = 0
core.atime.sec = Fri Sep 12 01:15:17 2008
core.atime.nsec = 169962854
core.mtime.sec = Fri Sep 12 01:24:41 2008
core.mtime.nsec = 058880718
core.ctime.sec = Fri Sep 12 01:24:41 2008
core.ctime.nsec = 058880718
core.size = 40960
core.nblocks = 12
core.extsize = 0
core.nextents = 12
core.naextents = 0
core.forkoff = 24
core.aformat = 1 (local)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.gen = 0
next_unlinked = null
u.bmx[0-11] = [startoff,startblock,blockcount,extentflag]
0:[0,76102390,1,0] 1:[1,76102361,1,0] 2:[2,76102054,1,0]
3:[3,76102053,1,0] 4:[4,76101744,1,0] 5:[5,76101743,1,0]
6:[6,76101742,1,0] 7:[7,76101741,1,0] 8:[8,76101740,1,0]
9:[9,76101707,1,0] 10:[8388608,76102391,1,0] 11:[16777216,76102719,1,0]
a.sfattr.hdr.totsize = 74
a.sfattr.hdr.count = 2
a.sfattr.list[0].namelen = 16
a.sfattr.list[0].valuelen = 8
a.sfattr.list[0].root = 1
a.sfattr.list[0].secure = 0
a.sfattr.list[0].name = "\004,\005XfY\016oapoBO\bR)"
a.sfattr.list[0].value = "\000\000\000\000\000\000\000\000"
a.sfattr.list[1].namelen = 16
a.sfattr.list[1].valuelen = 24
a.sfattr.list[1].root = 1
a.sfattr.list[1].secure = 0
a.sfattr.list[1].name = "\003>\016oMa7\034@#I8\004X\027G"
a.sfattr.list[1].value =
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
# xfs_repair -f /mnt/scratch4/xfs.fs
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
Thanks,
Kevin Jamieson
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c
2008-09-21 19:29 XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c Kevin Jamieson
@ 2008-09-23 7:50 ` Mark Goodwin
2008-09-23 9:18 ` Dave Chinner
1 sibling, 0 replies; 6+ messages in thread
From: Mark Goodwin @ 2008-09-23 7:50 UTC (permalink / raw)
To: kevin; +Cc: xfs
Hi Kevin, sorry for the delay - yes AFAICT we have seen this before,
see the discussion and patches in the xfs@oss archives at
http://oss.sgi.com/archives/xfs/2008-03/msg00144.html
I think several patches have been committed to recent mainline kernels,
but it looks like your testing on 2.6.26 indicates either it still isn't
fixed, or you've hit a different problem, despite the similar stack
traces. Tim might have additional info to add here ...
In any case, this has also been been reported in Novell bugzilla #243029
against SLES10sp1. Since you're running sles10sp1, perhaps you could get
in touch with Novell for support?
Thanks
-- Mark
Kevin Jamieson wrote:
> Hello,
>
> We have encountered the following error several times on some of our
> customer systems running SLES 10 SP1 (kernel version 2.6.16.46-0.12):
>
> Sep 12 01:24:41 gn1 kernel: Filesystem "dm-0": XFS internal error
> xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller
> 0xf9429f23
> Sep 12 01:24:41 gn1 kernel: [<f942277d>] xfs_trans_cancel+0x4d/0xd6 [xfs]
> Sep 12 01:24:41 gn1 kernel: [<f9429f23>] xfs_create+0x4f1/0x52a [xfs]
> Sep 12 01:24:41 gn1 kernel: [<f9429f23>] xfs_create+0x4f1/0x52a [xfs]
> Sep 12 01:24:41 gn1 kernel: [<f9433657>] xfs_vn_mknod+0x172/0x276 [xfs]
> Sep 12 01:24:41 gn1 kernel: [<c016c45b>] vfs_create+0xd1/0x138
> Sep 12 01:24:41 gn1 kernel: [<c016e862>] sys_mknodat+0xfa/0x164
> Sep 12 01:24:41 gn1 kernel: [<c016fc47>] do_ioctl+0x4f/0x5e
> Sep 12 01:24:41 gn1 kernel: [<c016fea2>] vfs_ioctl+0x24c/0x25e
> Sep 12 01:24:41 gn1 kernel: [<c016e8df>] sys_mknod+0x13/0x17
> Sep 12 01:24:41 gn1 kernel: [<c0103bdb>] sysenter_past_esp+0x54/0x79
> Sep 12 01:24:41 gn1 kernel: xfs_force_shutdown(dm-0,0x8) called from
> line 1139 of file fs/xfs/xfs_trans.c. Return address = 0xf94359db
> Sep 12 01:24:41 gn1 kernel: Filesystem "dm-0": Corruption of in-memory
> data detected. Shutting down filesystem: dm-0
>
>
> From a recent occurrence of this, I was able to obtain a copy of the
> file system prior to it being restored from backup.
>
> The file system in question is relatively full -- 800GB with 25GB free
> and 70M inodes used (the file system is used with a proprietary
> DMAPI-based HSM application, so most of the file data is punched out).
>
> With the file system in this state, the forced shutdown error is
> reproducible when attempting to create a file in a particular directory
> -- an ENOSPC error is returned to userspace, and the file system shuts
> down.
>
> The forced shutdown is also reproducible with this file system mounted
> on a more recent kernel version -- here is a stack trace from the same
> file system mounted on a 2.6.26 kernel built from oss.sgi.com cvs on Sep
> 19 2008:
>
> Sep 21 06:35:41 gn1 kernel: Filesystem "loop0": XFS internal error
> xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c. Caller
> 0xf93c8195
> Sep 21 06:35:41 gn1 kernel: [<f93c2fc0>] xfs_trans_cancel+0x4d/0xd3 [xfs]
> Sep 21 06:35:41 gn1 kernel: [<f93c8195>] xfs_create+0x49b/0x4db [xfs]
> Sep 21 06:35:41 gn1 kernel: [<f93c8195>] xfs_create+0x49b/0x4db [xfs]
> Sep 21 06:35:41 gn1 kernel: [<f93d166b>] xfs_vn_mknod+0x128/0x1e3 [xfs]
> Sep 21 06:35:41 gn1 kernel: [<c0170e9d>] vfs_create+0xb4/0x117
> Sep 21 06:35:41 gn1 kernel: [<c0172c46>] do_filp_open+0x1a0/0x671
> Sep 21 06:35:41 gn1 kernel: [<c01681da>] do_sys_open+0x40/0xb6
> Sep 21 06:35:41 gn1 kernel: [<c0168294>] sys_open+0x1e/0x23
> Sep 21 06:35:41 gn1 kernel: [<c0104791>] sysenter_past_esp+0x6a/0x99
> Sep 21 06:35:41 gn1 kernel: [<c02b0000>] unix_listen+0x8/0xc9
> Sep 21 06:35:41 gn1 kernel: =======================
> Sep 21 06:35:41 gn1 kernel: xfs_force_shutdown(loop0,0x8) called from
> line 1165 of file fs/xfs/xfs_trans.c. Return address = 0xf93c2fd6
> Sep 21 06:35:41 gn1 kernel: Filesystem "loop0": Corruption of in-memory
> data detected. Shutting down filesystem: loop0
>
>
> Tracing through the XFS code, the ENOSPC error is returned here from
> fs/xfs/xfs_da_btree.c:
>
> xfs_da_grow_inode(xfs_da_args_t *args, xfs_dablk_t *new_blkno)
> {
> ...
> if (got != count || mapp[0].br_startoff != bno ||
> ...
> return XFS_ERROR(ENOSPC);
> }
> ...
> }
>
> where got = 0 and count = 1 and xfs_da_grow_inode() is called from
> xfs_create() -> xfs_dir_createname() -> xfs_dir2_node_addname() ->
> xfs_da_split() -> xfs_da_root_split()
>
>
> xfs_repair -n (the latest version of xfs_repair from cvs, as the SLES 10
> SP1 version just runs out of memory) does not report any problems with
> the file system, but after running xfs_repair (without -n) on the file
> system, the error can no longer be triggered. Based on this, I suspect a
> problem with the free space btrees, as I understand that xfs_repair
> rebuilds them. I tried running xfs_check (latest cvs version also) as
> well but it runs out of memory and dies.
>
> Are there any known issues in 2.6.16 that could lead to this sort of
> problem? If there is any additional information that would be helpful in
> tracking this down, please let me know. If needed, I can probably make a
> xfs_metadump of the file system available to someone from SGI later this
> week.
>
>
> Some additional information that may be useful:
>
>
> # xfs_info /fs
> meta-data=/dev/loop0 isize=1024 agcount=32, agsize=6179712
> blks
> = sectsz=512 attr=1
> data = bsize=4096 blocks=197750784, imaxpct=0
> = sunit=0 swidth=0 blks, unwritten=1
> naming =version 2 bsize=4096
> log =internal bsize=4096 blocks=32768, version=1
> = sectsz=512 sunit=0 blks
> realtime =none extsz=4096 blocks=0, rtextents=0
>
>
> # xfs_db -f -r -c 'sb 0' -c p /mnt/scratch3/xfs.fs
> magicnum = 0x58465342
> blocksize = 4096
> dblocks = 197750784
> rblocks = 0
> rextents = 0
> uuid = 18602b1d-e419-4f97-9bd7-37c4b64b884a
> logstart = 134217732
> rootino = 32
> rbmino = 33
> rsumino = 34
> rextsize = 1
> agblocks = 6179712
> agcount = 32
> rbmblocks = 0
> logblocks = 32768
> versionnum = 0x3094
> sectsize = 512
> inodesize = 1024
> inopblock = 4
> fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
> blocklog = 12
> sectlog = 9
> inodelog = 10
> inopblog = 2
> agblklog = 23
> rextslog = 0
> inprogress = 0
> imax_pct = 0
> icount = 72288384
> ifree = 1131
> fdblocks = 6101857
> frextents = 0
> uquotino = 0
> gquotino = 0
> qflags = 0
> flags = 0
> shared_vn = 0
> inoalignmt = 2
> unit = 0
> width = 0
> dirblklog = 0
> logsectlog = 0
> logsectsize = 0
> logsunit = 0
> features2 = 0
>
>
> This is for the directory that triggers the problem:
>
> # xfs_db -f -r -c 'inode 304409506' -c p /mnt/scratch3/xfs.fs
> core.magic = 0x494e
> core.mode = 040700
> core.version = 1
> core.format = 2 (extents)
> core.nlinkv1 = 2
> core.uid = 60000
> core.gid = 65534
> core.flushiter = 0
> core.atime.sec = Fri Sep 12 01:15:17 2008
> core.atime.nsec = 169962854
> core.mtime.sec = Fri Sep 12 01:24:41 2008
> core.mtime.nsec = 058880718
> core.ctime.sec = Fri Sep 12 01:24:41 2008
> core.ctime.nsec = 058880718
> core.size = 40960
> core.nblocks = 12
> core.extsize = 0
> core.nextents = 12
> core.naextents = 0
> core.forkoff = 24
> core.aformat = 1 (local)
> core.dmevmask = 0
> core.dmstate = 0
> core.newrtbm = 0
> core.prealloc = 0
> core.realtime = 0
> core.immutable = 0
> core.append = 0
> core.sync = 0
> core.noatime = 0
> core.nodump = 0
> core.rtinherit = 0
> core.projinherit = 0
> core.nosymlinks = 0
> core.extsz = 0
> core.extszinherit = 0
> core.nodefrag = 0
> core.gen = 0
> next_unlinked = null
> u.bmx[0-11] = [startoff,startblock,blockcount,extentflag]
> 0:[0,76102390,1,0] 1:[1,76102361,1,0] 2:[2,76102054,1,0]
> 3:[3,76102053,1,0] 4:[4,76101744,1,0] 5:[5,76101743,1,0]
> 6:[6,76101742,1,0] 7:[7,76101741,1,0] 8:[8,76101740,1,0]
> 9:[9,76101707,1,0] 10:[8388608,76102391,1,0] 11:[16777216,76102719,1,0]
> a.sfattr.hdr.totsize = 74
> a.sfattr.hdr.count = 2
> a.sfattr.list[0].namelen = 16
> a.sfattr.list[0].valuelen = 8
> a.sfattr.list[0].root = 1
> a.sfattr.list[0].secure = 0
> a.sfattr.list[0].name = "\004,\005XfY\016oapoBO\bR)"
> a.sfattr.list[0].value = "\000\000\000\000\000\000\000\000"
> a.sfattr.list[1].namelen = 16
> a.sfattr.list[1].valuelen = 24
> a.sfattr.list[1].root = 1
> a.sfattr.list[1].secure = 0
> a.sfattr.list[1].name = "\003>\016oMa7\034@#I8\004X\027G"
> a.sfattr.list[1].value =
> "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
>
>
>
> # xfs_repair -f /mnt/scratch4/xfs.fs
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> - found root inode chunk
> Phase 3 - for each AG...
> - scan and clear agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> - agno = 1
> - agno = 2
> - agno = 3
> - agno = 4
> - agno = 5
> - agno = 6
> - agno = 7
> - agno = 8
> - agno = 9
> - agno = 10
> - agno = 11
> - agno = 12
> - agno = 13
> - agno = 14
> - agno = 15
> - agno = 16
> - agno = 17
> - agno = 18
> - agno = 19
> - agno = 20
> - agno = 21
> - agno = 22
> - agno = 23
> - agno = 24
> - agno = 25
> - agno = 26
> - agno = 27
> - agno = 28
> - agno = 29
> - agno = 30
> - agno = 31
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - check for inodes claiming duplicate blocks...
> - agno = 0
> - agno = 1
> - agno = 2
> - agno = 3
> - agno = 4
> - agno = 5
> - agno = 6
> - agno = 7
> - agno = 8
> - agno = 9
> - agno = 10
> - agno = 11
> - agno = 12
> - agno = 13
> - agno = 14
> - agno = 15
> - agno = 16
> - agno = 17
> - agno = 18
> - agno = 19
> - agno = 20
> - agno = 21
> - agno = 22
> - agno = 23
> - agno = 24
> - agno = 25
> - agno = 26
> - agno = 27
> - agno = 28
> - agno = 29
> - agno = 30
> - agno = 31
> Phase 5 - rebuild AG headers and trees...
> - reset superblock...
> Phase 6 - check inode connectivity...
> - resetting contents of realtime bitmap and summary inodes
> - traversing filesystem ...
> - traversal finished ...
> - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> done
>
>
>
> Thanks,
> Kevin Jamieson
>
>
--
Mark Goodwin markgw@sgi.com
Engineering Manager for XFS and PCP Phone: +61-3-99631937
SGI Australian Software Group Cell: +61-4-18969583
-------------------------------------------------------------
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c
2008-09-21 19:29 XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c Kevin Jamieson
2008-09-23 7:50 ` Mark Goodwin
@ 2008-09-23 9:18 ` Dave Chinner
2008-09-24 2:49 ` Kevin Jamieson
[not found] ` <54241.24.80.224.145.1222383385.squirrel@squirrel.kevinjamieson.com>
1 sibling, 2 replies; 6+ messages in thread
From: Dave Chinner @ 2008-09-23 9:18 UTC (permalink / raw)
To: Kevin Jamieson; +Cc: xfs
On Sun, Sep 21, 2008 at 12:29:49PM -0700, Kevin Jamieson wrote:
> The forced shutdown is also reproducible with this file system mounted
> on a more recent kernel version -- here is a stack trace from the same
> file system mounted on a 2.6.26 kernel built from oss.sgi.com cvs on Sep
> 19 2008:
>
> Sep 21 06:35:41 gn1 kernel: Filesystem "loop0": XFS internal error
> xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c. Caller
> 0xf93c8195
> Sep 21 06:35:41 gn1 kernel: [<f93c2fc0>] xfs_trans_cancel+0x4d/0xd3 [xfs]
> Sep 21 06:35:41 gn1 kernel: [<f93c8195>] xfs_create+0x49b/0x4db [xfs]
> Sep 21 06:35:41 gn1 kernel: [<f93c8195>] xfs_create+0x49b/0x4db [xfs]
> Sep 21 06:35:41 gn1 kernel: [<f93d166b>] xfs_vn_mknod+0x128/0x1e3 [xfs]
> Sep 21 06:35:41 gn1 kernel: [<c0170e9d>] vfs_create+0xb4/0x117
> Sep 21 06:35:41 gn1 kernel: [<c0172c46>] do_filp_open+0x1a0/0x671
> Sep 21 06:35:41 gn1 kernel: [<c01681da>] do_sys_open+0x40/0xb6
> Sep 21 06:35:41 gn1 kernel: [<c0168294>] sys_open+0x1e/0x23
> Sep 21 06:35:41 gn1 kernel: [<c0104791>] sysenter_past_esp+0x6a/0x99
> Sep 21 06:35:41 gn1 kernel: [<c02b0000>] unix_listen+0x8/0xc9
> Sep 21 06:35:41 gn1 kernel: =======================
> Sep 21 06:35:41 gn1 kernel: xfs_force_shutdown(loop0,0x8) called from
> line 1165 of file fs/xfs/xfs_trans.c. Return address = 0xf93c2fd6
> Sep 21 06:35:41 gn1 kernel: Filesystem "loop0": Corruption of in-memory
> data detected. Shutting down filesystem: loop0
Oh, that's interesting. I've been trying to track down the problem
on TOT kernels without much luck recently.
> Tracing through the XFS code, the ENOSPC error is returned here from
> fs/xfs/xfs_da_btree.c:
>
> xfs_da_grow_inode(xfs_da_args_t *args, xfs_dablk_t *new_blkno)
> {
> ...
> if (got != count || mapp[0].br_startoff != bno ||
> ...
> return XFS_ERROR(ENOSPC);
> }
> ...
> }
>
> where got = 0 and count = 1 and xfs_da_grow_inode() is called from
> xfs_create() -> xfs_dir_createname() -> xfs_dir2_node_addname() ->
> xfs_da_split() -> xfs_da_root_split()
got = 0 means that xfs_bmapi() returned zero blocks. Given that it
was only being asked for a single block (from the xfs_info output),
that implies that either the FS was out of space or that the order
of AG locking meant we couldn't get to the AGs that had space in
them. Given that the transaction reservation or the
xfs_dir_can_enter() check should ensure we have space availlable,
I'm inclined to think that the free space is in an AG we can't
currently allocate out of because of previous allocations for
other blocks needed by the split....
> xfs_repair -n (the latest version of xfs_repair from cvs, as the SLES 10
> SP1 version just runs out of memory) does not report any problems with
> the file system, but after running xfs_repair (without -n) on the file
> system, the error can no longer be triggered. Based on this, I suspect a
> problem with the free space btrees, as I understand that xfs_repair
> rebuilds them. I tried running xfs_check (latest cvs version also) as
> well but it runs out of memory and dies.
Rebuilding the freespace trees will change the pattern of free space
in each AG, which means the same sequence of events could result in
different allocation patterns.
> Are there any known issues in 2.6.16 that could lead to this sort of
> problem? If there is any additional information that would be helpful in
> tracking this down, please let me know. If needed, I can probably make a
> xfs_metadump of the file system available to someone from SGI later this
> week.
A metadump will tell us what the freespace patterns are....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c
2008-09-23 9:18 ` Dave Chinner
@ 2008-09-24 2:49 ` Kevin Jamieson
[not found] ` <54241.24.80.224.145.1222383385.squirrel@squirrel.kevinjamieson.com>
1 sibling, 0 replies; 6+ messages in thread
From: Kevin Jamieson @ 2008-09-24 2:49 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Tue, September 23, 2008 2:18 am, Dave Chinner wrote:
> A metadump will tell us what the freespace patterns are....
I should be able to get a metadump for you soon, just waiting for approval
from management (sigh).
> I'm inclined to think that the free space is in an AG we can't
> currently allocate out of because of previous allocations for
> other blocks needed by the split....
Adding some debug messages (this is with the latest oss.sgi.com CVS
kernel) into xfs_alloc_vextent() and xfs_alloc_fix_freelist() shows the
following allocations for the create transaction, all in AG 9 which has
only 40 free blocks:
kernel: XFS: ENTER function xfs_alloc_vextent file fs/xfs/xfs_alloc.c line
2285 agno 9 type 4 called from:
kernel: [<f951f5c0>] xfs_alloc_vextent+0xe4/0x520 [xfs]
kernel: [<f952fe4f>] xfs_bmap_btalloc+0x791/0xa80 [xfs]
kernel: [<f9530b2f>] xfs_bmapi+0x9cf/0x148c [xfs]
kernel: [<f95372bd>] xfs_da_grow_inode+0xe6/0x2a1 [xfs]
kernel: [<f954009a>] xfs_dir2_leafn_split+0x29/0x27f [xfs]
kernel: [<f9539419>] xfs_da_split+0xc2/0x236 [xfs]
kernel: [<f953f55b>] xfs_dir2_node_addname+0x845/0x8ae [xfs]
kernel: [<f95646ca>] kmem_zone_alloc+0x46/0x8a [xfs]
kernel: [<f956472b>] kmem_zone_zalloc+0x1d/0x41 [xfs]
kernel: [<f953a149>] xfs_dir_createname+0x12c/0x13a [xfs]
kernel: [<f9561447>] xfs_create+0x2d9/0x4e6 [xfs]
kernel: =======================
kernel: XFS: function xfs_alloc_fix_freelist file fs/xfs/xfs_alloc.c line
1855 agno 9 minlen 1 alignment 1 minalignslop 0 longest 10 minleft 1
pagf_freeblks 36 pagf_flcount 4 need 4 total 35
kernel: XFS: EXIT function xfs_alloc_vextent file fs/xfs/xfs_alloc.c line
2454 fsbno 76102855
kernel: XFS: ENTER function xfs_alloc_vextent file fs/xfs/xfs_alloc.c line
2285 agno 9 type 5 called from:
kernel: [<f951f5c0>] xfs_alloc_vextent+0xe4/0x520 [xfs]
kernel: [<f952aeb3>] xfs_bmap_extents_to_btree+0x26d/0x4eb [xfs]
kernel: [<f95490e6>] xfs_iext_add+0xe0/0x213 [xfs]
kernel: [<f952d10d>] xfs_bmap_add_extent+0x338/0x3cb [xfs]
kernel: [<f9530d50>] xfs_bmapi+0xbf0/0x148c [xfs]
kernel: [<f95372bd>] xfs_da_grow_inode+0xe6/0x2a1 [xfs]
kernel: [<f954009a>] xfs_dir2_leafn_split+0x29/0x27f [xfs]
kernel: [<f9539419>] xfs_da_split+0xc2/0x236 [xfs]
kernel: [<f953f55b>] xfs_dir2_node_addname+0x845/0x8ae [xfs]
kernel: [<f95646ca>] kmem_zone_alloc+0x46/0x8a [xfs]
kernel: [<f956472b>] kmem_zone_zalloc+0x1d/0x41 [xfs]
kernel: [<f953a149>] xfs_dir_createname+0x12c/0x13a [xfs]
kernel: [<f9561447>] xfs_create+0x2d9/0x4e6 [xfs]
kernel: =======================
kernel: XFS: function xfs_alloc_fix_freelist file fs/xfs/xfs_alloc.c line
1855 agno 9 minlen 1 alignment 1 minalignslop 0 longest 10 minleft 0
pagf_freeblks 35 pagf_flcount 4 need 4 total 0
kernel: XFS: EXIT function xfs_alloc_vextent file fs/xfs/xfs_alloc.c line
2454 fsbno 76103705
kernel: XFS: ENTER function xfs_alloc_vextent file fs/xfs/xfs_alloc.c line
2285 agno 9 type 5 called from:
kernel: [<f951f5c0>] xfs_alloc_vextent+0xe4/0x520 [xfs]
kernel: [<c0132a98>] up+0x9/0x2a
kernel: [<f952fe4f>] xfs_bmap_btalloc+0x791/0xa80 [xfs]
kernel: [<f952d10d>] xfs_bmap_add_extent+0x338/0x3cb [xfs]
kernel: [<f95485e3>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
kernel: [<f9530b2f>] xfs_bmapi+0x9cf/0x148c [xfs]
kernel: [<f956713f>] _xfs_buf_lookup_pages+0x25e/0x285 [xfs]
kernel: [<f95372bd>] xfs_da_grow_inode+0xe6/0x2a1 [xfs]
kernel: [<f953ed09>] xfs_dir2_leafn_add+0x29c/0x2a9 [xfs]
kernel: [<f953c7c7>] xfs_dir2_leaf_log_header+0x9/0xb [xfs]
kernel: [<f9538df4>] xfs_da_root_split+0x1d/0x15e [xfs]
kernel: [<f953756e>] xfs_da_fixhashpath+0x4c/0xbe [xfs]
kernel: [<f95394ba>] xfs_da_split+0x163/0x236 [xfs]
kernel: [<f953f55b>] xfs_dir2_node_addname+0x845/0x8ae [xfs]
kernel: [<f95646ca>] kmem_zone_alloc+0x46/0x8a [xfs]
kernel: [<f956472b>] kmem_zone_zalloc+0x1d/0x41 [xfs]
kernel: [<f953a149>] xfs_dir_createname+0x12c/0x13a [xfs]
kernel: [<f9561447>] xfs_create+0x2d9/0x4e6 [xfs]
kernel: =======================
kernel: XFS: function xfs_alloc_fix_freelist file fs/xfs/xfs_alloc.c line
1855 agno 9 minlen 1 alignment 1 minalignslop 0 longest 10 minleft 0
pagf_freeblks 34 pagf_flcount 4 need 4 total 35
kernel: XFS: EXIT function xfs_alloc_vextent file fs/xfs/xfs_alloc.c line
2454 fsbno -1
So this last allocation is failing this test in xfs_alloc_fix_freelist()
and not allocating anything:
/*
* If it looks like there isn't a long enough extent, or enough
* total blocks, reject it.
*/
if ((args->minlen + args->alignment + args->minalignslop - 1) >
longest ||
((int)(pag->pagf_freeblks + pag->pagf_flcount -
need - args->total) < (int)args->minleft)) {
if (agbp)
xfs_trans_brelse(tp, agbp);
args->agbp = NULL;
return 0;
Thanks for your help,
Kevin Jamieson
^ permalink raw reply [flat|nested] 6+ messages in thread[parent not found: <54241.24.80.224.145.1222383385.squirrel@squirrel.kevinjamieson.com>]
* XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c
@ 2008-02-25 20:58 Wolfgang Karall
0 siblings, 0 replies; 6+ messages in thread
From: Wolfgang Karall @ 2008-02-25 20:58 UTC (permalink / raw)
To: xfs
Hello,
today I ran across above error, resulting in an unaccessible FS,
after remounting once xfs_repair didn't show any signs of errors. At
the time of the error the machine and also the concerned FS was under
rather high I/O load (doing test restores of some SVN repositories).
The FS is now working again without a hitch (and did before for close
to a year), latest operation on the filesystem was an xfs_growfs a
couple of days ago. I read a couple of reports about similiar errors
(with differing line numbers), but didn't find conclusive
information, so I'm asking here what the problem might be.
For details see below, if anything is amiss, please let me know. And
please CC me on the replies, since I'm not subscribed to the list.
Kind regards
Wolfgang Karall
Full dmesg output of the problem:
---------
Filesystem "dm-13": XFS internal error xfs_trans_cancel at line 1138
of file fs/xfs/xfs_trans.c. Caller 0xffffffff88158042
Call Trace:
[<ffffffff8814fd54>] :xfs:xfs_trans_cancel+0x5b/0xfe
[<ffffffff88158042>] :xfs:xfs_create+0x58b/0x5dd
[<ffffffff8816053e>] :xfs:xfs_vn_mknod+0x1bd/0x3c8
[<ffffffff8815c793>] :xfs:xfs_buf_rele+0x32/0xa1
[<ffffffff8812bd3d>] :xfs:xfs_da_brelse+0x70/0x9d
[<ffffffff80207138>] kmem_cache_free+0x77/0xca
[<ffffffff8813379b>] :xfs:xfs_dir2_node_lookup+0xa9/0xb8
[<ffffffff8812e887>] :xfs:xfs_dir2_isleaf+0x19/0x4a
[<ffffffff8812f0c4>] :xfs:xfs_dir_lookup+0x100/0x122
[<ffffffff8020f89e>] link_path_walk+0xd3/0xe5
[<ffffffff8023b184>] vfs_create+0xe7/0x12c
[<ffffffff8021adfc>] open_namei+0x18c/0x6a0
[<ffffffff8023beda>] chown_common+0xa5/0xb0
[<ffffffff80227624>] do_filp_open+0x1c/0x3d
[<ffffffff80219a7f>] do_sys_open+0x44/0xc5
[<ffffffff8025be0e>] system_call+0x86/0x8b
[<ffffffff8025bd88>] system_call+0x0/0x8b
Filesystem "dm-13": Corruption of in-memory data detected. Shutting
down filesystem: dm-13
Please umount the filesystem, and rectify the problem(s)
---------
More information about the used software versions, mount options, etc.:
---------
test:~# uname -a
Linux test 2.6.18-6-xen-amd64 #1 SMP Sun Feb 10 18:02:52 UTC 2008
x86_64 GNU/Linux
test:~# dpkg -l xfsprogs
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-
installed
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err:
uppercase=bad)
||/ Name Version
Description
+++-=============================-=============================-
========================================================================
==
ii xfsprogs 2.8.11-1
Utilities for managing the XFS filesystem
test:~# mount | grep /srv
/dev/mapper/test-srv on /srv type xfs (rw)
test:~# grep /srv /etc/fstab
/dev/test/srv /srv xfs rw 0 2
test:~# xfs_info /srv
meta-data=/dev/test/srv isize=256 agcount=56,
agsize=65536 blks
= sectsz=512 attr=0
data = bsize=4096 blocks=3670016, imaxpct=25
= sunit=0 swidth=0 blks,
unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=2560, version=1
= sectsz=512 sunit=0 blks
realtime =none extsz=65536 blocks=0, rtextents=0
---------
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-09-26 10:14 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-21 19:29 XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c Kevin Jamieson
2008-09-23 7:50 ` Mark Goodwin
2008-09-23 9:18 ` Dave Chinner
2008-09-24 2:49 ` Kevin Jamieson
[not found] ` <54241.24.80.224.145.1222383385.squirrel@squirrel.kevinjamieson.com>
[not found] ` <20080926012704.GI27997@disturbed>
[not found] ` <62255.192.168.1.1.1222403942.squirrel@squirrel.kevinjamieson.com>
2008-09-26 10:16 ` Dave Chinner
-- strict thread matches above, loose matches on Subject: below --
2008-02-25 20:58 Wolfgang Karall
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox