public inbox for linux-xfs@vger.kernel.org
* XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c
@ 2008-09-21 19:29 Kevin Jamieson
  2008-09-23  7:50 ` Mark Goodwin
  2008-09-23  9:18 ` Dave Chinner
  0 siblings, 2 replies; 6+ messages in thread
From: Kevin Jamieson @ 2008-09-21 19:29 UTC
  To: xfs

Hello,

We have encountered the following error several times on some of our 
customer systems running SLES 10 SP1 (kernel version 2.6.16.46-0.12):

Sep 12 01:24:41 gn1 kernel: Filesystem "dm-0": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c.  Caller 0xf9429f23
Sep 12 01:24:41 gn1 kernel:  [<f942277d>] xfs_trans_cancel+0x4d/0xd6 [xfs]
Sep 12 01:24:41 gn1 kernel:  [<f9429f23>] xfs_create+0x4f1/0x52a [xfs]
Sep 12 01:24:41 gn1 kernel:  [<f9429f23>] xfs_create+0x4f1/0x52a [xfs]
Sep 12 01:24:41 gn1 kernel:  [<f9433657>] xfs_vn_mknod+0x172/0x276 [xfs]
Sep 12 01:24:41 gn1 kernel:  [<c016c45b>] vfs_create+0xd1/0x138
Sep 12 01:24:41 gn1 kernel:  [<c016e862>] sys_mknodat+0xfa/0x164
Sep 12 01:24:41 gn1 kernel:  [<c016fc47>] do_ioctl+0x4f/0x5e
Sep 12 01:24:41 gn1 kernel:  [<c016fea2>] vfs_ioctl+0x24c/0x25e
Sep 12 01:24:41 gn1 kernel:  [<c016e8df>] sys_mknod+0x13/0x17
Sep 12 01:24:41 gn1 kernel:  [<c0103bdb>] sysenter_past_esp+0x54/0x79
Sep 12 01:24:41 gn1 kernel: xfs_force_shutdown(dm-0,0x8) called from line 1139 of file fs/xfs/xfs_trans.c.  Return address = 0xf94359db
Sep 12 01:24:41 gn1 kernel: Filesystem "dm-0": Corruption of in-memory data detected.  Shutting down filesystem: dm-0
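
For readers following the log: the second argument in
xfs_force_shutdown(dm-0,0x8) is a bitmask giving the reason for the
shutdown. A minimal sketch of decoding it follows. The flag names and
values are quoted from memory of fs/xfs/xfs_mount.h in kernels of this
era and are an assumption to check against the actual tree.
decode_shutdown() is a hypothetical helper, not part of any XFS tool.

```python
import re

# Assumed SHUTDOWN_* bit values (from memory of fs/xfs/xfs_mount.h, ~2.6.2x);
# verify against the kernel source you are actually running.
SHUTDOWN_FLAGS = {
    0x0001: "SHUTDOWN_META_IO_ERROR",
    0x0002: "SHUTDOWN_LOG_IO_ERROR",
    0x0004: "SHUTDOWN_FORCE_UMOUNT",
    0x0008: "SHUTDOWN_CORRUPT_INCORE",
}

# Matches e.g. "xfs_force_shutdown(dm-0,0x8)" anywhere in a log line.
SHUTDOWN_RE = re.compile(
    r"xfs_force_shutdown\((?P<dev>[^,]+),(?P<flags>0x[0-9a-fA-F]+)\)"
)


def decode_shutdown(line):
    """Return (device, [flag names]) for a log line, or None if no match."""
    m = SHUTDOWN_RE.search(line)
    if m is None:
        return None
    flags = int(m.group("flags"), 16)
    names = [name for bit, name in sorted(SHUTDOWN_FLAGS.items()) if flags & bit]
    # Report any bits this table does not know about instead of hiding them.
    unknown = flags & ~sum(SHUTDOWN_FLAGS)
    if unknown:
        names.append("unknown bits 0x%x" % unknown)
    return m.group("dev"), names


if __name__ == "__main__":
    sample = ("Sep 12 01:24:41 gn1 kernel: xfs_force_shutdown(dm-0,0x8) "
              "called from line 1139 of file fs/xfs/xfs_trans.c.")
    print(decode_shutdown(sample))
```

Under that assumed table, 0x8 corresponds to the "Corruption of
in-memory data detected" message printed right after the shutdown line.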


From a recent occurrence of this, I was able to obtain a copy of the
file system prior to it being restored from backup.

The file system in question is relatively full -- 800GB with 25GB free 
and 70M inodes used (the file system is used with a proprietary 
DMAPI-based HSM application, so most of the file data is punched out).

With the file system in this state, the forced shutdown error is 
reproducible when attempting to create a file in a particular directory 
-- an ENOSPC error is returned to userspace, and the file system shuts down.

The forced shutdown is also reproducible with this file system mounted 
on a more recent kernel version -- here is a stack trace from the same 
file system mounted on a 2.6.26 kernel built from oss.sgi.com cvs on Sep 
19 2008:

Sep 21 06:35:41 gn1 kernel: Filesystem "loop0": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c.  Caller 0xf93c8195
Sep 21 06:35:41 gn1 kernel:  [<f93c2fc0>] xfs_trans_cancel+0x4d/0xd3 [xfs]
Sep 21 06:35:41 gn1 kernel:  [<f93c8195>] xfs_create+0x49b/0x4db [xfs]
Sep 21 06:35:41 gn1 kernel:  [<f93c8195>] xfs_create+0x49b/0x4db [xfs]
Sep 21 06:35:41 gn1 kernel:  [<f93d166b>] xfs_vn_mknod+0x128/0x1e3 [xfs]
Sep 21 06:35:41 gn1 kernel:  [<c0170e9d>] vfs_create+0xb4/0x117
Sep 21 06:35:41 gn1 kernel:  [<c0172c46>] do_filp_open+0x1a0/0x671
Sep 21 06:35:41 gn1 kernel:  [<c01681da>] do_sys_open+0x40/0xb6
Sep 21 06:35:41 gn1 kernel:  [<c0168294>] sys_open+0x1e/0x23
Sep 21 06:35:41 gn1 kernel:  [<c0104791>] sysenter_past_esp+0x6a/0x99
Sep 21 06:35:41 gn1 kernel:  [<c02b0000>] unix_listen+0x8/0xc9
Sep 21 06:35:41 gn1 kernel:  =======================
Sep 21 06:35:41 gn1 kernel: xfs_force_shutdown(loop0,0x8) called from line 1165 of file fs/xfs/xfs_trans.c.  Return address = 0xf93c2fd6
Sep 21 06:35:41 gn1 kernel: Filesystem "loop0": Corruption of in-memory data detected.  Shutting down filesystem: loop0


Tracing through the XFS code, the ENOSPC error is returned here from 
fs/xfs/xfs_da_btree.c:

xfs_da_grow_inode(xfs_da_args_t *args, xfs_dablk_t *new_blkno)
{
...
	if (got != count || mapp[0].br_startoff != bno ||
		...
	    return XFS_ERROR(ENOSPC);
	}
...
}

Here got = 0 and count = 1, and xfs_da_grow_inode() is reached via
xfs_create() -> xfs_dir_createname() -> xfs_dir2_node_addname() ->
xfs_da_split() -> xfs_da_root_split().
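
Why an ordinary ENOSPC can end in a shutdown: as far as I understand
xfs_trans_cancel(), cancelling a transaction that has already logged
modifications (a "dirty" transaction) cannot be undone in memory, so
XFS shuts the filesystem down instead. The toy model below illustrates
only that control flow. All class and function names are hypothetical
and do not correspond to kernel structures. That the directory split
had already dirtied the transaction before xfs_da_grow_inode() failed
is inferred from the traces above, not confirmed.

```python
import errno


class ToyTransaction:
    """Toy stand-in for a filesystem transaction (not a kernel structure)."""

    def __init__(self):
        self.dirty = False

    def log_change(self):
        # Any logged metadata modification marks the transaction dirty.
        self.dirty = True


class ToyMount:
    """Toy stand-in for a mounted filesystem."""

    def __init__(self):
        self.shut_down = False

    def cancel(self, tp):
        # A clean transaction can simply be dropped. A dirty one cannot be
        # rolled back in memory, so the model shuts the filesystem down.
        if tp.dirty and not self.shut_down:
            self.shut_down = True


def create_entry(mount, blocks_available, modify_before_alloc):
    """Return 0 on success or a negative errno, loosely mimicking the trace."""
    tp = ToyTransaction()
    if modify_before_alloc:
        tp.log_change()        # e.g. part of the split was already logged
    if blocks_available < 1:   # got = 0 while count = 1, as in the report
        mount.cancel(tp)
        return -errno.ENOSPC
    tp.log_change()
    return 0


if __name__ == "__main__":
    for dirty_first in (False, True):
        mount = ToyMount()
        rc = create_entry(mount, blocks_available=0, modify_before_alloc=dirty_first)
        print(dirty_first, rc, mount.shut_down)
```

In this model the same allocation failure is harmless when nothing was
logged beforehand and fatal when something was, which matches userspace
seeing ENOSPC at the same moment the file system shuts down.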


xfs_repair -n (the latest version of xfs_repair from cvs, as the SLES 10 
SP1 version just runs out of memory) does not report any problems with 
the file system, but after running xfs_repair (without -n) on the file 
system, the error can no longer be triggered. Based on this, I suspect a 
problem with the free space btrees, as I understand that xfs_repair 
rebuilds them. I also tried running xfs_check (again the latest cvs
version), but it ran out of memory and died.

Are there any known issues in 2.6.16 that could lead to this sort of 
problem? If there is any additional information that would be helpful in 
tracking this down, please let me know. If needed, I can probably make an
xfs_metadump of the file system available to someone from SGI later this 
week.


Some additional information that may be useful:


# xfs_info /fs
meta-data=/dev/loop0             isize=1024   agcount=32, agsize=6179712 blks
         =                       sectsz=512   attr=1
data     =                       bsize=4096   blocks=197750784, imaxpct=0
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=4096   blocks=0, rtextents=0


# xfs_db -f -r -c 'sb 0' -c p /mnt/scratch3/xfs.fs
magicnum = 0x58465342
blocksize = 4096
dblocks = 197750784
rblocks = 0
rextents = 0
uuid = 18602b1d-e419-4f97-9bd7-37c4b64b884a
logstart = 134217732
rootino = 32
rbmino = 33
rsumino = 34
rextsize = 1
agblocks = 6179712
agcount = 32
rbmblocks = 0
logblocks = 32768
versionnum = 0x3094
sectsize = 512
inodesize = 1024
inopblock = 4
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 9
inodelog = 10
inopblog = 2
agblklog = 23
rextslog = 0
inprogress = 0
imax_pct = 0
icount = 72288384
ifree = 1131
fdblocks = 6101857
frextents = 0
uquotino = 0
gquotino = 0
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 0
width = 0
dirblklog = 0
logsectlog = 0
logsectsize = 0
logsunit = 0
features2 = 0
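
As a cross-check of the "800GB with 25GB free" figure, the superblock
fields above can be turned into byte counts. This is only a sketch:
the dictionary holds values copied from the dump, and summarize() is a
hypothetical helper. Note that agblocks * agcount == dblocks happens to
hold for this file system but is not required in general, since the
last AG may be shorter.

```python
# Values copied from the xfs_db superblock dump above.
sb = {
    "blocksize": 4096,
    "dblocks": 197750784,
    "agblocks": 6179712,
    "agcount": 32,
    "inodesize": 1024,
    "inopblock": 4,
    "blocklog": 12,
    "icount": 72288384,
    "ifree": 1131,
    "fdblocks": 6101857,
}


def summarize(sb):
    """Derive sizes and simple consistency checks from superblock fields."""
    bs = sb["blocksize"]
    return {
        "total_bytes": sb["dblocks"] * bs,
        "free_bytes": sb["fdblocks"] * bs,
        "free_pct": 100.0 * sb["fdblocks"] / sb["dblocks"],
        "inode_bytes": sb["icount"] * sb["inodesize"],
        "inodes_in_use": sb["icount"] - sb["ifree"],
        # Only an average: the dump does not say how free space is spread
        # across allocation groups.
        "avg_free_blocks_per_ag": sb["fdblocks"] // sb["agcount"],
        "geometry_consistent": (
            sb["agblocks"] * sb["agcount"] == sb["dblocks"]
            and (1 << sb["blocklog"]) == bs
            and bs // sb["inodesize"] == sb["inopblock"]
        ),
    }


if __name__ == "__main__":
    for key, value in summarize(sb).items():
        print(key, value)
```

The file system as a whole is about 3% free, so a failure to allocate a
single block is not explained by the global counters alone. Per-AG free
space would have to come from the AGF headers (for example via the agf
command in xfs_db), which are not included in this report.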


This is for the directory that triggers the problem:

# xfs_db -f -r -c 'inode 304409506' -c p /mnt/scratch3/xfs.fs
core.magic = 0x494e
core.mode = 040700
core.version = 1
core.format = 2 (extents)
core.nlinkv1 = 2
core.uid = 60000
core.gid = 65534
core.flushiter = 0
core.atime.sec = Fri Sep 12 01:15:17 2008
core.atime.nsec = 169962854
core.mtime.sec = Fri Sep 12 01:24:41 2008
core.mtime.nsec = 058880718
core.ctime.sec = Fri Sep 12 01:24:41 2008
core.ctime.nsec = 058880718
core.size = 40960
core.nblocks = 12
core.extsize = 0
core.nextents = 12
core.naextents = 0
core.forkoff = 24
core.aformat = 1 (local)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.gen = 0
next_unlinked = null
u.bmx[0-11] = [startoff,startblock,blockcount,extentflag] 
0:[0,76102390,1,0] 1:[1,76102361,1,0] 2:[2,76102054,1,0] 
3:[3,76102053,1,0] 4:[4,76101744,1,0] 5:[5,76101743,1,0] 
6:[6,76101742,1,0] 7:[7,76101741,1,0] 8:[8,76101740,1,0] 
9:[9,76101707,1,0] 10:[8388608,76102391,1,0] 11:[16777216,76102719,1,0]
a.sfattr.hdr.totsize = 74
a.sfattr.hdr.count = 2
a.sfattr.list[0].namelen = 16
a.sfattr.list[0].valuelen = 8
a.sfattr.list[0].root = 1
a.sfattr.list[0].secure = 0
a.sfattr.list[0].name = "\004,\005XfY\016oapoBO\bR)"
a.sfattr.list[0].value = "\000\000\000\000\000\000\000\000"
a.sfattr.list[1].namelen = 16
a.sfattr.list[1].valuelen = 24
a.sfattr.list[1].root = 1
a.sfattr.list[1].secure = 0
a.sfattr.list[1].name = "\003>\016oMa7\034@#I8\004X\027G"
a.sfattr.list[1].value = 
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"

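The u.bmx extent list in the inode dump above can be read against the
version 2 directory layout, where a directory's logical address space
is divided into a data segment, a leaf/node segment and a free-index
segment. The sketch below classifies the twelve extents that way. It
assumes each segment spans 32 GiB of directory address space (my
recollection of XFS_DIR2_SPACE_SIZE) and that a directory block equals
a file system block here, since dirblklog = 0. classify() is a
hypothetical helper.

```python
import re

BLOCK_SIZE = 4096                # sb blocksize; dirblklog = 0 in the dump
DIR2_SPACE_SIZE = 1 << 35        # assumed: 32 GiB per directory segment
LEAF_OFFSET = DIR2_SPACE_SIZE // BLOCK_SIZE       # first leaf/node block
FREE_OFFSET = 2 * DIR2_SPACE_SIZE // BLOCK_SIZE   # first free-index block

# Extent list copied from the xfs_db output above:
# index:[startoff,startblock,blockcount,extentflag]
BMX = """0:[0,76102390,1,0] 1:[1,76102361,1,0] 2:[2,76102054,1,0]
3:[3,76102053,1,0] 4:[4,76101744,1,0] 5:[5,76101743,1,0]
6:[6,76101742,1,0] 7:[7,76101741,1,0] 8:[8,76101740,1,0]
9:[9,76101707,1,0] 10:[8388608,76102391,1,0] 11:[16777216,76102719,1,0]"""


def classify(bmx_text):
    """Count directory blocks per segment, keyed by segment name."""
    counts = {"data": 0, "leaf/node": 0, "free index": 0}
    for m in re.finditer(r"\d+:\[(\d+),(\d+),(\d+),(\d+)\]", bmx_text):
        startoff, _startblock, blockcount, _flag = map(int, m.groups())
        if startoff < LEAF_OFFSET:
            kind = "data"
        elif startoff < FREE_OFFSET:
            kind = "leaf/node"
        else:
            kind = "free index"
        counts[kind] += blockcount
    return counts


if __name__ == "__main__":
    counts = classify(BMX)
    print(counts)
    print("data bytes:", counts["data"] * BLOCK_SIZE)
```

With these assumptions the directory has ten data blocks (10 * 4096 =
40960, matching core.size), a single block in the leaf/node segment and
one free-index block. A directory in that state would need newly
allocated blocks in the leaf/node segment when its only leaf block
splits, which seems consistent with xfs_da_root_split() asking
xfs_da_grow_inode() for one block (count = 1) and getting none.
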

# xfs_repair -f /mnt/scratch4/xfs.fs
Phase 1 - find and verify superblock...
Phase 2 - using internal log
         - zero log...
         - scan filesystem freespace and inode maps...
         - found root inode chunk
Phase 3 - for each AG...
         - scan and clear agi unlinked lists...
         - process known inodes and perform inode discovery...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
         - agno = 4
         - agno = 5
         - agno = 6
         - agno = 7
         - agno = 8
         - agno = 9
         - agno = 10
         - agno = 11
         - agno = 12
         - agno = 13
         - agno = 14
         - agno = 15
         - agno = 16
         - agno = 17
         - agno = 18
         - agno = 19
         - agno = 20
         - agno = 21
         - agno = 22
         - agno = 23
         - agno = 24
         - agno = 25
         - agno = 26
         - agno = 27
         - agno = 28
         - agno = 29
         - agno = 30
         - agno = 31
         - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
         - setting up duplicate extent list...
         - check for inodes claiming duplicate blocks...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
         - agno = 4
         - agno = 5
         - agno = 6
         - agno = 7
         - agno = 8
         - agno = 9
         - agno = 10
         - agno = 11
         - agno = 12
         - agno = 13
         - agno = 14
         - agno = 15
         - agno = 16
         - agno = 17
         - agno = 18
         - agno = 19
         - agno = 20
         - agno = 21
         - agno = 22
         - agno = 23
         - agno = 24
         - agno = 25
         - agno = 26
         - agno = 27
         - agno = 28
         - agno = 29
         - agno = 30
         - agno = 31
Phase 5 - rebuild AG headers and trees...
         - reset superblock...
Phase 6 - check inode connectivity...
         - resetting contents of realtime bitmap and summary inodes
         - traversing filesystem ...
         - traversal finished ...
         - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done



Thanks,
Kevin Jamieson

* XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c
@ 2008-02-25 20:58 Wolfgang Karall
  0 siblings, 0 replies; 6+ messages in thread
From: Wolfgang Karall @ 2008-02-25 20:58 UTC
  To: xfs

Hello,

Today I ran across the above error, which left the FS inaccessible.
After remounting once, xfs_repair didn't show any signs of errors. At
the time of the error, the machine and the affected FS were under
rather high I/O load (doing test restores of some SVN repositories).

The FS is now working again without a hitch (as it did before for close
to a year). The last operation on the filesystem was an xfs_growfs a
couple of days ago. I read a couple of reports about similar errors
(with differing line numbers), but didn't find conclusive
information, so I'm asking here what the problem might be.

For details see below; if anything is amiss, please let me know. And
please CC me on the replies, since I'm not subscribed to the list.

Kind regards
Wolfgang Karall


Full dmesg output of the problem:

---------
Filesystem "dm-13": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xffffffff88158042
Call Trace:
[<ffffffff8814fd54>] :xfs:xfs_trans_cancel+0x5b/0xfe
[<ffffffff88158042>] :xfs:xfs_create+0x58b/0x5dd
[<ffffffff8816053e>] :xfs:xfs_vn_mknod+0x1bd/0x3c8
[<ffffffff8815c793>] :xfs:xfs_buf_rele+0x32/0xa1
[<ffffffff8812bd3d>] :xfs:xfs_da_brelse+0x70/0x9d
[<ffffffff80207138>] kmem_cache_free+0x77/0xca
[<ffffffff8813379b>] :xfs:xfs_dir2_node_lookup+0xa9/0xb8
[<ffffffff8812e887>] :xfs:xfs_dir2_isleaf+0x19/0x4a
[<ffffffff8812f0c4>] :xfs:xfs_dir_lookup+0x100/0x122
[<ffffffff8020f89e>] link_path_walk+0xd3/0xe5
[<ffffffff8023b184>] vfs_create+0xe7/0x12c
[<ffffffff8021adfc>] open_namei+0x18c/0x6a0
[<ffffffff8023beda>] chown_common+0xa5/0xb0
[<ffffffff80227624>] do_filp_open+0x1c/0x3d
[<ffffffff80219a7f>] do_sys_open+0x44/0xc5
[<ffffffff8025be0e>] system_call+0x86/0x8b
[<ffffffff8025bd88>] system_call+0x0/0x8b
Filesystem "dm-13": Corruption of in-memory data detected. Shutting down filesystem: dm-13
Please umount the filesystem, and rectify the problem(s)
---------

More information about the used software versions, mount options, etc.:

---------
test:~# uname -a
Linux test 2.6.18-6-xen-amd64 #1 SMP Sun Feb 10 18:02:52 UTC 2008 x86_64 GNU/Linux
test:~# dpkg -l xfsprogs
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name                          Version                       Description
+++-=============================-=============================-==========================================================================
ii  xfsprogs                      2.8.11-1                      Utilities for managing the XFS filesystem
test:~# mount | grep /srv
/dev/mapper/test-srv on /srv type xfs (rw)
test:~# grep /srv /etc/fstab
/dev/test/srv /srv xfs rw 0 2
test:~# xfs_info /srv
meta-data=/dev/test/srv          isize=256    agcount=56, agsize=65536 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=3670016, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=2560, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
---------


Thread overview: 6+ messages
2008-09-21 19:29 XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c Kevin Jamieson
2008-09-23  7:50 ` Mark Goodwin
2008-09-23  9:18 ` Dave Chinner
2008-09-24  2:49   ` Kevin Jamieson
     [not found]   ` <54241.24.80.224.145.1222383385.squirrel@squirrel.kevinjamieson.com>
     [not found]     ` <20080926012704.GI27997@disturbed>
     [not found]       ` <62255.192.168.1.1.1222403942.squirrel@squirrel.kevinjamieson.com>
2008-09-26 10:16         ` Dave Chinner
  -- strict thread matches above, loose matches on Subject: below --
2008-02-25 20:58 Wolfgang Karall
