* XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
@ 2008-02-13 10:51 Christian Røsnes
2008-02-13 11:04 ` Justin Piszcz
2008-02-13 21:45 ` David Chinner
0 siblings, 2 replies; 20+ messages in thread
From: Christian Røsnes @ 2008-02-13 10:51 UTC (permalink / raw)
To: xfs
Over the past month I've been hit with two cases of "xfs_trans_cancel
at line 1150". The two errors occurred on different raid sets. In both
cases the error happened during an rsync from a remote server to this
server, and the local partition which reported the error was 99% full
(as reported by df -k; see below for details).
System: Dell 2850
Mem: 4GB RAM
OS: Debian 3 (32-bit)
Kernel: 2.6.17.7 (custom compiled)
I've been running this kernel since Aug 2006 without any of these
problems, until a month ago. I've not used any of the previous kernels
in the 2.6.17 series.
/usr/src/linux-2.6.17.7# grep 4K .config
# CONFIG_4KSTACKS is not set
Are there any known XFS problems with this kernel version and nearly
full partitions?
I'm thinking about upgrading the kernel to a newer version to see if
it fixes this problem.
Are there any known XFS problems with version 2.6.24.2?
Thanks
Christian
--
case logs:
case 1:
Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1150 of
file fs/xfs/xfs_trans.c. Caller 0xc0208467
<c0201cb8> xfs_trans_cancel+0x54/0xe1 <c0208467> xfs_create+0x527/0x563
<c0208467> xfs_create+0x527/0x563 <c0211d5f> xfs_vn_mknod+0x1a9/0x3bd
<c01df2ad> xfs_dir2_leafn_lookup_int+0x49/0x452 <c020dbfc>
xfs_buf_free+0x7f/0x84
<c01d874d> xfs_da_state_free+0x54/0x5a <c01e0c57>
xfs_dir2_node_lookup+0x95/0xa0
<c01da14d> xfs_dir2_lookup+0xf5/0x125 <c0163fcc> mntput_no_expire+0x14/0x71
<c02123f6> xfs_vn_permission+0x1b/0x21 <c0211f86> xfs_vn_create+0x13/0x17
<c01592ff> vfs_create+0xc2/0xf8 <c01596a1> open_namei+0x16d/0x5b3
<c014b794> do_filp_open+0x26/0x3c <c014b8f3> get_unused_fd+0x5a/0xb0
<c014ba1a> do_sys_open+0x40/0xb6 <c014baa3> sys_open+0x13/0x17
<c0102697> syscall_call+0x7/0xb
xfs_force_shutdown(sdb1,0x8) called from line 1151 of file
fs/xfs/xfs_trans.c. Return address = 0xc0214e7d
Filesystem "sdb1": Corruption of in-memory data detected. Shutting
down filesystem: sdb1
Please umount the filesystem, and rectify the problem(s)
mount options:
/dev/sdb1 on /data type xfs (rw,noatime)
df -k
/dev/sdb1 286380096 283256112 3123984 99% /data
sdb1 is an internal raid. Case 1 occurred last night, and I'm now
about to run repair on that partition.
case 2:
Filesystem "sdd1": XFS internal error xfs_trans_cancel at line 1150 of
file fs/xfs/xfs_trans.c. Caller 0xc0208467
<c0201cb8> xfs_trans_cancel+0x54/0xe1 <c0208467> xfs_create+0x527/0x563
<c0208467> xfs_create+0x527/0x563 <c0211d5f> xfs_vn_mknod+0x1a9/0x3bd
<c03155be> qdisc_restart+0x13/0x152 <c01248bb> in_group_p+0x26/0x2d
<c01efc86> xfs_iaccess+0xad/0x15b <c0206f21> xfs_access+0x2b/0x33
<c01da0fd> xfs_dir2_lookup+0xa5/0x125 <c0163fcc> mntput_no_expire+0x14/0x71
<c02123f6> xfs_vn_permission+0x1b/0x21 <c0211f86> xfs_vn_create+0x13/0x17
<c01592ff> vfs_create+0xc2/0xf8 <c01596a1> open_namei+0x16d/0x5b3
<c014b794> do_filp_open+0x26/0x3c <c014b8f3> get_unused_fd+0x5a/0xb0
<c014ba1a> do_sys_open+0x40/0xb6 <c014baa3> sys_open+0x13/0x17
<c0102697> syscall_call+0x7/0xb
xfs_force_shutdown(sdd1,0x8) called from line 1151 of file
fs/xfs/xfs_trans.c. Return address = 0xc0214e7d
Filesystem "sdd1": Corruption of in-memory data detected. Shutting
down filesystem: sdd1
Please umount the filesystem, and rectify the problem(s)
mount options:
/dev/sdd1 on /content/raid03 type xfs (rw,noatime,logbufs=8,nobarrier)
df -k:
/dev/sdd1 1951266816 1925560144 25706672 99% /content/raid03
sdd1 is an external raid. In case 2 I rebooted, ran xfs_repair from
xfsprogs 2.9.4, and then remounted the partition; the partition was OK.
xfs_repair /dev/sdd1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-02-13 10:51 XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c Christian Røsnes
@ 2008-02-13 11:04 ` Justin Piszcz
2008-02-13 11:44 ` Christian Røsnes
2008-02-13 21:45 ` David Chinner
1 sibling, 1 reply; 20+ messages in thread
From: Justin Piszcz @ 2008-02-13 11:04 UTC (permalink / raw)
To: Christian Røsnes; +Cc: xfs
On Wed, 13 Feb 2008, Christian Røsnes wrote:
> Over the past month I've been hit with two cases of "xfs_trans_cancel
> at line 1150"
> The two errors occurred on different raid sets. In both cases the
> error happened during
> rsync from a remote server to this server, and the local partition
> which reported
> the error was 99% full (as reported by df -k, see below for details).
>
> System: Dell 2850
> Mem: 4GB RAM
> OS: Debian 3 (32-bit)
> Kernel: 2.6.17.7 (custom compiled)
Just curious, have you run memtest86 for a few passes and checked to
make sure the memory is OK?
Justin.
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-02-13 11:04 ` Justin Piszcz
@ 2008-02-13 11:44 ` Christian Røsnes
0 siblings, 0 replies; 20+ messages in thread
From: Christian Røsnes @ 2008-02-13 11:44 UTC (permalink / raw)
To: Justin Piszcz; +Cc: xfs
On Feb 13, 2008 12:04 PM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Just curious have you run memtest86 for a few passes and checked to make sure
> the memory is OK?
>
I haven't run memtest yet, but I'll schedule some downtime for this
server to get it done.
Thanks
Christian
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-02-13 10:51 XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c Christian Røsnes
2008-02-13 11:04 ` Justin Piszcz
@ 2008-02-13 21:45 ` David Chinner
2008-02-14 8:41 ` Christian Røsnes
2008-03-05 13:53 ` Christian Røsnes
1 sibling, 2 replies; 20+ messages in thread
From: David Chinner @ 2008-02-13 21:45 UTC (permalink / raw)
To: Christian Røsnes; +Cc: xfs
On Wed, Feb 13, 2008 at 11:51:51AM +0100, Christian Røsnes wrote:
> Over the past month I've been hit with two cases of "xfs_trans_cancel
> at line 1150"
> The two errors occurred on different raid sets. In both cases the
> error happened during
> rsync from a remote server to this server, and the local partition
> which reported
> the error was 99% full (as reported by df -k, see below for details).
>
> System: Dell 2850
> Mem: 4GB RAM
> OS: Debian 3 (32-bit)
> Kernel: 2.6.17.7 (custom compiled)
>
> I've been running this kernel since Aug 2006 without any of these
> problems, until a month ago.
>
> I've not used any of the previous kernel in the 2.6.17 series.
>
> /usr/src/linux-2.6.17.7# grep 4K .config
> # CONFIG_4KSTACKS is not set
>
>
> Are there any known XFS problems with this kernel version and nearly
> full partitions ?
Yes. Deadlocks that weren't properly fixed until 2.6.18 (partially
fixed in 2.6.17) and an accounting problem in the transaction code
that leads to the shutdown you are seeing. The accounting problem is
fixed by this commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=45c34141126a89da07197d5b89c04c6847f1171a
which I think went into 2.6.22.
Luckily, neither of these problems result in corruption.
> I'm thinking about upgrading the kernel to a newer version, to see if
> it fixes this problem.
> Are there any known XFS problems with version 2.6.24.2 ?
Yes - a problem with readdir. The fix is currently in the stable
queue (i.e. for 2.6.24.3):
http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=commit;h=ee864b866419890b019352412c7bc9634d96f61b
So we are just waiting for Greg to release 2.6.24.3 now.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-02-13 21:45 ` David Chinner
@ 2008-02-14 8:41 ` Christian Røsnes
2008-03-05 13:53 ` Christian Røsnes
1 sibling, 0 replies; 20+ messages in thread
From: Christian Røsnes @ 2008-02-14 8:41 UTC (permalink / raw)
To: David Chinner; +Cc: xfs
On Wed, Feb 13, 2008 at 10:45 PM, David Chinner <dgc@sgi.com> wrote:
> On Wed, Feb 13, 2008 at 11:51:51AM +0100, Christian Røsnes wrote:
> > Kernel: 2.6.17.7 (custom compiled)
> >
> > Are there any known XFS problems with this kernel version and nearly
> > full partitions ?
>
> Yes. Deadlocks that weren't properly fixed until 2.6.18 (partially
> fixed in 2.6.17) and an accounting problem in the transaction code
> that leads to the shutdown you are seeing. The accounting problem is
> fixed by this commit:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=45c34141126a89da07197d5b89c04c6847f1171a
>
> which I think went into 2.6.22.
>
> Luckily, neither of these problems result in corruption.
>
>
> > I'm thinking about upgrading the kernel to a newer version, to see if
> > it fixes this problem.
> > Are there any known XFS problems with version 2.6.24.2 ?
>
> Yes - a problem with readdir. The fix is currently in the stable
> queue (i.e for 2.6.24.3):
>
> http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=commit;h=ee864b866419890b019352412c7bc9634d96f61b
>
> So we are just waiting for Greg to release 2.6.24.3 now.
>
Thanks. I ran memtest overnight just to be sure, and no errors were
found. I'll wait for 2.6.24.3 and then upgrade the kernel.
Christian
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-02-13 21:45 ` David Chinner
2008-02-14 8:41 ` Christian Røsnes
@ 2008-03-05 13:53 ` Christian Røsnes
2008-03-06 11:10 ` Christian Røsnes
2008-03-09 22:59 ` David Chinner
1 sibling, 2 replies; 20+ messages in thread
From: Christian Røsnes @ 2008-03-05 13:53 UTC (permalink / raw)
To: David Chinner; +Cc: xfs
On Wed, Feb 13, 2008 at 10:45 PM, David Chinner <dgc@sgi.com> wrote:
> On Wed, Feb 13, 2008 at 11:51:51AM +0100, Christian Røsnes wrote:
> > Over the past month I've been hit with two cases of "xfs_trans_cancel
> > at line 1150"
> > The two errors occurred on different raid sets. In both cases the
> > error happened during
> > rsync from a remote server to this server, and the local partition
> > which reported
> > the error was 99% full (as reported by df -k, see below for details).
> >
> > System: Dell 2850
> > Mem: 4GB RAM
> > OS: Debian 3 (32-bit)
> > Kernel: 2.6.17.7 (custom compiled)
> >
> > I've been running this kernel since Aug 2006 without any of these
> > problems, until a month ago.
> >
> > I've not used any of the previous kernel in the 2.6.17 series.
> >
> > /usr/src/linux-2.6.17.7# grep 4K .config
> > # CONFIG_4KSTACKS is not set
> >
> >
> > Are there any known XFS problems with this kernel version and nearly
> > full partitions ?
>
> Yes. Deadlocks that weren't properly fixed until 2.6.18 (partially
> fixed in 2.6.17) and an accounting problem in the transaction code
> that leads to the shutdown you are seeing. The accounting problem is
> fixed by this commit:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=45c34141126a89da07197d5b89c04c6847f1171a
>
> which I think went into 2.6.22.
>
> Luckily, neither of these problems result in corruption.
>
>
> > I'm thinking about upgrading the kernel to a newer version, to see if
> > it fixes this problem.
> > Are there any known XFS problems with version 2.6.24.2 ?
>
> Yes - a problem with readdir. The fix is currently in the stable
> queue (i.e for 2.6.24.3):
>
> http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=commit;h=ee864b866419890b019352412c7bc9634d96f61b
>
> So we are just waiting for Greg to release 2.6.24.3 now.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
>
After being hit several times by the problem mentioned above (running
kernel 2.6.17.7), I upgraded the kernel to version 2.6.24.3. I then
ran an rsync test to a 99% full partition:
df -k:
/dev/sdb1 286380096 282994528 3385568 99% /data
I expected the rsync run to simply fail once it ran out of space,
but instead I got another xfs_trans_cancel kernel message:
Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1163 of
file fs/xfs/xfs_trans.c. Caller 0xc021a010
Pid: 11642, comm: rsync Not tainted 2.6.24.3FC #1
[<c0212678>] xfs_trans_cancel+0x5d/0xe6
[<c021a010>] xfs_mkdir+0x45a/0x493
[<c021a010>] xfs_mkdir+0x45a/0x493
[<c01cbb8f>] xfs_acl_vhasacl_default+0x33/0x44
[<c0222d70>] xfs_vn_mknod+0x165/0x243
[<c0217b9e>] xfs_access+0x2f/0x35
[<c0222e6d>] xfs_vn_mkdir+0x12/0x14
[<c016057b>] vfs_mkdir+0xa3/0xe2
[<c0160644>] sys_mkdirat+0x8a/0xc3
[<c016069c>] sys_mkdir+0x1f/0x23
[<c01025ee>] syscall_call+0x7/0xb
=======================
xfs_force_shutdown(sdb1,0x8) called from line 1164 of file
fs/xfs/xfs_trans.c. Return address = 0xc0212690
Filesystem "sdb1": Corruption of in-memory data detected. Shutting
down filesystem: sdb1
Please umount the filesystem, and rectify the problem(s)
Trying to umount /dev/sdb1 fails (umount just hangs).
Rebooting the system seems to hang too, and I believe the kernel
outputs this message when trying to umount /dev/sdb1:
xfs_force_shutdown(sdb1,0x1) called from line 420 of file fs/xfs/xfs_rw.c.
Return address = 0xc021cb21
After waiting 5 minutes I power-cycled the system to bring it back up.
After the restart, I ran:
xfs_check /dev/sdb1
(there was no output from xfs_check).
Could this be the same problem I experienced with 2.6.17.7?
Thanks
Christian
btw - I've previously run memtest overnight and not found any memory problems.
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-03-05 13:53 ` Christian Røsnes
@ 2008-03-06 11:10 ` Christian Røsnes
2008-03-07 11:19 ` Christian Røsnes
2008-03-09 22:59 ` David Chinner
1 sibling, 1 reply; 20+ messages in thread
From: Christian Røsnes @ 2008-03-06 11:10 UTC (permalink / raw)
To: xfs
On Wed, Mar 5, 2008 at 2:53 PM, Christian Røsnes
<christian.rosnes@gmail.com> wrote:
> > On Wed, Feb 13, 2008 at 11:51:51AM +0100, Christian Røsnes wrote:
> > > Over the past month I've been hit with two cases of "xfs_trans_cancel
> > > at line 1150"
> > > The two errors occurred on different raid sets. In both cases the
> > > error happened during
> > > rsync from a remote server to this server, and the local partition
> > > which reported
> > > the error was 99% full (as reported by df -k, see below for details).
> > >
> > > System: Dell 2850
> > > Mem: 4GB RAM
> > > OS: Debian 3 (32-bit)
> > > Kernel: 2.6.17.7 (custom compiled)
> > >
>
> After being hit several times by the problem mentioned above (running
> kernel 2.6.17.7),
> I upgraded the kernel to version 2.6.24.3. I then ran a rsync test to
> a 99% full partition:
>
> df -k:
> /dev/sdb1 286380096 282994528 3385568 99% /data
>
> The rsync application will probably fail because it will most likely
> run out of space,
> but I got another xfs_trans_cancel kernel message:
>
> Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1163 of
> file fs/xfs/xfs_trans.c. Caller 0xc021a010
> Pid: 11642, comm: rsync Not tainted 2.6.24.3FC #1
> [<c0212678>] xfs_trans_cancel+0x5d/0xe6
> [<c021a010>] xfs_mkdir+0x45a/0x493
> [<c021a010>] xfs_mkdir+0x45a/0x493
> [<c01cbb8f>] xfs_acl_vhasacl_default+0x33/0x44
> [<c0222d70>] xfs_vn_mknod+0x165/0x243
> [<c0217b9e>] xfs_access+0x2f/0x35
> [<c0222e6d>] xfs_vn_mkdir+0x12/0x14
> [<c016057b>] vfs_mkdir+0xa3/0xe2
> [<c0160644>] sys_mkdirat+0x8a/0xc3
> [<c016069c>] sys_mkdir+0x1f/0x23
> [<c01025ee>] syscall_call+0x7/0xb
> =======================
> xfs_force_shutdown(sdb1,0x8) called from line 1164 of file
> fs/xfs/xfs_trans.c. Return address = 0xc0212690
>
> Filesystem "sdb1": Corruption of in-memory data detected. Shutting
> down filesystem: sdb1
> Please umount the filesystem, and rectify the problem(s)
>
Actually, a single mkdir command is enough to trigger the filesystem
shutdown when it's 99% full (according to df -k):
/data# mkdir test
mkdir: cannot create directory `test': No space left on device
Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1163 of
file fs/xfs/xfs_trans.c. Caller 0xc021a010
Pid: 23380, comm: mkdir Not tainted 2.6.24.3FC #1
[<c0212678>] xfs_trans_cancel+0x5d/0xe6
[<c021a010>] xfs_mkdir+0x45a/0x493
[<c021a010>] xfs_mkdir+0x45a/0x493
[<c01cbb8f>] xfs_acl_vhasacl_default+0x33/0x44
[<c0222d70>] xfs_vn_mknod+0x165/0x243
[<c0217b9e>] xfs_access+0x2f/0x35
[<c0222e6d>] xfs_vn_mkdir+0x12/0x14
[<c016057b>] vfs_mkdir+0xa3/0xe2
[<c0160644>] sys_mkdirat+0x8a/0xc3
[<c016069c>] sys_mkdir+0x1f/0x23
[<c01025ee>] syscall_call+0x7/0xb
[<c03b0000>] atm_reset_addr+0xd/0x83
=======================
xfs_force_shutdown(sdb1,0x8) called from line 1164 of file
fs/xfs/xfs_trans.c. Return address = 0xc0212690
Filesystem "sdb1": Corruption of in-memory data detected. Shutting
down filesystem: sdb1
Please umount the filesystem, and rectify the problem(s)
df -k
-----
/dev/sdb1 286380096 282994528 3385568 99% /data
df -i
-----
/dev/sdb1 10341248 3570112 6771136 35% /data
xfs_info
--------
meta-data=/dev/sdb1 isize=512 agcount=16, agsize=4476752 blks
= sectsz=512 attr=0
data = bsize=4096 blocks=71627792, imaxpct=25
= sunit=16 swidth=32 blks, unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=32768, version=2
= sectsz=512 sunit=16 blks, lazy-count=0
realtime =none extsz=65536 blocks=0, rtextents=0
xfs_db -r -c 'sb 0' -c p /dev/sdb1
----------------------------------
magicnum = 0x58465342
blocksize = 4096
dblocks = 71627792
rblocks = 0
rextents = 0
uuid = d16489ab-4898-48c2-8345-6334af943b2d
logstart = 67108880
rootino = 128
rbmino = 129
rsumino = 130
rextsize = 16
agblocks = 4476752
agcount = 16
rbmblocks = 0
logblocks = 32768
versionnum = 0x3584
sectsize = 512
inodesize = 512
inopblock = 8
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 9
inodelog = 9
inopblog = 3
agblklog = 23
rextslog = 0
inprogress = 0
imax_pct = 25
icount = 3570112
ifree = 0
fdblocks = 847484
frextents = 0
uquotino = 0
gquotino = 0
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 16
width = 32
dirblklog = 0
logsectlog = 0
logsectsize = 0
logsunit = 65536
features2 = 0
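[A small illustrative sketch, not kernel code, of what the superblock values above imply. With blocksize = 4096 and inodesize = 512 there are 8 inodes per block, and XFS allocates inodes in fixed chunks of 64, so a new inode chunk needs 8 contiguous blocks even though fdblocks reports 847484 free blocks in total. The function name below is hypothetical, chosen for illustration:]

```c
#include <assert.h>

/* Sketch: how many contiguous filesystem blocks a new inode chunk
 * needs, given the superblock's blocksize and inodesize. XFS always
 * allocates inodes 64 at a time (a fixed property of the on-disk
 * format), so free space must contain at least one extent this big
 * for inode allocation to succeed, regardless of total free blocks. */
enum { XFS_INODES_PER_CHUNK = 64 };

int blocks_per_inode_chunk(int blocksize, int inodesize)
{
    int inopblock = blocksize / inodesize;      /* inodes per block (8 here) */
    return XFS_INODES_PER_CHUNK / inopblock;    /* blocks per 64-inode chunk */
}
```

For the filesystem above this gives 8 contiguous 4 KiB blocks (32 KiB), which must also satisfy the inoalignmt = 2 alignment.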
Christian
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-03-06 11:10 ` Christian Røsnes
@ 2008-03-07 11:19 ` Christian Røsnes
2008-03-10 0:08 ` David Chinner
0 siblings, 1 reply; 20+ messages in thread
From: Christian Røsnes @ 2008-03-07 11:19 UTC (permalink / raw)
To: xfs
On Thu, Mar 6, 2008 at 12:10 PM, Christian Røsnes
<christian.rosnes@gmail.com> wrote:
> On Wed, Mar 5, 2008 at 2:53 PM, Christian Røsnes
> <christian.rosnes@gmail.com> wrote:
> > > On Wed, Feb 13, 2008 at 11:51:51AM +0100, Christian Røsnes wrote:
> > > > Over the past month I've been hit with two cases of "xfs_trans_cancel
> > > > at line 1150"
> > > > The two errors occurred on different raid sets. In both cases the
> > > > error happened during
> > > > rsync from a remote server to this server, and the local partition
> > > > which reported
> > > > the error was 99% full (as reported by df -k, see below for details).
> > > >
> > > > System: Dell 2850
> > > > Mem: 4GB RAM
> > > > OS: Debian 3 (32-bit)
> > > > Kernel: 2.6.17.7 (custom compiled)
> > > >
> >
>
>
> > After being hit several times by the problem mentioned above (running
> > kernel 2.6.17.7),
> > I upgraded the kernel to version 2.6.24.3. I then ran a rsync test to
> > a 99% full partition:
> >
> > df -k:
> > /dev/sdb1 286380096 282994528 3385568 99% /data
> >
> > The rsync application will probably fail because it will most likely
> > run out of space,
> > but I got another xfs_trans_cancel kernel message:
> >
> > Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1163 of
> > file fs/xfs/xfs_trans.c. Caller 0xc021a010
> > Pid: 11642, comm: rsync Not tainted 2.6.24.3FC #1
> > [<c0212678>] xfs_trans_cancel+0x5d/0xe6
> > [<c021a010>] xfs_mkdir+0x45a/0x493
> > [<c021a010>] xfs_mkdir+0x45a/0x493
> > [<c01cbb8f>] xfs_acl_vhasacl_default+0x33/0x44
> > [<c0222d70>] xfs_vn_mknod+0x165/0x243
> > [<c0217b9e>] xfs_access+0x2f/0x35
> > [<c0222e6d>] xfs_vn_mkdir+0x12/0x14
> > [<c016057b>] vfs_mkdir+0xa3/0xe2
> > [<c0160644>] sys_mkdirat+0x8a/0xc3
> > [<c016069c>] sys_mkdir+0x1f/0x23
> > [<c01025ee>] syscall_call+0x7/0xb
> > =======================
> > xfs_force_shutdown(sdb1,0x8) called from line 1164 of file
> > fs/xfs/xfs_trans.c. Return address = 0xc0212690
> >
> > Filesystem "sdb1": Corruption of in-memory data detected. Shutting
> > down filesystem: sdb1
> > Please umount the filesystem, and rectify the problem(s)
> >
>
> Actually, a single mkdir command is enough to trigger the filesystem
> shutdown when its 99% full (according to df -k):
>
> /data# mkdir test
> mkdir: cannot create directory `test': No space left on device
>
>
>
> Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1163 of
> file fs/xfs/xfs_trans.c. Caller 0xc021a010
> Pid: 23380, comm: mkdir Not tainted 2.6.24.3FC #1
>
> [<c0212678>] xfs_trans_cancel+0x5d/0xe6
> [<c021a010>] xfs_mkdir+0x45a/0x493
> [<c021a010>] xfs_mkdir+0x45a/0x493
> [<c01cbb8f>] xfs_acl_vhasacl_default+0x33/0x44
> [<c0222d70>] xfs_vn_mknod+0x165/0x243
> [<c0217b9e>] xfs_access+0x2f/0x35
> [<c0222e6d>] xfs_vn_mkdir+0x12/0x14
> [<c016057b>] vfs_mkdir+0xa3/0xe2
> [<c0160644>] sys_mkdirat+0x8a/0xc3
> [<c016069c>] sys_mkdir+0x1f/0x23
> [<c01025ee>] syscall_call+0x7/0xb
> [<c03b0000>] atm_reset_addr+0xd/0x83
>
> =======================
> xfs_force_shutdown(sdb1,0x8) called from line 1164 of file
> fs/xfs/xfs_trans.c. Return address = 0xc0212690
> Filesystem "sdb1": Corruption of in-memory data detected. Shutting
> down filesystem: sdb1
> Please umount the filesystem, and rectify the problem(s)
>
>
> df -k
> -----
> /dev/sdb1 286380096 282994528 3385568 99% /data
>
> df -i
> -----
> /dev/sdb1 10341248 3570112 6771136 35% /data
>
>
> xfs_info
> --------
> meta-data=/dev/sdb1 isize=512 agcount=16, agsize=4476752 blks
> = sectsz=512 attr=0
> data = bsize=4096 blocks=71627792, imaxpct=25
> = sunit=16 swidth=32 blks, unwritten=1
> naming =version 2 bsize=4096
> log =internal bsize=4096 blocks=32768, version=2
> = sectsz=512 sunit=16 blks, lazy-count=0
> realtime =none extsz=65536 blocks=0, rtextents=0
>
> xfs_db -r -c 'sb 0' -c p /dev/sdb1
> ----------------------------------
> magicnum = 0x58465342
> blocksize = 4096
> dblocks = 71627792
> rblocks = 0
> rextents = 0
> uuid = d16489ab-4898-48c2-8345-6334af943b2d
> logstart = 67108880
> rootino = 128
> rbmino = 129
> rsumino = 130
> rextsize = 16
> agblocks = 4476752
> agcount = 16
> rbmblocks = 0
> logblocks = 32768
> versionnum = 0x3584
> sectsize = 512
> inodesize = 512
> inopblock = 8
> fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
> blocklog = 12
> sectlog = 9
> inodelog = 9
> inopblog = 3
> agblklog = 23
> rextslog = 0
> inprogress = 0
> imax_pct = 25
> icount = 3570112
> ifree = 0
> fdblocks = 847484
> frextents = 0
> uquotino = 0
> gquotino = 0
> qflags = 0
> flags = 0
> shared_vn = 0
> inoalignmt = 2
> unit = 16
> width = 32
> dirblklog = 0
> logsectlog = 0
> logsectsize = 0
> logsunit = 65536
> features2 = 0
>
Instrumenting the code, I found that this occurs on my system when I
do a 'mkdir /data/test' on the partition in question:
in xfs_mkdir (xfs_vnodeops.c):

	error = xfs_dir_ialloc(&tp, dp, mode, 2,
			0, credp, prid, resblks > 0,
			&cdp, NULL);
	if (error) {
		if (error == ENOSPC)
			goto error_return;   <=== this is hit, and execution
			                          jumps to error_return
		goto abort_return;
	}
Is this the correct behavior for this situation? The mkdir fails due
to no available space on the filesystem, and xfs_mkdir goes to the
error_return label, after which the filesystem is shut down.
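[A toy model may help show why that error_return path shuts the filesystem down. xfs_trans_cancel() treats cancelling a transaction that has already dirtied the log, without the caller passing an abort flag, as an internal logic error and forces a shutdown. The struct and flag names below are simplified stand-ins, not the real kernel definitions:]

```c
#include <assert.h>

/* Toy model of the xfs_trans_cancel() check being tripped here.
 * Simplified names; the real code lives in fs/xfs/xfs_trans.c. */
#define TRANS_DIRTY  0x1U  /* transaction has already logged changes   */
#define TRANS_ABORT  0x2U  /* caller acknowledges the dirty state      */

struct toy_trans { unsigned flags; };

/* Returns 1 if this cancel would force a filesystem shutdown
 * (the "XFS internal error xfs_trans_cancel" case), 0 otherwise. */
int toy_trans_cancel(const struct toy_trans *tp, unsigned cancel_flags)
{
    if ((tp->flags & TRANS_DIRTY) && !(cancel_flags & TRANS_ABORT))
        return 1;   /* dirty transaction cancelled without abort */
    return 0;       /* clean cancel, or caller passed abort */
}
```

If xfs_dir_ialloc() returns ENOSPC only after part of its work has already dirtied the transaction, then the error_return path (a plain cancel) hits the shutdown case, whereas abort_return would not.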
Christian
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-03-05 13:53 ` Christian Røsnes
2008-03-06 11:10 ` Christian Røsnes
@ 2008-03-09 22:59 ` David Chinner
1 sibling, 0 replies; 20+ messages in thread
From: David Chinner @ 2008-03-09 22:59 UTC (permalink / raw)
To: Christian Røsnes; +Cc: David Chinner, xfs
On Wed, Mar 05, 2008 at 02:53:18PM +0100, Christian Røsnes wrote:
> On Wed, Feb 13, 2008 at 10:45 PM, David Chinner <dgc@sgi.com> wrote:
> After being hit several times by the problem mentioned above (running
> kernel 2.6.17.7),
> I upgraded the kernel to version 2.6.24.3. I then ran a rsync test to
> a 99% full partition:
>
> df -k:
> /dev/sdb1 286380096 282994528 3385568 99% /data
>
> The rsync application will probably fail because it will most likely
> run out of space,
> but I got another xfs_trans_cancel kernel message:
>
> Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1163 of
> file fs/xfs/xfs_trans.c. Caller 0xc021a010
> Pid: 11642, comm: rsync Not tainted 2.6.24.3FC #1
> [<c0212678>] xfs_trans_cancel+0x5d/0xe6
> [<c021a010>] xfs_mkdir+0x45a/0x493
> [<c021a010>] xfs_mkdir+0x45a/0x493
> [<c01cbb8f>] xfs_acl_vhasacl_default+0x33/0x44
> [<c0222d70>] xfs_vn_mknod+0x165/0x243
> [<c0217b9e>] xfs_access+0x2f/0x35
> [<c0222e6d>] xfs_vn_mkdir+0x12/0x14
> [<c016057b>] vfs_mkdir+0xa3/0xe2
> [<c0160644>] sys_mkdirat+0x8a/0xc3
> [<c016069c>] sys_mkdir+0x1f/0x23
> [<c01025ee>] syscall_call+0x7/0xb
> =======================
> xfs_force_shutdown(sdb1,0x8) called from line 1164 of file
> fs/xfs/xfs_trans.c. Return address = 0xc0212690
> Filesystem "sdb1": Corruption of in-memory data detected. Shutting
> down filesystem: sdb1
> Please umount the filesystem, and rectify the problem(s)
Ok, so the problem still exists.
> Trying to umount /dev/sdb1 fails (umount just hangs) .
That shouldn't happen. Any output in the log when it hung? What
were the blocked process stack traces (/proc/sysrq-trigger is your friend)?
> Rebooting the system seems to hang also - and I believe the kernel
> outputs this message
> when trying to umount /dev/sdb1:
>
> xfs_force_shutdown(sdb1,0x1) called from line 420 of file fs/xfs/xfs_rw.c.
> Return address = 0xc021cb21
It's already been shut down, right? An unmount should not trigger more
of these warnings...
>
> After waiting 5 minutes I power-cycle the system to bring it back up.
>
> After the restart, I ran:
>
> xfs_check /dev/sdb1
>
> (there was no output from xfs_check).
>
> Could this be the same problem I experienced with 2.6.17.7 ?
Yes, it likely is. Can you apply the patch below and reproduce the problem?
I can't reproduce the problem locally, so I'll need you to apply test patches
to isolate the error. I suspect an xfs_dir_canenter()/xfs_dir_createname()
with resblks == 0 issue, and the patch below will tell us if this is the
case. It annotates the error paths for both create and mkdir (the two places
I've seen this error occur), and what I am expecting to see is something
like:
xfs_create: dir_enter w/ 0 resblks ok.
xfs_create: dir_createname error 28
<shutdown>
Cheers,
Dave.
---
fs/xfs/xfs_vnodeops.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
Index: 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_vnodeops.c 2008-02-22 17:40:04.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c 2008-03-10 09:53:43.658179381 +1100
@@ -1886,12 +1886,17 @@ xfs_create(
if (error)
goto error_return;
- if (resblks == 0 && (error = xfs_dir_canenter(tp, dp, name, namelen)))
- goto error_return;
+ if (!resblks) {
+ error = xfs_dir_canenter(tp, dp, name, namelen);
+ if (error)
+ goto error_return;
+ printk(KERN_WARNING "xfs_create: dir_enter w/ 0 resblks ok.\n");
+ }
error = xfs_dir_ialloc(&tp, dp, mode, 1,
rdev, credp, prid, resblks > 0,
&ip, &committed);
if (error) {
+ printk(KERN_WARNING "xfs_create: dir_ialloc error %d\n", error);
if (error == ENOSPC)
goto error_return;
goto abort_return;
@@ -1921,6 +1926,7 @@ xfs_create(
resblks - XFS_IALLOC_SPACE_RES(mp) : 0);
if (error) {
ASSERT(error != ENOSPC);
+ printk(KERN_WARNING "xfs_create: dir_createname error %d\n", error);
goto abort_return;
}
xfs_ichgtime(dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
@@ -1955,6 +1961,7 @@ xfs_create(
error = xfs_bmap_finish(&tp, &free_list, &committed);
if (error) {
xfs_bmap_cancel(&free_list);
+ printk(KERN_WARNING "xfs_create: xfs_bmap_finish error %d\n", error);
goto abort_rele;
}
@@ -2727,9 +2734,12 @@ xfs_mkdir(
if (error)
goto error_return;
- if (resblks == 0 &&
- (error = xfs_dir_canenter(tp, dp, dir_name, dir_namelen)))
- goto error_return;
+ if (!resblks) {
+ error = xfs_dir_canenter(tp, dp, dir_name, dir_namelen);
+ if (error)
+ goto error_return;
+ printk(KERN_WARNING "xfs_mkdir: dir_enter w/ 0 resblks ok.\n");
+ }
/*
* create the directory inode.
*/
@@ -2737,6 +2747,7 @@ xfs_mkdir(
0, credp, prid, resblks > 0,
&cdp, NULL);
if (error) {
+ printk(KERN_WARNING "xfs_mkdir: dir_ialloc error %d\n", error);
if (error == ENOSPC)
goto error_return;
goto abort_return;
@@ -2761,6 +2772,7 @@ xfs_mkdir(
&first_block, &free_list, resblks ?
resblks - XFS_IALLOC_SPACE_RES(mp) : 0);
if (error) {
+ printk(KERN_WARNING "xfs_mkdir: dir_createname error %d\n", error);
ASSERT(error != ENOSPC);
goto error1;
}
@@ -2805,6 +2817,7 @@ xfs_mkdir(
error = xfs_bmap_finish(&tp, &free_list, &committed);
if (error) {
+ printk(KERN_WARNING "xfs_mkdir: bmap_finish error %d\n", error);
IRELE(cdp);
goto error2;
}
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-03-07 11:19 ` Christian Røsnes
@ 2008-03-10 0:08 ` David Chinner
2008-03-10 8:34 ` Christian Røsnes
0 siblings, 1 reply; 20+ messages in thread
From: David Chinner @ 2008-03-10 0:08 UTC (permalink / raw)
To: Christian Røsnes; +Cc: xfs
On Fri, Mar 07, 2008 at 12:19:28PM +0100, Christian Røsnes wrote:
> > Actually, a single mkdir command is enough to trigger the filesystem
> > shutdown when its 99% full (according to df -k):
> >
> > /data# mkdir test
> > mkdir: cannot create directory `test': No space left on device
Ok, that's helpful ;)
So, can you dump the directory inode with xfs_db? i.e.
# ls -ia /data
The directory inode is the inode at ".", and if this is the root of
the filesystem it will probably be 128.
Then run:
# xfs_db -r -c 'inode 128' -c p /dev/sdb1
> > --------
> > meta-data=/dev/sdb1 isize=512 agcount=16, agsize=4476752 blks
> > = sectsz=512 attr=0
> > data = bsize=4096 blocks=71627792, imaxpct=25
> > = sunit=16 swidth=32 blks, unwritten=1
> > naming =version 2 bsize=4096
> > log =internal bsize=4096 blocks=32768, version=2
> > = sectsz=512 sunit=16 blks, lazy-count=0
> > realtime =none extsz=65536 blocks=0, rtextents=0
> >
> > xfs_db -r -c 'sb 0' -c p /dev/sdb1
> > ----------------------------------
.....
> > fdblocks = 847484
Apparently there are still lots of free blocks. I wonder if you are out of
space in the metadata AGs.
Can you do this for me:
-------
#!/bin/bash
for i in `seq 0 1 15`; do
echo freespace histogram for AG $i
xfs_db -r -c "freesp -bs -a $i" /dev/sdb1
done
------
> Instrumenting the code, I found that this occurs on my system when I
> do a 'mkdir /data/test' on the partition in question:
>
> in xfs_mkdir (xfs_vnodeops.c):
>
> error = xfs_dir_ialloc(&tp, dp, mode, 2,
> 0, credp, prid, resblks > 0,
> &cdp, NULL);
>
> if (error) {
> if (error == ENOSPC)
> goto error_return; <=== this is hit and then
> execution jumps to error_return
> goto abort_return;
> }
Ah - you can ignore my last email, then. You're already one step ahead
of me ;)
This does not appear to be the case I was expecting, though I can
see how we can get an ENOSPC here with plenty of blocks free - none
are large enough to allocate an inode chunk. What would be worth
knowing is the value of resblks when this error is reported.
This tends to imply we are returning an ENOSPC with a dirty
transaction. Right now I can't see how that would occur, though
the fragmented free space is something I can try to reproduce with.
> Is this the correct behavior for this type of situation: mkdir command
> fails due to no available space on filesystem,
> and xfs_mkdir goes to label error_return ? (And after this the
> filesystem is shutdown)
The filesystem should not be shutdown here. We need to trace through
further to the source of the error....
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-03-10 0:08 ` David Chinner
@ 2008-03-10 8:34 ` Christian Røsnes
2008-03-10 10:02 ` Christian Røsnes
2008-03-10 22:21 ` David Chinner
0 siblings, 2 replies; 20+ messages in thread
From: Christian Røsnes @ 2008-03-10 8:34 UTC (permalink / raw)
To: David Chinner; +Cc: xfs
Thanks for the help, David. Answers below:
On Mon, Mar 10, 2008 at 1:08 AM, David Chinner <dgc@sgi.com> wrote:
> On Fri, Mar 07, 2008 at 12:19:28PM +0100, Christian Røsnes wrote:
> > > Actually, a single mkdir command is enough to trigger the filesystem
> > > shutdown when it's 99% full (according to df -k):
> > >
> > > /data# mkdir test
> > > mkdir: cannot create directory `test': No space left on device
>
> Ok, that's helpful ;)
>
> So, can you dump the directory inode with xfs_db? i.e.
>
> # ls -ia /data
# ls -ia /data
128 . 128 .. 131 content 149256847 rsync
>
> The directory inode is the inode at ".", and if this is the root of
> the filesystem it will probably be 128.
>
> Then run:
>
> # xfs_db -r -c 'inode 128' -c p /dev/sdb1
>
# xfs_db -r -c 'inode 128' -c p /dev/sdb1
core.magic = 0x494e
core.mode = 040755
core.version = 1
core.format = 1 (local)
core.nlinkv1 = 4
core.uid = 0
core.gid = 0
core.flushiter = 47007
core.atime.sec = Wed Oct 19 12:14:10 2005
core.atime.nsec = 640092000
core.mtime.sec = Fri Dec 15 10:27:21 2006
core.mtime.nsec = 624437500
core.ctime.sec = Fri Dec 15 10:27:21 2006
core.ctime.nsec = 624437500
core.size = 32
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 0
next_unlinked = null
u.sfdir2.hdr.count = 2
u.sfdir2.hdr.i8count = 0
u.sfdir2.hdr.parent.i4 = 128
u.sfdir2.list[0].namelen = 7
u.sfdir2.list[0].offset = 0x30
u.sfdir2.list[0].name = "content"
u.sfdir2.list[0].inumber.i4 = 131
u.sfdir2.list[1].namelen = 5
u.sfdir2.list[1].offset = 0x48
u.sfdir2.list[1].name = "rsync"
u.sfdir2.list[1].inumber.i4 = 149256847
>
> > > --------
> > > meta-data=/dev/sdb1 isize=512 agcount=16, agsize=4476752 blks
> > > = sectsz=512 attr=0
> > > data = bsize=4096 blocks=71627792, imaxpct=25
> > > = sunit=16 swidth=32 blks, unwritten=1
> > > naming =version 2 bsize=4096
> > > log =internal bsize=4096 blocks=32768, version=2
> > > = sectsz=512 sunit=16 blks, lazy-count=0
> > > realtime =none extsz=65536 blocks=0, rtextents=0
> > >
> > > xfs_db -r -c 'sb 0' -c p /dev/sdb1
> > > ----------------------------------
> .....
> > > fdblocks = 847484
>
> Apparently there are still lots of free blocks. I wonder if you are out of
> space in the metadata AGs.
>
> Can you do this for me:
>
> -------
> #!/bin/bash
>
> for i in `seq 0 1 15`; do
> echo freespace histogram for AG $i
> xfs_db -r -c "freesp -bs -a $i" /dev/sdb1
> done
> ------
>
# for i in `seq 0 1 15`; do
> echo freespace histogram for AG $i
> xfs_db -r -c "freesp -bs -a $i" /dev/sdb1
> done
freespace histogram for AG 0
from to extents blocks pct
1 1 2098 2098 3.77
2 3 8032 16979 30.54
4 7 6158 33609 60.46
8 15 363 2904 5.22
total free extents 16651
total free blocks 55590
average free extent size 3.33854
freespace histogram for AG 1
from to extents blocks pct
1 1 2343 2343 3.90
2 3 9868 21070 35.10
4 7 6000 34535 57.52
8 15 261 2088 3.48
total free extents 18472
total free blocks 60036
average free extent size 3.25011
freespace histogram for AG 2
from to extents blocks pct
1 1 1206 1206 10.55
2 3 3919 8012 70.10
4 7 394 2211 19.35
total free extents 5519
total free blocks 11429
average free extent size 2.07085
freespace histogram for AG 3
from to extents blocks pct
1 1 3179 3179 8.48
2 3 14689 29736 79.35
4 7 820 4560 12.17
total free extents 18688
total free blocks 37475
average free extent size 2.0053
freespace histogram for AG 4
from to extents blocks pct
1 1 4113 4113 9.62
2 3 10685 22421 52.45
4 7 2951 16212 37.93
total free extents 17749
total free blocks 42746
average free extent size 2.40836
freespace histogram for AG 5
from to extents blocks pct
1 1 2909 2909 4.23
2 3 20370 41842 60.81
4 7 3973 23861 34.68
8 15 24 192 0.28
total free extents 27276
total free blocks 68804
average free extent size 2.52251
freespace histogram for AG 6
from to extents blocks pct
1 1 3577 3577 4.86
2 3 18592 38577 52.43
4 7 4427 25764 35.02
8 15 707 5656 7.69
total free extents 27303
total free blocks 73574
average free extent size 2.69472
freespace histogram for AG 7
from to extents blocks pct
1 1 2634 2634 9.14
2 3 11928 24349 84.48
4 7 366 1840 6.38
total free extents 14928
total free blocks 28823
average free extent size 1.9308
freespace histogram for AG 8
from to extents blocks pct
1 1 6473 6473 6.39
2 3 22020 46190 45.61
4 7 7343 40137 39.64
8 15 1058 8464 8.36
total free extents 36894
total free blocks 101264
average free extent size 2.74473
freespace histogram for AG 9
from to extents blocks pct
1 1 2165 2165 2.22
2 3 15746 33317 34.20
4 7 9402 55502 56.97
8 15 805 6440 6.61
total free extents 28118
total free blocks 97424
average free extent size 3.46483
freespace histogram for AG 10
from to extents blocks pct
1 1 5886 5886 9.46
2 3 13682 29881 48.01
4 7 4561 23919 38.43
8 15 319 2552 4.10
total free extents 24448
total free blocks 62238
average free extent size 2.54573
freespace histogram for AG 11
from to extents blocks pct
1 1 4197 4197 7.47
2 3 8421 18061 32.14
4 7 4336 24145 42.97
8 15 1224 9792 17.43
total free extents 18178
total free blocks 56195
average free extent size 3.09137
freespace histogram for AG 12
from to extents blocks pct
1 1 310 310 90.64
2 3 16 32 9.36
total free extents 326
total free blocks 342
average free extent size 1.04908
freespace histogram for AG 13
from to extents blocks pct
1 1 4845 4845 22.31
2 3 7533 16873 77.69
total free extents 12378
total free blocks 21718
average free extent size 1.75456
freespace histogram for AG 14
from to extents blocks pct
1 1 3572 3572 6.50
2 3 17437 36656 66.72
4 7 2702 14711 26.78
total free extents 23711
total free blocks 54939
average free extent size 2.31703
freespace histogram for AG 15
from to extents blocks pct
1 1 4568 4568 6.24
2 3 13400 28983 39.62
4 7 6992 39606 54.14
total free extents 24960
total free blocks 73157
average free extent size 2.93097
>
> > Instrumenting the code, I found that this occurs on my system when I
> > do a 'mkdir /data/test' on the partition in question:
> >
> > in xfs_mkdir (xfs_vnodeops.c):
> >
> > error = xfs_dir_ialloc(&tp, dp, mode, 2,
> > 0, credp, prid, resblks > 0,
> > &cdp, NULL);
> >
> > if (error) {
> > if (error == ENOSPC)
> > goto error_return; <=== this is hit and then
> > execution jumps to error_return
> > goto abort_return;
> > }
>
> Ah - you can ignore my last email, then. You're already one step ahead
> of me ;)
>
> This does not appear to be the case I was expecting, though I can
> see how we can get an ENOSPC here with plenty of blocks free - none
> are large enough to allocate an inode chunk. What would be worth
> knowing is the value of resblks when this error is reported.
Ok. I'll see if I can print it out.
>
> This tends to imply we are returning an ENOSPC with a dirty
> transaction. Right now I can't see how that would occur, though
> the fragmented free space is something I can try to reproduce with.
ok
>
>
> > Is this the correct behavior for this type of situation: mkdir command
> > fails due to no available space on filesystem,
> > and xfs_mkdir goes to label error_return ? (And after this the
> > filesystem is shutdown)
>
> The filesystem should not be shutdown here. We need to trace through
> further to the source of the error....
>
ok
Btw - to debug this on a test-system, can I do a dd if=/dev/sdb1 or
dd if=/dev/sdb, and output it to an image which is then loopback
mounted on the test-system? I.e. is there some sort of "best
practice" on how to copy this partition to a test-system for
further testing?
Thanks
Christian
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-03-10 8:34 ` Christian Røsnes
@ 2008-03-10 10:02 ` Christian Røsnes
2008-03-10 22:21 ` David Chinner
1 sibling, 0 replies; 20+ messages in thread
From: Christian Røsnes @ 2008-03-10 10:02 UTC (permalink / raw)
To: David Chinner; +Cc: xfs
On Mon, Mar 10, 2008 at 9:34 AM, Christian Røsnes
<christian.rosnes@gmail.com> wrote:
> On Mon, Mar 10, 2008 at 1:08 AM, David Chinner <dgc@sgi.com> wrote:
> > On Fri, Mar 07, 2008 at 12:19:28PM +0100, Christian Røsnes wrote:
> >
> > > Instrumenting the code, I found that this occurs on my system when I
> > > do a 'mkdir /data/test' on the partition in question:
> > >
> > > in xfs_mkdir (xfs_vnodeops.c):
> > >
> > > error = xfs_dir_ialloc(&tp, dp, mode, 2,
> > > 0, credp, prid, resblks > 0,
> > > &cdp, NULL);
> > >
> > > if (error) {
> > > if (error == ENOSPC)
> > > goto error_return; <=== this is hit and then
> > > execution jumps to error_return
> > > goto abort_return;
> > > }
> >
> > Ah - you can ignore my last email, then. You're already one step ahead
> > of me ;)
> >
> > This does not appear to be the case I was expecting, though I can
> > see how we can get an ENOSPC here with plenty of blocks free - none
> > are large enough to allocate an inode chunk. What would be worth
> > knowing is the value of resblks when this error is reported.
>
> Ok. I'll see if I can print it out.
Ok. I added printk statements to xfs_mkdir in xfs_vnodeops.c:
'resblks=45' is the value returned by:
resblks = XFS_MKDIR_SPACE_RES(mp, dir_namelen);
and this is the value when the error_return label is called.
--
and inside xfs_dir_ialloc (file: xfs_utils.c) this is where it returns
...
code = xfs_ialloc(tp, dp, mode, nlink, rdev, credp, prid, okalloc,
&ialloc_context, &call_again, &ip);
/*
* Return an error if we were unable to allocate a new inode.
* This should only happen if we run out of space on disk or
* encounter a disk error.
*/
if (code) {
*ipp = NULL;
return code;
}
if (!call_again && (ip == NULL)) {
*ipp = NULL;
return XFS_ERROR(ENOSPC); <============== returns here
}
Christian
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-03-10 8:34 ` Christian Røsnes
2008-03-10 10:02 ` Christian Røsnes
@ 2008-03-10 22:21 ` David Chinner
2008-03-11 8:08 ` Christian Røsnes
1 sibling, 1 reply; 20+ messages in thread
From: David Chinner @ 2008-03-10 22:21 UTC (permalink / raw)
To: Christian Røsnes; +Cc: David Chinner, xfs
On Mon, Mar 10, 2008 at 09:34:14AM +0100, Christian Røsnes wrote:
> On Mon, Mar 10, 2008 at 1:08 AM, David Chinner <dgc@sgi.com> wrote:
> > On Fri, Mar 07, 2008 at 12:19:28PM +0100, Christian Røsnes wrote:
> > > > Actually, a single mkdir command is enough to trigger the filesystem
> > > > shutdown when it's 99% full (according to df -k):
> > > >
> > > > /data# mkdir test
> > > > mkdir: cannot create directory `test': No space left on device
> >
> > Ok, that's helpful ;)
> > So, can you dump the directory inode with xfs_db? i.e.
> > # ls -ia /data
>
> # ls -ia /data
> 128 . 128 .. 131 content 149256847 rsync
>
> > The directory inode is the inode at ".", and if this is the root of
> > the filesystem it will probably be 128. Then run:
> > # xfs_db -r -c 'inode 128' -c p /dev/sdb1
>
> # xfs_db -r -c 'inode 128' -c p /dev/sdb1
> core.magic = 0x494e
> core.mode = 040755
> core.version = 1
> core.format = 1 (local)
.....
> core.size = 32
....
> u.sfdir2.hdr.count = 2
> u.sfdir2.hdr.i8count = 0
> u.sfdir2.hdr.parent.i4 = 128
> u.sfdir2.list[0].namelen = 7
> u.sfdir2.list[0].offset = 0x30
> u.sfdir2.list[0].name = "content"
> u.sfdir2.list[0].inumber.i4 = 131
> u.sfdir2.list[1].namelen = 5
> u.sfdir2.list[1].offset = 0x48
> u.sfdir2.list[1].name = "rsync"
> u.sfdir2.list[1].inumber.i4 = 149256847
Ok, so a shortform directory still with heaps of space in it. So
it's definitely not a directory namespace creation issue.
> > > > xfs_db -r -c 'sb 0' -c p /dev/sdb1
> > > > ----------------------------------
> > .....
> > > > fdblocks = 847484
> >
> > Apparently there are still lots of free blocks. I wonder if you are out of
> > space in the metadata AGs.
> >
> > Can you do this for me:
> >
> > -------
> > #!/bin/bash
> >
> > for i in `seq 0 1 15`; do
> > echo freespace histogram for AG $i
> > xfs_db -r -c "freesp -bs -a $i" /dev/sdb1
> > done
> > ------
> freespace histogram for AG 0
> from to extents blocks pct
> 1 1 2098 2098 3.77
> 2 3 8032 16979 30.54
> 4 7 6158 33609 60.46
> 8 15 363 2904 5.22
So with 256 byte inodes, we need a 16k allocation or a 4 block extent.
There's plenty of extents large enough to use for that, so it's
not an inode chunk allocation error.
> Btw - to debug this on a test-system, can I do a dd if=/dev/sdb1 or dd
> if=/dev/sdb,
> and output it to an image which is then loopback mounted on the test-system ?
That would work. Use /dev/sdb1 as the source so all you copy are
filesystem blocks.
> Ie. is there some sort of "best practice" on how to copy this
> partition to a test-system
> for further testing ?
Do what fits your needs - for debugging, identical images are generally
best. For debugging metadata or repair problems, xfs_metadump works
very well (replaces data with zeros, though), and for imaging purposes
xfs_copy is very efficient.
On Mon, Mar 10, 2008 at 11:02:28AM +0100, Christian Røsnes wrote:
> On Mon, Mar 10, 2008 at 9:34 AM, Christian Røsnes
> <christian.rosnes@gmail.com> wrote:
> > On Mon, Mar 10, 2008 at 1:08 AM, David Chinner <dgc@sgi.com> wrote:
> > > This does not appear to be the case I was expecting, though I can
> > > see how we can get an ENOSPC here with plenty of blocks free - none
> > > are large enough to allocate an inode chunk. What would be worth
> > > knowing is the value of resblks when this error is reported.
> >
> > Ok. I'll see if I can print it out.
>
> Ok. I added printk statements to xfs_mkdir in xfs_vnodeops.c:
>
> 'resblks=45' is the value returned by:
>
> resblks = XFS_MKDIR_SPACE_RES(mp, dir_namelen);
>
> and this is the value when the error_return label is called.
That confirms we're not out of directory space or filesystem space.
> --
>
> and inside xfs_dir_ialloc (file: xfs_utils.c) this is where it returns
>
> ...
>
> code = xfs_ialloc(tp, dp, mode, nlink, rdev, credp, prid, okalloc,
> &ialloc_context, &call_again, &ip);
>
> /*
> * Return an error if we were unable to allocate a new inode.
> * This should only happen if we run out of space on disk or
> * encounter a disk error.
> */
> if (code) {
> *ipp = NULL;
> return code;
> }
> if (!call_again && (ip == NULL)) {
> *ipp = NULL;
> return XFS_ERROR(ENOSPC); <============== returns here
> }
Interesting. That implies that xfs_ialloc() failed here:
1053 /*
1054 * Call the space management code to pick
1055 * the on-disk inode to be allocated.
1056 */
1057 error = xfs_dialloc(tp, pip ? pip->i_ino : 0, mode, okalloc,
1058 ialloc_context, call_again, &ino);
1059 if (error != 0) {
1060 return error;
1061 }
1062 if (*call_again || ino == NULLFSINO) { <<<<<<<<<<<<<<<<
1063 *ipp = NULL;
1064 return 0;
1065 }
Which means that xfs_dialloc() failed without an error and without
setting *call_again, but with ino == NULLFSINO. That leaves these
possible failure places:
544 agbp = xfs_ialloc_ag_select(tp, parent, mode, okalloc);
545 /*
546 * Couldn't find an allocation group satisfying the
547 * criteria, give up.
548 */
549 if (!agbp) {
550 *inop = NULLFSINO;
551 >>>>>>>>>> return 0;
552 }
........
572 /*
573 * If we have already hit the ceiling of inode blocks then clear
574 * okalloc so we scan all available agi structures for a free
575 * inode.
576 */
577
578 if (mp->m_maxicount &&
579 mp->m_sb.sb_icount + XFS_IALLOC_INODES(mp) > mp->m_maxicount) {
580 noroom = 1;
581 okalloc = 0;
582 }
........
600 if ((error = xfs_ialloc_ag_alloc(tp, agbp, &ialloced))) {
601 xfs_trans_brelse(tp, agbp);
602 if (error == ENOSPC) {
603 *inop = NULLFSINO;
604 >>>>>>>>>> return 0;
605 } else
606 return error;
........
629 nextag:
630 if (++tagno == agcount)
631 tagno = 0;
632 if (tagno == agno) {
633 *inop = NULLFSINO;
634 >>>>>>>>>> return noroom ? ENOSPC : 0;
635 }
Note that for the last case, we don't know what the value of "noroom" is.
noroom gets set to 1 if we've reached the maximum number of inodes in the
filesystem. From the earlier superblock dump you did:
> dblocks = 71627792
.....
> inopblog = 3
.....
> imax_pct = 25
> icount = 3570112
> ifree = 0
and the code that calculates this is:
icount = sbp->sb_dblocks * sbp->sb_imax_pct;
do_div(icount, 100);
do_div(icount, mp->m_ialloc_blks);
mp->m_maxicount = (icount * mp->m_ialloc_blks) <<
sbp->sb_inopblog;
therefore:
m_maxicount = (((((71627792 * 25) / 100) / 4) * 4) << 3)
= 143,255,584
which is way larger than the 3,570,112 that you have already allocated.
Hence I think that noroom == 0 and the last chunk of code above is
a possibility.
Further - we need to allocate new inodes as there are none free. That
implies we are calling xfs_ialloc_ag_alloc(). Taking a stab in the
dark, I suspect that we are not getting an error from xfs_ialloc_ag_alloc()
but we are not allocating inode chunks. Why?
Back to the superblock:
> unit = 16
> width = 32
You've got a filesystem with stripe alignment set. In xfs_ialloc_ag_alloc()
we attempt inode allocation by the following rules:
1. a) If we haven't previously allocated inodes, fall through to 2.
b) If we have previously allocated inodes, attempt to allocate next
to the last inode chunk.
2. If we do not have an extent now:
a) if we have stripe alignment, try with alignment
b) if we don't have stripe alignment try cluster alignment
3. If we do not have an extent now:
a) if we have stripe alignment, try with cluster alignment
b) no stripe alignment, turn off alignment.
4. If we do not have an extent now: FAIL.
Note the case missing from the stripe alignment fallback path - it does not
try without alignment at all. That means if all those extents large enough
that we found above are not correctly aligned, then we will still fail
to allocate an inode chunk. If all the AGs are like this, then we'll
fail to allocate at all and fall out of xfs_dialloc() through the last
fragment I quoted above.
As to the shutdown that this triggers - the attempt to allocate dirties
the AGFL and the AGF by moving free blocks into the free list for btree
splits and cancelling a dirty transaction results in a shutdown.
Now, to test this theory. ;) Luckily, it's easy to test: mount the
filesystem with the mount option "noalign" and rerun the mkdir test.
If it is an alignment problem, then setting noalign will prevent
this ENOSPC and shutdown as the filesystem will be able to allocate
more inodes.
Can you test this for me, Christian?
Cheers,
Dave.
>
>
> Christian
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-03-10 22:21 ` David Chinner
@ 2008-03-11 8:08 ` Christian Røsnes
2008-03-11 9:34 ` David Chinner
0 siblings, 1 reply; 20+ messages in thread
From: Christian Røsnes @ 2008-03-11 8:08 UTC (permalink / raw)
To: David Chinner; +Cc: xfs
On Mon, Mar 10, 2008 at 11:21 PM, David Chinner <dgc@sgi.com> wrote:
> You've got a filesystem with stripe alignment set. In xfs_ialloc_ag_alloc()
> we attempt inode allocation by the following rules:
>
> 1. a) If we haven't previously allocated inodes, fall through to 2.
> b) If we have previously allocated inodes, attempt to allocate next
> to the last inode chunk.
>
> 2. If we do not have an extent now:
> a) if we have stripe alignment, try with alignment
> b) if we don't have stripe alignment try cluster alignment
>
> 3. If we do not have an extent now:
> a) if we have stripe alignment, try with cluster alignment
> b) no stripe alignment, turn off alignment.
>
> 4. If we do not have an extent now: FAIL.
>
> Note the case missing from the stripe alignment fallback path - it does not
> try without alignment at all. That means if all those extents large enough
> that we found above are not correctly aligned, then we will still fail
> to allocate an inode chunk. If all the AGs are like this, then we'll
> fail to allocate at all and fall out of xfs_dialloc() through the last
> fragment I quoted above.
>
> As to the shutdown that this triggers - the attempt to allocate dirties
> the AGFL and the AGF by moving free blocks into the free list for btree
> splits and cancelling a dirty transaction results in a shutdown.
>
> Now, to test this theory. ;) Luckily, it's easy to test. mount the
> filesystem with the mount option "noalign" and rerun the mkdir test.
> If it is an alignment problem, then setting noalign will prevent
> this ENOSPC and shutdown as the filesystem will be able to allocate
> more inodes.
>
> Can you test this for me, Christian?
Thanks. Unfortunately noalign didn't solve my problem:
# mount | grep /data
/dev/sdb1 on /data type xfs (rw,noatime,noalign,logbufs=8,nobarrier)
# mkdir /data/test
results in:
Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1163 of
file fs/xfs/xfs_trans.c. Caller 0xc021a010
Pid: 17889, comm: mkdir Not tainted 2.6.24.3FC #7
[<c0212678>] xfs_trans_cancel+0x5d/0xe6
[<c021a010>] xfs_mkdir+0x45a/0x493
[<c021a010>] xfs_mkdir+0x45a/0x493
[<c01cbb8f>] xfs_acl_vhasacl_default+0x33/0x44
[<c0222d70>] xfs_vn_mknod+0x165/0x243
[<c0217b9e>] xfs_access+0x2f/0x35
[<c0222e6d>] xfs_vn_mkdir+0x12/0x14
[<c016057b>] vfs_mkdir+0xa3/0xe2
[<c0160644>] sys_mkdirat+0x8a/0xc3
[<c016069c>] sys_mkdir+0x1f/0x23
[<c01025ee>] syscall_call+0x7/0xb
[<c03b0000>] atm_reset_addr+0xd/0x83
=======================
xfs_force_shutdown(sdb1,0x8) called from line 1164 of file
fs/xfs/xfs_trans.c. Return address = 0xc0212690
Filesystem "sdb1": Corruption of in-memory data detected. Shutting
down filesystem: sdb1
Please umount the filesystem, and rectify the problem(s)
I'll try to add some printk statements to the codepaths you mentioned,
and see where it leads.
Christian
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-03-11 8:08 ` Christian Røsnes
@ 2008-03-11 9:34 ` David Chinner
2008-03-11 11:19 ` Christian Røsnes
0 siblings, 1 reply; 20+ messages in thread
From: David Chinner @ 2008-03-11 9:34 UTC (permalink / raw)
To: Christian Røsnes; +Cc: xfs
On Tue, Mar 11, 2008 at 09:08:31AM +0100, Christian Røsnes wrote:
> On Mon, Mar 10, 2008 at 11:21 PM, David Chinner <dgc@sgi.com> wrote:
> > You've got a filesystem with stripe alignment set. In xfs_ialloc_ag_alloc()
> > we attempt inode allocation by the following rules:
> >
> > 1. a) If we haven't previously allocated inodes, fall through to 2.
> > b) If we have previously allocated inodes, attempt to allocate next
> > to the last inode chunk.
> >
> > 2. If we do not have an extent now:
> > a) if we have stripe alignment, try with alignment
> > b) if we don't have stripe alignment try cluster alignment
> >
> > 3. If we do not have an extent now:
> > a) if we have stripe alignment, try with cluster alignment
> > b) no stripe alignment, turn off alignment.
> >
> > 4. If we do not have an extent now: FAIL.
> >
> > Note the case missing from the stripe alignment fallback path - it does not
> > try without alignment at all. That means if all those extents large enough
> > that we found above are not correctly aligned, then we will still fail
> > to allocate an inode chunk. If all the AGs are like this, then we'll
> > fail to allocate at all and fall out of xfs_dialloc() through the last
> > fragment I quoted above.
> >
> > As to the shutdown that this triggers - the attempt to allocate dirties
> > the AGFL and the AGF by moving free blocks into the free list for btree
> > splits and cancelling a dirty transaction results in a shutdown.
> >
> > Now, to test this theory. ;) Luckily, it's easy to test. mount the
> > filesystem with the mount option "noalign" and rerun the mkdir test.
> > If it is an alignment problem, then setting noalign will prevent
> > this ENOSPC and shutdown as the filesystem will be able to allocate
> > more inodes.
> >
> > Can you test this for me, Christian?
>
> Thanks. Unfortunately noalign didn't solve my problem:
Ok, reading the code a bit further, I've mixed up m_sinoalign,
m_sinoalignmt and the noalign mount option. The noalign mount option
turns off m_sinoalign, but it does not turn off inode cluster
alignment, hence we can't fall back to an unaligned allocation.
So the above theory still holds, just the test case was broken.
Unfortunately, further investigation indicates that inodes are
always allocated aligned; I expect that I could count on one hand
the number of Linux XFS filesystems not using inode allocation
alignment, because mkfs.xfs has set this as the default since it
was added in mid-1996.
The problem with unaligned inode allocation is the lookup case
(xfs_dilocate()) in that it requires btree lookups to convert the
inode number to a block number as you don't know where in the chunk
the inode exists just by looking at the inode number. With aligned
allocations, the block number can be derived directly from the inode
number because we know how the inode chunks are aligned.
IOWs, if we allow an unaligned inode chunk allocation to occur, we
have to strip the "aligned inode allocation" feature bit from the
filesystem and the related state and use the slow, btree based
lookup path forever more. That involves I/O instead of a simple
mask operation....
Hence I'm inclined to leave the allocation alignment as it stands
and work out how to prevent the shutdown (a difficult issue in
itself).
> I'll try to add some printk statements to the codepaths you mentioned,
> and see where it leads.
Definitely worth confirming this is where the error is coming from.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-03-11 9:34 ` David Chinner
@ 2008-03-11 11:19 ` Christian Røsnes
2008-03-11 12:21 ` David Chinner
0 siblings, 1 reply; 20+ messages in thread
From: Christian Røsnes @ 2008-03-11 11:19 UTC (permalink / raw)
To: David Chinner; +Cc: xfs
On Tue, Mar 11, 2008 at 10:34 AM, David Chinner <dgc@sgi.com> wrote:
>
> On Tue, Mar 11, 2008 at 09:08:31AM +0100, Christian Røsnes wrote:
> > I'll try to add some printk statements to the codepaths you mentioned,
> > and see where it leads.
>
> Definitely worth confirming this is where the error is coming from.
>
if (tagno == agno) {
printk("XFS: xfs_dialloc:0021\n");
*inop = NULLFSINO;
return noroom ? ENOSPC : 0;
}
seems to be what triggers this inside xfs_dialloc. Here is a trace
which gives some indication of the codepath taken inside
xfs_dialloc (xfs_ialloc.c):
mount:
/dev/sdb1 on /data type xfs (rw,noatime,logbufs=8,nobarrier)
# mkdir /data/test
mkdir: cannot create directory `/data/test': No space left on device
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0001
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0002
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0003
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0005
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0006
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0010
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0012
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0013
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0014
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0018
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0020
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0013
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0014
[... the xfs_dialloc 0013/0014/0018/0020 sequence above repeats verbatim; repeats elided ...]
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0013
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0014
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0018
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0020
Mar 11 11:47:00 linux kernel: XFS: xfs_dialloc:0021
Mar 11 11:47:00 linux kernel: Filesystem "sdb1": XFS internal error
xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c. Caller
0xc021a390
Mar 11 11:47:00 linux kernel: Pid: 5598, comm: mkdir Not tainted 2.6.24.3FC #9
Mar 11 11:47:00 linux kernel: [<c02129f8>] xfs_trans_cancel+0x5d/0xe6
Mar 11 11:47:00 linux kernel: [<c021a390>] xfs_mkdir+0x45a/0x493
Mar 11 11:47:00 linux kernel: [<c021a390>] xfs_mkdir+0x45a/0x493
Mar 11 11:47:00 linux kernel: [<c01cbb8f>] xfs_acl_vhasacl_default+0x33/0x44
Mar 11 11:47:00 linux kernel: [<c02230f0>] xfs_vn_mknod+0x165/0x243
Mar 11 11:47:00 linux kernel: [<c0217f1e>] xfs_access+0x2f/0x35
Mar 11 11:47:00 linux kernel: [<c02231ed>] xfs_vn_mkdir+0x12/0x14
Mar 11 11:47:00 linux kernel: [<c016057b>] vfs_mkdir+0xa3/0xe2
Mar 11 11:47:00 linux kernel: [<c0160644>] sys_mkdirat+0x8a/0xc3
Mar 11 11:47:00 linux kernel: [<c016069c>] sys_mkdir+0x1f/0x23
Mar 11 11:47:00 linux kernel: [<c01025ee>] syscall_call+0x7/0xb
Mar 11 11:47:00 linux kernel: [<c03b0000>] svc_proc_register+0x3c/0x4b
Mar 11 11:47:00 linux kernel: =======================
Mar 11 11:47:00 linux kernel: xfs_force_shutdown(sdb1,0x8) called from
line 1164 of file fs/xfs/xfs_trans.c. Return address = 0xc0212a10
Mar 11 11:47:00 linux kernel: Filesystem "sdb1": Corruption of
in-memory data detected. Shutting down filesystem: sdb1
Mar 11 11:47:00 linux kernel: Please umount the filesystem, and
rectify the problem(s)
instrumented code:
int
xfs_dialloc(
	xfs_trans_t	*tp,		/* transaction pointer */
	xfs_ino_t	parent,		/* parent inode (directory) */
	mode_t		mode,		/* mode bits for new inode */
	int		okalloc,	/* ok to allocate more space */
	xfs_buf_t	**IO_agbp,	/* in/out ag header's buffer */
	boolean_t	*alloc_done,	/* true if we needed to replenish
					   inode freelist */
	xfs_ino_t	*inop)		/* inode number allocated */
{
	xfs_agnumber_t	agcount;	/* number of allocation groups */
	xfs_buf_t	*agbp;		/* allocation group header's buffer */
	xfs_agnumber_t	agno;		/* allocation group number */
	xfs_agi_t	*agi;		/* allocation group header structure */
	xfs_btree_cur_t	*cur;		/* inode allocation btree cursor */
	int		error;		/* error return value */
	int		i;		/* result code */
	int		ialloced;	/* inode allocation status */
	int		noroom = 0;	/* no space for inode blk allocation */
	xfs_ino_t	ino;		/* fs-relative inode to be returned */
	/* REFERENCED */
	int		j;		/* result code */
	xfs_mount_t	*mp;		/* file system mount structure */
	int		offset;		/* index of inode in chunk */
	xfs_agino_t	pagino;		/* parent's a.g. relative inode # */
	xfs_agnumber_t	pagno;		/* parent's allocation group number */
	xfs_inobt_rec_incore_t rec;	/* inode allocation record */
	xfs_agnumber_t	tagno;		/* testing allocation group number */
	xfs_btree_cur_t	*tcur;		/* temp cursor */
	xfs_inobt_rec_incore_t trec;	/* temp inode allocation record */

	printk("XFS: xfs_dialloc:0001\n");
	if (*IO_agbp == NULL) {
		printk("XFS: xfs_dialloc:0002\n");
		/*
		 * We do not have an agbp, so select an initial allocation
		 * group for inode allocation.
		 */
		agbp = xfs_ialloc_ag_select(tp, parent, mode, okalloc);
		printk("XFS: xfs_dialloc:0003\n");
		/*
		 * Couldn't find an allocation group satisfying the
		 * criteria, give up.
		 */
		if (!agbp) {
			printk("XFS: xfs_dialloc:0004\n");
			*inop = NULLFSINO;
			return 0;
		}
		printk("XFS: xfs_dialloc:0005\n");
		agi = XFS_BUF_TO_AGI(agbp);
		ASSERT(be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC);
		printk("XFS: xfs_dialloc:0006\n");
	} else {
		printk("XFS: xfs_dialloc:0007\n");
		/*
		 * Continue where we left off before.  In this case, we
		 * know that the allocation group has free inodes.
		 */
		agbp = *IO_agbp;
		agi = XFS_BUF_TO_AGI(agbp);
		ASSERT(be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC);
		printk("XFS: xfs_dialloc:0008\n");
		ASSERT(be32_to_cpu(agi->agi_freecount) > 0);
		printk("XFS: xfs_dialloc:0009\n");
	}
	printk("XFS: xfs_dialloc:0010\n");
	mp = tp->t_mountp;
	agcount = mp->m_sb.sb_agcount;
	agno = be32_to_cpu(agi->agi_seqno);
	tagno = agno;
	pagno = XFS_INO_TO_AGNO(mp, parent);
	pagino = XFS_INO_TO_AGINO(mp, parent);

	/*
	 * If we have already hit the ceiling of inode blocks then clear
	 * okalloc so we scan all available agi structures for a free
	 * inode.
	 */
	if (mp->m_maxicount &&
	    mp->m_sb.sb_icount + XFS_IALLOC_INODES(mp) > mp->m_maxicount) {
		printk("XFS: xfs_dialloc:0011\n");
		noroom = 1;
		okalloc = 0;
	}

	/*
	 * Loop until we find an allocation group that either has free inodes
	 * or in which we can allocate some inodes.  Iterate through the
	 * allocation groups upward, wrapping at the end.
	 */
	printk("XFS: xfs_dialloc:0012\n");
	*alloc_done = B_FALSE;
	while (!agi->agi_freecount) {
		printk("XFS: xfs_dialloc:0013\n");
		/*
		 * Don't do anything if we're not supposed to allocate
		 * any blocks, just go on to the next ag.
		 */
		if (okalloc) {
			printk("XFS: xfs_dialloc:0014\n");
			/*
			 * Try to allocate some new inodes in the allocation
			 * group.
			 */
			if ((error = xfs_ialloc_ag_alloc(tp, agbp, &ialloced))) {
				printk("XFS: xfs_dialloc:0015\n");
				xfs_trans_brelse(tp, agbp);
				if (error == ENOSPC) {
					printk("XFS: xfs_dialloc:0016\n");
					*inop = NULLFSINO;
					return 0;
				} else {
					printk("XFS: xfs_dialloc:0017\n");
					return error;
				}
			}
			printk("XFS: xfs_dialloc:0018\n");
			if (ialloced) {
				/*
				 * We successfully allocated some inodes, return
				 * the current context to the caller so that it
				 * can commit the current transaction and call
				 * us again where we left off.
				 */
				printk("XFS: xfs_dialloc:0019\n");
				ASSERT(be32_to_cpu(agi->agi_freecount) > 0);
				*alloc_done = B_TRUE;
				*IO_agbp = agbp;
				*inop = NULLFSINO;
				return 0;
			}
		}
		printk("XFS: xfs_dialloc:0020\n");
		/*
		 * If it failed, give up on this ag.
		 */
		xfs_trans_brelse(tp, agbp);
		/*
		 * Go on to the next ag: get its ag header.
		 */
nextag:
		if (++tagno == agcount)
			tagno = 0;
		if (tagno == agno) {
			printk("XFS: xfs_dialloc:0021\n");
			*inop = NULLFSINO;
			return noroom ? ENOSPC : 0;
		}
		down_read(&mp->m_peraglock);
		if (mp->m_perag[tagno].pagi_inodeok == 0) {
			up_read(&mp->m_peraglock);
			printk("XFS: xfs_dialloc:0022\n");
			goto nextag;
		}
		error = xfs_ialloc_read_agi(mp, tp, tagno, &agbp);
		up_read(&mp->m_peraglock);
		if (error) {
			printk("XFS: xfs_dialloc:0023\n");
			goto nextag;
		}
		agi = XFS_BUF_TO_AGI(agbp);
		ASSERT(be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC);
	}
	/*
	 * Here with an allocation group that has a free inode.
	 * Reset agno since we may have chosen a new ag in the
	 * loop above.
	 */
	printk("XFS: xfs_dialloc:0024\n");
	agno = tagno;
	*IO_agbp = NULL;
	cur = xfs_btree_init_cursor(mp, tp, agbp, be32_to_cpu(agi->agi_seqno),
				    XFS_BTNUM_INO, (xfs_inode_t *)0, 0);
...
Christian
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-03-11 11:19 ` Christian Røsnes
@ 2008-03-11 12:21 ` David Chinner
2008-03-11 12:39 ` Christian Røsnes
0 siblings, 1 reply; 20+ messages in thread
From: David Chinner @ 2008-03-11 12:21 UTC (permalink / raw)
To: Christian Røsnes; +Cc: xfs
On Tue, Mar 11, 2008 at 12:19:29PM +0100, Christian Røsnes wrote:
> On Tue, Mar 11, 2008 at 10:34 AM, David Chinner <dgc@sgi.com> wrote:
> >
> > On Tue, Mar 11, 2008 at 09:08:31AM +0100, Christian Røsnes wrote:
>
> > > I'll try to add some printk statements to the codepaths you mentioned,
> > > and see where it leads.
> >
> > Definitely worth confirming this is where the error is coming from.
> >
>
> if (tagno == agno) {
> printk("XFS: xfs_dialloc:0021\n");
> *inop = NULLFSINO;
> return noroom ? ENOSPC : 0;
> }
>
> seems to be what triggers this inside xfs_dialloc.
>
> Here a trace which give some indication to the codepath taken inside
> xfs_dialloc (xfs_ialloc.c):
Yup, that's trying to allocate in each AG and failing. Almost certainly
the problem is the described alignment issue.
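[Editor's note: the diagnosis above corresponds to the 0013/0014/0018/0020 loop in the trace, ending at the 0021 exit once the walk wraps back to the starting AG. A minimal standalone sketch of that AG walk — hypothetical names, not the kernel code, with the per-AG attempt stubbed to always fail as it does on this filesystem:]

```c
#define NULLFSINO   (~0ULL)
#define XERR_ENOSPC 28		/* stand-in for ENOSPC */
#define AGCOUNT     16

/* Hypothetical stand-in for the per-AG inode-chunk allocation
 * attempt; in the failing filesystem every AG reports failure. */
static int ag_can_alloc_chunk(int ag) { (void)ag; return 0; }

/* Sketch of xfs_dialloc's AG walk: start at agno, wrap at the
 * last AG, and once we return to the starting AG take the "0021"
 * exit: ENOSPC only if we were already out of inode headroom,
 * otherwise 0 with *inop = NULLFSINO. */
static int dialloc_walk(int agno, int noroom, unsigned long long *inop)
{
	int tagno = agno;

	do {
		if (ag_can_alloc_chunk(tagno))
			return 0;	/* would allocate here */
		if (++tagno == AGCOUNT)
			tagno = 0;	/* wrap around */
	} while (tagno != agno);

	*inop = NULLFSINO;
	return noroom ? XERR_ENOSPC : 0;
}
```

Every AG is visited exactly once before the walk gives up, which is why the same trace block appears once per allocation group.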
FYI, I'm travelling tomorrow so I won't really get a chance to look
at this more until thursday....
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-03-11 12:21 ` David Chinner
@ 2008-03-11 12:39 ` Christian Røsnes
[not found] ` <20080312232425.GR155407@sgi.com>
0 siblings, 1 reply; 20+ messages in thread
From: Christian Røsnes @ 2008-03-11 12:39 UTC (permalink / raw)
To: David Chinner; +Cc: xfs
On Tue, Mar 11, 2008 at 1:21 PM, David Chinner <dgc@sgi.com> wrote:
> On Tue, Mar 11, 2008 at 12:19:29PM +0100, Christian Røsnes wrote:
> > On Tue, Mar 11, 2008 at 10:34 AM, David Chinner <dgc@sgi.com> wrote:
> > >
> > > On Tue, Mar 11, 2008 at 09:08:31AM +0100, Christian Røsnes wrote:
> >
> > > > I'll try to add some printk statements to the codepaths you mentioned,
> > > > and see where it leads.
> > >
> > > Definitely worth confirming this is where the error is coming from.
> > >
> >
> > if (tagno == agno) {
> > printk("XFS: xfs_dialloc:0021\n");
> > *inop = NULLFSINO;
> > return noroom ? ENOSPC : 0;
> > }
> >
> > seems to be what triggers this inside xfs_dialloc.
> >
> > Here a trace which give some indication to the codepath taken inside
> > xfs_dialloc (xfs_ialloc.c):
>
> Yup, that's trying to allocate in each AG and failing. Almost certainly
> the problem is the described alignment issue.
>
> FYI, I'm travelling tomorrow so I won't really get a chance to look
> at this more until thursday....
>
Ok. Thanks again for all your help so far in tracking this down.
Here's the codepath taken within xfs_ialloc_ag_alloc (xfs_ialloc.c):
mount:
/dev/sdb1 on /data type xfs (rw,noatime,logbufs=8,nobarrier)
# mkdir /data/test
mkdir: cannot create directory `/data/test': No space left on device
Mar 11 13:27:44 linux kernel: XFS: xfs_ialloc_ag_alloc:0001
Mar 11 13:27:44 linux kernel: XFS: xfs_ialloc_ag_alloc:0003
Mar 11 13:27:44 linux kernel: XFS: xfs_ialloc_ag_alloc:0004
Mar 11 13:27:44 linux kernel: XFS: xfs_ialloc_ag_alloc:0007
Mar 11 13:27:44 linux kernel: XFS: xfs_ialloc_ag_alloc:0008
Mar 11 13:27:44 linux kernel: XFS: xfs_ialloc_ag_alloc:0011
Mar 11 13:27:44 linux kernel: XFS: xfs_ialloc_ag_alloc:0012
Mar 11 13:27:44 linux kernel: XFS: xfs_ialloc_ag_alloc:0014
Mar 11 13:27:44 linux kernel: XFS: xfs_ialloc_ag_alloc:0015
Mar 11 13:27:44 linux kernel: XFS: xfs_ialloc_ag_alloc:0016
Mar 11 13:27:44 linux kernel: XFS: xfs_ialloc_ag_alloc:0017
Mar 11 13:27:44 linux kernel: XFS: xfs_ialloc_ag_alloc:0020
Mar 11 13:27:44 linux kernel: XFS: xfs_ialloc_ag_alloc:0021
[... the thirteen-line xfs_ialloc_ag_alloc 0001..0021 sequence above repeats verbatim 15 more times, once per allocation group; repeats elided ...]
Mar 11 13:27:44 linux kernel: Filesystem "sdb1": XFS internal error
xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c. Caller
0xc021a1f8
Mar 11 13:27:44 linux kernel: Pid: 5593, comm: mkdir Not tainted 2.6.24.3FC #10
Mar 11 13:27:44 linux kernel: [<c0212860>] xfs_trans_cancel+0x5d/0xe6
Mar 11 13:27:44 linux kernel: [<c021a1f8>] xfs_mkdir+0x45a/0x493
Mar 11 13:27:44 linux kernel: [<c021a1f8>] xfs_mkdir+0x45a/0x493
Mar 11 13:27:44 linux kernel: [<c01cbb8f>] xfs_acl_vhasacl_default+0x33/0x44
Mar 11 13:27:44 linux kernel: [<c0222f58>] xfs_vn_mknod+0x165/0x243
Mar 11 13:27:44 linux kernel: [<c0217d86>] xfs_access+0x2f/0x35
Mar 11 13:27:44 linux kernel: [<c0223055>] xfs_vn_mkdir+0x12/0x14
Mar 11 13:27:44 linux kernel: [<c016057b>] vfs_mkdir+0xa3/0xe2
Mar 11 13:27:44 linux kernel: [<c0160644>] sys_mkdirat+0x8a/0xc3
Mar 11 13:27:44 linux kernel: [<c016069c>] sys_mkdir+0x1f/0x23
Mar 11 13:27:44 linux kernel: [<c01025ee>] syscall_call+0x7/0xb
Mar 11 13:27:44 linux kernel: [<c03b0000>] proc_dodebug+0xc6/0x1e2
Mar 11 13:27:44 linux kernel: =======================
Mar 11 13:27:44 linux kernel: xfs_force_shutdown(sdb1,0x8) called from
line 1164 of file fs/xfs/xfs_trans.c. Return address = 0xc0212878
Mar 11 13:27:44 linux kernel: Filesystem "sdb1": Corruption of
in-memory data detected. Shutting down filesystem: sdb1
Mar 11 13:27:44 linux kernel: Please umount the filesystem, and
rectify the problem(s)
/*
* Allocate new inodes in the allocation group specified by agbp.
* Return 0 for success, else error code.
*/
STATIC int /* error code or 0 */
xfs_ialloc_ag_alloc(
xfs_trans_t *tp, /* transaction pointer */
xfs_buf_t *agbp, /* alloc group buffer */
int *alloc)
{
xfs_agi_t *agi; /* allocation group header */
xfs_alloc_arg_t args; /* allocation argument structure */
int blks_per_cluster; /* fs blocks per inode cluster */
xfs_btree_cur_t *cur; /* inode btree cursor */
xfs_daddr_t d; /* disk addr of buffer */
xfs_agnumber_t agno;
int error;
xfs_buf_t *fbuf; /* new free inodes' buffer */
xfs_dinode_t *free; /* new free inode structure */
int i; /* inode counter */
int j; /* block counter */
int nbufs; /* num bufs of new inodes */
xfs_agino_t newino; /* new first inode's number */
xfs_agino_t newlen; /* new number of inodes */
int ninodes; /* num inodes per buf */
xfs_agino_t thisino; /* current inode number, for loop */
int version; /* inode version number to use */
int isaligned = 0; /* inode allocation at stripe unit */
/* boundary */
args.tp = tp;
args.mp = tp->t_mountp;
printk("XFS: xfs_ialloc_ag_alloc:0001\n");
/*
* Locking will ensure that we don't have two callers in here
* at one time.
*/
newlen = XFS_IALLOC_INODES(args.mp);
if (args.mp->m_maxicount &&
args.mp->m_sb.sb_icount + newlen > args.mp->m_maxicount) {
printk("XFS: xfs_ialloc_ag_alloc:0002\n");
return XFS_ERROR(ENOSPC);
}
printk("XFS: xfs_ialloc_ag_alloc:0003\n");
args.minlen = args.maxlen = XFS_IALLOC_BLOCKS(args.mp);
/*
* First try to allocate inodes contiguous with the last-allocated
* chunk of inodes. If the filesystem is striped, this will fill
* an entire stripe unit with inodes.
*/
agi = XFS_BUF_TO_AGI(agbp);
newino = be32_to_cpu(agi->agi_newino);
args.agbno = XFS_AGINO_TO_AGBNO(args.mp, newino) +
XFS_IALLOC_BLOCKS(args.mp);
if (likely(newino != NULLAGINO &&
(args.agbno < be32_to_cpu(agi->agi_length)))) {
printk("XFS: xfs_ialloc_ag_alloc:0004\n");
args.fsbno = XFS_AGB_TO_FSB(args.mp,
be32_to_cpu(agi->agi_seqno), args.agbno);
args.type = XFS_ALLOCTYPE_THIS_BNO;
args.mod = args.total = args.wasdel = args.isfl =
args.userdata = args.minalignslop = 0;
args.prod = 1;
args.alignment = 1;
/*
* Allow space for the inode btree to split.
*/
args.minleft = XFS_IN_MAXLEVELS(args.mp) - 1;
if ((error = xfs_alloc_vextent(&args))) {
printk("XFS: xfs_ialloc_ag_alloc:0005\n");
return error;
}
} else {
printk("XFS: xfs_ialloc_ag_alloc:0006\n");
args.fsbno = NULLFSBLOCK;
}
if (unlikely(args.fsbno == NULLFSBLOCK)) {
printk("XFS: xfs_ialloc_ag_alloc:0007\n");
/*
* Set the alignment for the allocation.
* If stripe alignment is turned on then align at stripe unit
* boundary.
* If the cluster size is smaller than a filesystem block
* then we're doing I/O for inodes in filesystem block size
* pieces, so don't need alignment anyway.
*/
isaligned = 0;
if (args.mp->m_sinoalign) {
printk("XFS: xfs_ialloc_ag_alloc:0008\n");
ASSERT(!(args.mp->m_flags & XFS_MOUNT_NOALIGN));
args.alignment = args.mp->m_dalign;
isaligned = 1;
} else if (XFS_SB_VERSION_HASALIGN(&args.mp->m_sb) &&
args.mp->m_sb.sb_inoalignmt >=
XFS_B_TO_FSBT(args.mp,
XFS_INODE_CLUSTER_SIZE(args.mp))) {
printk("XFS: xfs_ialloc_ag_alloc:0009\n");
args.alignment = args.mp->m_sb.sb_inoalignmt;
}
else {
printk("XFS: xfs_ialloc_ag_alloc:0010\n");
args.alignment = 1;
}
/*
* Need to figure out where to allocate the inode blocks.
* Ideally they should be spaced out through the a.g.
* For now, just allocate blocks up front.
*/
printk("XFS: xfs_ialloc_ag_alloc:0011\n");
args.agbno = be32_to_cpu(agi->agi_root);
args.fsbno = XFS_AGB_TO_FSB(args.mp,
be32_to_cpu(agi->agi_seqno), args.agbno);
/*
* Allocate a fixed-size extent of inodes.
*/
args.type = XFS_ALLOCTYPE_NEAR_BNO;
args.mod = args.total = args.wasdel = args.isfl =
args.userdata = args.minalignslop = 0;
args.prod = 1;
/*
* Allow space for the inode btree to split.
*/
args.minleft = XFS_IN_MAXLEVELS(args.mp) - 1;
printk("XFS: xfs_ialloc_ag_alloc:0012\n");
if ((error = xfs_alloc_vextent(&args))) {
printk("XFS: xfs_ialloc_ag_alloc:0013\n");
return error;
}
printk("XFS: xfs_ialloc_ag_alloc:0014\n");
}
printk("XFS: xfs_ialloc_ag_alloc:0015\n");
/*
* If stripe alignment is turned on, then try again with cluster
* alignment.
*/
if (isaligned && args.fsbno == NULLFSBLOCK) {
printk("XFS: xfs_ialloc_ag_alloc:0016\n");
args.type = XFS_ALLOCTYPE_NEAR_BNO;
args.agbno = be32_to_cpu(agi->agi_root);
args.fsbno = XFS_AGB_TO_FSB(args.mp,
be32_to_cpu(agi->agi_seqno), args.agbno);
if (XFS_SB_VERSION_HASALIGN(&args.mp->m_sb) &&
args.mp->m_sb.sb_inoalignmt >=
XFS_B_TO_FSBT(args.mp, XFS_INODE_CLUSTER_SIZE(args.mp))) {
printk("XFS: xfs_ialloc_ag_alloc:0017\n");
args.alignment = args.mp->m_sb.sb_inoalignmt;
}
else {
printk("XFS: xfs_ialloc_ag_alloc:0018\n");
args.alignment = 1;
}
if ((error = xfs_alloc_vextent(&args))) {
printk("XFS: xfs_ialloc_ag_alloc:0019\n");
return error;
}
}
printk("XFS: xfs_ialloc_ag_alloc:0020\n");
if (args.fsbno == NULLFSBLOCK) {
printk("XFS: xfs_ialloc_ag_alloc:0021\n");
*alloc = 0;
return 0;
}
printk("XFS: xfs_ialloc_ag_alloc:0022\n");
ASSERT(args.len == args.minlen);
/*
* Convert the results.
*/
newino = XFS_OFFBNO_TO_AGINO(args.mp, args.agbno, 0);
/*
* Loop over the new block(s), filling in the inodes.
* For small block sizes, manipulate the inodes in buffers
* which are multiples of the blocks size.
*/
if (args.mp->m_sb.sb_blocksize >= XFS_INODE_CLUSTER_SIZE(args.mp)) {
printk("XFS: xfs_ialloc_ag_alloc:0023\n");
blks_per_cluster = 1;
nbufs = (int)args.len;
ninodes = args.mp->m_sb.sb_inopblock;
} else {
printk("XFS: xfs_ialloc_ag_alloc:0024\n");
blks_per_cluster = XFS_INODE_CLUSTER_SIZE(args.mp) /
args.mp->m_sb.sb_blocksize;
nbufs = (int)args.len / blks_per_cluster;
ninodes = blks_per_cluster * args.mp->m_sb.sb_inopblock;
}
printk("XFS: xfs_ialloc_ag_alloc:0025\n");
/*
* Figure out what version number to use in the inodes we create.
* If the superblock version has caught up to the one that supports
* the new inode format, then use the new inode version. Otherwise
* use the old version so that old kernels will continue to be
* able to use the file system.
*/
if (XFS_SB_VERSION_HASNLINK(&args.mp->m_sb)) {
printk("XFS: xfs_ialloc_ag_alloc:0026\n");
version = XFS_DINODE_VERSION_2;
}
else {
printk("XFS: xfs_ialloc_ag_alloc:0027\n");
version = XFS_DINODE_VERSION_1;
}
for (j = 0; j < nbufs; j++) {
printk("XFS: xfs_ialloc_ag_alloc:0028\n");
/*
* Get the block.
*/
d = XFS_AGB_TO_DADDR(args.mp, be32_to_cpu(agi->agi_seqno),
args.agbno + (j * blks_per_cluster));
fbuf = xfs_trans_get_buf(tp, args.mp->m_ddev_targp, d,
args.mp->m_bsize * blks_per_cluster,
XFS_BUF_LOCK);
ASSERT(fbuf);
ASSERT(!XFS_BUF_GETERROR(fbuf));
/*
* Set initial values for the inodes in this buffer.
*/
xfs_biozero(fbuf, 0, ninodes << args.mp->m_sb.sb_inodelog);
for (i = 0; i < ninodes; i++) {
printk("XFS: xfs_ialloc_ag_alloc:0029\n");
free = XFS_MAKE_IPTR(args.mp, fbuf, i);
free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
free->di_core.di_version = version;
free->di_next_unlinked = cpu_to_be32(NULLAGINO);
xfs_ialloc_log_di(tp, fbuf, i,
XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
}
xfs_trans_inode_alloc_buf(tp, fbuf);
printk("XFS: xfs_ialloc_ag_alloc:0030\n");
}
printk("XFS: xfs_ialloc_ag_alloc:0031\n");
be32_add(&agi->agi_count, newlen);
be32_add(&agi->agi_freecount, newlen);
agno = be32_to_cpu(agi->agi_seqno);
down_read(&args.mp->m_peraglock);
args.mp->m_perag[agno].pagi_freecount += newlen;
up_read(&args.mp->m_peraglock);
agi->agi_newino = cpu_to_be32(newino);
/*
* Insert records describing the new inode chunk into the btree.
*/
cur = xfs_btree_init_cursor(args.mp, tp, agbp, agno,
XFS_BTNUM_INO, (xfs_inode_t *)0, 0);
for (thisino = newino;
thisino < newino + newlen;
thisino += XFS_INODES_PER_CHUNK) {
printk("XFS: xfs_ialloc_ag_alloc:0032\n");
if ((error = xfs_inobt_lookup_eq(cur, thisino,
XFS_INODES_PER_CHUNK, XFS_INOBT_ALL_FREE, &i))) {
printk("XFS: xfs_ialloc_ag_alloc:0033\n");
xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
return error;
}
printk("XFS: xfs_ialloc_ag_alloc:0034\n");
ASSERT(i == 0);
if ((error = xfs_inobt_insert(cur, &i))) {
printk("XFS: xfs_ialloc_ag_alloc:0035\n");
xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
return error;
}
ASSERT(i == 1);
printk("XFS: xfs_ialloc_ag_alloc:0036\n");
}
printk("XFS: xfs_ialloc_ag_alloc:0037\n");
xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
/*
* Log allocation group header fields
*/
xfs_ialloc_log_agi(tp, agbp,
XFS_AGI_COUNT | XFS_AGI_FREECOUNT | XFS_AGI_NEWINO);
/*
* Modify/log superblock values for inode count and inode free count.
*/
xfs_trans_mod_sb(tp, XFS_TRANS_SB_ICOUNT, (long)newlen);
xfs_trans_mod_sb(tp, XFS_TRANS_SB_IFREE, (long)newlen);
*alloc = 1;
printk("XFS: xfs_ialloc_ag_alloc:0038\n");
return 0;
}
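As a quick sanity check, the cluster-geometry arithmetic at the top of this excerpt (blks_per_cluster / nbufs / ninodes) can be reproduced standalone. The numbers in the test below are made up for illustration, not taken from the reporter's filesystem:

```c
#include <assert.h>

/* Hypothetical geometry, in the same units the kernel code uses:
 * block size and inode cluster size in bytes, inodes per block,
 * and the allocated extent length (args.len) in blocks. */
struct geom {
	int sb_blocksize;
	int inode_cluster_size;
	int sb_inopblock;
	int args_len;
};

/* Mirrors the branch above: one block per buffer when a block already
 * covers a whole inode cluster, otherwise several blocks per buffer. */
static void cluster_calc(const struct geom *g, int *blks_per_cluster,
			 int *nbufs, int *ninodes)
{
	if (g->sb_blocksize >= g->inode_cluster_size) {
		*blks_per_cluster = 1;
		*nbufs = g->args_len;
		*ninodes = g->sb_inopblock;
	} else {
		*blks_per_cluster = g->inode_cluster_size / g->sb_blocksize;
		*nbufs = g->args_len / *blks_per_cluster;
		*ninodes = *blks_per_cluster * g->sb_inopblock;
	}
}
```

With 4k blocks, 8k inode clusters, 16 inodes per block and a 64-block chunk, each buffer covers 2 blocks and initialises 32 inodes.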
Christian

* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
[not found] ` <1a4a774c0803130446x609b9cb2mf3da323183c35606@mail.gmail.com>
@ 2008-03-13 14:53 ` David Chinner
2008-03-14 9:02 ` Christian Røsnes
0 siblings, 1 reply; 20+ messages in thread
From: David Chinner @ 2008-03-13 14:53 UTC (permalink / raw)
To: Christian Røsnes; +Cc: xfs
ok..... loads the metadump...
Looking at the AGF status before the mkdir:
dgc@budgie:/mnt/test$ for i in `seq 0 1 15`; do echo AG $i ; sudo xfs_db -r -c "agf $i" -c 'p flcount longest' -f /mnt/scratch/shutdown; done
AG 0
flcount = 6
longest = 8
AG 1
flcount = 6
longest = 8
AG 2
flcount = 6
longest = 7
AG 3
flcount = 6
longest = 7
AG 4
flcount = 6
longest = 7
AG 5
flcount = 7
longest = 8
....
AG 5 immediately caught my eye:
seqno = 5
length = 4476752
bnoroot = 7
cntroot = 46124
bnolevel = 2
cntlevel = 2
flfirst = 56
fllast = 62
flcount = 7
freeblks = 68797
longest = 8
btreeblks = 0
magicnum = 0x58414746
versionnum = 1
Mainly because at level 2 btrees this:
blocks = XFS_MIN_FREELIST_PAG(pag,mp);
gives blocks = 6 and the freelist count says 7 blocks.
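Dave's "blocks = 6" figure follows from a simplified reading of XFS_MIN_FREELIST: reserve room for one full split of each free-space btree, i.e. (level + 1) blocks per tree. This standalone sketch omits the clamp against XFS_AG_MAXLEVELS that the real macro applies:

```c
#include <assert.h>

/* Simplified XFS_MIN_FREELIST: (level + 1) blocks for the by-block-number
 * btree plus (level + 1) for the by-size btree. Assumes both levels are
 * below XFS_AG_MAXLEVELS, so the real macro's clamp never fires. */
static int min_freelist(int bnolevel, int cntlevel)
{
	return (bnolevel + 1) + (cntlevel + 1);
}
```

For AG 5's level-2 trees this gives 6, one less than the 7 blocks actually sitting on the freelist.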
Hence, if the alignment check fails in some way, it will
try to reduce the free list down to 6 blocks. Unsurprisingly,
then, this breakpoint (set on the function that every "log object"
operation calls) eventually tripped:
Stack traceback for pid 2936
0xe000003817440000 2936 2902 1 1 R 0xe0000038174403a0 *mkdir
0xa0000001003c3cc0 xfs_trans_find_item
0xa0000001003c0d10 xfs_trans_log_buf+0x2f0
0xa0000001002f81e0 xfs_alloc_log_agf+0x80
0xa0000001002fa3d0 xfs_alloc_get_freelist+0x3d0
0xa0000001002ffe90 xfs_alloc_fix_freelist+0x770
0xa000000100300a00 xfs_alloc_vextent+0x440
0xa000000100374d70 xfs_ialloc_ag_alloc+0x2d0
0xa000000100375dd0 xfs_dialloc+0x2d0
.......
Which is the first place we dirty a log item. It's the
AGF of AG 5:
[1]kdb> xbuf 0xe0000038143e2e00
buf 0xe0000038143e2e00 agf 0xe000003817550200
magicnum 0x58414746 versionnum 0x1 seqno 0x5 length 0x444f50
roots b 0x7 c 0xb42c levels b 2 c 2
flfirst 57 fllast 62 flcount 6 freeblks 68797 longest 8
And you'll note that flcount = 6 and flfirst = 57 now. In memory:
[1]kdb> xperag 0xe000003802979510
.....
ag 5 f_init 1 i_init 1
f_levels[b,c] 2,2 f_flcount 6 f_freeblks 68797 f_longest 8
f__metadata 0
i_freecount 0 i_inodeok 1
.....
f_flcount = 6 as well. So we've really modified the AGF here.
Now let's find out why the alignment checks failed.
[1]kdb> xalloc 0xe00000381744fc48
tp 0xe000003817450000 mp 0xe000003802979510 agbp 0x0000000000020024 pag 0xe000003802972378 fsbno 42563856[5:97910]
agno 0x5 agbno 0xffffffff minlen 0x8 maxlen 0x8 mod 0x0
prod 0x1 minleft 0x0 total 0x0 alignment 0x1
minalignslop 0x0 len 0x0 type this_bno otype this_bno wasdel 0
wasfromfl 0 isfl 0 userdata 0
Oh - alignment = 1. How did that happen? And why did it fail? I
note: "this_bno" means it wants an exact allocation (fsbno
42563856[5:97910]). Ah, that means we are in the first attempt to
allocate a block in an AG, i.e. here:
153 /*
154 * First try to allocate inodes contiguous with the last-allocated
155 * chunk of inodes. If the filesystem is striped, this will fill
156 * an entire stripe unit with inodes.
157 */
158 agi = XFS_BUF_TO_AGI(agbp);
159 newino = be32_to_cpu(agi->agi_newino);
160 args.agbno = XFS_AGINO_TO_AGBNO(args.mp, newino) +
161 XFS_IALLOC_BLOCKS(args.mp);
162 if (likely(newino != NULLAGINO &&
163 (args.agbno < be32_to_cpu(agi->agi_length)))) {
164 args.fsbno = XFS_AGB_TO_FSB(args.mp,
165 be32_to_cpu(agi->agi_seqno), args.agbno);
166 args.type = XFS_ALLOCTYPE_THIS_BNO;
167 args.mod = args.total = args.wasdel = args.isfl =
168 args.userdata = args.minalignslop = 0;
169 >>>>>>>> args.prod = 1;
170 >>>>>>>> args.alignment = 1;
171 /*
172 * Allow space for the inode btree to split.
173 */
174 args.minleft = XFS_IN_MAXLEVELS(args.mp) - 1;
175 >>>>>>>> if ((error = xfs_alloc_vextent(&args)))
176 return error;
This now makes sense - at first we attempt an unaligned, exact block
allocation. This gets as far as modifying the free list, because we
have a free 8 block extent as required. However, the exact extent
being asked for is not free, so the btree lookup fails and we abort
the allocation attempt.
We then fall back to method 2 - try stripe alignment - which now
fails the longest-free-block checks because alignment is accounted
for and we need ~24 blocks to be sure of success.
We fall back to method 3 - cluster alignment - which also fails
because we need an extent of 9 blocks, but we only have extents of
8 blocks available.
We never try again without alignment....
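The three-attempt fallback Dave describes can be modelled standalone. All names and thresholds here are stand-ins for illustration (exact/stripe/cluster requirements, not real XFS functions); the key property is that there is no final unaligned retry:

```c
#include <assert.h>
#include <stdbool.h>

/* Returns which allocation method would succeed (1-3), or 0 for ENOSPC.
 * "longest" is the longest free extent in the AG; the *_need values are
 * the minimum extent lengths each aligned method requires. Mirrors the
 * bug: once the exact attempt fails, only aligned attempts remain. */
static int alloc_attempts(int longest, bool exact_extent_free,
			  int stripe_need, int cluster_need, int chunk_len)
{
	if (exact_extent_free && longest >= chunk_len)
		return 1;	/* method 1: exact bno, unaligned */
	if (longest >= stripe_need)
		return 2;	/* method 2: stripe aligned */
	if (longest >= cluster_need)
		return 3;	/* method 3: cluster aligned */
	return 0;		/* ENOSPC - no unaligned retry */
}
```

With the values from this report (longest = 8, exact extent busy, ~24 needed for stripe alignment, 9 for cluster alignment, 8-block chunks) every method fails.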
Now we fail allocation in that AG having dirtied the AGF, the AGFL,
and a block out of both the by-size and by-count free space btrees.
Hence when we fail to allocate in all other AGs, we return ENOSPC
and the transaction gets cancelled. Because it has dirty items
in it, we get shut down.
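That last step is the xfs_trans_cancel rule the subject line refers to, modelled as a toy function (simplified: the real check at xfs_trans.c:1150 also considers log reservations and permanent-log flags):

```c
#include <assert.h>
#include <stdbool.h>

enum fs_state { FS_OK, FS_SHUTDOWN };

/* A transaction that has already dirtied metadata cannot simply be
 * cancelled: the in-memory buffers no longer match what is on disk or
 * in the log, so the only safe response is a filesystem shutdown.
 * A clean transaction can be cancelled with no side effects. */
static enum fs_state trans_cancel(bool has_dirty_items)
{
	return has_dirty_items ? FS_SHUTDOWN : FS_OK;
}
```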
But no wonder it was so hard to reproduce....
The patch below fixes the shutdown for me. Can you give it a go?
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
---
fs/xfs/xfs_ialloc.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c 2008-03-13 13:07:24.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c 2008-03-14 01:40:21.926153338 +1100
@@ -167,7 +167,12 @@ xfs_ialloc_ag_alloc(
args.mod = args.total = args.wasdel = args.isfl =
args.userdata = args.minalignslop = 0;
args.prod = 1;
- args.alignment = 1;
+ if (xfs_sb_version_hasalign(&args.mp->m_sb) &&
+ args.mp->m_sb.sb_inoalignmt >=
+ XFS_B_TO_FSBT(args.mp, XFS_INODE_CLUSTER_SIZE(args.mp)))
+ args.alignment = args.mp->m_sb.sb_inoalignmt;
+ else
+ args.alignment = 1;
/*
* Allow space for the inode btree to split.
*/
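The decision the patch adds can be exercised on its own. This is a standalone restatement, not the kernel code itself: both sb_inoalignmt and the inode cluster size are expressed in filesystem blocks, and the sample values in the test are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* If the filesystem has the inode-alignment feature and its inode
 * alignment covers at least a whole inode cluster, use that alignment
 * for the very first (exact-bno) allocation attempt too. Otherwise
 * keep the old behaviour of alignment = 1. This makes the first
 * attempt consistent with the aligned fallbacks, so a chunk that the
 * fallbacks could never place is not half-attempted first. */
static int first_attempt_alignment(bool sb_has_align, int sb_inoalignmt,
				   int cluster_blocks)
{
	if (sb_has_align && sb_inoalignmt >= cluster_blocks)
		return sb_inoalignmt;
	return 1;
}
```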
* Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
2008-03-13 14:53 ` David Chinner
@ 2008-03-14 9:02 ` Christian Røsnes
0 siblings, 0 replies; 20+ messages in thread
From: Christian Røsnes @ 2008-03-14 9:02 UTC (permalink / raw)
To: xfs
On Thu, Mar 13, 2008 at 3:53 PM, David Chinner <dgc@sgi.com> wrote:
> The patch below fixes the shutdown for me. Can you give it a go?
>
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
>
>
> ---
> fs/xfs/xfs_ialloc.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c 2008-03-13 13:07:24.000000000 +1100
> +++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c 2008-03-14 01:40:21.926153338 +1100
> @@ -167,7 +167,12 @@ xfs_ialloc_ag_alloc(
>
> args.mod = args.total = args.wasdel = args.isfl =
> args.userdata = args.minalignslop = 0;
> args.prod = 1;
> - args.alignment = 1;
> + if (xfs_sb_version_hasalign(&args.mp->m_sb) &&
> + args.mp->m_sb.sb_inoalignmt >=
> + XFS_B_TO_FSBT(args.mp, XFS_INODE_CLUSTER_SIZE(args.mp)))
> + args.alignment = args.mp->m_sb.sb_inoalignmt;
> + else
> + args.alignment = 1;
>
>
> /*
> * Allow space for the inode btree to split.
> */
>
Yes. This patch fixes the shutdown problem for me too.
Once again, thanks.
Christian
Thread overview: 20+ messages
2008-02-13 10:51 XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c Christian Røsnes
2008-02-13 11:04 ` Justin Piszcz
2008-02-13 11:44 ` Christian Røsnes
2008-02-13 21:45 ` David Chinner
2008-02-14 8:41 ` Christian Røsnes
2008-03-05 13:53 ` Christian Røsnes
2008-03-06 11:10 ` Christian Røsnes
2008-03-07 11:19 ` Christian Røsnes
2008-03-10 0:08 ` David Chinner
2008-03-10 8:34 ` Christian Røsnes
2008-03-10 10:02 ` Christian Røsnes
2008-03-10 22:21 ` David Chinner
2008-03-11 8:08 ` Christian Røsnes
2008-03-11 9:34 ` David Chinner
2008-03-11 11:19 ` Christian Røsnes
2008-03-11 12:21 ` David Chinner
2008-03-11 12:39 ` Christian Røsnes
[not found] ` <20080312232425.GR155407@sgi.com>
[not found] ` <1a4a774c0803130114l3927051byd54cd96cdb0efbe7@mail.gmail.com>
[not found] ` <20080313090830.GD95344431@sgi.com>
[not found] ` <1a4a774c0803130214x406a4eb9wfb8738d1f503663f@mail.gmail.com>
[not found] ` <20080313092139.GF95344431@sgi.com>
[not found] ` <1a4a774c0803130227l2fdf4861v21183b9bd3e7ce8d@mail.gmail.com>
[not found] ` <20080313113634.GH95344431@sgi.com>
[not found] ` <1a4a774c0803130446x609b9cb2mf3da323183c35606@mail.gmail.com>
2008-03-13 14:53 ` David Chinner
2008-03-14 9:02 ` Christian Røsnes
2008-03-09 22:59 ` David Chinner