* Corruption of in-memory data detected - on heavy hard linking
From: Christian Affolter @ 2008-07-23 17:40 UTC
To: xfs
Dear XFS users,
While using rsnapshot [1] on an XFS filesystem I encountered the
following error, which I can reproduce using the bash command below:
i=0
while (( i < 20 )); do
    rm -rf link-dir
    # -a: archive mode, -l: create hard links instead of copying file data
    cp -al orig-dir link-dir
    echo "Round $i over"
    i=$(( i + 1 ))
done
The problem mostly happens between the 6th and 10th run.
"orig-dir" contains around 12 GB of data and around 40'000 files (the
filesystem is 50 GB, of which 27 GB are free; mount options:
rw,noatime,usrquota).
The problem doesn't occur with a reiserfs filesystem (on the same device).
Kernel-Error:
Filesystem "sdc1": XFS internal error xfs_trans_cancel at line 1163 of
file fs/xfs/xfs_trans.c. Caller 0xffffffff803a4fcf
Pid: 22816, comm: cp Not tainted 2.6.24-gentoo-r8 #1
Call Trace:
[<ffffffff803a4fcf>] xfs_link+0x26f/0x390
[<ffffffff8039c656>] xfs_trans_cancel+0x126/0x150
[<ffffffff803a4fcf>] xfs_link+0x26f/0x390
[<ffffffff8039cf4b>] xfs_trans_unlocked_item+0x3b/0x60
[<ffffffff803aff8c>] xfs_vn_link+0x3c/0xb0
[<ffffffff802952b3>] vfs_link+0x123/0x180
[<ffffffff802984b1>] sys_linkat+0x151/0x180
[<ffffffff802901b7>] cp_new_stat+0xe7/0x100
[<ffffffff80290276>] sys_newlstat+0x36/0x50
[<ffffffff8020bbce>] system_call+0x7e/0x83
xfs_force_shutdown(sdc1,0x8) called from line 1164 of file
fs/xfs/xfs_trans.c. Return address = 0xffffffff8039c66f
Filesystem "sdc1": Corruption of in-memory data detected. Shutting down
filesystem: sdc1
Please umount the filesystem, and rectify the problem(s)
xfs_force_shutdown(sdc1,0x1) called from line 420 of file
fs/xfs/xfs_rw.c. Return address = 0xffffffff803a7fe9
xfs_force_shutdown(sdc1,0x1) called from line 420 of file
fs/xfs/xfs_rw.c. Return address = 0xffffffff803a7fe9
After remounting, everything seems to be fine; even xfs_repair (xfsprogs
2.8.11) doesn't find any problems on the filesystem.
The above error happens on a 2.6.24-gentoo-r8 SMP 64-bit kernel with 4 GB
of memory (~3 GB free), Intel 5000V chipset (ASUS DSBV-D), on top of an
Areca ARC-1160 (V1.42) RAID controller.
According to memcheck the memory seems to be OK; I replaced the DIMMs
anyway, with no success.
Searching Google for the above error message returns a lot of results, but
I only found one forum entry [2] that looks similar.
Any help would be highly appreciated.
Many thanks!
Chris
[1]http://www.rsnapshot.org/
[2]http://ubuntuforums.org/showthread.php?t=741425
--
stepping stone GmbH
Pappelweg 41
CH-3013 Bern
Telefon: +41 31 332 53 63
www.stepping-stone.ch
christian.affolter@stepping-stone.ch
* Re: Corruption of in-memory data detected - on heavy hard linking
From: Christoph Hellwig @ 2008-07-25 5:20 UTC
To: Christian Affolter; +Cc: xfs
On Wed, Jul 23, 2008 at 07:40:19PM +0200, Christian Affolter wrote:
> Kernel-Error:
> Filesystem "sdc1": XFS internal error xfs_trans_cancel at line 1163 of
> file fs/xfs/xfs_trans.c. Caller 0xffffffff803a4fcf
> Pid: 22816, comm: cp Not tainted 2.6.24-gentoo-r8 #1
2.6.24 is pretty old. Did you try with a recent kernel? We had some
fixes for in-core memory corruption although I don't remember one in
this area.
* Re: Corruption of in-memory data detected - on heavy hard linking
From: Christian Affolter @ 2008-08-04 16:47 UTC
To: xfs
Hi
> On Wed, Jul 23, 2008 at 07:40:19PM +0200, Christian Affolter wrote:
>> Kernel-Error:
>> Filesystem "sdc1": XFS internal error xfs_trans_cancel at line 1163 of
>> file fs/xfs/xfs_trans.c. Caller 0xffffffff803a4fcf
>> Pid: 22816, comm: cp Not tainted 2.6.24-gentoo-r8 #1
>
> 2.6.24 is pretty old. Did you try with a recent kernel? We had some
> fixes for in-core memory corruption although I don't remember one in
> this area.
I finally found the time to update the kernel to a recent 2.6.26 version.
Unfortunately the problem still exists:
Filesystem "dm-3": XFS internal error xfs_trans_cancel at line 1163 of
file fs/xfs/xfs_trans.c. Caller 0xffffffff803a6672
Pid: 12584, comm: cp Not tainted 2.6.26-gentoo #1
Call Trace:
[<ffffffff803a6672>] xfs_create+0x1c2/0x4c0
[<ffffffff8039fd16>] xfs_trans_cancel+0x126/0x150
[<ffffffff803a6672>] xfs_create+0x1c2/0x4c0
[<ffffffff803b186d>] xfs_vn_mknod+0x16d/0x2c0
[<ffffffff80291b7c>] vfs_create+0xcc/0x130
[<ffffffff8029539f>] do_filp_open+0x77f/0x860
[<ffffffff80286d1a>] do_sys_open+0x5a/0xf0
[<ffffffff8020b49b>] system_call_after_swapgs+0x7b/0x80
xfs_force_shutdown(dm-3,0x8) called from line 1164 of file
fs/xfs/xfs_trans.c. Return address = 0xffffffff8039fd2f
Filesystem "dm-3": Corruption of in-memory data detected. Shutting down
filesystem: dm-3
Please umount the filesystem, and rectify the problem(s)
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
xfs_force_shutdown(dm-3,0x1) called from line 420 of file
fs/xfs/xfs_rw.c. Return address = 0xffffffff803a9529
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
xfs_force_shutdown(dm-3,0x1) called from line 420 of file
fs/xfs/xfs_rw.c. Return address = 0xffffffff803a9529
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Before the shutdown happens, the copy command receives a
"No space left on device" error:
cp: cannot create regular file `[file name snipped]': No space left on device
cp: cannot create regular file `[file name snipped]': Input/output error
This happens although the device has more than 50% free space as well as
free inodes.
The affected device was initialized with old xfsprogs (2.8.11):
meta-data=/dev/evms/vol1         isize=256    agcount=3207, agsize=4096 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=13132799, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=1024, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0
Creating a new device with xfsprogs (2.9.7) leads to the following layout:
meta-data=/dev/sdc1              isize=256    agcount=5, agsize=3662818 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=17750000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=7153, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
On the newly created device, the problem is much harder to reproduce;
however, it still happens after around a day of heavy copying and
deleting.
Any further hints?
Many thanks
Chris
* Re: Corruption of in-memory data detected - on heavy hard linking
From: Dave Chinner @ 2008-08-05 0:19 UTC
To: Christian Affolter; +Cc: xfs
On Mon, Aug 04, 2008 at 06:47:46PM +0200, Christian Affolter wrote:
> Hi
>
>> On Wed, Jul 23, 2008 at 07:40:19PM +0200, Christian Affolter wrote:
>>> Kernel-Error:
>>> Filesystem "sdc1": XFS internal error xfs_trans_cancel at line 1163
>>> of file fs/xfs/xfs_trans.c. Caller 0xffffffff803a4fcf
>>> Pid: 22816, comm: cp Not tainted 2.6.24-gentoo-r8 #1
>>
>> 2.6.24 is pretty old. Did you try with a recent kernel? We had some
>> fixes for in-core memory corruption although I don't remember one in
>> this area.
>
> I finally found the time to update the kernel to a recent 2.6.26 version.
>
> Unfortunately the problem still exists:
> Filesystem "dm-3": XFS internal error xfs_trans_cancel at line 1163 of
> file fs/xfs/xfs_trans.c. Caller 0xffffffff803a6672
> Pid: 12584, comm: cp Not tainted 2.6.26-gentoo #1
Ok, what we need is the following. First, try to reproduce the
problem on a small filesystem (say a few GB). Once you've reproduced
the problem, unmount and remount the filesystem to get the log
replayed, then take an xfs_metadump image of the filesystem. Put the
metadump image somewhere that can be downloaded (ftp/web site) and
let us know where it is.
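The steps above could be scripted roughly as follows. This is only a
sketch: DEV, MNT and OUT are hypothetical placeholders, and the run helper
and the DRY_RUN switch are made up for illustration so that the script
prints the commands by default instead of touching a real device.

```shell
#!/bin/sh
# Sketch of the capture steps; DEV, MNT and OUT are hypothetical placeholders.
DEV=${DEV:-/dev/sdX1}
MNT=${MNT:-/mnt/xfs-test}
OUT=${OUT:-/tmp/xfs-test.metadump}
DRY_RUN=${DRY_RUN:-1}   # default: only print the commands

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

capture_metadump() {
    run umount "$MNT" &&          # unmount after the shutdown
    run mount "$DEV" "$MNT" &&    # remounting replays the log
    run umount "$MNT" &&          # metadump the unmounted filesystem
    run xfs_metadump "$DEV" "$OUT"
}
```

Calling capture_metadump with DRY_RUN=1 just lists the four commands; set
DRY_RUN=0 (as root, with real values for DEV, MNT and OUT) to actually run
them. Note that xfs_metadump obfuscates most file names by default, so the
image should be reasonably safe to publish.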
If this is anything like the previous problem I found and fixed,
then it will be a corner-case bug that is only triggered by a
specific layout of free space and we need the filesystem image
to be able to work out exactly what corner case is broken....
> Before the shutdown happens the copy command receives a
> "No space left on device" error:
> cp: cannot create regular file `[file name snipped]': No space left on device
> cp: cannot create regular file `[file name snipped]': Input/output error
>
> Although the device has more than 50% free space as well as free inodes.
It will be an AG that is out of space, not the entire filesystem.
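To put the per-AG remark in numbers, here is a small sketch that uses only
the agsize and bsize values from the two xfs_info outputs earlier in the
thread. The helper name ag_size_mib is made up for illustration, and
whether the small AGs are the actual trigger is not established in this
thread.

```shell
#!/bin/sh
# Size of one allocation group (AG) in MiB, from xfs_info's agsize and bsize.
ag_size_mib() {
    agsize_blks=$1   # "agsize=... blks" from the meta-data line
    bsize=$2         # "bsize=..." from the data line
    echo $(( agsize_blks * bsize / 1024 / 1024 ))
}

ag_size_mib 4096 4096       # old filesystem (agcount=3207): prints 16
ag_size_mib 3662818 4096    # new filesystem (agcount=5): prints 14307
```

So each AG on the original filesystem holds only 16 MiB, compared with
roughly 14 GiB per AG on the newly created one.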
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com