public inbox for linux-xfs@vger.kernel.org
* Corruption of in-memory data detected - on heavy  hard linking
From: Christian Affolter @ 2008-07-23 17:40 UTC
  To: xfs

Dear XFS users

While using rsnapshot [1] on an XFS filesystem I encountered the 
following error, which I can reproduce using the bash command below:

i=0; while (( $i <20 )); do
     rm -rf link-dir;
     cp -al orig-dir link-dir

     echo "Round $i over"; let i++;
done

The problem mostly happens between the 6th and 10th run.

"orig-dir" contains around 12 GB of data and around 40'000 files (the 
filesystem is 50 GB, of which 27 GB are free; mount options: 
rw,noatime,usrquota).

The problem doesn't occur with a reiserfs filesystem (on the same device).

Kernel-Error:
Filesystem "sdc1": XFS internal error xfs_trans_cancel at line 1163 of 
file fs/xfs/xfs_trans.c.  Caller 0xffffffff803a4fcf
Pid: 22816, comm: cp Not tainted 2.6.24-gentoo-r8 #1

Call Trace:
  [<ffffffff803a4fcf>] xfs_link+0x26f/0x390
  [<ffffffff8039c656>] xfs_trans_cancel+0x126/0x150
  [<ffffffff803a4fcf>] xfs_link+0x26f/0x390
  [<ffffffff8039cf4b>] xfs_trans_unlocked_item+0x3b/0x60
  [<ffffffff803aff8c>] xfs_vn_link+0x3c/0xb0
  [<ffffffff802952b3>] vfs_link+0x123/0x180
  [<ffffffff802984b1>] sys_linkat+0x151/0x180
  [<ffffffff802901b7>] cp_new_stat+0xe7/0x100
  [<ffffffff80290276>] sys_newlstat+0x36/0x50
  [<ffffffff8020bbce>] system_call+0x7e/0x83

xfs_force_shutdown(sdc1,0x8) called from line 1164 of file 
fs/xfs/xfs_trans.c.  Return address = 0xffffffff8039c66f
Filesystem "sdc1": Corruption of in-memory data detected.  Shutting down 
filesystem: sdc1
Please umount the filesystem, and rectify the problem(s)
xfs_force_shutdown(sdc1,0x1) called from line 420 of file 
fs/xfs/xfs_rw.c.  Return address = 0xffffffff803a7fe9
xfs_force_shutdown(sdc1,0x1) called from line 420 of file 
fs/xfs/xfs_rw.c.  Return address = 0xffffffff803a7fe9

After remounting, everything seems to be fine; even xfs_repair (xfsprogs 
2.8.11) doesn't find any problems on the filesystem.

The above error happens on a 2.6.24-gentoo-r8 SMP 64bit kernel with 4 GB 
of memory (~3G free), Intel 5000V chipset (ASUS DSBV-D), on top of an 
Areca ARC-1160 (V1.42) RAID controller.

According to memcheck the memory seems to be OK. I replaced the DIMMs 
anyway, but the problem persisted.

Google returns a lot of results for the above error message, but I 
found only one forum entry [2] that looks similar.


Any help would be highly appreciated

Many thanks!
Chris


[1]http://www.rsnapshot.org/
[2]http://ubuntuforums.org/showthread.php?t=741425

-- 
stepping stone GmbH
Pappelweg 41
CH-3013 Bern

Telefon: +41 31 332 53 63
www.stepping-stone.ch
christian.affolter@stepping-stone.ch


* Re: Corruption of in-memory data detected - on heavy  hard linking
From: Christoph Hellwig @ 2008-07-25  5:20 UTC
  To: Christian Affolter; +Cc: xfs

On Wed, Jul 23, 2008 at 07:40:19PM +0200, Christian Affolter wrote:
> Kernel-Error:
> Filesystem "sdc1": XFS internal error xfs_trans_cancel at line 1163 of  
> file fs/xfs/xfs_trans.c.  Caller 0xffffffff803a4fcf
> Pid: 22816, comm: cp Not tainted 2.6.24-gentoo-r8 #1

2.6.24 is pretty old.  Did you try with a recent kernel?  We had some
fixes for in-core memory corruption although I don't remember one in
this area.


* Re: Corruption of in-memory data detected - on heavy  hard linking
From: Christian Affolter @ 2008-08-04 16:47 UTC
  To: xfs

Hi

> On Wed, Jul 23, 2008 at 07:40:19PM +0200, Christian Affolter wrote:
>> Kernel-Error:
>> Filesystem "sdc1": XFS internal error xfs_trans_cancel at line 1163 of  
>> file fs/xfs/xfs_trans.c.  Caller 0xffffffff803a4fcf
>> Pid: 22816, comm: cp Not tainted 2.6.24-gentoo-r8 #1
> 
> 2.6.24 is pretty old.  Did you try with a recent kernel?  We had some
> fixes for in-core memory corruption although I don't remember one in
> this area.

I finally found the time to update the kernel to a recent 2.6.26 version.

Unfortunately the problem still exists:
Filesystem "dm-3": XFS internal error xfs_trans_cancel at line 1163 of 
file fs/xfs/xfs_trans.c.  Caller 0xffffffff803a6672
Pid: 12584, comm: cp Not tainted 2.6.26-gentoo #1

Call Trace:
  [<ffffffff803a6672>] xfs_create+0x1c2/0x4c0
  [<ffffffff8039fd16>] xfs_trans_cancel+0x126/0x150
  [<ffffffff803a6672>] xfs_create+0x1c2/0x4c0
  [<ffffffff803b186d>] xfs_vn_mknod+0x16d/0x2c0
  [<ffffffff80291b7c>] vfs_create+0xcc/0x130
  [<ffffffff8029539f>] do_filp_open+0x77f/0x860
  [<ffffffff80286d1a>] do_sys_open+0x5a/0xf0
  [<ffffffff8020b49b>] system_call_after_swapgs+0x7b/0x80

xfs_force_shutdown(dm-3,0x8) called from line 1164 of file 
fs/xfs/xfs_trans.c.  Return address = 0xffffffff8039fd2f
Filesystem "dm-3": Corruption of in-memory data detected.  Shutting down 
filesystem: dm-3
Please umount the filesystem, and rectify the problem(s)
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
xfs_force_shutdown(dm-3,0x1) called from line 420 of file 
fs/xfs/xfs_rw.c.  Return address = 0xffffffff803a9529
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
xfs_force_shutdown(dm-3,0x1) called from line 420 of file 
fs/xfs/xfs_rw.c.  Return address = 0xffffffff803a9529
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.
Filesystem "dm-3": xfs_log_force: error 5 returned.


Before the shutdown happens the copy command receives a
"No space left on device" error:
cp: cannot create regular file `[file name snipped]': No space left on device
cp: cannot create regular file `[file name snipped]': Input/output error

This happens although the device has more than 50% free space as well as free inodes.

The affected device was initialized with old xfsprogs (2.8.11):
meta-data=/dev/evms/vol1 isize=256    agcount=3207, agsize=4096 blks
          =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=13132799, imaxpct=25
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=1024, version=1
          =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0


Creating a new device with xfsprogs (2.9.7) leads to the following layout:
meta-data=/dev/sdc1              isize=256    agcount=5, agsize=3662818 blks
          =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=17750000, imaxpct=25
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=7153, version=2
          =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0


On the newly created device, the problem is much harder to reproduce; 
however, it still happens after around a day of heavy copying and 
deleting.


Any further hints?


Many thanks
Chris


* Re: Corruption of in-memory data detected - on heavy  hard linking
From: Dave Chinner @ 2008-08-05  0:19 UTC
  To: Christian Affolter; +Cc: xfs

On Mon, Aug 04, 2008 at 06:47:46PM +0200, Christian Affolter wrote:
> Hi
>
>> On Wed, Jul 23, 2008 at 07:40:19PM +0200, Christian Affolter wrote:
>>> Kernel-Error:
>>> Filesystem "sdc1": XFS internal error xfs_trans_cancel at line 1163 
>>> of  file fs/xfs/xfs_trans.c.  Caller 0xffffffff803a4fcf
>>> Pid: 22816, comm: cp Not tainted 2.6.24-gentoo-r8 #1
>>
>> 2.6.24 is pretty old.  Did you try with a recent kernel?  We had some
>> fixes for in-core memory corruption although I don't remember one in
>> this area.
>
> I finally found the time to update the kernel to a recent 2.6.26 version.
>
> Unfortunately the problem still exists:
> Filesystem "dm-3": XFS internal error xfs_trans_cancel at line 1163 of  
> file fs/xfs/xfs_trans.c.  Caller 0xffffffff803a6672
> Pid: 12584, comm: cp Not tainted 2.6.26-gentoo #1

Ok, what we need is the following. First, try to reproduce the
problem on a small filesystem (say a few GB). Once you've reproduced
the problem, unmount and remount the filesystem to get the log
replayed, then take an xfs_metadump image of the filesystem. Put the
metadump image somewhere that can be downloaded (ftp/web site) and
let us know where it is.
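
The sequence described above could be sketched roughly as follows. This
is only an illustration under stated assumptions: the device, mount
point and output path are placeholders, and the run and capture_metadump
helpers are hypothetical names, not part of xfsprogs. By default the
commands are only printed, not executed.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper. Arguments: device, mount point, dump file.
# All three values used below are placeholders.
capture_metadump() {
    md_dev=$1
    md_mnt=$2
    md_out=$3
    run umount "$md_mnt" &&
    run mount "$md_dev" "$md_mnt" &&   # mounting replays the log
    run umount "$md_mnt" &&            # dump from an unmounted filesystem
    run xfs_metadump "$md_dev" "$md_out"
}

capture_metadump /dev/sdX1 /mnt/test /tmp/fs.metadump
```

Setting DRY_RUN=0 would run the commands for real, which requires root
and a filesystem that has been shut down by the error.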

If this is anything like the previous problem I found and fixed,
then it will be a corner-case bug that is only triggered by a
specific layout of free space and we need the filesystem image
to be able to work out exactly what corner case is broken....

> Before the shutdown happens the copy command receives a
> "No space left on device" error:
> cp: cannot create regular file `[file name snipped]': No space left on device
> cp: cannot create regular file `[file name snipped]': Input/output error
>
> Although the device has more than 50% free space as well as free inodes.

It will be an AG that is out of space, not the entire filesystem.
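
For a sense of scale, the AG sizes can be worked out from the mkfs
geometry quoted earlier in the thread (agsize=4096 blks on the old
filesystem, agsize=3662818 blks on the new one, both with bsize=4096).
The ag_size_mib helper below is a hypothetical name used only for this
arithmetic sketch.

```shell
# Hypothetical helper: size of one allocation group in MiB (rounded down).
# $1 = agsize in filesystem blocks, $2 = block size in bytes
ag_size_mib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

ag_size_mib 4096 4096       # old layout (agcount=3207): prints 16
ag_size_mib 3662818 4096    # new layout (agcount=5): prints 14307
```

With only 16 MiB per AG on the old layout, a single AG could plausibly
fill up long before the filesystem as a whole does. Whether that is what
triggers the shutdown here is not established in this thread.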

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Corruption of in-memory data detected - on heavy  hard linking
From: Dave Chinner @ 2008-08-11 23:52 UTC
  To: Christian Affolter; +Cc: xfs

On Mon, Aug 11, 2008 at 02:26:30PM +0200, Christian Affolter wrote:
> Hi Dave
>
>>>> On Wed, Jul 23, 2008 at 07:40:19PM +0200, Christian Affolter wrote:
>>>>> Kernel-Error:
>>>>> Filesystem "sdc1": XFS internal error xfs_trans_cancel at line 
>>>>> 1163 of  file fs/xfs/xfs_trans.c.  Caller 0xffffffff803a4fcf
>>>>> Pid: 22816, comm: cp Not tainted 2.6.24-gentoo-r8 #1
>>>> 2.6.24 is pretty old.  Did you try with a recent kernel?  We had some
>>>> fixes for in-core memory corruption although I don't remember one in
>>>> this area.
>>> I finally found the time to update the kernel to a recent 2.6.26 version.
>>>
>>> Unfortunately the problem still exists:
>>> Filesystem "dm-3": XFS internal error xfs_trans_cancel at line 1163 
>>> of  file fs/xfs/xfs_trans.c.  Caller 0xffffffff803a6672
>>> Pid: 12584, comm: cp Not tainted 2.6.26-gentoo #1
>>
>> Ok, what we need is the following. First, try to reproduce the
>> problem on a small filesystem (say a few GB). Once you've reproduced
>> the problem, unmount and remount the filesystem to get the log
>> replayed, then take a xfs_metadump image of the filesystem. Put the
>> metadump image somewhere that can be downloaded (ftp/web site) and
>> let us know where it is.
> Please excuse the delay, it took some time to reproduce the issue with  
> newly generated nonsensitive data...

You probably didn't need to do that. From the man page:

	By default, xfs_metadump obfuscates most file (regular file,
	directory and symbolic link) names and extended attribute names to
	allow the dumps to be sent without revealing confidential
	information. Extended attribute values are zeroed and no data is
	copied. The only exceptions are file or attribute names that are 4
	or less characters in length. Also file names that span extents
	(this can only occur with the mkfs.xfs(8) options where -n size > -b
	size) are not obfuscated. Names between 5 and 8 characters in length
	inclusively are partially obfuscated.
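
The length rules in that excerpt can be summarised in a small sketch.
obfuscation_level is a hypothetical helper written for illustration; it
only classifies a name by its length as the man page text describes, and
it ignores the separate exception for names that span extents.

```shell
# Hypothetical helper: classify a file or attribute name according to the
# length rules quoted from the xfs_metadump man page.
obfuscation_level() {
    name_len=${#1}
    if [ "$name_len" -le 4 ]; then
        echo "none"        # 4 or fewer characters: left as-is
    elif [ "$name_len" -le 8 ]; then
        echo "partial"     # 5 to 8 characters: partially obfuscated
    else
        echo "full"        # longer names: obfuscated
    fi
}

obfuscation_level etc
obfuscation_level backup
obfuscation_level customer-invoices.txt
```

Short names such as "etc" would therefore still be readable in a dump,
which may matter when deciding how widely to share it.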

> However, while looking at the metadump (with the help of the strings  
> command), a lot of non-existent file names appear. Non-existent in the  
> sense of not present on this device; they may exist on other devices,  
> but they definitely were never on the dumped device (the device was  
> filled with /dev/zero before creating the xfs filesystem).

That's the obfuscation.

> Therefore I'm a bit scared to place the dump publicly on the internet.  
> Might it be possible to put it somewhere with user/pw protection and  
> hand the credentials to you privately?

Sure.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

