public inbox for linux-xfs@vger.kernel.org
* Corruption of in-memory data detected
@ 2009-01-02  2:46 Thomas Gutzler
  2009-01-02  3:24 ` Eric Sandeen
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gutzler @ 2009-01-02  2:46 UTC (permalink / raw)
  To: xfs

Hi,

I've been running an 8x500G hardware SATA RAID5 on an Adaptec 31605
controller for a while. The operating system is Ubuntu Feisty with the
2.6.22-16-server kernel. Recently, I added a disk. After the array
rebuild completed, I kept getting errors from the xfs module such as
this one:
Dec 30 22:55:39 io kernel: [21844.939832] Filesystem "sda":
xfs_iflush: Bad inode 1610669723 magic number 0xec9d, ptr 0xe523eb00
Dec 30 22:55:39 io kernel: [21844.939879] xfs_force_shutdown(sda,0x8)
called from line 3277 of file
/build/buildd/linux-source-2.6.22-2.6.22/fs/xfs/xfs_inode.c.  Return
address = 0xf8af263c
Dec 30 22:55:39 io kernel: [21844.939885] Filesystem "sda": Corruption
of in-memory data detected.  Shutting down filesystem: sda

My first thought was to run memcheck on the machine, which completed
several passes without error; the RAID controller doesn't report any
SMART failures either.

After an xfs_repair, which fixed a few things, I mounted the file
system, but the error kept reappearing after a few hours unless I
mounted read-only. Since xfs_ncheck -i always exited with 'Out of
memory', I decided to reduce the maximum percentage of space allowed
for inodes to 1% (156237488 inodes) by running xfs_growfs -m 1; the
total number of inodes in use is still below that limit. Unfortunately,
both xfs_check and xfs_ncheck still report 'out of memory' with 2GB
installed.

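For reference, the recovery sequence described above looks roughly like
this (a sketch only; /dev/sda and /mnt are illustrative, and xfs_repair
and xfs_ncheck must be run on an unmounted filesystem):

```shell
# Sketch of the repair/tuning sequence described above.
# Device and mount point names are illustrative; adjust to your setup.
umount /mnt
xfs_repair /dev/sda                  # fix on-disk inconsistencies
xfs_ncheck -i 1610669723 /dev/sda    # try to map the inode from the log
mount /dev/sda /mnt
xfs_growfs -m 1 /mnt                 # lower the inode space cap to 1% (imaxpct=1)
```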
After the modification, the file system survived for a day until the
following happened:
Jan  2 09:33:29 io kernel: [232751.699812] BUG: unable to handle
kernel paging request at virtual address 0003fffb
Jan  2 09:33:29 io kernel: [232751.699848]  printing eip:
Jan  2 09:33:29 io kernel: [232751.699863] c017d872
Jan  2 09:33:29 io kernel: [232751.699865] *pdpt = 000000003711e001
Jan  2 09:33:29 io kernel: [232751.699881] *pde = 0000000000000000
Jan  2 09:33:29 io kernel: [232751.699898] Oops: 0002 [#1]
Jan  2 09:33:29 io kernel: [232751.699913] SMP
Jan  2 09:33:29 io kernel: [232751.699931] Modules linked in: nfs nfsd
exportfs lockd sunrpc xt_tcpudp nf_conntrack_ipv4 xt_state
nf_conntrack nfnetlink iptable_filter ip_tables x_tables ipv6 ext2
mbcache coretemp w83627ehf i2c_isa i2c_core acpi_cpufreq
cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand
freq_table cpufreq_conservative psmouse serio_raw pcspkr shpchp
pci_hotplug evdev intel_agp agpgart xfs sr_mod cdrom pata_jmicron
ata_piix sg sd_mod ata_generic ohci1394 ieee1394 ahci libata e1000
aacraid scsi_mod uhci_hcd ehci_hcd usbcore thermal processor fan fuse
apparmor commoncap
Jan  2 09:33:29 io kernel: [232751.700180] CPU:    1
Jan  2 09:33:29 io kernel: [232751.700181] EIP:
0060:[__slab_free+50/672]    Not tainted VLI
Jan  2 09:33:29 io kernel: [232751.700182] EFLAGS: 00010046
(2.6.22-16-server #1)
Jan  2 09:33:29 io kernel: [232751.700234] EIP is at __slab_free+0x32/0x2a0
Jan  2 09:33:29 io kernel: [232751.700252] eax: 0000ffff   ebx:
ffffffff   ecx: ffffffff   edx: 000014aa
Jan  2 09:33:29 io kernel: [232751.700273] esi: c17fffe0   edi:
e6b8e0c0   ebp: f8ac2c8c   esp: c21dfe44
Jan  2 09:33:29 io kernel: [232751.700293] ds: 007b   es: 007b   fs:
00d8  gs: 0000  ss: 0068
Jan  2 09:33:29 io kernel: [232751.700313] Process kswapd0 (pid: 198,
ti=c21de000 task=c21f39f0 task.ti=c21de000)
Jan  2 09:33:29 io kernel: [232751.700334] Stack: 00000000 00000065
00000000 fffffffe ffffffff c17fffe0 00000287 e6b8e0c0
Jan  2 09:33:29 io kernel: [232751.700378]        00000001 c017e3fe
f8ac2c8c cecb7d20 00000001 df2e2600 f8ac2c8c df2e2600
Jan  2 09:33:29 io kernel: [232751.700422]        f8d7559c e8247900
f8ac5224 df2e2600 f8d7559c e8247900 f8ae1606 00000001
Jan  2 09:33:29 io kernel: [232751.700466] Call Trace:
Jan  2 09:33:29 io kernel: [232751.700499]  [kfree+126/192] kfree+0x7e/0xc0
Jan  2 09:33:29 io kernel: [232751.700519]  [<f8ac2c8c>]
xfs_idestroy_fork+0x2c/0xf0 [xfs]
Jan  2 09:33:29 io kernel: [232751.700561]  [<f8ac2c8c>]
xfs_idestroy_fork+0x2c/0xf0 [xfs]
Jan  2 09:33:29 io kernel: [232751.700601]  [<f8ac5224>]
xfs_idestroy+0x44/0xb0 [xfs]
Jan  2 09:33:29 io kernel: [232751.700640]  [<f8ae1606>]
xfs_finish_reclaim+0x36/0x160 [xfs]
Jan  2 09:33:29 io kernel: [232751.700681]  [<f8af1c47>]
xfs_fs_clear_inode+0x97/0xc0 [xfs]
Jan  2 09:33:29 io kernel: [232751.700721]  [clear_inode+143/320]
clear_inode+0x8f/0x140
Jan  2 09:33:29 io kernel: [232751.700743]  [dispose_list+26/224]
dispose_list+0x1a/0xe0
Jan  2 09:33:29 io kernel: [232751.700765]
[shrink_icache_memory+379/592] shrink_icache_memory+0x17b/0x250
Jan  2 09:33:29 io kernel: [232751.700789]  [shrink_slab+279/368]
shrink_slab+0x117/0x170
Jan  2 09:33:29 io kernel: [232751.700815]  [kswapd+859/1136] kswapd+0x35b/0x470
Jan  2 09:33:29 io kernel: [232751.700842]
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
Jan  2 09:33:29 io kernel: [232751.700867]  [kswapd+0/1136] kswapd+0x0/0x470
Jan  2 09:33:29 io kernel: [232751.700886]  [kthread+66/112] kthread+0x42/0x70
Jan  2 09:33:29 io kernel: [232751.700904]  [kthread+0/112] kthread+0x0/0x70
Jan  2 09:33:29 io kernel: [232751.700923]
[kernel_thread_helper+7/28] kernel_thread_helper+0x7/0x1c
Jan  2 09:33:29 io kernel: [232751.700946]  =======================
Jan  2 09:33:29 io kernel: [232751.700962] Code: 53 89 cb 83 ec 14 8b
6c 24 28 f0 0f ba 2e 00 19 c0 85 c0 74 0a 8b 06 a8 01 74 ef f3 90 eb
f6 f6 06 02 75 48 0f b7 46 0a 8b 56 14 <89> 14 83 0f b7 46 08 89 5e 14
83 e8 01 f6 06 40 66 89 46 08 75
Jan  2 09:33:29 io kernel: [232751.701128] EIP: [__slab_free+50/672]
__slab_free+0x32/0x2a0 SS:ESP 0068:c21dfe44

Any thoughts what this could be or what could be done to fix it?

Cheers,
  Tom

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 13+ messages in thread
* corruption of in-memory data detected
@ 2014-07-01  6:44 Alexandru Cardaniuc
  2014-07-01  7:02 ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Alexandru Cardaniuc @ 2014-07-01  6:44 UTC (permalink / raw)
  To: xfs; +Cc: Alexandru Cardaniuc



Hi All,

I am having an issue with an XFS filesystem shutting down under high
load with very many small files. Basically, I have around 3.5-4 million
files on this filesystem. New files are written to it all the time,
until I reach 9-11 million small files (about 35 KB each on average).

At some point I get the following in dmesg:

[2870477.695512] Filesystem "sda5": XFS internal error xfs_trans_cancel at
line 1138 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff8826bb7d
[2870477.695558]
[2870477.695559] Call Trace:
[2870477.695611]  [<ffffffff88262c28>] :xfs:xfs_trans_cancel+0x5b/0xfe
[2870477.695643]  [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7
[2870477.695673]  [<ffffffff8822f3f8>] :xfs:xfs_attr_get+0xbf/0xd2
[2870477.695707]  [<ffffffff88273326>] :xfs:xfs_vn_mknod+0x1e1/0x3bb
[2870477.695726]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695736]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695764]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
[2870477.695776]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695784]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695791]  [<ffffffff80209f4c>] __d_lookup+0xb0/0xff
[2870477.695803]  [<ffffffff8020cd4a>] _atomic_dec_and_lock+0x39/0x57
[2870477.695814]  [<ffffffff8022d6db>] mntput_no_expire+0x19/0x89
[2870477.695829]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695837]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695861]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
[2870477.695887]  [<ffffffff882680af>] :xfs:xfs_access+0x3d/0x46
[2870477.695899]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695923]  [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152
[2870477.695933]  [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4
[2870477.695953]  [<ffffffff80260295>] tracesys+0x47/0xb6
[2870477.695963]  [<ffffffff802602f9>] tracesys+0xab/0xb6
[2870477.695977]
[2870477.695985] xfs_force_shutdown(sda5,0x8) called from line 1139 of file
fs/xfs/xfs_trans.c.  Return address = 0xffffffff88262c46
[2870477.696452] Filesystem "sda5": Corruption of in-memory data detected.
Shutting down filesystem: sda5
[2870477.696464] Please umount the filesystem, and rectify the problem(s)

# ls -l /store
ls: /store: Input/output error
?--------- 0 root root 0 Jan  1  1970 /store

The filesystem is ~1T in size:
# df -hT /store
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda5      xfs    910G  142G  769G  16% /store


Using CentOS 5.9 with kernel 2.6.18-348.el5xen

The filesystem is in a virtual machine (Xen) and on top of LVM.

The filesystem was created with mkfs.xfs defaults using
xfsprogs-2.9.4-1.el5.centos (the version that ships with CentOS 5.x by
default).

These are the defaults with which the filesystem was created:
# xfs_info /store
meta-data=/dev/sda5              isize=256    agcount=32, agsize=7454720 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=238551040, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
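For what it's worth, a back-of-the-envelope check (a sketch; all
numbers are taken from the xfs_info geometry above) suggests the
imaxpct=25 inode ceiling is nowhere near exhausted at 9-11 million
files:

```python
# Back-of-the-envelope check using the xfs_info geometry above:
# can 9-11 million inodes hit the imaxpct=25 ceiling?
blocks = 238551040   # data blocks (from xfs_info)
bsize = 4096         # bytes per filesystem block
isize = 256          # bytes per inode
imaxpct = 25         # max % of space usable for inodes

max_inode_bytes = blocks * bsize * imaxpct // 100
max_inodes = max_inode_bytes // isize

print(max_inodes)                # ~954 million inodes allowed
print(11_000_000 / max_inodes)   # ~1.2% of the ceiling at 11M files
```

So inode exhaustion alone does not explain the shutdown; the
transaction cancel in the trace points at something else.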


The problem is reproducible, and I don't think it's hardware-related:
it was reproduced on multiple servers of the same type, so I doubt it's
a memory issue or something like that.


Is this a known issue? If so, what's the fix? I went through the
kernel updates for CentOS 5.10 (newer kernel) but didn't see any
XFS-related fixes since CentOS 5.9.

Any help will be greatly appreciated...

-- 
Sincerely yours,
Alexandru Cardaniuc


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-07-01 21:43 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-02  2:46 Corruption of in-memory data detected Thomas Gutzler
2009-01-02  3:24 ` Eric Sandeen
2009-03-11  2:44   ` Thomas Gutzler
2009-03-11  4:30     ` Eric Sandeen
2009-03-11 10:42       ` Thomas Gutzler
2009-03-12  2:23         ` Eric Sandeen
2009-03-12  5:06           ` Thomas Gutzler
  -- strict thread matches above, loose matches on Subject: below --
2014-07-01  6:44 corruption " Alexandru Cardaniuc
2014-07-01  7:02 ` Dave Chinner
2014-07-01  8:29   ` Alexandru Cardaniuc
2014-07-01  9:38     ` Dave Chinner
2014-07-01 20:13       ` Alexandru Cardaniuc
2014-07-01 21:43         ` Dave Chinner
