public inbox for linux-xfs@vger.kernel.org
* Corruption of in-memory data detected
@ 2009-01-02  2:46 Thomas Gutzler
  2009-01-02  3:24 ` Eric Sandeen
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gutzler @ 2009-01-02  2:46 UTC (permalink / raw)
  To: xfs

Hi,

I've been running an 8x500G hardware SATA RAID5 on an adaptec 31605
controller for a while. The operating system is ubuntu feisty with the
2.6.22-16-server kernel. Recently, I added a disk. After the array
rebuild was completed, I kept getting errors from the xfs module such
as this one:
Dec 30 22:55:39 io kernel: [21844.939832] Filesystem "sda":
xfs_iflush: Bad inode 1610669723 magic number 0xec9d, ptr 0xe523eb00
Dec 30 22:55:39 io kernel: [21844.939879] xfs_force_shutdown(sda,0x8)
called from line 3277 of file
/build/buildd/linux-source-2.6.22-2.6.22/fs/xfs/xfs_inode.c.  Return
address = 0xf8af263c
Dec 30 22:55:39 io kernel: [21844.939885] Filesystem "sda": Corruption
of in-memory data detected.  Shutting down filesystem: sda

My first thought was to run memcheck on the machine, which completed
several passes without error; the raid controller doesn't report any
SMART failures either.

After an xfs_repair, which fixed a few things, I mounted the file
system again, but the error kept reappearing after a few hours unless I
mounted read-only. Since xfs_ncheck -i always exited with 'Out of
memory', I decided to reduce the maximum number of inodes to 1%
(156237488) by running 'xfs_growfs -m 1' - the total number of inodes
in use is still well below 1%. Unfortunately, both xfs_check and
xfs_ncheck still say 'out of memory' with 2GB installed.
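As a side note, that 1% figure is internally consistent: backing the filesystem size out of the reported cap, assuming XFS's default 256-byte inode size (an assumption, not stated in the thread), gives roughly 4 TB, which matches nine 500 GB disks in RAID5:

```shell
# Back out the filesystem size implied by the reported inode cap.
# Assumes the default 256-byte inode size (not stated in the thread).
max_inodes=156237488
inode_size=256
imaxpct=1
fs_bytes=$((max_inodes * inode_size * 100 / imaxpct))
echo "$fs_bytes"   # roughly 4.0e12 bytes, i.e. about 4 TB
```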
After the modification, the file system survived for a day until the
following happened:
Jan  2 09:33:29 io kernel: [232751.699812] BUG: unable to handle
kernel paging request at virtual address 0003fffb
Jan  2 09:33:29 io kernel: [232751.699848]  printing eip:
Jan  2 09:33:29 io kernel: [232751.699863] c017d872
Jan  2 09:33:29 io kernel: [232751.699865] *pdpt = 000000003711e001
Jan  2 09:33:29 io kernel: [232751.699881] *pde = 0000000000000000
Jan  2 09:33:29 io kernel: [232751.699898] Oops: 0002 [#1]
Jan  2 09:33:29 io kernel: [232751.699913] SMP
Jan  2 09:33:29 io kernel: [232751.699931] Modules linked in: nfs nfsd
exportfs lockd sunrpc xt_tcpudp nf_conntrack_ipv4 xt_state
nf_conntrack nfnetlink iptable_filter ip_tables x_tables ipv6 ext2
mbcache coretemp w83627ehf i2c_isa i2c_core acpi_cpufreq
cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand
freq_table cpufreq_conservative psmouse serio_raw pcspkr shpchp
pci_hotplug evdev intel_agp agpgart xfs sr_mod cdrom pata_jmicron
ata_piix sg sd_mod ata_generic ohci1394 ieee1394 ahci libata e1000
aacraid scsi_mod uhci_hcd ehci_hcd usbcore thermal processor fan fuse
apparmor commoncap
Jan  2 09:33:29 io kernel: [232751.700180] CPU:    1
Jan  2 09:33:29 io kernel: [232751.700181] EIP:
0060:[__slab_free+50/672]    Not tainted VLI
Jan  2 09:33:29 io kernel: [232751.700182] EFLAGS: 00010046
(2.6.22-16-server #1)
Jan  2 09:33:29 io kernel: [232751.700234] EIP is at __slab_free+0x32/0x2a0
Jan  2 09:33:29 io kernel: [232751.700252] eax: 0000ffff   ebx:
ffffffff   ecx: ffffffff   edx: 000014aa
Jan  2 09:33:29 io kernel: [232751.700273] esi: c17fffe0   edi:
e6b8e0c0   ebp: f8ac2c8c   esp: c21dfe44
Jan  2 09:33:29 io kernel: [232751.700293] ds: 007b   es: 007b   fs:
00d8  gs: 0000  ss: 0068
Jan  2 09:33:29 io kernel: [232751.700313] Process kswapd0 (pid: 198,
ti=c21de000 task=c21f39f0 task.ti=c21de000)
Jan  2 09:33:29 io kernel: [232751.700334] Stack: 00000000 00000065
00000000 fffffffe ffffffff c17fffe0 00000287 e6b8e0c0
Jan  2 09:33:29 io kernel: [232751.700378]        00000001 c017e3fe
f8ac2c8c cecb7d20 00000001 df2e2600 f8ac2c8c df2e2600
Jan  2 09:33:29 io kernel: [232751.700422]        f8d7559c e8247900
f8ac5224 df2e2600 f8d7559c e8247900 f8ae1606 00000001
Jan  2 09:33:29 io kernel: [232751.700466] Call Trace:
Jan  2 09:33:29 io kernel: [232751.700499]  [kfree+126/192] kfree+0x7e/0xc0
Jan  2 09:33:29 io kernel: [232751.700519]  [<f8ac2c8c>]
xfs_idestroy_fork+0x2c/0xf0 [xfs]
Jan  2 09:33:29 io kernel: [232751.700561]  [<f8ac2c8c>]
xfs_idestroy_fork+0x2c/0xf0 [xfs]
Jan  2 09:33:29 io kernel: [232751.700601]  [<f8ac5224>]
xfs_idestroy+0x44/0xb0 [xfs]
Jan  2 09:33:29 io kernel: [232751.700640]  [<f8ae1606>]
xfs_finish_reclaim+0x36/0x160 [xfs]
Jan  2 09:33:29 io kernel: [232751.700681]  [<f8af1c47>]
xfs_fs_clear_inode+0x97/0xc0 [xfs]
Jan  2 09:33:29 io kernel: [232751.700721]  [clear_inode+143/320]
clear_inode+0x8f/0x140
Jan  2 09:33:29 io kernel: [232751.700743]  [dispose_list+26/224]
dispose_list+0x1a/0xe0
Jan  2 09:33:29 io kernel: [232751.700765]
[shrink_icache_memory+379/592] shrink_icache_memory+0x17b/0x250
Jan  2 09:33:29 io kernel: [232751.700789]  [shrink_slab+279/368]
shrink_slab+0x117/0x170
Jan  2 09:33:29 io kernel: [232751.700815]  [kswapd+859/1136] kswapd+0x35b/0x470
Jan  2 09:33:29 io kernel: [232751.700842]
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
Jan  2 09:33:29 io kernel: [232751.700867]  [kswapd+0/1136] kswapd+0x0/0x470
Jan  2 09:33:29 io kernel: [232751.700886]  [kthread+66/112] kthread+0x42/0x70
Jan  2 09:33:29 io kernel: [232751.700904]  [kthread+0/112] kthread+0x0/0x70
Jan  2 09:33:29 io kernel: [232751.700923]
[kernel_thread_helper+7/28] kernel_thread_helper+0x7/0x1c
Jan  2 09:33:29 io kernel: [232751.700946]  =======================
Jan  2 09:33:29 io kernel: [232751.700962] Code: 53 89 cb 83 ec 14 8b
6c 24 28 f0 0f ba 2e 00 19 c0 85 c0 74 0a 8b 06 a8 01 74 ef f3 90 eb
f6 f6 06 02 75 48 0f b7 46 0a 8b 56 14 <89> 14 83 0f b7 46 08 89 5e 14
83 e8 01 f6 06 40 66 89 46 08 75
Jan  2 09:33:29 io kernel: [232751.701128] EIP: [__slab_free+50/672]
__slab_free+0x32/0x2a0 SS:ESP 0068:c21dfe44

Any thoughts what this could be or what could be done to fix it?

Cheers,
  Tom

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Corruption of in-memory data detected
  2009-01-02  2:46 Corruption of in-memory data detected Thomas Gutzler
@ 2009-01-02  3:24 ` Eric Sandeen
  2009-03-11  2:44   ` Thomas Gutzler
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Sandeen @ 2009-01-02  3:24 UTC (permalink / raw)
  To: Thomas Gutzler; +Cc: xfs

Thomas Gutzler wrote:
> Hi,
> 
> I've been running an 8x500G hardware SATA RAID5 on an adaptec 31605
> controller for a while. The operating system is ubuntu feisty with the
> 2.6.22-16-server kernel. Recently, I added a disk. After the array
> rebuild was completed, I kept getting errors from the xfs module such
> as this one:
> Dec 30 22:55:39 io kernel: [21844.939832] Filesystem "sda":
> xfs_iflush: Bad inode 1610669723 magic number 0xec9d, ptr 0xe523eb00
> Dec 30 22:55:39 io kernel: [21844.939879] xfs_force_shutdown(sda,0x8)
> called from line 3277 of file
> /build/buildd/linux-source-2.6.22-2.6.22/fs/xfs/xfs_inode.c.  Return
> address = 0xf8af263c
> Dec 30 22:55:39 io kernel: [21844.939885] Filesystem "sda": Corruption
> of in-memory data detected.  Shutting down filesystem: sda
> 
> My first thought was to run memcheck on the machine, which completed
> several passes without error; the raid controller doesn't report any
> SMART failures either.

Both good ideas, but note that "Corruption of in-memory data detected"
doesn't necessarily mean bad memory (though it might, so memcheck was
prudent).  0xec9d is not the correct magic nr. for an on-disk inode, so
that's why things went south.  Were there no storage related errors
prior to this?
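For context, the magic XFS expects in every on-disk inode is XFS_DINODE_MAGIC, 0x494e, which is simply the ASCII bytes "IN"; 0xec9d is not any XFS magic, which is why xfs_iflush tripped and shut the filesystem down. A quick illustration of the expected value:

```shell
# The magic stamped at the start of each on-disk XFS inode is
# XFS_DINODE_MAGIC = 0x494e, i.e. the ASCII bytes "IN".
# The oops above reported 0xec9d instead.
printf '\111\116\n'    # 0x49 0x4e written in octal -> prints "IN"
```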

> After an xfs_repair, which fixed a few things, 

Knowing which things were fixed might lend some clues ...

> I mounted the file
> system but the error kept reappearing after a few hours unless I
> mounted read-only. Since xfs_ncheck -i always exited with 'Out of
> memory'

xfs_check takes a ton of memory; xfs_repair much less so

> I decided to reduce the max amount of inodes to 1% (156237488)
> by running xfs_growfs -m 1 - the total amount of inodes used is still
> less than 1%. Unfortunately, both xfs_check and xfs_ncheck still say
> 'out of memory' with 2GB installed.

the max inodes really have no bearing on check or repair memory usage;
it's just an upper limit on how many inodes *could* be created.

> After the modification, the file system survived for a day until the
> following happened:
> Jan  2 09:33:29 io kernel: [232751.699812] BUG: unable to handle
> kernel paging request at virtual address 0003fffb
> Jan  2 09:33:29 io kernel: [232751.699848]  printing eip:
> Jan  2 09:33:29 io kernel: [232751.699863] c017d872
> Jan  2 09:33:29 io kernel: [232751.699865] *pdpt = 000000003711e001
> Jan  2 09:33:29 io kernel: [232751.699881] *pde = 0000000000000000
> Jan  2 09:33:29 io kernel: [232751.699898] Oops: 0002 [#1]
> Jan  2 09:33:29 io kernel: [232751.699913] SMP
> Jan  2 09:33:29 io kernel: [232751.699931] Modules linked in: nfs nfsd
> exportfs lockd sunrpc xt_tcpudp nf_conntrack_ipv4 xt_state
> nf_conntrack nfnetlink iptable_filter ip_tables x_tables ipv6 ext2
> mbcache coretemp w83627ehf i2c_isa i2c_core acpi_cpufreq
> cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand
> freq_table cpufreq_conservative psmouse serio_raw pcspkr shpchp
> pci_hotplug evdev intel_agp agpgart xfs sr_mod cdrom pata_jmicron
> ata_piix sg sd_mod ata_generic ohci1394 ieee1394 ahci libata e1000
> aacraid scsi_mod uhci_hcd ehci_hcd usbcore thermal processor fan fuse
> apparmor commoncap
> Jan  2 09:33:29 io kernel: [232751.700180] CPU:    1
> Jan  2 09:33:29 io kernel: [232751.700181] EIP:
> 0060:[__slab_free+50/672]    Not tainted VLI
> Jan  2 09:33:29 io kernel: [232751.700182] EFLAGS: 00010046
> (2.6.22-16-server #1)
> Jan  2 09:33:29 io kernel: [232751.700234] EIP is at __slab_free+0x32/0x2a0

Memory corruption perhaps?

> Jan  2 09:33:29 io kernel: [232751.700252] eax: 0000ffff   ebx:
> ffffffff   ecx: ffffffff   edx: 000014aa
> Jan  2 09:33:29 io kernel: [232751.700273] esi: c17fffe0   edi:
> e6b8e0c0   ebp: f8ac2c8c   esp: c21dfe44
> Jan  2 09:33:29 io kernel: [232751.700293] ds: 007b   es: 007b   fs:
> 00d8  gs: 0000  ss: 0068
> Jan  2 09:33:29 io kernel: [232751.700313] Process kswapd0 (pid: 198,
> ti=c21de000 task=c21f39f0 task.ti=c21de000)
> Jan  2 09:33:29 io kernel: [232751.700334] Stack: 00000000 00000065
> 00000000 fffffffe ffffffff c17fffe0 00000287 e6b8e0c0
> Jan  2 09:33:29 io kernel: [232751.700378]        00000001 c017e3fe
> f8ac2c8c cecb7d20 00000001 df2e2600 f8ac2c8c df2e2600
> Jan  2 09:33:29 io kernel: [232751.700422]        f8d7559c e8247900
> f8ac5224 df2e2600 f8d7559c e8247900 f8ae1606 00000001
> Jan  2 09:33:29 io kernel: [232751.700466] Call Trace:
> Jan  2 09:33:29 io kernel: [232751.700499]  [kfree+126/192] kfree+0x7e/0xc0
> Jan  2 09:33:29 io kernel: [232751.700519]  [<f8ac2c8c>]
> xfs_idestroy_fork+0x2c/0xf0 [xfs]
> Jan  2 09:33:29 io kernel: [232751.700561]  [<f8ac2c8c>]
> xfs_idestroy_fork+0x2c/0xf0 [xfs]
> Jan  2 09:33:29 io kernel: [232751.700601]  [<f8ac5224>]
> xfs_idestroy+0x44/0xb0 [xfs]
> Jan  2 09:33:29 io kernel: [232751.700640]  [<f8ae1606>]
> xfs_finish_reclaim+0x36/0x160 [xfs]
> Jan  2 09:33:29 io kernel: [232751.700681]  [<f8af1c47>]
> xfs_fs_clear_inode+0x97/0xc0 [xfs]
> Jan  2 09:33:29 io kernel: [232751.700721]  [clear_inode+143/320]
> clear_inode+0x8f/0x140
> Jan  2 09:33:29 io kernel: [232751.700743]  [dispose_list+26/224]
> dispose_list+0x1a/0xe0
> Jan  2 09:33:29 io kernel: [232751.700765]
> [shrink_icache_memory+379/592] shrink_icache_memory+0x17b/0x250
> Jan  2 09:33:29 io kernel: [232751.700789]  [shrink_slab+279/368]
> shrink_slab+0x117/0x170
> Jan  2 09:33:29 io kernel: [232751.700815]  [kswapd+859/1136] kswapd+0x35b/0x470
> Jan  2 09:33:29 io kernel: [232751.700842]
> [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
> Jan  2 09:33:29 io kernel: [232751.700867]  [kswapd+0/1136] kswapd+0x0/0x470
> Jan  2 09:33:29 io kernel: [232751.700886]  [kthread+66/112] kthread+0x42/0x70
> Jan  2 09:33:29 io kernel: [232751.700904]  [kthread+0/112] kthread+0x0/0x70
> Jan  2 09:33:29 io kernel: [232751.700923]
> [kernel_thread_helper+7/28] kernel_thread_helper+0x7/0x1c
> Jan  2 09:33:29 io kernel: [232751.700946]  =======================
> Jan  2 09:33:29 io kernel: [232751.700962] Code: 53 89 cb 83 ec 14 8b
> 6c 24 28 f0 0f ba 2e 00 19 c0 85 c0 74 0a 8b 06 a8 01 74 ef f3 90 eb
> f6 f6 06 02 75 48 0f b7 46 0a 8b 56 14 <89> 14 83 0f b7 46 08 89 5e 14
> 83 e8 01 f6 06 40 66 89 46 08 75
> Jan  2 09:33:29 io kernel: [232751.701128] EIP: [__slab_free+50/672]
> __slab_free+0x32/0x2a0 SS:ESP 0068:c21dfe44
> 
> Any thoughts what this could be or what could be done to fix it?

seems like maybe something went wrong w/ the raid rebuild, if that's
when things started going south.  Do you get any storage error related
messages at all?

Ubuntu knows best what's in this oldish distro kernel, I guess; I don't
know offhand what might be going wrong.  If they have a debug kernel
variant, you could run that to see if you get earlier indications of
problems.

If you can reproduce on a more recent upstream kernel, that would be
interesting.

-Eric

> Cheers,
>   Tom



* Re: Corruption of in-memory data detected
  2009-01-02  3:24 ` Eric Sandeen
@ 2009-03-11  2:44   ` Thomas Gutzler
  2009-03-11  4:30     ` Eric Sandeen
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gutzler @ 2009-03-11  2:44 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Hi,

a while ago I was having problems with the xfs module on ubuntu
feisty. I have since upgraded to 8.10 intrepid and am still getting the
occasional bug. The good news is that the system now keeps running
after the bug occurs, although load slowly increases as random
processes affected by it turn into zombies.

On Fri, Jan 2, 2009 at 12:24, Eric Sandeen <sandeen@sandeen.net> wrote:
> Thomas Gutzler wrote:
>> Hi,
>>
>> I've been running an 8x500G hardware SATA RAID5 on an adaptec 31605
>> controller for a while. The operating system is ubuntu feisty with the
>> 2.6.22-16-server kernel. Recently, I added a disk. After the array
>> rebuild was completed, I kept getting errors from the xfs module such
>> as this one:
>> Dec 30 22:55:39 io kernel: [21844.939832] Filesystem "sda":
>> xfs_iflush: Bad inode 1610669723 magic number 0xec9d, ptr 0xe523eb00
>> Dec 30 22:55:39 io kernel: [21844.939879] xfs_force_shutdown(sda,0x8)
>> called from line 3277 of file
>> /build/buildd/linux-source-2.6.22-2.6.22/fs/xfs/xfs_inode.c.  Return
>> address = 0xf8af263c
>> Dec 30 22:55:39 io kernel: [21844.939885] Filesystem "sda": Corruption
>> of in-memory data detected.  Shutting down filesystem: sda
>>
>> My first thought was to run memcheck on the machine, which completed
>> several passes without error; the raid controller doesn't report any
>> SMART failures either.
>
> Both good ideas, but note that "Corruption of in-memory data detected"
> doesn't necessarily mean bad memory (though it might, so memcheck was
> prudent).  0xec9d is not the correct magic nr. for an on-disk inode, so
> that's why things went south.  Were there no storage related errors
> prior to this?
>
>> After an xfs_repair, which fixed a few things,
>
> Knowing which things were fixed might lend some clues ...
>
>> I mounted the file
>> system but the error kept reappearing after a few hours unless I
>> mounted read-only. Since xfs_ncheck -i always exited with 'Out of
>> memory'
>
> xfs_check takes a ton of memory; xfs_repair much less so
>
>> I decided to reduce the max amount of inodes to 1% (156237488)
>> by running xfs_growfs -m 1 - the total amount of inodes used is still
>> less than 1%. Unfortunately, both xfs_check and xfs_ncheck still say
>> 'out of memory' with 2GB installed.
>
> the max inodes really have no bearing on check or repair memory usage;
> it's just an upper limit on how many inodes *could* be created.
>
>> After the modification, the file system survived for a day until the
>> following happened:
>> Jan  2 09:33:29 io kernel: [232751.699812] BUG: unable to handle
>> kernel paging request at virtual address 0003fffb
>> Jan  2 09:33:29 io kernel: [232751.699848]  printing eip:
>> Jan  2 09:33:29 io kernel: [232751.699863] c017d872
>> Jan  2 09:33:29 io kernel: [232751.699865] *pdpt = 000000003711e001
>> Jan  2 09:33:29 io kernel: [232751.699881] *pde = 0000000000000000
>> Jan  2 09:33:29 io kernel: [232751.699898] Oops: 0002 [#1]
>> Jan  2 09:33:29 io kernel: [232751.699913] SMP
>> Jan  2 09:33:29 io kernel: [232751.699931] Modules linked in: nfs nfsd
>> exportfs lockd sunrpc xt_tcpudp nf_conntrack_ipv4 xt_state
>> nf_conntrack nfnetlink iptable_filter ip_tables x_tables ipv6 ext2
>> mbcache coretemp w83627ehf i2c_isa i2c_core acpi_cpufreq
>> cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand
>> freq_table cpufreq_conservative psmouse serio_raw pcspkr shpchp
>> pci_hotplug evdev intel_agp agpgart xfs sr_mod cdrom pata_jmicron
>> ata_piix sg sd_mod ata_generic ohci1394 ieee1394 ahci libata e1000
>> aacraid scsi_mod uhci_hcd ehci_hcd usbcore thermal processor fan fuse
>> apparmor commoncap
>> Jan  2 09:33:29 io kernel: [232751.700180] CPU:    1
>> Jan  2 09:33:29 io kernel: [232751.700181] EIP:
>> 0060:[__slab_free+50/672]    Not tainted VLI
>> Jan  2 09:33:29 io kernel: [232751.700182] EFLAGS: 00010046
>> (2.6.22-16-server #1)
>> Jan  2 09:33:29 io kernel: [232751.700234] EIP is at __slab_free+0x32/0x2a0
>
> Memory corruption perhaps?
>
>> Jan  2 09:33:29 io kernel: [232751.700252] eax: 0000ffff   ebx:
>> ffffffff   ecx: ffffffff   edx: 000014aa
>> Jan  2 09:33:29 io kernel: [232751.700273] esi: c17fffe0   edi:
>> e6b8e0c0   ebp: f8ac2c8c   esp: c21dfe44
>> Jan  2 09:33:29 io kernel: [232751.700293] ds: 007b   es: 007b   fs:
>> 00d8  gs: 0000  ss: 0068
>> Jan  2 09:33:29 io kernel: [232751.700313] Process kswapd0 (pid: 198,
>> ti=c21de000 task=c21f39f0 task.ti=c21de000)
>> Jan  2 09:33:29 io kernel: [232751.700334] Stack: 00000000 00000065
>> 00000000 fffffffe ffffffff c17fffe0 00000287 e6b8e0c0
>> Jan  2 09:33:29 io kernel: [232751.700378]        00000001 c017e3fe
>> f8ac2c8c cecb7d20 00000001 df2e2600 f8ac2c8c df2e2600
>> Jan  2 09:33:29 io kernel: [232751.700422]        f8d7559c e8247900
>> f8ac5224 df2e2600 f8d7559c e8247900 f8ae1606 00000001
>> Jan  2 09:33:29 io kernel: [232751.700466] Call Trace:
>> Jan  2 09:33:29 io kernel: [232751.700499]  [kfree+126/192] kfree+0x7e/0xc0
>> Jan  2 09:33:29 io kernel: [232751.700519]  [<f8ac2c8c>]
>> xfs_idestroy_fork+0x2c/0xf0 [xfs]
>> Jan  2 09:33:29 io kernel: [232751.700561]  [<f8ac2c8c>]
>> xfs_idestroy_fork+0x2c/0xf0 [xfs]
>> Jan  2 09:33:29 io kernel: [232751.700601]  [<f8ac5224>]
>> xfs_idestroy+0x44/0xb0 [xfs]
>> Jan  2 09:33:29 io kernel: [232751.700640]  [<f8ae1606>]
>> xfs_finish_reclaim+0x36/0x160 [xfs]
>> Jan  2 09:33:29 io kernel: [232751.700681]  [<f8af1c47>]
>> xfs_fs_clear_inode+0x97/0xc0 [xfs]
>> Jan  2 09:33:29 io kernel: [232751.700721]  [clear_inode+143/320]
>> clear_inode+0x8f/0x140
>> Jan  2 09:33:29 io kernel: [232751.700743]  [dispose_list+26/224]
>> dispose_list+0x1a/0xe0
>> Jan  2 09:33:29 io kernel: [232751.700765]
>> [shrink_icache_memory+379/592] shrink_icache_memory+0x17b/0x250
>> Jan  2 09:33:29 io kernel: [232751.700789]  [shrink_slab+279/368]
>> shrink_slab+0x117/0x170
>> Jan  2 09:33:29 io kernel: [232751.700815]  [kswapd+859/1136] kswapd+0x35b/0x470
>> Jan  2 09:33:29 io kernel: [232751.700842]
>> [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
>> Jan  2 09:33:29 io kernel: [232751.700867]  [kswapd+0/1136] kswapd+0x0/0x470
>> Jan  2 09:33:29 io kernel: [232751.700886]  [kthread+66/112] kthread+0x42/0x70
>> Jan  2 09:33:29 io kernel: [232751.700904]  [kthread+0/112] kthread+0x0/0x70
>> Jan  2 09:33:29 io kernel: [232751.700923]
>> [kernel_thread_helper+7/28] kernel_thread_helper+0x7/0x1c
>> Jan  2 09:33:29 io kernel: [232751.700946]  =======================
>> Jan  2 09:33:29 io kernel: [232751.700962] Code: 53 89 cb 83 ec 14 8b
>> 6c 24 28 f0 0f ba 2e 00 19 c0 85 c0 74 0a 8b 06 a8 01 74 ef f3 90 eb
>> f6 f6 06 02 75 48 0f b7 46 0a 8b 56 14 <89> 14 83 0f b7 46 08 89 5e 14
>> 83 e8 01 f6 06 40 66 89 46 08 75
>> Jan  2 09:33:29 io kernel: [232751.701128] EIP: [__slab_free+50/672]
>> __slab_free+0x32/0x2a0 SS:ESP 0068:c21dfe44
>>
>> Any thoughts what this could be or what could be done to fix it?
>
> seems like maybe something went wrong w/ the raid rebuild, if that's
> when things started going south.  Do you get any storage error related
> messages at all?

I couldn't find any errors other than the dump in dmesg (see below).
I also called Adaptec, and they said they have never had a memory
failure in their raid cards.

> Ubuntu knows best what's in this oldish distro kernel, I guess; I don't
> know offhand what might be going wrong.  If they have a debug kernel
> variant, you could run that to see if you get earlier indications of
> problems.
>
> If you can reproduce on a more recent upstream kernel, that would be
> interesting.

Here's the dmesg output:
[1369713.678092] BUG: unable to handle kernel paging request at ffff87f947da5088
[1369713.682882] IP: [<ffffffffa01db24f>]
xfs_dir2_block_lookup_int+0xcf/0x210 [xfs]
[1369713.687802] PGD 0
[1369713.688055] Oops: 0000 [1] SMP
[1369713.688055] CPU 1
[1369713.688055] Modules linked in: nls_cp437 cifs nfsd auth_rpcgss
exportfs wmi video output sbs sbshc pci
_slot container battery ac xt_tcpudp nf_conntrack_ipv4 xt_state
nf_conntrack nfs lockd nfs_acl sunrpc ipv6
iptable_filter ip_tables x_tables ext3 jbd mbcache cpufreq_userspace
cpufreq_stats cpufreq_powersave cpufre
q_ondemand cpufreq_conservative acpi_cpufreq freq_table sbp2
parport_pc lp parport loop evdev pcspkr iTCO_w
dt iTCO_vendor_support button shpchp pci_hotplug intel_agp xfs
pata_jmicron sd_mod crc_t10dif sg pata_acpi
ata_piix ohci1394 ieee1394 aacraid ata_generic ahci libata scsi_mod
e1000e dock uhci_hcd ehci_hcd usbcore t
hermal processor fan fuse vesafb fbcon tileblit font bitblit softcursor
[1369713.688055] Pid: 5278, comm: smbd Not tainted 2.6.27-11-server #1
[1369713.688055] RIP: 0010:[<ffffffffa01db24f>]  [<ffffffffa01db24f>]
xfs_dir2_block_lookup_int+0xcf/0x210
[xfs]
[1369713.688055] RSP: 0018:ffff88007120ba28  EFLAGS: 00010286
[1369713.688055] RAX: 00000005a072ded8 RBX: 00000000da072ded RCX:
0000000000000000
[1369713.688055] RDX: 00000000b40e5bda RSI: ffff88007a474c48 RDI:
ffffffffda072ded
[1369713.688055] RBP: ffff88007120ba98 R08: ffff87fa77a0e120 R09:
00000000ffffffff
[1369713.688055] R10: 00000000db5b0eb4 R11: ffff88001813bff8 R12:
ffff88007120bae8
[1369713.688055] R13: 000000000000002a R14: 0000000000000000 R15:
ffff88007120bb74
[1369713.688055] FS:  00007f79bf44e700(0000) GS:ffff88007f802880(0000)
knlGS:0000000000000000
[1369713.688055] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[1369713.688055] CR2: ffff87f947da5088 CR3: 0000000053e88000 CR4:
00000000000006e0
[1369713.688055] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[1369713.688055] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[1369713.688055] Process smbd (pid: 5278, threadinfo ffff88007120a000,
task ffff88000350ace0)
[1369713.688055] Stack:  ffff88007120bd68 ffffffff802fa04c
ffff88007120bab4 ffff88007120baa8
[1369713.688055]  ffff88001813b000 ffff88007acb3000 0000000000000000
ffffffffa01d0289
[1369713.688055]  ffff88007a474c48 ffff880035a2c8c0 ffff88007120bae8
ffff88007120bae8
[1369713.688055] Call Trace:
[1369713.688055]  [<ffffffff802fa04c>] ? do_select+0x5bc/0x610
[1369713.688055]  [<ffffffffa01d0289>] ? xfs_bmbt_get_blockcount+0x9/0x20 [xfs]
[1369713.688055]  [<ffffffffa01db470>] xfs_dir2_block_lookup+0x20/0xc0 [xfs]
[1369713.688055]  [<ffffffffa01da7e5>] xfs_dir_lookup+0x195/0x1c0 [xfs]
[1369713.688055]  [<ffffffffa020a6cb>] xfs_lookup+0x7b/0xe0 [xfs]
[1369713.688055]  [<ffffffff802ff9f6>] ? __d_lookup+0x16/0x150
[1369713.688055]  [<ffffffffa02159a1>] xfs_vn_lookup+0x51/0x90 [xfs]
[1369713.688055]  [<ffffffff802f495e>] real_lookup+0xee/0x170
[1369713.688055]  [<ffffffff802f4a90>] do_lookup+0xb0/0x110
[1369713.688055]  [<ffffffff802f50fd>] __link_path_walk+0x60d/0xc20
[1369713.688055]  [<ffffffff80305ef6>] ? mntput_no_expire+0x36/0x160
[1369713.688055]  [<ffffffff802f5c4e>] path_walk+0x6e/0xe0
[1369713.688055]  [<ffffffff802f5e63>] do_path_lookup+0xe3/0x200
[1369713.688055]  [<ffffffff802f386a>] ? getname+0x4a/0xb0
[1369713.688055]  [<ffffffff802f6cbb>] user_path_at+0x7b/0xb0
[1369713.688055]  [<ffffffff802eda68>] ? cp_new_stat+0xe8/0x100
[1369713.688055]  [<ffffffff802ede7d>] vfs_stat_fd+0x2d/0x60
[1369713.688055]  [<ffffffff802edf5c>] sys_newstat+0x2c/0x50
[1369713.688055]  [<ffffffff8021285a>] system_call_fastpath+0x16/0x1b
[1369713.688055]
[1369713.688055]
[1369713.688055] Code: f8 31 c9 45 8b 13 4d 89 d8 44 89 d2 0f ca 89 d0
83 ea 01 48 c1 e0 03 49 29 c0 eb 07 8d 4b 01 39 ca 7c 1c 8d 1c 11 d1
fb 48 63 fb <41> 8b 04 f8 0f c8 41 39 c5 74 26 77 e4 8d 53 ff 39 ca 7d
e4 48
[1369713.688055] RIP  [<ffffffffa01db24f>]
xfs_dir2_block_lookup_int+0xcf/0x210 [xfs]
[1369713.688055]  RSP <ffff88007120ba28>
[1369713.688055] CR2: ffff87f947da5088
[1369714.057232] ---[ end trace 0735c8702d5e7899 ]---

What can I do to help get this fixed?

Tom



* Re: Corruption of in-memory data detected
  2009-03-11  2:44   ` Thomas Gutzler
@ 2009-03-11  4:30     ` Eric Sandeen
  2009-03-11 10:42       ` Thomas Gutzler
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Sandeen @ 2009-03-11  4:30 UTC (permalink / raw)
  To: Thomas Gutzler; +Cc: xfs

Thomas Gutzler wrote:
> Hi,
> 
> a while ago I was having problems with the xfs module on ubuntu
> feisty. I have then upgraded to 8.10 intrepid and am still getting the
> occasional bug. 

although from the looks of it, a different one.  I'm curious: if you
log these bugs with Ubuntu, do they look into them?  It'd be nice to
at least have initial triage to know, for example, where in the
function it blew up.

> Good thing is that now the system keeps running after
> the bug occurs with load slowly increasing as random processes are
> affected by this turn into zombies.

...

>>> Jan  2 09:33:29 io kernel: [232751.701128] EIP: [__slab_free+50/672]
>>> __slab_free+0x32/0x2a0 SS:ESP 0068:c21dfe44

... last error was slab corruption perhaps

>>> Any thoughts what this could be or what could be done to fix it?
>> seems like maybe something went wrong w/ the raid rebuild, if that's
>> when things started going south.  Do you get any storage error related
>> messages at all?
> 
> I couldn't find any errors other than the dump in dmesg (see below).
> I also called adaptec and they said they never had memory failure in
> they raid cards.
> 
>> Ubuntu knows best what's in this oldish distro kernel, I guess; I don't
>> know offhand what might be going wrong.  If they have a debug kernel
>> variant, you could run that to see if you get earlier indications of
>> problems.
>>
>> If you can reproduce on a more recent upstream kernel, that would be
>> interesting.
> 
> Here's the dmesg output:
> [1369713.678092] BUG: unable to handle kernel paging request at ffff87f947da5088
> [1369713.682882] IP: [<ffffffffa01db24f>]
> xfs_dir2_block_lookup_int+0xcf/0x210 [xfs]

Ok, so this looks like a different problem; at least a different oops.

> [1369713.687802] PGD 0
> [1369713.688055] Oops: 0000 [1] SMP
> [1369713.688055] CPU 1
> [1369713.688055] Modules linked in: nls_cp437 cifs nfsd auth_rpcgss
> exportfs wmi video output sbs sbshc pci
> _slot container battery ac xt_tcpudp nf_conntrack_ipv4 xt_state
> nf_conntrack nfs lockd nfs_acl sunrpc ipv6
> iptable_filter ip_tables x_tables ext3 jbd mbcache cpufreq_userspace
> cpufreq_stats cpufreq_powersave cpufre
> q_ondemand cpufreq_conservative acpi_cpufreq freq_table sbp2
> parport_pc lp parport loop evdev pcspkr iTCO_w
> dt iTCO_vendor_support button shpchp pci_hotplug intel_agp xfs
> pata_jmicron sd_mod crc_t10dif sg pata_acpi
> ata_piix ohci1394 ieee1394 aacraid ata_generic ahci libata scsi_mod
> e1000e dock uhci_hcd ehci_hcd usbcore t
> hermal processor fan fuse vesafb fbcon tileblit font bitblit softcursor
> [1369713.688055] Pid: 5278, comm: smbd Not tainted 2.6.27-11-server #1
> [1369713.688055] RIP: 0010:[<ffffffffa01db24f>]  [<ffffffffa01db24f>]
> xfs_dir2_block_lookup_int+0xcf/0x210
> [xfs]
> [1369713.688055] RSP: 0018:ffff88007120ba28  EFLAGS: 00010286
> [1369713.688055] RAX: 00000005a072ded8 RBX: 00000000da072ded RCX:
> 0000000000000000
> [1369713.688055] RDX: 00000000b40e5bda RSI: ffff88007a474c48 RDI:
> ffffffffda072ded
> [1369713.688055] RBP: ffff88007120ba98 R08: ffff87fa77a0e120 R09:
> 00000000ffffffff
> [1369713.688055] R10: 00000000db5b0eb4 R11: ffff88001813bff8 R12:
> ffff88007120bae8
> [1369713.688055] R13: 000000000000002a R14: 0000000000000000 R15:
> ffff88007120bb74
> [1369713.688055] FS:  00007f79bf44e700(0000) GS:ffff88007f802880(0000)
> knlGS:0000000000000000
> [1369713.688055] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [1369713.688055] CR2: ffff87f947da5088 CR3: 0000000053e88000 CR4:
> 00000000000006e0
> [1369713.688055] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [1369713.688055] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [1369713.688055] Process smbd (pid: 5278, threadinfo ffff88007120a000,
> task ffff88000350ace0)
> [1369713.688055] Stack:  ffff88007120bd68 ffffffff802fa04c
> ffff88007120bab4 ffff88007120baa8
> [1369713.688055]  ffff88001813b000 ffff88007acb3000 0000000000000000
> ffffffffa01d0289
> [1369713.688055]  ffff88007a474c48 ffff880035a2c8c0 ffff88007120bae8
> ffff88007120bae8
> [1369713.688055] Call Trace:
> [1369713.688055]  [<ffffffff802fa04c>] ? do_select+0x5bc/0x610
> [1369713.688055]  [<ffffffffa01d0289>] ? xfs_bmbt_get_blockcount+0x9/0x20 [xfs]
> [1369713.688055]  [<ffffffffa01db470>] xfs_dir2_block_lookup+0x20/0xc0 [xfs]
> [1369713.688055]  [<ffffffffa01da7e5>] xfs_dir_lookup+0x195/0x1c0 [xfs]
> [1369713.688055]  [<ffffffffa020a6cb>] xfs_lookup+0x7b/0xe0 [xfs]
> [1369713.688055]  [<ffffffff802ff9f6>] ? __d_lookup+0x16/0x150
> [1369713.688055]  [<ffffffffa02159a1>] xfs_vn_lookup+0x51/0x90 [xfs]
> [1369713.688055]  [<ffffffff802f495e>] real_lookup+0xee/0x170
> [1369713.688055]  [<ffffffff802f4a90>] do_lookup+0xb0/0x110
> [1369713.688055]  [<ffffffff802f50fd>] __link_path_walk+0x60d/0xc20
> [1369713.688055]  [<ffffffff80305ef6>] ? mntput_no_expire+0x36/0x160
> [1369713.688055]  [<ffffffff802f5c4e>] path_walk+0x6e/0xe0
> [1369713.688055]  [<ffffffff802f5e63>] do_path_lookup+0xe3/0x200
> [1369713.688055]  [<ffffffff802f386a>] ? getname+0x4a/0xb0
> [1369713.688055]  [<ffffffff802f6cbb>] user_path_at+0x7b/0xb0
> [1369713.688055]  [<ffffffff802eda68>] ? cp_new_stat+0xe8/0x100
> [1369713.688055]  [<ffffffff802ede7d>] vfs_stat_fd+0x2d/0x60
> [1369713.688055]  [<ffffffff802edf5c>] sys_newstat+0x2c/0x50
> [1369713.688055]  [<ffffffff8021285a>] system_call_fastpath+0x16/0x1b
> [1369713.688055]
> [1369713.688055]
> [1369713.688055] Code: f8 31 c9 45 8b 13 4d 89 d8 44 89 d2 0f ca 89 d0
> 83 ea 01 48 c1 e0 03 49 29 c0 eb 07 8d 4b 01 39 ca 7c 1c 8d 1c 11 d1
> fb 48 63 fb <41> 8b 04 f8 0f c8 41 39 c5 74 26 77 e4 8d 53 ff 39 ca 7d
> e4 48
> [1369713.688055] RIP  [<ffffffffa01db24f>]
> xfs_dir2_block_lookup_int+0xcf/0x210 [xfs]
> [1369713.688055]  RSP <ffff88007120ba28>
> [1369713.688055] CR2: ffff87f947da5088
> [1369714.057232] ---[ end trace 0735c8702d5e7899 ]---
> 
> What can I do to help getting this fixed?
> 
> Tom

Can you make an xfs_metadump of the filesystem in question, and then try
an xfs_repair?  Capture/save the repair output.  If repair finds errors,
then perhaps the bug is triggered by bad error checking on a corrupted
image, and we might be able to reproduce it with the metadump image.

It'd be nice if Ubuntu had debug kernel variants (Fedora does this; I
dunno about Ubuntu) - if you are hitting any kind of memory corruption,
then a kernel with debug checks enabled might catch it sooner.

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Corruption of in-memory data detected
  2009-03-11  4:30     ` Eric Sandeen
@ 2009-03-11 10:42       ` Thomas Gutzler
  2009-03-12  2:23         ` Eric Sandeen
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gutzler @ 2009-03-11 10:42 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

[-- Attachment #1: Type: text/plain, Size: 9849 bytes --]

Eric Sandeen wrote:
> Thomas Gutzler wrote:
>> Hi,
>>
>> a while ago I was having problems with the xfs module on ubuntu
>> feisty. I have then upgraded to 8.10 intrepid and am still getting the
>> occasional bug. 
> 
> although from the looks of it, a different one.  I'm curious: if you log
> these bugs with Ubuntu, do they look into it?  It'd be nice to at least
> have initial triage to know, for example, where in the function it blew up.

I can see myself heading this way.

>> Here's the dmesg output:
>> [1369713.678092] BUG: unable to handle kernel paging request at ffff87f947da5088
>> [1369713.682882] IP: [<ffffffffa01db24f>]
>> xfs_dir2_block_lookup_int+0xcf/0x210 [xfs]
> 
> Ok, so this looks like a different problem; at least a different oops.
> 
>> [1369713.687802] PGD 0
>> [1369713.688055] Oops: 0000 [1] SMP
>> [1369713.688055] CPU 1
>> [1369713.688055] Modules linked in: nls_cp437 cifs nfsd auth_rpcgss
>> exportfs wmi video output sbs sbshc pci
>> _slot container battery ac xt_tcpudp nf_conntrack_ipv4 xt_state
>> nf_conntrack nfs lockd nfs_acl sunrpc ipv6
>> iptable_filter ip_tables x_tables ext3 jbd mbcache cpufreq_userspace
>> cpufreq_stats cpufreq_powersave cpufre
>> q_ondemand cpufreq_conservative acpi_cpufreq freq_table sbp2
>> parport_pc lp parport loop evdev pcspkr iTCO_w
>> dt iTCO_vendor_support button shpchp pci_hotplug intel_agp xfs
>> pata_jmicron sd_mod crc_t10dif sg pata_acpi
>> ata_piix ohci1394 ieee1394 aacraid ata_generic ahci libata scsi_mod
>> e1000e dock uhci_hcd ehci_hcd usbcore t
>> hermal processor fan fuse vesafb fbcon tileblit font bitblit softcursor
>> [1369713.688055] Pid: 5278, comm: smbd Not tainted 2.6.27-11-server #1
>> [1369713.688055] RIP: 0010:[<ffffffffa01db24f>]  [<ffffffffa01db24f>]
>> xfs_dir2_block_lookup_int+0xcf/0x210
>> [xfs]
>> [1369713.688055] RSP: 0018:ffff88007120ba28  EFLAGS: 00010286
>> [1369713.688055] RAX: 00000005a072ded8 RBX: 00000000da072ded RCX:
>> 0000000000000000
>> [1369713.688055] RDX: 00000000b40e5bda RSI: ffff88007a474c48 RDI:
>> ffffffffda072ded
>> [1369713.688055] RBP: ffff88007120ba98 R08: ffff87fa77a0e120 R09:
>> 00000000ffffffff
>> [1369713.688055] R10: 00000000db5b0eb4 R11: ffff88001813bff8 R12:
>> ffff88007120bae8
>> [1369713.688055] R13: 000000000000002a R14: 0000000000000000 R15:
>> ffff88007120bb74
>> [1369713.688055] FS:  00007f79bf44e700(0000) GS:ffff88007f802880(0000)
>> knlGS:0000000000000000
>> [1369713.688055] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [1369713.688055] CR2: ffff87f947da5088 CR3: 0000000053e88000 CR4:
>> 00000000000006e0
>> [1369713.688055] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [1369713.688055] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [1369713.688055] Process smbd (pid: 5278, threadinfo ffff88007120a000,
>> task ffff88000350ace0)
>> [1369713.688055] Stack:  ffff88007120bd68 ffffffff802fa04c
>> ffff88007120bab4 ffff88007120baa8
>> [1369713.688055]  ffff88001813b000 ffff88007acb3000 0000000000000000
>> ffffffffa01d0289
>> [1369713.688055]  ffff88007a474c48 ffff880035a2c8c0 ffff88007120bae8
>> ffff88007120bae8
>> [1369713.688055] Call Trace:
>> [1369713.688055]  [<ffffffff802fa04c>] ? do_select+0x5bc/0x610
>> [1369713.688055]  [<ffffffffa01d0289>] ? xfs_bmbt_get_blockcount+0x9/0x20 [xfs]
>> [1369713.688055]  [<ffffffffa01db470>] xfs_dir2_block_lookup+0x20/0xc0 [xfs]
>> [1369713.688055]  [<ffffffffa01da7e5>] xfs_dir_lookup+0x195/0x1c0 [xfs]
>> [1369713.688055]  [<ffffffffa020a6cb>] xfs_lookup+0x7b/0xe0 [xfs]
>> [1369713.688055]  [<ffffffff802ff9f6>] ? __d_lookup+0x16/0x150
>> [1369713.688055]  [<ffffffffa02159a1>] xfs_vn_lookup+0x51/0x90 [xfs]
>> [1369713.688055]  [<ffffffff802f495e>] real_lookup+0xee/0x170
>> [1369713.688055]  [<ffffffff802f4a90>] do_lookup+0xb0/0x110
>> [1369713.688055]  [<ffffffff802f50fd>] __link_path_walk+0x60d/0xc20
>> [1369713.688055]  [<ffffffff80305ef6>] ? mntput_no_expire+0x36/0x160
>> [1369713.688055]  [<ffffffff802f5c4e>] path_walk+0x6e/0xe0
>> [1369713.688055]  [<ffffffff802f5e63>] do_path_lookup+0xe3/0x200
>> [1369713.688055]  [<ffffffff802f386a>] ? getname+0x4a/0xb0
>> [1369713.688055]  [<ffffffff802f6cbb>] user_path_at+0x7b/0xb0
>> [1369713.688055]  [<ffffffff802eda68>] ? cp_new_stat+0xe8/0x100
>> [1369713.688055]  [<ffffffff802ede7d>] vfs_stat_fd+0x2d/0x60
>> [1369713.688055]  [<ffffffff802edf5c>] sys_newstat+0x2c/0x50
>> [1369713.688055]  [<ffffffff8021285a>] system_call_fastpath+0x16/0x1b
>> [1369713.688055]
>> [1369713.688055]
>> [1369713.688055] Code: f8 31 c9 45 8b 13 4d 89 d8 44 89 d2 0f ca 89 d0
>> 83 ea 01 48 c1 e0 03 49 29 c0 eb 07 8d 4b 01 39 ca 7c 1c 8d 1c 11 d1
>> fb 48 63 fb <41> 8b 04 f8 0f c8 41 39 c5 74 26 77 e4 8d 53 ff 39 ca 7d
>> e4 48
>> [1369713.688055] RIP  [<ffffffffa01db24f>]
>> xfs_dir2_block_lookup_int+0xcf/0x210 [xfs]
>> [1369713.688055]  RSP <ffff88007120ba28>
>> [1369713.688055] CR2: ffff87f947da5088
>> [1369714.057232] ---[ end trace 0735c8702d5e7899 ]---
>>
>> What can I do to help getting this fixed?
>>
> can you make an xfs_metadump of the filesystem in question, and then try
> an xfs_repair?  Capture/save the repair output.  If repair finds errors,
> then perhaps the bug is triggered by bad error checking on a corrupted
> image, and we might reproduce it w/ the metadump image.

I tried...

root@io:~# xfs_metadump /dev/sda xfs_metadump_sda
*** glibc detected *** xfs_db: double free or corruption (out):
0x00000000017b8000 ***
======= Backtrace: =========
/lib/libc.so.6[0x7f4f10298a58]
/lib/libc.so.6(cfree+0x76)[0x7f4f1029b0a6]
xfs_db[0x416f53]
xfs_db[0x4189b9]
xfs_db[0x41b4db]
xfs_db[0x4186f5]
xfs_db[0x41b07e]
xfs_db[0x4186f5]
xfs_db[0x41aa4b]
xfs_db[0x415d03]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7f4f1023d466]
xfs_db[0x4029d9]
======= Memory map: ========
00400000-00477000 r-xp 00000000 08:17 100768239
 /usr/sbin/xfs_db
00676000-00678000 rw-p 00076000 08:17 100768239
 /usr/sbin/xfs_db
00678000-00683000 rw-p 00678000 00:00 0
0176a000-017cc000 rw-p 0176a000 00:00 0
 [heap]
7f4f08000000-7f4f08021000 rw-p 7f4f08000000 00:00 0
7f4f08021000-7f4f0c000000 ---p 7f4f08021000 00:00 0
7f4f0fbc8000-7f4f0fbde000 r-xp 00000000 08:15 50382939
 /lib/libgcc_s.so.1
7f4f0fbde000-7f4f0fdde000 ---p 00016000 08:15 50382939
 /lib/libgcc_s.so.1
7f4f0fdde000-7f4f0fddf000 r--p 00016000 08:15 50382939
 /lib/libgcc_s.so.1
7f4f0fddf000-7f4f0fde0000 rw-p 00017000 08:15 50382939
 /lib/libgcc_s.so.1
7f4f0fde0000-7f4f0fde2000 r-xp 00000000 08:15 50712055
 /lib/libdl-2.8.90.so
7f4f0fde2000-7f4f0ffe2000 ---p 00002000 08:15 50712055
 /lib/libdl-2.8.90.so
7f4f0ffe2000-7f4f0ffe3000 r--p 00002000 08:15 50712055
 /lib/libdl-2.8.90.so
7f4f0ffe3000-7f4f0ffe4000 rw-p 00003000 08:15 50712055
 /lib/libdl-2.8.90.so
7f4f0ffe4000-7f4f1001b000 r-xp 00000000 08:15 50331947
 /lib/libncurses.so.5.6
7f4f1001b000-7f4f1021a000 ---p 00037000 08:15 50331947
 /lib/libncurses.so.5.6
7f4f1021a000-7f4f1021f000 rw-p 00036000 08:15 50331947
 /lib/libncurses.so.5.6
7f4f1021f000-7f4f10388000 r-xp 00000000 08:15 50712052
 /lib/libc-2.8.90.so
7f4f10388000-7f4f10587000 ---p 00169000 08:15 50712052
 /lib/libc-2.8.90.so
7f4f10587000-7f4f1058b000 r--p 00168000 08:15 50712052
 /lib/libc-2.8.90.so
7f4f1058b000-7f4f1058c000 rw-p 0016c000 08:15 50712052
 /lib/libc-2.8.90.so
7f4f1058c000-7f4f10591000 rw-p 7f4f1058c000 00:00 0
7f4f10591000-7f4f105c8000 r-xp 00000000 08:15 50363305
 /lib/libreadline.so.5.2
7f4f105c8000-7f4f107c8000 ---p 00037000 08:15 50363305
 /lib/libreadline.so.5.2
7f4f107c8000-7f4f107d0000 rw-p 00037000 08:15 50363305
 /lib/libreadline.so.5.2
7f4f107d0000-7f4f107d1000 rw-p 7f4f107d0000 00:00 0
7f4f107d1000-7f4f107e8000 r-xp 00000000 08:15 50630786
 /lib/libpthread-2.8.90.so
7f4f107e8000-7f4f109e7000 ---p 00017000 08:15 50630786
 /lib/libpthread-2.8.90.so
7f4f109e7000-7f4f109e8000 r--p 00016000 08:15 50630786
 /lib/libpthread-2.8.90.so
7f4f109e8000-7f4f109e9000 rw-p 00017000 08:15 50630786
 /lib/libpthread-2.8.90.so
7f4f109e9000-7f4f109ed000 rw-p 7f4f109e9000 00:00 0
7f4f109ed000-7f4f109f5000 r-xp 00000000 08:15 50630788
 /lib/librt-2.8.90.so
7f4f109f5000-7f4f10bf4000 ---p 00008000 08:15 50630788
 /lib/librt-2.8.90.so
7f4f10bf4000-7f4f10bf5000 r--p 00007000 08:15 50630788
 /lib/librt-2.8.90.so
7f4f10bf5000-7f4f10bf6000 rw-p 00008000 08:15 50630788
 /lib/librt-2.8.90.so
7f4f10bf6000-7f4f10bf9000 r-xp 00000000 08:15 50331815
 /lib/libuuid.so.1.2
7f4f10bf9000-7f4f10df9000 ---p 00003000 08:15 50331815
 /lib/libuuid.so.1.2
7f4f10df9000-7f4f10dfa000 r--p 00003000 08:15 50331815
 /lib/libuuid.so.1.2
7f4f10dfa000-7f4f10dfb000 rw-p 00004000 08:15 50331815
 /lib/libuuid.so.1.2
7f4f10dfb000-7f4f10e1a000 r-xp 00000000 08:15 50712049
 /lib/ld-2.8.90.so
7f4f10fcc000-7f4f11011000 rw-p 7f4f10fcc000 00:00 0
7f4f11015000-7f4f11019000 rw-p 7f4f11015000 00:00 0
7f4f11019000-7f4f1101a000 r--p 0001e000 08:15 50712049
 /lib/ld-2.8.90.so
7f4f1101a000-7f4f1101b000 rw-p 0001f000 08:15 50712049
 /lib/ld-2.8.90.so
7fff19006000-7fff1901b000 rw-p 7ffffffea000 00:00 0
 [stack]
7fff191ff000-7fff19200000 r-xp 7fff191ff000 00:00 0
 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
 [vsyscall]
Aborted

and tried again:
root@io:~# xfs_metadump -w /dev/sda xfs_metadump_sda
root@io:~# ll xfs_metadump_sda
-rw-r--r-- 1 root root 588407296 2009-03-11 19:09 xfs_metadump_sda
(where do I upload this to?)

xfs_repair fixed "bad names" of 4 inodes (see attached log file)

another xfs_metadump /dev/sda xfs_metadump_sda_2 (without -w):
-rw-r--r--  1 root root 588261888 2009-03-11 19:23 xfs_metadump_sda_2

> It'd be nice if ubuntu had debug kernel variants (Fedora does this, I
> dunno about ubuntu) - if you are hitting any kind of memory corruption
> then a kernel with debug checks enabled might catch it sooner.

I haven't seen any - probably have to build my own debug kernel. Oh joy.

Is the metadump of any use?

Tom

[-- Attachment #2: xfs_repair.log --]
[-- Type: text/plain, Size: 5821 bytes --]

Script started on Wed 11 Mar 2009 19:12:31 WST
root@io:~# xfs_repair -v /dev/sda 
Phase 1 - find and verify superblock...
        - block cache size set to 126344 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 95447 tail block 95447
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
attribute entry 0 in attr block 0, inode 536928896 has bad name (namelen = 0)
problem with attribute contents in inode 536928896
clearing inode 536928896 attributes
correcting nblocks for inode 536928896, was 1 - counted 0
attribute entry 0 in attr block 0, inode 538095920 has bad name (namelen = 0)
problem with attribute contents in inode 538095920
clearing inode 538095920 attributes
correcting nblocks for inode 538095920, was 1 - counted 0
        - agno = 2
attribute entry 0 in attr block 0, inode 1075035709 has bad name (namelen = 0)
problem with attribute contents in inode 1075035709
clearing inode 1075035709 attributes
correcting nblocks for inode 1075035709, was 1 - counted 0
        - agno = 3
        - agno = 4
        - agno = 5
attribute entry 0 in attr block 0, inode 2684371715 has bad name (namelen = 0)
problem with attribute contents in inode 2684371715
clearing inode 2684371715 attributes
correcting nblocks for inode 2684371715, was 1 - counted 0
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
bad attribute format 1 in inode 536928896, resetting value
bad attribute format 1 in inode 538095920, resetting value
        - agno = 2
        - agno = 3
bad attribute format 1 in inode 1075035709, resetting value
        - agno = 4
        - agno = 5
bad attribute format 1 in inode 2684371715, resetting value
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
resetting inode 15584 nlinks from 2 to 7

        XFS_REPAIR Summary    Wed Mar 11 19:14:51 2009

Phase		Start		End		Duration
Phase 1:	03/11 19:14:13	03/11 19:14:13	
Phase 2:	03/11 19:14:13	03/11 19:14:17	4 seconds
Phase 3:	03/11 19:14:17	03/11 19:14:46	29 seconds
Phase 4:	03/11 19:14:46	03/11 19:14:48	2 seconds
Phase 5:	03/11 19:14:48	03/11 19:14:48	
Phase 6:	03/11 19:14:48	03/11 19:14:50	2 seconds
Phase 7:	03/11 19:14:50	03/11 19:14:50	

Total run time: 37 seconds
done
root@io:~# exit

Script done on Wed 11 Mar 2009 19:15:33 WST



* Re: Corruption of in-memory data detected
  2009-03-11 10:42       ` Thomas Gutzler
@ 2009-03-12  2:23         ` Eric Sandeen
  2009-03-12  5:06           ` Thomas Gutzler
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Sandeen @ 2009-03-12  2:23 UTC (permalink / raw)
  To: Thomas Gutzler; +Cc: xfs

Thomas Gutzler wrote:
> Eric Sandeen wrote:
>> Thomas Gutzler wrote:
...

>>> What can I do to help getting this fixed?
>>>
>> can you make an xfs_metadump of the filesystem in question, and then try
>> an xfs_repair?  Capture/save the repair output.  If repair finds errors,
>> then perhaps the bug is triggered by bad error checking on a corrupted
>> image, and we might reproduce it w/ the metadump image.
> 
> I tried...
> 
> root@io:~# xfs_metadump /dev/sda xfs_metadump_sda
> *** glibc detected *** xfs_db: double free or corruption (out):
> 0x00000000017b8000 ***

...

> Aborted

:(  what version of xfsprogs?

> and tried again:
> root@io:~# xfs_metadump -w /dev/sda xfs_metadump_sda
> root@io:~# ll xfs_metadump_sda
> -rw-r--r-- 1 root root 588407296 2009-03-11 19:09 xfs_metadump_sda
> (where do I upload this to?)

You can bzip2 it and probably shrink it pretty well (it's sparse).  See
how big that is, and we can find a place for it.
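The reason a metadump shrinks so well is that it is mostly holes and zeroed regions; a quick local illustration of how dramatically such data compresses (file names are hypothetical, not from this thread):

```shell
# Create a 100 MB sparse file: large apparent size, no data blocks written
truncate -s 100M sparse.img

# Compress it; the long runs of zeros collapse to a few kilobytes
bzip2 -kf sparse.img

# Compare apparent sizes of the original and the compressed copy
ls -l sparse.img sparse.img.bz2
```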

> xfs_repair fixed "bad names" of 4 inodes (see attached log file)
> 
> another xfs_metadump /dev/sda xfs_metadump_sda_2 (without -w):
> -rw-r--r--  1 root root 588261888 2009-03-11 19:23 xfs_metadump_sda_2
> 
>> It'd be nice if ubuntu had debug kernel variants (Fedora does this, I
>> dunno about ubuntu) - if you are hitting any kind of memory corruption
>> then a kernel with debug checks enabled might catch it sooner.
> 
> I haven't seen any - probably have to build my own debug kernel. Oh joy.
> 
> Is the metadump of any use?

It might be; let's see how well it shrinks.

-Eric

> Tom
> 
> 



* Re: Corruption of in-memory data detected
  2009-03-12  2:23         ` Eric Sandeen
@ 2009-03-12  5:06           ` Thomas Gutzler
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Gutzler @ 2009-03-12  5:06 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Eric Sandeen wrote:
> Thomas Gutzler wrote:
>>
>> root@io:~# xfs_metadump /dev/sda xfs_metadump_sda
>> *** glibc detected *** xfs_db: double free or corruption (out):
>> 0x00000000017b8000 ***
> 
> ...
> 
>> Aborted
> 
> :(  what version of xfsprogs?

xfsprogs 2.9.8-1, the most recent version that comes with Ubuntu intrepid.

>> and tried again:
>> root@io:~# xfs_metadump -w /dev/sda xfs_metadump_sda
>> root@io:~# ll xfs_metadump_sda
>> -rw-r--r-- 1 root root 588407296 2009-03-11 19:09 xfs_metadump_sda
>> (where do I upload this to?)
> 
> You can bzip2 it and probably shrink it pretty well (it's sparse).  See
> how big that is, and we can find a place for it.

87M     xfs_metadump_sda_2.bz2
109M    xfs_metadump_sda.bz2

The filesystem has 3.1TB of data on it.
I can put that on our webserver for download or put it somewhere - let
me know what's best.

Tom



* corruption of in-memory data detected
@ 2014-07-01  6:44 Alexandru Cardaniuc
  2014-07-01  7:02 ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Alexandru Cardaniuc @ 2014-07-01  6:44 UTC (permalink / raw)
  To: xfs; +Cc: Alexandru Cardaniuc


[-- Attachment #1.1: Type: text/plain, Size: 3752 bytes --]

Hi All,

I am having an issue with an XFS filesystem shutting down under high load
with very many small files. Basically, I have around 3.5-4 million files
on this filesystem. New files are being written to the FS all the time,
until I get to 9-11 million small files (35 KB on average).

At some point I get the following in dmesg:

[2870477.695512] Filesystem "sda5": XFS internal error xfs_trans_cancel at
line 1138 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff8826bb7d
[2870477.695558]
[2870477.695559] Call Trace:
[2870477.695611]  [<ffffffff88262c28>] :xfs:xfs_trans_cancel+0x5b/0xfe
[2870477.695643]  [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7
[2870477.695673]  [<ffffffff8822f3f8>] :xfs:xfs_attr_get+0xbf/0xd2
[2870477.695707]  [<ffffffff88273326>] :xfs:xfs_vn_mknod+0x1e1/0x3bb
[2870477.695726]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695736]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695764]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
[2870477.695776]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695784]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695791]  [<ffffffff80209f4c>] __d_lookup+0xb0/0xff
[2870477.695803]  [<ffffffff8020cd4a>] _atomic_dec_and_lock+0x39/0x57
[2870477.695814]  [<ffffffff8022d6db>] mntput_no_expire+0x19/0x89
[2870477.695829]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695837]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695861]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
[2870477.695887]  [<ffffffff882680af>] :xfs:xfs_access+0x3d/0x46
[2870477.695899]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695923]  [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152
[2870477.695933]  [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4
[2870477.695953]  [<ffffffff80260295>] tracesys+0x47/0xb6
[2870477.695963]  [<ffffffff802602f9>] tracesys+0xab/0xb6
[2870477.695977]
[2870477.695985] xfs_force_shutdown(sda5,0x8) called from line 1139 of file
fs/xfs/xfs_trans.c.  Return address = 0xffffffff88262c46
[2870477.696452] Filesystem "sda5": Corruption of in-memory data detected.
Shutting down filesystem: sda5
[2870477.696464] Please umount the filesystem, and rectify the problem(s)

# ls -l /store
ls: /store: Input/output error
?--------- 0 root root 0 Jan  1  1970 /store

The filesystem is ~1 TB in size:
# df -hT /store
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda5      xfs    910G  142G  769G  16% /store


Using CentOS 5.9 with kernel 2.6.18-348.el5xen

The filesystem is in a virtual machine (Xen) and on top of LVM.

Filesystem was created using mkfs.xfs defaults with
xfsprogs-2.9.4-1.el5.centos (that's the one that comes with CentOS 5.x by
default.)

These are the defaults with which the filesystem was created:
# xfs_info /store
meta-data=/dev/sda5              isize=256    agcount=32, agsize=7454720
blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=238551040, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0


The problem is reproducible, and I don't think it's hardware-related: it
was reproduced on multiple servers of the same type, so I doubt it's a
memory issue or something like that.


Is this a known issue? If it is, what's the fix? I went through the
kernel updates for CentOS 5.10 (newer kernel), but didn't see any
XFS-related fixes since CentOS 5.9.

Any help will be greatly appreciated...

-- 
Sincerely yours,
Alexandru Cardaniuc

[-- Attachment #1.2: Type: text/html, Size: 4684 bytes --]



* Re: corruption of in-memory data detected
  2014-07-01  6:44 corruption " Alexandru Cardaniuc
@ 2014-07-01  7:02 ` Dave Chinner
  2014-07-01  8:29   ` Alexandru Cardaniuc
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2014-07-01  7:02 UTC (permalink / raw)
  To: Alexandru Cardaniuc; +Cc: xfs

On Mon, Jun 30, 2014 at 11:44:45PM -0700, Alexandru Cardaniuc wrote:
> Hi All,
> 
> I am having an issue with an XFS filesystem shutting down under high load
> with very many small files.
> Basically, I have around 3.5 - 4 million files on this filesystem. New
> files are being written to the FS all the time, until I get to 9-11 mln
> small files (35k on average).
> 
> at some point I get the following in dmesg:
> 
> [2870477.695512] Filesystem "sda5": XFS internal error xfs_trans_cancel at
> line 1138 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff8826bb7d
> [2870477.695558]
> [2870477.695559] Call Trace:
> [2870477.695611]  [<ffffffff88262c28>] :xfs:xfs_trans_cancel+0x5b/0xfe
> [2870477.695643]  [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7
> [2870477.695673]  [<ffffffff8822f3f8>] :xfs:xfs_attr_get+0xbf/0xd2
> [2870477.695707]  [<ffffffff88273326>] :xfs:xfs_vn_mknod+0x1e1/0x3bb
> [2870477.695726]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
> [2870477.695736]  [<ffffffff802230e6>] __up_read+0x19/0x7f
> [2870477.695764]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
> [2870477.695776]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
> [2870477.695784]  [<ffffffff802230e6>] __up_read+0x19/0x7f
> [2870477.695791]  [<ffffffff80209f4c>] __d_lookup+0xb0/0xff
> [2870477.695803]  [<ffffffff8020cd4a>] _atomic_dec_and_lock+0x39/0x57
> [2870477.695814]  [<ffffffff8022d6db>] mntput_no_expire+0x19/0x89
> [2870477.695829]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
> [2870477.695837]  [<ffffffff802230e6>] __up_read+0x19/0x7f
> [2870477.695861]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
> [2870477.695887]  [<ffffffff882680af>] :xfs:xfs_access+0x3d/0x46
> [2870477.695899]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
> [2870477.695923]  [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152
> [2870477.695933]  [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4
> [2870477.695953]  [<ffffffff80260295>] tracesys+0x47/0xb6
> [2870477.695963]  [<ffffffff802602f9>] tracesys+0xab/0xb6
> [2870477.695977]
> [2870477.695985] xfs_force_shutdown(sda5,0x8) called from line 1139 of file
> fs/xfs/xfs_trans.c.  Return address = 0xffffffff88262c46
> [2870477.696452] Filesystem "sda5": Corruption of in-memory data detected.
> Shutting down filesystem: sda5
> [2870477.696464] Please umount the filesystem, and rectify the problem(s)

You've probably fragmented free space to the point where inodes
cannot be allocated anymore, and then it shut down because it got
ENOSPC with a dirty inode allocation transaction.

xfs_db -c "freesp -s" <dev>

should tell us whether this is the case or not.

> Using CentOS 5.9 with kernel 2.6.18-348.el5xen

The "ENOSPC with dirty transaction" shutdown bugs have been fixed in
more recent kernels than RHEL5.

> The problem is reproducible and I don't think it's hardware related. The
> problem was reproduced on multiple servers of the same type. So, I doubt
> it's a memory issue or something like that.

Nope, it's not hardware, it's buggy software that has been fixed in
the years since 2.6.18....

> Is that a known issue?

Yes.

> If it is then what's the fix?

If you've fragmented free space, then your only options are:

	- dump/mkfs/restore
	- remove a large number of files from the filesystem so free
	  space defragments.

If you simply want to avoid the shutdown, then upgrade to a more
recent kernel (3.x of some kind) where all the known issues have
been fixed.

> I went through the
> kernel updates for CentOS 5.10 (newer kernel), but didn't see any xfs
> related fixes since CentOS 5.9

That's something you need to talk to your distro maintainers
about....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



* Re: corruption of in-memory data detected
  2014-07-01  7:02 ` Dave Chinner
@ 2014-07-01  8:29   ` Alexandru Cardaniuc
  2014-07-01  9:38     ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Alexandru Cardaniuc @ 2014-07-01  8:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


Dave Chinner <david@fromorbit.com> writes:

> On Mon, Jun 30, 2014 at 11:44:45PM -0700, Alexandru Cardaniuc wrote:
>> Hi All,
 
>> I am having an issue with an XFS filesystem shutting down under high
>> load with very many small files. Basically, I have around 3.5 - 4
>> million files on this filesystem. New files are being written to the
>> FS all the time, until I get to 9-11 mln small files (35k on
>> average).
>> 
>> at some point I get the following in dmesg:
>> 
>> [2870477.695512] Filesystem "sda5": XFS internal error
>> xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller
>> 0xffffffff8826bb7d [2870477.695558] [2870477.695559] Call Trace:
>> [2870477.695611] [<ffffffff88262c28>]
>> :xfs:xfs_trans_cancel+0x5b/0xfe [2870477.695643]
>> [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7 [2870477.695673]
>> [<ffffffff8822f3f8>] :xfs:xfs_attr_get+0xbf/0xd2 [2870477.695707]
>> [<ffffffff88273326>] :xfs:xfs_vn_mknod+0x1e1/0x3bb [2870477.695726]
>> [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14 [2870477.695736]
>> [<ffffffff802230e6>] __up_read+0x19/0x7f [2870477.695764]
>> [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79 [2870477.695776]
>> [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14 [2870477.695784]
>> [<ffffffff802230e6>] __up_read+0x19/0x7f [2870477.695791]
>> [<ffffffff80209f4c>] __d_lookup+0xb0/0xff [2870477.695803]
>> [<ffffffff8020cd4a>] _atomic_dec_and_lock+0x39/0x57 [2870477.695814]
>> [<ffffffff8022d6db>] mntput_no_expire+0x19/0x89 [2870477.695829]
>> [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14 [2870477.695837]
>> [<ffffffff802230e6>] __up_read+0x19/0x7f [2870477.695861]
>> [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79 [2870477.695887]
>> [<ffffffff882680af>] :xfs:xfs_access+0x3d/0x46 [2870477.695899]
>> [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14 [2870477.695923]
>> [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152 [2870477.695933]
>> [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4 [2870477.695953]
>> [<ffffffff80260295>] tracesys+0x47/0xb6 [2870477.695963]
>> [<ffffffff802602f9>] tracesys+0xab/0xb6 [2870477.695977]
>> [2870477.695985] xfs_force_shutdown(sda5,0x8) called from line 1139
>> of file fs/xfs/xfs_trans.c. Return address = 0xffffffff88262c46
>> [2870477.696452] Filesystem "sda5": Corruption of in-memory data
>> detected. Shutting down filesystem: sda5 [2870477.696464] Please
>> umount the filesystem, and rectify the problem(s)
>
> You've probably fragmented free space to the point where inodes cannot
> be allocated anymore, and then it's shutdown because it got enospc
> with a dirty inode allocation transaction.

> xfs_db -c "freespc -s" <dev>

> should tell us whether this is the case or not.

This is what I have

#  xfs_db -c "freesp -s" /dev/sda5
   from      to extents  blocks    pct
      1       1     657     657   0.00
      2       3     264     607   0.00
      4       7      29     124   0.00
      8      15      13     143   0.00
     16      31      41     752   0.00
     32      63       8     293   0.00
     64     127      12    1032   0.00
    128     255       8    1565   0.00
    256     511      10    4044   0.00
    512    1023       7    5750   0.00
   1024    2047      10   16061   0.01
   2048    4095       5   16948   0.01
   4096    8191       7   43312   0.02
   8192   16383       9  115578   0.06
  16384   32767       6  159576   0.08
  32768   65535       3  104586   0.05
 262144  524287       1  507710   0.25
4194304 7454720      28 200755934  99.51
total free extents 1118
total free blocks 201734672
average free extent size 180442



>> Using CentOS 5.9 with kernel 2.6.18-348.el5xen
>
> The "enospc with dirty transaction" shutdown bugs have been fixed in
> more recent kernels than RHEL5.

These fixes were not backported to RHEL5 kernels?

>> The problem is reproducible and I don't think it's hardware related.
>> The problem was reproduced on multiple servers of the same type. So,
>> I doubt it's a memory issue or something like that.

> Nope, it's not hardware, it's buggy software that has been fixed in
> the years since 2.6.18....

I would hope these fixes would be backported to RHEL5 (CentOS 5) kernels...

>> Is that a known issue?

> Yes.

Well at least that's a good thing :)

>> If it is then what's the fix?

> If you've fragmented free space, then your only options are:

> 	- dump/mkfs/restore - remove a large number of files from the
> filesystem so free space defragments.

That wouldn't be fixed automagically by xfs_repair, would it?

> If you simply want to avoid the shutdown, then upgrade to a more
> recent kernel (3.x of some kind) where all the known issues have been
> fixed.

How about 2.6.32? That's the kernel that comes with RHEL 6.x

>> I went through the kernel updates for CentOS 5.10 (newer kernel),
>> but didn't see any xfs related fixes since CentOS 5.9

> That's something you need to talk to your distro maintainers about....

I was worried you were going to say that :)

What are my options at this point? Am I correct to assume that the issue
is related to the load and if I manage to decrease the load, the issue
is not going to reproduce itself? We have been using XFS on RHEL 5
kernels for years and didn't see this issue. Now, the issue happens
consistently, but seems to be related to high load...

We have hundreds of these servers deployed in production right now, so
some way to address the current situation would be very welcomed.

thanks for help :)

> Cheers,
> Dave.

-- 
"Every problem that I solved became a rule which served afterwards to
solve other problems."  
- Descartes

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: corruption of in-memory data detected
  2014-07-01  8:29   ` Alexandru Cardaniuc
@ 2014-07-01  9:38     ` Dave Chinner
  2014-07-01 20:13       ` Alexandru Cardaniuc
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2014-07-01  9:38 UTC (permalink / raw)
  To: Alexandru Cardaniuc; +Cc: xfs

On Tue, Jul 01, 2014 at 01:29:35AM -0700, Alexandru Cardaniuc wrote:
> Dave Chinner <david@fromorbit.com> writes:
> 
> > On Mon, Jun 30, 2014 at 11:44:45PM -0700, Alexandru Cardaniuc wrote:
> >> Hi All,
>  
> >> I am having an issue with an XFS filesystem shutting down under high
> >> load with very many small files. Basically, I have around 3.5 - 4
> >> million files on this filesystem. New files are being written to the
> >> FS all the time, until I get to 9-11 mln small files (35k on
> >> average).
....
> > You've probably fragmented free space to the point where inodes cannot
> > be allocated anymore, and then it's shutdown because it got enospc
> > with a dirty inode allocation transaction.
> 
> > xfs_db -c "freesp -s" <dev>
> 
> > should tell us whether this is the case or not.
> 
> This is what I have
> 
> #  xfs_db -c "freesp -s" /dev/sda5
>    from      to extents  blocks    pct
>       1       1     657     657   0.00
>       2       3     264     607   0.00
>       4       7      29     124   0.00
>       8      15      13     143   0.00
>      16      31      41     752   0.00
>      32      63       8     293   0.00
>      64     127      12    1032   0.00
>     128     255       8    1565   0.00
>     256     511      10    4044   0.00
>     512    1023       7    5750   0.00
>    1024    2047      10   16061   0.01
>    2048    4095       5   16948   0.01
>    4096    8191       7   43312   0.02
>    8192   16383       9  115578   0.06
>   16384   32767       6  159576   0.08
>   32768   65535       3  104586   0.05
>  262144  524287       1  507710   0.25
> 4194304 7454720      28 200755934  99.51
> total free extents 1118
> total free blocks 201734672
> average free extent size 180442

So it's not freespace fragmentation, but that was just the most
likely cause. Most likely it's a transient condition where an AG is
out of space but in determining that condition the AGF was
modified. We've fixed several bugs in that area over the past few
years....

> >> Using CentOS 5.9 with kernel 2.6.18-348.el5xen
> >
> > The "enospc with dirty transaction" shutdown bugs have been fixed in
> > more recent kernels than RHEL5.
> 
> These fixes were not backported to RHEL5 kernels?

No.

> >> The problem is reproducible and I don't think it's hardware related.
> >> The problem was reproduced on multiple servers of the same type. So,
> >> I doubt it's a memory issue or something like that.
> 
> > Nope, it's not hardware, it's buggy software that has been fixed in
> > the years since 2.6.18....
> 
> I would hope these fixes would be backported to RHEL5 (CentOS 5) kernels...

TANSTAAFL.

> > If you've fragmented free space, then your only options are:
> 
> > 	- dump/mkfs/restore - remove a large number of files from the
> > filesystem so free space defragments.
> 
> That wouldn't be fixed automagically by xfs_repair, would it?

No.

> > If you simply want to avoid the shutdown, then upgrade to a more
> > recent kernel (3.x of some kind) where all the known issues have been
> > fixed.
> 
> How about 2.6.32? That's the kernel that comes with RHEL 6.x

It might, but I don't know the exact root cause of your problem so I
couldn't say for sure.

> >> I went through the kernel updates for CentOS 5.10 (newer kernel),
> >> but didn't see any xfs related fixes since CentOS 5.9
> 
> > That's something you need to talk to your distro maintainers about....
> 
> I was worried you were going to say that :)

There's only so much that upstream can do to support heavily patched,
six-year-old distro kernels.

> What are my options at this point? Am I correct to assume that the issue
> is related to the load and if I manage to decrease the load, the issue
> is not going to reproduce itself?

It's more likely related to the layout of data and metadata on disk.

> We have been using XFS on RHEL 5
> kernels for years and didn't see this issue. Now, the issue happens
> consistently, but seems to be related to high load...

There are several different potential causes - high load just
iterates the problem space faster.

> We have hundreds of these servers deployed in production right now, so
> some way to address the current situation would be very welcomed.

I'd suggest talking to Red Hat about what they can do to help you,
especially as CentOS is now a RH distro....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: corruption of in-memory data detected
  2014-07-01  9:38     ` Dave Chinner
@ 2014-07-01 20:13       ` Alexandru Cardaniuc
  2014-07-01 21:43         ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Alexandru Cardaniuc @ 2014-07-01 20:13 UTC (permalink / raw)
  To: xfs

Dave Chinner <david@fromorbit.com> writes:

> On Tue, Jul 01, 2014 at 01:29:35AM -0700, Alexandru Cardaniuc wrote:
>> Dave Chinner <david@fromorbit.com> writes:
>> 
>> > On Mon, Jun 30, 2014 at 11:44:45PM -0700, Alexandru Cardaniuc
>> > wrote:
>> >> Hi All,
>>
>> >> I am having an issue with an XFS filesystem shutting down under
>> >> high load with very many small files. Basically, I have around
>> >> 3.5 - 4 million files on this filesystem. New files are being
>> >> written to the FS all the time, until I get to 9-11 mln small
>> >> files (35k on average).
> ....
>> > You've probably fragmented free space to the point where inodes
>> > cannot be allocated anymore, and then it's shutdown because it got
>> > enospc with a dirty inode allocation transaction.
>>
>> > xfs_db -c "freesp -s" <dev>
>>
>> > should tell us whether this is the case or not.
>>  This is what I have
>> 
>> # xfs_db -c "freesp -s" /dev/sda5
>>    from      to extents  blocks    pct
>>       1       1     657     657   0.00
>>       2       3     264     607   0.00
>>       4       7      29     124   0.00
>>       8      15      13     143   0.00
>>      16      31      41     752   0.00
>>      32      63       8     293   0.00
>>      64     127      12    1032   0.00
>>     128     255       8    1565   0.00
>>     256     511      10    4044   0.00
>>     512    1023       7    5750   0.00
>>    1024    2047      10   16061   0.01
>>    2048    4095       5   16948   0.01
>>    4096    8191       7   43312   0.02
>>    8192   16383       9  115578   0.06
>>   16384   32767       6  159576   0.08
>>   32768   65535       3  104586   0.05
>>  262144  524287       1  507710   0.25
>> 4194304 7454720      28 200755934  99.51
>> total free extents 1118
>> total free blocks 201734672
>> average free extent size 180442
>
> So it's not freespace fragmentation, but that was just the most likely
> cause. Most likely it's a transient condition where an AG is out of
> space but in determining that condition the AGF was modified. We've
> fixed several bugs in that area over the past few years....

I still have the FS available. Any other information I can assemble to
help you identify the issue?

>> >> Using CentOS 5.9 with kernel 2.6.18-348.el5xen
>> > The "enospc with dirty transaction" shutdown bugs have been fixed
>> > in more recent kernels than RHEL5.
>>  These fixes were not backported to RHEL5 kernels?

> No.

I assume I wouldn't just be able to take the source for the XFS kernel
module and compile it against the 2.6.18 kernel in CentOS 5.x?

>> >> The problem is reproducible and I don't think it's hardware
>> >> related. The problem was reproduced on multiple servers of the
>> >> same type. So, I doubt it's a memory issue or something like
>> >> that.
>>
>> > Nope, it's not hardware, it's buggy software that has been fixed
>> > in the years since 2.6.18....
>>  I would hope these fixes would be backported to RHEL5 (CentOS 5)
>> kernels...
>
> TANSTAAFL.

>> > If you've fragmented free space, then your only options are:
>>
>> > 	- dump/mkfs/restore - remove a large number of files from the
>> > filesystem so free space defragments.
>>  That wouldn't be fixed automagically by xfs_repair, would it?

> No.

>> > If you simply want to avoid the shutdown, then upgrade to a more
>> > recent kernel (3.x of some kind) where all the known issues have
>> > been fixed.
>>  How about 2.6.32? That's the kernel that comes with RHEL 6.x
>
> It might, but I don't know the exact root cause of your problem so I
> couldn't say for sure.

>> >> I went through the kernel updates for CentOS 5.10 (newer kernel),
>> >> but didn't see any xfs related fixes since CentOS 5.9
>>
>> > That's something you need to talk to your distro maintainers
>> > about....
>>  I was worried you were going to say that :)
>
> There's only so much that upstream can do to support heavily patched,
> six-year-old distro kernels.

>> What are my options at this point? Am I correct to assume that the
>> issue is related to the load and if I manage to decrease the load,
>> the issue is not going to reproduce itself?

> It's more likely related to the layout of data and metadata on disk.



>> We have been using XFS on RHEL 5 kernels for years and didn't see
>> this issue. Now, the issue happens consistently, but seems to be
>> related to high load...

> There are several different potential causes - high load just iterates
> the problem space faster.

>> We have hundreds of these servers deployed in production right now,
>> so some way to address the current situation would be very welcomed.

> I'd suggest talking to Red Hat about what they can do to help you,
> especially as CentOS is now a RH distro....

I will try that. Thanks.

-- 
"It's very well to be thrifty, but don't amass a hoard of regrets."
- Charles D'Orleans


* Re: corruption of in-memory data detected
  2014-07-01 20:13       ` Alexandru Cardaniuc
@ 2014-07-01 21:43         ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2014-07-01 21:43 UTC (permalink / raw)
  To: Alexandru Cardaniuc; +Cc: xfs

On Tue, Jul 01, 2014 at 01:13:19PM -0700, Alexandru Cardaniuc wrote:
> Dave Chinner <david@fromorbit.com> writes:
> 
> > On Tue, Jul 01, 2014 at 01:29:35AM -0700, Alexandru Cardaniuc wrote:
> >> Dave Chinner <david@fromorbit.com> writes:
> >> 
> >> > On Mon, Jun 30, 2014 at 11:44:45PM -0700, Alexandru Cardaniuc
> >> > wrote:
> >> >> Hi All,
> >>
> >> >> I am having an issue with an XFS filesystem shutting down under
> >> >> high load with very many small files. Basically, I have around
> >> >> 3.5 - 4 million files on this filesystem. New files are being
> >> >> written to the FS all the time, until I get to 9-11 mln small
> >> >> files (35k on average).
> > ....
> >> > You've probably fragmented free space to the point where inodes
> >> > cannot be allocated anymore, and then it's shutdown because it got
> >> > enospc with a dirty inode allocation transaction.
> >>
> >> > xfs_db -c "freesp -s" <dev>
> >>
> >> > should tell us whether this is the case or not.
> >>  This is what I have
> >> 
> >> # xfs_db -c "freesp -s" /dev/sda5
> >>    from      to extents  blocks    pct
> >>       1       1     657     657   0.00
> >>       2       3     264     607   0.00
> >>       4       7      29     124   0.00
> >>       8      15      13     143   0.00
> >>      16      31      41     752   0.00
> >>      32      63       8     293   0.00
> >>      64     127      12    1032   0.00
> >>     128     255       8    1565   0.00
> >>     256     511      10    4044   0.00
> >>     512    1023       7    5750   0.00
> >>    1024    2047      10   16061   0.01
> >>    2048    4095       5   16948   0.01
> >>    4096    8191       7   43312   0.02
> >>    8192   16383       9  115578   0.06
> >>   16384   32767       6  159576   0.08
> >>   32768   65535       3  104586   0.05
> >>  262144  524287       1  507710   0.25
> >> 4194304 7454720      28 200755934  99.51
> >> total free extents 1118
> >> total free blocks 201734672
> >> average free extent size 180442
> >
> > So it's not freespace fragmentation, but that was just the most likely
> > cause. Most likely it's a transient condition where an AG is out of
> > space but in determining that condition the AGF was modified. We've
> > fixed several bugs in that area over the past few years....
> 
> I still have the FS available. Any other information I can assemble to
> help you identify the issue?

Not really. Historically the only way to work out the exact problem
causing the transaction failure is to be able to reproduce the
problem on demand in a controlled environment.  Unfortunately, I
don't scale to doing this for everyone who has a problem because it
can take days to build an equivalent environment and reproduce the
problem and refine it to something that can be used to debug the
issue.

However, if you can reproduce the problem on a current upstream
kernel with a metadump image and a script that runs on the image,
then I'll definitely look at it.  i.e. if you can make it 5 minutes'
work for me to reproduce the problem on an upstream kernel, then
I should be able to find and solve the problem pretty quickly.

> >> >> Using CentOS 5.9 with kernel 2.6.18-348.el5xen
> >> > The "enospc with dirty transaction" shutdown bugs have been fixed
> >> > in more recent kernels than RHEL5.
> >>  These fixes were not backported to RHEL5 kernels?
> 
> > No.
> 
> I assume I wouldn't just be able to take the source for XFS kernel module
> and compile it against the 2.6.18 kernel in CentOS 5.x?

You could try, but you'd only be digging a deeper hole.  Triaging
and solving a bug like this one is a walk in the park compared to
the issues that typically arise(*) during large-scale backports to
older kernels.

Cheers,

Dave.

(*) speaking as a RH engineer who has done multiple large (several
hundred commit) XFS backports for RHEL.
-- 
Dave Chinner
david@fromorbit.com


end of thread, other threads:[~2014-07-01 21:43 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-02  2:46 Corruption of in-memory data detected Thomas Gutzler
2009-01-02  3:24 ` Eric Sandeen
2009-03-11  2:44   ` Thomas Gutzler
2009-03-11  4:30     ` Eric Sandeen
2009-03-11 10:42       ` Thomas Gutzler
2009-03-12  2:23         ` Eric Sandeen
2009-03-12  5:06           ` Thomas Gutzler
  -- strict thread matches above, loose matches on Subject: below --
2014-07-01  6:44 corruption " Alexandru Cardaniuc
2014-07-01  7:02 ` Dave Chinner
2014-07-01  8:29   ` Alexandru Cardaniuc
2014-07-01  9:38     ` Dave Chinner
2014-07-01 20:13       ` Alexandru Cardaniuc
2014-07-01 21:43         ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox