From: Thomas Gutzler <thomas.gutzler@gmail.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: xfs@oss.sgi.com
Subject: Re: Corruption of in-memory data detected
Date: Wed, 11 Mar 2009 19:42:14 +0900 [thread overview]
Message-ID: <49B79586.5060801@gmail.com> (raw)
In-Reply-To: <49B73E50.5080906@sandeen.net>
[-- Attachment #1: Type: text/plain, Size: 9849 bytes --]
Eric Sandeen wrote:
> Thomas Gutzler wrote:
>> Hi,
>>
>> a while ago I was having problems with the xfs module on ubuntu
>> feisty. I have then upgraded to 8.10 intrepid and am still getting the
>> occasional bug.
>
> although from the looks of it, a different one. I'm curious, if you log
> these bug with Ubuntu, do they look into it? It'd be nice to at least
> have initial triage to know for example where in the function it blew up.
I can see myself heading this way.
>> Here's the dmesg output:
>> [1369713.678092] BUG: unable to handle kernel paging request at ffff87f947da5088
>> [1369713.682882] IP: [<ffffffffa01db24f>]
>> xfs_dir2_block_lookup_int+0xcf/0x210 [xfs]
>
> Ok, so this looks like a different problem; at least a different oops.
>
>> [1369713.687802] PGD 0
>> [1369713.688055] Oops: 0000 [1] SMP
>> [1369713.688055] CPU 1
>> [1369713.688055] Modules linked in: nls_cp437 cifs nfsd auth_rpcgss
>> exportfs wmi video output sbs sbshc pci
>> _slot container battery ac xt_tcpudp nf_conntrack_ipv4 xt_state
>> nf_conntrack nfs lockd nfs_acl sunrpc ipv6
>> iptable_filter ip_tables x_tables ext3 jbd mbcache cpufreq_userspace
>> cpufreq_stats cpufreq_powersave cpufre
>> q_ondemand cpufreq_conservative acpi_cpufreq freq_table sbp2
>> parport_pc lp parport loop evdev pcspkr iTCO_w
>> dt iTCO_vendor_support button shpchp pci_hotplug intel_agp xfs
>> pata_jmicron sd_mod crc_t10dif sg pata_acpi
>> ata_piix ohci1394 ieee1394 aacraid ata_generic ahci libata scsi_mod
>> e1000e dock uhci_hcd ehci_hcd usbcore t
>> hermal processor fan fuse vesafb fbcon tileblit font bitblit softcursor
>> [1369713.688055] Pid: 5278, comm: smbd Not tainted 2.6.27-11-server #1
>> [1369713.688055] RIP: 0010:[<ffffffffa01db24f>] [<ffffffffa01db24f>]
>> xfs_dir2_block_lookup_int+0xcf/0x210
>> [xfs]
>> [1369713.688055] RSP: 0018:ffff88007120ba28 EFLAGS: 00010286
>> [1369713.688055] RAX: 00000005a072ded8 RBX: 00000000da072ded RCX:
>> 0000000000000000
>> [1369713.688055] RDX: 00000000b40e5bda RSI: ffff88007a474c48 RDI:
>> ffffffffda072ded
>> [1369713.688055] RBP: ffff88007120ba98 R08: ffff87fa77a0e120 R09:
>> 00000000ffffffff
>> [1369713.688055] R10: 00000000db5b0eb4 R11: ffff88001813bff8 R12:
>> ffff88007120bae8
>> [1369713.688055] R13: 000000000000002a R14: 0000000000000000 R15:
>> ffff88007120bb74
>> [1369713.688055] FS: 00007f79bf44e700(0000) GS:ffff88007f802880(0000)
>> knlGS:0000000000000000
>> [1369713.688055] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [1369713.688055] CR2: ffff87f947da5088 CR3: 0000000053e88000 CR4:
>> 00000000000006e0
>> [1369713.688055] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [1369713.688055] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [1369713.688055] Process smbd (pid: 5278, threadinfo ffff88007120a000,
>> task ffff88000350ace0)
>> [1369713.688055] Stack: ffff88007120bd68 ffffffff802fa04c
>> ffff88007120bab4 ffff88007120baa8
>> [1369713.688055] ffff88001813b000 ffff88007acb3000 0000000000000000
>> ffffffffa01d0289
>> [1369713.688055] ffff88007a474c48 ffff880035a2c8c0 ffff88007120bae8
>> ffff88007120bae8
>> [1369713.688055] Call Trace:
>> [1369713.688055] [<ffffffff802fa04c>] ? do_select+0x5bc/0x610
>> [1369713.688055] [<ffffffffa01d0289>] ? xfs_bmbt_get_blockcount+0x9/0x20 [xfs]
>> [1369713.688055] [<ffffffffa01db470>] xfs_dir2_block_lookup+0x20/0xc0 [xfs]
>> [1369713.688055] [<ffffffffa01da7e5>] xfs_dir_lookup+0x195/0x1c0 [xfs]
>> [1369713.688055] [<ffffffffa020a6cb>] xfs_lookup+0x7b/0xe0 [xfs]
>> [1369713.688055] [<ffffffff802ff9f6>] ? __d_lookup+0x16/0x150
>> [1369713.688055] [<ffffffffa02159a1>] xfs_vn_lookup+0x51/0x90 [xfs]
>> [1369713.688055] [<ffffffff802f495e>] real_lookup+0xee/0x170
>> [1369713.688055] [<ffffffff802f4a90>] do_lookup+0xb0/0x110
>> [1369713.688055] [<ffffffff802f50fd>] __link_path_walk+0x60d/0xc20
>> [1369713.688055] [<ffffffff80305ef6>] ? mntput_no_expire+0x36/0x160
>> [1369713.688055] [<ffffffff802f5c4e>] path_walk+0x6e/0xe0
>> [1369713.688055] [<ffffffff802f5e63>] do_path_lookup+0xe3/0x200
>> [1369713.688055] [<ffffffff802f386a>] ? getname+0x4a/0xb0
>> [1369713.688055] [<ffffffff802f6cbb>] user_path_at+0x7b/0xb0
>> [1369713.688055] [<ffffffff802eda68>] ? cp_new_stat+0xe8/0x100
>> [1369713.688055] [<ffffffff802ede7d>] vfs_stat_fd+0x2d/0x60
>> [1369713.688055] [<ffffffff802edf5c>] sys_newstat+0x2c/0x50
>> [1369713.688055] [<ffffffff8021285a>] system_call_fastpath+0x16/0x1b
>> [1369713.688055]
>> [1369713.688055]
>> [1369713.688055] Code: f8 31 c9 45 8b 13 4d 89 d8 44 89 d2 0f ca 89 d0
>> 83 ea 01 48 c1 e0 03 49 29 c0 eb 07 8d 4b 01 39 ca 7c 1c 8d 1c 11 d1
>> fb 48 63 fb <41> 8b 04 f8 0f c8 41 39 c5 74 26 77 e4 8d 53 ff 39 ca 7d
>> e4 48
>> [1369713.688055] RIP [<ffffffffa01db24f>]
>> xfs_dir2_block_lookup_int+0xcf/0x210 [xfs]
>> [1369713.688055] RSP <ffff88007120ba28>
>> [1369713.688055] CR2: ffff87f947da5088
>> [1369714.057232] ---[ end trace 0735c8702d5e7899 ]---
>>
>> What can I do to help getting this fixed?
>>
> can you make an xfs_metadump of the filesystem in question, and then try
> an xfs_repair? Capture/save the repair output. If repair finds errors,
> then perhaps the bug is triggered by bad error checking on a corrupted
> image, and we might reproduce it w/ the metadump image.
I tried...
root@io:~# xfs_metadump /dev/sda xfs_metadump_sda
*** glibc detected *** xfs_db: double free or corruption (out):
0x00000000017b8000 ***
======= Backtrace: =========
/lib/libc.so.6[0x7f4f10298a58]
/lib/libc.so.6(cfree+0x76)[0x7f4f1029b0a6]
xfs_db[0x416f53]
xfs_db[0x4189b9]
xfs_db[0x41b4db]
xfs_db[0x4186f5]
xfs_db[0x41b07e]
xfs_db[0x4186f5]
xfs_db[0x41aa4b]
xfs_db[0x415d03]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7f4f1023d466]
xfs_db[0x4029d9]
======= Memory map: ========
00400000-00477000 r-xp 00000000 08:17 100768239
/usr/sbin/xfs_db
00676000-00678000 rw-p 00076000 08:17 100768239
/usr/sbin/xfs_db
00678000-00683000 rw-p 00678000 00:00 0
0176a000-017cc000 rw-p 0176a000 00:00 0
[heap]
7f4f08000000-7f4f08021000 rw-p 7f4f08000000 00:00 0
7f4f08021000-7f4f0c000000 ---p 7f4f08021000 00:00 0
7f4f0fbc8000-7f4f0fbde000 r-xp 00000000 08:15 50382939
/lib/libgcc_s.so.1
7f4f0fbde000-7f4f0fdde000 ---p 00016000 08:15 50382939
/lib/libgcc_s.so.1
7f4f0fdde000-7f4f0fddf000 r--p 00016000 08:15 50382939
/lib/libgcc_s.so.1
7f4f0fddf000-7f4f0fde0000 rw-p 00017000 08:15 50382939
/lib/libgcc_s.so.1
7f4f0fde0000-7f4f0fde2000 r-xp 00000000 08:15 50712055
/lib/libdl-2.8.90.so
7f4f0fde2000-7f4f0ffe2000 ---p 00002000 08:15 50712055
/lib/libdl-2.8.90.so
7f4f0ffe2000-7f4f0ffe3000 r--p 00002000 08:15 50712055
/lib/libdl-2.8.90.so
7f4f0ffe3000-7f4f0ffe4000 rw-p 00003000 08:15 50712055
/lib/libdl-2.8.90.so
7f4f0ffe4000-7f4f1001b000 r-xp 00000000 08:15 50331947
/lib/libncurses.so.5.6
7f4f1001b000-7f4f1021a000 ---p 00037000 08:15 50331947
/lib/libncurses.so.5.6
7f4f1021a000-7f4f1021f000 rw-p 00036000 08:15 50331947
/lib/libncurses.so.5.6
7f4f1021f000-7f4f10388000 r-xp 00000000 08:15 50712052
/lib/libc-2.8.90.so
7f4f10388000-7f4f10587000 ---p 00169000 08:15 50712052
/lib/libc-2.8.90.so
7f4f10587000-7f4f1058b000 r--p 00168000 08:15 50712052
/lib/libc-2.8.90.so
7f4f1058b000-7f4f1058c000 rw-p 0016c000 08:15 50712052
/lib/libc-2.8.90.so
7f4f1058c000-7f4f10591000 rw-p 7f4f1058c000 00:00 0
7f4f10591000-7f4f105c8000 r-xp 00000000 08:15 50363305
/lib/libreadline.so.5.2
7f4f105c8000-7f4f107c8000 ---p 00037000 08:15 50363305
/lib/libreadline.so.5.2
7f4f107c8000-7f4f107d0000 rw-p 00037000 08:15 50363305
/lib/libreadline.so.5.2
7f4f107d0000-7f4f107d1000 rw-p 7f4f107d0000 00:00 0
7f4f107d1000-7f4f107e8000 r-xp 00000000 08:15 50630786
/lib/libpthread-2.8.90.so
7f4f107e8000-7f4f109e7000 ---p 00017000 08:15 50630786
/lib/libpthread-2.8.90.so
7f4f109e7000-7f4f109e8000 r--p 00016000 08:15 50630786
/lib/libpthread-2.8.90.so
7f4f109e8000-7f4f109e9000 rw-p 00017000 08:15 50630786
/lib/libpthread-2.8.90.so
7f4f109e9000-7f4f109ed000 rw-p 7f4f109e9000 00:00 0
7f4f109ed000-7f4f109f5000 r-xp 00000000 08:15 50630788
/lib/librt-2.8.90.so
7f4f109f5000-7f4f10bf4000 ---p 00008000 08:15 50630788
/lib/librt-2.8.90.so
7f4f10bf4000-7f4f10bf5000 r--p 00007000 08:15 50630788
/lib/librt-2.8.90.so
7f4f10bf5000-7f4f10bf6000 rw-p 00008000 08:15 50630788
/lib/librt-2.8.90.so
7f4f10bf6000-7f4f10bf9000 r-xp 00000000 08:15 50331815
/lib/libuuid.so.1.2
7f4f10bf9000-7f4f10df9000 ---p 00003000 08:15 50331815
/lib/libuuid.so.1.2
7f4f10df9000-7f4f10dfa000 r--p 00003000 08:15 50331815
/lib/libuuid.so.1.2
7f4f10dfa000-7f4f10dfb000 rw-p 00004000 08:15 50331815
/lib/libuuid.so.1.2
7f4f10dfb000-7f4f10e1a000 r-xp 00000000 08:15 50712049
/lib/ld-2.8.90.so
7f4f10fcc000-7f4f11011000 rw-p 7f4f10fcc000 00:00 0
7f4f11015000-7f4f11019000 rw-p 7f4f11015000 00:00 0
7f4f11019000-7f4f1101a000 r--p 0001e000 08:15 50712049
/lib/ld-2.8.90.so
7f4f1101a000-7f4f1101b000 rw-p 0001f000 08:15 50712049
/lib/ld-2.8.90.so
7fff19006000-7fff1901b000 rw-p 7ffffffea000 00:00 0
[stack]
7fff191ff000-7fff19200000 r-xp 7fff191ff000 00:00 0
[vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
[vsyscall]
Aborted
and tried again:
root@io:~# xfs_metadump -w /dev/sda xfs_metadump_sda
root@io:~# ll xfs_metadump_sda
-rw-r--r-- 1 root root 588407296 2009-03-11 19:09 xfs_metadump_sda
(where do I upload this to?)
xfs_repair fixed "bad names" of 4 inodes (see attached log file)
another xfs_metadump /dev/sda xfs_metadump_sda_2 (without -w):
-rw-r--r-- 1 root root 588261888 2009-03-11 19:23 xfs_metadump_sda_2
> It'd be nice if ubuntu had debug kernel variants (Fedora does this, I
> dunno about ubuntu) - if you are hitting any kind of memory corruption
> then a kernel with debug checks enabled might catch it sooner.
I haven't seen any - probably have to build my own debug kernel. Oh joy.
Is the metadump of any use?
Tom
[-- Attachment #2: xfs_repair.log --]
[-- Type: text/plain, Size: 5821 bytes --]
Script started on Wed 11 Mar 2009 19:12:31 WST
root@io:~# xfs_repair -v /dev/sda
Phase 1 - find and verify superblock...
- block cache size set to 126344 entries
Phase 2 - using internal log
- zero log...
zero_log: head block 95447 tail block 95447
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
attribute entry 0 in attr block 0, inode 536928896 has bad name (namelen = 0)
problem with attribute contents in inode 536928896
clearing inode 536928896 attributes
correcting nblocks for inode 536928896, was 1 - counted 0
attribute entry 0 in attr block 0, inode 538095920 has bad name (namelen = 0)
problem with attribute contents in inode 538095920
clearing inode 538095920 attributes
correcting nblocks for inode 538095920, was 1 - counted 0
- agno = 2
attribute entry 0 in attr block 0, inode 1075035709 has bad name (namelen = 0)
problem with attribute contents in inode 1075035709
clearing inode 1075035709 attributes
correcting nblocks for inode 1075035709, was 1 - counted 0
- agno = 3
- agno = 4
- agno = 5
attribute entry 0 in attr block 0, inode 2684371715 has bad name (namelen = 0)
problem with attribute contents in inode 2684371715
clearing inode 2684371715 attributes
correcting nblocks for inode 2684371715, was 1 - counted 0
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- agno = 32
- agno = 33
- agno = 34
- agno = 35
- agno = 36
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
bad attribute format 1 in inode 536928896, resetting value
bad attribute format 1 in inode 538095920, resetting value
- agno = 2
- agno = 3
bad attribute format 1 in inode 1075035709, resetting value
- agno = 4
- agno = 5
bad attribute format 1 in inode 2684371715, resetting value
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- agno = 32
- agno = 33
- agno = 34
- agno = 35
- agno = 36
Phase 5 - rebuild AG headers and trees...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- agno = 32
- agno = 33
- agno = 34
- agno = 35
- agno = 36
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- agno = 32
- agno = 33
- agno = 34
- agno = 35
- agno = 36
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
resetting inode 15584 nlinks from 2 to 7
XFS_REPAIR Summary Wed Mar 11 19:14:51 2009
Phase Start End Duration
Phase 1: 03/11 19:14:13 03/11 19:14:13
Phase 2: 03/11 19:14:13 03/11 19:14:17 4 seconds
Phase 3: 03/11 19:14:17 03/11 19:14:46 29 seconds
Phase 4: 03/11 19:14:46 03/11 19:14:48 2 seconds
Phase 5: 03/11 19:14:48 03/11 19:14:48
Phase 6: 03/11 19:14:48 03/11 19:14:50 2 seconds
Phase 7: 03/11 19:14:50 03/11 19:14:50
Total run time: 37 seconds
done
root@io:~# exit
Script done on Wed 11 Mar 2009 19:15:33 WST
[-- Attachment #3: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2009-03-11 10:43 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-02 2:46 Corruption of in-memory data detected Thomas Gutzler
2009-01-02 3:24 ` Eric Sandeen
2009-03-11 2:44 ` Thomas Gutzler
2009-03-11 4:30 ` Eric Sandeen
2009-03-11 10:42 ` Thomas Gutzler [this message]
2009-03-12 2:23 ` Eric Sandeen
2009-03-12 5:06 ` Thomas Gutzler
-- strict thread matches above, loose matches on Subject: below --
2014-07-01 6:44 corruption " Alexandru Cardaniuc
2014-07-01 7:02 ` Dave Chinner
2014-07-01 8:29 ` Alexandru Cardaniuc
2014-07-01 9:38 ` Dave Chinner
2014-07-01 20:13 ` Alexandru Cardaniuc
2014-07-01 21:43 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49B79586.5060801@gmail.com \
--to=thomas.gutzler@gmail.com \
--cc=sandeen@sandeen.net \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.