* kernel BUG at <bad filename>:50307!
From: Martin Braun @ 2006-08-15 14:27 UTC
To: linux-kernel

Hello all,

I found this bug (see below) in my logs. "top" showed a load average
climbing to 11 and more while the CPU was 99% idle and no process was
using significant resources; there were 6 zombies. A shutdown was not
possible because most of the samba processes didn't respond to a kill.
Before the exception the server was, as usual, under heavy load from
samba processes serving 4-5 clients with a lot of automated activity
(batch image processing).

What does this bug mean?

Hardware-Details:

* Device sdc (on an easy-raid system) has an XFS Filesystem.

* uname -a
Linux pers109 2.6.17.8 #1 SMP Mon Aug 7 11:04:08 CEST 2006 i686 i686 i386 GNU/Linux

* lspci
0000:00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01)
0000:00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B PCI-to-PCI Bridge (rev 01)
0000:00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1) (rev 02)
0000:00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2) (rev 02)
0000:00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3) (rev 02)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42)
0000:00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface Controller (rev 02)
0000:00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage Controller (rev 02)
0000:01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
0000:01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
0000:01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
0000:01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
0000:02:05.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
0000:02:05.1 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
0000:03:03.0 Ethernet controller: Intel Corporation 82544GC Gigabit Ethernet Controller (LOM) (rev 02)
0000:04:01.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
0000:04:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)

===
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 7
cpu MHz : 1595.130
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 3193.91

===
/usr/local/samba/sbin/smbd -V
Version 3.0.20

_________________________
/var/log/messages extract
--------------------------
Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode: 254474718 start_block : 0 start_off : c0a0b0e8a099
0 blkcnt : 90000 extent-state : 0
Aug 15 15:01:02 pers109 kernel: ------------[ cut here ]------------
Aug 15 15:01:02 pers109 kernel: kernel BUG at <bad filename>:50307!
Aug 15 15:01:02 pers109 kernel: invalid opcode: 0000 [#1]
Aug 15 15:01:02 pers109 kernel: SMP
Aug 15 15:01:02 pers109 kernel: CPU: 0
Aug 15 15:01:02 pers109 kernel: EIP: 0060:[<c0257d64>] Not tainted VLI
Aug 15 15:01:02 pers109 kernel: EFLAGS: 00010246 (2.6.17.8 #1)
Aug 15 15:01:02 pers109 kernel: eax: c0479f84 ebx: c0436464 ecx: c046c9bc edx: 00000282
Aug 15 15:01:02 pers109 kernel: esi: cea51cb0 edi: c0526120 ebp: 00000000 esp: cea51b70
Aug 15 15:01:02 pers109 kernel: ds: 007b es: 007b ss: 0068
Aug 15 15:01:02 pers109 kernel: Process smbd (pid: 18095, threadinfo=cea50000 task=c212e0b0)
Aug 15 15:01:02 pers109 kernel: Stack: c04452ac c042855c c0526120 00000282 f7204db0 cea51cb0 00000000 e31d5b00
Aug 15 15:01:02 pers109 kernel: c01fe13d 00000000 c0436464 c49083e0 0f2af9de 00000000 00000000 00000000
Aug 15 15:01:02 pers109 kernel: 0e8a0990 000c0a0b 00090000 00000000 00000000 cea51cb0 00000000 00000000
Aug 15 15:01:02 pers109 kernel: Call Trace:
Aug 15 15:01:02 pers109 kernel: <c01fe13d> <c01ff637>
Aug 15 15:01:02 pers109 kernel: <c0115e51> <c0115e51>
Aug 15 15:01:02 pers109 kernel: <c015987b> <c015a791>
Aug 15 15:01:03 pers109 kernel: <c0140a91> <c0254ff3>
Aug 15 15:01:03 pers109 kernel: <c039c7d2> <c017187e>
Aug 15 15:01:03 pers109 kernel: <c0255653> <c0395c89>
Aug 15 15:01:03 pers109 kernel: <c01696c8> <c0288a3a>
Aug 15 15:01:03 pers109 kernel: <c025091f> <c0157383>
Aug 15 15:01:03 pers109 kernel: <c012d613> <c01574a9>
Aug 15 15:01:03 pers109 kernel: <c015774e> <c01027df>
Aug 15 15:01:03 pers109 kernel: Code: c0 c7 44 24 08 20 61 52 c0 c7 04 24 ac 52 44 c0 89 44 24 04 e8 5b 34 ec ff b8 84 9f 47 c0 8b 54 24 0c e8 bc fa 1a 00 85 ed 75 02 <0f> 0b 83 c4 10 5b 5e 5f 5d c3 55 b8 07 00 00 00 57 bf 20 61 52
Aug 15 15:01:03 pers109 kernel: EIP: [<c0257d64>] SS:ESP 0068:cea51b70

thanks in advance,
martin
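The start_off value in the "Access to block zero" line above is wrapped across two lines (c0a0b0e8a099 followed by a lone 0). One way to cross-check it is against the third row of the stack dump, where the words 0e8a0990 000c0a0b 00090000 appear to carry the same numbers as 32-bit words, low word first. That reading is an inference from the matching digits, not something stated in the thread, and the helper name below is made up for illustration. A minimal Python sketch:

```python
def join_words(low_hex: str, high_hex: str) -> int:
    """Combine two 32-bit hex words (low word first) into one 64-bit value."""
    return (int(high_hex, 16) << 32) | int(low_hex, 16)


# Third stack row of the Aug 15 oops: 0e8a0990 000c0a0b 00090000 ...
# Assumption: the first two words are start_off, the third is blkcnt.
start_off = join_words("0e8a0990", "000c0a0b")
blkcnt = int("00090000", 16)

print(format(start_off, "x"))  # hex digits of the reassembled start_off
print(format(blkcnt, "x"))     # hex digits of blkcnt
```

If the reading is right, the printed digits are the logged start_off with the wrapped trailing 0 reattached, and the logged blkcnt of 90000.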
* Re: kernel BUG at <bad filename>:50307!
From: Arjan van de Ven @ 2006-08-15 14:31 UTC
To: mbraun; +Cc: linux-kernel

On Tue, 2006-08-15 at 16:27 +0200, Martin Braun wrote:
> Hello all,
>
> I got this bug (see below) in my logs, the system showed with "top" an
> increasing load average of 11 and more but with an cpu-idle of 99% and
> no processes used mentionable resources, there were 6 zombies. A
> shutdown was not possible most of the samba processes didn't respond to
> a kill.
> Before the exception the server was -as usual- under heavy load of samba
> processes 4-5 clients, with many automated activity (batch-processes
> with image processing).
>
> What does this bug mean?

Hi,

it means you don't have CONFIG_KALLSYMS enabled, so the kernel isn't
able to give a decent debugging output in the oops. If it's a
repeatable oops, turning that option on would be a great help to even
figure out which part of the kernel is involved.

Greetings,
   Arjan van de Ven

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
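Whether CONFIG_KALLSYMS is set can be checked before the next oops. The following is a minimal Python sketch, assuming the kernel build configuration is available as text; where that text lives (for example a config file under /boot, or /proc/config.gz on kernels built with that feature) is an assumption about the installation and is not stated in this thread. The function name is hypothetical.

```python
def kallsyms_enabled(config_text: str) -> bool:
    """Return True if a kernel .config text has CONFIG_KALLSYMS=y."""
    for line in config_text.splitlines():
        # Disabled options appear as "# CONFIG_FOO is not set",
        # so only an exact "=y" line counts as enabled.
        if line.strip() == "CONFIG_KALLSYMS=y":
            return True
    return False


# Two tiny hand-written config fragments, for illustration only.
without_symbols = "CONFIG_SMP=y\n# CONFIG_KALLSYMS is not set\n"
with_symbols = "CONFIG_SMP=y\nCONFIG_KALLSYMS=y\n"

print(kallsyms_enabled(without_symbols))
print(kallsyms_enabled(with_symbols))
```

With the option enabled, call-trace entries carry symbol names (as in the later oopses in this thread) instead of bare addresses.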
* Re: kernel BUG at <bad filename>:50307!
From: Nathan Scott @ 2006-08-16 0:11 UTC
To: Martin Braun; +Cc: linux-kernel, xfs

Hi Martin,

On Tue, Aug 15, 2006 at 04:27:22PM +0200, Martin Braun wrote:
> ...
> What does this bug mean?
> ...
> Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode:
> 254474718 start_block : 0 start_off : c0a0b0e8a099
> 0 blkcnt : 90000 extent-state : 0
> Aug 15 15:01:02 pers109 kernel: ------------[ cut here ]------------
> Aug 15 15:01:02 pers109 kernel: kernel BUG at <bad filename>:50307!

It means XFS detected on-disk corruption in inode# 254474718, and
panicked your system (stupidly; a fix for this is around and will be
merged with the next mainline update). For me, a more interesting
question is how that inode got into this state... have you had any
crashes recently (i.e. has the filesystem journal needed to be
replayed recently?) Can you send the output of:

 # xfs_db -c 'inode 254474718' -c print /dev/sdc1

You'll need to run xfs_repair on that filesystem to fix this up,
but please send us that output first.

thanks.

--
Nathan
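The inode number and filesystem name that Nathan plugs into xfs_db come straight from the "Access to block zero" log line. A small Python sketch of that extraction follows; the function names are hypothetical, and mapping the logged name sdc1 to /dev/sdc1 is an assumption that happens to match this thread. The command is only assembled as an argument list, not executed.

```python
import re

# Matches the part of the log line that names the filesystem and inode.
BLOCK_ZERO = re.compile(r"Access to block zero: fs: <([^>]+)> inode:\s+(\d+)")


def block_zero_inode(log_line: str):
    """Return (fs_name, inode_number) from an 'Access to block zero' line."""
    match = BLOCK_ZERO.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def xfs_db_print_argv(fs_name: str, inode: int):
    """Build the inspection command from the message above as an argv list."""
    # Assumption: the logged fs name is the device node name under /dev.
    return ["xfs_db", "-c", f"inode {inode}", "-c", "print", f"/dev/{fs_name}"]


line = ("Aug 15 15:01:02 pers109 kernel: Access to block zero: "
        "fs: <sdc1> inode: 254474718 start_block : 0")
fs_name, inode = block_zero_inode(line)
print(xfs_db_print_argv(fs_name, inode))
```

Building the argument list separately keeps the inode number out of hand-typed commands, which matters here because a later step in the thread writes to an inode by number.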
* Re: kernel BUG at <bad filename>:50307!
From: Martin Braun @ 2006-08-16 9:05 UTC
To: Nathan Scott, linux-kernel

Hi Nathan,

> It means XFS detected ondisk corruption in inode# 254474718, and
> paniced your system (stupidly; a fix for this is around, will be
> merged with the next mainline update). For me, a more interesting
> question is how that inode got into this state... have you had any
> crashes recently (i.e. has the filesystem journal needed to be
> replayed recently?) Can you send the output of:

We recently had problems with our XFS partition caused by the kernel
bug in 2.6.17. I updated to xfsprogs-2.8.10 and repaired the partition
with xfs_repair; it found a corrupted directory inode (254474253).

>
> # xfs_db -c 'inode 254474718' -c print /dev/sdc1
> You'll need to run xfs_repair on that filesystem to fix this up,
> but please send us that output first.

core.magic = 0x494e
core.mode = 0100774
core.version = 1
core.format = 3 (btree)
core.nlinkv1 = 1
core.uid = 1348
core.gid = 104
core.flushiter = 0
core.atime.sec = Tue Aug 15 15:00:58 2006
core.atime.nsec = 934572500
core.mtime.sec = Tue Aug 15 15:01:02 2006
core.mtime.nsec = 261116500
core.ctime.sec = Tue Aug 15 15:01:02 2006
core.ctime.nsec = 261116500
core.size = 10092544
core.nblocks = 197
core.extsize = 0
core.nextents = 182
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.gen = 9
next_unlinked = null
u.bmbt.level = 1
u.bmbt.numrecs = 1
u.bmbt.keys[1] = [startoff] 1:[1]
u.bmbt.ptrs[1] = 1:112941297

thanks.
martin
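The xfs_db output above is a flat list of "name = value" pairs, so it is easy to load into a dictionary when several inodes need to be compared. A minimal Python sketch, with a hypothetical function name and only a few of the fields from the output above as sample input:

```python
def parse_xfs_db_print(text: str) -> dict:
    """Parse 'name = value' lines, as printed by xfs_db, into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        name, separator, value = line.partition(" = ")
        if separator:  # skip lines that are not name/value pairs
            fields[name.strip()] = value.strip()
    return fields


sample = """core.magic = 0x494e
core.format = 3 (btree)
core.size = 10092544
core.nblocks = 197
core.nextents = 182
u.bmbt.ptrs[1] = 1:112941297"""

inode = parse_xfs_db_print(sample)
print(inode["core.format"], inode["core.nextents"])
```

Values are kept as strings on purpose, since fields such as core.format and u.bmbt.ptrs[1] are not plain integers.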
[parent not found: <44EB228F.6020903@uni-hd.de>]
[parent not found: <20060823134211.E2968256@wobbly.melbourne.sgi.com>]
* xfs kernel BUG again in 2.6.17.11 [not found] ` <20060823134211.E2968256@wobbly.melbourne.sgi.com> @ 2006-11-13 9:28 ` Martin Braun 2006-11-14 4:00 ` David Chinner 0 siblings, 1 reply; 10+ messages in thread From: Martin Braun @ 2006-11-13 9:28 UTC (permalink / raw) To: linux-kernel Hi , is it possible that the xfs kernel bug is in the 2.6.17.11 Kernel again? we got obviously the same bug as with 2.6.17.8: Nov 13 09:27:01 pers109 kernel: Access to block zero: fs: <sdc1> inode: 637540399 start_block : 0 start_off : 23812530000000 blkcnt : 84 extent-state : 0 Nov 13 09:27:01 pers109 kernel: ------------[ cut here ]------------ Nov 13 09:27:01 pers109 kernel: kernel BUG at <bad filename>:50307! Nov 13 09:27:01 pers109 kernel: invalid opcode: 0000 [#2] Nov 13 09:27:01 pers109 kernel: SMP Nov 13 09:27:01 pers109 kernel: CPU: 1 Nov 13 09:27:01 pers109 kernel: EIP: 0060:[<c0258984>] Not tainted VLI Nov 13 09:27:01 pers109 kernel: EFLAGS: 00010246 (2.6.17.11 #1) Nov 13 09:27:01 pers109 kernel: EIP is at cmn_err+0xa0/0xaa Nov 13 09:27:01 pers109 kernel: eax: c047d144 ebx: c04385a0 ecx: c046f9bc edx: 00000282 Nov 13 09:27:01 pers109 kernel: esi: c33a3cb0 edi: c055e120 ebp: 00000000 esp: c33a3b70 Nov 13 09:27:01 pers109 kernel: ds: 007b es: 007b ss: 0068 Nov 13 09:27:01 pers109 kernel: Process smbd (pid: 26181, threadinfo=c33a2000 task=e00bead0) Nov 13 09:27:01 pers109 kernel: Stack: c0447536 c042a5d5 c055e120 00000282 ec894ae0 c33a3cb0 00000000 e2d85c80 Nov 13 09:27:01 pers109 kernel: c01fed1d 00000000 c04385a0 f69e3a00 2600182f 00000000 00000000 00000000 Nov 13 09:27:01 pers109 kernel: 30000000 00238125 00000084 00000000 00000000 c33a3cb0 00000000 00000000 Nov 13 09:27:01 pers109 kernel: Call Trace: Nov 13 09:27:01 pers109 kernel: <c01fed1d> xfs_bmap_search_extents+0xf5/0xf7 <c0200217> xfs_bmapi+0x229/0x162c Nov 13 09:27:01 pers109 kernel: <c0115eb1> default_wake_function+0x0/0x12 <c03bb980> ip_output+0x189/0x270 Nov 13 09:27:01 pers109 kernel: <c015a22b> 
mark_buffer_dirty+0x25/0x29 <c015b131> __block_commit_write+0x7e/0xb4 Nov 13 09:27:01 pers109 kernel: <c0141441> __pagevec_lru_add+0xa2/0xb5 <c0255c13> xfs_zero_eof+0x1ca/0x340 Nov 13 09:27:01 pers109 kernel: <c039d882> memcpy_toiovec+0x37/0x5c <c0172283> file_update_time+0xa1/0xc0 Nov 13 09:27:01 pers109 kernel: <c0256273> xfs_write+0x4ea/0xda5 <c0396d39> sock_aio_read+0x83/0x8e Nov 13 09:27:01 pers109 kernel: <c025153f> xfs_file_aio_write+0x8f/0x9a <c0157d33> do_sync_write+0xd5/0x130 Nov 13 09:27:01 pers109 kernel: <c012d743> autoremove_wake_function+0x0/0x4b <c0157e59> vfs_write+0xcb/0x195 Nov 13 09:27:01 pers109 kernel: <c01580fe> sys_pwrite64+0x73/0x80 <c01027ef> sysenter_past_esp+0x54/0x75 Nov 13 09:27:01 pers109 kernel: Code: c0 c7 44 24 08 20 e1 55 c0 c7 04 24 36 75 44 c0 89 44 24 04 e8 8b 29 ec ff b8 44 d1 47 c0 8b 54 24 0c e8 bc ff 1a 00 85 ed 75 02 <0f> 0b 83 c4 10 5b 5e 5f 5d c3 55 b8 07 00 00 00 57 bf 20 e1 55 Nov 13 09:27:01 pers109 kernel: EIP: [<c0258984>] cmn_err+0xa0/0xaa SS:ESP 0068:c33a3b70 I will remove the corresponding block... thanks, martin > On Tue, Aug 22, 2006 at 05:28:15PM +0200, Martin Braun wrote: >> Hi Nathan, >> >> since I haven't repaired the fs we had a crash again (see below). >> >> unfortunately we copied at the time of the crash over iscsi some files >> to an xfs-fs on a nas. >> and the directory was completely deleted. neither a xfs-check or a >> xfs_repair did find something. was that due to the combination of iscsi >> and xfs? > > Sorry for not getting back to you earlier, I've been too busy. :( > > I think you will need to clear out the affected inode (looks like a > form of corruption that repair doesn't know about today) - you'll > need to forcibly remove that inode via xfs_db, something like: > > # xfs_db -x -c 'inode 35141650' -c 'write core.mode 0' /dev/sdc1 > # xfs_repair /dev/sdc1 > > cheers. > > ps: Barry, looks like repair needs some work in this area... 
> >> Aug 22 12:48:12 pers109 kernel: Access to block zero: fs: <sdc1> inode: >> 35141650 start_block : 0 start_off : 3a1531 blkcnt : c >> extent-state : 0 >> Aug 22 12:48:12 pers109 kernel: ------------[ cut here ]------------ >> Aug 22 12:48:12 pers109 kernel: kernel BUG at <bad filename>:50307! >> Aug 22 12:48:12 pers109 kernel: invalid opcode: 0000 [#1] >> Aug 22 12:48:12 pers109 kernel: SMP >> Aug 22 12:48:12 pers109 kernel: Modules linked in: iscsi_tcp libiscsi >> scsi_transport_iscsi >> Aug 22 12:48:12 pers109 kernel: CPU: 0 >> Aug 22 12:48:12 pers109 kernel: EIP: 0060:[<c025cb74>] Not tainted VLI >> Aug 22 12:48:12 pers109 kernel: EFLAGS: 00010246 (2.6.17.8 #5) >> Aug 22 12:48:12 pers109 kernel: EIP is at cmn_err+0xa0/0xaa >> Aug 22 12:48:12 pers109 kernel: eax: c048a2c4 ebx: c04359e4 ecx: >> c047c9bc edx: 00000282 >> Aug 22 12:48:12 pers109 kernel: esi: e595dcb0 edi: c056a120 ebp: >> 00000000 esp: e595db70 >> Aug 22 12:48:12 pers109 kernel: ds: 007b es: 007b ss: 0068 >> Aug 22 12:48:12 pers109 kernel: Process smbd (pid: 25510, >> threadinfo=e595c000 task=d9628a90) >> Aug 22 12:48:12 pers109 kernel: Stack: c044497a c0427525 c056a120 >> 00000282 f3507260 e595dcb0 00000000 d9f9de00 >> Aug 22 12:48:12 pers109 kernel: c0202f0d 00000000 c04359e4 >> f686cba0 02183812 00000000 00000000 00000000 >> Aug 22 12:48:12 pers109 kernel: 003a1531 00000000 0000000c >> 00000000 00000000 e595dcb0 00000000 00000000 >> Aug 22 12:48:12 pers109 kernel: Call Trace: >> Aug 22 12:48:12 pers109 kernel: <c0202f0d> >> xfs_bmap_search_extents+0xf5/0xf7 <c0204407> xfs_bmapi+0x229/0x162c >> Aug 22 12:48:12 pers109 kernel: <c039d890> dev_queue_xmit+0x1f4/0x26f >> <c03b8660> ip_output+0x189/0x270 >> Aug 22 12:48:12 pers109 kernel: <c012018e> __do_softirq+0x6e/0xdc >> <c0104d7a> do_IRQ+0x1e/0x24 >> Aug 22 12:48:12 pers109 kernel: <c0103222> common_interrupt+0x1a/0x20 >> <c0259e03> xfs_zero_eof+0x1ca/0x340 >> Aug 22 12:48:12 pers109 kernel: <c039a342> memcpy_toiovec+0x37/0x5c >> <c01762b3> 
file_update_time+0xa1/0xc0 >> Aug 22 12:48:12 pers109 kernel: <c025a463> xfs_write+0x4ea/0xda5 >> <c0393654> sock_aio_read+0x83/0x8e >> Aug 22 12:48:12 pers109 kernel: <c016e098> fasync_helper+0x4b/0xd3 >> <c028dc12> copy_to_user+0x3c/0x4a >> Aug 22 12:48:12 pers109 kernel: <c025572f> xfs_file_aio_write+0x8f/0x9a >> <c015ba73> do_sync_write+0xd5/0x130 >> Aug 22 12:48:12 pers109 kernel: <c012de03> >> autoremove_wake_function+0x0/0x4b <c015bb99> vfs_write+0xcb/0x195 >> Aug 22 12:48:12 pers109 kernel: <c015be3e> sys_pwrite64+0x73/0x80 >> <c01027ef> sysenter_past_esp+0x54/0x75 >> Aug 22 12:48:12 pers109 kernel: Code: c0 c7 44 24 08 20 a1 56 c0 c7 04 >> 24 7a 49 44 c0 89 44 24 04 e8 ab eb eb ff b8 c4 a2 48 c >> 0 8b 54 24 0c e8 fc 95 1a 00 85 ed 75 02 <0f> 0b 83 c4 10 5b 5e 5f 5d c3 >> 55 b8 07 00 00 00 57 bf 20 a1 56 >> Aug 22 12:48:12 pers109 kernel: EIP: [<c025cb74>] cmn_err+0xa0/0xaa >> SS:ESP 0068:e595db70 >> >> >> >> >> >> >> Scott schrieb: >>> Hi Martin, >>> >>> On Tue, Aug 15, 2006 at 04:27:22PM +0200, Martin Braun wrote: >>>> ... >>>> What does this bug mean? >>>> ... >>>> Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode: >>>> 254474718 start_block : 0 start_off : c0a0b0e8a099 >>>> 0 blkcnt : 90000 extent-state : 0 >>>> Aug 15 15:01:02 pers109 kernel: ------------[ cut here ]------------ >>>> Aug 15 15:01:02 pers109 kernel: kernel BUG at <bad filename>:50307! >>> It means XFS detected ondisk corruption in inode# 254474718, and >>> paniced your system (stupidly; a fix for this is around, will be >>> merged with the next mainline update). For me, a more interesting >>> question is how that inode got into this state... have you had any >>> crashes recently (i.e. has the filesystem journal needed to be >>> replayed recently?) Can you send the output of: >>> >>> # xfs_db -c 'inode 254474718' -c print /dev/sdc1 >>> >>> You'll need to run xfs_repair on that filesystem to fix this up, >>> but please send us that output first. >>> >>> thanks. 
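Unlike the first oops, the traces in this message carry symbol names in the form address, symbol, offset into the function, and function size. A short Python sketch for pulling those entries apart follows; the function name is hypothetical and the sample is the first two entries of the Nov 13 call trace.

```python
import re

# One call-trace entry looks like: <c01fed1d> xfs_bmap_search_extents+0xf5/0xf7
ENTRY = re.compile(r"<([0-9a-f]+)>\s+([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_trace(text: str):
    """Return one dict per symbolized call-trace entry found in text."""
    return [
        {
            "addr": int(addr, 16),
            "symbol": symbol,
            "offset": int(offset, 16),  # bytes from the start of the function
            "size": int(size, 16),      # total size of the function in bytes
        }
        for addr, symbol, offset, size in ENTRY.findall(text)
    ]


trace = ("<c01fed1d> xfs_bmap_search_extents+0xf5/0xf7 "
         "<c0200217> xfs_bmapi+0x229/0x162c")

for entry in parse_trace(trace):
    # addr - offset is the address where the function starts.
    print(entry["symbol"], hex(entry["addr"] - entry["offset"]))
```

Comparing symbols rather than raw addresses makes it easier to see that the Aug 22 and Nov 13 oopses go through the same xfs_bmap_search_extents and xfs_bmapi path even though the addresses differ between kernel builds.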
* Re: xfs kernel BUG again in 2.6.17.11
From: David Chinner @ 2006-11-14 4:00 UTC
To: Martin Braun; +Cc: linux-kernel, xfs

On Mon, Nov 13, 2006 at 10:28:30AM +0100, Martin Braun wrote:
> Hi ,
>
> is it possible that the xfs kernel bug is in the 2.6.17.11 Kernel again?
> we got obviously the same bug as with 2.6.17.8:

It's likely that XFS is identical in those 2 releases.

BTW, Martin, can you cc XFS bug reports to xfs@oss.sgi.com in future?

> Nov 13 09:27:01 pers109 kernel: Access to block zero: fs: <sdc1> inode:
> 637540399 start_block : 0 start_off : 23812530000000 blkcnt : 84
> extent-state : 0

Looks like you are managing to trigger an inode corruption of some
sort.

Have you managed to repair the filesystem since you first
reported this problem? I don't know the history of the bug
you are seeing other than what you included, so can you
give us a more complete picture of your hardware and
what sort of workload you are doing that triggers this
problem?

FWIW, are there any I/O errors being reported in dmesg or syslog?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: xfs kernel BUG again in 2.6.17.11
From: Martin Braun @ 2006-11-14 9:23 UTC
To: David Chinner; +Cc: linux-kernel, xfs

Hi David,

> Have you managed to repair the filesystem since you first
> reported this problem? I don't know the history of the bug

That's something I am not sure about. I used the newest xfs_repair
tools, which found and repaired some inodes, and for about two months
there weren't any crashes.

> you are seeing other than what you included, so can you
> give us a more complete picture of your hardware and
> what sort of workload you are doing that triggers this
> problem?

The main workload of this machine is high samba activity with few
clients but many IO tasks (i.e. Photoshop batch processing on many
3-6 MB images). The XFS partition is on an easy-RAID 16 P. The other
partitions are EXT3. There are also 2 iSCSI partitions with XFS. For
hardware information, see below.

After the crash I ran xfs_repair; it found a corrupt directory inode
and moved it to lost+found as "254474253".

Normally the kernel freezes/hangs completely, but I found two new
kernel BUGs (see below) in the log messages (without a freeze); the
corresponding java program was building a Lucene index from a MySQL
database.

It seems that xfs_repair (2.8.10) did not find all of the errors in
the FS. Is there a way to be sure that the FS is clean?

> FWIW, are there any I/O errors being reported in dmesg or syslog?

There weren't any I/O errors.

Nov 13 14:16:28 pers109 kernel: ------------[ cut here ]------------
Nov 13 14:16:28 pers109 kernel: kernel BUG at :29837!
Nov 13 14:16:28 pers109 kernel: invalid opcode: 0000 [#1] Nov 13 14:16:28 pers109 kernel: SMP Nov 13 14:16:28 pers109 kernel: CPU: 2 Nov 13 14:16:28 pers109 kernel: EIP: 0060:[<c0171eea>] Not tainted VLI Nov 13 14:16:28 pers109 kernel: EFLAGS: 00210202 (2.6.17.11 #1) Nov 13 14:16:28 pers109 kernel: EIP is at generic_delete_inode+0xf1/0xf9 Nov 13 14:16:28 pers109 kernel: eax: c2001e80 ebx: ecadeca0 ecx: 00000003 edx: ecadedd8 Nov 13 14:16:28 pers109 kernel: esi: 00000000 edi: ecadeca0 ebp: d8699f4c esp: d8699f18 Nov 13 14:16:28 pers109 kernel: ds: 007b es: 007b ss: 0068 Nov 13 14:16:28 pers109 kernel: Process java (pid: 15883, threadinfo=d8698000 task=d6c78a10) Nov 13 14:16:28 pers109 kernel: Stack: ecadeca0 00000000 00000000 ecadeca0 d7ce4000 c01720cd ecadeca0 c04738dc Nov 13 14:16:28 pers109 kernel: 00000000 c01683fc ecadeca0 f1862094 f1862094 c92b5114 c214c0c0 4859aa9a Nov 13 14:16:28 pers109 kernel: 00000008 d7ce4029 00000010 00000000 00000000 00000000 00000000 c214c0c0 Nov 13 14:16:28 pers109 kernel: Call Trace: Nov 13 14:16:28 pers109 kernel: <c01720cd> iput+0x5f/0x74 <c01683fc> do_unlinkat+0xc9/0x107 Nov 13 14:16:28 pers109 kernel: <c015739a> filp_close+0x44/0x6c <c0168481> sys_unlink+0x17/0x1b Nov 13 14:16:28 pers109 kernel: <c01027ef> sysenter_past_esp+0x54/0x75 Nov 13 14:16:28 pers109 kernel: Code: f0 ff ff 8d 83 a8 00 00 00 c7 44 24 04 00 00 00 00 c7 44 24 08 00 00 00 00 89 04 24 e8 b1 fb fc ff 8 9 1c 24 e8 aa f1 ff ff eb 89 <0f> 0b 8d 74 26 00 eb c2 56 53 83 ec 0c 8b 5c 24 18 8b 53 04 8b Nov 13 14:16:28 pers109 kernel: EIP: [<c0171eea>] generic_delete_inode+0xf1/0xf9 SS:ESP 0068:d8699f18 Nov 13 20:22:28 pers109 kernel: ------------[ cut here ]------------ Nov 13 20:22:28 pers109 kernel: kernel BUG at :29837! 
Nov 13 20:22:28 pers109 kernel: invalid opcode: 0000 [#2]
Nov 13 20:22:28 pers109 kernel: SMP
Nov 13 20:22:28 pers109 kernel: CPU: 3
Nov 13 20:22:28 pers109 kernel: EIP: 0060:[<c0171eea>] Not tainted VLI
Nov 13 20:22:28 pers109 kernel: EFLAGS: 00010202 (2.6.17.11 #1)
Nov 13 20:22:28 pers109 kernel: EIP is at generic_delete_inode+0xf1/0xf9
Nov 13 20:22:28 pers109 kernel: eax: c2001f10 ebx: d6c586a0 ecx: 00000003 edx: d6c587d8
Nov 13 20:22:28 pers109 kernel: esi: 00000000 edi: d6c586a0 ebp: d2cd9f4c esp: d2cd9f18
Nov 13 20:22:28 pers109 kernel: ds: 007b es: 007b ss: 0068
Nov 13 20:22:28 pers109 kernel: Process java (pid: 19824, threadinfo=d2cd8000 task=d1f575a0)
Nov 13 20:22:28 pers109 kernel: Stack: d6c586a0 00000000 00000000 d6c586a0 d5144000 c01720cd d6c586a0 c04738dc
Nov 13 20:22:28 pers109 kernel: 00000000 c01683fc d6c586a0 dd28e794 dd28e794 f69dd894 c214c0c0 281c233e
Nov 13 20:22:28 pers109 kernel: 00000009 d5144029 00000010 00000000 00000000 00000000 00000000 c214c0c0
Nov 13 20:22:28 pers109 kernel: Call Trace:
Nov 13 20:22:28 pers109 kernel: <c01720cd> iput+0x5f/0x74 <c01683fc> do_unlinkat+0xc9/0x107
Nov 13 20:22:28 pers109 kernel: <c015739a> filp_close+0x44/0x6c <c0168481> sys_unlink+0x17/0x1b
Nov 13 20:22:28 pers109 kernel: <c01027ef> sysenter_past_esp+0x54/0x75
Nov 13 20:22:28 pers109 kernel: Code: f0 ff ff 8d 83 a8 00 00 00 c7 44 24 04 00 00 00 00 c7 44 24 08 00 00 00 00 89 04 24 e8 b1 fb fc ff 89 1c 24 e8 aa f1 ff ff eb 89 <0f> 0b 8d 74 26 00 eb c2 56 53 83 ec 0c 8b 5c 24 18 8b 53 04 8b

________
Hardware Info:
________
(Output of cpu0 from 4 (virtual, 2 physical cpus)

cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 7
cpu MHz : 1595.120
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 3193.91
__________
uname -a
Linux pers109 2.6.17.11 #1 SMP Mon Aug 28 10:45:48 CEST 2006 i686 i686 i386 GNU/Linux
----------------------
cat /etc/SuSE-release
SuSE Linux 9.3 (i586)
VERSION = 9.3
---------------------
lspci
0000:00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01)
0000:00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B PCI-to-PCI Bridge (rev 01)
0000:00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1) (rev 02)
0000:00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2) (rev 02)
0000:00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3) (rev 02)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42)
0000:00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface Controller (rev 02)
0000:00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage Controller (rev 02)
0000:01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
0000:01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
0000:01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
0000:01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
0000:02:05.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
0000:02:05.1 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
0000:03:03.0 Ethernet controller: Intel Corporation 82544GC Gigabit Ethernet Controller (LOM) (rev 02)
0000:04:01.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
0000:04:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)

cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM Model: DCAS-34330W Rev: S65A
  Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 04 Lun: 00
  Vendor: easyRAID Model: 16P Rev: 0001
  Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 04 Lun: 01
  Vendor: easyRAID Model: 16P Rev: 0001
  Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 06 Lun: 00
  Vendor: easyRAID Model: X16P Rev: 0001
  Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 06 Lun: 01
  Vendor: easyRAID Model: X16P Rev: 0001
  Type: Direct-Access ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: LITE-ON Model: LTR-48246K Rev: SKS7
  Type: CD-ROM ANSI SCSI revision: ffffffff
Host: scsi3 Channel: 00 Id: 00 Lun: 00
  Vendor: HITACHI Model: DF600F Rev: 0000
  Type: Direct-Access ANSI SCSI revision: 04
Host: scsi3 Channel: 00 Id: 00 Lun: 01
  Vendor: HITACHI Model: DF600F Rev: 0000
  Type: Direct-Access ANSI SCSI revision: 03
------------
free
             total       used       free     shared    buffers     cached
Mem:       2075168    2022916      52252          0       4480    1848936
-/+ buffers/cache:     169500    1905668
Swap:      1959920    1782356     177564

>
> Cheers,
>
> Dave.
>
>>> On Tue, Aug 22, 2006 at 05:28:15PM +0200, Martin Braun wrote:
>>>> Hi Nathan,
>>>>
>>>> since I haven't repaired the fs we had a crash again (see below).
>>>>
>>>> unfortunately we copied at the time of the crash over iscsi some files
>>>> to an xfs-fs on a nas.
>>>> and the directory was completely deleted. neither a xfs-check or a
>>>> xfs_repair did find something. was that due to the combination of iscsi
>>>> and xfs?
>>> Sorry for not getting back to you earlier, I've been too busy. :(
>>>
>>> I think you will need to clear out the affected inode (looks like a
>>> form of corruption that repair doesn't know about today) - you'll
>>> need to forcibly remove that inode via xfs_db, something like:
>>>
>>> # xfs_db -x -c 'inode 35141650' -c 'write core.mode 0' /dev/sdc1
>>> # xfs_repair /dev/sdc1
>>>
>>> cheers.
>>>
>>> ps: Barry, looks like repair needs some work in this area...
>>>
>>>> Aug 22 12:48:12 pers109 kernel: Access to block zero: fs: <sdc1> inode:
>>>> 35141650 start_block : 0 start_off : 3a1531 blkcnt : c
>>>> extent-state : 0
>>>> Aug 22 12:48:12 pers109 kernel: ------------[ cut here ]------------
>>>> Aug 22 12:48:12 pers109 kernel: kernel BUG at <bad filename>:50307!
>>>> Aug 22 12:48:12 pers109 kernel: invalid opcode: 0000 [#1]
>>>> Aug 22 12:48:12 pers109 kernel: SMP
>>>> Aug 22 12:48:12 pers109 kernel: Modules linked in: iscsi_tcp libiscsi
>>>> scsi_transport_iscsi
>>>> Aug 22 12:48:12 pers109 kernel: CPU: 0
>>>> Aug 22 12:48:12 pers109 kernel: EIP: 0060:[<c025cb74>] Not tainted VLI
>>>> Aug 22 12:48:12 pers109 kernel: EFLAGS: 00010246 (2.6.17.8 #5)
>>>> Aug 22 12:48:12 pers109 kernel: EIP is at cmn_err+0xa0/0xaa
>>>> Aug 22 12:48:12 pers109 kernel: eax: c048a2c4 ebx: c04359e4 ecx:
>>>> c047c9bc edx: 00000282
>>>> Aug 22 12:48:12 pers109 kernel: esi: e595dcb0 edi: c056a120 ebp:
>>>> 00000000 esp: e595db70
>>>> Aug 22 12:48:12 pers109 kernel: ds: 007b es: 007b ss: 0068
>>>> Aug 22 12:48:12 pers109 kernel: Process smbd (pid: 25510,
>>>> threadinfo=e595c000 task=d9628a90)
>>>> Aug 22 12:48:12 pers109 kernel: Stack: c044497a c0427525 c056a120
>>>> 00000282 f3507260 e595dcb0 00000000 d9f9de00
>>>> Aug 22 12:48:12 pers109 kernel: c0202f0d 00000000 c04359e4
>>>> f686cba0 02183812 00000000 00000000 00000000
>>>> Aug 22 12:48:12 pers109 kernel: 003a1531 00000000 0000000c
>>>> 00000000 00000000 e595dcb0 00000000 00000000
>>>> Aug 22 12:48:12 pers109 kernel: Call Trace:
>>>> Aug 22 12:48:12 pers109 kernel: <c0202f0d>
>>>> xfs_bmap_search_extents+0xf5/0xf7 <c0204407> xfs_bmapi+0x229/0x162c
>>>> Aug 22 12:48:12 pers109 kernel: <c039d890> dev_queue_xmit+0x1f4/0x26f
>>>> <c03b8660> ip_output+0x189/0x270
>>>> Aug 22 12:48:12 pers109 kernel: <c012018e> __do_softirq+0x6e/0xdc
>>>> <c0104d7a> do_IRQ+0x1e/0x24
>>>> Aug 22 12:48:12 pers109 kernel: <c0103222> common_interrupt+0x1a/0x20
>>>> <c0259e03> xfs_zero_eof+0x1ca/0x340
>>>> Aug 22 12:48:12 pers109 kernel: <c039a342> memcpy_toiovec+0x37/0x5c
>>>> <c01762b3> file_update_time+0xa1/0xc0
>>>> Aug 22 12:48:12 pers109 kernel: <c025a463> xfs_write+0x4ea/0xda5
>>>> <c0393654> sock_aio_read+0x83/0x8e
>>>> Aug 22 12:48:12 pers109 kernel: <c016e098> fasync_helper+0x4b/0xd3
>>>> <c028dc12> copy_to_user+0x3c/0x4a
>>>> Aug 22 12:48:12 pers109 kernel: <c025572f> xfs_file_aio_write+0x8f/0x9a
>>>> <c015ba73> do_sync_write+0xd5/0x130
>>>> Aug 22 12:48:12 pers109 kernel: <c012de03>
>>>> autoremove_wake_function+0x0/0x4b <c015bb99> vfs_write+0xcb/0x195
>>>> Aug 22 12:48:12 pers109 kernel: <c015be3e> sys_pwrite64+0x73/0x80
>>>> <c01027ef> sysenter_past_esp+0x54/0x75
>>>> Aug 22 12:48:12 pers109 kernel: Code: c0 c7 44 24 08 20 a1 56 c0 c7 04
>>>> 24 7a 49 44 c0 89 44 24 04 e8 ab eb eb ff b8 c4 a2 48 c
>>>> 0 8b 54 24 0c e8 fc 95 1a 00 85 ed 75 02 <0f> 0b 83 c4 10 5b 5e 5f 5d c3
>>>> 55 b8 07 00 00 00 57 bf 20 a1 56
>>>> Aug 22 12:48:12 pers109 kernel: EIP: [<c025cb74>] cmn_err+0xa0/0xaa
>>>> SS:ESP 0068:e595db70
>>>>
>>>>
>>>> Scott wrote:
>>>>> Hi Martin,
>>>>>
>>>>> On Tue, Aug 15, 2006 at 04:27:22PM +0200, Martin Braun wrote:
>>>>>> ...
>>>>>> What does this bug mean?
>>>>>> ...
>>>>>> Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode:
>>>>>> 254474718 start_block : 0 start_off : c0a0b0e8a099
>>>>>> 0 blkcnt : 90000 extent-state : 0
>>>>>> Aug 15 15:01:02 pers109 kernel: ------------[ cut here ]------------
>>>>>> Aug 15 15:01:02 pers109 kernel: kernel BUG at <bad filename>:50307!
>>>>> It means XFS detected on-disk corruption in inode# 254474718, and
>>>>> panicked your system (stupidly; a fix for this is around, will be
>>>>> merged with the next mainline update). For me, a more interesting
>>>>> question is how that inode got into this state... have you had any
>>>>> crashes recently (i.e. has the filesystem journal needed to be
>>>>> replayed recently?) Can you send the output of:
>>>>>
>>>>> # xfs_db -c 'inode 254474718' -c print /dev/sdc1
>>>>>
>>>>> You'll need to run xfs_repair on that filesystem to fix this up,
>>>>> but please send us that output first.
>>>>>
>>>>> thanks.
>>>>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>
--
Universitaetsbibliothek Heidelberg Tel: +49 6221 54-2580
Ploeck 107-109, D-69117 Heidelberg Fax: +49 6221 54-2623
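
The "Access to block zero" message names both the device and the affected inode, and the read-only xfs_db command quoted above is built from exactly those two values. A minimal sketch of doing that mechanically follows. The helper names are hypothetical, the regular expression is only an assumption based on the log lines shown in this thread, and it expects the unwrapped syslog line rather than the mail-wrapped quotes. It only prints the command; it does not run xfs_db.

```python
import re
import shlex

# Assumption: the message layout matches the log lines quoted in this thread.
BLOCK_ZERO_RE = re.compile(
    r"Access to block zero:\s*fs:\s*<(?P<dev>[^>]+)>\s*inode:\s*(?P<inode>\d+)"
)


def parse_block_zero(line):
    """Return (device_name, inode_number), or None if the line does not match."""
    m = BLOCK_ZERO_RE.search(line)
    if m is None:
        return None
    return m.group("dev"), int(m.group("inode"))


def xfs_db_print_command(device_name, inode):
    """Build the read-only 'inode N' / 'print' xfs_db invocation from the thread."""
    # Assumption: the device node is /dev/<name>, as with /dev/sdc1 in the thread.
    args = ["xfs_db", "-c", f"inode {inode}", "-c", "print", f"/dev/{device_name}"]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    sample = (
        "Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> "
        "inode: 254474718 start_block : 0 start_off : c0a0b0e8a099 0 "
        "blkcnt : 90000 extent-state : 0"
    )
    parsed = parse_block_zero(sample)
    if parsed is not None:
        print(xfs_db_print_command(*parsed))
```

The sketch deliberately stops at the inspection command; the `-x` / `write core.mode 0` variant modifies the filesystem and is better typed by hand.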
* Re: xfs kernel BUG again in 2.6.17.11
  2006-11-14 9:23 ` Martin Braun
@ 2006-11-14 10:12 ` Oleg Verych
  2006-11-14 10:31 ` Martin Braun
  0 siblings, 1 reply; 10+ messages in thread
From: Oleg Verych @ 2006-11-14 10:12 UTC
To: Martin Braun, David Chinner, LKML, xfs

Hallo.

On 2006-11-14, Martin Braun wrote:
> Hi David,
>
>
>> Have you managed to repair the filesystem since you first
>> reported this problem?

I don't know the history of the bug.
[Well. Just to help (probably) new developers, after Nathan left SGI.]

Here's the FAQ entry about the bug:
http://oss.sgi.com/projects/xfs/faq.html#dir2

You can find fixes in the .17 stable git tree.
If it was really just sparse annotations, they were obviously
fixed, I think. If not, maybe there are some new bugs.

> that's something I am not sure about, I have used the newest xfs_repair
> tools and it found and repaired some inodes. And for about two months
> there weren't any crashes.
+
> It seems that xfs_repair (2.8.10), did not find all of the errors of the FS.
> Is there a way to be sure that the FS is clean?

As in the FAQ:
,--
.....
| Update: a fixed xfs_repair is now available; version 2.8.10 or later
| of the xfsprogs package contains the fixed version.
.....
| The xfs_check tool, or xfs_repair -n, should be able to detect any
| directory corruption.
`--
[]
> Normally the Kernel freezes/hangs completely, but I found two new

Do you mean panic or oops here, or just freeze?
____
* Re: xfs kernel BUG again in 2.6.17.11
  2006-11-14 10:12 ` Oleg Verych
@ 2006-11-14 10:31 ` Martin Braun
  2006-11-14 11:21 ` Oleg Verych
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Braun @ 2006-11-14 10:31 UTC
To: Oleg Verych; +Cc: David Chinner, LKML, xfs

Hi Oleg,

thanks for your response.

> You can find fixes in .17 stable git tree.

Yes, it is a 2.6.17.11 stable kernel. - By the way: we tried to set up
kernel 2.6.18.2 on that machine but we got a weird time error: ntpdate
shows two times: first run correct time, second run time is half an hour
in the future - so we switched back to 2.6.17.11

> If it was really just sparse annotations, they were obviously
> fixed, I think. If not, maybe there are some new bugs.
> +
>> It seems that xfs_repair (2.8.10), did not find all of the errors of the FS.
>> Is there a way to be sure that the FS is clean?
>
> As in faq:
> | Update: a fixed xfs_repair is now available; version 2.8.10 or later
> | of the xfsprogs package contains the fixed version.
> .....
> | The xfs_check tool, or xfs_repair -n, should be able to detect any
> | directory corruption.

However, the two Kernel BUGs were _after_ xfs_repair (version 2.8.10).

>> Normally the Kernel freezes/hangs completely, but I found two new
>
> Do you mean panic or oops here, or just freeze?

In detail: a Kernel BUG is written to /var/log/messages and after that
the cpu load average climbs to 20-30; any attempts to shut down the
system, kill processes, umount etc. are in vain. Then the system freezes
completely: no keyboard, nothing.

cheers,
martin
* Re: xfs kernel BUG again in 2.6.17.11
  2006-11-14 10:31 ` Martin Braun
@ 2006-11-14 11:21 ` Oleg Verych
  0 siblings, 0 replies; 10+ messages in thread
From: Oleg Verych @ 2006-11-14 11:21 UTC
To: Martin Braun; +Cc: LKML

On Tue, Nov 14, 2006 at 11:31:42AM +0100, Martin Braun wrote:
> Hi Oleg,
>
> thanks for your response.
> > You can find fixes in .17 stable git tree.
> Yes it is a 2.6.17.11 stable kernel. - By the way: we tried to set up
> kernel 2.6.18.2 on that machine but we got a weird time error, ntpdate

2.6.18 has many XFS fixes that were not backported to 2.6.17 (and will
not be, I think).

> shows two times: first run correct time, second run time is half an hour
> in the future - so we switched back to 2.6.17.11

(Without mixing topics too much here: try searching the last 3 months of
the list for "ntp". There are people who can actually help with that.
2.6.19-rc has even more timekeeping fixes; maybe something will work for
you.)
____
end of thread, other threads:[~2006-11-14 11:14 UTC | newest]
Thread overview: 10+ messages
2006-08-15 14:27 kernel BUG at <bad filename>:50307! Martin Braun
2006-08-15 14:31 ` Arjan van de Ven
2006-08-16 0:11 ` Nathan Scott
2006-08-16 9:05 ` Martin Braun
[not found] ` <44EB228F.6020903@uni-hd.de>
[not found] ` <20060823134211.E2968256@wobbly.melbourne.sgi.com>
2006-11-13 9:28 ` xfs kernel BUG again in 2.6.17.11 Martin Braun
2006-11-14 4:00 ` David Chinner
2006-11-14 9:23 ` Martin Braun
2006-11-14 10:12 ` Oleg Verych
2006-11-14 10:31 ` Martin Braun
2006-11-14 11:21 ` Oleg Verych