* kernel BUG at <bad filename>:50307!
@ 2006-08-15 14:27 Martin Braun
2006-08-15 14:31 ` Arjan van de Ven
2006-08-16 0:11 ` Nathan Scott
0 siblings, 2 replies; 10+ messages in thread
From: Martin Braun @ 2006-08-15 14:27 UTC (permalink / raw)
To: linux-kernel
Hello all,
I got this bug (see below) in my logs, the system showed with "top" an
increasing load average of 11 and more but with an cpu-idle of 99% and
no processes used mentionable resources, there were 6 zombies. A
shutdown was not possible most of the samba processes didn't respond to
a kill.
Before the exception the server was -as usual- under heavy load of samba
processes 4-5 clients, with many automated activity (batch-processes
with image processing).
What does this bug mean?
Hardware-Details:
* Device sdc (on an easy-raid system) has an XFS Filesystem.
* uname -a
Linux pers109 2.6.17.8 #1 SMP Mon Aug 7 11:04:08 CEST 2006 i686 i686
i386 GNU/Linux
* lspci
0000:00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub
(rev 01)
0000:00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B
PCI-to-PCI Bridge (rev 01)
0000:00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1)
(rev 02)
0000:00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2)
(rev 02)
0000:00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3)
(rev 02)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42)
0000:00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface
Controller (rev 02)
0000:00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage
Controller (rev 02)
0000:01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
0000:01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge
(rev 04)
0000:01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
0000:01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge
(rev 04)
0000:02:05.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
0000:02:05.1 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
0000:03:03.0 Ethernet controller: Intel Corporation 82544GC Gigabit
Ethernet Controller (LOM) (rev 02)
0000:04:01.0 Ethernet controller: Intel Corporation 82540EM Gigabit
Ethernet Controller (rev 02)
0000:04:02.0 VGA compatible controller: ATI Technologies Inc Rage XL
(rev 27)
===
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 7
cpu MHz : 1595.130
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 3193.91
===
/usr/local/samba/sbin/smbd -V
Version 3.0.20
_________________________
/var/log/messages extract
--------------------------
Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode:
254474718 start_block : 0 start_off : c0a0b0e8a099
0 blkcnt : 90000 extent-state : 0
Aug 15 15:01:02 pers109 kernel: ------------[ cut here ]------------
Aug 15 15:01:02 pers109 kernel: kernel BUG at <bad filename>:50307!
Aug 15 15:01:02 pers109 kernel: invalid opcode: 0000 [#1]
Aug 15 15:01:02 pers109 kernel: SMP
Aug 15 15:01:02 pers109 kernel: CPU: 0
Aug 15 15:01:02 pers109 kernel: EIP: 0060:[<c0257d64>] Not tainted VLI
Aug 15 15:01:02 pers109 kernel: EFLAGS: 00010246 (2.6.17.8 #1)
Aug 15 15:01:02 pers109 kernel: eax: c0479f84 ebx: c0436464 ecx:
c046c9bc edx: 00000282
Aug 15 15:01:02 pers109 kernel: esi: cea51cb0 edi: c0526120 ebp:
00000000 esp: cea51b70
Aug 15 15:01:02 pers109 kernel: ds: 007b es: 007b ss: 0068
Aug 15 15:01:02 pers109 kernel: Process smbd (pid: 18095,
threadinfo=cea50000 task=c212e0b0)
Aug 15 15:01:02 pers109 kernel: Stack: c04452ac c042855c c0526120
00000282 f7204db0 cea51cb0 00000000 e31d5b00
Aug 15 15:01:02 pers109 kernel: c01fe13d 00000000 c0436464
c49083e0 0f2af9de 00000000 00000000 00000000
Aug 15 15:01:02 pers109 kernel: 0e8a0990 000c0a0b 00090000
00000000 00000000 cea51cb0 00000000 00000000
Aug 15 15:01:02 pers109 kernel: Call Trace:
Aug 15 15:01:02 pers109 kernel: <c01fe13d> <c01ff637>
Aug 15 15:01:02 pers109 kernel: <c0115e51> <c0115e51>
Aug 15 15:01:02 pers109 kernel: <c015987b> <c015a791>
Aug 15 15:01:03 pers109 kernel: <c0140a91> <c0254ff3>
Aug 15 15:01:03 pers109 kernel: <c039c7d2> <c017187e>
Aug 15 15:01:03 pers109 kernel: <c0255653> <c0395c89>
Aug 15 15:01:03 pers109 kernel: <c01696c8> <c0288a3a>
Aug 15 15:01:03 pers109 kernel: <c025091f> <c0157383>
Aug 15 15:01:03 pers109 kernel: <c012d613> <c01574a9>
Aug 15 15:01:03 pers109 kernel: <c015774e> <c01027df>
Aug 15 15:01:03 pers109 kernel: Code: c0 c7 44 24 08 20 61 52 c0 c7 04
24 ac 52 44 c0 89 44 24 04 e8 5b 34 ec ff b8 84 9f
47 c0 8b 54 24 0c e8 bc fa 1a 00 85 ed 75 02 <0f> 0b 83 c4 10 5b 5e 5f
5d c3 55 b8 07 00 00 00 57 bf 20 61 52
Aug 15 15:01:03 pers109 kernel: EIP: [<c0257d64>] SS:ESP 0068:cea51b70
thanks in advance,
martin
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: kernel BUG at <bad filename>:50307!
2006-08-15 14:27 kernel BUG at <bad filename>:50307! Martin Braun
@ 2006-08-15 14:31 ` Arjan van de Ven
2006-08-16 0:11 ` Nathan Scott
1 sibling, 0 replies; 10+ messages in thread
From: Arjan van de Ven @ 2006-08-15 14:31 UTC (permalink / raw)
To: mbraun; +Cc: linux-kernel
On Tue, 2006-08-15 at 16:27 +0200, Martin Braun wrote:
> Hello all,
>
> I got this bug (see below) in my logs, the system showed with "top" an
> increasing load average of 11 and more but with an cpu-idle of 99% and
> no processes used mentionable resources, there were 6 zombies. A
> shutdown was not possible most of the samba processes didn't respond to
> a kill.
> Before the exception the server was -as usual- under heavy load of samba
> processes 4-5 clients, with many automated activity (batch-processes
> with image processing).
>
> What does this bug mean?
Hi,
it means you don't have CONFIG_KALLSYMS enabled, so the kernel isn't
able to give a decent debugging output in the oops.. if it's a
repeatable oops turning that option on would be a great help to even
figure out which part of the kernel is involved...
Greetings,
Arjan van de Ven
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: kernel BUG at <bad filename>:50307!
2006-08-15 14:27 kernel BUG at <bad filename>:50307! Martin Braun
2006-08-15 14:31 ` Arjan van de Ven
@ 2006-08-16 0:11 ` Nathan Scott
2006-08-16 9:05 ` Martin Braun
[not found] ` <44EB228F.6020903@uni-hd.de>
1 sibling, 2 replies; 10+ messages in thread
From: Nathan Scott @ 2006-08-16 0:11 UTC (permalink / raw)
To: Martin Braun; +Cc: linux-kernel, xfs
Hi Martin,
On Tue, Aug 15, 2006 at 04:27:22PM +0200, Martin Braun wrote:
> ...
> What does this bug mean?
> ...
> Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode:
> 254474718 start_block : 0 start_off : c0a0b0e8a099
> 0 blkcnt : 90000 extent-state : 0
> Aug 15 15:01:02 pers109 kernel: ------------[ cut here ]------------
> Aug 15 15:01:02 pers109 kernel: kernel BUG at <bad filename>:50307!
It means XFS detected ondisk corruption in inode# 254474718, and
paniced your system (stupidly; a fix for this is around, will be
merged with the next mainline update). For me, a more interesting
question is how that inode got into this state... have you had any
crashes recently (i.e. has the filesystem journal needed to be
replayed recently?) Can you send the output of:
# xfs_db -c 'inode 254474718' -c print /dev/sdc1
You'll need to run xfs_repair on that filesystem to fix this up,
but please send us that output first.
thanks.
--
Nathan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: kernel BUG at <bad filename>:50307!
2006-08-16 0:11 ` Nathan Scott
@ 2006-08-16 9:05 ` Martin Braun
[not found] ` <44EB228F.6020903@uni-hd.de>
1 sibling, 0 replies; 10+ messages in thread
From: Martin Braun @ 2006-08-16 9:05 UTC (permalink / raw)
To: Nathan Scott, linux-kernel
Hi Nathan,
> It means XFS detected ondisk corruption in inode# 254474718, and
> paniced your system (stupidly; a fix for this is around, will be
> merged with the next mainline update). For me, a more interesting
> question is how that inode got into this state... have you had any
> crashes recently (i.e. has the filesystem journal needed to be
> replayed recently?) Can you send the output of:
We had recently problems with our XFS partition caused by the Kernel-Bug
in 2.6.17. I updated xfsprogs-2.8.10 and repaired the partition with
xfs_repair - it found a corrupted dir-inode (254474253)
>
> # xfs_db -c 'inode 254474718' -c print /dev/sdc1
> You'll need to run xfs_repair on that filesystem to fix this up,
> but please send us that output first.
core.magic = 0x494e
core.mode = 0100774
core.version = 1
core.format = 3 (btree)
core.nlinkv1 = 1
core.uid = 1348
core.gid = 104
core.flushiter = 0
core.atime.sec = Tue Aug 15 15:00:58 2006
core.atime.nsec = 934572500
core.mtime.sec = Tue Aug 15 15:01:02 2006
core.mtime.nsec = 261116500
core.ctime.sec = Tue Aug 15 15:01:02 2006
core.ctime.nsec = 261116500
core.size = 10092544
core.nblocks = 197
core.extsize = 0
core.nextents = 182
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.gen = 9
next_unlinked = null
u.bmbt.level = 1
u.bmbt.numrecs = 1
u.bmbt.keys[1] = [startoff] 1:[1]
u.bmbt.ptrs[1] = 1:112941297
thanks.
martin
^ permalink raw reply [flat|nested] 10+ messages in thread
* xfs kernel BUG again in 2.6.17.11
[not found] ` <20060823134211.E2968256@wobbly.melbourne.sgi.com>
@ 2006-11-13 9:28 ` Martin Braun
2006-11-14 4:00 ` David Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Martin Braun @ 2006-11-13 9:28 UTC (permalink / raw)
To: linux-kernel
Hi ,
is it possible that the xfs kernel bug is in the 2.6.17.11 Kernel again?
we got obviously the same bug as with 2.6.17.8:
Nov 13 09:27:01 pers109 kernel: Access to block zero: fs: <sdc1> inode:
637540399 start_block : 0 start_off : 23812530000000 blkcnt : 84
extent-state : 0
Nov 13 09:27:01 pers109 kernel: ------------[ cut here ]------------
Nov 13 09:27:01 pers109 kernel: kernel BUG at <bad filename>:50307!
Nov 13 09:27:01 pers109 kernel: invalid opcode: 0000 [#2]
Nov 13 09:27:01 pers109 kernel: SMP
Nov 13 09:27:01 pers109 kernel: CPU: 1
Nov 13 09:27:01 pers109 kernel: EIP: 0060:[<c0258984>] Not tainted VLI
Nov 13 09:27:01 pers109 kernel: EFLAGS: 00010246 (2.6.17.11 #1)
Nov 13 09:27:01 pers109 kernel: EIP is at cmn_err+0xa0/0xaa
Nov 13 09:27:01 pers109 kernel: eax: c047d144 ebx: c04385a0 ecx:
c046f9bc edx: 00000282
Nov 13 09:27:01 pers109 kernel: esi: c33a3cb0 edi: c055e120 ebp:
00000000 esp: c33a3b70
Nov 13 09:27:01 pers109 kernel: ds: 007b es: 007b ss: 0068
Nov 13 09:27:01 pers109 kernel: Process smbd (pid: 26181,
threadinfo=c33a2000 task=e00bead0)
Nov 13 09:27:01 pers109 kernel: Stack: c0447536 c042a5d5 c055e120
00000282 ec894ae0 c33a3cb0 00000000 e2d85c80
Nov 13 09:27:01 pers109 kernel: c01fed1d 00000000 c04385a0
f69e3a00 2600182f 00000000 00000000 00000000
Nov 13 09:27:01 pers109 kernel: 30000000 00238125 00000084
00000000 00000000 c33a3cb0 00000000 00000000
Nov 13 09:27:01 pers109 kernel: Call Trace:
Nov 13 09:27:01 pers109 kernel: <c01fed1d>
xfs_bmap_search_extents+0xf5/0xf7 <c0200217> xfs_bmapi+0x229/0x162c
Nov 13 09:27:01 pers109 kernel: <c0115eb1>
default_wake_function+0x0/0x12 <c03bb980> ip_output+0x189/0x270
Nov 13 09:27:01 pers109 kernel: <c015a22b> mark_buffer_dirty+0x25/0x29
<c015b131> __block_commit_write+0x7e/0xb4
Nov 13 09:27:01 pers109 kernel: <c0141441> __pagevec_lru_add+0xa2/0xb5
<c0255c13> xfs_zero_eof+0x1ca/0x340
Nov 13 09:27:01 pers109 kernel: <c039d882> memcpy_toiovec+0x37/0x5c
<c0172283> file_update_time+0xa1/0xc0
Nov 13 09:27:01 pers109 kernel: <c0256273> xfs_write+0x4ea/0xda5
<c0396d39> sock_aio_read+0x83/0x8e
Nov 13 09:27:01 pers109 kernel: <c025153f> xfs_file_aio_write+0x8f/0x9a
<c0157d33> do_sync_write+0xd5/0x130
Nov 13 09:27:01 pers109 kernel: <c012d743>
autoremove_wake_function+0x0/0x4b <c0157e59> vfs_write+0xcb/0x195
Nov 13 09:27:01 pers109 kernel: <c01580fe> sys_pwrite64+0x73/0x80
<c01027ef> sysenter_past_esp+0x54/0x75
Nov 13 09:27:01 pers109 kernel: Code: c0 c7 44 24 08 20 e1 55 c0 c7 04
24 36 75 44 c0 89 44 24 04 e8 8b 29 ec ff b8 44 d1 47 c0 8b 54 24 0c e8
bc ff 1a 00 85 ed 75 02 <0f> 0b 83 c4 10 5b 5e 5f 5d c3 55 b8 07 00 00
00 57 bf 20 e1 55
Nov 13 09:27:01 pers109 kernel: EIP: [<c0258984>] cmn_err+0xa0/0xaa
SS:ESP 0068:c33a3b70
I will remove the corresponding block...
thanks,
martin
> On Tue, Aug 22, 2006 at 05:28:15PM +0200, Martin Braun wrote:
>> Hi Nathan,
>>
>> since I haven't repaired the fs we had a crash again (see below).
>>
>> unfortunately we copied at the time of the crash over iscsi some files
>> to an xfs-fs on a nas.
>> and the directory was completely deleted. neither a xfs-check or a
>> xfs_repair did find something. was that due to the combination of iscsi
>> and xfs?
>
> Sorry for not getting back to you earlier, I've been too busy. :(
>
> I think you will need to clear out the affected inode (looks like a
> form of corruption that repair doesn't know about today) - you'll
> need to forcibly remove that inode via xfs_db, something like:
>
> # xfs_db -x -c 'inode 35141650' -c 'write core.mode 0' /dev/sdc1
> # xfs_repair /dev/sdc1
>
> cheers.
>
> ps: Barry, looks like repair needs some work in this area...
>
>> Aug 22 12:48:12 pers109 kernel: Access to block zero: fs: <sdc1> inode:
>> 35141650 start_block : 0 start_off : 3a1531 blkcnt : c
>> extent-state : 0
>> Aug 22 12:48:12 pers109 kernel: ------------[ cut here ]------------
>> Aug 22 12:48:12 pers109 kernel: kernel BUG at <bad filename>:50307!
>> Aug 22 12:48:12 pers109 kernel: invalid opcode: 0000 [#1]
>> Aug 22 12:48:12 pers109 kernel: SMP
>> Aug 22 12:48:12 pers109 kernel: Modules linked in: iscsi_tcp libiscsi
>> scsi_transport_iscsi
>> Aug 22 12:48:12 pers109 kernel: CPU: 0
>> Aug 22 12:48:12 pers109 kernel: EIP: 0060:[<c025cb74>] Not tainted VLI
>> Aug 22 12:48:12 pers109 kernel: EFLAGS: 00010246 (2.6.17.8 #5)
>> Aug 22 12:48:12 pers109 kernel: EIP is at cmn_err+0xa0/0xaa
>> Aug 22 12:48:12 pers109 kernel: eax: c048a2c4 ebx: c04359e4 ecx:
>> c047c9bc edx: 00000282
>> Aug 22 12:48:12 pers109 kernel: esi: e595dcb0 edi: c056a120 ebp:
>> 00000000 esp: e595db70
>> Aug 22 12:48:12 pers109 kernel: ds: 007b es: 007b ss: 0068
>> Aug 22 12:48:12 pers109 kernel: Process smbd (pid: 25510,
>> threadinfo=e595c000 task=d9628a90)
>> Aug 22 12:48:12 pers109 kernel: Stack: c044497a c0427525 c056a120
>> 00000282 f3507260 e595dcb0 00000000 d9f9de00
>> Aug 22 12:48:12 pers109 kernel: c0202f0d 00000000 c04359e4
>> f686cba0 02183812 00000000 00000000 00000000
>> Aug 22 12:48:12 pers109 kernel: 003a1531 00000000 0000000c
>> 00000000 00000000 e595dcb0 00000000 00000000
>> Aug 22 12:48:12 pers109 kernel: Call Trace:
>> Aug 22 12:48:12 pers109 kernel: <c0202f0d>
>> xfs_bmap_search_extents+0xf5/0xf7 <c0204407> xfs_bmapi+0x229/0x162c
>> Aug 22 12:48:12 pers109 kernel: <c039d890> dev_queue_xmit+0x1f4/0x26f
>> <c03b8660> ip_output+0x189/0x270
>> Aug 22 12:48:12 pers109 kernel: <c012018e> __do_softirq+0x6e/0xdc
>> <c0104d7a> do_IRQ+0x1e/0x24
>> Aug 22 12:48:12 pers109 kernel: <c0103222> common_interrupt+0x1a/0x20
>> <c0259e03> xfs_zero_eof+0x1ca/0x340
>> Aug 22 12:48:12 pers109 kernel: <c039a342> memcpy_toiovec+0x37/0x5c
>> <c01762b3> file_update_time+0xa1/0xc0
>> Aug 22 12:48:12 pers109 kernel: <c025a463> xfs_write+0x4ea/0xda5
>> <c0393654> sock_aio_read+0x83/0x8e
>> Aug 22 12:48:12 pers109 kernel: <c016e098> fasync_helper+0x4b/0xd3
>> <c028dc12> copy_to_user+0x3c/0x4a
>> Aug 22 12:48:12 pers109 kernel: <c025572f> xfs_file_aio_write+0x8f/0x9a
>> <c015ba73> do_sync_write+0xd5/0x130
>> Aug 22 12:48:12 pers109 kernel: <c012de03>
>> autoremove_wake_function+0x0/0x4b <c015bb99> vfs_write+0xcb/0x195
>> Aug 22 12:48:12 pers109 kernel: <c015be3e> sys_pwrite64+0x73/0x80
>> <c01027ef> sysenter_past_esp+0x54/0x75
>> Aug 22 12:48:12 pers109 kernel: Code: c0 c7 44 24 08 20 a1 56 c0 c7 04
>> 24 7a 49 44 c0 89 44 24 04 e8 ab eb eb ff b8 c4 a2 48 c
>> 0 8b 54 24 0c e8 fc 95 1a 00 85 ed 75 02 <0f> 0b 83 c4 10 5b 5e 5f 5d c3
>> 55 b8 07 00 00 00 57 bf 20 a1 56
>> Aug 22 12:48:12 pers109 kernel: EIP: [<c025cb74>] cmn_err+0xa0/0xaa
>> SS:ESP 0068:e595db70
>>
>>
>>
>>
>>
>>
>> Scott schrieb:
>>> Hi Martin,
>>>
>>> On Tue, Aug 15, 2006 at 04:27:22PM +0200, Martin Braun wrote:
>>>> ...
>>>> What does this bug mean?
>>>> ...
>>>> Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode:
>>>> 254474718 start_block : 0 start_off : c0a0b0e8a099
>>>> 0 blkcnt : 90000 extent-state : 0
>>>> Aug 15 15:01:02 pers109 kernel: ------------[ cut here ]------------
>>>> Aug 15 15:01:02 pers109 kernel: kernel BUG at <bad filename>:50307!
>>> It means XFS detected ondisk corruption in inode# 254474718, and
>>> paniced your system (stupidly; a fix for this is around, will be
>>> merged with the next mainline update). For me, a more interesting
>>> question is how that inode got into this state... have you had any
>>> crashes recently (i.e. has the filesystem journal needed to be
>>> replayed recently?) Can you send the output of:
>>>
>>> # xfs_db -c 'inode 254474718' -c print /dev/sdc1
>>>
>>> You'll need to run xfs_repair on that filesystem to fix this up,
>>> but please send us that output first.
>>>
>>> thanks.
>>>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xfs kernel BUG again in 2.6.17.11
2006-11-13 9:28 ` xfs kernel BUG again in 2.6.17.11 Martin Braun
@ 2006-11-14 4:00 ` David Chinner
2006-11-14 9:23 ` Martin Braun
0 siblings, 1 reply; 10+ messages in thread
From: David Chinner @ 2006-11-14 4:00 UTC (permalink / raw)
To: Martin Braun; +Cc: linux-kernel, xfs
On Mon, Nov 13, 2006 at 10:28:30AM +0100, Martin Braun wrote:
> Hi ,
>
> is it possible that the xfs kernel bug is in the 2.6.17.11 Kernel again?
> we got obviously the same bug as with 2.6.17.8:
It's likely that XFS is identical in those 2 releases.
BTW, Martin, can you cc XFS bug reports to xfs@oss.sgi.com in future?
> Nov 13 09:27:01 pers109 kernel: Access to block zero: fs: <sdc1> inode:
> 637540399 start_block : 0 start_off : 23812530000000 blkcnt : 84
> extent-state : 0
Looks like you are managing to trigger an inode corruption
of some sort.
Have you managed to repair the filesystem since you first
reported this problem? I don't know the history of the bug
you are seeing othat than what you included, so can you
give us a more complete picture of your hardware and
what sort of workload you are doing that triggers this
problem?
FWIW, are there any I/o errors being reported in dmesg or syslog?
Cheers,
Dave.
> > On Tue, Aug 22, 2006 at 05:28:15PM +0200, Martin Braun wrote:
> >> Hi Nathan,
> >>
> >> since I haven't repaired the fs we had a crash again (see below).
> >>
> >> unfortunately we copied at the time of the crash over iscsi some files
> >> to an xfs-fs on a nas.
> >> and the directory was completely deleted. neither a xfs-check or a
> >> xfs_repair did find something. was that due to the combination of iscsi
> >> and xfs?
> >
> > Sorry for not getting back to you earlier, I've been too busy. :(
> >
> > I think you will need to clear out the affected inode (looks like a
> > form of corruption that repair doesn't know about today) - you'll
> > need to forcibly remove that inode via xfs_db, something like:
> >
> > # xfs_db -x -c 'inode 35141650' -c 'write core.mode 0' /dev/sdc1
> > # xfs_repair /dev/sdc1
> >
> > cheers.
> >
> > ps: Barry, looks like repair needs some work in this area...
> >
> >> Aug 22 12:48:12 pers109 kernel: Access to block zero: fs: <sdc1> inode:
> >> 35141650 start_block : 0 start_off : 3a1531 blkcnt : c
> >> extent-state : 0
> >> Aug 22 12:48:12 pers109 kernel: ------------[ cut here ]------------
> >> Aug 22 12:48:12 pers109 kernel: kernel BUG at <bad filename>:50307!
> >> Aug 22 12:48:12 pers109 kernel: invalid opcode: 0000 [#1]
> >> Aug 22 12:48:12 pers109 kernel: SMP
> >> Aug 22 12:48:12 pers109 kernel: Modules linked in: iscsi_tcp libiscsi
> >> scsi_transport_iscsi
> >> Aug 22 12:48:12 pers109 kernel: CPU: 0
> >> Aug 22 12:48:12 pers109 kernel: EIP: 0060:[<c025cb74>] Not tainted VLI
> >> Aug 22 12:48:12 pers109 kernel: EFLAGS: 00010246 (2.6.17.8 #5)
> >> Aug 22 12:48:12 pers109 kernel: EIP is at cmn_err+0xa0/0xaa
> >> Aug 22 12:48:12 pers109 kernel: eax: c048a2c4 ebx: c04359e4 ecx:
> >> c047c9bc edx: 00000282
> >> Aug 22 12:48:12 pers109 kernel: esi: e595dcb0 edi: c056a120 ebp:
> >> 00000000 esp: e595db70
> >> Aug 22 12:48:12 pers109 kernel: ds: 007b es: 007b ss: 0068
> >> Aug 22 12:48:12 pers109 kernel: Process smbd (pid: 25510,
> >> threadinfo=e595c000 task=d9628a90)
> >> Aug 22 12:48:12 pers109 kernel: Stack: c044497a c0427525 c056a120
> >> 00000282 f3507260 e595dcb0 00000000 d9f9de00
> >> Aug 22 12:48:12 pers109 kernel: c0202f0d 00000000 c04359e4
> >> f686cba0 02183812 00000000 00000000 00000000
> >> Aug 22 12:48:12 pers109 kernel: 003a1531 00000000 0000000c
> >> 00000000 00000000 e595dcb0 00000000 00000000
> >> Aug 22 12:48:12 pers109 kernel: Call Trace:
> >> Aug 22 12:48:12 pers109 kernel: <c0202f0d>
> >> xfs_bmap_search_extents+0xf5/0xf7 <c0204407> xfs_bmapi+0x229/0x162c
> >> Aug 22 12:48:12 pers109 kernel: <c039d890> dev_queue_xmit+0x1f4/0x26f
> >> <c03b8660> ip_output+0x189/0x270
> >> Aug 22 12:48:12 pers109 kernel: <c012018e> __do_softirq+0x6e/0xdc
> >> <c0104d7a> do_IRQ+0x1e/0x24
> >> Aug 22 12:48:12 pers109 kernel: <c0103222> common_interrupt+0x1a/0x20
> >> <c0259e03> xfs_zero_eof+0x1ca/0x340
> >> Aug 22 12:48:12 pers109 kernel: <c039a342> memcpy_toiovec+0x37/0x5c
> >> <c01762b3> file_update_time+0xa1/0xc0
> >> Aug 22 12:48:12 pers109 kernel: <c025a463> xfs_write+0x4ea/0xda5
> >> <c0393654> sock_aio_read+0x83/0x8e
> >> Aug 22 12:48:12 pers109 kernel: <c016e098> fasync_helper+0x4b/0xd3
> >> <c028dc12> copy_to_user+0x3c/0x4a
> >> Aug 22 12:48:12 pers109 kernel: <c025572f> xfs_file_aio_write+0x8f/0x9a
> >> <c015ba73> do_sync_write+0xd5/0x130
> >> Aug 22 12:48:12 pers109 kernel: <c012de03>
> >> autoremove_wake_function+0x0/0x4b <c015bb99> vfs_write+0xcb/0x195
> >> Aug 22 12:48:12 pers109 kernel: <c015be3e> sys_pwrite64+0x73/0x80
> >> <c01027ef> sysenter_past_esp+0x54/0x75
> >> Aug 22 12:48:12 pers109 kernel: Code: c0 c7 44 24 08 20 a1 56 c0 c7 04
> >> 24 7a 49 44 c0 89 44 24 04 e8 ab eb eb ff b8 c4 a2 48 c
> >> 0 8b 54 24 0c e8 fc 95 1a 00 85 ed 75 02 <0f> 0b 83 c4 10 5b 5e 5f 5d c3
> >> 55 b8 07 00 00 00 57 bf 20 a1 56
> >> Aug 22 12:48:12 pers109 kernel: EIP: [<c025cb74>] cmn_err+0xa0/0xaa
> >> SS:ESP 0068:e595db70
> >>
> >>
> >>
> >>
> >>
> >>
> >> Scott schrieb:
> >>> Hi Martin,
> >>>
> >>> On Tue, Aug 15, 2006 at 04:27:22PM +0200, Martin Braun wrote:
> >>>> ...
> >>>> What does this bug mean?
> >>>> ...
> >>>> Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode:
> >>>> 254474718 start_block : 0 start_off : c0a0b0e8a099
> >>>> 0 blkcnt : 90000 extent-state : 0
> >>>> Aug 15 15:01:02 pers109 kernel: ------------[ cut here ]------------
> >>>> Aug 15 15:01:02 pers109 kernel: kernel BUG at <bad filename>:50307!
> >>> It means XFS detected ondisk corruption in inode# 254474718, and
> >>> paniced your system (stupidly; a fix for this is around, will be
> >>> merged with the next mainline update). For me, a more interesting
> >>> question is how that inode got into this state... have you had any
> >>> crashes recently (i.e. has the filesystem journal needed to be
> >>> replayed recently?) Can you send the output of:
> >>>
> >>> # xfs_db -c 'inode 254474718' -c print /dev/sdc1
> >>>
> >>> You'll need to run xfs_repair on that filesystem to fix this up,
> >>> but please send us that output first.
> >>>
> >>> thanks.
> >>>
> >
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xfs kernel BUG again in 2.6.17.11
2006-11-14 4:00 ` David Chinner
@ 2006-11-14 9:23 ` Martin Braun
2006-11-14 10:12 ` Oleg Verych
0 siblings, 1 reply; 10+ messages in thread
From: Martin Braun @ 2006-11-14 9:23 UTC (permalink / raw)
To: David Chinner; +Cc: linux-kernel, xfs
Hi David,
> Have you managed to repair the filesystem since you first
> reported this problem? I don't know the history of the bug
that's something I am not sure about, I have used the newest xfs_repair
tools and it found and repaired some inodes. And for about two months
there weren't any crashes.
> you are seeing othat than what you included, so can you
> give us a more complete picture of your hardware and
> what sort of workload you are doing that triggers this
> problem?
The main workload of this machine is high samba activity with few
clients but many IO tasks (i.e. Photoshop batch processing on many 3-6
MB Images). The XFS Partition is on an easy-RAID 16 P. Other Partitions
are EXT3. There are also 2 iSCSI-Partitions with XFS. For Hardware
Information, see below.
After the crash I did an xfs_repair and it found corrupt directory inode
and moved it to lost+found as " 254474253".
Normally the Kernel freezes/hangs completely, but I found two new
Kernel BUG (see below) in the log-messages (without a freeze), the
corresponding java-program was building an lucene-index from a
mysql-database.
It seems that xfs_repair (2.8.10), did not find all of the errors of the FS.
Is there a way to be sure that the FS is clean?
>
> FWIW, are there any I/o errors being reported in dmesg or syslog?
There weren't any I/o errors.
Nov 13 14:16:28 pers109 kernel: ------------[ cut here ]------------
Nov 13 14:16:28 pers109 kernel: kernel BUG at :29837!
Nov 13 14:16:28 pers109 kernel: invalid opcode: 0000 [#1]
Nov 13 14:16:28 pers109 kernel: SMP
Nov 13 14:16:28 pers109 kernel: CPU: 2
Nov 13 14:16:28 pers109 kernel: EIP: 0060:[<c0171eea>] Not tainted VLI
Nov 13 14:16:28 pers109 kernel: EFLAGS: 00210202 (2.6.17.11 #1)
Nov 13 14:16:28 pers109 kernel: EIP is at generic_delete_inode+0xf1/0xf9
Nov 13 14:16:28 pers109 kernel: eax: c2001e80 ebx: ecadeca0 ecx:
00000003 edx: ecadedd8
Nov 13 14:16:28 pers109 kernel: esi: 00000000 edi: ecadeca0 ebp:
d8699f4c esp: d8699f18
Nov 13 14:16:28 pers109 kernel: ds: 007b es: 007b ss: 0068
Nov 13 14:16:28 pers109 kernel: Process java (pid: 15883,
threadinfo=d8698000 task=d6c78a10)
Nov 13 14:16:28 pers109 kernel: Stack: ecadeca0 00000000 00000000
ecadeca0 d7ce4000 c01720cd ecadeca0 c04738dc
Nov 13 14:16:28 pers109 kernel: 00000000 c01683fc ecadeca0
f1862094 f1862094 c92b5114 c214c0c0 4859aa9a
Nov 13 14:16:28 pers109 kernel: 00000008 d7ce4029 00000010
00000000 00000000 00000000 00000000 c214c0c0
Nov 13 14:16:28 pers109 kernel: Call Trace:
Nov 13 14:16:28 pers109 kernel: <c01720cd> iput+0x5f/0x74 <c01683fc>
do_unlinkat+0xc9/0x107
Nov 13 14:16:28 pers109 kernel: <c015739a> filp_close+0x44/0x6c
<c0168481> sys_unlink+0x17/0x1b
Nov 13 14:16:28 pers109 kernel: <c01027ef> sysenter_past_esp+0x54/0x75
Nov 13 14:16:28 pers109 kernel: Code: f0 ff ff 8d 83 a8 00 00 00 c7 44
24 04 00 00 00 00 c7 44 24 08 00 00 00 00 89 04 24 e8 b1 fb fc ff 8
9 1c 24 e8 aa f1 ff ff eb 89 <0f> 0b 8d 74 26 00 eb c2 56 53 83 ec 0c 8b
5c 24 18 8b 53 04 8b
Nov 13 14:16:28 pers109 kernel: EIP: [<c0171eea>]
generic_delete_inode+0xf1/0xf9 SS:ESP 0068:d8699f18
Nov 13 20:22:28 pers109 kernel: ------------[ cut here ]------------
Nov 13 20:22:28 pers109 kernel: kernel BUG at :29837!
Nov 13 20:22:28 pers109 kernel: invalid opcode: 0000 [#2]
Nov 13 20:22:28 pers109 kernel: SMP
Nov 13 20:22:28 pers109 kernel: CPU: 3
Nov 13 20:22:28 pers109 kernel: EIP: 0060:[<c0171eea>] Not tainted VLI
Nov 13 20:22:28 pers109 kernel: EFLAGS: 00010202 (2.6.17.11 #1)
Nov 13 20:22:28 pers109 kernel: EIP is at generic_delete_inode+0xf1/0xf9
Nov 13 20:22:28 pers109 kernel: eax: c2001f10 ebx: d6c586a0 ecx:
00000003 edx: d6c587d8
Nov 13 20:22:28 pers109 kernel: esi: 00000000 edi: d6c586a0 ebp:
d2cd9f4c esp: d2cd9f18
Nov 13 20:22:28 pers109 kernel: ds: 007b es: 007b ss: 0068
Nov 13 20:22:28 pers109 kernel: Process java (pid: 19824,
threadinfo=d2cd8000 task=d1f575a0)
Nov 13 20:22:28 pers109 kernel: Stack: d6c586a0 00000000 00000000
d6c586a0 d5144000 c01720cd d6c586a0 c04738dc
Nov 13 20:22:28 pers109 kernel: 00000000 c01683fc d6c586a0
dd28e794 dd28e794 f69dd894 c214c0c0 281c233e
Nov 13 20:22:28 pers109 kernel: 00000009 d5144029 00000010
00000000 00000000 00000000 00000000 c214c0c0
Nov 13 20:22:28 pers109 kernel: Call Trace:
Nov 13 20:22:28 pers109 kernel: <c01720cd> iput+0x5f/0x74 <c01683fc>
do_unlinkat+0xc9/0x107
Nov 13 20:22:28 pers109 kernel: <c015739a> filp_close+0x44/0x6c
<c0168481> sys_unlink+0x17/0x1b
Nov 13 20:22:28 pers109 kernel: <c01027ef> sysenter_past_esp+0x54/0x75
Nov 13 20:22:28 pers109 kernel: Code: f0 ff ff 8d 83 a8 00 00 00 c7 44
24 04 00 00 00 00 c7 44 24 08 00 00 00 00 89 04 24 e8 b1 fb fc ff 8
9 1c 24 e8 aa f1 ff ff eb 89 <0f> 0b 8d 74 26 00 eb c2 56 53 83 ec 0c 8b
5c 24 18 8b 53 04 8b
________
Hardware Info:
________
(Output of cpu0 from 4 (virtual, 2 physical cpus)
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 7
cpu MHz : 1595.120
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 3193.91
__________
uname -a
Linux pers109 2.6.17.11 #1 SMP Mon Aug 28 10:45:48 CEST 2006 i686 i686
i386 GNU/Linux
----------------------
cat /etc/SuSE-release
SuSE Linux 9.3 (i586)
VERSION = 9.3
---------------------
lspci
0000:00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub
(rev 01)
0000:00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B
PCI-to-PCI Bridge (rev 01)
0000:00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1)
(rev 02)
0000:00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2)
(rev 02)
0000:00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3)
(rev 02)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42)
0000:00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface
Controller (rev 02)
0000:00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage
Controller (rev 02)
0000:01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
0000:01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge
(rev 04)
0000:01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
0000:01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge
(rev 04)
0000:02:05.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
0000:02:05.1 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
0000:03:03.0 Ethernet controller: Intel Corporation 82544GC Gigabit
Ethernet Controller (LOM) (rev 02)
0000:04:01.0 Ethernet controller: Intel Corporation 82540EM Gigabit
Ethernet Controller (rev 02)
0000:04:02.0 VGA compatible controller: ATI Technologies Inc Rage XL
(rev 27)
cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: IBM Model: DCAS-34330W Rev: S65A
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 04 Lun: 00
Vendor: easyRAID Model: 16P Rev: 0001
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 04 Lun: 01
Vendor: easyRAID Model: 16P Rev: 0001
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 06 Lun: 00
Vendor: easyRAID Model: X16P Rev: 0001
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 06 Lun: 01
Vendor: easyRAID Model: X16P Rev: 0001
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: LITE-ON Model: LTR-48246K Rev: SKS7
Type: CD-ROM ANSI SCSI revision: ffffffff
Host: scsi3 Channel: 00 Id: 00 Lun: 00
Vendor: HITACHI Model: DF600F Rev: 0000
Type: Direct-Access ANSI SCSI revision: 04
Host: scsi3 Channel: 00 Id: 00 Lun: 01
Vendor: HITACHI Model: DF600F Rev: 0000
Type: Direct-Access ANSI SCSI revision: 03
------------
free
total used free shared buffers cached
Mem: 2075168 2022916 52252 0 4480 1848936
-/+ buffers/cache: 169500 1905668
Swap: 1959920 1782356 177564
>
> Cheers,
>
> Dave.
>
>>> On Tue, Aug 22, 2006 at 05:28:15PM +0200, Martin Braun wrote:
>>>> Hi Nathan,
>>>>
>>>> since I haven't repaired the fs we had a crash again (see below).
>>>>
>>>> unfortunately we copied at the time of the crash over iscsi some files
>>>> to an xfs-fs on a nas.
>>>> and the directory was completely deleted. neither a xfs-check or a
>>>> xfs_repair did find something. was that due to the combination of iscsi
>>>> and xfs?
>>> Sorry for not getting back to you earlier, I've been too busy. :(
>>>
>>> I think you will need to clear out the affected inode (looks like a
>>> form of corruption that repair doesn't know about today) - you'll
>>> need to forcibly remove that inode via xfs_db, something like:
>>>
>>> # xfs_db -x -c 'inode 35141650' -c 'write core.mode 0' /dev/sdc1
>>> # xfs_repair /dev/sdc1
>>>
>>> cheers.
>>>
>>> ps: Barry, looks like repair needs some work in this area...
>>>
>>>> Aug 22 12:48:12 pers109 kernel: Access to block zero: fs: <sdc1> inode:
>>>> 35141650 start_block : 0 start_off : 3a1531 blkcnt : c
>>>> extent-state : 0
>>>> Aug 22 12:48:12 pers109 kernel: ------------[ cut here ]------------
>>>> Aug 22 12:48:12 pers109 kernel: kernel BUG at <bad filename>:50307!
>>>> Aug 22 12:48:12 pers109 kernel: invalid opcode: 0000 [#1]
>>>> Aug 22 12:48:12 pers109 kernel: SMP
>>>> Aug 22 12:48:12 pers109 kernel: Modules linked in: iscsi_tcp libiscsi
>>>> scsi_transport_iscsi
>>>> Aug 22 12:48:12 pers109 kernel: CPU: 0
>>>> Aug 22 12:48:12 pers109 kernel: EIP: 0060:[<c025cb74>] Not tainted VLI
>>>> Aug 22 12:48:12 pers109 kernel: EFLAGS: 00010246 (2.6.17.8 #5)
>>>> Aug 22 12:48:12 pers109 kernel: EIP is at cmn_err+0xa0/0xaa
>>>> Aug 22 12:48:12 pers109 kernel: eax: c048a2c4 ebx: c04359e4 ecx:
>>>> c047c9bc edx: 00000282
>>>> Aug 22 12:48:12 pers109 kernel: esi: e595dcb0 edi: c056a120 ebp:
>>>> 00000000 esp: e595db70
>>>> Aug 22 12:48:12 pers109 kernel: ds: 007b es: 007b ss: 0068
>>>> Aug 22 12:48:12 pers109 kernel: Process smbd (pid: 25510,
>>>> threadinfo=e595c000 task=d9628a90)
>>>> Aug 22 12:48:12 pers109 kernel: Stack: c044497a c0427525 c056a120
>>>> 00000282 f3507260 e595dcb0 00000000 d9f9de00
>>>> Aug 22 12:48:12 pers109 kernel: c0202f0d 00000000 c04359e4
>>>> f686cba0 02183812 00000000 00000000 00000000
>>>> Aug 22 12:48:12 pers109 kernel: 003a1531 00000000 0000000c
>>>> 00000000 00000000 e595dcb0 00000000 00000000
>>>> Aug 22 12:48:12 pers109 kernel: Call Trace:
>>>> Aug 22 12:48:12 pers109 kernel: <c0202f0d>
>>>> xfs_bmap_search_extents+0xf5/0xf7 <c0204407> xfs_bmapi+0x229/0x162c
>>>> Aug 22 12:48:12 pers109 kernel: <c039d890> dev_queue_xmit+0x1f4/0x26f
>>>> <c03b8660> ip_output+0x189/0x270
>>>> Aug 22 12:48:12 pers109 kernel: <c012018e> __do_softirq+0x6e/0xdc
>>>> <c0104d7a> do_IRQ+0x1e/0x24
>>>> Aug 22 12:48:12 pers109 kernel: <c0103222> common_interrupt+0x1a/0x20
>>>> <c0259e03> xfs_zero_eof+0x1ca/0x340
>>>> Aug 22 12:48:12 pers109 kernel: <c039a342> memcpy_toiovec+0x37/0x5c
>>>> <c01762b3> file_update_time+0xa1/0xc0
>>>> Aug 22 12:48:12 pers109 kernel: <c025a463> xfs_write+0x4ea/0xda5
>>>> <c0393654> sock_aio_read+0x83/0x8e
>>>> Aug 22 12:48:12 pers109 kernel: <c016e098> fasync_helper+0x4b/0xd3
>>>> <c028dc12> copy_to_user+0x3c/0x4a
>>>> Aug 22 12:48:12 pers109 kernel: <c025572f> xfs_file_aio_write+0x8f/0x9a
>>>> <c015ba73> do_sync_write+0xd5/0x130
>>>> Aug 22 12:48:12 pers109 kernel: <c012de03>
>>>> autoremove_wake_function+0x0/0x4b <c015bb99> vfs_write+0xcb/0x195
>>>> Aug 22 12:48:12 pers109 kernel: <c015be3e> sys_pwrite64+0x73/0x80
>>>> <c01027ef> sysenter_past_esp+0x54/0x75
>>>> Aug 22 12:48:12 pers109 kernel: Code: c0 c7 44 24 08 20 a1 56 c0 c7 04
>>>> 24 7a 49 44 c0 89 44 24 04 e8 ab eb eb ff b8 c4 a2 48 c
>>>> 0 8b 54 24 0c e8 fc 95 1a 00 85 ed 75 02 <0f> 0b 83 c4 10 5b 5e 5f 5d c3
>>>> 55 b8 07 00 00 00 57 bf 20 a1 56
>>>> Aug 22 12:48:12 pers109 kernel: EIP: [<c025cb74>] cmn_err+0xa0/0xaa
>>>> SS:ESP 0068:e595db70
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Scott schrieb:
>>>>> Hi Martin,
>>>>>
>>>>> On Tue, Aug 15, 2006 at 04:27:22PM +0200, Martin Braun wrote:
>>>>>> ...
>>>>>> What does this bug mean?
>>>>>> ...
>>>>>> Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode:
>>>>>> 254474718 start_block : 0 start_off : c0a0b0e8a099
>>>>>> 0 blkcnt : 90000 extent-state : 0
>>>>>> Aug 15 15:01:02 pers109 kernel: ------------[ cut here ]------------
>>>>>> Aug 15 15:01:02 pers109 kernel: kernel BUG at <bad filename>:50307!
>>>>> It means XFS detected ondisk corruption in inode# 254474718, and
>>>>> paniced your system (stupidly; a fix for this is around, will be
>>>>> merged with the next mainline update). For me, a more interesting
>>>>> question is how that inode got into this state... have you had any
>>>>> crashes recently (i.e. has the filesystem journal needed to be
>>>>> replayed recently?) Can you send the output of:
>>>>>
>>>>> # xfs_db -c 'inode 254474718' -c print /dev/sdc1
>>>>>
>>>>> You'll need to run xfs_repair on that filesystem to fix this up,
>>>>> but please send us that output first.
>>>>>
>>>>> thanks.
>>>>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>
--
Universitaetsbibliothek Heidelberg Tel: +49 6221 54-2580
Ploeck 107-109, D-69117 Heidelberg Fax: +49 6221 54-2623
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xfs kernel BUG again in 2.6.17.11
2006-11-14 9:23 ` Martin Braun
@ 2006-11-14 10:12 ` Oleg Verych
2006-11-14 10:31 ` Martin Braun
0 siblings, 1 reply; 10+ messages in thread
From: Oleg Verych @ 2006-11-14 10:12 UTC (permalink / raw)
To: Martin Braun, David Chinner, LKML, xfs
Hallo.
On 2006-11-14, Martin Braun wrote:
> Hi David,
>
>
>> Have you managed to repair the filesystem since you first
>> reported this problem? I don't know the history of the bug
[Well. Just to help (probably) new developers, after Nathan left SGI.]
Here's FAQ node about bug:
http://oss.sgi.com/projects/xfs/faq.html#dir2
You can find fixes in .17 stable git tree.
If it was really just sparse annotations, they were obviously
fixed, i think. If not, meybe there are some new bugs.
> that's something I am not sure about, I have used the newest xfs_repair
> tools and it found and repaired some inodes. And for about two months
> there weren't any crashes.
+
> It seems that xfs_repair (2.8.10), did not find all of the errors of the FS.
> Is there a way to be sure that the FS is clean?
As in faq:
,--
.....
| Update: a fixed xfs_repair is now available; version 2.8.10 or later
| of the xfsprogs package contains the fixed version.
.....
| The xfs_check tool, or xfs_repair -n, should be able to detect any
| directory corruption.
`--
[]
> Normally the Kernel freezes/hangs completely, but I found two new
Do you mean panic or oops here, or just freeze?
____
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xfs kernel BUG again in 2.6.17.11
2006-11-14 10:12 ` Oleg Verych
@ 2006-11-14 10:31 ` Martin Braun
2006-11-14 11:21 ` Oleg Verych
0 siblings, 1 reply; 10+ messages in thread
From: Martin Braun @ 2006-11-14 10:31 UTC (permalink / raw)
To: Oleg Verych; +Cc: David Chinner, LKML, xfs
Hi Oleg,
thanks for your response.
> You can find fixes in .17 stable git tree.
Yes it is a 2.6.17.11 stable kernel. - By the way: we tried to setup
kernel 2.6.18.2 on that machine but we got a weired time error, ntpdate
shows two times: first run correct time, second run time is half an hour
in the future - so we switched back to 2.6.17.11
> If it was really just sparse annotations, they were obviously
> fixed, i think. If not, meybe there are some new bugs.
> +
>> It seems that xfs_repair (2.8.10), did not find all of the errors of the FS.
>> Is there a way to be sure that the FS is clean?
>
> As in faq:
> | Update: a fixed xfs_repair is now available; version 2.8.10 or later
> | of the xfsprogs package contains the fixed version.
> .....
> | The xfs_check tool, or xfs_repair -n, should be able to detect any
> | directory corruption.
However the two Kernel BUGS were _after_ xfs_repair (version 2.8.10).
>> Normally the Kernel freezes/hangs completely, but I found two new
>
> Do you mean panic or oops here, or just freeze?
In detail: a Kernel BUG in /var/log/messages is written and after that
the cpu load average is climbing up to 20-30, any tries to shutdown the
system, kill processes umounts etc. are in vain. Than the system freezes
completely: no keyboard, nothing.
cheers,
martin
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xfs kernel BUG again in 2.6.17.11
2006-11-14 10:31 ` Martin Braun
@ 2006-11-14 11:21 ` Oleg Verych
0 siblings, 0 replies; 10+ messages in thread
From: Oleg Verych @ 2006-11-14 11:21 UTC (permalink / raw)
To: Martin Braun; +Cc: LKML
On Tue, Nov 14, 2006 at 11:31:42AM +0100, Martin Braun wrote:
> Hi Oleg,
>
> thanks for your response.
> > You can find fixes in .17 stable git tree.
> Yes it is a 2.6.17.11 stable kernel. - By the way: we tried to setup
> kernel 2.6.18.2 on that machine but we got a weired time error, ntpdate
2.6.18 have many XFS fixes, that were not backported to 2.6.17
(and will not, i think).
> shows two times: first run correct time, second run time is half an hour
> in the future - so we switched back to 2.6.17.11
(And not mixing too much here, try to search last 3 months for "ntp".
There are people, who can actually help with that. 2.6.19-rc have even
more timekeeping fixes, maybe something will work for you).
____
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2006-11-14 11:14 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-15 14:27 kernel BUG at <bad filename>:50307! Martin Braun
2006-08-15 14:31 ` Arjan van de Ven
2006-08-16 0:11 ` Nathan Scott
2006-08-16 9:05 ` Martin Braun
[not found] ` <44EB228F.6020903@uni-hd.de>
[not found] ` <20060823134211.E2968256@wobbly.melbourne.sgi.com>
2006-11-13 9:28 ` xfs kernel BUG again in 2.6.17.11 Martin Braun
2006-11-14 4:00 ` David Chinner
2006-11-14 9:23 ` Martin Braun
2006-11-14 10:12 ` Oleg Verych
2006-11-14 10:31 ` Martin Braun
2006-11-14 11:21 ` Oleg Verych
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox