kernel BUG at <bad filename>:50307!

public inbox for linux-kernel@vger.kernel.org

* kernel BUG at <bad filename>:50307!
@ 2006-08-15 14:27 Martin Braun
  2006-08-15 14:31 ` Arjan van de Ven
  2006-08-16  0:11 ` Nathan Scott
  0 siblings, 2 replies; 10+ messages in thread
From: Martin Braun @ 2006-08-15 14:27 UTC
  To: linux-kernel

Hello all,

I got this bug (see below) in my logs. "top" showed a load average
rising to 11 and more, yet the CPU was 99% idle and no process used
noteworthy resources; there were 6 zombies. A shutdown was not possible
because most of the samba processes didn't respond to a kill.
Before the exception the server was, as usual, under heavy load from
samba processes serving 4-5 clients with a lot of automated activity
(batch processes doing image processing).

What does this bug mean?

Hardware-Details:
* Device sdc (on an easy-raid system) has an XFS Filesystem.
* uname -a
Linux pers109 2.6.17.8 #1 SMP Mon Aug 7 11:04:08 CEST 2006 i686 i686
i386 GNU/Linux
* lspci
0000:00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub
(rev 01)
0000:00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B
PCI-to-PCI Bridge (rev 01)
0000:00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1)
(rev 02)
0000:00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2)
(rev 02)
0000:00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3)
(rev 02)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42)
0000:00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface
Controller (rev 02)
0000:00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage
Controller (rev 02)
0000:01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
0000:01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge
(rev 04)
0000:01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
0000:01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge
(rev 04)
0000:02:05.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
0000:02:05.1 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
0000:03:03.0 Ethernet controller: Intel Corporation 82544GC Gigabit
Ethernet Controller (LOM) (rev 02)
0000:04:01.0 Ethernet controller: Intel Corporation 82540EM Gigabit
Ethernet Controller (rev 02)
0000:04:02.0 VGA compatible controller: ATI Technologies Inc Rage XL
(rev 27)
===

cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.80GHz
stepping        : 7
cpu MHz         : 1595.130
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips        : 3193.91
===


/usr/local/samba/sbin/smbd -V
Version 3.0.20

_________________________
/var/log/messages extract
--------------------------


Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode:
254474718 start_block : 0 start_off : c0a0b0e8a0990
blkcnt : 90000 extent-state : 0
Aug 15 15:01:02 pers109 kernel: ------------[ cut here ]------------
Aug 15 15:01:02 pers109 kernel: kernel BUG at <bad filename>:50307!
Aug 15 15:01:02 pers109 kernel: invalid opcode: 0000 [#1]
Aug 15 15:01:02 pers109 kernel: SMP
Aug 15 15:01:02 pers109 kernel: CPU:    0
Aug 15 15:01:02 pers109 kernel: EIP:    0060:[<c0257d64>]    Not tainted VLI
Aug 15 15:01:02 pers109 kernel: EFLAGS: 00010246   (2.6.17.8 #1)
Aug 15 15:01:02 pers109 kernel: eax: c0479f84   ebx: c0436464   ecx:
c046c9bc   edx: 00000282
Aug 15 15:01:02 pers109 kernel: esi: cea51cb0   edi: c0526120   ebp:
00000000   esp: cea51b70
Aug 15 15:01:02 pers109 kernel: ds: 007b   es: 007b   ss: 0068
Aug 15 15:01:02 pers109 kernel: Process smbd (pid: 18095,
threadinfo=cea50000 task=c212e0b0)
Aug 15 15:01:02 pers109 kernel: Stack: c04452ac c042855c c0526120
00000282 f7204db0 cea51cb0 00000000 e31d5b00
Aug 15 15:01:02 pers109 kernel:        c01fe13d 00000000 c0436464
c49083e0 0f2af9de 00000000 00000000 00000000
Aug 15 15:01:02 pers109 kernel:        0e8a0990 000c0a0b 00090000
00000000 00000000 cea51cb0 00000000 00000000
Aug 15 15:01:02 pers109 kernel: Call Trace:
Aug 15 15:01:02 pers109 kernel:  <c01fe13d>   <c01ff637>
Aug 15 15:01:02 pers109 kernel:  <c0115e51>   <c0115e51>
Aug 15 15:01:02 pers109 kernel:  <c015987b>   <c015a791>
Aug 15 15:01:03 pers109 kernel:  <c0140a91>   <c0254ff3>
Aug 15 15:01:03 pers109 kernel:  <c039c7d2>   <c017187e>
Aug 15 15:01:03 pers109 kernel:  <c0255653>   <c0395c89>
Aug 15 15:01:03 pers109 kernel:  <c01696c8>   <c0288a3a>
Aug 15 15:01:03 pers109 kernel:  <c025091f>   <c0157383>
Aug 15 15:01:03 pers109 kernel:  <c012d613>   <c01574a9>
Aug 15 15:01:03 pers109 kernel:  <c015774e>   <c01027df>
Aug 15 15:01:03 pers109 kernel: Code: c0 c7 44 24 08 20 61 52 c0 c7 04
24 ac 52 44 c0 89 44 24 04 e8 5b 34 ec ff b8 84 9f
47 c0 8b 54 24 0c e8 bc fa 1a 00 85 ed 75 02 <0f> 0b 83 c4 10 5b 5e 5f
5d c3 55 b8 07 00 00 00 57 bf 20 61 52
Aug 15 15:01:03 pers109 kernel: EIP: [<c0257d64>]  SS:ESP 0068:cea51b70


thanks in advance,
martin




* Re: kernel BUG at <bad filename>:50307!
  2006-08-15 14:27 kernel BUG at <bad filename>:50307! Martin Braun
@ 2006-08-15 14:31 ` Arjan van de Ven
  2006-08-16  0:11 ` Nathan Scott
  1 sibling, 0 replies; 10+ messages in thread
From: Arjan van de Ven @ 2006-08-15 14:31 UTC
  To: mbraun; +Cc: linux-kernel

On Tue, 2006-08-15 at 16:27 +0200, Martin Braun wrote:
> Hello all,
> 
> I got this bug (see below) in my logs. "top" showed a load average
> rising to 11 and more, yet the CPU was 99% idle and no process used
> noteworthy resources; there were 6 zombies. A shutdown was not possible
> because most of the samba processes didn't respond to a kill.
> Before the exception the server was, as usual, under heavy load from
> samba processes serving 4-5 clients with a lot of automated activity
> (batch processes doing image processing).
> 
> What does this bug mean?

Hi,

it means you don't have CONFIG_KALLSYMS enabled, so the kernel isn't
able to give a decent debugging output in the oops.. if it's a
repeatable oops turning that option on would be a great help to even
figure out which part of the kernel is involved...

Greetings,
   Arjan van de Ven
-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
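Arjan's point can be checked before rebuilding anything. The following is a minimal sketch, assuming the kernel configuration is available as text (where that file lives, for example a `.config` in the build tree, depends on the installation and is not stated in this thread); the helper name `kallsyms_enabled` is made up for illustration.

```python
def kallsyms_enabled(config_text: str) -> bool:
    """Return True if the given kernel config text sets CONFIG_KALLSYMS=y."""
    for line in config_text.splitlines():
        # A disabled option appears as "# CONFIG_KALLSYMS is not set",
        # so only an exact "=y" line counts as enabled.
        if line.strip() == "CONFIG_KALLSYMS=y":
            return True
    return False


# Two tiny illustrative config fragments (not taken from the reporter's system).
sample_on = "CONFIG_SMP=y\nCONFIG_KALLSYMS=y\n"
sample_off = "CONFIG_SMP=y\n# CONFIG_KALLSYMS is not set\n"

print(kallsyms_enabled(sample_on))
print(kallsyms_enabled(sample_off))
```

With the option enabled, the call trace in an oops carries symbol names instead of bare addresses.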



* Re: kernel BUG at <bad filename>:50307!
  2006-08-15 14:27 kernel BUG at <bad filename>:50307! Martin Braun
  2006-08-15 14:31 ` Arjan van de Ven
@ 2006-08-16  0:11 ` Nathan Scott
  2006-08-16  9:05   ` Martin Braun
       [not found]   ` <44EB228F.6020903@uni-hd.de>
  1 sibling, 2 replies; 10+ messages in thread
From: Nathan Scott @ 2006-08-16  0:11 UTC
  To: Martin Braun; +Cc: linux-kernel, xfs

Hi Martin,

On Tue, Aug 15, 2006 at 04:27:22PM +0200, Martin Braun wrote:
> ...
> What does this bug mean?
> ...
> Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode:
> 254474718 start_block : 0 start_off : c0a0b0e8a0990
> blkcnt : 90000 extent-state : 0
> Aug 15 15:01:02 pers109 kernel: ------------[ cut here ]------------
> Aug 15 15:01:02 pers109 kernel: kernel BUG at <bad filename>:50307!

It means XFS detected on-disk corruption in inode# 254474718, and
panicked your system (stupidly; a fix for this is around, will be
merged with the next mainline update).  For me, a more interesting
question is how that inode got into this state... have you had any
crashes recently (i.e. has the filesystem journal needed to be
replayed recently?)  Can you send the output of:

	# xfs_db -c 'inode 254474718' -c print /dev/sdc1

You'll need to run xfs_repair on that filesystem to fix this up,
but please send us that output first.

thanks.

-- 
Nathan
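The "Access to block zero" message carries the inode number that xfs_db is then pointed at. Below is a small sketch of pulling the fields out of such a line. It assumes, based on the stack words in the oops above (0e8a0990 000c0a0b 00090000), that the inode is printed in decimal and the other values in hexadecimal; the function name `parse_block_zero` is hypothetical, and the sample is the Aug 15 message written on one line with the wrapped start_off value rejoined.

```python
import re

# Field layout as it appears in the log message quoted in this thread.
BLOCK_ZERO_RE = re.compile(
    r"Access to block zero: fs: <(?P<fs>[^>]+)>\s+inode:\s+(?P<inode>\d+)\s+"
    r"start_block\s*:\s*(?P<start_block>[0-9a-f]+)\s+"
    r"start_off\s*:\s*(?P<start_off>[0-9a-f]+)\s+"
    r"blkcnt\s*:\s*(?P<blkcnt>[0-9a-f]+)\s+"
    r"extent-state\s*:\s*(?P<state>[0-9a-f]+)"
)


def parse_block_zero(message):
    """Return the message fields as a dict, or None if the line does not match."""
    # Collapse line wraps and repeated spaces introduced by mail clients.
    text = " ".join(message.split())
    m = BLOCK_ZERO_RE.search(text)
    if m is None:
        return None
    return {
        "fs": m.group("fs"),
        "inode": int(m.group("inode")),             # decimal (assumption)
        "start_block": int(m.group("start_block"), 16),
        "start_off": int(m.group("start_off"), 16),
        "blkcnt": int(m.group("blkcnt"), 16),
        "extent_state": int(m.group("state"), 16),
    }


sample = (
    "Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode: "
    "254474718 start_block : 0 start_off : c0a0b0e8a0990 "
    "blkcnt : 90000 extent-state : 0"
)
parsed = parse_block_zero(sample)
print(parsed["fs"], parsed["inode"], hex(parsed["start_off"]))
```

The parsed inode number is what goes into the `inode` command of the xfs_db invocation Nathan gives.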


* Re: kernel BUG at <bad filename>:50307!
  2006-08-16  0:11 ` Nathan Scott
@ 2006-08-16  9:05   ` Martin Braun
       [not found]   ` <44EB228F.6020903@uni-hd.de>
  1 sibling, 0 replies; 10+ messages in thread
From: Martin Braun @ 2006-08-16  9:05 UTC
  To: Nathan Scott, linux-kernel

Hi Nathan,
> It means XFS detected on-disk corruption in inode# 254474718, and
> panicked your system (stupidly; a fix for this is around, will be
> merged with the next mainline update).  For me, a more interesting
> question is how that inode got into this state... have you had any
> crashes recently (i.e. has the filesystem journal needed to be
> replayed recently?)  Can you send the output of:

We recently had problems with our XFS partition caused by the kernel bug
in 2.6.17. I updated to xfsprogs-2.8.10 and repaired the partition with
xfs_repair; it found a corrupted directory inode (254474253).


> 
> 	# xfs_db -c 'inode 254474718' -c print /dev/sdc1
> You'll need to run xfs_repair on that filesystem to fix this up,
> but please send us that output first.

core.magic = 0x494e
core.mode = 0100774
core.version = 1
core.format = 3 (btree)
core.nlinkv1 = 1
core.uid = 1348
core.gid = 104
core.flushiter = 0
core.atime.sec = Tue Aug 15 15:00:58 2006
core.atime.nsec = 934572500
core.mtime.sec = Tue Aug 15 15:01:02 2006
core.mtime.nsec = 261116500
core.ctime.sec = Tue Aug 15 15:01:02 2006
core.ctime.nsec = 261116500
core.size = 10092544
core.nblocks = 197
core.extsize = 0
core.nextents = 182
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.gen = 9
next_unlinked = null
u.bmbt.level = 1
u.bmbt.numrecs = 1
u.bmbt.keys[1] = [startoff] 1:[1]
u.bmbt.ptrs[1] = 1:112941297



 thanks.
martin
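For comparing several such dumps, the `name = value` lines printed by xfs_db are easy to load into a dictionary. This is only a sketch: the helper name `parse_xfs_db_print` is made up, values are kept as strings rather than interpreted, and the sample is a handful of lines copied from the output above.

```python
def parse_xfs_db_print(text):
    """Map each 'name = value' line of xfs_db print output to a dict entry."""
    fields = {}
    for line in text.splitlines():
        if " = " not in line:
            continue  # skip blank lines or anything that is not a field
        name, value = line.split(" = ", 1)
        fields[name.strip()] = value.strip()
    return fields


sample = """core.format = 3 (btree)
core.size = 10092544
core.nblocks = 197
core.nextents = 182
u.bmbt.level = 1
u.bmbt.ptrs[1] = 1:112941297"""

fields = parse_xfs_db_print(sample)
print(fields["core.format"], fields["core.nextents"], fields["u.bmbt.ptrs[1]"])
```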



* xfs kernel BUG again in 2.6.17.11
       [not found]     ` <20060823134211.E2968256@wobbly.melbourne.sgi.com>
@ 2006-11-13  9:28       ` Martin Braun
  2006-11-14  4:00         ` David Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Braun @ 2006-11-13  9:28 UTC
  To: linux-kernel

Hi,

Is it possible that the XFS kernel bug is in the 2.6.17.11 kernel again?
We apparently hit the same bug as with 2.6.17.8:


Nov 13 09:27:01 pers109 kernel: Access to block zero: fs: <sdc1> inode:
637540399 start_block : 0 start_off : 23812530000000 blkcnt : 84
extent-state : 0
Nov 13 09:27:01 pers109 kernel: ------------[ cut here ]------------
Nov 13 09:27:01 pers109 kernel: kernel BUG at <bad filename>:50307!
Nov 13 09:27:01 pers109 kernel: invalid opcode: 0000 [#2]
Nov 13 09:27:01 pers109 kernel: SMP
Nov 13 09:27:01 pers109 kernel: CPU:    1
Nov 13 09:27:01 pers109 kernel: EIP:    0060:[<c0258984>]    Not tainted VLI
Nov 13 09:27:01 pers109 kernel: EFLAGS: 00010246   (2.6.17.11 #1)
Nov 13 09:27:01 pers109 kernel: EIP is at cmn_err+0xa0/0xaa
Nov 13 09:27:01 pers109 kernel: eax: c047d144   ebx: c04385a0   ecx:
c046f9bc   edx: 00000282
Nov 13 09:27:01 pers109 kernel: esi: c33a3cb0   edi: c055e120   ebp:
00000000   esp: c33a3b70
Nov 13 09:27:01 pers109 kernel: ds: 007b   es: 007b   ss: 0068
Nov 13 09:27:01 pers109 kernel: Process smbd (pid: 26181,
threadinfo=c33a2000 task=e00bead0)
Nov 13 09:27:01 pers109 kernel: Stack: c0447536 c042a5d5 c055e120
00000282 ec894ae0 c33a3cb0 00000000 e2d85c80
Nov 13 09:27:01 pers109 kernel:        c01fed1d 00000000 c04385a0
f69e3a00 2600182f 00000000 00000000 00000000
Nov 13 09:27:01 pers109 kernel:        30000000 00238125 00000084
00000000 00000000 c33a3cb0 00000000 00000000
Nov 13 09:27:01 pers109 kernel: Call Trace:
Nov 13 09:27:01 pers109 kernel:  <c01fed1d>
xfs_bmap_search_extents+0xf5/0xf7  <c0200217> xfs_bmapi+0x229/0x162c
Nov 13 09:27:01 pers109 kernel:  <c0115eb1>
default_wake_function+0x0/0x12  <c03bb980> ip_output+0x189/0x270
Nov 13 09:27:01 pers109 kernel:  <c015a22b> mark_buffer_dirty+0x25/0x29
 <c015b131> __block_commit_write+0x7e/0xb4
Nov 13 09:27:01 pers109 kernel:  <c0141441> __pagevec_lru_add+0xa2/0xb5
 <c0255c13> xfs_zero_eof+0x1ca/0x340
Nov 13 09:27:01 pers109 kernel:  <c039d882> memcpy_toiovec+0x37/0x5c
<c0172283> file_update_time+0xa1/0xc0
Nov 13 09:27:01 pers109 kernel:  <c0256273> xfs_write+0x4ea/0xda5
<c0396d39> sock_aio_read+0x83/0x8e
Nov 13 09:27:01 pers109 kernel:  <c025153f> xfs_file_aio_write+0x8f/0x9a
 <c0157d33> do_sync_write+0xd5/0x130
Nov 13 09:27:01 pers109 kernel:  <c012d743>
autoremove_wake_function+0x0/0x4b  <c0157e59> vfs_write+0xcb/0x195
Nov 13 09:27:01 pers109 kernel:  <c01580fe> sys_pwrite64+0x73/0x80
<c01027ef> sysenter_past_esp+0x54/0x75
Nov 13 09:27:01 pers109 kernel: Code: c0 c7 44 24 08 20 e1 55 c0 c7 04
24 36 75 44 c0 89 44 24 04 e8 8b 29 ec ff b8 44 d1 47 c0 8b 54 24 0c e8
bc ff 1a 00 85 ed 75 02 <0f> 0b 83 c4 10 5b 5e 5f 5d c3 55 b8 07 00 00
00 57 bf 20 e1 55
Nov 13 09:27:01 pers109 kernel: EIP: [<c0258984>] cmn_err+0xa0/0xaa
SS:ESP 0068:c33a3b70

I will remove the corresponding block...

thanks,
martin


> On Tue, Aug 22, 2006 at 05:28:15PM +0200, Martin Braun wrote:
>> Hi Nathan,
>>
>> Since I haven't repaired the fs yet, we had a crash again (see below).
>>
>> Unfortunately, at the time of the crash we were copying some files
>> over iSCSI to an XFS filesystem on a NAS, and the directory was
>> completely deleted. Neither xfs_check nor xfs_repair found anything.
>> Was that due to the combination of iSCSI and XFS?
> 
> Sorry for not getting back to you earlier, I've been too busy. :(
> 
> I think you will need to clear out the affected inode (looks like a
> form of corruption that repair doesn't know about today) - you'll
> need to forcibly remove that inode via xfs_db, something like:
> 
> # xfs_db -x -c 'inode 35141650' -c 'write core.mode 0' /dev/sdc1
> # xfs_repair /dev/sdc1
> 
> cheers.
> 
> ps: Barry, looks like repair needs some work in this area...
> 
>> Aug 22 12:48:12 pers109 kernel: Access to block zero: fs: <sdc1> inode:
>> 35141650 start_block : 0 start_off : 3a1531 blkcnt : c
>>  extent-state : 0
>> Aug 22 12:48:12 pers109 kernel: ------------[ cut here ]------------
>> Aug 22 12:48:12 pers109 kernel: kernel BUG at <bad filename>:50307!
>> Aug 22 12:48:12 pers109 kernel: invalid opcode: 0000 [#1]
>> Aug 22 12:48:12 pers109 kernel: SMP
>> Aug 22 12:48:12 pers109 kernel: Modules linked in: iscsi_tcp libiscsi
>> scsi_transport_iscsi
>> Aug 22 12:48:12 pers109 kernel: CPU:    0
>> Aug 22 12:48:12 pers109 kernel: EIP:    0060:[<c025cb74>]    Not tainted VLI
>> Aug 22 12:48:12 pers109 kernel: EFLAGS: 00010246   (2.6.17.8 #5)
>> Aug 22 12:48:12 pers109 kernel: EIP is at cmn_err+0xa0/0xaa
>> Aug 22 12:48:12 pers109 kernel: eax: c048a2c4   ebx: c04359e4   ecx:
>> c047c9bc   edx: 00000282
>> Aug 22 12:48:12 pers109 kernel: esi: e595dcb0   edi: c056a120   ebp:
>> 00000000   esp: e595db70
>> Aug 22 12:48:12 pers109 kernel: ds: 007b   es: 007b   ss: 0068
>> Aug 22 12:48:12 pers109 kernel: Process smbd (pid: 25510,
>> threadinfo=e595c000 task=d9628a90)
>> Aug 22 12:48:12 pers109 kernel: Stack: c044497a c0427525 c056a120
>> 00000282 f3507260 e595dcb0 00000000 d9f9de00
>> Aug 22 12:48:12 pers109 kernel:        c0202f0d 00000000 c04359e4
>> f686cba0 02183812 00000000 00000000 00000000
>> Aug 22 12:48:12 pers109 kernel:        003a1531 00000000 0000000c
>> 00000000 00000000 e595dcb0 00000000 00000000
>> Aug 22 12:48:12 pers109 kernel: Call Trace:
>> Aug 22 12:48:12 pers109 kernel:  <c0202f0d>
>> xfs_bmap_search_extents+0xf5/0xf7  <c0204407> xfs_bmapi+0x229/0x162c
>> Aug 22 12:48:12 pers109 kernel:  <c039d890> dev_queue_xmit+0x1f4/0x26f
>> <c03b8660> ip_output+0x189/0x270
>> Aug 22 12:48:12 pers109 kernel:  <c012018e> __do_softirq+0x6e/0xdc
>> <c0104d7a> do_IRQ+0x1e/0x24
>> Aug 22 12:48:12 pers109 kernel:  <c0103222> common_interrupt+0x1a/0x20
>> <c0259e03> xfs_zero_eof+0x1ca/0x340
>> Aug 22 12:48:12 pers109 kernel:  <c039a342> memcpy_toiovec+0x37/0x5c
>> <c01762b3> file_update_time+0xa1/0xc0
>> Aug 22 12:48:12 pers109 kernel:  <c025a463> xfs_write+0x4ea/0xda5
>> <c0393654> sock_aio_read+0x83/0x8e
>> Aug 22 12:48:12 pers109 kernel:  <c016e098> fasync_helper+0x4b/0xd3
>> <c028dc12> copy_to_user+0x3c/0x4a
>> Aug 22 12:48:12 pers109 kernel:  <c025572f> xfs_file_aio_write+0x8f/0x9a
>>  <c015ba73> do_sync_write+0xd5/0x130
>> Aug 22 12:48:12 pers109 kernel:  <c012de03>
>> autoremove_wake_function+0x0/0x4b  <c015bb99> vfs_write+0xcb/0x195
>> Aug 22 12:48:12 pers109 kernel:  <c015be3e> sys_pwrite64+0x73/0x80
>> <c01027ef> sysenter_past_esp+0x54/0x75
>> Aug 22 12:48:12 pers109 kernel: Code: c0 c7 44 24 08 20 a1 56 c0 c7 04
>> 24 7a 49 44 c0 89 44 24 04 e8 ab eb eb ff b8 c4 a2 48 c0
>> 8b 54 24 0c e8 fc 95 1a 00 85 ed 75 02 <0f> 0b 83 c4 10 5b 5e 5f 5d c3
>> 55 b8 07 00 00 00 57 bf 20 a1 56
>> Aug 22 12:48:12 pers109 kernel: EIP: [<c025cb74>] cmn_err+0xa0/0xaa
>> SS:ESP 0068:e595db70
>>
>>
>>
>>
>>
>>
>> Nathan Scott wrote:
>>> Hi Martin,
>>>
>>> On Tue, Aug 15, 2006 at 04:27:22PM +0200, Martin Braun wrote:
>>>> ...
>>>> What does this bug mean?
>>>> ...
>>>> Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode:
>>>> 254474718 start_block : 0 start_off : c0a0b0e8a0990
>>>> blkcnt : 90000 extent-state : 0
>>>> Aug 15 15:01:02 pers109 kernel: ------------[ cut here ]------------
>>>> Aug 15 15:01:02 pers109 kernel: kernel BUG at <bad filename>:50307!
>>> It means XFS detected on-disk corruption in inode# 254474718, and
>>> panicked your system (stupidly; a fix for this is around, will be
>>> merged with the next mainline update).  For me, a more interesting
>>> question is how that inode got into this state... have you had any
>>> crashes recently (i.e. has the filesystem journal needed to be
>>> replayed recently?)  Can you send the output of:
>>>
>>> 	# xfs_db -c 'inode 254474718' -c print /dev/sdc1
>>>
>>> You'll need to run xfs_repair on that filesystem to fix this up,
>>> but please send us that output first.
>>>
>>> thanks.
>>>
> 


* Re: xfs kernel BUG again in 2.6.17.11
  2006-11-13  9:28       ` xfs kernel BUG again in 2.6.17.11 Martin Braun
@ 2006-11-14  4:00         ` David Chinner
  2006-11-14  9:23           ` Martin Braun
  0 siblings, 1 reply; 10+ messages in thread
From: David Chinner @ 2006-11-14  4:00 UTC
  To: Martin Braun; +Cc: linux-kernel, xfs

On Mon, Nov 13, 2006 at 10:28:30AM +0100, Martin Braun wrote:
> Hi,
> 
> Is it possible that the XFS kernel bug is in the 2.6.17.11 kernel again?
> We apparently hit the same bug as with 2.6.17.8:

It's likely that XFS is identical in those 2 releases.

BTW, Martin, can you cc XFS bug reports to xfs@oss.sgi.com in future?

> Nov 13 09:27:01 pers109 kernel: Access to block zero: fs: <sdc1> inode:
> 637540399 start_block : 0 start_off : 23812530000000 blkcnt : 84
> extent-state : 0

Looks like you are managing to trigger an inode corruption
of some sort.

Have you managed to repair the filesystem since you first
reported this problem? I don't know the history of the bug
you are seeing other than what you included, so can you
give us a more complete picture of your hardware and
what sort of workload you are doing that triggers this
problem?

FWIW, are there any I/O errors being reported in dmesg or syslog?

Cheers,

Dave.


-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: xfs kernel BUG again in 2.6.17.11
  2006-11-14  4:00         ` David Chinner
@ 2006-11-14  9:23           ` Martin Braun
  2006-11-14 10:12             ` Oleg Verych
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Braun @ 2006-11-14  9:23 UTC
  To: David Chinner; +Cc: linux-kernel, xfs

Hi David,


> Have you managed to repair the filesystem since you first
> reported this problem? I don't know the history of the bug
That's something I am not sure about. I used the newest xfs_repair
tools, which found and repaired some inodes, and for about two months
there weren't any crashes.

> you are seeing other than what you included, so can you
> give us a more complete picture of your hardware and
> what sort of workload you are doing that triggers this
> problem?
The main workload of this machine is heavy samba activity with few
clients but many I/O tasks (i.e. Photoshop batch processing of many 3-6
MB images). The XFS partition is on an easyRAID 16P. The other partitions
are ext3. There are also 2 iSCSI partitions with XFS. For hardware
information, see below.
After the crash I ran xfs_repair; it found a corrupt directory inode
and moved it to lost+found as "254474253".

Normally the kernel freezes/hangs completely, but I found two new
kernel BUGs (see below) in the log messages (without a freeze); the
corresponding Java program was building a Lucene index from a
MySQL database.

It seems that xfs_repair (2.8.10) did not find all of the errors in the FS.
Is there a way to be sure that the FS is clean?

>
> FWIW, are there any I/O errors being reported in dmesg or syslog?
There weren't any I/O errors.

Nov 13 14:16:28 pers109 kernel: ------------[ cut here ]------------
Nov 13 14:16:28 pers109 kernel: kernel BUG at :29837!
Nov 13 14:16:28 pers109 kernel: invalid opcode: 0000 [#1]
Nov 13 14:16:28 pers109 kernel: SMP
Nov 13 14:16:28 pers109 kernel: CPU:    2
Nov 13 14:16:28 pers109 kernel: EIP:    0060:[<c0171eea>]    Not tainted VLI
Nov 13 14:16:28 pers109 kernel: EFLAGS: 00210202   (2.6.17.11 #1)
Nov 13 14:16:28 pers109 kernel: EIP is at generic_delete_inode+0xf1/0xf9
Nov 13 14:16:28 pers109 kernel: eax: c2001e80   ebx: ecadeca0   ecx:
00000003   edx: ecadedd8
Nov 13 14:16:28 pers109 kernel: esi: 00000000   edi: ecadeca0   ebp:
d8699f4c   esp: d8699f18
Nov 13 14:16:28 pers109 kernel: ds: 007b   es: 007b   ss: 0068
Nov 13 14:16:28 pers109 kernel: Process java (pid: 15883,
threadinfo=d8698000 task=d6c78a10)
Nov 13 14:16:28 pers109 kernel: Stack: ecadeca0 00000000 00000000
ecadeca0 d7ce4000 c01720cd ecadeca0 c04738dc
Nov 13 14:16:28 pers109 kernel:        00000000 c01683fc ecadeca0
f1862094 f1862094 c92b5114 c214c0c0 4859aa9a
Nov 13 14:16:28 pers109 kernel:        00000008 d7ce4029 00000010
00000000 00000000 00000000 00000000 c214c0c0
Nov 13 14:16:28 pers109 kernel: Call Trace:
Nov 13 14:16:28 pers109 kernel:  <c01720cd> iput+0x5f/0x74  <c01683fc>
do_unlinkat+0xc9/0x107
Nov 13 14:16:28 pers109 kernel:  <c015739a> filp_close+0x44/0x6c
<c0168481> sys_unlink+0x17/0x1b
Nov 13 14:16:28 pers109 kernel:  <c01027ef> sysenter_past_esp+0x54/0x75
Nov 13 14:16:28 pers109 kernel: Code: f0 ff ff 8d 83 a8 00 00 00 c7 44
24 04 00 00 00 00 c7 44 24 08 00 00 00 00 89 04 24 e8 b1 fb fc ff 89
1c 24 e8 aa f1 ff ff eb 89 <0f> 0b 8d 74 26 00 eb c2 56 53 83 ec 0c 8b
5c 24 18 8b 53 04 8b
Nov 13 14:16:28 pers109 kernel: EIP: [<c0171eea>]
generic_delete_inode+0xf1/0xf9 SS:ESP 0068:d8699f18
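The two kinds of BUG in this thread differ in the function named on the "EIP is at" line (cmn_err for the XFS reports, generic_delete_inode here). The sketch below extracts that function name so oopses from a long log can be grouped. It assumes the `symbol+0xoffset/0xsize` layout seen in the reports above, which only appears when symbol names are available in the oops; the helper name `faulting_function` is hypothetical.

```python
import re

# "EIP is at <symbol>+0x<offset>/0x<size>" as printed in the oopses above.
EIP_RE = re.compile(
    r"EIP is at (?P<func>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def faulting_function(line):
    """Return (function, offset, size) from an 'EIP is at' line, else None."""
    m = EIP_RE.search(line)
    if m is None:
        return None
    return m.group("func"), int(m.group("off"), 16), int(m.group("size"), 16)


# Two lines copied from the reports in this thread.
lines = [
    "Nov 13 09:27:01 pers109 kernel: EIP is at cmn_err+0xa0/0xaa",
    "Nov 13 14:16:28 pers109 kernel: EIP is at generic_delete_inode+0xf1/0xf9",
]
for line in lines:
    print(faulting_function(line))
```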


Nov 13 20:22:28 pers109 kernel: ------------[ cut here ]------------
Nov 13 20:22:28 pers109 kernel: kernel BUG at :29837!
Nov 13 20:22:28 pers109 kernel: invalid opcode: 0000 [#2]
Nov 13 20:22:28 pers109 kernel: SMP
Nov 13 20:22:28 pers109 kernel: CPU:    3
Nov 13 20:22:28 pers109 kernel: EIP:    0060:[<c0171eea>]    Not tainted VLI
Nov 13 20:22:28 pers109 kernel: EFLAGS: 00010202   (2.6.17.11 #1)
Nov 13 20:22:28 pers109 kernel: EIP is at generic_delete_inode+0xf1/0xf9
Nov 13 20:22:28 pers109 kernel: eax: c2001f10   ebx: d6c586a0   ecx:
00000003   edx: d6c587d8
Nov 13 20:22:28 pers109 kernel: esi: 00000000   edi: d6c586a0   ebp:
d2cd9f4c   esp: d2cd9f18
Nov 13 20:22:28 pers109 kernel: ds: 007b   es: 007b   ss: 0068
Nov 13 20:22:28 pers109 kernel: Process java (pid: 19824,
threadinfo=d2cd8000 task=d1f575a0)
Nov 13 20:22:28 pers109 kernel: Stack: d6c586a0 00000000 00000000
d6c586a0 d5144000 c01720cd d6c586a0 c04738dc
Nov 13 20:22:28 pers109 kernel:        00000000 c01683fc d6c586a0
dd28e794 dd28e794 f69dd894 c214c0c0 281c233e
Nov 13 20:22:28 pers109 kernel:        00000009 d5144029 00000010
00000000 00000000 00000000 00000000 c214c0c0
Nov 13 20:22:28 pers109 kernel: Call Trace:
Nov 13 20:22:28 pers109 kernel:  <c01720cd> iput+0x5f/0x74  <c01683fc>
do_unlinkat+0xc9/0x107
Nov 13 20:22:28 pers109 kernel:  <c015739a> filp_close+0x44/0x6c
<c0168481> sys_unlink+0x17/0x1b
Nov 13 20:22:28 pers109 kernel:  <c01027ef> sysenter_past_esp+0x54/0x75
Nov 13 20:22:28 pers109 kernel: Code: f0 ff ff 8d 83 a8 00 00 00 c7 44
24 04 00 00 00 00 c7 44 24 08 00 00 00 00 89 04 24 e8 b1 fb fc ff 89
1c 24 e8 aa f1 ff ff eb 89 <0f> 0b 8d 74 26 00 eb c2 56 53 83 ec 0c 8b
5c 24 18 8b 53 04 8b
________
Hardware Info:
________
(Output for cpu0 of 4 virtual CPUs, 2 physical)
cat /proc/cpuinfo

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.80GHz
stepping        : 7
cpu MHz         : 1595.120
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips        : 3193.91

__________

uname -a
Linux pers109 2.6.17.11 #1 SMP Mon Aug 28 10:45:48 CEST 2006 i686 i686
i386 GNU/Linux

----------------------
cat /etc/SuSE-release
SuSE Linux 9.3 (i586)
VERSION = 9.3
---------------------
lspci

0000:00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub
(rev 01)
0000:00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B
PCI-to-PCI Bridge (rev 01)
0000:00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1)
(rev 02)
0000:00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2)
(rev 02)
0000:00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3)
(rev 02)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42)
0000:00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface
Controller (rev 02)
0000:00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage
Controller (rev 02)
0000:01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
0000:01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge
(rev 04)
0000:01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
0000:01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge
(rev 04)
0000:02:05.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
0000:02:05.1 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
0000:03:03.0 Ethernet controller: Intel Corporation 82544GC Gigabit
Ethernet Controller (LOM) (rev 02)
0000:04:01.0 Ethernet controller: Intel Corporation 82540EM Gigabit
Ethernet Controller (rev 02)
0000:04:02.0 VGA compatible controller: ATI Technologies Inc Rage XL
(rev 27)

cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: DCAS-34330W      Rev: S65A
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 04 Lun: 00
  Vendor: easyRAID Model:  16P             Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 04 Lun: 01
  Vendor: easyRAID Model:  16P             Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 06 Lun: 00
  Vendor: easyRAID Model:  X16P            Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 06 Lun: 01
  Vendor: easyRAID Model:  X16P            Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: LITE-ON  Model: LTR-48246K       Rev: SKS7
  Type:   CD-ROM                           ANSI SCSI revision: ffffffff
Host: scsi3 Channel: 00 Id: 00 Lun: 00
  Vendor: HITACHI  Model: DF600F           Rev: 0000
  Type:   Direct-Access                    ANSI SCSI revision: 04
Host: scsi3 Channel: 00 Id: 00 Lun: 01
  Vendor: HITACHI  Model: DF600F           Rev: 0000
  Type:   Direct-Access                    ANSI SCSI revision: 03
------------

free
             total       used       free     shared    buffers     cached
Mem:       2075168    2022916      52252          0       4480    1848936
-/+ buffers/cache:     169500    1905668
Swap:      1959920    1782356     177564


> 
> Cheers,
> 
> Dave.
> 
>>> On Tue, Aug 22, 2006 at 05:28:15PM +0200, Martin Braun wrote:
>>>> Hi Nathan,
>>>>
>>>> since I haven't repaired the fs we had a crash again (see below).
>>>>
>>>> Unfortunately, at the time of the crash we were copying some files over
>>>> iSCSI to an XFS filesystem on a NAS, and the directory was completely
>>>> deleted. Neither xfs_check nor xfs_repair found anything. Was that due
>>>> to the combination of iSCSI and XFS?
>>> Sorry for not getting back to you earlier, I've been too busy. :(
>>>
>>> I think you will need to clear out the affected inode (looks like a
>>> form of corruption that repair doesn't know about today) - you'll
>>> need to forcibly remove that inode via xfs_db, something like:
>>>
>>> # xfs_db -x -c 'inode 35141650' -c 'write core.mode 0' /dev/sdc1
>>> # xfs_repair /dev/sdc1
>>>
>>> cheers.
>>>
>>> ps: Barry, looks like repair needs some work in this area...
>>>
>>>> Aug 22 12:48:12 pers109 kernel: Access to block zero: fs: <sdc1> inode:
>>>> 35141650 start_block : 0 start_off : 3a1531 blkcnt : c
>>>>  extent-state : 0
>>>> Aug 22 12:48:12 pers109 kernel: ------------[ cut here ]------------
>>>> Aug 22 12:48:12 pers109 kernel: kernel BUG at <bad filename>:50307!
>>>> Aug 22 12:48:12 pers109 kernel: invalid opcode: 0000 [#1]
>>>> Aug 22 12:48:12 pers109 kernel: SMP
>>>> Aug 22 12:48:12 pers109 kernel: Modules linked in: iscsi_tcp libiscsi
>>>> scsi_transport_iscsi
>>>> Aug 22 12:48:12 pers109 kernel: CPU:    0
>>>> Aug 22 12:48:12 pers109 kernel: EIP:    0060:[<c025cb74>]    Not tainted VLI
>>>> Aug 22 12:48:12 pers109 kernel: EFLAGS: 00010246   (2.6.17.8 #5)
>>>> Aug 22 12:48:12 pers109 kernel: EIP is at cmn_err+0xa0/0xaa
>>>> Aug 22 12:48:12 pers109 kernel: eax: c048a2c4   ebx: c04359e4   ecx:
>>>> c047c9bc   edx: 00000282
>>>> Aug 22 12:48:12 pers109 kernel: esi: e595dcb0   edi: c056a120   ebp:
>>>> 00000000   esp: e595db70
>>>> Aug 22 12:48:12 pers109 kernel: ds: 007b   es: 007b   ss: 0068
>>>> Aug 22 12:48:12 pers109 kernel: Process smbd (pid: 25510,
>>>> threadinfo=e595c000 task=d9628a90)
>>>> Aug 22 12:48:12 pers109 kernel: Stack: c044497a c0427525 c056a120
>>>> 00000282 f3507260 e595dcb0 00000000 d9f9de00
>>>> Aug 22 12:48:12 pers109 kernel:        c0202f0d 00000000 c04359e4
>>>> f686cba0 02183812 00000000 00000000 00000000
>>>> Aug 22 12:48:12 pers109 kernel:        003a1531 00000000 0000000c
>>>> 00000000 00000000 e595dcb0 00000000 00000000
>>>> Aug 22 12:48:12 pers109 kernel: Call Trace:
>>>> Aug 22 12:48:12 pers109 kernel:  <c0202f0d>
>>>> xfs_bmap_search_extents+0xf5/0xf7  <c0204407> xfs_bmapi+0x229/0x162c
>>>> Aug 22 12:48:12 pers109 kernel:  <c039d890> dev_queue_xmit+0x1f4/0x26f
>>>> <c03b8660> ip_output+0x189/0x270
>>>> Aug 22 12:48:12 pers109 kernel:  <c012018e> __do_softirq+0x6e/0xdc
>>>> <c0104d7a> do_IRQ+0x1e/0x24
>>>> Aug 22 12:48:12 pers109 kernel:  <c0103222> common_interrupt+0x1a/0x20
>>>> <c0259e03> xfs_zero_eof+0x1ca/0x340
>>>> Aug 22 12:48:12 pers109 kernel:  <c039a342> memcpy_toiovec+0x37/0x5c
>>>> <c01762b3> file_update_time+0xa1/0xc0
>>>> Aug 22 12:48:12 pers109 kernel:  <c025a463> xfs_write+0x4ea/0xda5
>>>> <c0393654> sock_aio_read+0x83/0x8e
>>>> Aug 22 12:48:12 pers109 kernel:  <c016e098> fasync_helper+0x4b/0xd3
>>>> <c028dc12> copy_to_user+0x3c/0x4a
>>>> Aug 22 12:48:12 pers109 kernel:  <c025572f> xfs_file_aio_write+0x8f/0x9a
>>>>  <c015ba73> do_sync_write+0xd5/0x130
>>>> Aug 22 12:48:12 pers109 kernel:  <c012de03>
>>>> autoremove_wake_function+0x0/0x4b  <c015bb99> vfs_write+0xcb/0x195
>>>> Aug 22 12:48:12 pers109 kernel:  <c015be3e> sys_pwrite64+0x73/0x80
>>>> <c01027ef> sysenter_past_esp+0x54/0x75
>>>> Aug 22 12:48:12 pers109 kernel: Code: c0 c7 44 24 08 20 a1 56 c0 c7 04
>>>> 24 7a 49 44 c0 89 44 24 04 e8 ab eb eb ff b8 c4 a2 48
>>>> c0 8b 54 24 0c e8 fc 95 1a 00 85 ed 75 02 <0f> 0b 83 c4 10 5b 5e 5f 5d c3
>>>> 55 b8 07 00 00 00 57 bf 20 a1 56
>>>> Aug 22 12:48:12 pers109 kernel: EIP: [<c025cb74>] cmn_err+0xa0/0xaa
>>>> SS:ESP 0068:e595db70
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Nathan Scott wrote:
>>>>> Hi Martin,
>>>>>
>>>>> On Tue, Aug 15, 2006 at 04:27:22PM +0200, Martin Braun wrote:
>>>>>> ...
>>>>>> What does this bug mean?
>>>>>> ...
>>>>>> Aug 15 15:01:02 pers109 kernel: Access to block zero: fs: <sdc1> inode:
>>>>>> 254474718 start_block : 0 start_off : c0a0b0e8a099
>>>>>> 0 blkcnt : 90000 extent-state : 0
>>>>>> Aug 15 15:01:02 pers109 kernel: ------------[ cut here ]------------
>>>>>> Aug 15 15:01:02 pers109 kernel: kernel BUG at <bad filename>:50307!
>>>>> It means XFS detected on-disk corruption in inode# 254474718, and
>>>>> panicked your system (stupidly; a fix for this is around, will be
>>>>> merged with the next mainline update).  For me, a more interesting
>>>>> question is how that inode got into this state... have you had any
>>>>> crashes recently (i.e. has the filesystem journal needed to be
>>>>> replayed recently?)  Can you send the output of:
>>>>>
>>>>> 	# xfs_db -c 'inode 254474718' -c print /dev/sdc1
>>>>>
>>>>> You'll need to run xfs_repair on that filesystem to fix this up,
>>>>> but please send us that output first.
>>>>>
>>>>> thanks.
>>>>>
> 


-- 
Universitaetsbibliothek Heidelberg   Tel: +49 6221 54-2580
Ploeck 107-109, D-69117 Heidelberg   Fax: +49 6221 54-2623


* Re: xfs kernel BUG again in 2.6.17.11
  2006-11-14  9:23           ` Martin Braun
@ 2006-11-14 10:12             ` Oleg Verych
  2006-11-14 10:31               ` Martin Braun
  0 siblings, 1 reply; 10+ messages in thread
From: Oleg Verych @ 2006-11-14 10:12 UTC (permalink / raw)
  To: Martin Braun, David Chinner, LKML, xfs

Hallo.

On 2006-11-14, Martin Braun wrote:
> Hi David,
>
>
>> Have you managed to repair the filesystem since you first
>> reported this problem? I don't know the history of the bug

[Well. Just to help (probably) new developers, after Nathan left SGI.]

Here's the FAQ entry about this bug:
http://oss.sgi.com/projects/xfs/faq.html#dir2

You can find fixes in .17 stable git tree.
If it was really just sparse annotations, they were obviously
fixed, I think. If not, maybe there are some new bugs.

> that's something I am not sure about, I have used the newest xfs_repair
> tools and it found and repaired some inodes. And for about two months
> there weren't any crashes.
+
> It seems that xfs_repair (2.8.10), did not find all of the errors of the FS.
> Is there a way to be sure that the FS is clean?

As in faq:
,--
.....
|   Update: a fixed xfs_repair is now available; version 2.8.10 or later
|   of the xfsprogs package contains the fixed version.
.....      
|   The xfs_check tool, or xfs_repair -n, should be able to detect any
|   directory corruption.
`--     
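
A read-only check along the lines of the FAQ text could be sketched as
below. This is only a sketch under assumptions: the filesystem has to be
unmounted first, and it relies on xfs_repair -n returning a non-zero exit
status when it detects corruption, which is what current xfsprogs
documents (not verified here for 2.8.10). The names check_xfs and
XFS_REPAIR are invented for this example; /dev/sdc1 is the device from
this thread.

```shell
#!/bin/sh
# XFS_REPAIR is a hypothetical override hook so another binary
# (or a stub) can be substituted; it defaults to plain xfs_repair.
XFS_REPAIR=${XFS_REPAIR:-xfs_repair}

# check_xfs DEVICE
# Runs xfs_repair in no-modify mode (-n) and reports the result.
# A non-zero status can also mean the check could not run at all
# (device mounted, missing, ...), so the message stays cautious.
check_xfs() {
    dev=$1
    if "$XFS_REPAIR" -n "$dev" >/dev/null 2>&1; then
        echo "clean: $dev"
    else
        echo "not clean (or check failed): $dev"
        return 1
    fi
}

# Usage (as root, filesystem unmounted):
#   check_xfs /dev/sdc1
```

Since -n never writes to the device, a run like this is safe to repeat
after every crash before deciding whether a real xfs_repair is needed.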

[]
> Normally  the Kernel freezes/hangs completely, but I found two new

Do you mean panic or oops here, or just freeze?

____


* Re: xfs kernel BUG again in 2.6.17.11
  2006-11-14 10:12             ` Oleg Verych
@ 2006-11-14 10:31               ` Martin Braun
  2006-11-14 11:21                 ` Oleg Verych
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Braun @ 2006-11-14 10:31 UTC (permalink / raw)
  To: Oleg Verych; +Cc: David Chinner, LKML, xfs

Hi Oleg,

thanks for your response.
> You can find fixes in .17 stable git tree.
Yes it is a 2.6.17.11 stable kernel. - By the way: we tried to set up
kernel 2.6.18.2 on that machine but we got a weird time error, ntpdate
shows two times: first run correct time, second run time is half an hour
in the future - so we switched back to 2.6.17.11

> If it was really just sparse annotations, they were obviously
> fixed, I think. If not, maybe there are some new bugs.
> +
>> It seems that xfs_repair (2.8.10), did not find all of the errors of the FS.
>> Is there a way to be sure that the FS is clean?
> 
> As in faq:
> |   Update: a fixed xfs_repair is now available; version 2.8.10 or later
> |   of the xfsprogs package contains the fixed version.
> .....      
> |   The xfs_check tool, or xfs_repair -n, should be able to detect any
> |   directory corruption.

However the two Kernel BUGS were _after_ xfs_repair (version 2.8.10).

>> Normally  the Kernel freezes/hangs completely, but I found two new
> 
> Do you mean panic or oops here, or just freeze?

In detail: a kernel BUG is written to /var/log/messages, and after that
the CPU load average climbs to 20-30; any attempts to shut down the
system, kill processes, unmount filesystems, etc. are in vain. Then the
system freezes completely: no keyboard, nothing.
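
The "Access to block zero" message that precedes each BUG in the traces
quoted in this thread names the device and the inode involved. A minimal
sketch for pulling those pairs out of the log, assuming each message is
on a single line in /var/log/messages (not wrapped as in these mails);
block_zero_inodes is a made-up helper name:

```shell
#!/bin/sh
# block_zero_inodes
# Reads syslog lines on stdin and prints "<device> <inode>" once for
# each distinct "Access to block zero" message found.
block_zero_inodes() {
    sed -n 's/.*Access to block zero: fs: <\([^>]*\)> inode: *\([0-9][0-9]*\).*/\1 \2/p' |
        sort -u
}

# Usage:
#   block_zero_inodes < /var/log/messages
```

Each inode number printed this way can then be inspected with xfs_db, as
in the commands quoted earlier in the thread.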

 cheers,
martin



* Re: xfs kernel BUG again in 2.6.17.11
  2006-11-14 10:31               ` Martin Braun
@ 2006-11-14 11:21                 ` Oleg Verych
  0 siblings, 0 replies; 10+ messages in thread
From: Oleg Verych @ 2006-11-14 11:21 UTC (permalink / raw)
  To: Martin Braun; +Cc: LKML

On Tue, Nov 14, 2006 at 11:31:42AM +0100, Martin Braun wrote:
> Hi Oleg,
> 
> thanks for your response.
> > You can find fixes in .17 stable git tree.
> Yes it is a 2.6.17.11 stable kernel. - By the way: we tried to set up
> kernel 2.6.18.2 on that machine but we got a weird time error, ntpdate

2.6.18 has many XFS fixes that were not backported to 2.6.17
(and will not be, I think).

> shows two times: first run correct time, second run time is half an hour
> in the future - so we switched back to 2.6.17.11
 
(So as not to mix topics too much here: try searching the last 3 months
 of the list for "ntp". There are people who can actually help with that.
 2.6.19-rc has even more timekeeping fixes; maybe something will work
 for you.)
____


end of thread, other threads:[~2006-11-14 11:14 UTC | newest]

Thread overview: 10+ messages
2006-08-15 14:27 kernel BUG at <bad filename>:50307! Martin Braun
2006-08-15 14:31 ` Arjan van de Ven
2006-08-16  0:11 ` Nathan Scott
2006-08-16  9:05   ` Martin Braun
     [not found]   ` <44EB228F.6020903@uni-hd.de>
     [not found]     ` <20060823134211.E2968256@wobbly.melbourne.sgi.com>
2006-11-13  9:28       ` xfs kernel BUG again in 2.6.17.11 Martin Braun
2006-11-14  4:00         ` David Chinner
2006-11-14  9:23           ` Martin Braun
2006-11-14 10:12             ` Oleg Verych
2006-11-14 10:31               ` Martin Braun
2006-11-14 11:21                 ` Oleg Verych
