Kernel Panic in ext3


* Kernel Panic in ext3
@ 2006-08-04 10:36 Loiseleur Michel
  2006-08-04 11:48 ` Erik Mouw
  0 siblings, 1 reply; 3+ messages in thread
From: Loiseleur Michel @ 2006-08-04 10:36 UTC
  To: linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 956 bytes --]

Hello,

    I run a Red Hat AS kernel (2.6.9-11-smp) on a dual-processor AMD
machine. I had a kernel panic last night; you will find an extract of
/var/log/messages in the attached file. The server is a backup server,
and the panic occurred during very heavy batch processing. You will also
see that the SMART readings look wrong: the disks are not that hot.

    I have looked at the code, and everything seems to be in fs/ext3. It
seems that during ext3_ordered_writepage, the filesystem tries to walk
the page's buffers (walk_page_buffers) but cannot, because the "page" is
NULL. That is what the trace tells me.

    My first idea is to fix it with something like this:
if (!page)
  goto out_fail;
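
To make the guard idea concrete, here is a minimal userspace sketch of
the early-exit pattern. It is not the fs/ext3 code: `page_stub` and
`writepage_sketch` are hypothetical stand-ins, and the -EINVAL return
value is an assumption chosen only for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page_stub {
    int nr_buffers;
};

/*
 * Mirrors the shape of the proposed fix: leave through a common
 * failure label when the page pointer is NULL.
 */
int writepage_sketch(const struct page_stub *page)
{
    int ret = 0;

    if (!page)
        goto out_fail;

    /* Real code would walk the page's buffers here. */
    return ret;

out_fail:
    ret = -EINVAL; /* assumed error code, for illustration only */
    return ret;
}
```

A guard of this shape only rejects a pointer that is exactly NULL; any
other invalid pointer value would pass the test unchanged.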


    But I suspect that is not the right approach, or that my reasoning is
wrong. Is there an ext3 maintainer on the plane? :)


-- 
Loiseleur Michel - TM2L (08000LINUX)
LINAGORA
27, rue de Berri
1er étage
75008 PARIS
Tél : 01 58 18 68 28
Fax : 01 58 18 68 29
"Si hoc legere scis nimium eruditionis habes"


[-- Attachment #2: messages.txt --]
[-- Type: text/plain, Size: 3627 bytes --]

Aug  4 01:01:01 ju crond(pam_unix)[26634]: session opened for user root by (uid=0)
Aug  4 01:01:19 ju crond(pam_unix)[26634]: session closed for user root
Aug  4 01:03:50 ju smartd[1745]: Device: /dev/hdc, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 58 to 57 
Aug  4 01:03:50 ju smartd[1745]: Device: /dev/hdc, SMART Usage Attribute: 194 Temperature_Celsius changed from 240 to 29 
Aug  4 01:03:50 ju smartd[1745]: Device: /dev/hdc, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 58 to 57 
Aug  4 01:03:50 ju smartd[1745]: Device: /dev/hdd, SMART Usage Attribute: 194 Temperature_Celsius changed from 196 to 203 
Aug  4 01:06:01 ju crond(pam_unix)[6980]: session opened for user root by (uid=0)
Aug  4 01:06:10 ju crond(pam_unix)[6980]: session closed for user root
Aug  4 01:08:09 ju kernel: Unable to handle kernel paging request at virtual address 006c0070
Aug  4 01:08:09 ju kernel:  printing eip:
Aug  4 01:08:09 ju kernel: f891bc87
Aug  4 01:08:09 ju kernel: *pde = 00000000
Aug  4 01:08:09 ju kernel: Oops: 0000 [#1]
Aug  4 01:08:09 ju kernel: SMP 
Aug  4 01:08:09 ju kernel: Modules linked in: nfsd exportfs lockd sunrpc basp(U) md5 ipv6 i2c_dev i2c_core dm_mod button battery ac hw_random e1000 floppy ext3 jbd raid1 aic7xxx sd_mod scsi_mod
Aug  4 01:08:09 ju kernel: CPU:    0
Aug  4 01:08:09 ju kernel: EIP:    0060:[<f891bc87>]    Tainted: P      VLI
Aug  4 01:08:09 ju kernel: EFLAGS: 00010202   (2.6.9-11.ELsmp) 
Aug  4 01:08:09 ju kernel: EIP is at walk_page_buffers+0x1e/0x87 [ext3]
Aug  4 01:08:09 ju kernel: eax: c3ebd901   ebx: 00002000   ecx: 006c006c   edx: c3ebd900
Aug  4 01:08:09 ju kernel: esi: 00002000   edi: c3ebd904   ebp: 00000000   esp: f7cb9e28
Aug  4 01:08:09 ju kernel: ds: 007b   es: 007b   ss: 0068
Aug  4 01:08:09 ju kernel: Process pdflush (pid: 34, threadinfo=f7cb9000 task=f7ca05f0)
Aug  4 01:08:09 ju kernel: Stack: 006c006c 00001000 00000000 f4344438 c153e080 f4344438 c3ebd904 f4344438 
Aug  4 01:08:09 ju kernel:        f891c23b 00001000 00000000 f891c15d f7cb9f64 c153e080 f7cb9f64 c9671410 
Aug  4 01:08:09 ju kernel:        0000000e c017336e 0000000d 00000000 00000001 ffffffff f891c17d 00000000 
Aug  4 01:08:09 ju kernel: Call Trace:
Aug  4 01:08:09 ju kernel:  [<f891c23b>] ext3_ordered_writepage+0xbe/0x13a [ext3]
Aug  4 01:08:09 ju kernel:  [<f891c15d>] bget_one+0x0/0x7 [ext3]
Aug  4 01:08:09 ju kernel:  [<c017336e>] mpage_writepages+0x1c2/0x314
Aug  4 01:08:09 ju kernel:  [<f891c17d>] ext3_ordered_writepage+0x0/0x13a [ext3]
Aug  4 01:08:09 ju kernel:  [<c0171ce8>] __sync_single_inode+0x5f/0x1c1
Aug  4 01:08:09 ju kernel:  [<c017207c>] sync_sb_inodes+0x1a7/0x274
Aug  4 01:08:09 ju kernel:  [<c01411ec>] pdflush+0x0/0x1e
Aug  4 01:08:09 ju kernel:  [<c01721da>] writeback_inodes+0x91/0xde
Aug  4 01:08:09 ju kernel:  [<c014089d>] background_writeout+0x65/0x97
Aug  4 01:08:09 ju kernel:  [<c0141158>] __pdflush+0xec/0x180
Aug  4 01:08:09 ju kernel:  [<c0141206>] pdflush+0x1a/0x1e
Aug  4 01:08:09 ju kernel:  [<c0140838>] background_writeout+0x0/0x97
Aug  4 01:08:09 ju kernel:  [<c01411ec>] pdflush+0x0/0x1e
Aug  4 01:08:09 ju kernel:  [<c0132e31>] kthread+0x73/0x9b
Aug  4 01:08:09 ju kernel:  [<c0132dbe>] kthread+0x0/0x9b
Aug  4 01:08:09 ju kernel:  [<c01041f1>] kernel_thread_helper+0x5/0xb
Aug  4 01:08:09 ju kernel: Code: 06 fb ff ff ff 31 c9 5a 89 c8 5b 5e c3 55 31 ed 57 89 d7 56 31 f6 53 83 ec 10 89 4c 24 08 89 d1 89 44 24 0c 8b 42 10 89 44 24 04 <8b> 41 04 89 04 24 8b 44 24 04 8d 1c 06 3b 5c 24 08 0f 96 c0 3b 
Aug  4 01:08:09 ju kernel:  <0>Fatal exception: panic in 5 seconds
Aug  4 08:32:18 ju syslogd 1.4.1: restart.
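
As a cross-check of the register dump above, the following sketch
recomputes the faulting address from the logged values. The constants
are copied from the log; the helper names `fault_matches` and
`is_null_value` are hypothetical. The only decoding assumed is that the
marked bytes `<8b> 41 04` are the i386 instruction `mov 0x4(%ecx),%eax`.

```c
#include <stdint.h>

/* Values copied verbatim from the oops in the attached log. */
#define OOPS_FAULT_ADDR 0x006c0070u /* "virtual address 006c0070" */
#define OOPS_ECX        0x006c006cu /* "ecx: 006c006c" */

/*
 * Displacement taken from the marked bytes "<8b> 41 04", which decode on
 * i386 as: mov 0x4(%ecx),%eax (a 32-bit load from ecx + 4).
 */
#define OOPS_DISP 0x4u

/* Returns 1 when base + disp reproduces the reported faulting address. */
int fault_matches(uint32_t base, uint32_t disp, uint32_t fault)
{
    return (uint32_t)(base + disp) == fault;
}

/* Returns 1 when the register value is a NULL pointer (all zero bits). */
int is_null_value(uint32_t reg)
{
    return reg == 0;
}
```

With the logged values, `fault_matches(OOPS_ECX, OOPS_DISP,
OOPS_FAULT_ADDR)` returns 1 and `is_null_value(OOPS_ECX)` returns 0,
which suggests that the pointer being followed held 0x006c006c rather
than NULL.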


* Re: Kernel Panic in ext3
  2006-08-04 10:36 Kernel Panic in ext3 Loiseleur Michel
@ 2006-08-04 11:48 ` Erik Mouw
  2006-08-04 13:09   ` Loiseleur Michel
  0 siblings, 1 reply; 3+ messages in thread
From: Erik Mouw @ 2006-08-04 11:48 UTC
  To: Loiseleur Michel; +Cc: linux-fsdevel

On Fri, Aug 04, 2006 at 12:36:52PM +0200, Loiseleur Michel wrote:
>     I run a Red Hat AS kernel (2.6.9-11-smp) on a dual-processor AMD machine.

Ask Red Hat for support.

> I had a kernel panic last night; you will find an extract of
> /var/log/messages in the attached file. The server is a backup server,
> and the panic occurred during very heavy batch processing. You will also
> see that the SMART readings look wrong: the disks are not that hot.

The temperature attribute doesn't have to tell the temperature in
degrees Celsius (or Fahrenheit). There's sometimes a little calculation
needed to get to something humans understand. hddtemp can do that for
you.

>     I have looked at the code, and everything seems to be in fs/ext3. It
> seems that during ext3_ordered_writepage, the filesystem tries to walk
> the page's buffers (walk_page_buffers) but cannot, because the "page" is
> NULL. That is what the trace tells me.
> 
>     My first idea is to fix it with something like this:
> if (!page)
>   goto out_fail;
> 
> 
>     But I suspect that is not the right approach, or that my reasoning is
> wrong. Is there an ext3 maintainer on the plane? :)

There are, but...

> Aug  4 01:08:09 ju kernel: Modules linked in: nfsd exportfs lockd sunrpc basp(U) md5 ipv6 i2c_dev i2c_core dm_mod button battery ac hw_random e1000 floppy ext3 jbd raid1 aic7xxx sd_mod scsi_mod
> Aug  4 01:08:09 ju kernel: CPU:    0
> Aug  4 01:08:09 ju kernel: EIP:    0060:[<f891bc87>]    Tainted: P      VLI

Your kernel is tainted by a proprietary module (I guess the "basp"
module) so it's impossible to debug your problem. The only one able to
debug your problem is the vendor of that proprietary module.
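
The taint check described above can be sketched as a small userspace
parser. This is an illustration, not kernel code:
`oops_is_proprietary_tainted` is a hypothetical helper name, and it
assumes that the flags directly follow the literal "Tainted:" label and
that 'P' (proprietary module loaded) appears in the first flag token, as
in the quoted EIP line.

```c
#include <ctype.h>
#include <string.h>

/*
 * Returns 1 if the oops line carries a 'P' in the first token after
 * "Tainted:", 0 otherwise (including lines with no taint field).
 */
int oops_is_proprietary_tainted(const char *line)
{
    static const char label[] = "Tainted:";
    const char *p = strstr(line, label);

    if (p == NULL)
        return 0;               /* no taint field on this line */

    p += sizeof(label) - 1;     /* skip past the label itself */
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;                    /* skip padding before the flags */

    /* Scan the flag token up to the next whitespace. */
    for (; *p != '\0' && !isspace((unsigned char)*p); p++) {
        if (*p == 'P')
            return 1;
    }
    return 0;
}
```

Passing the quoted line containing "Tainted: P      VLI" returns 1; a
line that reads "Not tainted" returns 0.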


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands


* Re: Kernel Panic in ext3
  2006-08-04 11:48 ` Erik Mouw
@ 2006-08-04 13:09   ` Loiseleur Michel
  0 siblings, 0 replies; 3+ messages in thread
From: Loiseleur Michel @ 2006-08-04 13:09 UTC
  To: linux-fsdevel; +Cc: Erik Mouw

Erik Mouw wrote:
> On Fri, Aug 04, 2006 at 12:36:52PM +0200, Loiseleur Michel wrote:
>   
>>     I run a Red Hat AS kernel (2.6.9-11-smp) on a dual-processor AMD machine.
>>     
>
> Ask Red Hat for support.
>   
I am the support, or more precisely, I try to be :).
>> I had a kernel panic last night; you will find an extract of
>> /var/log/messages in the attached file. The server is a backup server,
>> and the panic occurred during very heavy batch processing. You will also
>> see that the SMART readings look wrong: the disks are not that hot.
>>     
>
> The temperature attribute doesn't have to tell the temperature in
> degrees Celsius (or Fahrenheit). There's sometimes a little calculation
> needed to get to something humans understand. hddtemp can do that for
> you.
>   
Thanks for the tip.
>   
>>     I have looked at the code, and everything seems to be in fs/ext3. It
>> seems that during ext3_ordered_writepage
in fs/ext3/inode.c:1262,
>> the filesystem tries to walk the page's buffers (walk_page_buffers)
>> but cannot, because the "page" is NULL. That is what the trace tells me.
>>
>>     My first idea is to fix it with something like this:
>> if (!page)
>>   goto out_fail;
>>
>>
>>     But I suspect that is not the right approach, or that my reasoning is
>> wrong. Is there an ext3 maintainer on the plane? :)
>>     
>
> There are, but...
>
>   
>> Aug  4 01:08:09 ju kernel: Modules linked in: nfsd exportfs lockd sunrpc basp(U) md5 ipv6 i2c_dev i2c_core dm_mod button battery ac hw_random e1000 floppy ext3 jbd raid1 aic7xxx sd_mod scsi_mod
>> Aug  4 01:08:09 ju kernel: CPU:    0
>> Aug  4 01:08:09 ju kernel: EIP:    0060:[<f891bc87>]    Tainted: P      VLI
>>     
>
> Your kernel is tainted by a proprietary module (I guess the "basp"
> module) so it's impossible to debug your problem. The only one able to
> debug your problem is the vendor of that proprietary module.
>   

Oops, I didn't notice that. After some searching, I found that it is a
network load-balancing module made by Broadcom. basp stands for "Broadcom
Advanced Server Program". More info here:
http://support.3com.com/infodeli/tools/nic/linux/linuxasp996release.txt
It doesn't seem to be related to my problem, but I will remove it; I
don't need it anyway.

Thanks for your advice,

-- 
Loiseleur Michel - TM2L (08000LINUX)
LINAGORA
27, rue de Berri
1er étage
75008 PARIS
Tél : 01 58 18 68 28
Fax : 01 58 18 68 29
"Si hoc legere scis nimium eruditionis habes"


