* Kernel Panic in ext3
@ 2006-08-04 10:36 Loiseleur Michel
2006-08-04 11:48 ` Erik Mouw
0 siblings, 1 reply; 3+ messages in thread
From: Loiseleur Michel @ 2006-08-04 10:36 UTC (permalink / raw)
To: linux-fsdevel
[-- Attachment #1: Type: text/plain, Size: 956 bytes --]
Hello,
I am running a Red Hat AS kernel (2.6.9-11-smp) on a dual-processor AMD
machine. It panicked last night; an extract of /var/log/messages is in
the attached file. The server is a backup machine, and the panic happened
during a very large batch job. You will also see that the SMART readings
look wrong: the disks are not that hot.
I have looked at the code, and everything seems to be in fs/ext3. It "seems"
that during ext3_ordered_writepage() the filesystem tries to walk the
page's buffers (walk_page_buffers) but cannot, because the "page" is null.
That is what the trace tells me.
My first idea is to fix it with something like this:
    if (!page)
        goto out_fail;
But I suspect that is not the right approach, or that my reasoning is
wrong. Is there an ext3 maintainer on the plane? :)
--
Loiseleur Michel - TM2L (08000LINUX)
LINAGORA
27, rue de Berri
1er étage
75008 PARIS
Tél : 01 58 18 68 28
Fax : 01 58 18 68 29
"Si hoc legere scis nimium eruditionis habes"
[-- Attachment #2: messages.txt --]
[-- Type: text/plain, Size: 3627 bytes --]
Aug 4 01:01:01 ju crond(pam_unix)[26634]: session opened for user root by (uid=0)
Aug 4 01:01:19 ju crond(pam_unix)[26634]: session closed for user root
Aug 4 01:03:50 ju smartd[1745]: Device: /dev/hdc, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 58 to 57
Aug 4 01:03:50 ju smartd[1745]: Device: /dev/hdc, SMART Usage Attribute: 194 Temperature_Celsius changed from 240 to 29
Aug 4 01:03:50 ju smartd[1745]: Device: /dev/hdc, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 58 to 57
Aug 4 01:03:50 ju smartd[1745]: Device: /dev/hdd, SMART Usage Attribute: 194 Temperature_Celsius changed from 196 to 203
Aug 4 01:06:01 ju crond(pam_unix)[6980]: session opened for user root by (uid=0)
Aug 4 01:06:10 ju crond(pam_unix)[6980]: session closed for user root
Aug 4 01:08:09 ju kernel: Unable to handle kernel paging request at virtual address 006c0070
Aug 4 01:08:09 ju kernel: printing eip:
Aug 4 01:08:09 ju kernel: f891bc87
Aug 4 01:08:09 ju kernel: *pde = 00000000
Aug 4 01:08:09 ju kernel: Oops: 0000 [#1]
Aug 4 01:08:09 ju kernel: SMP
Aug 4 01:08:09 ju kernel: Modules linked in: nfsd exportfs lockd sunrpc basp(U) md5 ipv6 i2c_dev i2c_core dm_mod button battery ac hw_random e1000 floppy ext3 jbd raid1 aic7xxx sd_mod scsi_mod
Aug 4 01:08:09 ju kernel: CPU: 0
Aug 4 01:08:09 ju kernel: EIP: 0060:[<f891bc87>] Tainted: P VLI
Aug 4 01:08:09 ju kernel: EFLAGS: 00010202 (2.6.9-11.ELsmp)
Aug 4 01:08:09 ju kernel: EIP is at walk_page_buffers+0x1e/0x87 [ext3]
Aug 4 01:08:09 ju kernel: eax: c3ebd901 ebx: 00002000 ecx: 006c006c edx: c3ebd900
Aug 4 01:08:09 ju kernel: esi: 00002000 edi: c3ebd904 ebp: 00000000 esp: f7cb9e28
Aug 4 01:08:09 ju kernel: ds: 007b es: 007b ss: 0068
Aug 4 01:08:09 ju kernel: Process pdflush (pid: 34, threadinfo=f7cb9000 task=f7ca05f0)
Aug 4 01:08:09 ju kernel: Stack: 006c006c 00001000 00000000 f4344438 c153e080 f4344438 c3ebd904 f4344438
Aug 4 01:08:09 ju kernel: f891c23b 00001000 00000000 f891c15d f7cb9f64 c153e080 f7cb9f64 c9671410
Aug 4 01:08:09 ju kernel: 0000000e c017336e 0000000d 00000000 00000001 ffffffff f891c17d 00000000
Aug 4 01:08:09 ju kernel: Call Trace:
Aug 4 01:08:09 ju kernel: [<f891c23b>] ext3_ordered_writepage+0xbe/0x13a [ext3]
Aug 4 01:08:09 ju kernel: [<f891c15d>] bget_one+0x0/0x7 [ext3]
Aug 4 01:08:09 ju kernel: [<c017336e>] mpage_writepages+0x1c2/0x314
Aug 4 01:08:09 ju kernel: [<f891c17d>] ext3_ordered_writepage+0x0/0x13a [ext3]
Aug 4 01:08:09 ju kernel: [<c0171ce8>] __sync_single_inode+0x5f/0x1c1
Aug 4 01:08:09 ju kernel: [<c017207c>] sync_sb_inodes+0x1a7/0x274
Aug 4 01:08:09 ju kernel: [<c01411ec>] pdflush+0x0/0x1e
Aug 4 01:08:09 ju kernel: [<c01721da>] writeback_inodes+0x91/0xde
Aug 4 01:08:09 ju kernel: [<c014089d>] background_writeout+0x65/0x97
Aug 4 01:08:09 ju kernel: [<c0141158>] __pdflush+0xec/0x180
Aug 4 01:08:09 ju kernel: [<c0141206>] pdflush+0x1a/0x1e
Aug 4 01:08:09 ju kernel: [<c0140838>] background_writeout+0x0/0x97
Aug 4 01:08:09 ju kernel: [<c01411ec>] pdflush+0x0/0x1e
Aug 4 01:08:09 ju kernel: [<c0132e31>] kthread+0x73/0x9b
Aug 4 01:08:09 ju kernel: [<c0132dbe>] kthread+0x0/0x9b
Aug 4 01:08:09 ju kernel: [<c01041f1>] kernel_thread_helper+0x5/0xb
Aug 4 01:08:09 ju kernel: Code: 06 fb ff ff ff 31 c9 5a 89 c8 5b 5e c3 55 31 ed 57 89 d7 56 31 f6 53 83 ec 10 89 4c 24 08 89 d1 89 44 24 0c 8b 42 10 89 44 24 04 <8b> 41 04 89 04 24 8b 44 24 04 8d 1c 06 3b 5c 24 08 0f 96 c0 3b
Aug 4 01:08:09 ju kernel: <0>Fatal exception: panic in 5 seconds
Aug 4 08:32:18 ju syslogd 1.4.1: restart.
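The register dump above can be cross-checked against the fault address. On 32-bit x86 the bytes marked `<8b> 41 04` in the Code: line decode to `mov 0x4(%ecx),%eax`, so the faulting load reads from ecx + 4. What follows is a minimal Python sketch, not kernel tooling: the helper names `parse_registers` and `parse_fault_address` are hypothetical, and the input is simply copied from the log.

```python
import re

# Lines copied from the oops in messages.txt (syslog prefix removed).
OOPS = """\
Unable to handle kernel paging request at virtual address 006c0070
eax: c3ebd901 ebx: 00002000 ecx: 006c006c edx: c3ebd900
esi: 00002000 edi: c3ebd904 ebp: 00000000 esp: f7cb9e28
"""

def parse_registers(text):
    """Return {register name: value} for every 'exx: hex8' pair in the dump."""
    pairs = re.findall(r"\b(e[a-z]{2}): ([0-9a-f]{8})\b", text)
    return {name: int(value, 16) for name, value in pairs}

def parse_fault_address(text):
    """Return the faulting virtual address as an int, or None if absent."""
    m = re.search(r"virtual address ([0-9a-f]{8})", text)
    return int(m.group(1), 16) if m else None

regs = parse_registers(OOPS)
fault = parse_fault_address(OOPS)

# The faulting instruction loads from ecx + 4; compare with the reported address.
print(hex(regs["ecx"] + 4), hex(fault))
print(regs["ecx"] + 4 == fault)
```

The two values agree, which suggests that the pointer being dereferenced held 0x006c006c rather than NULL, so a test of the form `if (!ptr)` would not have caught this particular value.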
* Re: Kernel Panic in ext3
2006-08-04 10:36 Kernel Panic in ext3 Loiseleur Michel
@ 2006-08-04 11:48 ` Erik Mouw
2006-08-04 13:09 ` Loiseleur Michel
0 siblings, 1 reply; 3+ messages in thread
From: Erik Mouw @ 2006-08-04 11:48 UTC (permalink / raw)
To: Loiseleur Michel; +Cc: linux-fsdevel
On Fri, Aug 04, 2006 at 12:36:52PM +0200, Loiseleur Michel wrote:
> I am running a Red Hat AS kernel (2.6.9-11-smp) on a dual-processor AMD machine.
Ask Red Hat for support.
> It panicked last night; an extract of /var/log/messages is in the
> attached file. The server is a backup machine, and the panic happened
> during a very large batch job. You will also see that the SMART readings
> look wrong: the disks are not that hot.
The temperature attribute doesn't have to tell the temperature in
degrees Celsius (or Fahrenheit). There's sometimes a little calculation
needed to get to something humans understand. hddtemp can do that for
you.
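As a rough illustration of the kind of calculation meant here, the sketch below assumes a common vendor convention that is not stated anywhere in this thread: that the 48-bit raw value of attribute 194 carries the current temperature in degrees Celsius in its lowest byte. The figures smartd logs above ("changed from 240 to 29") are typically normalized values on a vendor-defined scale, not raw ones. The function names and the example raw value are hypothetical.

```python
def raw_194_to_celsius(raw_value):
    """Assumed layout: the lowest byte of the raw value is the temperature in Celsius."""
    return raw_value & 0xFF

def celsius_to_fahrenheit(celsius):
    """Standard Celsius to Fahrenheit conversion."""
    return celsius * 9 / 5 + 32

# Hypothetical raw value, NOT taken from the log: some drives pack
# extra data (for example min/max readings) into the upper words.
example_raw = (45 << 32) | (18 << 16) | 31

print(raw_194_to_celsius(example_raw))
print(celsius_to_fahrenheit(raw_194_to_celsius(example_raw)))
```

Drives that use a different layout need a different mask or offset, which is the per-model knowledge a tool such as hddtemp provides.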
> I have looked at the code, and everything seems to be in fs/ext3. It "seems"
> that during ext3_ordered_writepage() the filesystem tries to walk the
> page's buffers (walk_page_buffers) but cannot, because the "page" is null.
> That is what the trace tells me.
>
> My first idea is to fix it with something like this:
>     if (!page)
>         goto out_fail;
>
> But I suspect that is not the right approach, or that my reasoning is
> wrong. Is there an ext3 maintainer on the plane? :)
There are, but...
> Aug 4 01:08:09 ju kernel: Modules linked in: nfsd exportfs lockd sunrpc basp(U) md5 ipv6 i2c_dev i2c_core dm_mod button battery ac hw_random e1000 floppy ext3 jbd raid1 aic7xxx sd_mod scsi_mod
> Aug 4 01:08:09 ju kernel: CPU: 0
> Aug 4 01:08:09 ju kernel: EIP: 0060:[<f891bc87>] Tainted: P VLI
Your kernel is tainted by a proprietary module (I guess the "basp"
module) so it's impossible to debug your problem. The only one able to
debug your problem is the vendor of that proprietary module.
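The check described here can be done mechanically on the two quoted log lines. This is a minimal Python sketch with hypothetical helper names: it reads the token printed after "Tainted:" and lists any module shown with a parenthesized suffix. "P" is the kernel's taint flag for a proprietary module having been loaded; the meaning of the "(U)" suffix is deliberately left uninterpreted.

```python
import re

# Two lines copied from the oops in messages.txt (syslog prefix removed).
MODULES = ("Modules linked in: nfsd exportfs lockd sunrpc basp(U) md5 ipv6 "
           "i2c_dev i2c_core dm_mod button battery ac hw_random e1000 floppy "
           "ext3 jbd raid1 aic7xxx sd_mod scsi_mod")
EIP = "EIP: 0060:[<f891bc87>] Tainted: P VLI"

def taint_flags(line):
    """Return the token printed right after 'Tainted:' (empty string if absent)."""
    m = re.search(r"Tainted:\s+(\S+)", line)
    return m.group(1) if m else ""

def flagged_modules(line):
    """Return (module, flag) pairs for modules listed as name(FLAG)."""
    return re.findall(r"(\w+)\((\w+)\)", line)

print(taint_flags(EIP))
print(flagged_modules(MODULES))
```

On these two lines the sketch reports the taint flag "P" and a single flagged module, basp, which is consistent with the guess above.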
Erik
--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
* Re: Kernel Panic in ext3
2006-08-04 11:48 ` Erik Mouw
@ 2006-08-04 13:09 ` Loiseleur Michel
0 siblings, 0 replies; 3+ messages in thread
From: Loiseleur Michel @ 2006-08-04 13:09 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Erik Mouw
Erik Mouw wrote:
> On Fri, Aug 04, 2006 at 12:36:52PM +0200, Loiseleur Michel wrote:
>
>> I am running a Red Hat AS kernel (2.6.9-11-smp) on a dual-processor AMD machine.
>>
>
> Ask Red Hat for support.
>
I am the support, or more correctly, I try to be :).
>> It panicked last night; an extract of /var/log/messages is in the
>> attached file. The server is a backup machine, and the panic happened
>> during a very large batch job. You will also see that the SMART readings
>> look wrong: the disks are not that hot.
>>
>
> The temperature attribute doesn't have to tell the temperature in
> degrees Celsius (or Fahrenheit). There's sometimes a little calculation
> needed to get to something humans understand. hddtemp can do that for
> you.
>
Thanks for the tip.
>
>> I have looked at the code, and everything seems to be in fs/ext3. It "seems"
>> that during ext3_ordered_writepage()
in fs/ext3/inode.c:1262,
>> the filesystem tries to walk the page's buffers
>> (walk_page_buffers) but cannot, because the "page" is null. That is
>> what the trace tells me.
>>
>> My first idea is to fix it with something like this:
>>     if (!page)
>>         goto out_fail;
>>
>> But I suspect that is not the right approach, or that my reasoning is
>> wrong. Is there an ext3 maintainer on the plane? :)
>>
>
> There are, but...
>
>
>> Aug 4 01:08:09 ju kernel: Modules linked in: nfsd exportfs lockd sunrpc basp(U) md5 ipv6 i2c_dev i2c_core dm_mod button battery ac hw_random e1000 floppy ext3 jbd raid1 aic7xxx sd_mod scsi_mod
>> Aug 4 01:08:09 ju kernel: CPU: 0
>> Aug 4 01:08:09 ju kernel: EIP: 0060:[<f891bc87>] Tainted: P VLI
>>
>
> Your kernel is tainted by a proprietary module (I guess the "basp"
> module) so it's impossible to debug your problem. The only one able to
> debug your problem is the vendor of that proprietary module.
>
Oops, I didn't notice that. After some searching, I found that it is a
network load-balancing module made by Broadcom. basp stands for "Broadcom
Advanced Server Program". More info here:
http://support.3com.com/infodeli/tools/nic/linux/linuxasp996release.txt
It doesn't seem to be related to my problem, but I will remove it; I don't
need it anyway.
Thanks for your advice,
--
Loiseleur Michel - TM2L (08000LINUX)
LINAGORA
27, rue de Berri
1er étage
75008 PARIS
Tél : 01 58 18 68 28
Fax : 01 58 18 68 29
"Si hoc legere scis nimium eruditionis habes"