Re: FS corruption on 2.4.0-ac8

public inbox for linux-kernel@vger.kernel.org

* Re: FS corruption on 2.4.0-ac8
  2001-01-15 22:47 FS corruption on 2.4.0-ac8 Jure Pecar
@ 2001-01-15 21:11 ` Marcelo Tosatti
  2001-01-16  1:00   ` Neil Brown
  2001-01-15 23:31 ` Oops with 4GB memory setting in 2.4.0 stable Rainer Mager
  2001-01-16  0:29 ` FS corruption on 2.4.0-ac8 Andreas Dilger
  2 siblings, 1 reply; 14+ messages in thread
From: Marcelo Tosatti @ 2001-01-15 21:11 UTC (permalink / raw)
  To: Neil Brown, Jure Pecar; +Cc: linux-kernel



On Mon, 15 Jan 2001, Jure Pecar wrote:

> Hi all,
> 
> I was running 2.4.0test10pre5 happily for months and wanted to see how
> things stand in the 'latest stuff'. Here's what I found:
> 
> I compiled 2.4.0-ac8 with nearly the same .config as test10pre5 (with
> the latest gcc on rh7). Then I booted it and used X for some normal
> browsing and mp3s. Performance was poor, responsiveness also; even the
> mouse stopped responding for a couple of seconds at a time, there was a
> lot of disk thrashing, and so on. I decided to boot test10 back, and
> there was a nasty surprise: fsck found the filesystem with errors, and
> LOTS of them ... I had to hold down 'y' for almost 5 minutes ... :)
> 
> Then I examined the logs for the cause of this ... and here's what
> 2.4.0-ac8 left in the logs:
> 
> Jan 14 16:26:47 open kernel: ee_blocks: Freeing blocks not in datazone -
> block = 979727457, count = 1
> Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
> ext2_free_blocks: Freeing blocks not in datazone - block = 1769096736,
> count = 1
> Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
> ext2_free_blocks: Freeing blocks not in datazone - block = 842080300,
> count = 1
> Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
> ext2_free_blocks: Freeing blocks not in datazone - block = 1851869728,
> count = 1
> Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
> ext2_free_blocks: Freeing blocks not in datazone - block = 808464928,
> count = 1
> ...
> and so on for about 150 such lines in 3 seconds.
> 
> There is something not that usual about my setup: I run raid1 /boot and
> raid5 root with one disk disconnected (it's simply too loud...), so the
> array is in degraded mode all the time. Other hardware is more or less
> standard: p200 classic, 430vx board, adaptec2940u, 64mb ram.
> 
> Is this a known problem? If it's not, please advise me on how to provide
> more useful information.

Neil, 

This is the second report of corruption with RAID5. 

Do you know if any of your recent changes can be the reason?


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: Oops with 4GB memory setting in 2.4.0 stable
  2001-01-15 23:31 ` Oops with 4GB memory setting in 2.4.0 stable Rainer Mager
@ 2001-01-15 21:47   ` Marcelo Tosatti
  2001-01-15 23:45     ` Rainer Mager
  2001-01-16  8:40   ` Urban Widmark
  1 sibling, 1 reply; 14+ messages in thread
From: Marcelo Tosatti @ 2001-01-15 21:47 UTC (permalink / raw)
  To: Rainer Mager; +Cc: linux-kernel



On Tue, 16 Jan 2001, Rainer Mager wrote:

> 	Attached is my oops.txt and the result sent through ksymoops. The results
> don't look particularly useful to me so perhaps I'm doing something wrong.
> PLEASE tell me if I should parse this differently. Likewise, if there is
> anything else I can do to help debug this, please tell me.

It seems you forgot to attach oops.txt. 


* RE: Oops with 4GB memory setting in 2.4.0 stable
  2001-01-15 23:45     ` Rainer Mager
@ 2001-01-15 22:09       ` Marcelo Tosatti
  2001-01-16  0:21         ` Rainer Mager
  2001-01-16  2:03         ` Keith Owens
  0 siblings, 2 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2001-01-15 22:09 UTC (permalink / raw)
  To: Rainer Mager; +Cc: linux-kernel



On Tue, 16 Jan 2001, Rainer Mager wrote:

> I knew that, I was just testing you all.  ;-)

>>EIP; f889e044 <END_OF_CODE+385bfe34/????>   <=====
Trace; f889d966 <END_OF_CODE+385bf756/????>
Trace; c0140c10 <vfs_readdir+90/ec>
Trace; c0140e7c <filldir+0/d8>
Trace; c0140f9e <sys_getdents+4a/98>
Trace; c0140e7c <filldir+0/d8>

It seems the oops is happening in a module's function.

You have to make ksymoops parse the oops output against a System.map which
has all the modules' symbols. Load each module by hand with the insmod -m
option ("insmod -m module.o") and _append_ the outputs to System.map.

After that you can run ksymoops against this new System.map. 
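
To illustrate why the trace above shows END_OF_CODE+385bfe34: a symbolizer
attributes each address to the nearest symbol at or below it, so an address
inside a module that the map does not cover ends up charged to the last
symbol in the map. The following is a minimal Python sketch of that lookup,
not ksymoops's actual code. The table is a toy one, and every start address
in it (including END_OF_CODE) is an assumption inferred from the trace by
subtracting the printed offset from the printed address.

```python
import bisect

# Symbol start addresses inferred from the trace (address minus offset).
# A real System.map has thousands of entries; this is a toy table.
SYMBOLS = sorted([
    (0xC0140B80, "vfs_readdir"),
    (0xC0140E7C, "filldir"),
    (0xC0140F54, "sys_getdents"),
    (0xC02DE210, "END_OF_CODE"),  # last symbol in the kernel-only map (inferred)
])
ADDRS = [addr for addr, _ in SYMBOLS]


def symbolize(addr):
    """Return 'name+offset' for the nearest symbol at or below addr."""
    i = bisect.bisect_right(ADDRS, addr) - 1
    if i < 0:
        return "<unknown>"
    base, name = SYMBOLS[i]
    return f"{name}+{addr - base:x}"


print(symbolize(0xC0140C10))  # vfs_readdir+90
print(symbolize(0xF889E044))  # END_OF_CODE+385bfe34
```

Once the module's own symbols are appended to the map, the same lookup
lands on a module function instead of END_OF_CODE.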


* RE: Oops with 4GB memory setting in 2.4.0 stable
  2001-01-16  0:21         ` Rainer Mager
@ 2001-01-15 22:37           ` Marcelo Tosatti
  0 siblings, 0 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2001-01-15 22:37 UTC (permalink / raw)
  To: Rainer Mager; +Cc: linux-kernel, Urban Widmark



On Tue, 16 Jan 2001, Rainer Mager wrote:

> Ok, now we're making progress. I did as you said and have attached (really!)
> the new parsed output. Now we have some useful information (I hope). I still
> got lots of warnings on symbols (which I have edited out of the parsed file
> for the sake of brevity). What's the next step?

Wait for someone who has a clue about smbfs to find out the problem. 




* FS corruption on 2.4.0-ac8
@ 2001-01-15 22:47 Jure Pecar
  2001-01-15 21:11 ` Marcelo Tosatti
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Jure Pecar @ 2001-01-15 22:47 UTC (permalink / raw)
  To: linux-kernel

Hi all,

I was running 2.4.0test10pre5 happily for months and wanted to see how
things stand in the 'latest stuff'. Here's what I found:

I compiled 2.4.0-ac8 with nearly the same .config as test10pre5 (with
the latest gcc on rh7). Then I booted it and used X for some normal
browsing and mp3s. Performance was poor, responsiveness also; even the
mouse stopped responding for a couple of seconds at a time, there was a
lot of disk thrashing, and so on. I decided to boot test10 back, and
there was a nasty surprise: fsck found the filesystem with errors, and
LOTS of them ... I had to hold down 'y' for almost 5 minutes ... :)

Then I examined the logs for the cause of this ... and here's what
2.4.0-ac8 left in the logs:

Jan 14 16:26:47 open kernel: ee_blocks: Freeing blocks not in datazone -
block = 979727457, count = 1
Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
ext2_free_blocks: Freeing blocks not in datazone - block = 1769096736,
count = 1
Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
ext2_free_blocks: Freeing blocks not in datazone - block = 842080300,
count = 1
Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
ext2_free_blocks: Freeing blocks not in datazone - block = 1851869728,
count = 1
Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
ext2_free_blocks: Freeing blocks not in datazone - block = 808464928,
count = 1
...
and so on for about 150 such lines in 3 seconds.

There is something not that usual about my setup: I run raid1 /boot and
raid5 root with one disk disconnected (it's simply too loud...), so the
array is in degraded mode all the time. Other hardware is more or less
standard: p200 classic, 430vx board, adaptec2940u, 64mb ram.

Is this a known problem? If it's not, please advise me on how to provide
more useful information.

-- 

Jure Pecar

* Oops with 4GB memory setting in 2.4.0 stable
  2001-01-15 22:47 FS corruption on 2.4.0-ac8 Jure Pecar
  2001-01-15 21:11 ` Marcelo Tosatti
@ 2001-01-15 23:31 ` Rainer Mager
  2001-01-15 21:47   ` Marcelo Tosatti
  2001-01-16  8:40   ` Urban Widmark
  2001-01-16  0:29 ` FS corruption on 2.4.0-ac8 Andreas Dilger
  2 siblings, 2 replies; 14+ messages in thread
From: Rainer Mager @ 2001-01-15 23:31 UTC (permalink / raw)
  To: linux-kernel

Hi all,

	I have a 100% reproducible bug in all of the 2.4.0 kernels including the
latest stable one. The issue is that if I compile the kernel to support 4GB
RAM (I have 1 GB) and then try to access a samba mount I get an oops. This
ALWAYS happens. Usually after this the system is frozen (although the magic
SysRq still works). If the system isn't frozen then any commands that
access the disk will freeze. Fortunately GPM worked and I was able to paste
the oops to a file via telnet.

	Attached is my oops.txt and the result sent through ksymoops. The results
don't look particularly useful to me so perhaps I'm doing something wrong.
PLEASE tell me if I should parse this differently. Likewise, if there is
anything else I can do to help debug this, please tell me.

--Rainer


* RE: Oops with 4GB memory setting in 2.4.0 stable
  2001-01-15 21:47   ` Marcelo Tosatti
@ 2001-01-15 23:45     ` Rainer Mager
  2001-01-15 22:09       ` Marcelo Tosatti
  0 siblings, 1 reply; 14+ messages in thread
From: Rainer Mager @ 2001-01-15 23:45 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 964 bytes --]

I knew that, I was just testing you all.  ;-)

\e hides his head in shame



> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Marcelo Tosatti
> Sent: Tuesday, January 16, 2001 6:47 AM
> To: Rainer Mager
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: Oops with 4GB memory setting in 2.4.0 stable
>
>
>
>
> On Tue, 16 Jan 2001, Rainer Mager wrote:
>
> > 	Attached is my oops.txt and the result sent through ksymoops. The results
> > don't look particularly useful to me so perhaps I'm doing something wrong.
> > PLEASE tell me if I should parse this differently. Likewise, if there is
> > anything else I can do to help debug this, please tell me.
>
> It seems you forgot to attach oops.txt.

[-- Attachment #2: oops.parsed --]
[-- Type: application/octet-stream, Size: 2090 bytes --]

ksymoops 0.7c on i686 2.4.0.  Options used
     -v /boot/vmlinux-2.4.0-bigmem (specified)
     -K (specified)
     -L (specified)
     -o /lib/modules/2.4.0/ (default)
     -m /boot/System.map-2.4.0-bigmem (specified)

No modules in ksyms, skipping objects
Unable to handle kernel NULL pointer dereference at virtual address 00000000
f889e044
*pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[<f889e044>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: d5762800   ecx: 00000400   edx: c19665fc
esi: d55be120   edi: 00000000   ebp: d5764260   esp: d5505f1c
ds: 0018   es: 0018   ss: 0018
Process ls (pid: 865, stackpage=d5505000)
Stack: d5762800 d55be120 d5764260 d5764260 d55be120 00000000 f889d966 d55be120
       d5762800 d5504000 d5764260 fffffffe fffffffb d5762800 d5764260 d55be120
       00000000 d5764260 bffffa40 00000006 c0140c10 d5764260 d5505fb0 c0140e7c
Call Trace: [<f889d966>] [<c0140c10>] [<c0140e7c>] [<c0140f9e>] [<c0140e7c>] [<c0108f4b>]
Code: f3 ab e9 8b 00 00 00 90 8d 74 26 00 8b 44 24 14 c7 00 00 00

>>EIP; f889e044 <END_OF_CODE+385bfe34/????>   <=====
Trace; f889d966 <END_OF_CODE+385bf756/????>
Trace; c0140c10 <vfs_readdir+90/ec>
Trace; c0140e7c <filldir+0/d8>
Trace; c0140f9e <sys_getdents+4a/98>
Trace; c0140e7c <filldir+0/d8>
Trace; c0108f4b <system_call+33/38>
Code;  f889e044 <END_OF_CODE+385bfe34/????>
00000000 <_EIP>:
Code;  f889e044 <END_OF_CODE+385bfe34/????>   <=====
   0:   f3 ab                     repz stos %eax,%es:(%edi)   <=====
Code;  f889e046 <END_OF_CODE+385bfe36/????>
   2:   e9 8b 00 00 00            jmp    92 <_EIP+0x92> f889e0d6 <END_OF_CODE+385bfec6/????>
Code;  f889e04b <END_OF_CODE+385bfe3b/????>
   7:   90                        nop    
Code;  f889e04c <END_OF_CODE+385bfe3c/????>
   8:   8d 74 26 00               lea    0x0(%esi,1),%esi
Code;  f889e050 <END_OF_CODE+385bfe40/????>
   c:   8b 44 24 14               mov    0x14(%esp,1),%eax
Code;  f889e054 <END_OF_CODE+385bfe44/????>
  10:   c7 00 00 00 00 00         movl   $0x0,(%eax)


[-- Attachment #3: oops.txt --]
[-- Type: text/plain, Size: 810 bytes --]

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
f889e044
*pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[<f889e044>]
EFLAGS: 00010246
eax: 00000000   ebx: d5762800   ecx: 00000400   edx: c19665fc
esi: d55be120   edi: 00000000   ebp: d5764260   esp: d5505f1c
ds: 0018   es: 0018   ss: 0018
Process ls (pid: 865, stackpage=d5505000)
Stack: d5762800 d55be120 d5764260 d5764260 d55be120 00000000 f889d966 d55be120
       d5762800 d5504000 d5764260 fffffffe fffffffb d5762800 d5764260 d55be120
       00000000 d5764260 bffffa40 00000006 c0140c10 d5764260 d5505fb0 c0140e7c
Call Trace: [<f889d966>] [<c0140c10>] [<c0140e7c>] [<c0140f9e>] [<c0140e7c>] [<c0108f4b>]

Code: f3 ab e9 8b 00 00 00 90 8d 74 26 00 8b 44 24 14 c7 00 00 00
Segmentation fault


* RE: Oops with 4GB memory setting in 2.4.0 stable
  2001-01-15 22:09       ` Marcelo Tosatti
@ 2001-01-16  0:21         ` Rainer Mager
  2001-01-15 22:37           ` Marcelo Tosatti
  2001-01-16  2:03         ` Keith Owens
  1 sibling, 1 reply; 14+ messages in thread
From: Rainer Mager @ 2001-01-16  0:21 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1185 bytes --]

Ok, now we're making progress. I did as you said and have attached (really!)
the new parsed output. Now we have some useful information (I hope). I still
got lots of warnings on symbols (which I have edited out of the parsed file
for the sake of brevity). What's the next step?

--Rainer


> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Marcelo Tosatti
> Sent: Tuesday, January 16, 2001 7:09 AM
> To: Rainer Mager
> Cc: linux-kernel@vger.kernel.org
> Subject: RE: Oops with 4GB memory setting in 2.4.0 stable
>
> >>EIP; f889e044 <END_OF_CODE+385bfe34/????>   <=====
> Trace; f889d966 <END_OF_CODE+385bf756/????>
> Trace; c0140c10 <vfs_readdir+90/ec>
> Trace; c0140e7c <filldir+0/d8>
> Trace; c0140f9e <sys_getdents+4a/98>
> Trace; c0140e7c <filldir+0/d8>
>
> It seems the oops is happening in a module's function.
>
> You have to make ksymoops parse the oops output against a System.map which
> has all the modules' symbols. Load each module by hand with the insmod -m
> option ("insmod -m module.o") and _append_ the outputs to System.map.
>
> After that you can run ksymoops against this new System.map.

[-- Attachment #2: oops.parsed.edit --]
[-- Type: application/octet-stream, Size: 2066 bytes --]

ksymoops 0.7c on i686 2.4.0.  Options used
     -v /boot/vmlinux-2.4.0-bigmem (specified)
     -K (specified)
     -L (specified)
     -o /lib/modules/2.4.0/ (default)
     -m ./System.map-2.4.0-bigmem (specified)

No modules in ksyms, skipping objects
Unable to handle kernel NULL pointer dereference at virtual address 00000000
f889e044
*pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[<f889e044>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: d5762800   ecx: 00000400   edx: c19665fc
esi: d55be120   edi: 00000000   ebp: d5764260   esp: d5505f1c
ds: 0018   es: 0018   ss: 0018
Process ls (pid: 865, stackpage=d5505000)
Stack: d5762800 d55be120 d5764260 d5764260 d55be120 00000000 f889d966 d55be120
       d5762800 d5504000 d5764260 fffffffe fffffffb d5762800 d5764260 d55be120
       00000000 d5764260 bffffa40 00000006 c0140c10 d5764260 d5505fb0 c0140e7c
Call Trace: [<f889d966>] [<c0140c10>] [<c0140e7c>] [<c0140f9e>] [<c0140e7c>] [<c0108f4b>]
Code: f3 ab e9 8b 00 00 00 90 8d 74 26 00 8b 44 24 14 c7 00 00 00

>>EIP; f889e044 <smb_rename+fc/19c>   <=====
Trace; f889d966 <smb_readdir+b6/188>
Trace; c0140c10 <vfs_readdir+90/ec>
Trace; c0140e7c <filldir+0/d8>
Trace; c0140f9e <sys_getdents+4a/98>
Trace; c0140e7c <filldir+0/d8>
Trace; c0108f4b <system_call+33/38>
Code;  f889e044 <smb_rename+fc/19c>
00000000 <_EIP>:
Code;  f889e044 <smb_rename+fc/19c>   <=====
   0:   f3 ab                     repz stos %eax,%es:(%edi)   <=====
Code;  f889e046 <smb_rename+fe/19c>
   2:   e9 8b 00 00 00            jmp    92 <_EIP+0x92> f889e0d6 <smb_rename+18e/19c>
Code;  f889e04b <smb_rename+103/19c>
   7:   90                        nop    
Code;  f889e04c <smb_rename+104/19c>
   8:   8d 74 26 00               lea    0x0(%esi,1),%esi
Code;  f889e050 <smb_rename+108/19c>
   c:   8b 44 24 14               mov    0x14(%esp,1),%eax
Code;  f889e054 <smb_rename+10c/19c>
  10:   c7 00 00 00 00 00         movl   $0x0,(%eax)


367 warnings issued.  Results may not be reliable.


* Re: FS corruption on 2.4.0-ac8
  2001-01-15 22:47 FS corruption on 2.4.0-ac8 Jure Pecar
  2001-01-15 21:11 ` Marcelo Tosatti
  2001-01-15 23:31 ` Oops with 4GB memory setting in 2.4.0 stable Rainer Mager
@ 2001-01-16  0:29 ` Andreas Dilger
  2 siblings, 0 replies; 14+ messages in thread
From: Andreas Dilger @ 2001-01-16  0:29 UTC (permalink / raw)
  To: Jure Pecar; +Cc: linux-kernel

Jure, you write:
> I was running 2.4.0test10pre5 happily for months and wanted to see how
> things stand in the 'latest stuff'. Here's what I found:
> 
> I compiled 2.4.0-ac8 with nearly the same .config as test10pre5 (with
> the latest gcc on rh7). Then I booted it and used X for some normal
> browsing and mp3s. Performance was poor, responsiveness also; even the
> mouse stopped responding for a couple of seconds at a time, there was a
> lot of disk thrashing, and so on. I decided to boot test10 back, and
> there was a nasty surprise: fsck found the filesystem with errors, and
> LOTS of them ... I had to hold down 'y' for almost 5 minutes ... :)
> 
> Then I examined the logs for the cause of this ... and here's what
> 2.4.0-ac8 left in the logs:
> 
> Jan 14 16:26:47 open kernel: ee_blocks: Freeing blocks not in datazone -
> block = 979727457, count = 1
> Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
> ext2_free_blocks: Freeing blocks not in datazone - block = 1769096736,
> count = 1
> Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
> ext2_free_blocks: Freeing blocks not in datazone - block = 842080300,
> count = 1
> Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
> ext2_free_blocks: Freeing blocks not in datazone - block = 1851869728,
> count = 1
> Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
> ext2_free_blocks: Freeing blocks not in datazone - block = 808464928,
> count = 1
> ...
> and so on for about 150 such lines in 3 seconds.

These block numbers read as "ate: Fri, 12 Jan 200" (i.e. part of an email
header) if you convert to 32-bit hex, then ascii.  There was another report
of corruption like this under 2.4.0 as well.
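
The conversion can be reproduced with a few lines of Python. This is only a
sketch of the decoding step described above: it assumes the numbers were
stored as little-endian 32-bit words (as on i386), and the helper name
blocks_to_ascii is made up for illustration.

```python
import struct

# Block numbers copied from the EXT2-fs error lines quoted above.
blocks = [979727457, 1769096736, 842080300, 1851869728, 808464928]


def blocks_to_ascii(nums):
    """Pack each number as a little-endian 32-bit word and decode as ASCII."""
    raw = b"".join(struct.pack("<I", n) for n in nums)
    return raw.decode("ascii", errors="replace")


# Show each block number in hex next to the four characters it encodes.
for n in blocks:
    print(f"{n:>10} = 0x{n:08x} -> {blocks_to_ascii([n])!r}")

print(repr(blocks_to_ascii(blocks)))  # 'ate: Fri, 12 Jan 200'
```

Block numbers that decode to readable text like this suggest that file data
ended up where the filesystem expected block pointers.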

> Is this a known problem? If it's not, please advise me on how to provide
> more useful information.

Actually, I thought there were problems in the test11? through test-13
kernels of this sort.  I'm not sure exactly when it started, but a search
of l-k should locate it.  Depending on when last you fsck'd your filesystem,
you may have already had some of these disk problems.  However, there was
another report recently on 2.4.0 that looked the same, and that user had
just e2fsck'd his filesystem before booting 2.4.0 for the first time after
running only 2.2, so it looks like a 2.4.0 bug.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

* Re: FS corruption on 2.4.0-ac8
  2001-01-15 21:11 ` Marcelo Tosatti
@ 2001-01-16  1:00   ` Neil Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Neil Brown @ 2001-01-16  1:00 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Neil Brown, Jure Pecar, linux-kernel

On Monday January 15, marcelo@conectiva.com.br wrote:
> 
> 
> On Mon, 15 Jan 2001, Jure Pecar wrote:
> 
> > 
> > There is something not that usual about my setup: I run raid1 /boot and
> > raid5 root with one disk disconnected (it's simply too loud...), so the
> > array is in degraded mode all the time. Other hardware is more or less
> > standard: p200 classic, 430vx board, adaptec2940u, 64mb ram.
> > 
> > Is this a known problem? If it's not, please advise me on how to provide
> > more useful information.
> 
> Neil, 
> 
> This is the second report of corruption with RAID5. 
> 
> Do you know if any of your recent changes can be the reason?
> 

Probably related.
There is growing evidence of problems when accessing a degraded array,
but I don't know if it is writing bad data, or reading incorrectly, or
just getting the parity calculation wrong...
I'll try to have a look, but with linux.conf.au coming up, it might
not be very soon.

NeilBrown

* Re: Oops with 4GB memory setting in 2.4.0 stable
  2001-01-15 22:09       ` Marcelo Tosatti
  2001-01-16  0:21         ` Rainer Mager
@ 2001-01-16  2:03         ` Keith Owens
  1 sibling, 0 replies; 14+ messages in thread
From: Keith Owens @ 2001-01-16  2:03 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Rainer Mager, linux-kernel

On Mon, 15 Jan 2001 20:09:14 -0200 (BRST), 
Marcelo Tosatti <marcelo@conectiva.com.br> wrote:
>On Tue, 16 Jan 2001, Rainer Mager wrote:
>>>EIP; f889e044 <END_OF_CODE+385bfe34/????>   <=====
>Trace; f889d966 <END_OF_CODE+385bf756/????>
>
>It seems the oops is happening in a module's function.
>
>You have to make ksymoops parse the oops output against a System.map which
>has all the modules' symbols. Load each module by hand with the insmod -m
>option ("insmod -m module.o") and _append_ the outputs to System.map.

No need, just create directory /var/log/ksymoops.  insmod and rmmod
will automatically save the list of modules and the symbol table on
every module load or unload, neatly timestamped.  When you get an oops,
find the entries just before the oops and point ksymoops at those.

ksymoops -m /lib/modules/2.4.0/System.map \
	 -k /var/log/ksymoops/20010116093850.ksyms \
	 -l /var/log/ksymoops/20010116093850.modules < oops.txt

man insmod, section KSYMOOPS ASSISTANCE.  Much easier than trying to
reproduce the environment by hand.
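
Finding "the entries just before the oops" can be scripted. The Python below
is a small sketch that relies only on the YYYYMMDDHHMMSS file naming visible
in the command above; the helper name latest_snapshot_before and the sample
file list are hypothetical.

```python
from datetime import datetime


def latest_snapshot_before(names, oops_time, suffix=".ksyms"):
    """Return the newest snapshot file stamped at or before oops_time, or None."""
    best = None
    for name in names:
        if not name.endswith(suffix):
            continue
        stamp = datetime.strptime(name[: -len(suffix)], "%Y%m%d%H%M%S")
        if stamp <= oops_time and (best is None or stamp > best[0]):
            best = (stamp, name)
    return best[1] if best else None


# Hypothetical directory listing of /var/log/ksymoops.
files = [
    "20010115221003.ksyms",
    "20010116093850.ksyms",
    "20010116093850.modules",
    "20010116120000.ksyms",
]

print(latest_snapshot_before(files, datetime(2001, 1, 16, 10, 0, 0)))
# 20010116093850.ksyms
```

Calling it again with suffix=".modules" picks the matching module list to
pass alongside the .ksyms file.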


* Re: Oops with 4GB memory setting in 2.4.0 stable
  2001-01-15 23:31 ` Oops with 4GB memory setting in 2.4.0 stable Rainer Mager
  2001-01-15 21:47   ` Marcelo Tosatti
@ 2001-01-16  8:40   ` Urban Widmark
  2001-01-17 23:59     ` Rainer Mager
  1 sibling, 1 reply; 14+ messages in thread
From: Urban Widmark @ 2001-01-16  8:40 UTC (permalink / raw)
  To: Rainer Mager; +Cc: linux-kernel

On Tue, 16 Jan 2001, Rainer Mager wrote:

> Hi all,
>
> 	I have a 100% reproducible bug in all of the 2.4.0 kernels including the
> latest stable one. The issue is that if I compile the kernel to support 4GB
> RAM (I have 1 GB) and then try to access a samba mount I get an oops. This

I'll have a look tonight or so. It works for you on non-bigmem?

> ALWAYS happens. Usually after this the system is frozen (although the magic
> SysRq still works). If the system isn't frozen then any commands that
> access the disk will freeze. Fortunately GPM worked and I was able to paste
> the oops to a file via telnet.

smb_rename suggests mv, but the process is ls ... er? What commands were
you running on smbfs when it crashed?

Could this be a symbol mismatch? Keith Owens suggested a less manual way
to get module symbol output. Do you get the same results using that?

/Urban


* RE: Oops with 4GB memory setting in 2.4.0 stable
  2001-01-16  8:40   ` Urban Widmark
@ 2001-01-17 23:59     ` Rainer Mager
  2001-01-18  0:30       ` Urban Widmark
  0 siblings, 1 reply; 14+ messages in thread
From: Rainer Mager @ 2001-01-17 23:59 UTC (permalink / raw)
  To: Urban Widmark; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 398 bytes --]

> smb_rename suggests mv, but the process is ls ... er? What commands were
> you running on smbfs when it crashed?
>
> Could this be a symbol mismatch? Keith Owens suggested a less manual way
> to get module symbol output. Do you get the same results using that?

Here is a newly parsed oops, this time using the /var/log/ksymoops method
mentioned by Keith Owens. Does this look better?

--Rainer

[-- Attachment #2: oops.parsed --]
[-- Type: application/octet-stream, Size: 3771 bytes --]

ksymoops 0.7c on i686 2.4.0.  Options used
     -V (default)
     -k /var/log/ksymoops/20010118084505.ksyms (specified)
     -l /var/log/ksymoops/20010118084505.modules (specified)
     -o /lib/modules/2.4.0/ (default)
     -m /boot/System.map-2.4.0-bigmem (specified)

Warning (compare_maps): ksyms_base symbol highmem_start_page_R__ver_highmem_start_page not found in System.map.  Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol kmap_high_R__ver_kmap_high not found in System.map.  Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol kunmap_high_R__ver_kunmap_high not found in System.map.  Ignoring ksyms_base entry
Unable to handle kernel NULL pointer dereference at virtual address 00000000
c01239a4
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c01239a4>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00001001   ebx: 00000000   ecx: c0256730   edx: 0003f435
esi: c20cde24   edi: 00000000   ebp: 00000001   esp: ee5e3e30
ds: 0018   es: 0018   ss: 0018
Process ls (pid: 449, stackpage=ee5e3000)
Stack: c20cde24 ee5e3e64 f7e00004 00000001 c01262f5 c20cde24 00000000 00000001
       f7e00004 c1000010 fe2f0014 00000018 fe2f0000 c20cde24 f88982f8 00000000
       00000001 00000070 ee5e3ee8 f889e180 ee61a000 ee6ede9c 00000010 f8896e69
Call Trace: [<c01262f5>] [<fe2f0014>] [<fe2f0000>] [<f88982f8>] [<f889e180>] [<f8896e69>] [<f8896eaa>]
       [<fe2f0000>] [<fe2f0000>] [<f889e048>] [<f889e03c>] [<f8896f40>] [<fe2f0000>] [<f88983b0>] [<fe2f0000>]
       [<fe2f0000>] [<f889798b>] [<fe2f0000>] [<c0140c10>] [<c0140e7c>] [<c0140f9e>] [<c0140e7c>] [<c0108f4b>]
Code: 8b 07 ff 47 18 89 70 04 89 06 89 7e 04 89 37 89 7e 08 8b 44

>>EIP; c01239a4 <add_to_page_cache_unique+c0/f4>   <=====
Trace; c01262f5 <grab_cache_page+7d/a4>
Trace; fe2f0014 <END_OF_CODE+5a531b5/????>
Trace; fe2f0000 <END_OF_CODE+5a531a1/????>
Trace; f88982f8 <[smbfs]smb_add_to_cache+dc/104>
Trace; f889e180 <.data.end+1321/????>
Trace; f8896e69 <[smbfs]smb_proc_readdir_long+34d/400>
Trace; f8896eaa <[smbfs]smb_proc_readdir_long+38e/400>
Trace; fe2f0000 <END_OF_CODE+5a531a1/????>
Trace; fe2f0000 <END_OF_CODE+5a531a1/????>
Trace; f889e048 <.data.end+11e9/????>
Trace; f889e03c <.data.end+11dd/????>
Trace; f8896f40 <[smbfs]smb_proc_readdir+24/34>
Trace; fe2f0000 <END_OF_CODE+5a531a1/????>
Trace; f88983b0 <[smbfs]smb_refill_dircache+24/70>
Trace; fe2f0000 <END_OF_CODE+5a531a1/????>
Trace; fe2f0000 <END_OF_CODE+5a531a1/????>
Trace; f889798b <[smbfs]smb_readdir+db/188>
Trace; fe2f0000 <END_OF_CODE+5a531a1/????>
Trace; c0140c10 <vfs_readdir+90/ec>
Trace; c0140e7c <filldir+0/d8>
Trace; c0140f9e <sys_getdents+4a/98>
Trace; c0140e7c <filldir+0/d8>
Trace; c0108f4b <system_call+33/38>
Code;  c01239a4 <add_to_page_cache_unique+c0/f4>
00000000 <_EIP>:
Code;  c01239a4 <add_to_page_cache_unique+c0/f4>   <=====
   0:   8b 07                     mov    (%edi),%eax   <=====
Code;  c01239a6 <add_to_page_cache_unique+c2/f4>
   2:   ff 47 18                  incl   0x18(%edi)
Code;  c01239a9 <add_to_page_cache_unique+c5/f4>
   5:   89 70 04                  mov    %esi,0x4(%eax)
Code;  c01239ac <add_to_page_cache_unique+c8/f4>
   8:   89 06                     mov    %eax,(%esi)
Code;  c01239ae <add_to_page_cache_unique+ca/f4>
   a:   89 7e 04                  mov    %edi,0x4(%esi)
Code;  c01239b1 <add_to_page_cache_unique+cd/f4>
   d:   89 37                     mov    %esi,(%edi)
Code;  c01239b3 <add_to_page_cache_unique+cf/f4>
   f:   89 7e 08                  mov    %edi,0x8(%esi)
Code;  c01239b6 <add_to_page_cache_unique+d2/f4>
  12:   8b 44 00 00               mov    0x0(%eax,%eax,1),%eax


3 warnings issued.  Results may not be reliable.

[-- Attachment #3: oops.txt --]
[-- Type: text/plain, Size: 1047 bytes --]

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c01239a4
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c01239a4>]
EFLAGS: 00010202
eax: 00001001   ebx: 00000000   ecx: c0256730   edx: 0003f435
esi: c20cde24   edi: 00000000   ebp: 00000001   esp: ee5e3e30
ds: 0018   es: 0018   ss: 0018
Process ls (pid: 449, stackpage=ee5e3000)
Stack: c20cde24 ee5e3e64 f7e00004 00000001 c01262f5 c20cde24 00000000 00000001
       f7e00004 c1000010 fe2f0014 00000018 fe2f0000 c20cde24 f88982f8 00000000
       00000001 00000070 ee5e3ee8 f889e180 ee61a000 ee6ede9c 00000010 f8896e69
Call Trace: [<c01262f5>] [<fe2f0014>] [<fe2f0000>] [<f88982f8>] [<f889e180>] [<f8896e69>] [<f8896eaa>]
       [<fe2f0000>] [<fe2f0000>] [<f889e048>] [<f889e03c>] [<f8896f40>] [<fe2f0000>] [<f88983b0>] [<fe2f0000>]
       [<fe2f0000>] [<f889798b>] [<fe2f0000>] [<c0140c10>] [<c0140e7c>] [<c0140f9e>] [<c0140e7c>] [<c0108f4b>]

Code: 8b 07 ff 47 18 89 70 04 89 06 89 7e 04 89 37 89 7e 08 8b 44
Segmentation fault


* RE: Oops with 4GB memory setting in 2.4.0 stable
  2001-01-17 23:59     ` Rainer Mager
@ 2001-01-18  0:30       ` Urban Widmark
  0 siblings, 0 replies; 14+ messages in thread
From: Urban Widmark @ 2001-01-18  0:30 UTC (permalink / raw)
  To: Rainer Mager; +Cc: linux-kernel

On Thu, 18 Jan 2001, Rainer Mager wrote:

> Here is a newly parsed oops, this time using the /var/log/ksymoops method
> mentioned by Keith Owens. Does this look better?

Yes, and it sort of matches the other oops someone sent. Thanks.

I have a changed version now, based on the ncpfs directory cache code.
But it doesn't work at all right now. (and that would be the "based on"
bit, the copy and paste bits haven't crashed yet :)

Assuming that all meetings have an end, and sometimes they don't seem to
have one, there may be something for you to try tomorrow.

/Urban


Thread overview: 14+ messages
2001-01-15 22:47 FS corruption on 2.4.0-ac8 Jure Pecar
2001-01-15 21:11 ` Marcelo Tosatti
2001-01-16  1:00   ` Neil Brown
2001-01-15 23:31 ` Oops with 4GB memory setting in 2.4.0 stable Rainer Mager
2001-01-15 21:47   ` Marcelo Tosatti
2001-01-15 23:45     ` Rainer Mager
2001-01-15 22:09       ` Marcelo Tosatti
2001-01-16  0:21         ` Rainer Mager
2001-01-15 22:37           ` Marcelo Tosatti
2001-01-16  2:03         ` Keith Owens
2001-01-16  8:40   ` Urban Widmark
2001-01-17 23:59     ` Rainer Mager
2001-01-18  0:30       ` Urban Widmark
2001-01-16  0:29 ` FS corruption on 2.4.0-ac8 Andreas Dilger
