corrupted disk

* corrupted disk
@ 2005-10-11 13:02 Peter Nixon
  2005-10-11 13:08 ` Peter Nixon
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Peter Nixon @ 2005-10-11 13:02 UTC (permalink / raw)
  To: reiserfs-list

Hi List

I have an interesting problem at a customer which I hope someone can shed some 
light on.

The server is an IBM server with a multipath SCSI controller connected to a
SAN with multiple 2 TB disks configured. The operating system is SLES 8.
Among other things, the server runs IBM DB2.
A previous contractor recommended that the filesystems be created directly
on the disk devices, NOT on disk partitions, so the filesystem in question is
on /dev/sdc.

At 06:15 this morning the following errors showed up in /var/log/messages

Oct 11 06:15:03 DB2MUHASEBE kernel: journal-2332: Trying to log block 359, which is a log block
Oct 11 06:15:03 DB2MUHASEBE kernel: kernel BUG at prints.c:334!
Oct 11 06:15:03 DB2MUHASEBE kernel: invalid operand: 0000 2.4.21-138-smp #1 SMP Fri Oct 31 00:51:31 UTC 2003
Oct 11 06:15:03 DB2MUHASEBE kernel: CPU:    1
Oct 11 06:15:03 DB2MUHASEBE kernel: EIP:    0010:[st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317460792/96]    Tainted: P
Oct 11 06:15:03 DB2MUHASEBE kernel: EIP:    0010:[<c5096ec8>]    Tainted: P
Oct 11 06:15:03 DB2MUHASEBE kernel: EFLAGS: 00010296
Oct 11 06:15:03 DB2MUHASEBE kernel: eax: 0000003f   ebx: f11ea000   ecx: 00000046   edx: c032f8c8
Oct 11 06:15:03 DB2MUHASEBE kernel: esi: f9596578   edi: 00000167   ebp: f958a7a0   esp: ebfe9ea4
Oct 11 06:15:03 DB2MUHASEBE kernel: ds: 0018   es: 0018   ss: 0018
Oct 11 06:15:03 DB2MUHASEBE kernel: Process db2sysc (pid: 18866, stackpage=ebfe9000)
Oct 11 06:15:03 DB2MUHASEBE kernel: Stack: c50b4d56 c50b5800 f9596578 00002012 c50a8758 f11ea000 c50b3fe0 00000167
Oct 11 06:15:03 DB2MUHASEBE kernel:        00000006 c50a65a5 c50b5051 03882f46 ef8d48d8 00000000 d7c8e7a0 00000004
Oct 11 06:15:03 DB2MUHASEBE kernel:        00000000 00000042 00000000 e68797e0 e6879260 f6277000 f475b000 f9596578
Oct 11 06:15:03 DB2MUHASEBE kernel: Call Trace:         [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317338282/96] (04) [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317335552/96] (12) [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317388968/96] (08)
Oct 11 06:15:03 DB2MUHASEBE kernel: Call Trace:         [<c50b4d56>] (04) [<c50b5800>] (12) [<c50a8758>] (08)
Oct 11 06:15:03 DB2MUHASEBE kernel:   [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317341728/96] (12) [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317397595/96] (04) [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317337519/96] (72) [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317394646/96] (28)
Oct 11 06:15:03 DB2MUHASEBE kernel:   [<c50b3fe0>] (12) [<c50a65a5>] (04) [<c50b5051>] (72) [<c50a712a>] (28)
Oct 11 06:15:03 DB2MUHASEBE kernel:   [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317392482/96] (64) [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317392365/96] (24) [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317492411/96] (20) [sys_fsync+152/208] (36)
Oct 11 06:15:03 DB2MUHASEBE kernel:   [<c50a799e>] (64) [<c50a7a13>] (24) [<c508f345>] (20) [<c0157688>] (36)
Oct 11 06:15:03 DB2MUHASEBE kernel:   [system_call+51/56] (60)
Oct 11 06:15:03 DB2MUHASEBE kernel:   [<c01096b7>] (60)
Oct 11 06:15:03 DB2MUHASEBE kernel: Modules: [(reiserfs:<c5080060>:<c50b71b4>)]
Oct 11 06:15:03 DB2MUHASEBE kernel: Code: 0f 0b 4e 01 5c 4d 0b c5 85 db 74 0e 0f b7 43 08 89 04 24 e8
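
The Code: bytes in the oops can be read against the BUG() macro of 2.4-era i386 kernels, which (when built with verbose BUG reporting) emits a ud2 instruction followed by a 16-bit line number and a 32-bit pointer to the file name. The following is only a minimal sketch under that assumption; decode_bug is a hypothetical helper, not part of any kernel tool.

```python
import struct

def decode_bug(code_hex):
    """Decode 'ud2; .word line; .long file_ptr' from an oops Code: line.

    Assumes the i386 BUG() layout of 2.4 kernels built with verbose BUG
    reporting. Returns None if the bytes do not start with ud2 (0f 0b).
    """
    raw = bytes.fromhex(code_hex.replace(" ", ""))
    if len(raw) < 8 or raw[:2] != b"\x0f\x0b":
        return None
    # Little-endian: 2-byte line number, then 4-byte pointer to the file name.
    line, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line, file_ptr

code = "0f 0b 4e 01 5c 4d 0b c5 85 db 74 0e 0f b7 43 08 89 04 24 e8"
line, file_ptr = decode_bug(code)
print(line, hex(file_ptr))  # prints: 334 0xc50b4d5c
```

The decoded line number matches the "kernel BUG at prints.c:334!" message, and the decoded pointer lies between the two addresses printed for reiserfs on the Modules: line.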


Dmesg shows things like:
sh-2021: reiserfs_read_super: can not find reiserfs on sd(8,32)
FAT: bogus logical sector size 0
VFS: Can't find a valid FAT filesystem on dev 08:20.
sh-2021: reiserfs_read_super: can not find reiserfs on sd(8,32)
sh-2021: reiserfs_read_super: can not find reiserfs on sd(8,32)
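
The sd(8,32) and dev 08:20 forms in these messages refer to the same device: 08:20 is the major:minor pair in hexadecimal. A small sketch of the mapping, assuming the classic Linux SCSI disk numbering (major 8, 16 minors per disk); sd_name and parse_hex_dev are hypothetical helper names.

```python
def sd_name(major, minor):
    """Map (major, minor) to an sdX name, assuming SCSI disk major 8 with
    16 minors per disk, where a minor divisible by 16 is the whole disk."""
    if major != 8:
        raise ValueError("only the first SCSI disk major (8) is handled")
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name if part == 0 else name + str(part)

def parse_hex_dev(dev):
    """Parse the hexadecimal 'MM:mm' form printed in the VFS message."""
    major, minor = (int(x, 16) for x in dev.split(":"))
    return major, minor

print(sd_name(8, 32))                     # prints: sdc
print(sd_name(*parse_hex_dev("08:20")))   # prints: sdc
```

Under that numbering both messages point at the whole disk /dev/sdc rather than a partition on it, which agrees with the filesystem having been created directly on the device.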

And mount now shows:
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       or too many mounted file systems
       (aren't you trying to mount an extended partition,
       instead of some logical partition inside?)

I am now making a dd_rescue copy of the disk to another disk in the SAN as a
backup, which looks like it will take another 20 hours. In the meantime, I was
hoping someone might have some ideas about what caused this and the best way
to recover this partition.

Any ideas?

-- 

Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc


* Re: corrupted disk
  2005-10-11 13:02 corrupted disk Peter Nixon
@ 2005-10-11 13:08 ` Peter Nixon
  2005-10-11 13:31 ` Sander
  2005-10-11 13:53 ` Vladimir V. Saveliev
  2 siblings, 0 replies; 11+ messages in thread
From: Peter Nixon @ 2005-10-11 13:08 UTC (permalink / raw)
  To: reiserfs-list

Additional HW info:

IBM X335 Server with HP2214 HBA (OEM QLogic 2300 Fibre Channel card)
SAN: IBM FAStT 600 Storage Device

-Peter


-- 

Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc


* Re: corrupted disk
  2005-10-11 13:02 corrupted disk Peter Nixon
  2005-10-11 13:08 ` Peter Nixon
@ 2005-10-11 13:31 ` Sander
  2005-10-11 13:34   ` Peter Nixon
  2005-10-11 13:53 ` Vladimir V. Saveliev
  2 siblings, 1 reply; 11+ messages in thread
From: Sander @ 2005-10-11 13:31 UTC (permalink / raw)
  To: Peter Nixon; +Cc: reiserfs-list

Peter Nixon wrote (ao):
> At 06:15 this morning the following errors showed up in /var/log/messages

> Oct 11 06:15:03 DB2MUHASEBE kernel: kernel BUG at prints.c:334!
> Oct 11 06:15:03 DB2MUHASEBE kernel: invalid operand: 0000 2.4.21-138-smp #1 
> SMP Fri Oct 31 00:51:31 UTC 2003

> Oct 11 06:15:03 DB2MUHASEBE kernel: EIP:    0010:[<c5096ec8>]    Tainted: P

Your kernel is very, very old and tainted.

-- 
Humilis IT Services and Solutions
http://www.humilis.net


* Re: corrupted disk
  2005-10-11 13:31 ` Sander
@ 2005-10-11 13:34   ` Peter Nixon
  2005-10-11 13:55     ` Sander
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Nixon @ 2005-10-11 13:34 UTC (permalink / raw)
  To: reiserfs-list, sander

On Tuesday 11 October 2005 16:31, Sander wrote:
> Peter Nixon wrote (ao):
> > At 06:15 this morning the following errors showed up in /var/log/messages
> >
> > Oct 11 06:15:03 DB2MUHASEBE kernel: kernel BUG at prints.c:334!
> > Oct 11 06:15:03 DB2MUHASEBE kernel: invalid operand: 0000 2.4.21-138-smp
> > #1 SMP Fri Oct 31 00:51:31 UTC 2003
> >
> > Oct 11 06:15:03 DB2MUHASEBE kernel: EIP:    0010:[<c5096ec8>]    Tainted:
> > P
>
> Your kernel is very, very old and tainted.

Yes. I am aware of that. As I mentioned the server is an IBM server running 
SUSE Linux Enterprise 8 and DB2. At the time of deployment of the server SLES 
9 was not yet certified to run with DB2.

Regards

-- 

Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc


* Re: corrupted disk
  2005-10-11 13:02 corrupted disk Peter Nixon
  2005-10-11 13:08 ` Peter Nixon
  2005-10-11 13:31 ` Sander
@ 2005-10-11 13:53 ` Vladimir V. Saveliev
       [not found]   ` <200510111717.02119.listuser@peternixon.net>
  2 siblings, 1 reply; 11+ messages in thread
From: Vladimir V. Saveliev @ 2005-10-11 13:53 UTC (permalink / raw)
  To: Peter Nixon; +Cc: reiserfs-list

Hello

Peter Nixon wrote:
> Hi List
> 
> I have an interesting problem at a customer which I hope someone can shed some 
> light on.
> 
> The server is an IBM server with an multipath SCSI controller connected to a 
> SAN with multiple 2 TB disks configured. The Operating System is SLES 8. 
> Among other things the server runs IBM DB2.
> A previous contractor recommended to that the filesystems be directly created 
> on the disk devices, NOT on disk partitions so the filesystem in question is 
> on /dev/sdc
> 
> At 06:15 this morning the following errors showed up in /var/log/messages
> 
> Oct 11 06:15:03 DB2MUHASEBE kernel: journal-2332: Trying to log block 359, 
> which is a log block

this is a good reason to crash
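
For readers wondering why block 359 counts as a log block: under the default layout of a 3.6-format ReiserFS with 4096-byte blocks, the journal sits right after the superblock and the first bitmap block. The sketch below assumes that default layout (superblock at 64 KiB, 8192-block journal plus one header block). Neither the block size nor the journal size is stated in this thread, so treat the constants as assumptions.

```python
SUPERBLOCK_OFFSET = 64 * 1024   # assumed: 3.6-format superblock offset in bytes
JOURNAL_BLOCKS = 8192           # assumed: default journal size in blocks

def journal_range(block_size=4096, journal_blocks=JOURNAL_BLOCKS):
    """Return (first, last) block numbers of the journal area, header included."""
    # Skip the superblock block and the first bitmap block that follows it.
    first = SUPERBLOCK_OFFSET // block_size + 2
    # The journal header occupies one extra block after the journal blocks.
    last = first + journal_blocks
    return first, last

def is_log_block(block, block_size=4096):
    """True if the block number falls inside the assumed journal area."""
    first, last = journal_range(block_size)
    return first <= block <= last

print(journal_range())     # prints: (18, 8210)
print(is_log_block(359))   # prints: True
```

With these assumptions block 359 is well inside the journal area, so a request to log it as ordinary metadata is something the journal code has to refuse.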

> Oct 11 06:15:03 DB2MUHASEBE kernel: kernel BUG at prints.c:334!
> Oct 11 06:15:03 DB2MUHASEBE kernel: invalid operand: 0000 2.4.21-138-smp #1 
> SMP Fri Oct 31 00:51:31 UTC 2003
> Oct 11 06:15:03 DB2MUHASEBE kernel: CPU:    1
> Oct 11 06:15:03 DB2MUHASEBE kernel: EIP:    0010:
> [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317460792/96]    
> Tainted: P
> Oct 11 06:15:03 DB2MUHASEBE kernel: EIP:    0010:[<c5096ec8>]    Tainted: P
> Oct 11 06:15:03 DB2MUHASEBE kernel: EFLAGS: 00010296
> Oct 11 06:15:03 DB2MUHASEBE kernel: eax: 0000003f   ebx: f11ea000   ecx: 
> 00000046   edx: c032f8c8
> Oct 11 06:15:03 DB2MUHASEBE kernel: esi: f9596578   edi: 00000167   ebp: 
> f958a7a0   esp: ebfe9ea4
> Oct 11 06:15:03 DB2MUHASEBE kernel: ds: 0018   es: 0018   ss: 0018
> Oct 11 06:15:03 DB2MUHASEBE kernel: Process db2sysc (pid: 18866, 
> stackpage=ebfe9000)
> Oct 11 06:15:03 DB2MUHASEBE kernel: Stack: c50b4d56 c50b5800 f9596578 00002012 
> c50a8758 f11ea000 c50b3fe0 00000167
> Oct 11 06:15:03 DB2MUHASEBE kernel:        00000006 c50a65a5 c50b5051 03882f46 
> ef8d48d8 00000000 d7c8e7a0 00000004
> Oct 11 06:15:03 DB2MUHASEBE kernel:        00000000 00000042 00000000 e68797e0 
> e6879260 f6277000 f475b000 f9596578
> Oct 11 06:15:03 DB2MUHASEBE kernel: Call Trace:         
> [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317338282/96] 
> (04) [st:__insmod_st_O/
> lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317335552/96] (12) 
> [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317388968/96] 
> (08)
> Oct 11 06:15:03 DB2MUHASEBE kernel: Call Trace:         [<c50b4d56>] (04) 
> [<c50b5800>] (12) [<c50a8758>] (08)
> Oct 11 06:15:03 DB2MUHASEBE kernel:   
> [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317341728/96] 
> (12) [st:__insmod_st_O/lib/modules/2.4.21
> -138-smp/kernel/drivers/scs+-317397595/96] (04) 
> [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317337519/96] 
> (72) [st:__insmod_st_O/lib/modu
> les/2.4.21-138-smp/kernel/drivers/scs+-317394646/96] (28)
> Oct 11 06:15:03 DB2MUHASEBE kernel:   [<c50b3fe0>] (12) [<c50a65a5>] (04) 
> [<c50b5051>] (72) [<c50a712a>] (28)
> Oct 11 06:15:03 DB2MUHASEBE kernel:   
> [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317392482/96] 
> (64) [st:__insmod_st_O/lib/modules/2.4.21
> -138-smp/kernel/drivers/scs+-317392365/96] (24) 
> [st:__insmod_st_O/lib/modules/2.4.21-138-smp/kernel/drivers/scs+-317492411/96] 
> (20) [sys_fsync+152/208] (36)
> Oct 11 06:15:03 DB2MUHASEBE kernel:   [<c50a799e>] (64) [<c50a7a13>] (24) 
> [<c508f345>] (20) [<c0157688>] (36)
> Oct 11 06:15:03 DB2MUHASEBE kernel:   [system_call+51/56] (60)
> Oct 11 06:15:03 DB2MUHASEBE kernel:   [<c01096b7>] (60)
> Oct 11 06:15:03 DB2MUHASEBE kernel: Modules: 
> [(reiserfs:<c5080060>:<c50b71b4>)]
> Oct 11 06:15:03 DB2MUHASEBE kernel: Code: 0f 0b 4e 01 5c 4d 0b c5 85 db 74 0e 
> 0f b7 43 08 89 04 24 e8
> 
> 
> Dmesg shows things like:
> sh-2021: reiserfs_read_super: can not find reiserfs on sd(8,32)
> FAT: bogus logical sector size 0
> VFS: Can't find a valid FAT filesystem on dev 08:20.
> sh-2021: reiserfs_read_super: can not find reiserfs on sd(8,32)
> sh-2021: reiserfs_read_super: can not find reiserfs on sd(8,32)
> 
> And mount now shows:
> mount: wrong fs type, bad option, bad superblock on /dev/sdc,
>        or too many mounted file systems
>        (aren't you trying to mount an extended partition,
>        instead of some logical partition inside?)
> 
> I am now doing a dd_rescue copy of the disk to another disk in the SAN as a 

this is the right thing to do

> backup which looks like it is going to take another 20 hours so in the 
> meantime I was hoping someone might have some ideas what caused it, and the 
> best way to recover this partition.
> 
> Any ideas?
> 

Can you send us a few blocks of /dev/sdc?
dd if=/dev/sdc bs=4096 count=1000 | gzip -c > sdc.head.gz
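
A dump taken this way (1000 blocks of 4096 bytes) covers the area where a ReiserFS superblock would normally live. The sketch below shows one way such a dump could be checked for the superblock magic string. The offsets (superblock at 64 KiB, or 8 KiB for the old format, magic at byte 52 of the superblock) and the magic strings are stated here as assumptions about the ReiserFS 3.x on-disk format, and all function names are hypothetical.

```python
import gzip

# Assumed on-disk constants for ReiserFS 3.x.
SB_OFFSETS = (64 * 1024, 8 * 1024)   # current and old superblock offsets
MAGIC_OFFSET = 52                     # offset of the magic inside the superblock
MAGICS = (b"ReIsErFs", b"ReIsEr2Fs", b"ReIsEr3Fs")

def find_superblock(data):
    """Return (offset, magic) for the first expected superblock location
    whose magic field matches, or None if neither location matches."""
    for off in SB_OFFSETS:
        field = data[off + MAGIC_OFFSET: off + MAGIC_OFFSET + 10]
        for magic in MAGICS:
            if field.startswith(magic):
                return off, magic.decode()
    return None

def scan_for_magic(data):
    """List every byte offset where a magic string occurs, in case the
    superblock is shifted or only a stale copy survives."""
    hits = []
    for magic in MAGICS:
        start = data.find(magic)
        while start != -1:
            hits.append((start, magic.decode()))
            start = data.find(magic, start + 1)
    return sorted(hits)

def load_dump(path):
    """Read a gzip-compressed dump such as sdc.head.gz into memory."""
    with gzip.open(path, "rb") as fh:
        return fh.read()
```

Typical use would be data = load_dump("sdc.head.gz"), then find_superblock(data), falling back to scan_for_magic(data) when nothing is found at the expected offsets. This only reads the dump file and never touches the device.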



* Re: corrupted disk
  2005-10-11 13:34   ` Peter Nixon
@ 2005-10-11 13:55     ` Sander
  2005-10-11 14:07       ` Peter Nixon
  2005-10-11 14:10       ` Vladimir V. Saveliev
  0 siblings, 2 replies; 11+ messages in thread
From: Sander @ 2005-10-11 13:55 UTC (permalink / raw)
  To: Peter Nixon; +Cc: reiserfs-list, sander

Peter Nixon wrote (ao):
> On Tuesday 11 October 2005 16:31, Sander wrote:
> > Peter Nixon wrote (ao):
> > > At 06:15 this morning the following errors showed up in /var/log/messages
> > >
> > > Oct 11 06:15:03 DB2MUHASEBE kernel: kernel BUG at prints.c:334!
> > > Oct 11 06:15:03 DB2MUHASEBE kernel: invalid operand: 0000 2.4.21-138-smp
> > > #1 SMP Fri Oct 31 00:51:31 UTC 2003
> > >
> > > Oct 11 06:15:03 DB2MUHASEBE kernel: EIP:    0010:[<c5096ec8>]    Tainted:
> > > P
> >
> > Your kernel is very, very old and tainted.
> 
> Yes. I am aware of that. As I mentioned the server is an IBM server running 
> SUSE Linux Enterprise 8 and DB2. At the time of deployment of the server SLES 
> 9 was not yet certified to run with DB2.

What I'm trying to say is that you are very unlikely to receive support
on such an old kernel. And most likely the bug is fixed in a younger
kernel.

And, you are also running a tainted kernel. You are less likely to
receive support on a tainted kernel.

-- 
Humilis IT Services and Solutions
http://www.humilis.net


* Re: corrupted disk
  2005-10-11 13:55     ` Sander
@ 2005-10-11 14:07       ` Peter Nixon
  2005-10-11 14:10       ` Vladimir V. Saveliev
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Nixon @ 2005-10-11 14:07 UTC (permalink / raw)
  To: sander; +Cc: reiserfs-list

On Tuesday 11 October 2005 16:55, Sander wrote:
> Peter Nixon wrote (ao):
> > On Tuesday 11 October 2005 16:31, Sander wrote:
> > > Peter Nixon wrote (ao):
> > > > At 06:15 this morning the following errors showed up in
> > > > /var/log/messages
> > > >
> > > > Oct 11 06:15:03 DB2MUHASEBE kernel: kernel BUG at prints.c:334!
> > > > Oct 11 06:15:03 DB2MUHASEBE kernel: invalid operand: 0000
> > > > 2.4.21-138-smp #1 SMP Fri Oct 31 00:51:31 UTC 2003
> > > >
> > > > Oct 11 06:15:03 DB2MUHASEBE kernel: EIP:    0010:[<c5096ec8>]   
> > > > Tainted: P
> > >
> > > Your kernel is very, very old and tainted.
> >
> > Yes. I am aware of that. As I mentioned the server is an IBM server
> > running SUSE Linux Enterprise 8 and DB2. At the time of deployment of the
> > server SLES 9 was not yet certified to run with DB2.
>
> What I'm trying to say is that you are very unlikely to receive support
> on such an old kernel. And most likely the bug is fixed in a younger
> kernel.
>
> And, you are also running a tainted kernel. You are less likely to
> receive support on a tainted kernel.

Yes I understand the issue, however without running an official SLES kernel 
the customer cannot receive support on DB2 or on their IBM hardware. If there 
is a bug in the kernel that causes this then I need to get SUSE to roll a new 
kernel and get it certified by IBM. If it is a bug in the SCSI driver (the 
module that taints the kernel) then I need to report it to IBM and get them 
to supply a fixed module.
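
One rough way to see which module the BUG fired in is to compare the raw addresses from the oops with the range printed on the Modules: line. The sketch below assumes that the two values printed for reiserfs are the module's start and end addresses; the address list is copied from the EIP and Call Trace lines of the oops, and classify is a hypothetical helper.

```python
# From the "Modules:" line; assumed to be the module's start and end addresses.
REISERFS_RANGE = (0xC5080060, 0xC50B71B4)

TRACE = [
    0xC5096EC8,  # EIP
    0xC50B4D56, 0xC50B5800, 0xC50A8758, 0xC50B3FE0, 0xC50A65A5,
    0xC50B5051, 0xC50A712A, 0xC50A799E, 0xC50A7A13, 0xC508F345,
    0xC0157688,  # resolved as sys_fsync+152 in the log
    0xC01096B7,  # resolved as system_call+51 in the log
]

def classify(addr, module_range=REISERFS_RANGE):
    """Label an address by whether it falls inside the given module range."""
    lo, hi = module_range
    return "reiserfs" if lo <= addr < hi else "other"

for addr in TRACE:
    print(f"{addr:08x} {classify(addr)}")
```

Under that assumption, every address except the last two falls inside the reiserfs range, which suggests the st:__insmod_st_O names with large negative offsets are mis-resolved symbols rather than real frames in the tape driver. This only locates where the consistency check fired; it does not show what damaged the filesystem in the first place.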

-- 

Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc


* Re: corrupted disk
  2005-10-11 13:55     ` Sander
  2005-10-11 14:07       ` Peter Nixon
@ 2005-10-11 14:10       ` Vladimir V. Saveliev
  2005-10-11 14:22         ` Sander
  1 sibling, 1 reply; 11+ messages in thread
From: Vladimir V. Saveliev @ 2005-10-11 14:10 UTC (permalink / raw)
  To: sander; +Cc: Peter Nixon, reiserfs-list, Vitaly Fertman

Hello

Sander wrote:
> Peter Nixon wrote (ao):
>>On Tuesday 11 October 2005 16:31, Sander wrote:
>>>Peter Nixon wrote (ao):
>>>>At 06:15 this morning the following errors showed up in /var/log/messages
>>>>
>>>>Oct 11 06:15:03 DB2MUHASEBE kernel: kernel BUG at prints.c:334!
>>>>Oct 11 06:15:03 DB2MUHASEBE kernel: invalid operand: 0000 2.4.21-138-smp
>>>>#1 SMP Fri Oct 31 00:51:31 UTC 2003
>>>>
>>>>Oct 11 06:15:03 DB2MUHASEBE kernel: EIP:    0010:[<c5096ec8>]    Tainted:
>>>>P
>>>Your kernel is very, very old and tainted.
>>Yes. I am aware of that. As I mentioned the server is an IBM server running 
>>SUSE Linux Enterprise 8 and DB2. At the time of deployment of the server SLES 
>>9 was not yet certified to run with DB2.
> 
> What I'm trying to say is that you are very unlikely to receive support
> on such an old kernel. And most likely the bug is fixed in a younger
> kernel.
> 
> And, you are also running a tainted kernel. You are less likely to
> receive support on a tainted kernel.
> 
The filesystem should not have been eaten. We can try to understand what
happened to it. Even if the kernel somehow managed to wipe out the filesystem
superblock, it could not (well, it should not) corrupt all the data.


* Re: corrupted disk
  2005-10-11 14:10       ` Vladimir V. Saveliev
@ 2005-10-11 14:22         ` Sander
  0 siblings, 0 replies; 11+ messages in thread
From: Sander @ 2005-10-11 14:22 UTC (permalink / raw)
  To: Vladimir V. Saveliev; +Cc: sander, Peter Nixon, reiserfs-list, Vitaly Fertman

Vladimir V. Saveliev wrote (ao):
> Hello
> 
> Sander wrote:
> > Peter Nixon wrote (ao):
> >>On Tuesday 11 October 2005 16:31, Sander wrote:
> >>>Peter Nixon wrote (ao):
> >>>>At 06:15 this morning the following errors showed up in /var/log/messages
> >>>>
> >>>>Oct 11 06:15:03 DB2MUHASEBE kernel: kernel BUG at prints.c:334!
> >>>>Oct 11 06:15:03 DB2MUHASEBE kernel: invalid operand: 0000 2.4.21-138-smp
> >>>>#1 SMP Fri Oct 31 00:51:31 UTC 2003
> >>>>
> >>>>Oct 11 06:15:03 DB2MUHASEBE kernel: EIP:    0010:[<c5096ec8>]    Tainted:
> >>>>P
> >>>Your kernel is very, very old and tainted.
> >>Yes. I am aware of that. As I mentioned the server is an IBM server running 
> >>SUSE Linux Enterprise 8 and DB2. At the time of deployment of the server SLES 
> >>9 was not yet certified to run with DB2.
> > 
> > What I'm trying to say is that you are very unlikely to receive support
> > on such an old kernel. And most likely the bug is fixed in a younger
> > kernel.
> > 
> > And, you are also running a tainted kernel. You are less likely to
> > receive support on a tainted kernel.
> > 
> The filesystem should not have been eaten. We can try to understand what
> happened to it. Even if the kernel somehow managed to wipe out the filesystem
> superblock, it could not (well, it should not) corrupt all the data.

Ok, ok, I'm crawling back under my rock already :-)

-- 
Humilis IT Services and Solutions
http://www.humilis.net


* Re: corrupted disk
       [not found]   ` <200510111717.02119.listuser@peternixon.net>
@ 2005-10-11 15:05     ` Vladimir V. Saveliev
  2005-10-11 15:33       ` Peter Nixon
  0 siblings, 1 reply; 11+ messages in thread
From: Vladimir V. Saveliev @ 2005-10-11 15:05 UTC (permalink / raw)
  To: Peter Nixon; +Cc: reiserfs-list, Vitaly Fertman

Peter Nixon wrote:
> Here you go.
> 

OK, I will need some time to decode this. As you are running dd_rescue anyway,
we can continue tomorrow, OK?




* Re: corrupted disk
  2005-10-11 15:05     ` Vladimir V. Saveliev
@ 2005-10-11 15:33       ` Peter Nixon
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Nixon @ 2005-10-11 15:33 UTC (permalink / raw)
  To: reiserfs-list; +Cc: Vladimir V. Saveliev, Vitaly Fertman

On Tuesday 11 October 2005 18:05, Vladimir V. Saveliev wrote:
> Peter Nixon wrote:
> > Here you go.
>
> ok, I have to spend some time to decode this. As you are doing dd_rescue
> anyway - we can continue tomorrow, ok?

Sure. I expect the dd_rescue to finish at around 9am UTC tomorrow, at which
point I will have a 2 TB file image to play with.

Cheers

-- 

Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc
