linux-raid.vger.kernel.org archive mirror
* RAID0 - one drive's superblock is corrupt
@ 2002-07-23 22:33 Neil Sedger
  2002-07-29 10:35 ` Neil Brown
  0 siblings, 1 reply; 2+ messages in thread
From: Neil Sedger @ 2002-07-23 22:33 UTC (permalink / raw)
  To: linux-raid

Hi all.

I've had a 140GB RAID0 consisting of various drive makes/sizes for a 
year or so, running on Red Hat 7.1, kernel 2.4.2, raidtools-0.90-20. 
All drives have been on a Promise ATA-100 controller.

The RAID0 was originally created with 'persistent-superblock 1' and has 
auto-started itself fine every time until now.

But now I get kernel messages:

kernel: autodetecting RAID arrays
kernel: (read) hde1's sb offset: 45030080
kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
kernel: hde: dma_intr: error=0x40 { UncorrectableError }, LBAsect=90060223, sector=90060160
kernel: end_request: I/O error, dev 21:01 (hde), sector 90060160
kernel: md: disabled device hde1, could not read superblock.
kernel: md: could not read hde1's sb, not importing!
kernel: could not import hde1!

Am I correct that hde1 - the first partition in RAID0 - has had its 
superblock corrupted?

'badblocks' reports that that sector, and a few others around it, are 
damaged.

From reading the HOWTO it seems that the superblock is not actually 
required - it's just a convenience so the kernel can start the RAID 
without needing /etc/raidtab. Is that right?

So, I've tried bypassing the superblocks by editing /etc/raidtab, 
setting persistent-superblock to 0, then doing 'raid0run'.
This says it's started the RAID OK, and I can mount its filesystem 
(ext2, read-only at this time). Some files are OK, but if I delve too 
far into the filesystem I get errors and kernel messages:

kernel: attempt to access beyond end of device
kernel: 09:00: rw=0, want=326333420, limit=143733920


running e2fsck on it (in read-only mode) produces loads of inode errors, 
eventually exiting with

Error while iterating over blocks in inode 2932821: Illegal indirect 
block found

...so I thought I could try replacing the superblock. From reading the 
HOWTO and man page I think I should be able to do that with 'mkraid':


[root@giles /]# mkraid /dev/md0
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/hde1, 45030163kB, raid superblock at 45030080kB
mkraid: aborted, see the syslog and /proc/mdstat for potential clues.


syslog said:
kernel: hde: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
kernel: hde: read_intr: error=0x40 { UncorrectableError }, LBAsect=90060223, sector=90060160
kernel: end_request: I/O error, dev 21:01 (hde), sector 90060160


It's an IBM Deskstar 40GB. IBM provide a test program which also 
includes a 'sector repair' option, so I tried that (you have to put it 
on a floppy and boot with it). Its scan results agreed with 
'badblocks', and it then offered to try sector repair. At this point I 
accepted - but after a good go it said sector repair failed.


I accept that my RAID is on its last legs and needs rebuilding after a 
low-level format or drive replacement. I also accept that the few 
corrupted sectors mean I'll lose a few files. But I'd like to access 
the rest of the 140GB RAID0 to salvage data off elsewhere.


Any suggestions?


Thanks
Neil



* Re: RAID0 - one drive's superblock is corrupt
  2002-07-23 22:33 RAID0 - one drive's superblock is corrupt Neil Sedger
@ 2002-07-29 10:35 ` Neil Brown
  0 siblings, 0 replies; 2+ messages in thread
From: Neil Brown @ 2002-07-29 10:35 UTC (permalink / raw)
  To: Neil Sedger; +Cc: linux-raid

On Tuesday July 23, linux-raid@moley.org.uk wrote:
> 
> Any suggestions?

0/ restore from backups.  You do have backups, don't you?

1/ Get a replacement disc the same size and copy all the data from one
   to the other:
     dd if=/dev/olddisk of=/dev/newdisk bs=100M
   then change your raidtab to refer to the new drive instead of the
   old, and run the "mkraid" command to re-create the superblocks.
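   Since the old drive has unreadable sectors, a plain dd may abort at
   the first read error. A hedged sketch: dd's conv=noerror,sync options
   make it continue past read errors, zero-padding each failed block so
   the copy keeps the same layout. The demo below uses scratch files;
   the real device names from this thread would replace them.

```shell
# Sketch, untested on the poster's hardware: on the real disks this
# would be something like
#   dd if=/dev/hde of=/dev/newdisk bs=64k conv=noerror,sync
# conv=noerror continues past read errors; conv=sync pads any short or
# failed read with zeros so offsets on the copy stay aligned.
# Demonstrated here on scratch files instead of block devices:
dd if=/dev/zero of=/tmp/olddisk.img bs=1k count=256 2>/dev/null
dd if=/tmp/olddisk.img of=/tmp/newdisk.img bs=64k conv=noerror,sync 2>/dev/null
cmp -s /tmp/olddisk.img /tmp/newdisk.img && echo copy-matches
```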

2/ The reason the "persistent-superblock 0" trick didn't work is that
   the effective size of the disk is smaller when you have a superblock.
   So: work out the effective size of each partition ... kernel log
   messages might tell you, and mkraid will tell you (hde1 has an
   effective size of 45030080kB).
   Then change your partition tables with fdisk so that these
   partitions appear to really be the size that raid0 effectively makes
   them, and then do the "persistent-superblock 0" trick.
   To calculate the effective size, take the real size, round down to
   a multiple of 64K and subtract 64K.
   e.g. for hde1, real size is 45030163kB
     45030163/64 == 703596
     703596 * 64 == 45030144  (this is rounded down)
     45030144 - 64 == 45030080  which is the correct number.

   You might have to use expert mode to convince fdisk to ignore
   "cylinder boundaries".
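
   The worked example above can be checked with shell arithmetic (a
   sketch; 45030163 is hde1's real size in kB from the mkraid output,
   and 64 is the 64kB superblock reservation):

```shell
# Effective md size = real partition size rounded down to a 64kB
# multiple, minus the 64kB reserved at the end for the RAID superblock
# (all sizes in kB; integer division does the rounding down).
real_kb=45030163
eff_kb=$(( real_kb / 64 * 64 - 64 ))
echo "$eff_kb"    # prints 45030080, matching hde1's sb offset in the logs
```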

3/ Hack the md code to not actually write out superblocks or read them
   in, and then try the mkraid command to create a new array.

4/ Sue IBM for selling such dodgy hard drives.

Good luck.

NeilBrown

