linux-raid.vger.kernel.org archive mirror
* recovering after a /dev/sda failure on raid1
@ 2002-08-01 14:49 ` Louis-David Mitterrand
  2002-08-02 21:26   ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Louis-David Mitterrand @ 2002-08-01 14:49 UTC
  To: linux-raid



Hi,

I have a root raid1 partition on /dev/sda1 & /dev/sdb1 (swap on
/dev/sda2 & /dev/sdb2). The server boots directly from the raid
partition. 

Now /dev/sda1 and /dev/sda2 have both failed and been removed from the
array and I am getting ready to replace the disk tonight.

What is the best way to proceed to minimize downtime?

My concern is that if I power down and replace /dev/sda the machine
won't be able to reboot without a rescue CD (lilo.conf has root=/dev/md0
and boot=/dev/md0) or will it? 

When the BIOS (Dell PowerEdge 1500) tries /dev/sda's MBR and fails,
will it then automatically try /dev/sdb?

Or should I swap /dev/sdb on the scsi ribbon to have it take the first
place and thus become /dev/sda or will this just confuse the kernel
raid driver? (the letter on scsi drives is dependent on their place on
the ribbon cable, isn't it?)

Alternatively I was thinking of booting with a rescue CD (after
replacing /dev/sda) with the "root=/dev/md0", creating my partitions,
running lilo and rebooting into production for final reconstruction.
Would that be the safest bet?

Thanks in advance for your insight, cheers,

-- 
vindex@apartia.org 


* Re: recovering after a /dev/sda failure on raid1
  2002-08-01 14:49 ` recovering after a /dev/sda failure on raid1 Louis-David Mitterrand
@ 2002-08-02 21:26   ` Neil Brown
  2002-08-02 23:03     ` Danilo Godec
  2002-08-05  9:26     ` Louis-David Mitterrand
  0 siblings, 2 replies; 11+ messages in thread
From: Neil Brown @ 2002-08-02 21:26 UTC
  To: Louis-David Mitterrand; +Cc: linux-raid

On Thursday August 1, vindex@apartia.org wrote:
> 
> 
> Hi,
> 
> I have a root raid1 partition on /dev/sda1 & /dev/sdb1 (swap on
> /dev/sda2 & /dev/sdb2). The server boots directly from the raid
> partition. 
> 
> Now /dev/sda1 and /dev/sda2 have both failed and been removed from the
> array and I am getting ready to replace the disk tonight.

Lucky you :-)

> 
> What is the best way to proceed to minimize downtime?
> 
> My concern is that if I power down and replace /dev/sda the machine
> won't be able to reboot without a rescue CD (lilo.conf has root=/dev/md0
> and boot=/dev/md0) or will it? 
> 
> When the BIOS (Dell PowerEdge 1500) tries /dev/sda's MBR and fails,
> will it then automatically try /dev/sdb?

With most BIOSes I have seen, you can explicitly tell it which device
to boot from, but I cannot say for sure about the Dell PowerEdge.

> 
> Or should I swap /dev/sdb on the scsi ribbon to have it take the first
> place and thus become /dev/sda or will this just confuse the kernel
> raid driver? (the letter on scsi drives is dependent on their place on
> the ribbon cable, isn't it?)

It isn't the position on the ribbon cable.  It is the position in the
SCSI device number (ID) ordering.  If you make sure the new drive has a
larger number than the surviving drive, the surviving drive will appear
as sda.
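
As a rough illustration of that naming rule, here is a small Python sketch. It is a simplified model and not the kernel's actual probing code: it assumes a single SCSI bus where disks are discovered in ascending ID order, and the IDs used are made up.

```python
import string

def assign_sd_names(scsi_ids):
    """Map each SCSI ID to the sdX name it would get under this model.

    Assumption: one bus, disks discovered in ascending SCSI ID order,
    and names handed out as sda, sdb, ... in discovery order.
    """
    names = {}
    for letter, scsi_id in zip(string.ascii_lowercase, sorted(scsi_ids)):
        names[scsi_id] = "sd" + letter
    return names

# Hypothetical IDs: the surviving disk sits at ID 1, the replacement at ID 2.
print(assign_sd_names([2, 1]))  # {1: 'sda', 2: 'sdb'}
```

Under this model the surviving disk becomes sda simply because its ID sorts first, regardless of where it is plugged in on the cable.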

> 
> Alternatively I was thinking of booting with a rescue CD (after
> replacing /dev/sda) with the "root=/dev/md0", creating my partitions,
> running lilo and rebooting into production for final reconstruction.
> Would that be the safest bet?

This sounds like the best bet to me.  You do have to boot twice, but
if you try the other, less well understood (by you at least) approach,
there is an even chance you will need to reboot a couple of times
anyway.

NeilBrown


> 
> Thanks in advance for your insight, cheers,
> 
> -- 
> vindex@apartia.org 


* Re: recovering after a /dev/sda failure on raid1
  2002-08-02 21:26   ` Neil Brown
@ 2002-08-02 23:03     ` Danilo Godec
  2002-08-05  9:34       ` Louis-David Mitterrand
  2002-08-05  9:26     ` Louis-David Mitterrand
  1 sibling, 1 reply; 11+ messages in thread
From: Danilo Godec @ 2002-08-02 23:03 UTC
  To: Neil Brown; +Cc: Louis-David Mitterrand, linux-raid

On Sat, 3 Aug 2002, Neil Brown wrote:

> > My concern is that if I power down and replace /dev/sda the machine
> > won't be able to reboot without a rescue CD (lilo.conf has root=/dev/md0
> > and boot=/dev/md0) or will it?

A recent enough lilo understands raid partitions and will install itself
on both disks. So, in theory, both disks should be able to boot your system.

> > When the BIOS (Dell PowerEdge 1500) tries /dev/sda's MBR and fails,
> > will it then automatically try /dev/sdb?

It might not even try /dev/sdb if /dev/sda exists...
But with some recent bioses you can choose which drive you want to boot,
so you could just choose your 2nd SCSI drive and it should work.


   D.




* Re: recovering after a /dev/sda failure on raid1
  2002-08-02 21:26   ` Neil Brown
  2002-08-02 23:03     ` Danilo Godec
@ 2002-08-05  9:26     ` Louis-David Mitterrand
  2002-08-05 12:15       ` Neil Brown
  2002-08-05 18:35       ` Maurice Hilarius
  1 sibling, 2 replies; 11+ messages in thread
From: Louis-David Mitterrand @ 2002-08-05  9:26 UTC
  To: linux-raid

On Sat, Aug 03, 2002 at 07:26:29AM +1000, Neil Brown wrote:
> On Thursday August 1, vindex@apartia.org wrote:
> > 
> > I have a root raid1 partition on /dev/sda1 & /dev/sdb1 (swap on
> > /dev/sda2 & /dev/sdb2). The server boots directly from the raid
> > partition. 
> > 
> > Now /dev/sda1 and /dev/sda2 have both failed and been removed from the
> > array and I am getting ready to replace the disk tonight.
> 
> Lucky you :-)

It went fast fortunately ;-) And yes, I am very lucky to have raid1
notify me by email through mdadm of a disk failure.
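
For readers who want to see what such a failure looks like to software, here is a minimal Python sketch that spots a degraded array in /proc/mdstat-style text. It is not how mdadm itself works; the sample text, device names, and block counts are invented for illustration, and only the `[total/active]` counter format is relied on.

```python
import re

# Invented sample in the style of /proc/mdstat: md0 has lost one mirror half.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1]
      17920384 blocks [2/1] [_U]
md1 : active raid1 sdb2[1] sda2[0]
      524224 blocks [2/2] [UU]
"""

def degraded_arrays(mdstat_text):
    """Return the names of arrays whose active count is below the total."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and current:
            total, active = int(counts.group(1)), int(counts.group(2))
            if active < total:
                degraded.append(current)
            current = None
    return degraded

print(degraded_arrays(SAMPLE_MDSTAT))  # ['md0']
```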

FWIW the disk that is malfunctioning is a 3-month-old Fujitsu 15k 36G
(MAM3367MP) which is an expensive server-grade disk. The reason I
selected Fujitsu was because of reported quality problems on IBM disks
and Fujitsu's good reputation on SCSI (their IDE line is bad however).
I am looking for informed opinions on these disks and recommendations
for future purchases. What are the most reliable SCSI disks out there?

It must be: fast, affordable, reliable, (select any two ;-)

> > 
> > What is the best way to proceed to minimize downtime?
> > 
> > My concern is that if I power down and replace /dev/sda the machine
> > won't be able to reboot without a rescue CD (lilo.conf has root=/dev/md0
> > and boot=/dev/md0) or will it? 
> > 
> > When the BIOS (Dell PowerEdge 1500) tries /dev/sda's MBR and fails,
> > will it then automatically try /dev/sdb?
> 
> With most BIOSes I have seen, you can explicitly tell it which device
> to boot from, but I cannot say for sure about the Dell PowerEdge.

Yes, I found it's in the SCSI bios itself. Very configurable.

Unfortunately when I tried booting from /dev/sdb the screen filled with
010101010 instead of "lilo". And this is on debian unstable, having run
lilo on /dev/md0 just prior to booting. In fact I found that having root
and boot set to /dev/md0 in lilo.conf does not allow me to boot my raid1
partition. However, if I set boot=/dev/sda all goes well. Any trick here?
(both disks are identical)

> > 
> > Or should I swap /dev/sdb on the scsi ribbon to have it take the first
> > place and thus become /dev/sda or will this just confuse the kernel
> > raid driver? (the letter on scsi drives is dependent on their place on
> > the ribbon cable, isn't it?)
> 
> It isn't the position on the ribbon cable.  It is the position in the
> SCSI device number (ID) ordering.  If you make sure the new drive has a
> larger number than the surviving drive, the surviving drive will appear
> as sda.

I'm really ashamed to have asked that one, memory lapse on my part.

> > 
> > Alternatively I was thinking of booting with a rescue CD (after
> > replacing /dev/sda) with the "root=/dev/md0", creating my partitions,
> > running lilo and rebooting into production for final reconstruction.
> > Would that be the safest bet?
> 
> This sounds like the best bet to me.  You do have to boot twice, but
> if you try the other, less well understood (by you at least) approach,
> there is an even chance you will need to reboot a couple of times
> anyway.

Having failed to boot /dev/sdb this is what I ended up doing and all
went well.
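
The procedure can be sketched as a dry-run plan in Python. This only builds and prints the commands, it does not run anything. It assumes the healthy disk is /dev/sdb, the blank replacement shows up as /dev/sda, and swap lives on a second array called /dev/md1 (that array name is a guess, the thread never states it). Check device names carefully before running anything like this for real.

```python
def rebuild_plan(good_disk, new_disk, arrays):
    """Return the shell commands, in order, as strings (nothing is executed).

    arrays is a list of (md_device, partition_number) pairs.
    """
    plan = [
        # Copy the partition table from the surviving disk to the new one.
        f"sfdisk -d {good_disk} | sfdisk {new_disk}",
        # Reinstall the boot loader before rebooting into production.
        "lilo",
    ]
    # After the reboot, hot-add the new partitions so the mirrors resync.
    for md_device, partition in arrays:
        plan.append(f"mdadm {md_device} --add {new_disk}{partition}")
    return plan

for command in rebuild_plan("/dev/sdb", "/dev/sda",
                            [("/dev/md0", 1), ("/dev/md1", 2)]):
    print(command)
```

Keeping the plan as plain strings makes it easy to review on paper before touching a production machine.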

Thanks for your help and ideas, cheers,

-- 
ldm@apartia.org 


* Re: recovering after a /dev/sda failure on raid1
  2002-08-02 23:03     ` Danilo Godec
@ 2002-08-05  9:34       ` Louis-David Mitterrand
  0 siblings, 0 replies; 11+ messages in thread
From: Louis-David Mitterrand @ 2002-08-05  9:34 UTC
  To: linux-raid

On Sat, Aug 03, 2002 at 01:03:34AM +0200, Danilo Godec wrote:
> On Sat, 3 Aug 2002, Neil Brown wrote:
> 
> > > My concern is that if I power down and replace /dev/sda the machine
> > > won't be able to reboot without a rescue CD (lilo.conf has root=/dev/md0
> > > and boot=/dev/md0) or will it?
> 
> A recent enough lilo understands raid partitions and will install itself
> on both disks. So, in theory, both disks should be able to boot your system.

In theory, because I was unable to boot this debian sid system with
lilo-22.2 and these settings:

	boot=/dev/md0
	root=/dev/md0
	install=/boot/boot-menu.b
	delay=20
	map=/boot/map
	read-only

	image=/vmlinuz
		label=Linux

After altering the SCSI bios setting to boot from /dev/sdb the screen
filled with 01010101010.

However, this is not specific to booting in degraded mode. I have the
same problem when both disks are fine. The machine will only boot after
I change to "boot=/dev/sda" and run lilo. I can then revert to
"boot=/dev/md0" and re-run lilo, and from there booting will work fine.

> > > When the BIOS (Dell PowerEdge 1500) tries /dev/sda's MBR and fails,
> > > will it then automatically try /dev/sdb?
> 
> It might not even try /dev/sdb if /dev/sda exists...
> But with some recent bioses you can choose which drive you want to boot,
> so you could just choose your 2nd SCSI drive and it should work.

Yes, I found that out, sorry for the basic questions but I was a bit
stressed changing a disk on a production server and having it not
restart. But all went well in the end.

Thanks again to kernel-raid developers, on another server I replaced a
Mylex card with a kernel raid5 partition and the speed gain is
tremendous!

Cheers,

-- 
ldm@apartia.org 


* Re: recovering after a /dev/sda failure on raid1
  2002-08-05  9:26     ` Louis-David Mitterrand
@ 2002-08-05 12:15       ` Neil Brown
  2002-08-05 12:36         ` Louis-David Mitterrand
  2002-08-05 18:38         ` Maurice Hilarius
  2002-08-05 18:35       ` Maurice Hilarius
  1 sibling, 2 replies; 11+ messages in thread
From: Neil Brown @ 2002-08-05 12:15 UTC
  To: Louis-David Mitterrand; +Cc: linux-raid

On Monday August 5, vindex@apartia.org wrote:
> 
> FWIW the disk that is malfunctioning is a 3-month-old Fujitsu 15k 36G
> (MAM3367MP) which is an expensive server-grade disk. The reason I
> selected Fujitsu was because of reported quality problems on IBM disks
> and Fujitsu's good reputation on SCSI (their IDE line is bad however).
> I am looking for informed opinions on these disks and recommendations
> for future purchases. What are the most reliable SCSI disks out there?
> 
> It must be: fast, affordable, reliable, (select any two ;-)

Somehow, I wish you hadn't said that.....

I just recently commissioned a fileserver with 14 Fujitsu MAM3367MC drives.
I wonder what the difference between P and C is...

Previously we have used Seagates which have seemed quite reliable.
The only real problems that we have had is with the IBM IDE DeathStars
(oops, I meant DeskStars.  Naughty keyboard).

As far as SCSI drives, I have no significant experiences of
unreliability, and I hope to keep it that way.

NeilBrown


* Re: recovering after a /dev/sda failure on raid1
  2002-08-05 12:15       ` Neil Brown
@ 2002-08-05 12:36         ` Louis-David Mitterrand
  2002-08-05 18:38         ` Maurice Hilarius
  1 sibling, 0 replies; 11+ messages in thread
From: Louis-David Mitterrand @ 2002-08-05 12:36 UTC
  To: Neil Brown; +Cc: linux-raid

On Mon, Aug 05, 2002 at 10:15:54PM +1000, Neil Brown wrote:
> On Monday August 5, vindex@apartia.org wrote:
> > 
> > FWIW the disk that is malfunctioning is a 3-month-old Fujitsu 15k 36G
> > (MAM3367MP) which is an expensive server-grade disk. The reason I
> > selected Fujitsu was because of reported quality problems on IBM disks
> > and Fujitsu's good reputation on SCSI (their IDE line is bad however).
> > I am looking for informed opinions on these disks and recommendations
> > for future purchases. What are the most reliable SCSI disks out there?
> > 
> > It must be: fast, affordable, reliable, (select any two ;-)
> 
> Somehow, I wish you hadn't said that.....

Sorry, but I thoroughly checked the disk after taking it off the server
and a lot of sectors are damaged, reiserfsck quits with a "can't read",
and the spindle makes bad noises.

> I just recently commissioned a fileserver with 14 Fujitsu MAM3367MC drives.
> I wonder what the difference between P and C is...

It must be for 68 or 80 pin (ours is a 68).

> Previously we have used Seagates which have seemed quite reliable.
> The only real problems that we have had is with the IBM IDE DeathStars
> (oops, I meant DeskStars.  Naughty keyboard).

Had a sorry experience with these as well, a 75 G.

-- 
ldm@apartia.org 


* Re: recovering after a /dev/sda failure on raid1
  2002-08-05  9:26     ` Louis-David Mitterrand
  2002-08-05 12:15       ` Neil Brown
@ 2002-08-05 18:35       ` Maurice Hilarius
  2002-08-06  8:36         ` Louis-David Mitterrand
  1 sibling, 1 reply; 11+ messages in thread
From: Maurice Hilarius @ 2002-08-05 18:35 UTC
  To: Louis-David Mitterrand; +Cc: linux-raid

With regards to your message at 03:26 AM 8/5/02, Louis-David Mitterrand. 
Where you stated:
><<snip>>
>FWIW the disk that is malfunctioning is a 3-month-old Fujitsu 15k 36G
>(MAM3367MP) which is an expensive server-grade disk. The reason I
>selected Fujitsu was because of reported quality problems on IBM disks
>and Fujitsu's good reputation on SCSI (their IDE line is bad however).
>I am looking for informed opinions on these disks and recommendations
>for future purchases. What are the most reliable SCSI disks out there?
>
>It must be: fast, affordable, reliable, (select any two ;-)

The Fujitsu disks have a track record of good reliability.
The IDE disks you mention were discontinued a year ago.
In our experience, Fujitsu disks have as low a failure rate as can be found.
Drives we have used with higher failure rates (SCSI) came from IBM and Seagate.
The best failure rate we have seen is from Hitachi.
In the case of Fujitsu, Hitachi, and Seagate the rates are very close.
Quantum/Maxtor SCSI are a bit worse.
IBM are substantially worse.
This is based on our DOA and in service SCSI disk failures over the past year.


With our best regards,

Maurice W. Hilarius       Telephone: 01-780-456-9771
Hard Data Ltd.               FAX:       01-780-456-9772
11060 - 166 Avenue        mailto:maurice@harddata.com
Edmonton, AB, Canada      http://www.harddata.com/
    T5X 1Y3

2.3TB RAID5 NAS server - dual AthlonMP CPU, Linux, $10,995 CAD / $6850 USD



* Re: recovering after a /dev/sda failure on raid1
  2002-08-05 12:15       ` Neil Brown
  2002-08-05 12:36         ` Louis-David Mitterrand
@ 2002-08-05 18:38         ` Maurice Hilarius
  1 sibling, 0 replies; 11+ messages in thread
From: Maurice Hilarius @ 2002-08-05 18:38 UTC
  To: Neil Brown; +Cc: linux-raid

With regards to your message at 06:15 AM 8/5/02, Neil Brown. Where you stated:
>Somehow, I wish you hadn't said that.....
>
>I just recently commissioned a fileserver with 14 Fujitsu MAM3367MC drives.
>I wonder what the difference between P and C is...

SCA versus 68 pin SCSI connector interface.
Otherwise the same drives.

Per:
http://www.fujitsu.ca/products/storage/scsi/al7lx-mcmp.html

"Interface Ultra 160 SCSI (MC:SCA-2 80-pin wide MP:68-pin) "






* Re: recovering after a /dev/sda failure on raid1
  2002-08-05 18:35       ` Maurice Hilarius
@ 2002-08-06  8:36         ` Louis-David Mitterrand
  0 siblings, 0 replies; 11+ messages in thread
From: Louis-David Mitterrand @ 2002-08-06  8:36 UTC
  To: Maurice Hilarius; +Cc: linux-raid

On Mon, Aug 05, 2002 at 12:35:47PM -0600, Maurice Hilarius wrote:
> The Fujitsu disks have a track record of good reliability.
> The IDE disks you mention were discontinued a year ago.
> In our experience, Fujitsu disks have as low a failure rate as can be found.
> Drives we have used with higher failure rates (SCSI) came from IBM and 
> Seagate.
> The best failure rate we have seen is from Hitachi.
> In the case of Fujitsu, Hitachi, and Seagate the rates are very close.
> Quantum/Maxtor SCSI are a bit worse.
> IBM are substantially worse.
> This is based on our DOA and in service SCSI disk failures over the past 
> year.

Thanks for sharing your experience with these brands. It confirms that
Fujitsu remains a good choice. I am in the process of returning the
failed disk and will ask for a full audit of the failure, which I will
summarize to this list when I get it.

Cheers,

-- 
ldm@apartia.org 


* RE: recovering after a /dev/sda failure on raid1
@ 2002-08-06 19:51 Cress, Andrew R
  0 siblings, 0 replies; 11+ messages in thread
From: Cress, Andrew R @ 2002-08-06 19:51 UTC
  To: 'Louis-David Mitterrand'; +Cc: linux-raid


I would agree about Fujitsu, Hitachi, & Seagate SCSI disks being very
reliable.

Do note that the server-class drives should have consistent mode page
settings and firmware levels from the vendor.  Fixes to most early-life disk
problems boil down to changes to either the firmware or the mode page
settings.  Even the IBM problems can be helped significantly by upgrading
firmware and careful mode page settings.  

There are DOS utilities to check/update these, and I have some tools for
Linux to help check or update disk firmware and mode pages, if you are
interested.
http://cvs.carrierlinux.org/viewcvs/viewcvs.cgi/components/scsirastools/src/

Andy Cress

On Mon, Aug 05, 2002 at 12:35:47PM -0600, Maurice Hilarius wrote:
> The Fujitsu disks have a track record of good reliability.
> The IDE disks you mention were discontinued a year ago.
> In our experience, Fujitsu disks have as low a failure rate as can be found.
> Drives we have used with higher failure rates (SCSI) came from IBM and 
> Seagate.
> The best failure rate we have seen is from Hitachi.
> In the case of Fujitsu, Hitachi, and Seagate the rates are very close.
> Quantum/Maxtor SCSI are a bit worse.
> IBM are substantially worse.
> This is based on our DOA and in service SCSI disk failures over the past 
> year.


