* Good news / bad news - The joys of RAID
@ 2004-11-19 21:06 Robin Bowes
From: Robin Bowes @ 2004-11-19 21:06 UTC
To: linux-raid
The bad news is I lost another disk tonight. Remind me *never* to buy
Maxtor drives again.
The good news is that my RAID5 array was configured as 5 + 1 spare. I
powered down the server, used the Maxtor PowerMax utility to identify
the bad disk, pulled it out and re-booted. My array is currently re-syncing.
[root@dude root]# mdadm --detail /dev/md5
/dev/md5:
Version : 00.90.01
Creation Time : Thu Jul 29 21:41:38 2004
Raid Level : raid5
Array Size : 974566400 (929.42 GiB 997.96 GB)
Device Size : 243641600 (232.35 GiB 249.49 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 5
Persistence : Superblock is persistent
Update Time : Fri Nov 19 20:52:58 2004
State : dirty, resyncing
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
Rebuild Status : 0% complete
UUID : a4bbcd09:5e178c5b:3bf8bd45:8c31d2a1
Events : 0.1765551
   Number   Major   Minor   RaidDevice   State
      0       8       2         0        active sync   /dev/sda2
      1       8      18         1        active sync   /dev/sdb2
      2       8      34         2        active sync   /dev/sdc2
      3       8      50         3        active sync   /dev/sdd2
      4       8      66         4        active sync   /dev/sde2
Thinking about what happened, I would have expected that the bad drive
would simply be removed from the array, the spare activated, and
re-syncing started automatically.
What actually happened was that I rebooted to activate a new kernel and
the box didn't come back up. As the machine runs headless, I had to
power it off and take it to a monitor/keyboard to check it. In the new
location it came up fine so I shut it down again and put it back in my
"server room" (read: cellar). I still couldn't see it from the network
so I dragged an old 14" CRT out of the shed and connected it up. The
login prompt was there but there was an "ata2 timeout" error message and
the console was dead. I power-cycled to reboot and as it booted I saw a
message something like "postponing resync of md0 as it uses the same
device as md5. waiting for md5 to resync". I then got a further ata
timeout error. I had to physically disconnect the bad drive and reboot
in order to re-start the re-sync.
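[For reference, a sketch of how I'd have expected to handle it by hand
if the kernel had flagged the drive cleanly - the replacement device
name is just a placeholder:

   mdadm /dev/md5 --fail /dev/sdf2 --remove /dev/sdf2   # drop the dead member
   mdadm /dev/md5 --add /dev/sdg2                       # add a replacement later
   cat /proc/mdstat                                     # watch the rebuild

Instead the spare never kicked in until the bad drive was physically
out of the box.]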
Further md information:
[root@dude log]# mdadm --detail --scan
ARRAY /dev/md2 level=raid1 num-devices=2
UUID=11caa547:1ba8d185:1f1f771f:d66368c9
devices=/dev/sdc1
ARRAY /dev/md1 level=raid1 num-devices=2
UUID=be8ad31a:f13b6f4b:c39732fc:c84f32a8
devices=/dev/sdb1,/dev/sde1
ARRAY /dev/md5 level=raid5 num-devices=5
UUID=a4bbcd09:5e178c5b:3bf8bd45:8c31d2a1
devices=/dev/sda2,/dev/sdb2,/dev/sdc2,/dev/sdd2,/dev/sde2
ARRAY /dev/md0 level=raid1 num-devices=2
UUID=4b28338c:bf08d0bc:bb2899fc:e7f35eae
devices=/dev/sda1,/dev/sdd1
It was /dev/sdf that failed. It contained two partitions: one part of
md2 (now running un-mirrored but still showing two devices) and the
other part of md5 (now re-syncing but only showing five devices).
Is this normal behaviour?
R.
--
http://robinbowes.com

* RE: Good news / bad news - The joys of RAID
From: Guy @ 2004-11-19 21:28 UTC
To: 'Robin Bowes', linux-raid

Reminder.... Never buy Maxtor drives again!

Guy

* RE: Good news / bad news - The joys of RAID
From: Mark Hahn @ 2004-11-20 18:42 UTC
To: linux-raid

> Never buy Maxtor drives again!

you imply that Maxtor drives are somehow inherently flawed. can you
explain why you think millions of people/companies are naive idiots
for continuing to buy Maxtor disks?

this sort of thing is just not plausible: Maxtor competes with the
other top-tier disk vendors with similar products and prices and
reliability. yes, if you buy a 1-year disk, you can expect it to have
been less carefully tested, possibly be of lower-end design and
reliability, and to have been handled more poorly by the supply chain.
thankfully, you don't have to buy 1-year disks any more.

read the specs. make sure your supply chain knows how to handle disks.
make sure your disks are mounted correctly, both mechanically and with
enough airflow. use raid and some form of archiving/backups. don't get
hung up on which of the 4-5 top-tier vendors makes your disk.

* RE: Good news / bad news - The joys of RAID
From: Guy @ 2004-11-20 19:37 UTC
To: 'Mark Hahn', linux-raid

I have had far more failures of Maxtor drives than any other. I have
also had problems with WD drives. I know someone that had 4-6 IBM
disks, most of which have failed. I am talking about disks with 3 year
warranties, based on the spec. But OEM disks have none. You must
return them to the PC manufacturer. Most of my failures were within 3
years, but beyond the warranty period of the system. So the OEM issue
has occurred too often. I have had good luck with Seagate.

I use RAID, it is a must with the failure rate! I do backup also, but
RAID tends to save me. Most people have a PC with 1 disk. They don't
understand RAID, and they don't understand that everything will be
lost if the disk breaks! They think "Dell will just fix it". But
wrong, Dell will just replace it! Big difference.

Today's disks claim an MTBF of about 1,000,000 hours! That's about 114
years. So, if I had 10 disks I should expect 1 failure every 11.4
years. That would be so cool! But not in the real world. Can you
explain how the disks have an MTBF of 1,000,000 hours, but fail more
often than that? Maybe I just don't understand some aspect of MTBF.

Guy

* RE: Good news / bad news - The joys of RAID
From: Mark Klarzynski @ 2004-11-20 20:03 UTC
To: linux-raid

MTBF is a statistic based upon the expected 'use' of the drive and the
replacement of the drive after its end of life (3-5 years)... It's
extremely complex and boring but the figure is only relevant if the
drive is being used within an environment that matches those of the
calculations.

SATA / IDE drives have an MTBF similar to that of SCSI / Fibre. But
this is based upon their expected use... i.e. SCSI used to be [power
on hours = 24hr] [use = 8 hours].. whilst SATA used to be [power on =
8 hours] and [use = 20 mins].

Regardless of what some people claim (usually those that only sell
sata based raids), the drives are not constructed the same in any way.
SATAs fail more within a raid environment (probably around 10:1)
because of the heavy use and also because they are not as
intelligent... therefore when they do not respond we have no way of
interrogating them or resetting them, whilst with scsi we can do both.
This means that a raid controller / driver has no option but to simply
fail the drive.

Maxtor lead the way in capacity and also reliability... I personally
had to recall countless earlier IBMs and replace them with maxtor. But
the new generation of IBMs (Hitachi) have got it together.

So - I guess you are all right :)

* RE: Good news / bad news - The joys of RAID
From: Mark Hahn @ 2004-11-20 22:17 UTC
To: linux-raid

> SATA / IDE drives have an MTBF similar to that of SCSI / Fibre. But
> this is based upon their expected use... i.e. SCSI used to be [power
> on hours = 24hr] [use = 8 hours].. whilst SATA used to be [power on
> = 8 hours] and [use = 20 mins].

the vendors I talk to always quote SCSI/FC at 100% power 100% duty,
and PATA/SATA at 100% power 20% duty.

> Regardless of what some people claim (usually those that only sell
> sata based raids), the drives are not constructed the same in any
> way.

obviously, there *have* been pairs of SCSI/ATA disks which had
identical mech/analog sections. but the mech/analog fall into just two
kinds:

- optimized for IOPS: 10-15K rpm for minimal rotational latency,
  narrow recording area for low seek distance, quite low bit and track
  density to avoid long waits for the head to stabilize after a seek.

- optimized for density/bandwidth: high bit/track density, wide
  recording area, modest seeks/rotation speed.

the first is SCSI/FC and the second ATA, mainly for historic reasons.

> SATAs fail more within a raid environment (probably around 10:1)
> because of the heavy use and also because they are not as
> intelligent...

what connection are you drawing between raid and "heavy use"? how does
being in a raid increase the IO load per disk?

> therefore when they do not respond we have no way of interrogating
> them or resetting them, whilst with scsi we can do both.

you've never seen a SCSI reset that looks just like an ATA reset?
sorry, but SCSI has no magic.

> This means that a raid controller / driver has no option but to
> simply fail the drive.

no.

> Maxtor lead the way in capacity and also reliability... I personally
> had to recall countless earlier IBMs and replace them with maxtor.

afaict, the deathstar incident was actually bad firmware (didn't
correctly flush data when hard powered off, resulting in blocks on
disk with bogus ECC, which had to be considered bad from then on, even
if the media was perfect.)

* RE: Good news / bad news - The joys of RAID
From: Guy @ 2004-11-20 23:09 UTC
To: 'Mark Hahn', linux-raid

You got any links related to this?
"the deathstar incident was actually bad firmware"

Can a user download and update the firmware? If so, I know someone
that may have some bad disks that are not so bad. If he can repair his
disks, I will report the status back on this list.

Previously I thought IBM made very good disks, until my friend had
more than a 75% failure rate. And within the warranty period.

I personally have an IBM SCSI disk that is running 100% of the time,
and the cooling is real bad. The drive is much too hot to touch. Been
like that for 5+ years. Never had any issues. The system also has a
Seagate that is too hot to touch, but only been running 3+ years. Both
are 18 Gig. The disks are in a system my wife uses! Don't tell her. :)
I got to fix that someday.

Guy

* Re: Good news / bad news - The joys of RAID
From: TJ @ 2004-12-02 16:47 UTC
To: linux-raid

> afaict, the deathstar incident was actually bad firmware (didn't
> correctly flush data when hard powered off, resulting in blocks on
> disk with bogus ECC, which had to be considered bad from then on,
> even if the media was perfect.)

I do not think the deathstar incident was due to a firmware problem as
you describe at all. I had a lot of these drives fail, and I read as
much as I could find on the subject. The problem was most likely
caused by the fact that these drives used IBM's new glass substrate
technology. This substrate had heat expansion issues which caused the
heads to misalign on tracks and eventually cross write over tracks,
corrupting data. The classic "click of death" was the sound of the
drive searching for a track repetitively. In some cases a format would
allow the drive to be used again, in many cases it would not. It is my
belief that formatting was ineffective at fixing the drive because the
cross writing probably hit some of the low level data, which the drive
cannot repair on a format.

* Re: Good news / bad news - The joys of RAID
From: Stephen C Woods @ 2004-12-02 17:29 UTC
To: TJ, linux-raid

Perhaps servo/timing data?

Also I recall some Kennedy Winchester drives back in the early 80s
that, if you had a power outage, would get header CRC errors at pairs
of blocks that were arranged in a spiral as the head headed for the
landing zone. I recall writing a standalone program that would read
the entire drive and then 'correct' the CRC errors as it found them.
Since much of the drive was unused I finally figured out that the data
was fine; it was the header CRC that got clobbered. Apparently there
was a bug in the powerdown hardware so it would enable the write head
when it was in the interblock zone as it was flying to land....

Ahh for the days of poking into device registers (in memory) to get
I/O to happen (from the console).

<scw>
--
Stephen C. Woods; UCLA SEASnet; 2567 Boelter hall; LA CA 90095;
(310)-825-8614
Unless otherwise noted these statements are my own, not those of the
University of California.  Internet mail: scw@seas.ucla.edu

* Re: Good news / bad news - The joys of RAID
From: Mark Hahn @ 2004-12-03 3:37 UTC
To: TJ; +Cc: linux-raid

> It is my belief that formatting was ineffective at fixing the drive
> because the cross writing probably hit some of the low level data,
> which the drive cannot repair on a format.

the ecc *is* the low-level data. without performing a controlled
experiment that recreates the power-off scenario, there's no way to
distinguish a block whose media is actually bad from one whose ecc
fails because the ecc is bad.

the firmware theory is supported by the fact that many deathstars
performed perfectly well for many years. I have at least one that
lasted for 4+ years, and was powered off only a few times, and all of
those cleanly.

* RE: Good news / bad news - The joys of RAID
From: Guy @ 2004-12-03 4:16 UTC
To: 'Mark Hahn', 'TJ'; +Cc: linux-raid

The ECC is not the low level data. The servo tracks are. I bet there
are start of track/sector header marks also. I believe a low level
format will not re-write the servo tracks. Some drives reserve 1 side
of 1 platter for servo data. Others mix the servo data with user data.
I don't know the full details, just tidbits I have read over the
years.

If your drives were cooled better than most, that may explain why you
did not have the "substrate had heat expansion issues". Just a guess.

If the problem was a firmware issue, why didn't IBM release a firmware
update?

You said:
"the firmware theory is supported by the fact that many deathstars
performed perfectly well for many years"

Are you saying some drives had good firmware, while others had bad
firmware? Otherwise, I don't understand your logic, since a drive not
failing does not prove a firmware bug.

Guy

* RE: Good news / bad news - The joys of RAID
From: Alvin Oga @ 2004-12-03 4:46 UTC
To: Guy; +Cc: linux-raid

On Thu, 2 Dec 2004, Guy wrote:
> The ECC is not the low level data. The servo tracks are. I bet there
> are start of track/sector header marks also. I believe a low level
> format will not re-write the servo tracks. Some drives reserve 1
> side of 1 platter for servo data. Others mix the servo data with
> user data. I don't know the full details, just tidbits I have read
> over the years.

ecc is in the disk controller with the phaselock loop and other analog
circuitry to convert the analog signal from the head back into 1's and
0's for the ecc code in firmware to correct any obvious head read
errors

track/sector info is written to the disk with low level format
( usually at the manufacturer - you can also do lowlevel format with
superformat )
 - it contains sector and track info and other header info along with
   gaps and timing/spacing between each field
 - disks are now soft sectored .. ( no servo info )
   ( 512 bytes or 1K or 2K or 4K(?) bytes per sector )
 - there is just one "index" mark to indicate one full platter
   rotation
 - you can change any/all of the data ... as long as the apps can read
   the data its lower-level drivers did to the disk
 - firmware level is the lowest changes ( on the disk controller )
 - some brave souls put "raid" in firmware .. ( risky in my book )

we use mke2fs, mkreiserfs etc to write file system data to make the
platter useful

we use software and other utilities to do more ecc checking on the
data we expect to get back
 - if the system memory is bad .. we overwrite good disk data with bad
   data from bad memory
 - if the disk read/write is bad ... we can sometimes compensate for
   it by keeping the disk cooler ( <= 30C for disk temp is good )

if ecc on the disk controller cannot fix it .. the disk is basically
worthless

c ya
alvin

* Re: Good news / bad news - The joys of RAID
From: Richard Scobie @ 2004-12-03 5:24 UTC
To: linux-raid

Guy wrote:
> If the problem was a firmware issue, why didn't IBM release a
> firmware update?

I believe they did. I recall downloading something similar to this:

http://support.dell.com/support/downloads/format.aspx?releaseid=r37239&c=us&l=en&s=biz&cs=555

at the time, to fix one of my drives.

Regards,

Richard

* Re: Good news / bad news - The joys of RAID
From: Konstantin Olchanski @ 2004-12-03 5:40 UTC
To: Richard Scobie; +Cc: linux-raid

On Fri, Dec 03, 2004 at 06:24:13PM +1300, Richard Scobie wrote:
> > If the problem was a firmware issue, why didn't IBM release a
> > firmware update?
>
> I believe they did. I recall downloading something similar to this:
> http://support.dell.com/support/downloads/format.aspx?releaseid=r37239&c=us&l=en&s=biz&cs=555

The updated IBM firmware helped. Before, every power outage would
produce disks with unreadable sectors. Now, all our IBM disks have the
"new" firmware and they hardly ever develop unreadable sectors.

This makes me suspect that there are *two* unrelated problems:
1) the "scribble at power down" problem, fixed by the firmware update;
2) the "overheated disks lose data due to platter thermal expansion"
   problem, probably unfixable, other than by keeping the disks cool.

--
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada

* Re: Good news / bad news - The joys of RAID
From: H. Peter Anvin @ 2004-12-09 0:17 UTC
To: linux-raid

Followup to: <200412021147.12410.systemloc@earthlink.net>
By author: TJ <systemloc@earthlink.net>
In newsgroup: linux.dev.raid

> I do not think the deathstar incident was due to a firmware problem
> as you describe at all. [...] The problem was most likely caused by
> the fact that these drives used IBM's new glass substrate technology.

It's also worth noting that there was extremely high correlation
between which factory built the drives and the failure rates.
Apparently some factories had virtually zero instances of this
problem.

	-hpa

* RE: Good news / bad news - The joys of RAID
From: Mark Hahn @ 2004-11-20 23:30 UTC
To: Guy; +Cc: linux-raid

> Can you explain how the disks have an MTBF of 1,000,000 hours, but
> fail more often than that? Maybe I just don't understand some aspect
> of MTBF.

simple: the MTBF applies to very large sets of disks. if you had
millions of disks, you'd expect to average mtbf/ndisks between
failures. with statistically trivial sample sizes (10 disks), you
can't really say much. of course, a proper model of the failure rate
would have a lot more than 1 parameter...

for instance, my organization will be buying about .5 PB of storage
soon. here are some options:

  disk            n      mtbf    hours   $/disk   $K total
  250GB SATA      1920   1e6     500     399      766
  600GB SATA      800    1e6     1250    600?     480
  73GB SCSI/FC    6575   1.3e6   198     389      2558
  146GB SCSI/FC   3288   1.3e6   395     600      1973
  300GB SCSI/FC   1600   1.3e6   813     1200     1920

these mtbf's are basically made up, since disk vendors aren't really
very helpful in publishing their true reliability distributions. these
disk counts are starting to be big enough to give some meaning to the
hours=mtbf/n calculation - I'd WAG that "hours" is within a factor of
two. (I looked at only three lines of SCSI disks to get 1.3e6 - two
quoted 1.2 and the newer was 1.4.)

vendors seem to be switching to quoting "annualized failure rates",
which are probably easier to understand - 1.2e6 MTBF or 0.73% AFR, for
instance. the latter makes it more clear that we're talking about
gambling ;)

but the message is clear: for a fixed, large capacity, your main
concern should be bigger disks. since our money is also fixed, you can
see that SCSI/FC prices are a big problem (these are real list prices
from a tier-1 vendor who marks up their SATA by an embarrassing
amount...) further, there's absolutely no chance we could ever keep
.5 PB of disks busy at 100% duty cycle, so that's not a reason to buy
SCSI/FC either...

regards, mark hahn.
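
[To spell out the arithmetic behind the "hours" column and the AFR
figure above - rough numbers only, since the MTBFs themselves are
guesses:

   expected time between failures across a pool  =  MTBF / n
       1e6 hours   / 1920 disks  ~=  520 hours  ~=  22 days
       1.3e6 hours / 6575 disks  ~=  198 hours  ~=   8 days

   annualized failure rate per drive  ~=  8760 / MTBF
       8760 / 1.2e6  ~=  0.0073  ~=  0.73% per drive per year  ]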

* Re: Good news / bad news - The joys of RAID
From: David Greaves @ 2004-11-20 19:40 UTC
To: Mark Hahn; +Cc: linux-raid

Mark Hahn wrote:
> you imply that Maxtor drives are somehow inherently flawed.
> can you explain why you think millions of people/companies
> are naive idiots for continuing to buy Maxtor disks?

Yeah, you're right.

Of course - the fact that 2 of *my* 6 Maxtor 250Gb SATA drives (3 year
warranty) date stamped at various times in 2004 have failed is
coincidence and should, of course, be expected with an MTBF of
millions of hours.

Oh, please note I'm not Robin - that must be a coincidence too :)

Personally I'm waiting for the revelation that they are recycled IBM
Deskstar 70's ;)

I take your point about supply chain though - anything that's shipped
by courier is suspect.

David

* RE: Good news / bad news - The joys of RAID
From: Guy @ 2004-11-21 4:33 UTC
To: 'David Greaves', 'Mark Hahn'; +Cc: linux-raid

You said: "anything that's shipped by courier is suspect."

Humm, the way the drives are packed you would have a hard time
exceeding 300Gs. Even UPS can't do that I bet. But I must admit, I
have no idea what force a drive would "feel" in a 4 foot drop.
Remember, the drive is packed very well! Also, they only refer to
2 ms. So, no idea if that is equal to 150 Gs for 4 ms, or 75 Gs for
8 ms.

From the spec of a 300GB Maxtor drive:

Reliability
- Shock Tolerance: 60Gs @ 2 ms half-sine pulse (Operating),
  300Gs @ 2 ms half-sine pulse (Non-operating)
- Data Error Rate: < 1/10E15 bits read (Non-recoverable)
- MTBF: 1000000 Hours

Guy
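
[Rough numbers for the 4 foot drop question - back-of-envelope only,
and the stopping distances are assumptions:

   impact speed from 4 ft (~1.2 m):   v = sqrt(2 x 9.8 x 1.2)   ~=  4.9 m/s
   stopped by ~5 cm of packing foam:  a = v^2 / (2 x 0.05)      ~=  240 m/s^2  ~=  25 Gs
   stopped by ~2 mm on a bare floor:  a = v^2 / (2 x 0.002)     ~= 6000 m/s^2  ~= 600 Gs

so a well-packed drive stays comfortably inside the 300G non-operating
rating, and an unpacked one doesn't.]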

* Re: Good news / bad news - The joys of RAID
From: berk walker @ 2004-11-21 1:01 UTC
To: Mark Hahn; +Cc: linux-raid

ALL of the Maxtor junk that I have sitting next to me was in factory
packaging, and not likely to have been affected by either physical or
electrical shock.

HE might have implied, I am saying it! Why ask someone as you did in
sentence #2? Ask them - or yourself.

Of course, he probably missed the warranty statement to not run Linux.

Mark Hahn wrote:
> you imply that Maxtor drives are somehow inherently flawed.
> can you explain why you think millions of people/companies
> are naive idiots for continuing to buy Maxtor disks?

* Re: Good news / bad news - The joys of RAID
From: H. Peter Anvin @ 2004-11-23 19:10 UTC
To: linux-raid

Followup to: <Pine.LNX.4.44.0411201238320.19120-100000@coffee.psychology.mcmaster.ca>
By author: Mark Hahn <hahn@physics.mcmaster.ca>
In newsgroup: linux.dev.raid

> > Never buy Maxtor drives again!
>
> you imply that Maxtor drives are somehow inherently flawed.
> can you explain why you think millions of people/companies
> are naive idiots for continuing to buy Maxtor disks?
>
> this sort of thing is just not plausible: Maxtor competes
> with the other top-tier disk vendors with similar products
> and prices and reliability.

In my experience, that is bullshit. Maxtor competes on price using
inferior products.

I bought two Maxtor drives, both of them failed within 13 months. That
was my first attempt at trying Maxtor again after taking them off my
sh*tlist from last time.

	-hpa

* RE: Good news / bad news - The joys of RAID
From: Guy @ 2004-11-23 20:03 UTC
To: 'H. Peter Anvin', linux-raid

When will you learn? :)

> I bought two Maxtor drives, both of them failed within 13 months.
> That was my first attempt at trying Maxtor again after taking them
> off my sh*tlist from last time.

* RE: Good news / bad news - The joys of RAID
From: Mark Hahn @ 2004-11-23 21:18 UTC
To: Guy; +Cc: 'H. Peter Anvin', linux-raid

> When will you learn? :)

exactly - you can conclude absolutely nothing from two samples.

* Re: Good news / bad news - The joys of RAID
From: Robin Bowes @ 2004-11-23 23:02 UTC
To: linux-raid

Mark Hahn wrote:
>> When will you learn? :)
>
> exactly - you can conclude absolutely nothing from two samples.

I read that mail as "I stopped buying Maxtor (for whatever reason)
then tried them again and had a 100% failure rate (albeit with a small
sample size) so have stopped buying them again" rather than "I bought
two Maxtor drives that failed so Maxtor drives are shit".

My own personal experience (I'm the OP in this thread) is that the
250GB SATA Maxtor MaxLine II drives I have purchased have an
unacceptable failure rate (something like 40% in 5 months).

R.
--
http://robinbowes.com

* RE: Good news / bad news - The joys of RAID
From: Guy @ 2004-11-24 0:33 UTC
To: 'Robin Bowes', linux-raid

I understood! I was poking fun that you tried them again, and again
lost! I hope you understood me. "When will you learn? :)"

Also, I thought of this about 4 years ago. Describes many managers!
"Sure you saved money, but at what cost?" - Guy Watkins

* Re: Good news / bad news - The joys of RAID
From: berk walker @ 2004-11-24 1:45 UTC
To: Mark Hahn; +Cc: Guy, 'H. Peter Anvin', linux-raid

I think I have 4 1/2 out of 6. Better?

Mark Hahn wrote:
>> When will you learn? :)
>
> exactly - you can conclude absolutely nothing from two samples.

* Re: Good news / bad news - The joys of RAID
From: H. Peter Anvin @ 2004-11-24 2:00 UTC
To: berk walker; +Cc: Mark Hahn, Guy, linux-raid

berk walker wrote:
> I think I have 4 1/2 out of 6. Better?
>
> Mark Hahn wrote:
>> exactly - you can conclude absolutely nothing from two samples.

Actually, you can. Having two fail in short order should be an
extremely rare event.

	-hpa

* Good news / bad news - The joys of hardware
From: Guy @ 2004-11-24 8:01 UTC
Cc: linux-raid

About 2 years ago I had a disk fail, not 100%, but intermittent
problems. So I replaced it. The replacement started acting up about
6-12 months ago. Read errors about every 1-2 months, finally it went
off-line. But, intermittently. I did think it was odd that the drive
in the same position was failing, and with similar problems, but
figured it was just a quincidence.

Today I replaced it; after replacing it, I had some problems. It is in
a case with 6 other disks, so I could tell by the LEDs that the
replacement drive was acting wrong, intermittently. I determined that
the Molex power plug going to the drive was causing the problems. What
a pain! So, the 2 drives that I replaced may have been good. The first
drive I took apart. I have the magnets to prove it! But it may have
been a good drive!

To make a long story short, check the cables for failures, including
the power cables.

The drives are Seagate, and I have at least 26 in service, so 2
failures out of 26 in 3 years is not so bad. However, if the Molex
connector was at fault, then 0 failures out of 26 in 3 years is just
fine. The drive is model ST118282LC, MTBF 1,000,000 hours. I think
with 26 drives I should have 1 failure in about 4.4 years. The drives
have a 5 year warranty, but they are OEM, so I get nothing. I am not
the first owner, but they were unused. And I bet they are about 5
years old now.

Too much info? Sorry. Maybe I need a blog? :)

Can anyone spell "quincidence"?

Guy

* Re: Good news / bad news - The joys of hardware
From: Robin Bowes @ 2004-11-24 8:57 UTC
To: Guy; +Cc: linux-raid

Guy wrote:
> Can anyone spell "quincidence"?

http://dictionary.reference.com/search?q=coincidence

R.
--
http://robinbowes.com

* RE: Good news / bad news - The joys of RAID
From: Guy @ 2004-11-19 21:42 UTC
To: 'Robin Bowes', linux-raid

The re-sync to the spare should have been automatic, without a
re-boot.

Your errors related to ata timeout are not a Linux issue. My guess is
the bios could see the drive, but the drive was not responding
correctly. I think this is life with ata. I have had similar problems
with SCSI. 1 drive failed in a way that it caused problems with other
drives on the same SCSI bus.

It could be that your array was re-building, but did not finish. In
that case it would start over from the beginning. Which may look like
it did not attempt to re-build until the re-boot. Did you check the
status before you shut it down?

I use mdadm's monitor mode to send me email when events occur. By the
time I read my emails, a drive has failed and the re-sync to the spare
is done. No need to check logs.

Yes, it is normal that md will not re-sync 2 arrays that share a
common device. One will be delayed until the other finishes.

Second reminder.... Never buy Maxtor drives again!

This quote seems to fit real well!
"Sure you saved money, but at what cost?" - Guy Watkins

Guy
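
[For anyone wanting to copy the setup: roughly what a monitor
configuration looks like - treat this as a sketch, since the exact
flags vary a little between mdadm versions and the mail address is
obviously a placeholder:

   # /etc/mdadm.conf
   MAILADDR admin@example.com
   ARRAY /dev/md5 UUID=a4bbcd09:5e178c5b:3bf8bd45:8c31d2a1

   # started once at boot, e.g. from an init script
   mdadm --monitor --scan --daemonise --delay=300

mdadm then sends mail when it sees an event such as a failed drive or
a finished rebuild on any of the arrays it knows about.]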

* Re: Good news / bad news - The joys of RAID
From: Robin Bowes @ 2004-11-28 13:15 UTC
To: Guy; +Cc: linux-raid

Guy wrote:
> I use mdadm's monitor mode to send me email when events occur.

Guy,

I've been meaning to write this for a while...

I tried monitoring once but had a problem when shutting down as the
arrays were reported as "busy" because mdadm --monitor was running on
them. I guess it needs to be killed earlier in the shutdown process.

So, can you share with me how you start/stop mdadm to run in monitor
mode?

Thanks,

R.
--
http://robinbowes.com

* Re: Good news / bad news - The joys of RAID
From: Neil Brown @ 2004-11-30 2:05 UTC
To: Robin Bowes; +Cc: Guy, linux-raid

On Sunday November 28, robin-lists@robinbowes.com wrote:
> I tried monitoring once but had a problem when shutting down as the
> arrays were reported as "busy" because mdadm --monitor was running
> on them. I guess it needs to be killed earlier in the shutdown
> process.

That bug was fixed in mdadm 1.6.0

NeilBrown

From the ChangeLog:
Changes Prior to 1.6.0 release
    ...
    - Fix bug in --monitor where an array could be held open and so
      could not be stopped without killing mdadm.
    ...

* Re: Good news / bad news - The joys of RAID
From: Doug Ledford @ 2004-12-01 3:34 UTC
To: Neil Brown; +Cc: Robin Bowes, Guy, linux-raid

On Tue, 2004-11-30 at 13:05 +1100, Neil Brown wrote:
> That bug was fixed in mdadm 1.6.0
>
> From the ChangeLog:
> Changes Prior to 1.6.0 release
>     - Fix bug in --monitor where an array could be held open and so
>       could not be stopped without killing mdadm.

If I recall correctly, this fixes the primary symptom, but not the
whole problem. When in --monitor mode, mdadm will reopen each device
every 15 seconds to scan its status. As such, a shutdown could still
fail if mdadm is still running and the timing is right. In that
instance, retrying the shutdown on failure would likely be enough to
solve the problem, but that sounds icky to me. Would be much better if
mdadm could open a control device of some sort and query about running
arrays instead of opening the arrays themselves.

--
Doug Ledford <dledford@redhat.com>
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
* Re: Good news / bad news - The joys of RAID 2004-12-01 3:34 ` Doug Ledford @ 2004-12-01 11:50 ` Robin Bowes 0 siblings, 0 replies; 50+ messages in thread From: Robin Bowes @ 2004-12-01 11:50 UTC (permalink / raw) To: Doug Ledford; +Cc: Neil Brown, Guy, linux-raid Doug Ledford wrote: > > If I recall correctly, this fixes the primary symptom, but not the whole > problem. When in --monitor mode, mdadm will reopen each device every 15 > seconds to scan its status. As such, a shutdown could still fail if > mdadm is still running and the timing is right. In that instance, > retrying the shutdown on failure would likely be enough to solve the > problem, but that sounds icky to me. Would be much better if mdadm > could open a control device of some sort and query about running arrays > instead of opening the arrays themselves. Wouldn't simply killing the "mdadm --monitor" process early on in the shutdown process achieve the same result? R. -- http://robinbowes.com ^ permalink raw reply [flat|nested] 50+ messages in thread
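That ordering is easy enough to arrange on a SysV-style init: it is just a matter of where the monitor's K-link sorts relative to the scripts that stop the arrays. The script name and numbers below are illustrative and vary by distribution:

    # stop stanza of a small init script for the monitor
    stop() {
        killall -q mdadm
    }
    # link it early in the halt and reboot runlevels:
    #   ln -s ../init.d/mdadm-monitor /etc/rc0.d/K05mdadm-monitor
    #   ln -s ../init.d/mdadm-monitor /etc/rc6.d/K05mdadm-monitor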
* Re: Good news / bad news - The joys of RAID 2004-11-19 21:06 Good news / bad news - The joys of RAID Robin Bowes 2004-11-19 21:28 ` Guy 2004-11-19 21:42 ` Good news / bad news - The joys of RAID Guy @ 2004-11-19 21:58 ` Gordon Henderson 2 siblings, 0 replies; 50+ messages in thread From: Gordon Henderson @ 2004-11-19 21:58 UTC (permalink / raw) To: Robin Bowes; +Cc: linux-raid On Fri, 19 Nov 2004, Robin Bowes wrote: > What actually happened was that I rebooted to activate a new kernel and > the box didn't come back up. As the machine runs headless, I had to > power it off and take it to a monitor/keyboard to check it. Not directly related to your RAID issue, but I've been running headless servers with console on serial ports as of late. LILO has an option to put output on a serial line, and there's a kernel compile flag and an append instruction to make it all work. That combined with a power cycler makes me feel more at ease about the remote servers I run. Just don't connect 2 PCs back to back and run a getty on each serial line... Gordon ^ permalink raw reply [flat|nested] 50+ messages in thread
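The LILO and kernel pieces Gordon mentions look roughly like this; a sketch assuming a 2.4-era kernel built with serial console support (CONFIG_SERIAL_CONSOLE) and the first serial port at 9600 baud:

    # /etc/lilo.conf
    serial=0,9600n8                             # LILO's own prompt on ttyS0
    append="console=ttyS0,9600 console=tty0"    # kernel messages on serial and VGA
    # and a login on the serial line, e.g. in /etc/inittab:
    #   T0:23:respawn:/sbin/getty -L ttyS0 9600 vt100

Run lilo again after editing, and the next boot, boot loader prompt included, is reachable over a null-modem cable.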
* RE: Good news / bad news - The joys of RAID [not found] <037401c4cf3b$ee75bc90$030a0a0a@musicroom> @ 2004-11-21 4:33 ` Guy 2004-11-22 14:13 ` Yu Chen 0 siblings, 1 reply; 50+ messages in thread From: Guy @ 2004-11-21 4:33 UTC (permalink / raw) To: 'Mark Klarzynski', linux-raid Humm, the Maxtor spec I am looking at does not limit the duty cycle. It makes no reference at all. I think it is reasonable to assume 24 hours per day, unless they claim less. The drive should fail on average of once per 114 years, but end of life is 3-5 years? I did find this on the Maxtor web site: No MTBF, but ARR of <1%. I think they are saying if I had 100 drives less than 1 failure per year. That is a MTFB of more than 100 years. Design life (min) 5 years. So, the disk should last al least 5 years. I have no problem with this. If this is running time, not time powered off. No limits on duty cycle listed, so got to assume 24/7. So, if I had 100 disks that lasted at least 5 years with less than 1 failure per year... I would be happy. After all, in 5 years I could replace the 100 drives with 6 new drives with the same total capacity. This is based on drive size doubling every 1.5 years. Of course my requirements double every year! :) http://maxtor.com/_files/maxtor/en_us/documentation/data_sheets/diamondmax_1 0_data_sheet.pdf Now if someone made an affordable tape drive and tapes that could backup 200G per tape, that would be cool! Guy -----Original Message----- From: Mark Klarzynski [mailto:mark.k@computer-design.co.uk] Sent: Saturday, November 20, 2004 3:03 PM To: 'Guy' Subject: RE: Good news / bad news - The joys of RAID MTBF is statistic based upon the expected 'use' of the drive and the replacement of the drive after its end of life (3-5 years)... It's extremely complex and boring but the figure is only relative if the drive is being used within an environment that matches those of the calculations. SATA / IDE drives have an MTBF similar to that of SCSI / Fibre. But this is based upon their expected use... i.e. SCSI used to be [power on hours = 24hr] [use = 8 hours].. whilst SATA used to be [power on = 8 hours] and [use = 20 mins]. Regardless of what some people clam (usually those that only sell sata based raids), the drives are not constructed the same in any way. SATA's fail more within a raid environment (probably around 10:1) because of the heavy use and also because they are not as intelligent... therefore when they do not respond we have no way of interrogating them or resetting them, whilst with scsi we do both. This means that a raid controller / driver has no option to but simply fail the drive. Maxtor lead the way in capacity and also reliability... I personal had to recall countless earlier IBMs and replace them with maxtor. But the new generation of IBM's (Hitachi) have got it together. So - I guess you are all right :) -----Original Message----- From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Guy Sent: 20 November 2004 19:38 To: 'Mark Hahn'; linux-raid@vger.kernel.org Subject: RE: Good news / bad news - The joys of RAID I have had far more failures of Maxtor drives than any other. I have also had problems with WD drives. I know someone that had 4-6 IBM disks, most of which have failed. I am talking about disks with 3 year warranties! Based on the spec. But OEM disks have none. You must return them to the PC manufacture. Most of my failures were within 3 years, but beyond the warranty period of the system. So the OEM issue has occurred too often. 
I have had good luck with Seagate. I use RAID, it is a must with the failure rate! I do backup also, but RAID tends to save me. Most people have a PC with 1 disk. I don't understand RAID, and they don't understand that everything will be lost if the disk breaks! They think "Dell will just fix it". But wrong, Dell will just replace it! Big difference. Today's disks claim a MTBF of about 1,000,000 hours! That's about 114 years. So, if I had 10 disks I should expect 1 failure every 11.4 years. That would be so cool! But not in the real world. Can you explain how the disks have a MTBF of 1,000,000 hours? But fail more often than that? Maybe I just don't understand some aspect of MTBF. Guy -----Original Message----- From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Mark Hahn Sent: Saturday, November 20, 2004 1:43 PM To: linux-raid@vger.kernel.org Subject: RE: Good news / bad news - The joys of RAID > Never buy Maxtor drives again! you imply that Maxtor drives are somehow inherently flawed. can you explain why you think millions of people/companies are naive idiots for continuing to buy Maxtor disks? this sort of thing is just not plausible: Maxtor competes with the other top-tier disk vendors with similar products and prices and reliability. yes, if you buy a 1-year disk, you can expect it to have been less carefully tested, possibly be of lower-end design and reliability, and to have been handle more poorly by the supply chain. thankfully, you don't have to buy 1-year disks any more. read the specs. make sure your supply chain knows how to handle disks. make sure your disks are mounted correctly, both mechanically and with enough airflow. use raid and some form of archiving/backups. don't get hung up on which of the 4-5 top-tier vendors makes your disk. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 50+ messages in thread
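For anyone who wants to plug in their own numbers, the arithmetic behind the 114-year and 11.4-year figures is just hours-per-year over MTBF; a quick sketch:

    awk 'BEGIN {
        mtbf = 1000000      # hours, as quoted on the spec sheet
        printf "per-drive annualised failure rate : %.2f%%\n", 8760 / mtbf * 100
        printf "one failure roughly every %.1f years across 10 drives\n", mtbf / 8760 / 10
    }'

Which is the same claim as Maxtor's "<1% ARR", just in different units. The catch is that MTBF/ARR is a population statistic taken over the drive's few-year design life, not a promise that an individual drive lasts anywhere near 114 years, which is why both figures can be true and still feel contradictory.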
* RE: Good news / bad news - The joys of RAID 2004-11-21 4:33 ` Guy @ 2004-11-22 14:13 ` Yu Chen 2004-11-22 14:34 ` Gordon Henderson 2004-11-23 0:17 ` berk walker 0 siblings, 2 replies; 50+ messages in thread From: Yu Chen @ 2004-11-22 14:13 UTC (permalink / raw) To: Guy; +Cc: 'Mark Klarzynski', linux-raid > Now if someone made an affordable tape drive and tapes that could backup > 200G per tape, that would be cool! > You don't know? they have that already, AIT-4, LTO as I know. =========================================== Yu Chen Howard Hughes Medical Institute Chemistry Building, Rm 182 University of Maryland at Baltimore County 1000 Hilltop Circle Baltimore, MD 21250 phone: (410)455-6347 (primary) (410)455-2718 (secondary) fax: (410)455-1174 email: chen@hhmi.umbc.edu =========================================== ^ permalink raw reply [flat|nested] 50+ messages in thread
* RE: Good news / bad news - The joys of RAID
  2004-11-22 14:13 ` Yu Chen
@ 2004-11-22 14:34 ` Gordon Henderson
  2004-11-22 17:51 ` Guy
  0 siblings, 1 reply; 50+ messages in thread
From: Gordon Henderson @ 2004-11-22 14:34 UTC (permalink / raw)
To: linux-raid

On Mon, 22 Nov 2004, Yu Chen wrote:

> > Now if someone made an affordable tape drive and tapes that could backup
> > 200G per tape, that would be cool!
>
> You don't know? they have that already, AIT-4, LTO as I know.

I think the key word here was "affordable".

I use DLT drives, which I think go up to 220GB native right now (I'm only
currently using 160GB native drives), but right now the cost of media at
about £60 each is about the same as a 160GB IDE drive. Easier to manage
though, and the cost of the tape drive is still round about £3500. But how
valuable is your data? (As I keep telling my clients!!!)

I've tried to build servers that have a max. capacity of 200GB per
partition, but I have clients chomping at the bit for bigger partitions,
and then it becomes a PITA to back up to tape.

I don't think the requirement for tape backup is going to go away in the
near future, anyway. I just wish tape technology would keep up with disk
technology. RAID is great, but it's not for archive and backup.

Gordon

^ permalink raw reply	[flat|nested] 50+ messages in thread
* RE: Good news / bad news - The joys of RAID 2004-11-22 14:34 ` Gordon Henderson @ 2004-11-22 17:51 ` Guy 2004-11-22 23:26 ` Gordon Henderson 0 siblings, 1 reply; 50+ messages in thread From: Guy @ 2004-11-22 17:51 UTC (permalink / raw) To: 'Gordon Henderson', linux-raid Yes, I was going for affordable! A tape drive with native capacity of 160 Gig costs over $2600 US (SDLT). And tapes cost $89 each. You need to do a lot of backups before tapes cost less than an IDE disk. An IDE disk is so much faster too. The best price I could find for a 160Gig ultra 100 was $107 Hitachi A Hitachi 160 Gig SATA disk is $113. SDLT tapes cost $89 each (10 for $890) I am sure you could get a quantity discount on tapes, but disk drives too. Now we just need to be able to hot plug ultra 100 disk drives. SATA hardware supports hot plug, but I read Linux does not support that yet. I do want to be able to remove my backup and put it in the shelf. A business should have 2 copies where one goes off site. I did have a power supply fail in a way that it fried everything in the box. I think line voltage was send directly to the 12V or 5V line. DVD drive, disk drive, motherboard, RAM, video card, ... all gone. So if my backups were on-line with the same power supply as the main disk(s), all would have been lost. Some people seem to think tape is better than disk. Somehow since there is no filesystem, so you can't delete a file by mistake. So, fine, just use the disk drive the same way. Use cpio and output to /dev/hda or similar. The only thing tapes have that is better than disk drives is the eof and eot marks. I can put 10-20 daily backups on the same tape and let the hardware track the position of each backup. With disk, you would need to count the blocks used, and track the start and length of each. Or you could use a file system, but like I said, some people seem to think that has too much risk. Guy -----Original Message----- From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Gordon Henderson Sent: Monday, November 22, 2004 9:35 AM To: linux-raid@vger.kernel.org Subject: RE: Good news / bad news - The joys of RAID On Mon, 22 Nov 2004, Yu Chen wrote: > > Now if someone made an affordable tape drive and tapes that could backup > > 200G per tape, that would be cool! > > You don't know? they have that already, AIT-4, LTO as I know. I think the key-word here was "affordable" I use DLT drives, which I think go up to 220GB native right now, ('m only currently using 160GB native drives), but right now the cost of media at about £60 each is about the same as a 160GB IDE drive.. Easier to manage though, and the cost of the tape drive is still round about £3500. But how valuable is your data? (As I keep telling my clients!!!) I've tried to build servers that have a max. capacity of 200GB per partition, but I have clients chomping at the bit for bigger partitions, thn it becomes a PITA to backup to tape. I don't think the requirement for tape backup is going to go away in the near future, anyway. I just wish tape technology would keep up with disk technology. RAID is great, but it's not for archive and backup. 
Gordon - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 50+ messages in thread
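Putting a rough number on "a lot of backups", with the street prices quoted above (about $2600 for the drive, $89 per SDLT tape, $107 per 160GB IDE disk) and the simplifying assumption that each full backup gets its own piece of media:

    awk 'BEGIN {
        drive = 2600; tape = 89; disk = 107
        printf "tape saves $%d of media per full backup\n", disk - tape
        printf "the drive pays for itself after about %d full backups\n", drive / (disk - tape)
    }'

Call it roughly 144 full backups, ignoring reuse, quantity discounts and drive wear; close to three years of weekly fulls before SDLT wins on cost alone.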
* RE: Good news / bad news - The joys of RAID 2004-11-22 17:51 ` Guy @ 2004-11-22 23:26 ` Gordon Henderson 2004-11-22 23:48 ` Guy 0 siblings, 1 reply; 50+ messages in thread From: Gordon Henderson @ 2004-11-22 23:26 UTC (permalink / raw) To: Guy; +Cc: linux-raid On Mon, 22 Nov 2004, Guy wrote: > Yes, I was going for affordable! A tape drive with native capacity of 160 > Gig costs over $2600 US (SDLT). And tapes cost $89 each. You need to do a > lot of backups before tapes cost less than an IDE disk. An IDE disk is so > much faster too. True (on the speed side) Although right now it's only just over 2 hours to dump ~200GB on one of the servers I look after. I can see a time where the only real solution is a combined disk/tape system - right now, I'm taking a snapshot overnight off some servers, then backing up from that - that at least gives the punters a "yesterday" snapshot which is great for those "accidental" deletions where getting stuff off tape might take 4-5 hours. Using rsync, or LVM, you can even make multiple days of snapshots. (Although I'm not sure about LVM even now, after having some problems with it causing crashes, and very slow performance after snapshots had been taken, maybe it's time to look at it again though) > The best price I could find for a 160Gig ultra 100 was $107 Hitachi > A Hitachi 160 Gig SATA disk is $113. > > SDLT tapes cost $89 each (10 for $890) > > I am sure you could get a quantity discount on tapes, but disk drives too. > > Now we just need to be able to hot plug ultra 100 disk drives. > SATA hardware supports hot plug, but I read Linux does not support that yet. I've had good results with SCSI hot pluggability and with a FireWire drive where the underlying hardware uses the SCSI stack, also with USB mass storage devices which look like SCSI drives (eg. my digital camera!) So-far I've just used a little script to do the echo "scsi-hot-add 0 0 1 0" > /proc/scsi, etc. then mount /dev/sda1 and so on. I'm hoping that SATA using the SCSI stack will be able to do this too, but I'm hearing mutterings about problems with the device numbers, but so-far I've not had any problems myself... So in that respect, going SCSI, or things that look like SCSI drives might be the way to go... > I do want to be able to remove my backup and put it in the shelf. A > business should have 2 copies where one goes off site. I did have a power > supply fail in a way that it fried everything in the box. I think line > voltage was send directly to the 12V or 5V line. DVD drive, disk drive, > motherboard, RAM, video card, ... all gone. So if my backups were on-line > with the same power supply as the main disk(s), all would have been lost. Ouch. I've not had anythng this bad, (yet?) Different businesses have different ideas about backup and archive (and there are legal implications too for some companies) One of my clients is a small web design house - their in-house server gets backed up to a firewire drive ("lacie" I think the brand is) once a week, as well as a daily snapshot on-line, and is remote backed up over the net to one of my servers, they have 2 other servers for their client web sites which I manage and I back these up to each other overnight - not perfect, but usable, and as these are 200 miles away from me, I need these to be as reliable as possible within the money restaints put upon me by my client (mutter) > Some people seem to think tape is better than disk. Somehow since there is > no filesystem, so you can't delete a file by mistake. 
So, fine, just use > the disk drive the same way. Use cpio and output to /dev/hda or similar. I actually use 'dump' to a file on their removable firewire drive which is formatted ext2 - they have a 120GB drive and only 20GB of live data, so plenty of room for multiple backups - all on the same drive... I'm going to set them up with 'amanda' soon to try to automate it. I've used amanda for many years no - PITA to setup, but once going, it's very good (with tapes, anyway - I'm not actually sure I'll be able to get it to backup to individual files on the single drive) > The only thing tapes have that is better than disk drives is the eof and eot > marks. I can put 10-20 daily backups on the same tape and let the hardware > track the position of each backup. With disk, you would need to count the > blocks used, and track the start and length of each. Or you could use a > file system, but like I said, some people seem to think that has too much > risk. I haven't found anything that beats tapes for ease of handling (physical stacking and storage in nice boxes) and archiving. I have DLT tapes that are 5 years old now that still read - the real problem with archiving is a good management system, as well as realising the fact that nothing lasts forever, so at some point you have to take those old tapes, read them back onto disk and re-write them using the current technology, and hope the current technology will still be about in 5 years time when you do it again... (The good side is that densities have improved immensely, so long-term storage costs ought to decrease...) Gordon ^ permalink raw reply [flat|nested] 50+ messages in thread
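The rsync flavour of the multi-day snapshot idea is usually done with hard links, so unchanged files cost no extra space; a sketch with made-up paths, needing a reasonably recent rsync for --link-dest:

    #!/bin/sh
    # keep a week of rotating snapshots of /data under /backup
    rm -rf /backup/daily.6
    for i in 5 4 3 2 1 0; do
        [ -d /backup/daily.$i ] && mv /backup/daily.$i /backup/daily.$((i+1))
    done
    rsync -a --delete --link-dest=/backup/daily.1 /data/ /backup/daily.0/

Restores are then plain file copies from whichever daily.N is wanted, which covers the "accidental deletion" case without touching tape.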
* RE: Good news / bad news - The joys of RAID 2004-11-22 23:26 ` Gordon Henderson @ 2004-11-22 23:48 ` Guy 2004-11-23 0:09 ` Måns Rullgård 2004-11-23 15:33 ` Gordon Henderson 0 siblings, 2 replies; 50+ messages in thread From: Guy @ 2004-11-22 23:48 UTC (permalink / raw) To: 'Gordon Henderson'; +Cc: linux-raid Amanda... I looked into this about 2 years ago. From what I found, each daily backup used a different tape. This is crazy! I can put 10-20 days on 1 tape. Maybe more, not really sure. Of course it is based on how much data changes each day. So, my full backups are only needed about once every 2-3 weeks. Since Amanda uses up too many tapes, I use a home grown set of scripts that maintain the tape position and use cpio for the backup. Do you know if the above is true about Amanda? About tape age. I know of a system that has DLT tapes that are over 7 years old. They have 21 tape drives total in 7 tape juke boxes. No idea about the number of tapes, but well over 4000. These very in age from less than 1 year to over 7 years old. They also have 2 copies of all data. So, if a tape fails, just find the copy. Also make a new copy to maintain 2 copies. Guy -----Original Message----- From: Gordon Henderson [mailto:gordon@drogon.net] Sent: Monday, November 22, 2004 6:27 PM To: Guy Cc: linux-raid@vger.kernel.org Subject: RE: Good news / bad news - The joys of RAID On Mon, 22 Nov 2004, Guy wrote: > Yes, I was going for affordable! A tape drive with native capacity of 160 > Gig costs over $2600 US (SDLT). And tapes cost $89 each. You need to do a > lot of backups before tapes cost less than an IDE disk. An IDE disk is so > much faster too. True (on the speed side) Although right now it's only just over 2 hours to dump ~200GB on one of the servers I look after. I can see a time where the only real solution is a combined disk/tape system - right now, I'm taking a snapshot overnight off some servers, then backing up from that - that at least gives the punters a "yesterday" snapshot which is great for those "accidental" deletions where getting stuff off tape might take 4-5 hours. Using rsync, or LVM, you can even make multiple days of snapshots. (Although I'm not sure about LVM even now, after having some problems with it causing crashes, and very slow performance after snapshots had been taken, maybe it's time to look at it again though) > The best price I could find for a 160Gig ultra 100 was $107 Hitachi > A Hitachi 160 Gig SATA disk is $113. > > SDLT tapes cost $89 each (10 for $890) > > I am sure you could get a quantity discount on tapes, but disk drives too. > > Now we just need to be able to hot plug ultra 100 disk drives. > SATA hardware supports hot plug, but I read Linux does not support that yet. I've had good results with SCSI hot pluggability and with a FireWire drive where the underlying hardware uses the SCSI stack, also with USB mass storage devices which look like SCSI drives (eg. my digital camera!) So-far I've just used a little script to do the echo "scsi-hot-add 0 0 1 0" > /proc/scsi, etc. then mount /dev/sda1 and so on. I'm hoping that SATA using the SCSI stack will be able to do this too, but I'm hearing mutterings about problems with the device numbers, but so-far I've not had any problems myself... So in that respect, going SCSI, or things that look like SCSI drives might be the way to go... > I do want to be able to remove my backup and put it in the shelf. A > business should have 2 copies where one goes off site. 
I did have a power > supply fail in a way that it fried everything in the box. I think line > voltage was send directly to the 12V or 5V line. DVD drive, disk drive, > motherboard, RAM, video card, ... all gone. So if my backups were on-line > with the same power supply as the main disk(s), all would have been lost. Ouch. I've not had anythng this bad, (yet?) Different businesses have different ideas about backup and archive (and there are legal implications too for some companies) One of my clients is a small web design house - their in-house server gets backed up to a firewire drive ("lacie" I think the brand is) once a week, as well as a daily snapshot on-line, and is remote backed up over the net to one of my servers, they have 2 other servers for their client web sites which I manage and I back these up to each other overnight - not perfect, but usable, and as these are 200 miles away from me, I need these to be as reliable as possible within the money restaints put upon me by my client (mutter) > Some people seem to think tape is better than disk. Somehow since there is > no filesystem, so you can't delete a file by mistake. So, fine, just use > the disk drive the same way. Use cpio and output to /dev/hda or similar. I actually use 'dump' to a file on their removable firewire drive which is formatted ext2 - they have a 120GB drive and only 20GB of live data, so plenty of room for multiple backups - all on the same drive... I'm going to set them up with 'amanda' soon to try to automate it. I've used amanda for many years no - PITA to setup, but once going, it's very good (with tapes, anyway - I'm not actually sure I'll be able to get it to backup to individual files on the single drive) > The only thing tapes have that is better than disk drives is the eof and eot > marks. I can put 10-20 daily backups on the same tape and let the hardware > track the position of each backup. With disk, you would need to count the > blocks used, and track the start and length of each. Or you could use a > file system, but like I said, some people seem to think that has too much > risk. I haven't found anything that beats tapes for ease of handling (physical stacking and storage in nice boxes) and archiving. I have DLT tapes that are 5 years old now that still read - the real problem with archiving is a good management system, as well as realising the fact that nothing lasts forever, so at some point you have to take those old tapes, read them back onto disk and re-write them using the current technology, and hope the current technology will still be about in 5 years time when you do it again... (The good side is that densities have improved immensely, so long-term storage costs ought to decrease...) Gordon ^ permalink raw reply [flat|nested] 50+ messages in thread
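For what it's worth, the tape-position bookkeeping described above usually comes down to a counter plus the non-rewinding device; a rough sketch of the idea, with an illustrative device name and file list:

    # append tonight's backup as tape file number N (N kept in a state file)
    N=$(cat /var/backups/tapepos)
    mt -f /dev/nst0 rewind
    [ "$N" -gt 0 ] && mt -f /dev/nst0 fsf "$N"    # skip the backups already on tape
    find /home -depth -print | cpio -o -H crc -B > /dev/nst0
    echo $((N + 1)) > /var/backups/tapepos
    mt -f /dev/nst0 offline
    # to restore backup K later: rewind, "mt -f /dev/nst0 fsf K", then cpio -i -d -B < /dev/nst0

The tape driver writes the filemark when the device is closed after a write, so the next run can position itself purely from the counter.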
* Re: Good news / bad news - The joys of RAID 2004-11-22 23:48 ` Guy @ 2004-11-23 0:09 ` Måns Rullgård 2004-11-23 15:33 ` Gordon Henderson 1 sibling, 0 replies; 50+ messages in thread From: Måns Rullgård @ 2004-11-23 0:09 UTC (permalink / raw) To: linux-raid "Guy" <bugzilla@watkins-home.com> writes: > Amanda... > I looked into this about 2 years ago. From what I found, each daily backup > used a different tape. This is crazy! I can put 10-20 days on 1 tape. > Maybe more, not really sure. Of course it is based on how much data changes > each day. So, my full backups are only needed about once every 2-3 weeks. Using several tapes, switching every day (or however often you make backups), is a good idea. If the tapes can hold more than one backup, just keep adding to the oldest tape when all have been used once. That way, if the system explodes during a backup, it won't take the most recent backup with it. -- Måns Rullgård mru@inprovide.com - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 50+ messages in thread
* RE: Good news / bad news - The joys of RAID 2004-11-22 23:48 ` Guy 2004-11-23 0:09 ` Måns Rullgård @ 2004-11-23 15:33 ` Gordon Henderson 1 sibling, 0 replies; 50+ messages in thread From: Gordon Henderson @ 2004-11-23 15:33 UTC (permalink / raw) To: Guy; +Cc: linux-raid On Mon, 22 Nov 2004, Guy wrote: > Amanda... > I looked into this about 2 years ago. From what I found, each daily backup > used a different tape. This is crazy! I can put 10-20 days on 1 tape. > Maybe more, not really sure. Of course it is based on how much data changes > each day. So, my full backups are only needed about once every 2-3 weeks. > > Since Amanda uses up too many tapes, I use a home grown set of scripts that > maintain the tape position and use cpio for the backup. > > Do you know if the above is true about Amanda? Yes. Amanda uses one tape per backup. It writes a label at the start of every tape to make sure it's writing the backup to the right tape. With your system, if you lose one tape, you lose a lot of backups (however, I have a client who uses one removable disk and I store multiple backups on that disk)... > About tape age. I know of a system that has DLT tapes that are over 7 years > old. They have 21 tape drives total in 7 tape juke boxes. No idea about > the number of tapes, but well over 4000. These very in age from less than 1 > year to over 7 years old. They also have 2 copies of all data. So, if a > tape fails, just find the copy. Also make a new copy to maintain 2 copies. As long as they remember to take a set of tapes out of the jukebox from time to time and replace with fresh :) Gordon ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: Good news / bad news - The joys of RAID
  2004-11-22 14:13 ` Yu Chen
  2004-11-22 14:34 ` Gordon Henderson
@ 2004-11-23  0:17 ` berk walker
  2004-11-23  9:24 ` Robin Bowes
  1 sibling, 1 reply; 50+ messages in thread
From: berk walker @ 2004-11-23 0:17 UTC (permalink / raw)
To: Yu Chen; +Cc: Guy, 'Mark Klarzynski', linux-raid

you must be NUTS! hehe.. I don't know what these cost on the street, but
earlier, Computerworld forecast the price @ $3500, and $79 for the media.
If one does the traditional multi-level backup routine, the drive and
fodder would buy a heck of a lot of alternative storage.

My idea of affordable is..$179 + 9.99.

Yu Chen wrote:
>> Now if someone made an affordable tape drive and tapes that could backup
>> 200G per tape, that would be cool!
>
> You don't know? they have that already, AIT-4, LTO as I know.
>
> ===========================================
> Yu Chen
> Howard Hughes Medical Institute
> Chemistry Building, Rm 182
> University of Maryland at Baltimore County
> 1000 Hilltop Circle
> Baltimore, MD 21250
>
> phone: (410)455-6347 (primary)
>        (410)455-2718 (secondary)
> fax:   (410)455-1174
> email: chen@hhmi.umbc.edu
> ===========================================

^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: Good news / bad news - The joys of RAID 2004-11-23 0:17 ` berk walker @ 2004-11-23 9:24 ` Robin Bowes 2004-11-23 12:31 ` Bob Hillegas 0 siblings, 1 reply; 50+ messages in thread From: Robin Bowes @ 2004-11-23 9:24 UTC (permalink / raw) To: berk walker; +Cc: Yu Chen, Guy, 'Mark Klarzynski', linux-raid berk walker wrote: > My idea of affordable is..$179 + 9.99. You pay delivery ??? :) R. -- http://robinbowes.com ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: Good news / bad news - The joys of RAID
  2004-11-23  9:24 ` Robin Bowes
@ 2004-11-23 12:31 ` Bob Hillegas
  2004-11-23 13:00 ` berk walker
  0 siblings, 1 reply; 50+ messages in thread
From: Bob Hillegas @ 2004-11-23 12:31 UTC (permalink / raw)
To: linux-raid

On Tue, 2004-11-23 at 03:24, Robin Bowes wrote:
> berk walker wrote:
> > My idea of affordable is..$179 + 9.99.

Has anyone considered Iomega's REV drive? It's kind of smallish when
talking about backing up terabytes. It's 35 gigs per removable cartridge.
But it is random access, in the $375 + $20 range.

Thanks, BobH
--
Bob Hillegas <bobhillegas@houston.rr.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: Good news / bad news - The joys of RAID 2004-11-23 12:31 ` Bob Hillegas @ 2004-11-23 13:00 ` berk walker 0 siblings, 0 replies; 50+ messages in thread From: berk walker @ 2004-11-23 13:00 UTC (permalink / raw) To: Bob Hillegas; +Cc: linux-raid gotta dash, but i just checked, $343 + $58. Back after work for comment. thx - b- Bob Hillegas wrote: >On Tue, 2004-11-23 at 03:24, Robin Bowes wrote: > > >>berk walker wrote: >> >> >>>My idea of affordable is..$179 + 9.99. >>> >>> > >Has anyone considered Omega's REV drive? It's kind of smallish when >talking about backing up terabytes. It's 35 gigs per removable >cartridge. But it is random access in the $375 + $20 range. > >Thanks, BobH > > ^ permalink raw reply [flat|nested] 50+ messages in thread
* RE: Good news / bad news - The joys of RAID [not found] <Pine.LNX.4.44.0411201655400.19120-100000@coffee.psychology.mcmaster.ca> @ 2004-11-21 21:28 ` Mark Klarzynski 2004-11-21 21:58 ` Mark Hahn 2004-11-22 6:29 ` Mikael Abrahamsson 0 siblings, 2 replies; 50+ messages in thread From: Mark Klarzynski @ 2004-11-21 21:28 UTC (permalink / raw) To: 'Mark Hahn'; +Cc: linux-raid I have no idea as to what the tier1 vendors say as I have only worked within the storage business.. the figures I quoted are based on the last time I consulted on this are would been provided by IBM / Seagate as these are the only two scsi vendors we use. If you really want to dig, then ask Seagate, they are respected in both camps and will openly justify the technology and price difference. They produce extremely in-depth docs on the testing methods and assumptions. In terms of reset I am not sure what you mean... we and all raid manufacturers will reset a scsi bus on scsi timeouts.. this is normal practice and simple to achieve. It is not achievable on sata.. I have not used pata much, but I do not recall a reset line that we could trigger from firmware level. RAID in isolation does not increase the i/o load as we all know... but the reality is that raid applications do. Non of us can refuse the cost effective nature of sata drives, this means we can often use raid in places where we could not afford or justify scsi. Add multiple users and the stress on the drives increase dramatically. If you want a real life situation... one of our scsi designs is used around the world and has probably 10m+ users (many systems).. in some cases these have been running for 4 / 5 years and therefore we have to look at drive replacement. For a trial we used sata to obviously see if we could save costs or offer an intermediate solution. We could not keep a single system going for more than 14 days. The load varied between 10-250 users at any one time.. we tried Maxtor and IBM. There was also a 40% occurrence of fatal state errors.. this was simple the rate that the drives were failing meant it was likely to fail whilst in rebuild state and obviously die. Take the sata box and stick it in many applications and it will last you to your dying day. You may be right that there has been ata and scsi drive manufactured with the same components excluding the interface.... but the last time I saw this was a bearing shortage in 95... I don't know of any manufactures today that even hint at this. But I could well be wrong.. The discussion could probably go on forever, but the point is that we are not stupid... sata solutions are probably 30% of the cost of the scsi..... there is a difference and we know it. the important thing is accepting the difference and using the right technology for the right application. -----Original Message----- From: Mark Hahn [mailto:hahn@physics.mcmaster.ca] Sent: 20 November 2004 21:58 To: Mark Klarzynski Subject: RE: Good news / bad news - The joys of RAID > SATA / IDE drives have an MTBF similar to that of SCSI / Fibre. But this > is based upon their expected use... i.e. SCSI used to be [power on hours > = 24hr] [use = 8 hours].. whilst SATA used to be [power on = 8 hours] > and [use = 20 mins]. can you cite a source for these numbers? the vendors I talk to (tier1 system vendors, not disk vendors) usually state 24x7 100% duty cycles for scsi/fc, and 100% poweron, 20% duty cycles for PATA/SATA. ^ permalink raw reply [flat|nested] 50+ messages in thread
* RE: Good news / bad news - The joys of RAID 2004-11-21 21:28 ` Mark Klarzynski @ 2004-11-21 21:58 ` Mark Hahn 2004-11-22 6:29 ` Mikael Abrahamsson 1 sibling, 0 replies; 50+ messages in thread From: Mark Hahn @ 2004-11-21 21:58 UTC (permalink / raw) To: Mark Klarzynski; +Cc: linux-raid > practice and simple to achieve. It is not achievable on sata.. I have Linux certainly appears to be able to reset both pata and sata; perhaps the drivers are just lying. > are not stupid... sata solutions are probably 30% of the cost of the > scsi..... there is a difference and we know it. the important thing is > accepting the difference and using the right technology for the right > application. sure. it's basically only extremely high-end DBs (which require 150 IOPS per disk, 24/7) that need SCSI/FC. anyone designing a storage system needs to actually profile their IO to see whether their workload actually falls into this very tiny niche. do your seeks scale down as ndisks increases? do you need bandwidth (which is almost trivial to obtain with more disks)? do you need reliability (which is easy to achieve with raid)? does your IO drop to near zero once you run it through a battery-backed cache of a few GB? the take-home message is that you need to actually find out whether your workload requires that you pay the huge premium for SCSI/FC infrastructure ("enterprise-class storage"). almost none do, seriously. ^ permalink raw reply [flat|nested] 50+ messages in thread
* RE: Good news / bad news - The joys of RAID 2004-11-21 21:28 ` Mark Klarzynski 2004-11-21 21:58 ` Mark Hahn @ 2004-11-22 6:29 ` Mikael Abrahamsson 1 sibling, 0 replies; 50+ messages in thread From: Mikael Abrahamsson @ 2004-11-22 6:29 UTC (permalink / raw) To: linux-raid On Sun, 21 Nov 2004, Mark Klarzynski wrote: > You may be right that there has been ata and scsi drive manufactured > with the same components excluding the interface.... but the last time I > saw this was a bearing shortage in 95... I don't know of any > manufactures today that even hint at this. But I could well be wrong.. This was in the day of Mac:s only having scsi interface but needing an affordable drive. Since Apple stopped using scsi in their lowend boxes, as far as I know there has been no more "desktop scsi drive". -- Mikael Abrahamsson email: swmike@swm.pp.se ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: Good news / bad news - The joys of RAID [not found] <04Nov26.172857est.30052@gpu.utcc.utoronto.ca> @ 2004-11-26 22:41 ` Robin Bowes 0 siblings, 0 replies; 50+ messages in thread From: Robin Bowes @ 2004-11-26 22:41 UTC (permalink / raw) To: Chris Siebenmann, linux-raid Chris Siebenmann wrote: > You write: > | Thinking about what happened, I would have expected that the bad > | drive would just be removed from the array and spare activated and > | re-syncing started automatically. > > This is what is supposed to happen; when the hardware winds are blowing > in the right direction and the software recognizes everything, it even > really does happen. Chris, I suspect that what happened is that the array was in the process of re-syncing when I powered off the box because it had frozen because of an ATA timeout error. When I re-booted, the RAID1 root partition was dirty and wouldn't re-sync while the RAID 5 array was re-syncing. Whatever, I got it back up and running by disconnecting the failed drive. Cheers, R. -- http://robinbowes.com ^ permalink raw reply [flat|nested] 50+ messages in thread