* ATAn timeout errors
@ 2005-10-04 14:37 Robin Bowes
2005-12-02 14:18 ` Robin Bowes
0 siblings, 1 reply; 2+ messages in thread
From: Robin Bowes @ 2005-10-04 14:37 UTC
To: linux-raid
I posted this on the linux-ide list but got no response so I thought I'd
try here.
My setup is as follows:
Epox EP-D3VA
Dual PIII 1GHz processors
1.5GB RAM
Two Promise SATA150 TX4 controllers
Six Maxtor 250GB SATA drives (7Y250M0), three per controller
Running Fedora Core 4 with "stock" FC4 kernel (kernel-smp-2.6.12-1.1456_FC4)
I have four md arrays as follows:
/dev/md0 RAID1 /dev/sd[ad]1
/dev/md1 RAID1 /dev/sd[be]1
/dev/md2 RAID1 /dev/sd[cf]1
/dev/md5 RAID5 /dev/sd[abcef]2 (/dev/sdd2 is a hot spare)
md[0-2] are 1.5GB areas
/dev/md0 is /
/dev/md1 is swap
/dev/md2 is currently not used
md5 is 929GB and I have used LVM to create:
Mount  LV       VG        Attr    LSize
/home  home_lv  audio_vg  -wi-ao  914.38G
/usr   usr_lv   audio_vg  -wi-ao   10.00G
/var   var_lv   audio_vg  -wi-ao    5.00G
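As a sanity check on the sizes above, here is a minimal sketch of the RAID5 capacity arithmetic. It assumes the 929GB figure is a rounded usable size and that all five active members are the same size; the per-member size it derives is implied, not measured, and the function name is only illustrative.

```python
def raid5_usable(members: int, member_size_gb: float) -> float:
    """Usable RAID5 capacity: one member's worth of space holds parity."""
    if members < 3:
        raise ValueError("RAID5 needs at least three members")
    return (members - 1) * member_size_gb


# Figures taken from the post: five active members, 929GB usable.
MD5_MEMBERS = 5
MD5_USABLE_GB = 929.0

# Implied size of each sd[abcef]2 partition (derived, not measured).
member_size_gb = MD5_USABLE_GB / (MD5_MEMBERS - 1)

# Logical volume sizes exactly as listed in the post.
lv_sizes_gb = {"home_lv": 914.38, "usr_lv": 10.00, "var_lv": 5.00}
lv_total_gb = sum(lv_sizes_gb.values())

print(f"implied member size: {member_size_gb:.2f}GB")
print(f"LV total: {lv_total_gb:.2f}GB, "
      f"array: {raid5_usable(MD5_MEMBERS, member_size_gb):.0f}GB")
```

The logical volumes add up to slightly more than 929, which is consistent with 929GB being a rounded figure and the volume group filling essentially the whole array.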
Ok, onto the problem...
After a couple of power outages I recently got myself a UPS but
(typically) didn't get round to installing it before another outage (doh!).
The server came back up OK with /dev/md5 dirty and needing to resync.
However, during the resync, one or more of the disks clunked and I saw
an "ATAn Timeout" message on the console and the system froze. (n
varied, e.g. ATA2, ATA1, ATA4, etc.) This seemed to be triggered by
doing something that caused disk activity during the resync.
I've seen this before and did a hard reset to start again; eventually
the resync completed and everything was back to normal.
However, this time, I had to drop to single-user mode and reduce the
RAID sync speed (echo 5000 > /proc/sys/dev/raid/speed_limit_max) to get
the resync to complete.
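The same throttle can be scripted. This is a minimal sketch, assuming the value in speed_limit_max is a per-device rate in KB/s and that the script runs as root; the helper names `read_limit` and `cap_limit` are made up for illustration and are not part of any md tooling.

```python
from pathlib import Path

# The sysctl file written by the echo command in the post.
SPEED_LIMIT_MAX = Path("/proc/sys/dev/raid/speed_limit_max")


def read_limit(path: Path = SPEED_LIMIT_MAX) -> int:
    """Return the current resync speed ceiling stored in `path`."""
    return int(path.read_text().strip())


def cap_limit(limit: int, path: Path = SPEED_LIMIT_MAX) -> int:
    """Lower the ceiling to `limit` if it is currently higher.

    Returns the previous value so the caller can restore it once the
    resync has finished. Never raises an existing, lower ceiling.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    previous = read_limit(path)
    if previous > limit:
        path.write_text(f"{limit}\n")
    return previous
```

Calling `previous = cap_limit(5000)` before the resync and writing `previous` back afterwards mirrors the manual echo while keeping the original setting recoverable. The `path` parameter exists only so the helpers can be tried against an ordinary file first.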
The server then hung again - same error - so I used Maxtor's PowerMax
utility to perform a full test of all drives and all passed
successfully. I then rebooted and left it resyncing in multi-user mode
(with fingers crossed) and this time it completed successfully.
Can anyone tell me if this is a bug somewhere or might it be a hardware
limitation, i.e. saturating the PCI bus when resyncing? Is there
anything I can do to prevent it from happening?
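For a rough feel of the bus question, here is a back-of-the-envelope sketch. It assumes both controllers sit on one conventional 32-bit, 33 MHz PCI bus (the post does not say which slots are used) and an assumed sustained rate of 50 MB/s per drive, which is a placeholder and not a measured figure for these disks.

```python
# Conventional PCI: 32 bits wide at 33.33 MHz, shared by every card on the bus.
PCI_WIDTH_BYTES = 4
PCI_CLOCK_MHZ = 33.33
pci_peak_mb_s = PCI_WIDTH_BYTES * PCI_CLOCK_MHZ  # theoretical peak, about 133 MB/s

# ASSUMPTION: sustained sequential rate per drive, not a measured value.
ASSUMED_DRIVE_MB_S = 50.0
# Active md5 members that are busy during a resync (from the post).
RESYNC_DRIVES = 5


def aggregate_demand(drives: int, per_drive_mb_s: float) -> float:
    """Total bandwidth wanted if every drive streams at full rate."""
    return drives * per_drive_mb_s


# Unthrottled: what the drives could ask for versus what the bus can carry.
demand = aggregate_demand(RESYNC_DRIVES, ASSUMED_DRIVE_MB_S)
fair_share = pci_peak_mb_s / RESYNC_DRIVES

# Throttled: speed_limit_max = 5000, taken as KB/s per device.
capped_demand = aggregate_demand(RESYNC_DRIVES, 5000 / 1000)

print(f"bus peak: {pci_peak_mb_s:.0f} MB/s, unthrottled demand: {demand:.0f} MB/s")
print(f"even share per drive: {fair_share:.1f} MB/s")
print(f"demand with the 5000 cap: {capped_demand:.0f} MB/s")
```

Under these assumptions an unthrottled resync could ask for more than the bus can carry, while the 5000 cap keeps the total well below the theoretical peak. Real PCI throughput is lower than the theoretical figure, so this only bounds the problem and does not diagnose it.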
I'm not too bothered about RAID performance - I mainly use it to store
.flac audio files which don't need great throughput to stream off the disk.
Any suggestions (or fixes!) appreciated.
Thanks,
R.
--
http://robinbowes.com
If a man speaks in a forest,
and his wife's not there,
is he still wrong?
* Re: ATAn timeout errors
From: Robin Bowes @ 2005-12-02 14:18 UTC
To: linux-raid
My server is now unusable - I've rebuilt the array using a boot CD and
it completed OK.
I rebooted from the internal drives and it came up OK, then crashed with
an "ata6: command timeout" error.
I guess there must be a hardware issue somewhere, but I have no idea where.
Anyone got any ideas or suggestions?
R.