From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robin Bowes
Subject: Re: SATA150TX4 atat1:command timeout
Date: Fri, 30 Sep 2005 11:40:38 +0100
Message-ID: <433D1626.2000909@robinbowes.com>
References: <42111B02.4010805@netmosphere.net> <4211279C.5070205@pobox.com> <421360ED.2040505@netmosphere.net> <42161A94.1020404@netmosphere.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from 83-216-145-213.pambow882.adsl.metronet.co.uk ([83.216.145.213]:17628 "HELO dude.robinbowes.com") by vger.kernel.org with SMTP id S1030248AbVI3KkQ (ORCPT ); Fri, 30 Sep 2005 06:40:16 -0400
In-Reply-To: <42161A94.1020404@netmosphere.net>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Francois Payette
Cc: Jeff Garzik, linux-ide@vger.kernel.org, Eric Mudama

Hi,

I hope it's OK resurrecting an old thread, but I'm seeing similar problems.

My setup is as follows:

Epox EP-D3VA
Dual PIII 1GHZ processors
1.5GB RAM
Two of Promise SATA150 TX4 controllers
Six of Maxtor 250GB SATA drives (7Y250M0) - three per controller

Running Fedora Core 4 with "stock" FC4 kernel (kernel-smp-2.6.12-1.1456_FC4)

I have four md arrays as follows:

/dev/md0 RAID1 /dev/sd[ad]
/dev/md1 RAID1 /dev/sd[be]1
/dev/md2 RAID1 /dev/sd[cf]1
/dev/md5 RAID5 /dev/sd[abcef]2 (/dev/sdd2 is a hot spare)

md[0-2] are 1.5 MB areas

/dev/md0 is /
/dev/md1 is swap
/dev/md2 is currently not used

md5 is 929GB and I have used lvm to create:

/home home_lv audio_vg -wi-ao 914.38G
/usr usr_lv audio_vg -wi-ao 10.00G
/var var_lv audio_vg -wi-ao 5.00G

Ok, onto the problem...

After a couple of power outages I recently got myself a UPS but (typically) didn't get round to installing it before another outage (doh!). The server came back up OK with /dev/md5 dirty and needing to resync. However, during the re-sync, one or more of the disks clunked and I saw an "ATAn Timeout" message on the console and the system froze. (n varied, e.g.
ATA2, ATA1, ATA4, etc.) This seemed to be triggered by doing something that caused disk activity during the resync.

I've seen this before and done a hard-reset to start again - eventually the resync has completed and everything's back to normal. However, this time, I had to drop to single-user mode and reduce the RAID sync speed (echo 5000 > /proc/sys/dev/raid/speed_limit_max) to get the resync to complete.

Can anyone tell me if this is a bug somewhere or might it be a hardware limitation, i.e. saturating the PCI bus when resyncing? Is there anything I can do to prevent it from happening? I'm not too bothered about RAID performance - I mainly use it to store .flac audio files which don't need great throughput to stream off the disk.

Any suggestions (or fixes!) appreciated.

Thanks,

R.
-- 
http://robinbowes.com

If a man speaks in a forest, and his wife's not there, is he still wrong?
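
For reference, the resync throttling step described in the message can be wrapped so that the previous ceiling is easy to read back and restore afterwards. This is only a rough sketch: the function names and the LIMIT_FILE override are made up for illustration, and it assumes the kernel exposes the tunable at the path quoted in the message. Writing to the real file needs root.

```shell
#!/bin/sh
# Sketch only: helper names and the LIMIT_FILE override are hypothetical.
# LIMIT_FILE defaults to the md tunable quoted in the message; point it
# at a scratch file to try the helpers without touching the kernel.
LIMIT_FILE="${LIMIT_FILE:-/proc/sys/dev/raid/speed_limit_max}"

# Print the current resync speed ceiling.
get_resync_limit() {
    cat "$LIMIT_FILE"
}

# Write a new ceiling; refuses anything that is not a plain decimal number.
set_resync_limit() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: set_resync_limit <number>" >&2
            return 1
            ;;
    esac
    echo "$1" > "$LIMIT_FILE"
}
```

One possible way to use it: save the old value with `old=$(get_resync_limit)`, run `set_resync_limit 5000` while the array resyncs, then `set_resync_limit "$old"` once /proc/mdstat shows the resync has finished.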