From: Tomas Carnecky
To: linux-kernel@vger.kernel.org
Date: Wed, 28 Dec 2005 07:25:20 +0000
Subject: Serial ATA Lockups
Message-ID: <43B23DE0.5080201@dbservice.com>

My setup: Shuttle XPC Barebone, AMD CPU, two Serial ATA disks in a software RAID setup.

When the system is under heavy load (starting World of Warcraft, dd if=/dev/zero of=/part/file, etc.) I get these messages in dmesg:

ata1: translated ATA stat/err 0x51/84 to SCSI SK/ASC/ASCQ 0xb/47/00
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x84 { DriveStatusError BadCRC }

They repeat over and over, pages of them. Eventually the system locks up hard: the HDD LED stays on, there is no disk activity, and I have to reboot.

A few kernels ago (2.6.14.2) I got a kernel backtrace on the console. I don't remember it exactly anymore, but there was something with scsi_resume() in it. With this kernel (2.6.15-rc6-gd5ea4e26) I don't get the backtrace; it just locks up hard. Sometimes I can't even boot (as happened just now): it locked up before init could be started.
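For anyone decoding the log lines: the stat/err bytes are plain ATA task-file register bits. The Python sketch below decodes them. It is my own illustration, not kernel code. The bit tables follow the ATA/ATAPI Status and Error register layouts, and the names STATUS_BITS, ERROR_BITS, decode and decode_status are made up for this example.

If I read the bits right, 0x51/0x84 means the following:
- The drive ended the command with ERR set.
- The Error register shows ICRC plus ABRT, which is an interface CRC error.
- That usually points at the link (cable, connector, controller) rather than the disk surface.

When BSY (0x80) is set, the spec says the other status bits are not valid. The sketch therefore reports only BSY in that case.

```python
# Hypothetical helper, not part of the kernel: decode ATA Status and Error
# register bytes into the standard ATA/ATAPI bit mnemonics.

STATUS_BITS = [
    (0x80, "BSY"),   # busy; while set, the other status bits are not valid
    (0x40, "DRDY"),  # device ready ("DriveReady" in the log)
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # device seek complete ("SeekComplete" in the log)
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error; details are in the Error register ("Error")
]

ERROR_BITS = [
    (0x80, "ICRC"),   # interface CRC error ("BadCRC" in the log)
    (0x40, "UNC"),    # uncorrectable data error
    (0x20, "MC"),     # media changed
    (0x10, "IDNF"),   # ID not found
    (0x08, "MCR"),    # media change requested
    (0x04, "ABRT"),   # command aborted ("DriveStatusError" in the log)
    (0x02, "TK0NF"),  # track 0 not found
    (0x01, "AMNF"),   # address mark not found
]


def decode(value, table):
    """Return the mnemonics of all bits set in value, highest bit first."""
    return [name for mask, name in table if value & mask]


def decode_status(status):
    """Decode a Status byte; if BSY is set, report only BSY."""
    if status & 0x80:
        return ["BSY"]
    return decode(status, STATUS_BITS)


if __name__ == "__main__":
    # The stat/err pair from the dmesg lines above.
    print("status:", " ".join(decode_status(0x51)))       # DRDY DSC ERR
    print("error: ", " ".join(decode(0x84, ERROR_BITS)))  # ICRC ABRT
```

The tables list bits from high to low, so the order may differ from the order the kernel prints them in.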
I've also seen this on my console (transcript):

command 0x35 timeout, stat 0xd0 host_stat 0x21
translated ATA stat/err 0x51/84 to SCSI SK/ASC/ASCQ 0xb/47/00
status=0xd0 { busy }
SCSI error: return code = 0x8000002
sda: Current: sense key = 0xB
end_request: I/O error, dev sda, sector [sector #]
ATA abnormal status 0xD0 on port 0x9f7

It's very hard to recover from a hard lockup. At the next reboot the kernel wants to resync the RAID arrays, which causes heavy load, which again causes a hard lockup: an endless loop. Sometimes I can boot, and then I switch to 'init 2' to stop as many services as I can and unmount as many partitions as I can, but even then it sometimes locks up again.

The barebone's SATA chip supports SATA-II, but the hard drives are SATA-I. The two disks are 120GB Seagates.

I've had problems with these hard drives before. I had them on an ICP Vortex SATA hardware RAID controller, and sometimes one disk would fail and I'd have to rebuild the array. It was always the same disk, and I don't think it's one of the two in my new computer.

Can this be fixed in the kernel? Or do I have to buy new hard disks?

tom