From: Tim Small
Subject: ahci timeouts, retries etc.
Date: Wed, 14 Oct 2009 17:51:32 +0100
Message-ID: <4AD60194.8030106@seoss.co.uk>
To: linux-ide@vger.kernel.org

Hi,

I have a Tyan S5375 (BIOS v1.03) ICH9 which periodically (approx twice a week) logs timeouts like this:

[6475755.652262] ata2.00: exception Emask 0x0 SAct 0x3832 SErr 0x0 action 0x6 frozen
[6475755.652262] ata2.00: cmd 60/18:08:2a:90:ee/00:00:12:00:00/40 tag 1 ncq 12288 in
[6475755.652262]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[6475755.652262] ata2.00: status: { DRDY }
[6475755.652262] ata2.00: cmd 61/60:20:6a:8c:ee/00:00:12:00:00/40 tag 4 ncq 49152 out
[6475755.652262]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
...
[6475755.652262] ata2.00: cmd 60/10:68:6a:65:ee/00:00:12:00:00/40 tag 13 ncq 8192 in
[6475755.652262]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[6475755.652262] ata2.00: status: { DRDY }
[6475755.652262] ata2: hard resetting link
[6475756.009863] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[6475756.040731] ata2.00: configured for UDMA/133
[6475756.040731] sd 1:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
[6475756.040731] sd 1:0:0:0: [sdb] Write Protect is off
[6475756.040731] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[6475756.040731] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

A look at the libata wiki suggests interrupt delivery problems as a possible explanation, but is this likely to be the case here? Since multiple requests are queued for the drive, I'm guessing that multiple interrupts must have been dropped by the time this error occurs?

I'm assuming that the kernel will retry these requests after the SATA link has been reset?

The errors appear to be randomly distributed over the four drives on this machine. All are Seagate ST31000340NS with either firmware version SN05 or SN16...

Cheers,

Tim.

-- 
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53
http://seoss.co.uk/
+44-(0)1273-808309
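
For reference, the SAct mask and the "cmd" lines in the log can be decoded mechanically to see how many requests were outstanding when the port was frozen. The following is only a minimal Python sketch, not libata code. The function names `sact_tags` and `decode_ncq_cmd` are made up for illustration, and the field order is an assumption inferred from the quoted log lines (it is consistent with the "tag" and "ncq" values the kernel prints next to each command). It only handles the NCQ read/write commands (0x60/0x61) and ignores the special case where a sector count of 0 means 65536.

```python
def sact_tags(sact):
    """Return the tag numbers whose bits are set in a 32-bit SAct mask."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


def decode_ncq_cmd(cmd):
    """Decode a 'cmd' string for an NCQ command (0x60 read, 0x61 write).

    Assumed field order, inferred from the log lines quoted above:
    command/feature:nsect:lbal:lbam:lbah/
    hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    command, low, high, _device = cmd.split("/")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":")
    )
    # For NCQ commands the sector count travels in the feature fields,
    # and the tag sits in bits 7:3 of the sector-count field.
    sectors = (hob_feature << 8) | feature
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "command": int(command, 16),
        "tag": nsect >> 3,
        "sectors": sectors,
        "bytes": sectors * 512,  # assumes 512-byte sectors, as sd reports
        "lba": lba,
    }


print(sact_tags(0x3832))
print(decode_ncq_cmd("60/18:08:2a:90:ee/00:00:12:00:00/40"))
```

For the first quoted command this gives tag 1 and 12288 bytes, matching the "tag 1 ncq 12288 in" text, and SAct 0x3832 expands to tags 1, 4, 5, 11, 12 and 13, i.e. six commands were outstanding on ata2 at the time of the timeout.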