From: Ric Wheeler
Subject: error handling - DMA to PIO step down sequence
Date: Wed, 20 Sep 2006 15:03:27 -0400
Message-ID: <4511907F.1010104@emc.com>
Reply-To: ric@emc.com
List-Id: linux-ide@vger.kernel.org
To: linux-ide@vger.kernel.org, Tejun Heo, Jeff Garzik, Mark Lord, Alan Cox

Now that Tejun has put in the enhanced error handling (which is a big jump forward), I have been trying to test and validate the code and its assumptions.

Having spent far too much time on planes recently, broken only by spending the rest of my time helping with root cause failure analysis of drives, I have been questioning the validity of the way we currently derate our P-ATA and S-ATA connected drives from DMA to slower DMA to PIO and then spiral on down.

All of this is a long-winded way of asking whether this step down is ever valid for S-ATA (or even modern P-ATA) drives. From what I see, and from what I hear about how my colleagues handle drive errors in non-Linux code, it seems very aggressive and most likely not justified with modern drives and HBAs.

Derating should probably never happen on normal drive errors - even those that might take tens of seconds. Often, drives will try really, really hard to recover and might respond only after giving up internally, which can take up to 30 seconds. Also, NACKs from unsupported commands or any type of media errors should not kick off this sequence.

Would this be a reasonable thing for a config option? Or would it be better to add yet another blacklist for devices that might have a justified need for this derating?
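
A minimal sketch of the policy suggested above, in plain C. Every name here (err_class, derate_policy, dev_state, should_derate) is hypothetical and is not a libata interface. Letting only link-level transport errors count toward a step down is an assumption; the message only says which errors should not trigger one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error classes, for illustration only. */
enum err_class {
    ERR_MEDIA,          /* media error reported by the drive */
    ERR_CMD_REJECTED,   /* drive NACKed an unsupported command */
    ERR_SLOW_RECOVERY,  /* drive answered late after internal retries */
    ERR_TRANSPORT       /* assumed: link-level error between HBA and drive */
};

/* Hypothetical global policy, standing in for the proposed config option. */
struct derate_policy {
    bool allow_derate;                 /* config option: derating enabled */
    unsigned int transport_threshold;  /* transport errors needed to step down */
};

/* Hypothetical per-device state. */
struct dev_state {
    bool blacklisted;             /* device known to need derating */
    unsigned int transport_errs;  /* transport errors seen so far */
};

/* Return true if the transfer mode should be stepped down after this error. */
bool should_derate(const struct derate_policy *pol, struct dev_state *dev,
                   enum err_class cls)
{
    assert(pol != NULL && dev != NULL);

    /* Normal drive errors never count toward derating. */
    if (cls != ERR_TRANSPORT)
        return false;

    dev->transport_errs++;

    /* Derate only if the option is on or the device is blacklisted. */
    if (!pol->allow_derate && !dev->blacklisted)
        return false;

    return dev->transport_errs >= pol->transport_threshold;
}
```

With allow_derate off and the device not blacklisted, should_derate() always returns false, which corresponds to never stepping down by default. The blacklist flag re-enables the step down for individual devices.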