From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jens Axboe
Subject: Re: [PATCH Linux 2.6.12 00/09] NCQ: generic NCQ completion/error-handling
Date: Wed, 6 Jul 2005 15:00:05 +0200
Message-ID: <20050706130004.GB1373@suse.de>
References: <20050626152105.D86561FB@htj.dyndns.org>
	<20050627143344.GI11633@suse.de>
	<20050630073633.GF2243@suse.de>
	<42C3CEA5.9040509@gmail.com>
	<20050630152620.GZ2243@suse.de>
	<20050701002035.GA24878@htj.dyndns.org>
	<20050701085912.GB2243@suse.de>
	<20050704055332.GA7249@suse.de>
	<20050706125500.GA1373@suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Received: from ns.virtualhost.dk ([195.184.98.160]:56250 "EHLO virtualhost.dk")
	by vger.kernel.org with ESMTP id S261419AbVGFM6d (ORCPT );
	Wed, 6 Jul 2005 08:58:33 -0400
Content-Disposition: inline
In-Reply-To: <20050706125500.GA1373@suse.de>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Tejun
Cc: jgarzik@pobox.com, linux-ide@vger.kernel.org

On Wed, Jul 06 2005, Jens Axboe wrote:
> On Mon, Jul 04 2005, Jens Axboe wrote:
> > On Fri, Jul 01 2005, Jens Axboe wrote:
> > > > I converted most of debug messages I've used during development into
> > > > warning messages when posting the patchset and forgot about it, so
> > > > I've never posted the debug patch. Sorry about that. Here's a small
> > > > patch which adds some more messages though. The following patch also
> > > > adds printk'ing FIS on each command issue in ahci.c:ahci_qc_issue(),
> > > > if you think it would fill your log excessively, feel free to turn it
> > > > off. It wouldn't probably matter anyway.
> > >
> > > I will have to kill the issue part of the patch, that would generate
> > > insane amounts of printk traffic :-)
> > >
> > > I'll boot the kernel and report what happens.
> >
> > It triggered last night, but the old kernel was booted. This was the
> > log:
> >
> > ahci ata1: stat=d0, issuing COMRESET
> > ata1: recovering from error
> > ata1: status=0x01 { Error }
> > ata1: error=0x80 { Sector }
> > SCSI error : <0 0 0 0> return code = 0x8000002
> > sda: Current: sense key=0x3
> > ASC=0x11 ASCQ=0x4
> > end_request: I/O error, dev sda, sector 66255899
> > Buffer I/O error on device sda2, logical block 8018923
> > lost page write due to I/O error on sda2
> > ata1: status=0x01 { Error }
> > ata1: error=0x80 { Sector }
> > SCSI error : <0 0 0 0> return code = 0x8000002
> > sda: Current: sense key=0x3
> > ASC=0x11 ASCQ=0x4
> > end_request: I/O error, dev sda, sector 66239043
> > Buffer I/O error on device sda2, logical block 8016816
> > lost page write due to I/O error on sda2
> > ata1: recovering from error
> > ata1: status=0x01 { Error }
> > ata1: error=0x80 { Sector }
> > SCSI error : <0 0 0 0> return code = 0x8000002
> > sda: Current: sense key=0x3
> > ASC=0x11 ASCQ=0x4
> > end_request: I/O error, dev sda, sector 66239051
> > Buffer I/O error on device sda2, logical block 8016817
> > lost page write due to I/O error on sda2
> > ata1: status=0x01 { Error }
> > ata1: error=0x80 { Sector }
> > SCSI error : <0 0 0 0> return code = 0x8000002
> > sda: Current: sense key=0x3
> > ASC=0x11 ASCQ=0x4
> > end_request: I/O error, dev sda, sector 35137043
>
> This is with the extra debug. Given that it is the timeout triggering,
> only the sstatus is new.
>
> ahci ata1: stat=d0, issuing COMRESET
> ata1: started resetting...
> ata1: end resetting, sstatus=00000113
> ata1: recovering from error
> ata1: status=0x01 { Error }
> ata1: error=0x80 { Sector }
> SCSI error : <0 0 0 0> return code = 0x8000002
> sda: Current: sense key=0x3
> ASC=0x11 ASCQ=0x4
> end_request: I/O error, dev sda, sector 66190875
> Buffer I/O error on device sda2, logical block 8010795
> lost page write due to I/O error on sda2
> ata1: status=0x01 { Error }
> ata1: error=0x80 { Sector }
> SCSI error : <0 0 0 0> return code = 0x8000002
> sda: Current: sense key=0x3
> ASC=0x11 ASCQ=0x4
> end_request: I/O error, dev sda, sector 66159699
> Buffer I/O error on device sda2, logical block 8006898
> lost page write due to I/O error on sda2

btw, the reason it hangs here (I suspect) is that your read_log_page()
logic is wrong - not every error condition will have NCQ_FAILED set
before entering ncq_recover. The timeout will not, for instance.

Testing... As usual, this will take days.

-- 
Jens Axboe
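
To illustrate the suspicion about read_log_page() and NCQ_FAILED stated
above, here is a rough sketch of how such a hang could arise. It is not
taken from the patchset: every sketch_* name, the struct layout and the
flag value are made up for illustration, and it assumes that recovery
only reads the log page when the failed flag is already set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the NCQ_FAILED flag; the value is arbitrary. */
#define SKETCH_NCQ_FAILED (1u << 0)

/* Two ways of ending up in recovery (illustrative only). */
enum sketch_err_source {
	SKETCH_ERR_DEVICE,	/* device reported an error */
	SKETCH_ERR_TIMEOUT	/* command timed out, device said nothing */
};

/* Hypothetical per-port state, reduced to what the sketch needs. */
struct sketch_port {
	unsigned int flags;
	bool log_page_read;
};

/* Only the device-error path marks the port as NCQ-failed. */
void sketch_report_error(struct sketch_port *ap, enum sketch_err_source src)
{
	assert(ap != NULL);
	if (src == SKETCH_ERR_DEVICE)
		ap->flags |= SKETCH_NCQ_FAILED;
	/* Timeout path: assumed to leave the flag untouched. */
}

/*
 * Recovery gated on the flag (the assumption of this sketch).
 * Returns true if the log page was read and recovery could proceed,
 * false if recovery was entered without the flag and did nothing.
 */
bool sketch_recover(struct sketch_port *ap)
{
	assert(ap != NULL);
	if (!(ap->flags & SKETCH_NCQ_FAILED))
		return false;	/* timeout case: log page is never read */

	ap->log_page_read = true;	/* stands in for read_log_page() */
	ap->flags &= ~SKETCH_NCQ_FAILED;
	return true;
}
```

With this model, a port that went through sketch_report_error() with
SKETCH_ERR_TIMEOUT reaches sketch_recover() with the flag clear, so the
log page is never read and the outstanding commands are left waiting.
Whether the real code behaves this way is exactly what is being tested.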