From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeff Garzik
Subject: Re: [PATCH #upstream 2/2] libata: implement spurious irq handling for SFF and apply it to piix
Date: Thu, 21 Jan 2010 11:52:52 -0500
Message-ID: <4B588664.4040602@garzik.org>
References: <4B550EF8.1000009@kernel.org> <4B550F9F.80503@kernel.org> <4B575A84.3030005@garzik.org> <4B579582.4050806@kernel.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail-gx0-f217.google.com ([209.85.217.217]:55757 "EHLO mail-gx0-f217.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753127Ab0AUQwy (ORCPT ); Thu, 21 Jan 2010 11:52:54 -0500
Received: by gxk9 with SMTP id 9so176719gxk.8 for ; Thu, 21 Jan 2010 08:52:53 -0800 (PST)
In-Reply-To: <4B579582.4050806@kernel.org>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Tejun Heo
Cc: "linux-ide@vger.kernel.org" , Alan Cox , Hans Werner , Sergei Shtylyov

On 01/20/2010 06:45 PM, Tejun Heo wrote:
> Hello,
>
> On 01/21/2010 04:33 AM, Jeff Garzik wrote:
>> Overall, as long as the drive is in Bus-Idle mode, it should be safe to
>> go ahead and read Status, for pretty much every controller and drive.
>
> Hmmm... I was a bit worried about the case Alan mentioned several
> times where access to AltStatus while a data transfer is going on can
> lead to silent data corruption.

If a drive is in Bus-Idle, as I mentioned, then there is no active data
transfer.

>> I would make an exception only for the new SATA FIS-based controllers,
>> where we know that hitting Status is likely both pointless and wasteful,
>> as well as being superfluous because the newer FIS-based controllers all
>> have irq status registers.
>
> FIS-based ones need their own interrupt handlers anyway so,
> fortunately, things like an irq_check callback aren't necessary to begin
> with. :-)

Yep.
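For illustration, a rough C sketch of the gating described above: only touch Status to ack an unexpected interrupt on SFF controllers, and only while the port is bus-idle; FIS-based controllers are skipped because they have their own irq status registers. Everything here (struct toy_port, the enums, may_read_status(), toy_handle_unexpected_irq()) is a hypothetical toy model, not libata API. It relies on the standard ATA behaviour that reading Status, unlike AltStatus, clears a pending INTRQ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical controller classes, for illustration only. */
enum ctrl_kind {
	CTRL_SFF,        /* taskfile-style: Status is the only way to ask the device */
	CTRL_FIS_BASED,  /* has its own irq status registers and irq handler */
};

/* Hypothetical summary of what the port is doing when the irq fires. */
enum port_phase {
	PHASE_BUS_IDLE,  /* no command in flight, no data moving */
	PHASE_DATA_XFER, /* PIO or DMA data transfer in progress */
};

/* Toy port model; not a libata structure. */
struct toy_port {
	enum ctrl_kind kind;
	enum port_phase phase;
	uint8_t status;        /* value the Status register would return */
	bool intrq;            /* device is asserting its interrupt line */
	unsigned status_reads; /* how many times Status was touched */
};

/* On ATA devices, reading Status (unlike AltStatus) clears a pending INTRQ. */
static uint8_t toy_read_status(struct toy_port *p)
{
	p->status_reads++;
	p->intrq = false;
	return p->status;
}

/* Only poke Status on SFF controllers, and only while the bus is idle. */
static bool may_read_status(const struct toy_port *p)
{
	if (p->kind == CTRL_FIS_BASED)
		return false; /* pointless: the controller has irq status regs */
	return p->phase == PHASE_BUS_IDLE; /* never during an active transfer */
}

/* Returns true if an unexpected irq was claimed by acking it via Status. */
static bool toy_handle_unexpected_irq(struct toy_port *p)
{
	if (!may_read_status(p))
		return false; /* leave it alone; let the irq core account for it */
	(void)toy_read_status(p);
	return true;
}
```

The point of splitting out may_read_status() is that the "is it safe?" policy stays in one place, so a data transfer in progress can never trigger a Status access from the spurious-irq path.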
>> Additionally, I think we should have a "fast-timeout" and
>> "slow-timeout", whereby we check Status after a short period (5
>> seconds?) to make sure we did not lose an interrupt. If Status is !BSY,
>> then we can proceed with handling qc success/failure immediately.
>
> Does this happen often? What I find more common is just plain
> timeouts, so I think it would improve our exception latency if we
> apply different timeouts for each trial, i.e. for the first RW try,
> set the timeout to 7 secs, for the second 15, and then 30. This
> wouldn't harm correctness while allowing libata to react much
> faster to transient failures.

Lost interrupts do not happen often, but they do happen. Google finds
plenty of examples.

> Another thing I can think of which can improve our robustness is
> dynamic irqpoll support, such that when a screaming IRQ happens, the IRQ
> subsystem not only shuts down the IRQ line but also begins selectively
> irqpolling it.

Does this ever happen while a data transfer is active? AFAIK this happens
during probe, reset, set-xfer, bus-idle or some other auxiliary
moment in time.

Jeff
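To make the two timeout ideas in this thread concrete, a rough C sketch. The 7/15/30 second table follows Tejun's per-try proposal; the fast-timeout check follows Jeff's suggestion of reading Status after a short period and, if !BSY, completing the command right away, splitting success from failure on the ERR bit. rw_timeout_for_try(), on_fast_timeout() and the enum names are hypothetical, not libata API. The sketch assumes a DMA or non-data command, so DRQ is deliberately ignored.

```c
#include <stdint.h>

#define ATA_STATUS_BSY 0x80u /* standard ATA Status bit: device busy */
#define ATA_STATUS_ERR 0x01u /* standard ATA Status bit: error */

/* Escalating per-try RW timeouts from the thread: 7 s, 15 s, 30 s.
 * Tries past the end of the table keep the last (longest) value. */
static const unsigned rw_timeouts_ms[] = { 7000, 15000, 30000 };

static unsigned rw_timeout_for_try(unsigned try_idx)
{
	unsigned n = sizeof(rw_timeouts_ms) / sizeof(rw_timeouts_ms[0]);

	return rw_timeouts_ms[try_idx < n ? try_idx : n - 1];
}

/* What to do when the short "fast-timeout" (e.g. 5 s) fires. */
enum fast_timeout_action {
	KEEP_WAITING,        /* still BSY: wait for the slow timeout */
	COMPLETE_OK,         /* !BSY, no ERR: the interrupt was probably lost */
	COMPLETE_WITH_ERROR, /* !BSY with ERR: hand off to error handling */
};

static enum fast_timeout_action on_fast_timeout(uint8_t status)
{
	if (status & ATA_STATUS_BSY)
		return KEEP_WAITING;
	return (status & ATA_STATUS_ERR) ? COMPLETE_WITH_ERROR : COMPLETE_OK;
}
```

Capping at the last table entry means the third and any later tries all get the full 30 seconds. Per the Bus-Idle discussion above, the fast-timeout Status read is only reasonable when no data transfer is in progress.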