From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Garzik
Subject: Re: Flexible SFF interrupt handling
Date: Wed, 28 Nov 2007 11:09:38 -0500
Message-ID: <474D92C2.1080804@garzik.org>
References: <474D70E0.4060709@garzik.org> <20071128142947.17221a33@the-village.bc.nu>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from srv5.dvmed.net ([207.36.208.214]:44044 "EHLO mail.dvmed.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752069AbXK1QJk (ORCPT ); Wed, 28 Nov 2007 11:09:40 -0500
In-Reply-To: <20071128142947.17221a33@the-village.bc.nu>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Alan Cox
Cc: IDE/ATA development list

Alan Cox wrote:
>> In general, I think we should adopt a flexible or "loose" model for
>> acking interrupts on SFF controllers.
>
> Agreed - especially as the IRQ is often essentially the drive output not
> under any kind of sane control of ours.

Good point (I had not thought of looking at it that way).

>> (a) whenever we are in bus-idle (qc == NULL), and get an interrupt, go
>> ahead and read Status.
>
> Please call into the driver. Quite a few PATA drivers have multiple IRQ
> sources, and SATA many.

Done :) This should simply be a new behavior coded into the existing
interrupt handlers. Thus you can choose per-driver whether to do this
or not.

>> (b) if we are expecting an interrupt, and receive one, check Status (or
>> AltStatus if DMAing).
>
> Providing we are not mid data transfer (which is why we need to get into
> enable/disable_irq for some controllers). Right now its a problem that
> can't occur but on some controllers reading status mid PIO xfer causes
> joyous things like silent corruption.

True..

>> (c) if condition "(b)" indicates busy, initiate status polling every
>> 250ms until timeout occurs or BSY clears.
>
> Yep.
>
>> (d) if N seconds (4?) elapses without an interrupt, initiate polling.
>> keep a history of such "fail-over" events, and note each fail-over'd
>> command's eventual success via polling, success via interrupt, or
>> timeout. Use that history to decide to switch to 100% polling mode
>> (i.e. reach conclusion that interrupt delivery is broken, via observation)
>
> N = 8 sounds good to me (7 being the normal maximum command timeout)
>
>> That should cover no-interrupts, lost interrupts, early interrupts,
>> screaming interrupts, insane devices, and of course normal operation.
>
> Should we also consider resetting the device as one of the strategies (at
> least once off)
>
> Might also want to think at that point about the case of
>
> 	command
> 	....
> 	timeout
>
> where old IDE checks with the controller to spot lost IRQ cases where a
> command finished and stuff vanished. Old IDE doesn't do much with it but
> we could use that as a good hint that we want to switch to polling mode
> and tell the user their computer sucks.

That's basically where I wanted to go with "(d)". Being able to both
handle interrupts _and_ fall back to polling makes it easy to notice
when interrupts are getting lost. If more than a couple rescues of this
nature occur, do as you describe.

	Jeff
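
For illustration only, the bookkeeping behind "(d)" might look roughly
like the sketch below. Nothing here is existing libata code: the struct,
enum, function names and the MAX_POLL_RESCUES threshold are all made up
for the example. N = 8 is taken from the suggestion above, and "more
than a couple" is assumed to mean more than two rescues via polling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none of this is libata API. */

#define FAILOVER_SECS    8  /* N: start polling after this long with no IRQ */
#define MAX_POLL_RESCUES 2  /* assumed meaning of "a couple" */

/* How a command that failed over to polling eventually ended. */
enum rescue_outcome {
	RESCUE_BY_POLL,  /* polling found the command complete, no IRQ seen */
	RESCUE_BY_IRQ,   /* the interrupt showed up late, after fail-over */
	RESCUE_TIMEOUT   /* neither happened; the command timed out */
};

/* Per-port history of fail-over events. */
struct irq_history {
	unsigned int poll_rescues;
	unsigned int irq_rescues;
	unsigned int timeouts;
	bool polling_only;  /* set once IRQ delivery is judged broken */
};

static void irq_history_init(struct irq_history *h)
{
	assert(h != NULL);
	h->poll_rescues = 0;
	h->irq_rescues = 0;
	h->timeouts = 0;
	h->polling_only = false;
}

/* True when a pending command has waited long enough to fail over. */
static bool should_start_polling(unsigned int secs_without_irq)
{
	return secs_without_irq >= FAILOVER_SECS;
}

/*
 * Record how a fail-over'd command ended.  Only rescues via polling
 * count as evidence of lost interrupts; once more than
 * MAX_POLL_RESCUES have been seen, switch to 100% polling mode.
 */
static void irq_history_record(struct irq_history *h, enum rescue_outcome o)
{
	assert(h != NULL);
	switch (o) {
	case RESCUE_BY_POLL:
		h->poll_rescues++;
		break;
	case RESCUE_BY_IRQ:
		h->irq_rescues++;
		break;
	case RESCUE_TIMEOUT:
		h->timeouts++;
		break;
	}
	if (h->poll_rescues > MAX_POLL_RESCUES)
		h->polling_only = true;
}
```

With these assumed values, the third rescue via polling flips the port
to polling-only mode; late interrupts and plain timeouts are counted but
do not trigger the switch on their own. Whether timeouts should also
count (or trigger a reset first) is left open, as in the thread.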