From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Lord
Subject: Re: [PATCHSET #upstream] libata: improve FLUSH error handling
Date: Thu, 27 Mar 2008 10:24:43 -0400
Message-ID: <47EBAE2B.8070102@rtr.ca>
References: <12066128663306-git-send-email-htejun@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Return-path: 
Received: from rtr.ca ([76.10.145.34]:4203 "EHLO mail.rtr.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755752AbYC0OYp (ORCPT ); Thu, 27 Mar 2008 10:24:45 -0400
In-Reply-To: <12066128663306-git-send-email-htejun@gmail.com>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Tejun Heo
Cc: jeff@garzik.org, linux-ide@vger.kernel.org, alan@lxorguk.ukuu.org.uk

Tejun Heo wrote:
>
> As the code is being smart against retrying needlessly, it won't be
> too dangerous to increase the 20 tries (taken from Alan's patch) but I
> think it's as good as any other random number. If anyone knows any
> meaningful number, please chime in. The same goes for 60 secs timeout
> too.
..
I really think that we should enforce a strict upper limit on the time
that can be spent inside the flush-cache near-infinite loop being
introduced. Some things rely on I/O completing or failing in a
time-deterministic manner.

Really, the entire flush + retries etc.. should never, ever, be
permitted to take more than XX seconds total. Not 60 seconds per retry,
but XX seconds total for the original command plus however many retries
we can fit in there.

As for the value of XX, well.. make it a sysfs attribute, with a
default of something "sensible".

The time bound really depends upon how quickly the drive can empty its
onboard cache, or how large a cache it has. Figure the biggest drives
will have no more than, say, 64MB of cache for many years (the biggest
SATA drive now uses 16MB). Assuming a near-worst-case I/O size of 4KB,
that's 16384 I/O operations, if none are adjacent on disk.
What's the average access time these days? Say.. 20ms worst case for
any drive with a cache that huge? That's unrealistically slow for data
that's already in the drive cache, but.. 16384 * 0.020 seconds = 328
seconds.

Absolute theoretical worst case for a drive with a buffer 4X the
largest current size: 328 seconds. That's not taking into account
bad-sector retries for each of those I/O blocks, but *nobody* is going
to wait that long anyway. They'll have long since pulled the power cord
or reached for the BIG RED BUTTON.

On a 16MB cache drive, that number would be 328 / 4 = 82 seconds.
That's what I'd put for the limit. But we could be slightly nonsensical
and agree upon 120 seconds. :)

Cheers
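For reference, the back-of-envelope arithmetic above can be sketched as
follows. The cache sizes, 4KB worst-case I/O size, and 20ms access time
are the assumptions from this mail, not measured values, and the
function name is just for illustration:

```python
# Worst-case time for a drive to empty a full write cache to disk,
# per the reasoning above: every cached block is a separate 4KB write,
# none adjacent on disk, each costing one full access.

IO_SIZE = 4 * 1024     # bytes per worst-case write (assumed)
ACCESS_TIME = 0.020    # seconds per non-adjacent write (assumed)

def worst_case_flush_seconds(cache_bytes):
    """Upper bound on time to drain a completely full write cache."""
    io_count = cache_bytes // IO_SIZE
    return io_count * ACCESS_TIME

# 64MB cache (4X today's largest): 16384 I/Os -> ~328 seconds
print(worst_case_flush_seconds(64 * 1024 * 1024))

# 16MB cache (largest current SATA drive): 4096 I/Os -> ~82 seconds
print(worst_case_flush_seconds(16 * 1024 * 1024))
```

This ignores bad-sector retries, as noted above, so it is a bound on
the "well-behaved" worst case only.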