From mboxrd@z Thu Jan 1 00:00:00 1970
From: s ponnusa
Subject: Re: Linux kernel - Libata bad block error handling to user mode program
Date: Fri, 5 Mar 2010 17:27:50 -0500
Message-ID:
References: <20100303224245.ae8d1f7a.akpm@linux-foundation.org>
 <87f94c371003040617t4a4fcd0dt1c9fc0f50e6002c4@mail.gmail.com>
 <4B8FC6AC.4060801@teksavvy.com>
 <4B8FF2C3.1060808@teksavvy.com>
 <4B90655B.4000005@gmail.com>
 <20100305120355.6b161572@lxorguk.ukuu.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
Received: from mail-vw0-f46.google.com ([209.85.212.46]:45887 "EHLO
 mail-vw0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1755288Ab0CEW14 convert rfc822-to-8bit (ORCPT );
 Fri, 5 Mar 2010 17:27:56 -0500
In-Reply-To: <20100305120355.6b161572@lxorguk.ukuu.org.uk>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Alan Cox
Cc: Robert Hancock, Mark Lord, Greg Freemyer, Andrew Morton,
 linux-kernel@vger.kernel.org, linux-ide@vger.kernel.org, Jens Axboe,
 linux-mm@kvack.org

Using fsync / fdatasync results in the behavior mentioned below:

> - sectors being corrupted/dying that were not written by near to it
> - writes that the drive thinks were successful and reports that way but
>   turn out not to be readable

The write always passes after a long delay, but it turns out that the
sector is not readable. For my scenario, I need to know when a sector
times out or errors during the write, so that I can move on to the next
sector. Accordingly, I have changed my program to do the following
steps:

  1. Open the drive in O_RDWR mode.
  2. Write a sector.
  3. Reposition the file pointer.
  4. Read the sector.
  5. Verify the read buffer contents against the write buffer contents.

This scenario always passes and does not identify the bad sectors if a
single program does it sequentially (even on an HDD with bad sectors).
But if I write using one independent program (prg A) and verify using
another independent program (prg B), all the writes done by prg A pass
(expected behavior) and the reads from prg B fail on the bad sectors
(again expected behavior), so I am able to detect the bad sectors. Is
there any issue if I perform both operations simultaneously?
(Initially I tried O_DIRECT mode; as it was extremely slow, I reverted
to opening the device in plain O_RDWR mode and used fadvise with the
POSIX_FADV_DONTNEED flag.)

Thanks.

On Fri, Mar 5, 2010 at 7:03 AM, Alan Cox wrote:
>> cannot be read back by any other means. And the program which wrote
>> the data is unaware of the error that has happened at the lower level.
>> But the error log clearly has the issue caught but is trying to handle
>> differently.
>
> This is standard behaviour on pretty much every OS. If each write was
> back verified by the OS you wouldn't get any work done due fact it took
> so long to do any I/O and all I/O was synchronoous.
>
> Where it matters you can mount some file systems synchronous, you can do
> synchronous I/O (O_SYNC) or you can use and check fsync/fdatasync results
> which should give you pretty good coverage providing barriers are enabled.
>
> It still won't catch a lot of cases because you sometimes see
>
> - sectors being corrupted/dying that were not written by near to it
> - writes that the drive thinks were successful and reports that way but
>   turn out not to be readable
>
> Alan
>
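
For reference, the write-then-verify steps described in the message can
be sketched roughly as below. This is a minimal Python sketch under
stated assumptions, not the poster's program: the function name
write_and_verify, the 512-byte SECTOR size, and the returned status
strings are all hypothetical. It checks the fsync result so that
write-back errors are seen, then asks the kernel to drop the cached
pages before reading back. POSIX_FADV_DONTNEED is only a hint, and the
drive's own cache may still satisfy the read, so a passing verify here
is weaker evidence than a read done with O_DIRECT.

```python
import os

SECTOR = 512  # assumed logical sector size; real devices may differ


def write_and_verify(fd, sector_no, data):
    """Write one sector, flush it, drop cached pages, read back, compare.

    Returns one of the (hypothetical) status strings:
    "ok", "write_error", "read_error", "mismatch".
    """
    offset = sector_no * SECTOR
    try:
        if os.pwrite(fd, data, offset) != len(data):
            return "write_error"  # short write
        # Write-back errors such as EIO are reported here, not by pwrite.
        os.fsync(fd)
    except OSError:
        return "write_error"

    # Ask the kernel to discard the now-clean cached pages so the read
    # is less likely to be served from the page cache. Advisory only.
    if hasattr(os, "posix_fadvise"):
        try:
            os.posix_fadvise(fd, offset, SECTOR, os.POSIX_FADV_DONTNEED)
        except OSError:
            pass  # the hint failed; the read below may hit the cache

    try:
        readback = os.pread(fd, SECTOR, offset)
    except OSError:
        return "read_error"
    return "ok" if readback == data else "mismatch"
```

Called as write_and_verify(fd, n, buf) on a descriptor obtained from
os.open(path, os.O_RDWR), it returns one status per sector, so the
caller can log a failure and move on to the next sector.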