From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1JydpA-0004aN-3m
	for qemu-devel@nongnu.org; Tue, 20 May 2008 22:13:04 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1Jydp9-0004Zx-NR
	for qemu-devel@nongnu.org; Tue, 20 May 2008 22:13:03 -0400
Received: from [199.232.76.173] (port=56629 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Jydp9-0004Zu-Fm
	for qemu-devel@nongnu.org; Tue, 20 May 2008 22:13:03 -0400
Received: from yw-out-1718.google.com ([74.125.46.157]:58580)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	id 1Jydp8-000245-SM
	for qemu-devel@nongnu.org; Tue, 20 May 2008 22:13:03 -0400
Received: by yw-out-1718.google.com with SMTP id 6so1624938ywa.82;
	Tue, 20 May 2008 19:12:57 -0700 (PDT)
Message-ID: <48338522.7030306@codemonkey.ws>
Date: Tue, 20 May 2008 21:12:50 -0500
From: Anthony Liguori
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off (O_DIRECT)
References: <1211283126.4314.70.camel@frecb07144>
	<48332AB9.3010707@codemonkey.ws>
	<20080520223602.GE27853@shareable.org>
	<48337444.2070203@codemonkey.ws>
	<20080521011915.GC595@shareable.org>
In-Reply-To: <20080521011915.GC595@shareable.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org
Cc: Blue Swirl, Laurent Vivier, Kevin Wolf

Jamie Lokier wrote:
> Anthony Liguori wrote:
>
>>> One property of disks is that if you overwrite a sector and there's
>>> power loss, when read later that sector might be corrupt. Even if the
>>> new data is the same as the old data with only some bytes changed,
>>> some of the _unchanged_ bytes may be corrupted by this.
>>>
>> I don't think this is true. What evidence do you have to support such
>> claims?
>>
>
> What do you imagine happens when you pull the power in the middle of
> writing a sector to a floppy disk (to pick a more easily imagined
> example)?
>
> There is not enough residual power to write the rest of the sector.
> That sector's checksum will therefore be corrupt, and (hopefully) have
> a CRC read error. It can be written over again, wiping the CRC error.
>

Why would the sector's checksum be corrupt? The checksum wouldn't change
after the data write.

> No sector which wasn't being written will be corrupt: the write head
> isn't activated over those. The drive waits until it senses the start
> of sector N, then activates the write head to write data bits.
>
> The CRC error by itself may cause the whole sector to be reported as
> corrupt with no data. However, if you do manage to get back the bits
> from the media, some bits of the sector being written whose values
> were not intended to change may be different than expected. This is
> because the way data is recorded does not encode each bit separately,
> but multiplexes them together for modulation, and also because bit
> timing is not exact.
>
> A modern hard disk uses much more complex data encoding, which further
> adds to the effect of a truncated write corrupting even data bits not
> intended to be changed, in the vicinity of those being changed.
>
> But it should aim to provide the same basic guarantee that writing a
> sector cannot corrupt neighbouring sectors on power failure, only the
> one(s) being written. This is because robustness of journalling
> filesystems and databases does rather depend on this property, and
> simple old-fashioned disks do provide it.
>
> I am just speculating; I don't know whether modern hard disks provide
> this property, or under what circumstances they fail. But it seems
> they could provide it, because they still have physically independent
> sectors.
>
> (Interestingly, the journal block size used by Oracle on different
> OSes is different, suggesting the "basic unit of corruption"
> varies between OSes and is not always a single sector).
>
> Although it's just speculation, do you think modern hard disks behave
> differently from this?
>

Modern *enterprise* hard disks have battery backed caches so read/write
operations always complete or fail. Low-end disks don't tend to have
battery backed caches but AFAIK, rewriting the same data will not result
in any sort of disk corruption.

Regards,

Anthony Liguori

> -- Jamie
>
>
>
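
The torn-write argument in this thread can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions, not a description of how any real drive behaves: `ToyDisk`, `SECTOR_SIZE`, and `power_fails_after` are hypothetical names invented for the example, each sector is assumed to carry a CRC32 stored after its data, and the modulation and bit-timing effects Jamie describes are ignored entirely.

```python
import zlib

SECTOR_SIZE = 512  # assumed sector size for this toy model


class ToyDisk:
    """Hypothetical in-memory disk: each sector stores data plus a CRC32."""

    def __init__(self, num_sectors):
        blank = bytes(SECTOR_SIZE)
        self.data = [bytearray(blank) for _ in range(num_sectors)]
        self.crc = [zlib.crc32(blank) for _ in range(num_sectors)]

    def write(self, n, payload, power_fails_after=None):
        """Write payload to sector n; optionally cut power after some bytes."""
        if len(payload) != SECTOR_SIZE:
            raise ValueError("payload must be exactly one sector")
        if power_fails_after is None:
            # Complete write: data and trailing checksum both reach the media.
            self.data[n][:] = payload
            self.crc[n] = zlib.crc32(payload)
        else:
            # Torn write: only a prefix of the new data reaches the media,
            # and the stored checksum field is never updated.
            self.data[n][:power_fails_after] = payload[:power_fails_after]

    def read(self, n):
        """Return the sector contents, or raise on a checksum mismatch."""
        if zlib.crc32(bytes(self.data[n])) != self.crc[n]:
            raise IOError("CRC error in sector %d" % n)
        return bytes(self.data[n])
```

In this model, a torn write of new data leaves the old stored checksum in place, so the sector fails its CRC check on the next read until it is rewritten in full, while neighbouring sectors are untouched. A torn rewrite of identical data leaves the bytes unchanged and still passes the check. Whether real media behave this way depends on the encoding effects the model leaves out.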