From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1JydpA-0004aN-3m
	for qemu-devel@nongnu.org; Tue, 20 May 2008 22:13:04 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1Jydp9-0004Zx-NR
	for qemu-devel@nongnu.org; Tue, 20 May 2008 22:13:03 -0400
Received: from [199.232.76.173] (port=56629 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Jydp9-0004Zu-Fm
	for qemu-devel@nongnu.org; Tue, 20 May 2008 22:13:03 -0400
Received: from yw-out-1718.google.com ([74.125.46.157]:58580)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <anthony@codemonkey.ws>) id 1Jydp8-000245-SM
	for qemu-devel@nongnu.org; Tue, 20 May 2008 22:13:03 -0400
Received: by yw-out-1718.google.com with SMTP id 6so1624938ywa.82
	for <qemu-devel@nongnu.org>; Tue, 20 May 2008 19:12:57 -0700 (PDT)
Message-ID: <48338522.7030306@codemonkey.ws>
Date: Tue, 20 May 2008 21:12:50 -0500
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off
	(O_DIRECT)
References: <1211283126.4314.70.camel@frecb07144>	<48332AB9.3010707@codemonkey.ws>	<20080520223602.GE27853@shareable.org>	<48337444.2070203@codemonkey.ws>
	<20080521011915.GC595@shareable.org>
In-Reply-To: <20080521011915.GC595@shareable.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org
Cc: Blue Swirl <blauwirbel@gmail.com>, Laurent Vivier <Laurent.Vivier@bull.net>, Kevin Wolf <kwolf@suse.de>

Jamie Lokier wrote:
> Anthony Liguori wrote:
>   
>>> One property of disks is that if you overwrite a sector and there's a
>>> power loss, when read later that sector might be corrupt.  Even if the
>>> new data is the same as the old data with only some bytes changed,
>>> some of the _unchanged_ bytes may be corrupted by this.
>>>       
>> I don't think this is true.  What evidence do you have to support such 
>> claims?
>>     
>
> What do you imagine happens when you pull the power in the middle of
> writing a sector to a floppy disk (to pick a more easily imagined
> example)?
>
> There is not enough residual power to write the rest of the sector.
> That sector's checksum will therefore be corrupt, and (hopefully) have
> a CRC read error.  It can be written over again, wiping the CRC error.
>   

Why would the sector's checksum be corrupt?  The checksum wouldn't 
change after the data write.
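
To make the two positions concrete, here is a minimal byte-level sketch of a
torn write. It is a toy model, not how any real drive works: the 512-byte
sector size, the trailing CRC32, and the helper names (make_sector,
torn_write, crc_ok) are all assumptions made for illustration, and it ignores
the modulation and bit-timing effects Jamie describes.

```python
import zlib

SECTOR_SIZE = 512  # assumed sector size, for illustration only


def make_sector(data: bytes) -> bytes:
    """Return a toy on-media image: the payload followed by a 4-byte CRC32."""
    assert len(data) == SECTOR_SIZE
    return data + zlib.crc32(data).to_bytes(4, "big")


def torn_write(old_image: bytes, new_data: bytes, bytes_written: int) -> bytes:
    """Model power loss after only `bytes_written` bytes of the new image
    reached the media; the remainder still holds the old image."""
    new_image = make_sector(new_data)
    return new_image[:bytes_written] + old_image[bytes_written:]


def crc_ok(image: bytes) -> bool:
    """Check the stored CRC32 against the payload actually on the media."""
    data, stored = image[:SECTOR_SIZE], image[SECTOR_SIZE:]
    return zlib.crc32(data).to_bytes(4, "big") == stored


old = make_sector(bytes(SECTOR_SIZE))          # sector of all zeros
new_data = b"\xff" + bytes(SECTOR_SIZE - 1)    # only the first byte changes

# Power fails after 100 bytes: the changed byte landed, the new CRC did not.
torn_changed = torn_write(old, new_data, bytes_written=100)

# Same interruption, but the rewrite carries identical data.
torn_same = torn_write(old, bytes(SECTOR_SIZE), bytes_written=100)

print(crc_ok(torn_changed))  # False: new payload paired with the old CRC
print(crc_ok(torn_same))     # True: nothing differs at the byte level
```

In this model the checksum mismatch comes from the new payload being paired
with the stale checksum, and a rewrite of identical data is harmless. Whether
that second result holds on real media is exactly the point in dispute, since
the model treats every byte as independently recorded.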

> No sector which wasn't being written will be corrupt: the write head
> isn't activated over those.  The drive waits until it senses the start
> of sector N, then activates the write head to write data bits.
>
> The CRC error by itself may cause the whole sector to be reported as
> corrupt with no data.  However, if you do manage to get back the bits
> from the media, some bits of the sector being written whose values
> were not intended to change may be different than expected.  This is
> because the way data is recorded does not encode each bit separately,
> but multiplexes them together for modulation, and also because bit
> timing is not exact.
>
> A modern hard disk uses much more complex data encoding, which further
> adds to the effect of a truncated write corrupting even data bits not
> intended to be changed, in the vicinity of those being changed.
>
> But it should aim to provide the same basic guarantee that writing a
> sector cannot corrupt neighbouring sectors on power failure, only the
> one(s) being written.  This is because the robustness of journalling
> filesystems and databases does rather depend on this property, and
> simple old-fashioned disks do provide it.
>
> I am just speculating; I don't know whether modern hard disks provide
> this property, or under what circumstances they fail.  But it seems
> they could provide it, because they still have physically independent
> sectors.
>
> (Interestingly, the journal block size used by Oracle on different
> OSes is different, suggesting the "basic unit of corruption"
> varies between OSes and is not always a single sector).
>
> Although it's just speculation, do you think modern hard disks behave
> differently from this?
>   

Modern *enterprise* hard disks have battery-backed caches, so read/write
operations always either complete or fail.  Low-end disks don't tend to have
battery-backed caches but, AFAIK, rewriting the same data will not result
in any sort of disk corruption.
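
For context on why "rewriting the same data" comes up at all: an aligned
write of a small payload has to read the enclosing sectors, patch them, and
write them back, so the unchanged bytes in those sectors get rewritten too.
The following is only a sketch of that read-modify-write pattern, not the
code from the patch under discussion. The aligned_write helper and the
512-byte alignment are assumptions, and it uses an in-memory file rather
than O_DIRECT so that it runs anywhere.

```python
import io

SECTOR_SIZE = 512  # assumed alignment unit, for illustration only


def aligned_write(f, offset: int, payload: bytes,
                  sector: int = SECTOR_SIZE) -> tuple[int, int]:
    """Write `payload` at `offset` using only sector-aligned I/O.

    Returns (start, length) of the aligned region that was rewritten.
    """
    # Round the start down and the end up to sector boundaries.
    start = offset - offset % sector
    end = -(-(offset + len(payload)) // sector) * sector

    # Read the enclosing sectors, zero-padding if the file is shorter.
    f.seek(start)
    buf = bytearray(f.read(end - start))
    buf.extend(bytes(end - start - len(buf)))

    # Patch only the requested bytes, then write whole sectors back.
    rel = offset - start
    buf[rel:rel + len(payload)] = payload
    f.seek(start)
    f.write(buf)
    return start, end - start


disk = io.BytesIO(bytes(range(256)) * 8)   # 2048 bytes, i.e. 4 sectors
before = disk.getvalue()
start, length = aligned_write(disk, 700, b"ABC")
after = disk.getvalue()

print(start, length)  # 512 512: one full sector rewritten for 3 bytes
```

Here 509 of the 512 bytes written are identical to what was already on the
"disk", which is the case the question about unchanged bytes is concerned
with.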

Regards,

Anthony Liguori


> -- Jamie
>
>
>