From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1Jycz8-0003fu-MX
	for qemu-devel@nongnu.org; Tue, 20 May 2008 21:19:18 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1Jycz7-0003dP-LE
	for qemu-devel@nongnu.org; Tue, 20 May 2008 21:19:17 -0400
Received: from [199.232.76.173] (port=33979 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Jycz7-0003d0-FF
	for qemu-devel@nongnu.org; Tue, 20 May 2008 21:19:17 -0400
Received: from mail2.shareable.org ([80.68.89.115]:49787)
	by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32)
	(Exim 4.60) (envelope-from <jamie@shareable.org>) id 1Jycz7-0000Ar-18
	for qemu-devel@nongnu.org; Tue, 20 May 2008 21:19:17 -0400
Date: Wed, 21 May 2008 02:19:15 +0100
From: Jamie Lokier <jamie@shareable.org>
Subject: Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off
	(O_DIRECT)
Message-ID: <20080521011915.GC595@shareable.org>
References: <1211283126.4314.70.camel@frecb07144>
	<48332AB9.3010707@codemonkey.ws>
	<20080520223602.GE27853@shareable.org>
	<48337444.2070203@codemonkey.ws>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <48337444.2070203@codemonkey.ws>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org
Cc: Blue Swirl <blauwirbel@gmail.com>, Laurent Vivier <Laurent.Vivier@bull.net>, Kevin Wolf <kwolf@suse.de>

Anthony Liguori wrote:
> >One property of disks is that if you overwrite a sector and there's
> >power loss, when read later that sector might be corrupt.  Even if the
> >new data is the same as the old data with only some bytes changed,
> >some of the _unchanged_ bytes may be corrupted by this.
> 
> I don't think this is true.  What evidence do you have to support such 
> claims?

What do you imagine happens when you pull the power in the middle of
writing a sector to a floppy disk (to pick a more easily imagined
example)?

There is not enough residual power to write the rest of the sector.
The sector's stored checksum will therefore not match its data, and
reading it back will (hopefully) give a CRC error.  The sector can be
written over again, which clears the CRC error.

No sector which wasn't being written will be corrupt: the write head
isn't activated over those.  The drive waits until it senses the start
of sector N, then activates the write head to write data bits.
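
To make the floppy picture concrete, here is a toy model in Python.  It
is only a sketch under my own assumptions: a 512-byte payload, a CRC32
standing in for the on-disk CRC, and hypothetical helper names
(make_sector, sector_ok, torn_write).  It does not model any real
controller, only the idea that a truncated write leaves one sector
whose stored CRC no longer matches.

```python
import zlib

SECTOR_SIZE = 512  # assumed payload size; real formats vary


def make_sector(payload: bytes) -> bytes:
    """Return the payload followed by a 4-byte CRC32 (stand-in for the disk's CRC)."""
    assert len(payload) == SECTOR_SIZE
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def sector_ok(raw: bytes) -> bool:
    """Check whether the stored CRC matches the stored payload."""
    payload, stored = raw[:SECTOR_SIZE], raw[SECTOR_SIZE:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored


def torn_write(disk: list, n: int, new_payload: bytes, bytes_written: int) -> None:
    """Overwrite sector n, but lose power after bytes_written raw bytes.

    Only disk[n] is touched: the write head is never active over neighbours.
    """
    new_raw = make_sector(new_payload)
    old_raw = disk[n]
    disk[n] = new_raw[:bytes_written] + old_raw[bytes_written:]


# Three sectors with distinct contents, then a write to sector 1 cut short.
disk = [make_sector(bytes([i]) * SECTOR_SIZE) for i in range(3)]
torn_write(disk, 1, b"\xff" * SECTOR_SIZE, bytes_written=100)
status = [sector_ok(s) for s in disk]
print(status)
```

In this model sector 1 fails its check while sectors 0 and 2 still
pass, which is the guarantee described above.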

The CRC error by itself may cause the whole sector to be reported as
corrupt with no data.  However, if you do manage to get back the bits
from the media, some bits of the sector being written whose values
were not intended to change may be different than expected.  This is
because the way data is recorded does not encode each bit separately,
but multiplexes them together for modulation, and also because bit
timing is not exact.

A modern hard disk uses much more complex data encoding, which further
adds to the effect of a truncated write corrupting even data bits not
intended to be changed, in the vicinity of those being changed.

But it should aim to provide the same basic guarantee that writing a
sector cannot corrupt neighbouring sectors on power failure, only the
one(s) being written.  This is because the robustness of journalling
filesystems and databases does rather depend on this property, and
simple old-fashioned disks do provide it.
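
As a rough illustration of why a journal cares, here is a sketch of
recovery after a crash.  Everything in it is assumed for the example:
one record per 512-byte sector, a hypothetical header of sequence
number plus CRC32, and a replay loop that stops at the first bad
record.  Real journalling filesystems and databases differ in detail.

```python
import struct
import zlib

SECTOR_SIZE = 512  # assumed sector size
HEADER = struct.Struct(">II")  # hypothetical layout: sequence number, CRC32 of body
BODY_SIZE = SECTOR_SIZE - HEADER.size


def make_record(seq: int, body: bytes) -> bytes:
    """Build one sector-sized journal record."""
    body = body.ljust(BODY_SIZE, b"\0")
    return HEADER.pack(seq, zlib.crc32(body)) + body


def replay(journal: list) -> list:
    """Return the bodies of valid records, stopping at the first bad one."""
    good = []
    for expected_seq, raw in enumerate(journal):
        seq, crc = HEADER.unpack_from(raw)
        body = raw[HEADER.size:]
        if seq != expected_seq or zlib.crc32(body) != crc:
            break
        good.append(body.rstrip(b"\0"))
    return good


journal = [make_record(i, b"txn %d" % i) for i in range(3)]
# Power loss while record 2 was being written: only that sector is
# damaged; its tail still holds stale bytes (0xaa here) from before.
journal[2] = journal[2][:200] + b"\xaa" * (SECTOR_SIZE - 200)
recovered = replay(journal)
print(recovered)
```

Replay only works here because records 0 and 1 sit in sectors that
were not being written at the time.  If a torn write could damage
neighbouring sectors too, already-committed records could be lost.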

I am just speculating; I don't know whether modern hard disks provide
this property, or under what circumstances they fail.  But it seems
they could provide it, because they still have physically independent
sectors.

(Interestingly, the journal block size used by Oracle differs between
OSes, suggesting the "basic unit of corruption" varies between OSes
and is not always a single sector.)

Although it's just speculation, do you think modern hard disks behave
differently from this?

-- Jamie