From: Anthony Liguori <anthony@codemonkey.ws>
To: qemu-devel@nongnu.org
Cc: Blue Swirl <blauwirbel@gmail.com>,
Laurent Vivier <Laurent.Vivier@bull.net>,
Kevin Wolf <kwolf@suse.de>
Subject: Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off (O_DIRECT)
Date: Tue, 20 May 2008 21:12:50 -0500
Message-ID: <48338522.7030306@codemonkey.ws>
In-Reply-To: <20080521011915.GC595@shareable.org>
Jamie Lokier wrote:
> Anthony Liguori wrote:
>
>>> One property of disks is that if you overwrite a sector and there's
>>> a power loss, when read later that sector might be corrupt. Even if the
>>> new data is the same as the old data with only some bytes changed,
>>> some of the _unchanged_ bytes may be corrupted by this.
>>>
>> I don't think this is true. What evidence do you have to support such
>> claims?
>>
>
> What do you imagine happens when you pull the power in the middle of
> writing a sector to a floppy disk (to pick a more easily imagined
> example)?
>
> There is not enough residual power to write the rest of the sector.
> That sector's checksum will therefore be corrupt, and (hopefully) the
> sector will give a CRC read error. It can be written over again, wiping
> the CRC error.
>
Why would the sector's checksum be corrupt? The checksum wouldn't
change after the data write.
> No sector which wasn't being written will be corrupt: the write head
> isn't activated over those. The drive waits until it senses the start
> of sector N, then activates the write head to write data bits.
>
> The CRC error by itself may cause the whole sector to be reported as
> corrupt with no data. However, if you do manage to get back the bits
> from the media, some bits of the sector being written whose values
> were not intended to change may be different than expected. This is
> because the way data is recorded does not encode each bit separately,
> but multiplexes them together for modulation, and also because bit
> timing is not exact.
>
> A modern hard disk uses much more complex data encoding, which further
> adds to the effect of a truncated write corrupting even data bits not
> intended to be changed, in the vicinity of those being changed.
>
> But it should aim to provide the same basic guarantee that writing a
> sector cannot corrupt neighbouring sectors on power failure, only the
> one(s) being written. This is because the robustness of journalling
> filesystems and databases does rather depend on this property, and
> simple old-fashioned disks do provide it.
>
> I am just speculating; I don't know whether modern hard disks provide
> this property, or under what circumstances they fail. But it seems
> they could provide it, because they still have physically independent
> sectors.
>
> (Interestingly, the journal block size used by Oracle on different
> OSes is different, suggesting the "basic unit of corruption"
> varies between OSes and is not always a single sector).
>
> Although it's just speculation, do you think modern hard disks behave
> differently from this?
>
Modern *enterprise* hard disks have battery-backed caches, so read/write
operations always either complete or fail. Low-end disks don't tend to have
battery-backed caches but, AFAIK, rewriting the same data will not result
in any sort of disk corruption.
Regards,
Anthony Liguori
> -- Jamie
>
>
>
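For readers following the torn-write argument quoted above, here is a
minimal Python sketch of it. It is a toy model and not a description of
any real drive: the 512-byte sector size, the CRC32 trailer stored after
the payload, and the helper names store, torn_write and is_readable are
all assumptions made for illustration.

```python
import zlib

SECTOR_SIZE = 512  # assumed payload size in bytes
CRC_SIZE = 4       # assumed: a CRC32 trailer follows the payload on the media


def store(data: bytes) -> bytes:
    """Return the on-media image: payload followed by its CRC32."""
    assert len(data) == SECTOR_SIZE
    return data + zlib.crc32(data).to_bytes(CRC_SIZE, "big")


def torn_write(old_image: bytes, new_data: bytes, bytes_written: int) -> bytes:
    """Model power loss after only `bytes_written` bytes reached the media.

    The first `bytes_written` bytes come from the new image, the rest is
    whatever the sector held before the write started.
    """
    new_image = store(new_data)
    return new_image[:bytes_written] + old_image[bytes_written:]


def is_readable(image: bytes) -> bool:
    """True if the stored CRC matches the payload (no CRC read error)."""
    data = image[:SECTOR_SIZE]
    stored_crc = int.from_bytes(image[SECTOR_SIZE:], "big")
    return zlib.crc32(data) == stored_crc


OLD = bytes(SECTOR_SIZE)                           # sector of zeros
NEW = bytes([0xFF]) * 16 + bytes(SECTOR_SIZE - 16)  # only 16 bytes differ

if __name__ == "__main__":
    old_image = store(OLD)
    cases = [
        ("rewrite same data, cut at byte 100", OLD, 100),
        ("change 16 bytes, cut at byte 8", NEW, 8),
        ("change 16 bytes, cut before the CRC", NEW, SECTOR_SIZE),
        ("change 16 bytes, write completes", NEW, SECTOR_SIZE + CRC_SIZE),
    ]
    for label, data, cut in cases:
        image = torn_write(old_image, data, cut)
        print(label, "->", "readable" if is_readable(image) else "CRC error")
```

In this model, an interrupted rewrite of identical data leaves the sector
readable, while an interrupted write that changes even a few bytes leaves
a payload that no longer matches the stored CRC until the sector is
written again. The model does not capture the modulation and bit-timing
effects described in the quoted text.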