From: Ronan CHAUVIN <ronan.chauvin@parrot.com>
To: Peter Cordes <peter@cordes.ca>
Cc: Karel Zak <kzak@redhat.com>, <util-linux@vger.kernel.org>,
matthieu CASTET <matthieu.castet@parrot.com>,
Alexandre Dilly <alexandre.dilly@parrot.com>
Subject: Re: [libfdisk]: gpt_write_disklabel function robustness to sudden power off
Date: Thu, 26 Mar 2015 14:07:41 +0100
Message-ID: <5514049D.3060301@parrot.com>
In-Reply-To: <20150324142515.GW3933@cordes.ca>
On 03/24/2015 03:25 PM, Peter Cordes wrote:
> On Tue, Mar 24, 2015 at 03:05:36PM +0100, Ronan CHAUVIN wrote:
>> On 03/23/2015 07:31 PM, Peter Cordes wrote:
>>> On Fri, Mar 20, 2015 at 12:18:12PM +0100, Karel Zak wrote:
>>>> Conclusion: be pessimistic and verify everything you read from disk, be
>>>> optimistic when you write to the disk, and when someone talks about
>>>> write guarantees, run far away. That's the whole story.
>>> The whole GPT is what, 16 KiB or so? On most storage, you could
>>> force data to persistent storage with a granularity of 4 KiB, with
>>> fdatasync(2) (assuming that works on block devices, not just files).
>> The whole GPT is roughly 16 KiB (MBR + GPT header + partition array).
>> There are two GPT copies, one at the beginning of the disk and another
>> at the end. The bootloader verifies the integrity of the header and of
>> the partition array with a CRC32.
>>> So I'd agree with Karel that the current method is probably
>>> ideal. write() everything, then fsync() so it all hits the disk in
>>> one multi-sector write op. Not necessarily atomic, but probably.
>> As the blocks are not contiguous (primary and backup), the update
>> cannot be done in a single write operation.
> So at least one of the four 4kiB sectors doesn't get written at all?
> Because if all the sectors are getting written, regardless of order,
> Linux will merge the IOs into one write request to send over the SATA
> (or whatever) wire. Write request merging is useful even on SSDs, so
> Linux does it.
>
> Even if there is a sector that doesn't get written, it's probably
> still academic. Sending a request in a single write OP doesn't make
> it atomic. On a magnetic disk, the data will still probably all
> hit the platter on the same rotation, just by powering down the write
> head as it flies over the sector you aren't writing, so the window for
> a power failure to cause a problem is quite small. I'm sure SSDs are
> far more complicated.
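To make the contiguity point concrete, here is a minimal C sketch of where the two copies land, assuming the common defaults of 512-byte sectors, 128 partition entries of 128 bytes each, and a primary entry array starting right after the primary header at LBA 2. The struct and function names are hypothetical and are not libfdisk's API.

```c
#include <stdint.h>

/* Sector positions of the two GPT copies (hypothetical helper). */
struct gpt_layout {
    uint64_t primary_hdr;   /* LBA of the primary header */
    uint64_t primary_ents;  /* first LBA of the primary entry array */
    uint64_t backup_ents;   /* first LBA of the backup entry array */
    uint64_t backup_hdr;    /* LBA of the backup header (last LBA) */
    uint64_t ents_sectors;  /* sectors taken by one entry array */
};

static struct gpt_layout gpt_layout_for(uint64_t total_sectors, uint32_t sector_size,
                                        uint32_t n_entries, uint32_t entry_size)
{
    struct gpt_layout l;
    uint64_t bytes = (uint64_t)n_entries * entry_size;

    l.ents_sectors = (bytes + sector_size - 1) / sector_size;  /* round up */
    l.primary_hdr  = 1;              /* LBA 0 holds the protective MBR */
    l.primary_ents = 2;              /* assumed: array right after header */
    l.backup_hdr   = total_sectors - 1;
    l.backup_ents  = l.backup_hdr - l.ents_sectors;  /* array just before it */
    return l;
}
```

For a hypothetical 8 GiB disk (16777216 sectors), this puts the primary header and entries at LBA 1-33 and the backup entries and header at LBA 16777183-16777215, so the block layer has no way to merge the two halves of the update into one contiguous request.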
The guarantees of a write op clearly depend on the hardware. The
primary/backup mechanism and the CRC checks exist to detect exactly these
hardware failures.
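As a rough illustration of the CRC check mentioned above, the following C sketch verifies a GPT header the way the UEFI specification describes it: a CRC-32 over HeaderSize bytes with the HeaderCRC32 field (offset 16) treated as zero. It assumes the little-endian on-disk header has already been read into memory; the helper names are hypothetical, and a real implementation would also check the partition entry array CRC stored at offset 88.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), the variant used
 * for GPT header and partition-array checksums. */
static uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Read a little-endian 32-bit field. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical check: signature, HeaderSize bounds, then the CRC over
 * HeaderSize bytes with the HeaderCRC32 field (offset 16) zeroed. */
static bool gpt_header_crc_ok(const uint8_t *hdr, size_t sector_size)
{
    uint8_t tmp[4096];
    uint32_t hdr_size, stored;

    if (memcmp(hdr, "EFI PART", 8) != 0)
        return false;
    hdr_size = get_le32(hdr + 12);
    if (hdr_size < 92 || hdr_size > sector_size || hdr_size > sizeof tmp)
        return false;

    memcpy(tmp, hdr, hdr_size);
    stored = get_le32(tmp + 16);
    memset(tmp + 16, 0, 4);
    return crc32_ieee(tmp, hdr_size) == stored;
}
```

A torn write that leaves a header half old and half new will almost certainly fail this check, which is what lets the bootloader fall back to the other copy.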
>> I agree that we should wait for confirmation from a storage expert, but
>> the fsync() and sleep() combination should guarantee the operation order
>> on most hardware.
> Probably 1/10th of a second is long enough, while still short enough
> not to be annoying. The only user at risk is one editing the partition
> table of a disk that isn't idle (in which case even 1 sec might not be
> long enough for the write to hit the disk after fdatasync()) without
> the system on a UPS, and I don't think we need to waste 0.9 seconds of
> everyone's time just for this hypothetical user.
>
>
I agree that we don't need to waste a second of everyone's time.
Nevertheless, just an fsync() between writing the backup and the primary
GPT copies makes it more likely that the data actually reaches the disk
before the second write starts (the disk cache will be flushed).
--
Ronan CHAUVIN
Embedded Software Engineer
ASIC team
--------------------------------
Parrot
174, quai de Jemmapes
75010 Paris France
--------------------------------
www.parrot.com