Util-Linux package development
From: Ronan CHAUVIN <ronan.chauvin@parrot.com>
To: Peter Cordes <peter@cordes.ca>, Karel Zak <kzak@redhat.com>
Cc: <util-linux@vger.kernel.org>,
	matthieu CASTET <matthieu.castet@parrot.com>,
	Alexandre Dilly <alexandre.dilly@parrot.com>
Subject: Re: [libfdisk]: gpt_write_disklabel function robustness to sudden power off
Date: Tue, 24 Mar 2015 15:05:36 +0100
Message-ID: <55116F30.3080204@parrot.com>
In-Reply-To: <20150323183142.GU3933@cordes.ca>

Thank you for your answer.

On 03/23/2015 07:31 PM, Peter Cordes wrote:
> On Fri, Mar 20, 2015 at 12:18:12PM +0100, Karel Zak wrote:
>> Conclusion: be pessimistic and verify all you read from disk, be
>> optimistic when you write to the disk, and when someone is talking
>> about write guarantees, run far away. That's the whole story.
> The whole GPT is what, 16kiB or so?  On most storage, you could
> force data to persistent storage with a granularity of 4kiB, with
> fdatasync(2) (assuming that works on block devices, not just files).
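With 512-byte sectors and the default 128-entry partition array, the
whole GPT is about 17 KiB (protective MBR + GPT header + 16 KiB partition
array). There are two copies of the GPT: the primary one at the beginning
of the disk and a backup at the end. The bootloader verifies the integrity
of the header and of the partition array with CRC32 checksums.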
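For reference, the check described above can be sketched as follows. This is a minimal sketch, not the bootloader's or libfdisk's code: it assumes the UEFI header field offsets (HeaderSize at 12, HeaderCRC32 at 16, NumberOfPartitionEntries at 80, SizeOfPartitionEntry at 84, PartitionEntryArrayCRC32 at 88), a little-endian on-disk encoding, and a header that fits in one 512-byte sector. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPT_HDR_MIN_SIZE 92u   /* size of the UEFI-defined header fields */

/* Standard reflected CRC-32 (polynomial 0xEDB88320), as used by GPT. */
static uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Check the "EFI PART" signature and HeaderCRC32 of a header sector. */
static int gpt_header_crc_ok(const uint8_t *sector, size_t sector_size)
{
    uint8_t tmp[512];
    uint32_t hdr_size;

    assert(sector != NULL);
    if (sector_size < GPT_HDR_MIN_SIZE || memcmp(sector, "EFI PART", 8) != 0)
        return 0;
    hdr_size = get_le32(sector + 12);                 /* HeaderSize */
    if (hdr_size < GPT_HDR_MIN_SIZE || hdr_size > sector_size ||
        hdr_size > sizeof(tmp))
        return 0;
    memcpy(tmp, sector, hdr_size);
    memset(tmp + 16, 0, 4);      /* CRC is computed with HeaderCRC32 zeroed */
    return crc32_ieee(tmp, hdr_size) == get_le32(sector + 16);
}

/* Check PartitionEntryArrayCRC32 against the partition array bytes. */
static int gpt_array_crc_ok(const uint8_t *hdr, const uint8_t *array)
{
    uint32_t n = get_le32(hdr + 80);                  /* number of entries */
    uint32_t esz = get_le32(hdr + 84);                /* size of one entry */

    return crc32_ieee(array, (size_t)n * esz) == get_le32(hdr + 88);
}
```

Because the array CRC is stored inside the header, and the header CRC covers that field, a torn write to either the header or the array of one copy makes that copy fail validation.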
>
> But some SSDs lie, and will claim that data is flushed to persistent
> storage when it isn't.  (According to one of Marc Merlin's BTRFS
> talks).
>
>   So I'd agree with Karel that the current method is probably
> ideal.  write() everything, then fsync() so it all hits the disk in
> one multi-sector write op.  Not necessarily atomic, but probably.
As the blocks are not contiguous (the primary copy is at the start of the
disk and the backup at the end), the update cannot be done in a single
write operation.
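To make the distance between the two copies concrete, here is a small sketch that computes where each region lives. It assumes 512-byte sectors, the default 128 entries of 128 bytes, and the common layout where the backup partition array sits immediately before the backup header in the last sector (the UEFI spec only requires the backup array to follow the last usable LBA). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE   512u
#define ENTRY_COUNT   128u
#define ENTRY_SIZE    128u
#define ARRAY_SECTORS ((ENTRY_COUNT * ENTRY_SIZE) / SECTOR_SIZE)  /* 32 */

struct gpt_region {
    uint64_t first_lba;   /* first sector of the region */
    uint64_t nsectors;    /* length of the region in sectors */
};

struct gpt_layout {
    struct gpt_region primary;
    struct gpt_region backup;
};

static struct gpt_layout gpt_layout_for(uint64_t disk_sectors)
{
    struct gpt_layout l;
    uint64_t last;

    assert(disk_sectors > 2 * (ARRAY_SECTORS + 2));
    last = disk_sectors - 1;
    /* Primary: protective MBR (LBA 0), header (LBA 1), array (LBA 2..33). */
    l.primary.first_lba = 0;
    l.primary.nsectors = 2 + ARRAY_SECTORS;
    /* Backup: array just before the last sector, header in the last sector. */
    l.backup.first_lba = last - ARRAY_SECTORS;
    l.backup.nsectors = ARRAY_SECTORS + 1;
    return l;
}

/* Byte offset of a region, i.e. what would be passed to pwrite(). */
static uint64_t region_offset(struct gpt_region r)
{
    return r.first_lba * SECTOR_SIZE;
}
```

For a disk of 1,000,000 sectors, the primary region covers bytes 0 to 17407 while the backup region starts at byte 511,983,104, so the two copies are necessarily two separate I/O requests.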
> If we think the backup partition table / GPT header is useful,
> write(backup); fsync();
> sleep(1sec);
> write(primary); fsync();
> is potentially worthwhile.  On an SSD, there's the mapping metadata
> separate from the actual data, and the write block size might be 8kiB
> on some current disks.  (This is why I'm thinking that the 1sec pause
> between writing the backup and primary would give a chance for
> whatever write-back caching layers to actually flush for real.)
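The ordering proposed above (backup, flush, pause, primary, flush) could look like the following minimal sketch. It is not libfdisk's gpt_write_disklabel(); the helper names are hypothetical, the caller is assumed to have already built both copies in memory, and the pause is the speculative delay from this thread, not a documented guarantee. As noted earlier in the thread, fsync() only helps if the device honours cache flushes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer at the given offset, retrying short writes and EINTR. */
static int pwrite_all(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {              /* no progress: treat as an I/O error */
            errno = EIO;
            return -1;
        }
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: write and flush the backup copy first, optionally
 * pause, then write and flush the primary copy, so that at most one copy
 * is being modified at any moment.
 */
static int gpt_write_ordered(int fd,
                             const void *backup, size_t backup_len, off_t backup_off,
                             const void *primary, size_t primary_len, off_t primary_off,
                             unsigned pause_sec)
{
    if (pwrite_all(fd, backup, backup_len, backup_off) < 0 || fsync(fd) < 0)
        return -1;
    if (pause_sec > 0)
        sleep(pause_sec);          /* speculative extra delay, see above */
    if (pwrite_all(fd, primary, primary_len, primary_off) < 0 || fsync(fd) < 0)
        return -1;
    return 0;
}
```

On a real disk, fd would be the block device opened for writing and the offsets would come from the header's MyLBA/AlternateLBA fields.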
>
>   I don't know how likely that is to help on any real storage setup;
> I'm really just making that up.  I also don't know whether the backup
> and primary are in separate 4kiB or 8kiB data blocks.  Even if not, it
> could still be useful to always be writing blocks where one of the two
> copies written matches what's already there, so there's a valid table
> whether the old or new version is there when you try to read it back.
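The argument that one copy always stays valid can be checked on a toy model. This sketch assumes a power cut can only corrupt ("tear") a copy whose write is in flight and not yet flushed; real devices may behave worse. It compares the ordered update with one where both copies are in flight at the same time. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* State of one on-disk copy (primary or backup) in the toy model. */
enum copy_state { OLD_OK, NEW_OK, TORN };

struct disk {
    enum copy_state primary;
    enum copy_state backup;
};

/* A copy is usable if its CRCs would validate, i.e. it is not torn. */
static bool has_valid_copy(struct disk d)
{
    return d.primary != TORN || d.backup != TORN;
}

/* Ordered update (backup, flush, primary, flush); crash_at 4 = no crash. */
static struct disk ordered_update(int crash_at)
{
    struct disk d = { OLD_OK, OLD_OK };

    assert(crash_at >= 0 && crash_at <= 4);
    if (crash_at == 0)
        return d;                        /* before any write */
    if (crash_at == 1) {
        d.backup = TORN;                 /* during the backup write */
        return d;
    }
    d.backup = NEW_OK;                   /* backup written and flushed */
    if (crash_at == 2)
        return d;                        /* between the two writes */
    if (crash_at == 3) {
        d.primary = TORN;                /* during the primary write */
        return d;
    }
    d.primary = NEW_OK;
    return d;
}

/* Unordered update: both copies in flight together; crash_at 2 = no crash. */
static struct disk unordered_update(int crash_at)
{
    struct disk d = { OLD_OK, OLD_OK };

    assert(crash_at >= 0 && crash_at <= 2);
    if (crash_at == 0)
        return d;
    if (crash_at == 1) {
        d.primary = TORN;                /* one power cut tears both copies */
        d.backup = TORN;
        return d;
    }
    d.primary = NEW_OK;
    d.backup = NEW_OK;
    return d;
}
```

Under this model every crash point of the ordered update leaves at least one valid table (old or new), while the unordered update has a crash point that leaves none.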
>
> So I think there's potentially a tiny benefit to a fsync();sleep(),
> but I'd wait for confirmation from a storage expert before
> implementing it.  The current method probably just sends one write op
> to the hardware for the whole GPT, which is nice.
I agree that we should wait for confirmation from a storage expert, but
the fsync() and sleep() combination should guarantee the write ordering
on most hardware.
>

Best regards,

-- 
Ronan CHAUVIN
Embedded Software Engineer
ASIC team
--------------------------------
Parrot
174, quai de Jemmapes
75010 Paris  France
--------------------------------
www.parrot.com

Thread overview: 8+ messages
2015-03-20 10:17 [libfdisk]: gpt_write_disklabel function robustness to sudden power off Ronan CHAUVIN
2015-03-20 11:18 ` Karel Zak
2015-03-23 18:31   ` Peter Cordes
2015-03-24 14:05     ` Ronan CHAUVIN [this message]
2015-03-24 14:25       ` Peter Cordes
2015-03-26 13:07         ` Ronan CHAUVIN
2015-03-24  3:24 ` Dale R. Worley
2015-03-24 13:54   ` Ronan CHAUVIN
