From: Ronan CHAUVIN <ronan.chauvin@parrot.com>
To: Peter Cordes <peter@cordes.ca>, Karel Zak <kzak@redhat.com>
Cc: <util-linux@vger.kernel.org>,
	matthieu CASTET <matthieu.castet@parrot.com>,
	Alexandre Dilly <alexandre.dilly@parrot.com>
Subject: Re: [libfdisk]: gpt_write_disklabel function robustness to sudden power off
Date: Tue, 24 Mar 2015 15:05:36 +0100
Message-ID: <55116F30.3080204@parrot.com>
In-Reply-To: <20150323183142.GU3933@cordes.ca>

Thank you for your answer.

On 03/23/2015 07:31 PM, Peter Cordes wrote:
> On Fri, Mar 20, 2015 at 12:18:12PM +0100, Karel Zak wrote:
>> Conclusion: be pessimistic and verify all you read from disk and be
>> optimistic when you write to the disk, and when someone is talking
>> about write guarantees, run far away. That's the whole story.
> The whole GPT is what, 16kiB or so?  On most storage, you could
> force data to persistent storage with a granularity of 4kiB, with
> fdatasync(2) (assuming that works on block devices, not just files).
The whole GPT is about 16kiB (MBR + GPT header + partition array). There
are two copies of the GPT, one at the beginning of the disk and another
one at the end. The bootloader verifies the integrity of the header and
of the partition array with a CRC32.
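As a rough sketch of what such a check can look like, assuming the
standard UEFI header layout (CRC field at byte offset 16, stored
little-endian, zeroed while the checksum is computed): the function
names below are made up for illustration and are not taken from
libfdisk or from our bootloader.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320,
 * the variant GPT uses. Slow but table-free. */
uint32_t gpt_crc32(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical helper: returns 1 if the CRC stored in the header matches
 * the CRC computed over hdr_size bytes with the CRC field set to zero. */
int gpt_header_crc_ok(const uint8_t *hdr, size_t hdr_size)
{
    uint8_t copy[512];
    uint32_t stored;

    /* 92 bytes is the usual header size; refuse anything we cannot copy. */
    if (hdr_size < 92 || hdr_size > sizeof(copy))
        return 0;

    memcpy(copy, hdr, hdr_size);
    stored = (uint32_t)copy[16]
           | (uint32_t)copy[17] << 8
           | (uint32_t)copy[18] << 16
           | (uint32_t)copy[19] << 24;
    memset(copy + 16, 0, 4);   /* CRC field must be zero during computation */

    return gpt_crc32(copy, hdr_size) == stored;
}
```

The partition array can be checked the same way by running gpt_crc32()
over the whole array and comparing with the value stored in the header.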
> But some SSDs lie, and will claim that data is flushed to persistent
> storage when it isn't.  (According to one of Marc Merlin's BTRFS
> talks).
>
>   So I'd agree with Karel that the current method is probably
> ideal.  write() everything, then fsync() so it all hits the disk in
> one multi-sector write op.  Not necessarily atomic, but probably.
As the blocks are not contiguous (primary at the start of the disk,
backup at the end), this cannot be done in a single write operation.
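To illustrate how far apart the two copies are, here is a small sketch
that computes the conventional sector positions. It assumes the usual
layout (primary header at LBA 1 with its array at LBA 2, backup header
in the last sector with its array just before it); the header fields
are what actually give the locations, and the struct and function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional GPT placement, expressed in sectors (LBAs). */
struct gpt_layout {
    uint64_t primary_header_lba;
    uint64_t primary_entries_lba;
    uint64_t backup_entries_lba;
    uint64_t backup_header_lba;
};

/* entries_sectors is the size of the partition array in sectors,
 * e.g. 32 for 128 entries of 128 bytes on 512-byte sectors. */
struct gpt_layout gpt_default_layout(uint64_t total_sectors,
                                     uint64_t entries_sectors)
{
    struct gpt_layout l;

    /* Room for MBR + both headers + both arrays, at the very least. */
    assert(total_sectors > 2 * (entries_sectors + 1) + 1);

    l.primary_header_lba  = 1;
    l.primary_entries_lba = 2;
    l.backup_header_lba   = total_sectors - 1;
    l.backup_entries_lba  = l.backup_header_lba - entries_sectors;
    return l;
}
```

With these positions the primary copy sits in the first few dozen
sectors and the backup in the last few dozen, so at least two separate
write requests are needed.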
> If we think the backup partition table / GPT header is useful,
> write(backup); fsync();
> sleep(1sec);
> write(primary); fsync();
> is potentially worthwhile.  On an SSD, there's the mapping metadata
> separate from the actual data, and the write block size might be 8kiB
> on some current disks.  (This is why I'm thinking that the 1sec pause
> between writing the backup and primary would give a chance for
> whatever write-back caching layers to actually flush for real.)
>
>   I don't know how likely that is to help on any real storage setup;
> I'm really just making that up.  I also don't know whether the backup
> and primary are in separate 4kiB or 8kiB data blocks.  Even if not, it
> could still be useful to always be writing blocks where one of the two
> copies written matches what's already there, so there's a valid table
> whether the old or new version is there when you try to read it back.
>
> So I think there's potentially a tiny benefit to a fsync();sleep(),
> but I'd wait for confirmation from a storage expert before
> implementing it.  The current method probably just sends one write op
> to the hardware for the whole GPT, which is nice.
I agree that we should wait for confirmation from a storage expert, but
the fsync() and sleep() combination should guarantee the order of the
operations on most hardware.
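For reference, the sequence under discussion could be sketched as
follows. This is only an illustration of the idea, not the current
gpt_write_disklabel() code: the function names, the caller-supplied
offsets and the pause_sec parameter are all assumptions, and the pause
is a heuristic that does not guarantee anything if the device lies
about flushes.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for pwrite() */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer at the given offset, retrying short writes. */
static int write_all_at(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;      /* no progress, avoid looping forever */
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/* Write and flush the backup copy first, pause, then write and flush
 * the primary copy, so that one of the two copies is always valid. */
int write_backup_then_primary(int fd,
                              const void *backup, size_t backup_len,
                              off_t backup_off,
                              const void *primary, size_t primary_len,
                              off_t primary_off,
                              unsigned int pause_sec)
{
    if (write_all_at(fd, backup, backup_len, backup_off) < 0 ||
        fsync(fd) < 0)
        return -1;

    /* Give write-back caches a chance to drain (heuristic only). */
    sleep(pause_sec);

    if (write_all_at(fd, primary, primary_len, primary_off) < 0 ||
        fsync(fd) < 0)
        return -1;

    return 0;
}
```

If the power fails before the second fsync() returns, the reader is
expected to find either the old or the new primary, plus a backup that
passes its CRC check.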

Best regards,

-- 
Ronan CHAUVIN
Embedded Software Engineer
ASIC team
--------------------------------
Parrot
174, quai de Jemmapes
75010 Paris  France
--------------------------------
www.parrot.com
