Re: The FAQ on fsync/O_SYNC - Martin Steigerwald

From: Martin Steigerwald <martin@lichtvoll.de>
To: Hugo Mills <hugo@carfax.org.uk>
Cc: Craig Ringer <craig@2ndquadrant.com>, linux-btrfs@vger.kernel.org
Subject: Re: The FAQ on fsync/O_SYNC
Date: Sun, 19 Apr 2015 19:50:32 +0200
Message-ID: <2245776.oZdMfpEQBm@merkaba>
In-Reply-To: <20150419151851.GA18187@carfax.org.uk>

On Sunday, 19 April 2015, 15:18:51, Hugo Mills wrote:
> On Sun, Apr 19, 2015 at 05:10:30PM +0200, Martin Steigerwald wrote:
> > On Sunday, 19 April 2015, 22:31:02, Craig Ringer wrote:
> > > On 19 April 2015 at 22:28, Martin Steigerwald <martin@lichtvoll.de> wrote:
> > > > On Sunday, 19 April 2015, 21:20:11, Craig Ringer wrote:
> > > >> Hi all
> > > > 
> > > > Hi Craig,
> > > > 
> > > >> I'm looking into the advisability of running PostgreSQL on BTRFS,
> > > >> and after looking at the FAQ there's something I'm hoping you
> > > >> could clarify.
> > > >> 
> > > >> The wiki FAQ says:
> > > >> 
> > > >> "Btrfs does not force all dirty data to disk on every fsync or
> > > >> O_SYNC operation, fsync is designed to be fast."
> > > >> 
> > > >> Is that wording intended narrowly, to contrast with ext3's nasty
> > > >> habit of flushing *all* dirty blocks for the entire file system
> > > >> whenever anyone calls fsync()? Or is it intended broadly, to say
> > > >> that btrfs's fsync won't necessarily flush all data blocks (just
> > > >> metadata)?
> > > >> 
> > > >> Is that statement still true in recent BTRFS versions (3.18,
> > > >> etc.)?
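
For reference, a minimal sketch of the durability pattern a database typically relies on, assuming plain POSIX semantics (this is not PostgreSQL's code): write the data, fsync() the file, then fsync() the containing directory so that a newly created entry also survives a crash. The helper names durable_write and write_all are hypothetical. Whether btrfs honors each fsync() this strictly is exactly the question asked above; the code only shows what the caller requests.

```c
#define _XOPEN_SOURCE 700   /* fsync, openat, O_DIRECTORY */
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes, retrying short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: create dir/name with buf as its contents and ask the
 * kernel to make both the data and the directory entry durable.
 * Returns 0 on success, -1 on any failure. */
static int durable_write(const char *dir, const char *name,
                         const void *buf, size_t len)
{
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;

    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    /* Short-circuit: each step runs only if the previous one succeeded.
     * fsync(fd) covers the file's data and the metadata needed to read it. */
    int ok = fd >= 0 && write_all(fd, buf, len) == 0 && fsync(fd) == 0;
    if (fd >= 0 && close(fd) != 0)
        ok = 0;
    /* A newly created entry is only durable once the directory is synced. */
    if (ok && fsync(dfd) != 0)
        ok = 0;
    close(dfd);
    return ok ? 0 : -1;
}
```
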
> > > > 
> > > > I don't know, so I'll leave that for others to answer. I always
> > > > assumed a strong fsync() guarantee with BTRFS, as in "it's on
> > > > disk". So I am interested in that as well.
> > > > 
> > > > But for databases, did you consider the copy-on-write
> > > > fragmentation BTRFS will cause? Even with autodefrag, AFAIK it is
> > > > not recommended for large databases, at least on rotating media.
> > > 
> > > I did, and any testing would need to look at the efficacy of the
> > > chattr +C option on the database directory tree.
> > > 
> > > PostgreSQL is itself copy-on-write (because of multi-version
> > > concurrency control), so it doesn't make much sense to have the FS
> > > doing another layer of COW.
> > > 
> > > I'm curious as to whether +C has any effect on BTRFS's durability, too.
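
A minimal sketch of what `chattr +C` does, assuming Linux and the FS_IOC_GETFLAGS / FS_IOC_SETFLAGS ioctls with the FS_NOCOW_FL flag from <linux/fs.h>; set_nocow is a hypothetical helper name. Per chattr(1), on btrfs the flag should be set on new or empty files, which is why it is normally applied to the (still empty) database directory so that files created inside it inherit the attribute.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */

/* Hypothetical helper: set the NOCOW attribute (what `chattr +C` sets) on a
 * file or directory. Returns 0 on success, -1 on failure, e.g. when the path
 * does not exist or the filesystem does not support the flag. */
static int set_nocow(const char *path)
{
    int fd = open(path, O_RDONLY);   /* O_RDONLY also works for directories */
    if (fd < 0)
        return -1;

    int flags = 0;                   /* chattr passes an int, as done here */
    int rc = ioctl(fd, FS_IOC_GETFLAGS, &flags);
    if (rc == 0 && !(flags & FS_NOCOW_FL)) {
        flags |= FS_NOCOW_FL;        /* keep all other attributes unchanged */
        rc = ioctl(fd, FS_IOC_SETFLAGS, &flags);
    }
    close(fd);
    return rc == 0 ? 0 : -1;
}
```
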
> > 
> > You will lose the ability to snapshot that directory tree then.
> 
>    No, you won't.
> 
>    The +C attribute still allows snapshotting and reflink copies.
> However, after the snapshot, writes to either copy will result in that
> copy being CoWed. (Specifically, writes to an extent of a +C file with
> more than one reference to the extent will result in a CoW operation,
> until there is only one reference, and then the writes will not be
> CoWed again).
> 
>    The practical upshot of this is that every snapshot of, and
> subsequent writes to, a +C file will introduce fragmentation in the
> same way that writes to a non-+C file would.
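
To make the extent-sharing rule concrete, a toy model (hypothetical, not btrfs code): each physical block has a reference count, a snapshot bumps the count of every block it shares, and a write to a +C file CoWs a block only while its count is above one. The fragments() counter shows how that single CoW moves a block out of line.

```c
#include <stdlib.h>

#define NBLOCKS 8          /* blocks per toy file */
#define MAX_BLOCKS 64      /* size of the toy "disk" */

/* refs[p]: how many files/snapshots reference physical block p. */
static int refs[MAX_BLOCKS];
static int next_free;      /* trivial bump allocator: blocks are never reused */

static int alloc_block(void)
{
    if (next_free >= MAX_BLOCKS)
        abort();           /* toy disk is full */
    refs[next_free] = 1;
    return next_free++;
}

/* Hypothetical file: logical block i is stored at physical block map[i]. */
struct file { int map[NBLOCKS]; };

static void create(struct file *f)
{
    for (int i = 0; i < NBLOCKS; i++)
        f->map[i] = alloc_block();   /* contiguous layout: 0, 1, 2, ... */
}

/* Snapshot (or reflink copy): share every block, bumping its count. */
static void snapshot(const struct file *src, struct file *dst)
{
    for (int i = 0; i < NBLOCKS; i++) {
        dst->map[i] = src->map[i];
        refs[src->map[i]]++;
    }
}

/* Write to logical block i of a +C file. */
static void nocow_write(struct file *f, int i)
{
    int p = f->map[i];
    if (refs[p] > 1) {
        refs[p]--;                   /* the other copy keeps the old block */
        f->map[i] = alloc_block();   /* CoW: this copy's data moves elsewhere */
    }
    /* else: sole owner, overwrite in place; the mapping is unchanged */
}

/* Count runs of physically contiguous blocks (1 = unfragmented). */
static int fragments(const struct file *f)
{
    int n = 1;
    for (int i = 1; i < NBLOCKS; i++)
        if (f->map[i] != f->map[i - 1] + 1)
            n++;
    return n;
}
```

In this model, creating a file, snapshotting it and then writing block 3 leaves the file in three fragments, while the snapshot keeps the original contiguous layout; a second write to block 3 then happens in place.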
> 
>    You also have a disadvantage with +C that you lose the checksumming
> features of the FS, and hence the self-healing properties if you're
> running with btrfs-native RAID.
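
A sketch of why the checksum is what makes self-healing possible, assuming a two-copy (RAID1-style) read path; read_repair and BLOCK are hypothetical names, and crc32c here is a plain bitwise CRC-32C (the algorithm btrfs uses for data checksums by default). The read returns the first copy whose checksum matches and rewrites a bad first copy from the good one. Without a stored checksum, as with +C files, there is nothing to compare against, so the read path cannot tell which mirror is correct.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16   /* hypothetical block size for the sketch */

/* CRC-32C (Castagnoli), bitwise, reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical mirrored read: copy[0] and copy[1] are the two on-disk copies,
 * stored is the checksum recorded in metadata when the block was written.
 * Returns the index of the copy used, or -1 if neither copy verifies. */
static int read_repair(uint8_t copy[2][BLOCK], uint32_t stored,
                       uint8_t out[BLOCK])
{
    int good = -1;
    for (int i = 0; i < 2; i++)
        if (crc32c(copy[i], BLOCK) == stored) {
            good = i;
            break;
        }
    if (good < 0)
        return -1;                        /* both copies bad: report an error */
    memcpy(out, copy[good], BLOCK);
    if (good == 1)
        memcpy(copy[0], copy[1], BLOCK);  /* heal the copy that failed */
    return good;
}
```
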

Thanks for clarifying this, Hugo. So after a snapshot, writes to a
chattr +C directory are CoWed again.

And there is no checksumming on the FS at all anymore. Why is that?
Why can't BTRFS checksum NOCOW objects, or at least the CoWed ones in the
same FS? Because of atomicity guarantees?

If this has been answered before and I missed it, feel free to point me
to it; I didn't find anything obvious with my quick search.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
