From: Christoph Anton Mitterer <calestyo@scientia.net>
To: Hugo Mills <hugo@carfax.org.uk>
Cc: Henk Slager <eye1tm@gmail.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs
Date: Sun, 05 Jun 2016 23:31:57 +0200
Message-ID: <1465162317.6702.53.camel@scientia.net>
In-Reply-To: <20160605210721.GH24492@carfax.org.uk>

On Sun, 2016-06-05 at 21:07 +0000, Hugo Mills wrote:
>    The problem is that you can't guarantee consistency with
> nodatacow+checksums. If you have nodatacow, then data is overwritten,
> in place. If you do that, then you can't have a fully consistent
> checksum -- there are always race conditions between the checksum and
> the data being written (or the data and the checksum, depending on
> which way round you do it).

I'm not an expert in the btrfs internals... but I had a pretty long
discussion when I first brought this up, and everything that came out
of it indicated, to my understanding, that it should simply be
possible.

a) nodatacow just means "no data cow", but not "no metadata cow".
   And isn't the checksumming data metadata? So AFAIU, this is itself
   COWed anyway.
b) What you refer to above is, AFAIU, that data may be written (not
   COWed) and there is of course no guarantee that the written data
   matches the checksum (which may e.g. still be the old sum).
   => So what?
      This only happens in case of a crash etc., and in that case we
      have no idea whether the written, non-COWed block is consistent
      or not, regardless of whether we do checksumming.
      With checksums we at least get the benefit of knowing that it
      may be garbage.
      The only "bad" thing that could happen is:
      the block is fully written and actually consistent, but the
      checksum hasn't been written yet. IMHO that is much less likely
      than the other case(s), and I'd rather get one false positive in
      a less likely case than corrupted blocks in all the other
      possible situations (silent block errors, etc.).
      And in principle, nothing would prevent a future btrfs from
      getting a journal for the nodatacow-ed writes.
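
To make the cases in b) concrete, here is a minimal Python sketch of a
toy model (this is not btrfs code). It assumes a single block that is
overwritten in place, with its checksum kept separately and updated
only after the data write. The names Block, overwrite, scenario and
bit_rot are made up for illustration, and zlib.crc32 merely stands in
for btrfs's default crc32c.

```python
import zlib


class Block:
    """Toy model: one nodatacow data block plus its separately stored checksum."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.csum = zlib.crc32(self.data)

    def verify(self) -> bool:
        # True only if the stored checksum matches the current data.
        return zlib.crc32(self.data) == self.csum


def overwrite(block: Block, new: bytes, crash=None) -> None:
    """Overwrite in place: data first, checksum second (an assumed ordering).

    crash=None          -> both steps complete
    crash="torn"        -> crash after only half of the data was written
    crash="before_csum" -> data fully written, checksum still the old one
    """
    if crash == "torn":
        half = len(new) // 2
        block.data[:half] = new[:half]
        return
    block.data[:] = new
    if crash == "before_csum":
        return
    block.csum = zlib.crc32(block.data)


def scenario(crash=None) -> bool:
    """Run one overwrite of b"AAAA" with b"BBBB" and report whether it verifies."""
    block = Block(b"AAAA")
    overwrite(block, b"BBBB", crash)
    return block.verify()


def bit_rot() -> bool:
    """Flip one bit with no write in flight (a silent block error)."""
    block = Block(b"AAAA")
    block.data[0] ^= 0x01
    return block.verify()


if __name__ == "__main__":
    for crash in (None, "torn", "before_csum"):
        print(crash, "verifies:", scenario(crash))
    print("bit rot verifies:", bit_rot())
```

Under these assumptions only the clean overwrite verifies. The torn
write and the bit rot are flagged as bad, and so is the fully written
block with the stale checksum, which is the one false positive
mentioned above.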

Look for the past thread "dear developers, can we have notdatacow +
checksumming, plz?"; I think I wrote about many more cases there, and
why it would always still be better to have checksumming with
nodatacow, even if it may not be as perfect as datacow+checksumming.

> > Wasn't it said, that autodefrag performs bad for anything larger
> > than ~1G?
> 
>    I don't recall ever seeing someone saying that. Of course, I may
> have forgotten seeing it...
I think it was mentioned below this thread:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/50444/focus=50586
and also implied here:
http://article.gmane.org/gmane.comp.file-systems.btrfs/51399/match=autodefrag+large+files


> > Well the fragmentation has also many other consequences and not
> > just seeks (assuming everyone would use SSDs, which isn't and
> > probably won't be the case for quite a while).
> > Most obviously you get much more IOPS and btrfs itself will, AFAIU,
> > also suffer from some issues due to the fragmentation.
>    This is a fundamental problem with all CoW filesystems. There are
> some mitigations that can be put in place (true CoW rather than
> btrfs's redirect-on-write, like some databases do, where the original
> data is copied elsewhere before overwriting; cache aggressively and
> with knowledge of the CoW nature of the FS, like ZFS does), but they
> all have their drawbacks and pathological cases.
Sure... but defrag (if it generally worked) or nodatacow (if it
didn't make you lose the ability to determine whether you're
consistent or not) would already be quite helpful here.
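
The difference Hugo describes between redirect-on-write and "true"
CoW can be illustrated with a small Python sketch. This is a toy
model under loud assumptions: a file is a list mapping logical block
numbers to physical ones, free space is simply appended at the end,
and the function names are invented here. It says nothing about how
btrfs's allocator actually behaves.

```python
def count_extents(phys):
    """Number of runs of physically consecutive blocks, in logical order."""
    return 1 + sum(1 for a, b in zip(phys, phys[1:]) if b != a + 1)


def redirect_on_write(n_blocks, writes):
    """New data goes to a fresh location; the file's mapping is updated."""
    phys = list(range(n_blocks))  # logical block i starts at physical i
    next_free = n_blocks
    for logical in writes:
        phys[logical] = next_free
        next_free += 1
    return phys


def copy_then_overwrite(n_blocks, writes):
    """Old data is copied elsewhere, then new data is written in place."""
    phys = list(range(n_blocks))
    next_free = n_blocks
    old_copies = []
    for _logical in writes:
        old_copies.append(next_free)  # preserved copy of the old contents
        next_free += 1
        # The live mapping is unchanged: the new data lands in place.
    return phys, old_copies


if __name__ == "__main__":
    writes = [2, 5, 7]
    row = redirect_on_write(10, writes)
    cow, saved = copy_then_overwrite(10, writes)
    print("redirect-on-write extents:  ", count_extents(row))
    print("copy-then-overwrite extents:", count_extents(cow))
    print("old copies kept elsewhere:  ", len(saved))
```

In this model three scattered overwrites split the live file into
seven extents with redirect-on-write, while copy-then-overwrite keeps
it in one extent at the cost of an extra copy per write.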


Cheers,
Chris.
