From: Goffredo Baroncelli <kreijack@inwind.it>
To: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
Chris Murphy <lists@colorremedies.com>
Cc: Mackenzie Meyer <snackmasterx@gmail.com>,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
Date: Thu, 11 Feb 2016 15:14:12 +0100
Message-ID: <56BC9734.403@inwind.it>
In-Reply-To: <56BB9698.5020203@gmail.com>
On 2016-02-10 20:59, Austin S. Hemmelgarn wrote:
[...]
> Again, a torn write to the metadata referencing the block (stripe in
> this case I believe) will result in losing anything written by the
> update to the stripe.
I think that the order matters: first the data blocks are written (in a new location, so the old data are untouched), then the metadata, from the leaves up to the upper node (again in a new location), then the superblock, which references the upper node of the tree(s).
If you interrupt the writes at any time, the filesystem can survive because the old superblock, metadata tree and data blocks are still valid until the last piece (the new superblock) is written.
And if this last step fails, the checksum shows that the new superblock is invalid and the old one is used instead.
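
As a rough illustration of the ordering described above, here is a toy model in Python. It is not btrfs code: the class ToyCowDisk, the functions commit and read_data, and the "leaf:"/"root:" block layout are all invented for the example, and CRC32 merely stands in for whatever checksum the real superblock uses.

```python
import zlib


class ToyCowDisk:
    """Toy block store: ordinary blocks are only ever written to new addresses."""

    def __init__(self):
        self.blocks = {}        # address -> payload
        self.next_addr = 0
        self.superblocks = []   # history of (payload, stored_crc), newest last

    def write_block(self, payload):
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = payload   # new location, old blocks untouched
        return addr

    def write_superblock(self, root_addr, torn=False):
        payload = str(root_addr).encode()
        crc = zlib.crc32(payload)
        if torn:
            # Simulate a torn write: the payload no longer matches its checksum.
            payload = payload[:-1] + b"?"
        self.superblocks.append((payload, crc))

    def mount(self):
        # Use the newest superblock whose checksum is valid.
        for payload, crc in reversed(self.superblocks):
            if zlib.crc32(payload) == crc:
                return int(payload)
        raise RuntimeError("no valid superblock")


def commit(disk, data, crash_after=None):
    """Write data, then metadata (leaf, then root), then the superblock."""
    data_addr = disk.write_block(data)
    if crash_after == "data":
        return
    leaf_addr = disk.write_block(b"leaf:%d" % data_addr)
    root_addr = disk.write_block(b"root:%d" % leaf_addr)
    if crash_after == "metadata":
        return
    disk.write_superblock(root_addr, torn=(crash_after == "torn_super"))


def read_data(disk):
    """Follow superblock -> root -> leaf -> data."""
    root_addr = disk.mount()
    leaf_addr = int(disk.blocks[root_addr].split(b":")[1])
    data_addr = int(disk.blocks[leaf_addr].split(b":")[1])
    return disk.blocks[data_addr]
```

In this model, calling commit with crash_after set to "data", "metadata" or "torn_super" leaves read_data returning the previously committed contents; only a commit that completes the superblock write makes the new data visible.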
> There is no way that _any_ system can avoid
> this issue without having the ability to truly atomically write out
> the entire metadata tree after the block (stripe) update.
It is not necessary to write the (meta)data atomically in a COW filesystem, because the new data don't overwrite the old ones. The only requirement is that all the previous (meta)data are already written before the last piece is written.
For a non-COW filesystem, a journal is required to avoid this kind of problem.
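
The same "everything else first, the switch-over last" rule can be sketched at the application level. This is only an analogy, not how btrfs implements it: the function name durable_replace is made up for the example, and it assumes a filesystem where os.replace swaps the name atomically and where fsync is honoured by the hardware.

```python
import os
import tempfile


def durable_replace(path, payload):
    """Write payload to a new file, flush it, and only then switch the name over."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)        # new location, the old file is untouched
            tmp.flush()
            os.fsync(tmp.fileno())    # the new contents must be on disk first
        os.replace(tmp_path, path)    # the "last piece": switch to the new copy
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    if os.name == "posix":
        # Assumption: on POSIX systems, also flush the directory entry itself.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

If the process dies before os.replace, the old file is still complete; nothing here needs the write of the new contents to be atomic.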
> Doing so
> would require a degree of tight hardware level integration that's
> functionally impossible for any general purpose system (in essence,
> the filesystem would have to be implemented in the hardware, not
> software).
To solve the RAID write-hole problem, a checksum system (of data and metadata) is sufficient. However, to protect the data with checksums, it seems that a COW filesystem is required.
The only critical thing is that the hardware must not lie about whether the data has reached the platter. Most of the problems reported on the ML are related to external disks used in USB enclosures, which most of the time lie about this aspect.
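
A toy sketch of why a checksum helps with the write hole: a two-data-block stripe with XOR parity, an update interrupted before the parity is rewritten, and a later rebuild from the stale parity. Everything here (the function names, the 4-byte blocks, CRC32 as the checksum) is an assumption made for the example; it shows only that the bad rebuild is detected, not how any real RAID implementation works.

```python
import zlib


def xor_blocks(a, b):
    """XOR two equal-length blocks, as single-parity RAID does."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_after_interrupted_update():
    # Consistent stripe: two data blocks plus their parity.
    d0, d1 = b"AAAA", b"BBBB"
    parity = xor_blocks(d0, d1)
    # Checksums kept separately from the stripe (in metadata).
    stored_crc_d1 = zlib.crc32(d1)

    # d0 is overwritten, but the crash happens before parity is updated.
    d0 = b"CCCC"

    # Later the disk holding d1 is lost and d1 is rebuilt from stale parity.
    rebuilt_d1 = xor_blocks(d0, parity)
    checksum_ok = zlib.crc32(rebuilt_d1) == stored_crc_d1
    return rebuilt_d1, checksum_ok
```

Without the stored checksum the rebuilt block would be returned as if it were valid; with it, the mismatch is at least noticed.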
GB
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5