Re: Superblock update: Is there really any benefits of updating synchronously?

From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
To: waxhead@dirtcellar.net, Nikolay Borisov <nborisov@suse.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Superblock update: Is there really any benefits of updating synchronously?
Date: Wed, 24 Jan 2018 01:04:42 +0100
Message-ID: <7bf3c40d-30ef-1911-a1b8-49bf49fbdcd5@mendix.com>
In-Reply-To: <0358d7af-bbce-7e2f-0b00-51b2823f83cb@dirtcellar.net>

On 01/23/2018 08:51 PM, waxhead wrote:
> Nikolay Borisov wrote:
>> On 23.01.2018 16:20, Hans van Kranenburg wrote:

[...]

>>>
>>> We also had a discussion about the "backup roots" that are stored
>>> besides the superblock, and that they are "better than nothing" to help
>>> maybe recover something from a broken fs, but never ever guarantee you
>>> will get a working filesystem back.
>>>
>>> The same holds for superblocks from a previous generation. As soon as
>>> the transaction for generation X successfully hits the disk, all space
>>> that was occupied in generation X-1 but no longer in X is available to
>>> be overwritten immediately.
>>>
> Ok so this means that superblocks with an older generation are utterly
> useless and will lead to corruption (effectively making my argument
> above useless as that would in fact assist corruption then).

Mostly, yes.

> Does this mean that if disk space was allocated in X-1 and is freed in
> X it will be unallocated if you roll back to X-1 e.g. writing to
> unallocated storage?

Can you reword that? I can't follow that sentence.

> I was under the impression that a superblock was like a "snapshot" of
> the entire filesystem and that rollbacks via pre-gen superblocks were
> possible. Am I mistaken?

Yes. The first fundamental thing in Btrfs is COW, which makes sure that
everything referenced from transaction X, from the superblock all the
way down to metadata trees and actual data space, is never overwritten by
changes done in transaction X+1.

For metadata trees that are NOT filesystem trees a.k.a. subvolumes, the
way this is done is actually quite simple. If a block is cowed, the old
location is added to a 'pinned extents' list (in memory), which is used
as a blacklist for choosing space to put new writes in. After a
transaction is completed on disk, that list with pinned extents is
emptied and all that space is available for immediate reuse. This way we
make sure that if the transaction that is ongoing is aborted, the
previous one (latest one that is completely on disk) is always still
there. If the computer crashes and the in-memory list is lost, no big
deal, we just continue from the latest completed transaction again after
a reboot. (Ignoring extra log things for simplicity.)
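
To make the pinned extents idea a bit more concrete, here is a minimal
Python sketch. It is a toy model, not Btrfs code: the class ToyAllocator
and all of its method names are made up for illustration, blocks are plain
integers, and a "transaction" is simply whatever happens between two calls
to commit().

```python
class ToyAllocator:
    """Toy model of pinned extents. All names here are hypothetical."""

    def __init__(self, nr_blocks):
        # Free space as recorded by the last transaction that is fully on disk.
        self.committed_free = set(range(nr_blocks))
        # Free space as seen by the transaction that is currently running.
        self.free = set(self.committed_free)
        # In-memory only: old locations of blocks cowed in this transaction.
        self.pinned = set()

    def alloc(self):
        # The pinned list acts as a blacklist when choosing space for new writes.
        usable = self.free - self.pinned
        if not usable:
            raise RuntimeError("no space left outside the pinned list")
        block = min(usable)
        self.free.remove(block)
        return block

    def cow(self, old_block):
        # Write the new copy somewhere else first, then mark the old location
        # as free in the running transaction, but pinned until commit.
        new_block = self.alloc()
        self.free.add(old_block)
        self.pinned.add(old_block)
        return new_block

    def commit(self):
        # The transaction is on disk: the previous generation is no longer
        # protected, so the pinned space becomes available for reuse.
        self.committed_free = set(self.free)
        self.pinned.clear()

    def crash(self):
        # The in-memory state is lost; continue from the last commit.
        self.free = set(self.committed_free)
        self.pinned.clear()


allocator = ToyAllocator(4)
root = allocator.alloc()
allocator.commit()
new_root = allocator.cow(root)       # old root is now pinned
other = allocator.alloc()            # never lands on the old root
allocator.commit()                   # old root may be reused from here on
```

Between cow() and commit() the old root cannot be handed out again, which
is exactly what keeps the previous transaction intact if the running one
is aborted.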

So, the only situation in which you can fully use an X-1 superblock is
when none of that previously pinned space has been overwritten since.

And if any of the space was overwritten already, you can go play around
with using an older superblock and your filesystem mounts and everything
might look fine, until you hit that distant corner and BOOM!
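
A small Python sketch of that "distant corner", under heavy assumptions:
the disk is a dict, a tree node is a dict with a generation number, and a
pointer is a (block, expected generation) pair. The names read_tree,
make_node and StaleBlockError are hypothetical, and the generation check
is only a stand-in for whatever a real filesystem would trip over.

```python
class StaleBlockError(Exception):
    """Raised when a block does not carry the generation its parent expects."""


def make_node(gen, items=(), children=()):
    return {"gen": gen, "items": list(items), "children": list(children)}


def read_tree(disk, pointer):
    """Walk the tree below a (block, expected_generation) pointer."""
    block, expected_gen = pointer
    node = disk[block]
    if node["gen"] != expected_gen:
        raise StaleBlockError(
            f"block {block}: wanted generation {expected_gen}, found {node['gen']}"
        )
    items = []
    for child in node["children"]:
        items.extend(read_tree(disk, child))
    return items + node["items"]


disk = {}

# Generation 1: root in block 0, leaves in blocks 1 and 2.
disk[1] = make_node(1, items=["a"])
disk[2] = make_node(1, items=["b"])
disk[0] = make_node(1, children=[(1, 1), (2, 1)])
super_gen1 = (0, 1)

# Generation 2: leaf 2 is cowed to block 3, the new root goes to block 4.
# Once this is on disk, blocks 0 and 2 are free for reuse.
disk[3] = make_node(2, items=["b2"])
disk[4] = make_node(2, children=[(1, 1), (3, 2)])
super_gen2 = (4, 2)

# Nothing has reused the freed blocks yet, so the old superblock still works.
looks_fine = read_tree(disk, super_gen1)

# Generation 3: a new leaf happens to be written into the freed block 2.
disk[2] = make_node(3, items=["c"])
disk[5] = make_node(3, children=[(1, 1), (3, 2), (2, 3)])
super_gen3 = (5, 3)

# The old root block is still intact, so the walk starts out fine...
try:
    read_tree(disk, super_gen1)
    boom = None
except StaleBlockError as exc:
    boom = str(exc)  # ...until it reaches the block that was overwritten.
```

In this toy the problem is at least detected. The point is only that the
old superblock and its root can look perfectly healthy while something
further down has already been replaced.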

---- >8 ---- Extra!! Moar!! ---- >8 ----

But, doing so does not give you snapshot functionality yet! It's more
like a poor man's snapshot that can only prevent you from messing up the
current version.

Snapshot functionality is implemented only for filesystem trees
(subvolumes) by adding reference counting (which does end up on disk) to
the metadata blocks, and then COW trees as a whole.

If you make a snapshot of a filesystem tree, the snapshot gets a whole
new tree ID! It's not a previous version of the same subvolume you're
looking at, it's a clone!

This is a big difference. The extent tree is always tree 2. The chunk
tree is always tree 3. But your subvolume snapshot gets a new tree number.
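
As a rough Python sketch of the difference: a snapshot is a new tree ID
that shares blocks with the original through reference counts. This is a
toy with loud assumptions: every "tree" is a single block, the class ToyFs
and its methods are invented for this example, and starting the tree ID
counter at 256 is just a choice made here for illustration.

```python
import itertools


class ToyFs:
    """Toy model of subvolume snapshots. All names here are hypothetical."""

    def __init__(self):
        self.blocks = {}   # block number -> payload
        self.refs = {}     # block number -> reference count (would be on disk)
        self.roots = {}    # tree ID -> root block number
        self._next_block = itertools.count()
        self._next_tree_id = itertools.count(256)  # illustrative starting point

    def _write_block(self, payload):
        block = next(self._next_block)
        self.blocks[block] = payload
        self.refs[block] = 1
        return block

    def create_subvolume(self, payload):
        tree_id = next(self._next_tree_id)
        self.roots[tree_id] = self._write_block(payload)
        return tree_id

    def snapshot(self, tree_id):
        # The snapshot is a clone with its own tree ID. Nothing is copied,
        # the shared block just gets one more reference.
        block = self.roots[tree_id]
        self.refs[block] += 1
        new_id = next(self._next_tree_id)
        self.roots[new_id] = block
        return new_id

    def write(self, tree_id, payload):
        # COW: write a new block, repoint this tree, drop one reference on
        # the old block and free it only when nobody uses it any more.
        old = self.roots[tree_id]
        self.roots[tree_id] = self._write_block(payload)
        self.refs[old] -= 1
        if self.refs[old] == 0:
            del self.blocks[old], self.refs[old]

    def read(self, tree_id):
        return self.blocks[self.roots[tree_id]]


fs = ToyFs()
vol = fs.create_subvolume("v1")
snap = fs.snapshot(vol)      # new tree ID, same block
fs.write(vol, "v2")          # the snapshot keeps the old block alive
```

After the write, fs.read(vol) gives the new contents and fs.read(snap)
still gives the old ones, and the two are addressed by different tree IDs
rather than as two generations of the same tree.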

Technically, it would maybe be possible to implement reference counting
and snapshots for all of the metadata trees, but it would probably mean
that the whole filesystem would get stuck in rewriting itself all day
instead of doing any useful work. The current extent tree already has
such an amount of rumination problems that the added work of keeping track
of reference counts would make it completely unusable.

In the wiki, it's here:
https://btrfs.wiki.kernel.org/index.php/Btrfs_design#Copy_on_Write_Logging

Actually, I just paraphrased the first two of those six paragraphs... The
subvolume trees actually having a previous version of themselves again
(whaaaa!) is another thing... ;]

-- 
Hans van Kranenburg
