public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Phillip Susi <phill@thesusis.net>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>,
	Jan Ziak <0xe2.0x9a.0x9b@gmail.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: Btrfs autodefrag wrote 5TB in one day to a 0.5TB SSD without a measurable benefit
Date: Wed, 16 Mar 2022 14:46:33 -0400	[thread overview]
Message-ID: <877d8twwrn.fsf@vps.thesusis.net> (raw)
In-Reply-To: <YjD/7zhERFjcY5ZP@hungrycats.org>


Zygo Blaxell <ce3g8jdj@umail.furryterror.org> writes:

> If the extent is compressed, you have to write a new extent, because
> there's no other way to atomically update a compressed extent.

Right, that makes sense for compression.

> If it's reflinked or snapshotted, you can't overwrite the data in place
> as long as a second reference to the data exists.  This is what makes
> nodatacow and prealloc slow--on every write, they have to check whether
> the blocks being written are shared or not, and that check is expensive
> because it's a linear search of every reference for overlapping block
> ranges, and it can't exit the search early until it has proven there
> are no shared references.  Contrast with datacow, which allocates a new
> unshared extent that it knows it can write to, and only has to check
> overwritten extents when they are completely overwritten (and only has
> to check for the existence of one reference, not enumerate them all).

Right, I know you can't overwrite the data in place.  What I'm not
understanding is why you can't just just write the new data elsewhere
and then free the no longer used portion of the old extent.

> When a file refers to an extent, it refers to the entire extent from the
> file's subvol tree, even if only a single byte of the extent is contained
> in the file.  There's no mechanism in btrfs extent tree v1 for atomically
> replacing an extent with separately referenceable objects, and updating
> all the pointers to parts of the old object to point to the new one.
> Any such update could cascade into updates across all reflinks and
> snapshots of the extent, so the write multiplier can be arbitrarily large.

So the inode in the subvol tree points to an extent in the extent tree,
and then the extent points to the space on disk?  And only one extent in
the extent tree can ever point to a given location on disk?

In other words, if file B is a reflink copy of file A, and you update
one page in file B, it can't just create 3 new extents in the extent
tree: one that refers to the firt part of the original extent, one that
refers to the last part of the original extent, and one for the new
location of the new data?  Instead file B refers to the original extent,
and to one new extent, in such a way that the second superceeds part of
the first only for file B?


  parent reply	other threads:[~2022-03-16 18:57 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-06 15:59 Btrfs autodefrag wrote 5TB in one day to a 0.5TB SSD without a measurable benefit Jan Ziak
2022-03-07  0:48 ` Qu Wenruo
2022-03-07  2:23   ` Jan Ziak
2022-03-07  2:39     ` Qu Wenruo
2022-03-07  7:31       ` Qu Wenruo
2022-03-10  1:10         ` Jan Ziak
2022-03-10  1:26           ` Qu Wenruo
2022-03-10  4:33             ` Jan Ziak
2022-03-10  6:42               ` Qu Wenruo
2022-03-10 21:31                 ` Jan Ziak
2022-03-10 23:27                   ` Qu Wenruo
2022-03-11  2:42                     ` Jan Ziak
2022-03-11  2:59                       ` Qu Wenruo
2022-03-11  5:04                         ` Jan Ziak
2022-03-11 16:31                           ` Jan Ziak
2022-03-11 20:02                             ` Jan Ziak
2022-03-11 23:04                             ` Qu Wenruo
2022-03-11 23:28                               ` Jan Ziak
2022-03-11 23:39                                 ` Qu Wenruo
2022-03-12  0:01                                   ` Jan Ziak
2022-03-12  0:15                                     ` Qu Wenruo
2022-03-12  3:16                                     ` Zygo Blaxell
2022-03-12  2:43                                 ` Zygo Blaxell
2022-03-12  3:24                                   ` Qu Wenruo
2022-03-12  3:48                                     ` Zygo Blaxell
2022-03-14 20:09                         ` Phillip Susi
2022-03-14 22:59                           ` Zygo Blaxell
2022-03-15 18:28                             ` Phillip Susi
2022-03-15 19:28                               ` Jan Ziak
2022-03-15 21:06                               ` Zygo Blaxell
2022-03-15 22:20                                 ` Jan Ziak
2022-03-16 17:02                                   ` Zygo Blaxell
2022-03-16 17:48                                     ` Jan Ziak
2022-03-17  2:11                                       ` Zygo Blaxell
2022-03-16 18:46                                 ` Phillip Susi [this message]
2022-03-16 19:59                                   ` Zygo Blaxell
2022-03-20 17:50                             ` Forza
2022-03-20 21:15                               ` Zygo Blaxell
2022-03-08 21:57       ` Jan Ziak
2022-03-08 23:40         ` Qu Wenruo
2022-03-09 22:22           ` Jan Ziak
2022-03-09 22:44             ` Qu Wenruo
2022-03-09 22:55               ` Jan Ziak
2022-03-09 23:00                 ` Jan Ziak
2022-03-09  4:48         ` Zygo Blaxell
2022-03-07 14:30 ` Phillip Susi
2022-03-08 21:43   ` Jan Ziak
2022-03-09 18:46     ` Phillip Susi
2022-03-09 21:35       ` Jan Ziak
2022-03-14 20:02         ` Phillip Susi
2022-03-14 21:53           ` Jan Ziak
2022-03-14 22:24             ` Remi Gauvin
2022-03-14 22:51               ` Zygo Blaxell
2022-03-14 23:07                 ` Remi Gauvin
2022-03-14 23:39                   ` Zygo Blaxell
2022-03-15 14:14                     ` Remi Gauvin
2022-03-15 18:51                       ` Zygo Blaxell
2022-03-15 19:22                         ` Remi Gauvin
2022-03-15 21:08                           ` Zygo Blaxell
2022-03-15 18:15             ` Phillip Susi
2022-03-16 16:52           ` Andrei Borzenkov
2022-03-16 18:28             ` Jan Ziak
2022-03-16 18:31             ` Phillip Susi
2022-03-16 18:43               ` Andrei Borzenkov
2022-03-16 18:46               ` Jan Ziak
2022-03-16 19:04               ` Zygo Blaxell
2022-03-17 20:34                 ` Phillip Susi
2022-03-17 22:06                   ` Zygo Blaxell
2022-03-16 12:47 ` Kai Krakow
2022-03-16 18:18   ` Jan Ziak
  -- strict thread matches above, loose matches on Subject: below --
2022-06-17  0:20 Jan Ziak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877d8twwrn.fsf@vps.thesusis.net \
    --to=phill@thesusis.net \
    --cc=0xe2.0x9a.0x9b@gmail.com \
    --cc=ce3g8jdj@umail.furryterror.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox