From: Robert White <rwhite@pobox.com>
To: Tomasz Chmielewski <tch@virtall.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
Date: Sun, 14 Dec 2014 00:45:13 -0800
Message-ID: <548D4E19.4040407@pobox.com>
In-Reply-To: <548CD23E.8080702@pobox.com>

On 12/13/2014 03:56 PM, Robert White wrote:
> ...

Dangit... On re-reading I think I was still less than optimally clear. I
kept using the word "resent" when I should have been using a word like
"re-written" or "re-stored" (as opposed to "restored"). Even now I'm not
sure what the least confusing word would be.

So here is a contrived example (with seriously simplified assumptions):

Let's say that every day rsync coincidentally sends exactly 1GiB, and the
receiving filesystem is otherwise almost quiescent, so as a side effect
the receiving filesystem creates exactly one new 1GiB data extent per
day. A snapshot is taken every day after the rsync. (This is all just to
make the mental picture easier.)

Let's say there is a file, Aardvark, that just happens to be the first
file considered every time, started out at 1MiB, and happens to grow by
exactly 1MiB of pure append each day. After ten days Aardvark is stored
across ten extents. After 100 days it is stored across 100 extents. Each
successive 1MiB piece is exactly 1023MiB away from its predecessor and
its successor.
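
Back-of-the-envelope, in Python, using only the made-up numbers from this
example (a sketch of the scenario, not a claim about the real btrfs
allocator):

    EXTENT = 1024  # MiB per data extent in this example

    def aardvark_offsets(days):
        # day N's 1MiB append lands at the start of extent N
        return [day * EXTENT for day in range(days)]

    offsets = aardvark_offsets(10)
    gaps = [b - a - 1 for a, b in zip(offsets, offsets[1:])]
    print(gaps)  # [1023, 1023, ...] -- each piece 1023MiB from the last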

Now consider file Badger, the second file. It is 100MiB in size. It is
also modified each day such that five percent of its total bytes are
rewritten, as exactly five records of exactly 1MiB aligned on 1MiB
boundaries, all on convenient rsync boundaries. On the first day a
100MiB chunk lands square in the first data extent right next to
Aardvark. On the second and every successive day 5MiB lands next to
Aardvark in the next extent. But that 5MiB is not contiguous: it
corresponds to five 1MiB holes punched, in a completely fair random
distribution, across all the active fragments of Badger wherever they lie.

A linear read of Aardvark gets monotonically worse with each rsync. A
linear read of Badger decays towards 100 head seeks per pass, one for
each 1MiB record; the toy simulation below makes that concrete.
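
Here is a throwaway simulation of that decay (again, it models only this
cartoon of Badger with its fair random 5-of-100 daily rewrite, not real
allocator behavior):

    import random

    def badger_seeks(days, records=100, per_day=5):
        extent = [1] * records       # which extent each 1MiB record lives in
        for day in range(2, days + 2):
            for r in random.sample(range(records), per_day):
                extent[r] = day      # rewritten record moves to today's extent
        # every change of extent between neighboring records is ~one head seek
        return sum(1 for a, b in zip(extent, extent[1:]) if a != b)

    print(badger_seeks(100))  # creeps towards 99, i.e. one seek per record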

Now how does rsync work? It does a linear read of each file: all of
Aardvark, then all of Badger (etc.), to create the rolling checksum
stream that it uses to determine whether a block needs to be transmitted
or not.
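
For illustration, the weak rolling checksum works roughly like this (a
minimal sketch of the idea; the real rsync pairs it with a strong hash
and differs in detail):

    M = 1 << 16

    def weak_checksum(block):
        a = sum(block) % M
        b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
        return a, b

    def roll(a, b, outgoing, incoming, blocklen):
        # slide the window one byte without re-reading the whole block
        a = (a - outgoing + incoming) % M
        b = (b - blocklen * outgoing + a) % M
        return a, b

The detail doesn't matter here; the point is that building that stream
is a strict front-to-back linear read of every file.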

Now if we start "aging off" (and deleting) snapshots, we start actually
realizing, as free space, the holes in the oldest copies of Badger. There
is a very high probability that the next chunk of Aardvark is going to
end up somewhere inside Badger-of-day-one. Worse still, some parts of
Badger are going to end up in Badger-of-day-one, but nicely out of order.

At this point the model gets too complex for my understanding. I don't
know how BTRFS selects which data extent to put any one chunk of data in
relative to the rest of the file's contents, or whether it tries to fill
the fullest extent, the least-full extent, or does some other best-fit
for this case, so I have to stop that half of the example there.

Additionally: after roughly (N*log(N))^2 days (where I think N is 5)
[because of the fair randomness] {so just shy of two months?} there is a
high probability that no _current_ part of Badger is still mapped to
data extent 1. But it is still impossible for snapshot removal to
reclaim data extent 1... Aardvark's first block is there forever.
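
That guess is easy to sanity-check with another throwaway simulation of
the fair 5-of-100 daily rewrite assumed above:

    import random

    def days_until_extent1_vacated(records=100, per_day=5):
        survivors = set(range(records))  # original records still in extent 1
        day = 0
        while survivors:
            day += 1
            survivors -= set(random.sample(range(records), per_day))
        return day

    print(days_until_extent1_vacated())  # first day no original record survives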

Now compare this to doing the copy.

A linear write of a file is supposed to be (if I understand what I'm
reading here) laid out as close as possible to one linear extent on the
disk. Not guaranteed, but it's a goal. This would be "more true" if the
application doing the writing called fallocate(). [I don't know if rsync
does fallocate(), I'm just saying.]
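
For reference, the hint an application can give looks like this; Python
exposes the same posix_fallocate(3) call. (Whether rsync actually makes
the call is exactly the open question.)

    import os

    def write_preallocated(path, data):
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            # reserve the full size up front so the allocator can aim for
            # one contiguous run
            os.posix_fallocate(fd, 0, len(data))
            os.write(fd, data)
        finally:
            os.close(fd)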

So now on day one, Aardvark is one 1MiB chunk in data extent 1, followed
by all of Badger.

On day two, Aardvark is one 2MiB chunk in data extent 2, followed by all
of Badger.

(magical adjustments take place in the source data stream so that we are 
still, by incredible coincidence, using up exactly one extent every day. 
[it's like one of those physics problems where we get to ignore 
friction. 8-)])

On every rsync pass, both the growing Aardvark and the active working 
set of Badger are available as linear reads while making the rolling 
checksums.

If Aardvark and/or Badger need to be used for any purpose from one or 
more of the snapshots, they will also benefit from locality and linear 
read optimization.

When we get around to deleting the first snapshot, all of the active
parts of Aardvark and Badger are long gone from it. (And since this is
magical fairy land, data extent 1 is reclaimed!)

---

How realistic is this? Well clearly magical fairies were involved in the
making of this play. But the role of Badger will be played by a database
tablespace, and his friend Aardvark will be played by the associated
update journal. Meaning that both of those file behaviors are real-world
examples (notwithstanding the cartoonish monotonic update profile).

And _clearly_ once you start deleting older snapshots the orderly 
picture would fall apart piecewise.

Then again, according to grep, my /usr/bin/rsync contains the string
"fallocate". Not a guarantee it's being used, but a strong indicator
(and newer rsync releases document a --preallocate option). Any use of
fallocate tends to imply that a later defrag would not change much, so
there's another task you wouldn't need to undertake.

So it's a classic trade-off of efficiencies of space vs order.

Once you achieve your dynamic balance, with whole copies of things
tending to find their homes, your _overall_ performance should become
more stable over time, bouncing back and forth about a mean for the
first few cycles (a cycle being completed when a snapshot is deleted).

Right now you are reporting that it has been getting less stable over time.

So there is your deal right there.

