From: Robert White <rwhite@pobox.com>
To: Tomasz Chmielewski <tch@virtall.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
Date: Sun, 14 Dec 2014 00:45:13 -0800 [thread overview]
Message-ID: <548D4E19.4040407@pobox.com> (raw)
In-Reply-To: <548CD23E.8080702@pobox.com>
On 12/13/2014 03:56 PM, Robert White wrote:
> ...
Dangit... On re-reading I think I was still less than optimally clear. I
kept using the word "resent" when I should have been using a word like
"re-written" or "re-stored" (as opposed to "restored"). On re-reading
I'm not sure what the least confusing word would be.
So here is a contrived example with (seriously simplified) assumptions:
Let's say every day rsync coincidentally sends 1GiB and the receiving
filesystem is otherwise almost quiescent. So as a side effect the
receiving filesystem creates exactly one new 1GiB data extent per day. A
snapshot is taken every day after the rsync. (This is all just to
make the mental picture easier.)
Let's say there is a file Aardvark that just happens to be the first file
considered every time, and also happens to grow by exactly 1MiB of pure
append each day, having started out at 1MiB. After ten days Aardvark is
stored across ten extents. After 100 days it is stored across 100
extents. Each successive 1MiB is exactly 1023MiB away from its
predecessor and successor.
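A toy Python sketch of that layout (same made-up numbers as above,
nothing more) makes the gap arithmetic concrete:

# One 1GiB data extent is created per day; Aardvark's daily 1MiB append
# lands at the start of that day's extent.
EXTENT_MIB = 1024   # size of one data extent, in MiB
DAYS = 10

# offset (in MiB) of each of Aardvark's 1MiB pieces
pieces = [day * EXTENT_MIB for day in range(DAYS)]

# distance from the end of one piece to the start of the next
gaps = [pieces[i + 1] - (pieces[i] + 1) for i in range(DAYS - 1)]
print(gaps)   # [1023, 1023, ...] each piece is 1023MiB from its neighbors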
Now consider file Badger, the second file. It is 100MiB in size. It is
also modified each day such that five percent of its total bytes are
rewritten as exactly five records of exactly 1MiB aligned on 1MiB
boundaries, all on convenient rsync boundaries. On the first day a
100MiB chunk lands square in the first data Extent right next to
Aardvark. On the second and every successive day 5MiB lands next to
Aardvark in the next extent. But that 5MiB is not contiguous: it
corresponds to five 1MiB holes punched, in a completely fair
distribution, across all the active fragments of Badger wherever they lie.
A linear read of Aardvark gets monotonically worse with each rsync. A
linear read of Badger decays towards 100 head seeks per pass.
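You can watch Badger decay with a trivial simulation (this is just the
contrived model above, not a claim about how btrfs actually allocates):

import random

BLOCKS = 100    # Badger is 100 x 1MiB blocks
PER_DAY = 5     # five fair-random 1MiB rewrites per day
DAYS = 30       # arbitrary

# extent_of[b] = which day's extent currently holds block b (day 0 = extent 1)
extent_of = [0] * BLOCKS

for day in range(1, DAYS + 1):
    for b in random.sample(range(BLOCKS), PER_DAY):
        extent_of[b] = day

print(len(set(extent_of)))   # distinct extents a linear read now touches
print(extent_of.count(0))    # current blocks still sitting in extent 1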
Now how does rsync work? It does a linear read of each file: all of
Aardvark, then all of Badger (etc.), to create the progressive checksum
stream that it uses to determine whether a block needs to be transmitted
or not.
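Something like this sketch, in spirit (real rsync uses a rolling weak
checksum plus a strong hash; the point here is only the linear,
whole-file read on every pass):

import zlib

def block_signatures(path, block=1 << 20):
    # Read the file front to back, one checksum per 1MiB block.  This is
    # the access pattern that fragmentation punishes: every pass over
    # Aardvark and Badger is a linear read of the whole file.
    sigs = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            sigs.append(zlib.crc32(chunk))
    return sigs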
Now if we start "aging off" (and deleting) snapshots, we start realizing
the holes in the oldest copies of Badger. There is a very high
probability that the next chunk of Aardvark is going to end up somewhere
in Badger-of-day-one. Worse still, some parts of Badger are going to end
up in Badger-of-day-one but nicely out of order.
At this point the model starts to get too complex for my understanding.
(I don't know how BTRFS selects which data extent to put any one chunk
of data in relative to the rest of the file contents, or whether it tries
to fill the fullest chunk, the least-full chunk, or does some
other best-fit for this case, so I have to stop that half of the example
there.)
Additionally: After (N*log(N))^2 days (where I think N is 5) [because of
fair randomness] {so just shy of two months?} there is a high
probability that no _current_ part of Badger is still mapped to data
extent 1. But it is still impossible for snapshot removal to result in a
reclaim of data extent 1... Aardvark's first block is there forever.
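If you want to sanity-check that back-of-the-envelope guess, the toy
model is easy to simulate (again: fair-random rewrites of five 1MiB
blocks a day out of 100, nothing to do with real btrfs internals):

import random

def days_until_extent1_is_stale(blocks=100, per_day=5):
    # Count days until every original block of Badger has been rewritten
    # at least once, i.e. no *current* part of Badger remains in extent 1.
    still_original = set(range(blocks))
    day = 0
    while still_original:
        day += 1
        still_original -= set(random.sample(range(blocks), per_day))
    return day

trials = [days_until_extent1_is_stale() for _ in range(1000)]
print(sum(trials) / len(trials))   # average days, over 1000 trials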
Now compare this to doing the copy.
A linear write of a file is supposed to be (if I understand what I'm
reading here) laid out as closely-as-possible as a linear extent on the
disk. Not guaranteed, but it's a goal. This would be "more true" if the
application doing the writing called fallocate(). [I don't know if rsync
does fallocate(), I'm just saying.]
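For reference, the hint an application can give looks like this (a
Python sketch of the idea; whether and how rsync itself makes the
equivalent C fallocate() call is exactly the part I'm not sure about):

import os

def write_preallocated(path, data):
    # Tell the filesystem the final size up front so it can try to find
    # one contiguous run of space, then write the data.  It's a hint,
    # not a guarantee: the "goal, not guaranteed" point above.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.posix_fallocate(fd, 0, len(data))   # reserve the full length first
        os.write(fd, data)
    finally:
        os.close(fd)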
So now on day one, Aardvark is one 1MiB chunk in Data extent 1, followed
by all of Badger.
On day two Aardvark is one 2MiB chunk in Data extent 2, followed by all
of Badger.
(magical adjustments take place in the source data stream so that we are
still, by incredible coincidence, using up exactly one extent every day.
[it's like one of those physics problems where we get to ignore
friction. 8-)])
On every rsync pass, both the growing Aardvark and the active working
set of Badger are available as linear reads while making the rolling
checksums.
If Aardvark and/or Badger need to be used for any purpose from one or
more of the snapshots, they will also benefit from locality and linear
read optimization.
When we get around to deleting the first snapshot, all of the active
parts of Aardvark and Badger are long gone. (And since this is magical
fairy land, data extent one is reclaimed!)
---
How realistic is this? Well clearly magical fairies were involved in the
making of this play. But the role of Badger will be played by a database
tablespace and his friend Aardvark will be played by the associated
update journal. Meaning that both of those two file behaviors are
real-world examples (notwithstanding the cartoonish monotonic update
profile).
And _clearly_ once you start deleting older snapshots the orderly
picture would fall apart piecewise.
Then again, according to grep, my /usr/bin/rsync contains the string
"fallocate". Not a guarantee it's being used, but a strong indicator.
Any use of fallocate tends to imply that later defrag would not change
efficiency, so there's another task you wouldn't need to undertake.
So it's a classic trade-off of efficiencies of space vs order.
Once you achieve your dynamic balance, with whole copies of things
tending to find their homes, your _overall_ performance should become
more stable over time as it bounces back and forth about a mean
performance for the first few cycles (a cycle being completed when the
snapshot is deleted).
Right now you are reporting that it was becoming less stable over time.
So there is your deal right there.