From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: "Ellis H. Wilson III" <ellisw@panasas.com>, linux-btrfs@vger.kernel.org
Subject: Re: btrfs-cleaner / snapshot performance analysis
Date: Tue, 13 Feb 2018 17:14:01 -0800 [thread overview]
Message-ID: <20180214011401.GA5201@magnolia> (raw)
In-Reply-To: <659ed727-f5dc-e81b-c01c-b6a063f3ed34@gmx.com>
On Sun, Feb 11, 2018 at 02:40:16PM +0800, Qu Wenruo wrote:
>
>
> On 2018年02月10日 00:45, Ellis H. Wilson III wrote:
> > Hi all,
> >
> > I am trying to better understand how the cleaner kthread (btrfs-cleaner)
> > impacts foreground performance, specifically during snapshot deletion.
> > My experience so far has been that it can be dramatically disruptive to
> > foreground I/O.
> >
> > Looking through the wiki at kernel.org I have not yet stumbled onto any
> > analysis that would shed light on this specific problem. I have found
> > numerous complaints about btrfs-cleaner online, especially relating to
> > quotas being enabled. This has proven thus far less than helpful, as
> > the response tends to be "use less snapshots," or "disable quotas," both
> > of which strike me as intellectually unsatisfying answers, especially
> > the former in a filesystem where snapshots are supposed to be
> > "first-class citizens."
>
> Yes, btrfs snapshots really are "first-class citizens".
> Many design decisions are biased toward snapshots.
>
> But one should be clear about one thing:
> snapshot creation and backref walking (used by qgroup, relocation and
> extent deletion) are in fact two conflicting workloads.
>
> Btrfs gives snapshot creation very high priority, at the cost of
> greatly degraded backref walk performance (used by snapshot deletion,
> relocation, and qgroup's exclusive/shared extent accounting).
>
> Let me explain this problem in detail.
>
> Just as explained by Peter Grandi, any snapshot system (or any
> system that supports reflink) must have a reverse mapping tree to
> tell which extent is used by whom.
>
> It is critical for determining whether an extent is shared, which in
> turn decides whether we need to CoW it.
>
> There are several different ways to implement it, and the choice
> hugely affects snapshot performance.
>
> 1) Direct mapping record
>    Records exactly which extent is used by whom, directly.
>    So when we need to check the owner, we search the tree ONCE and
>    we have the answer.
>
>    This is simple, and it seems that LVM thin-provisioning and the
>    traditional LVM targets both use this approach.
>    (Maybe XFS also works this way?)
Yes, it does.
>    Pros:
>    *FAST* backref walk, which means quick extent deletion and quick
>    CoW condition checks.
>
>    Cons:
>    *SLOW* snapshot creation.
>    Each snapshot creation must insert new owner relationships into
>    the tree, and the amount of modification grows with the size of
>    the snapshot source.
...of course xfs also doesn't support snapshots. :)
--D
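The single-lookup property of the direct scheme can be sketched with a toy model. (The node names follow the examples later in this mail; the dictionary is purely illustrative, not btrfs or XFS internals.)

```python
# Toy direct mapping record: every extent's owners are stored
# explicitly, so "who uses this extent?" is answered by ONE lookup.

owners = {
    "D": {"X"},          # leaf D is owned by tree X only
    "B": {"X", "Y"},     # node B is shared by trees X and Y
}

def is_shared(extent):
    """A single reverse-mapping lookup answers the CoW condition check."""
    return len(owners[extent]) > 1

print(is_shared("D"), is_shared("B"))  # False True
```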
> 2) Indirect mapping record
>    Records only the upper-level referencers.
>
>    To get all the owners of an extent, multiple lookups in the
>    reverse mapping tree are needed.
>
>    And obviously, btrfs uses this method.
>
>    Pros:
>    *FAST* owner inheritance, which means fast snapshot creation.
>    (Well, the only advantage I can think of.)
>
>    Cons:
>    *VERY SLOW* backref walk, used by extent deletion, relocation,
>    qgroup and CoW condition checks.
>    (That may also be why btrfs defaults to CoW for data, so it can
>    skip the costly backref walk.)
>
> A more detailed example of the difference between them:
>
> [Basic tree layout]
>            Tree X
>            node A
>           /      \
>      node B      node C
>      /    \      /    \
>  leaf D leaf E leaf F leaf G
>
> Use the above tree X as the snapshot source.
>
> [Snapshot creation: Direct mapping]
> With direct mapping records, if we create snapshot Y, we get:
>
>    Tree X      Tree Y
>    node A     <node H>
>      |  \     /  |
>      |   \   /   |
>      |    \ /    |
>      |     X     |
>      |    / \    |
>      |   /   \   |
>    node B     node C
>    /    \     /    \
> leaf D leaf E leaf F leaf G
>
> We need to create the new node H, and update the owner records of
> nodes B/C and leaves D/E/F/G.
>
> That is to say, we create 1 new node and update 6 references of
> existing nodes/leaves.
> This cost grows quickly for large trees, but the increase is still
> linear.
>
>
> [Snapshot creation: Indirect mapping]
> With an indirect mapping tree, first of all, the reverse mapping tree
> doesn't record the exact owner of each leaf/node, but only its
> parent(s).
>
> So even when tree X exists alone, without snapshot Y, if we need to
> know the owner of leaf D, we only learn that its sole parent is node
> B. We then repeat the query on node B, and so on, until we reach
> node A and find that leaf D is owned by tree X.
>
>    Tree X        ^
>    node A        |   Look upward until
>     /            |   we reach the tree root
>    node B        |   to find the owner
>     /            |   of a leaf/node
>    leaf D        |
>
> So even in the best case, looking up the owner of leaf D takes 3
> lookups: one for leaf D, one for node B, and one for node A (where
> the walk ends).
> Such lookups get more and more complex as extra branches appear in
> the lookup chain.
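The upward walk for leaf D can be sketched as a toy loop over parent pointers. (A deliberately simplified model of the single-parent case, not the real btrfs backref code.)

```python
# Toy upward owner walk, mirroring the leaf D -> node B -> node A chain:
# each level costs one reverse-mapping lookup.

parents = {"D": "B", "B": "A", "A": None}  # None marks a tree root

def find_owner(extent):
    """Walk parent pointers upward, counting lookups until the root."""
    lookups = 0
    while extent is not None:
        lookups += 1              # one lookup per level
        extent = parents[extent]
    return lookups

print(find_owner("D"))  # 3 lookups: leaf D, node B, node A
```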
>
> But this complicated design makes one thing easier: snapshot
> creation.
>    Tree X      Tree Y
>    node A     <node H>
>      |  \     /  |
>      |   \   /   |
>      |    \ /    |
>      |     X     |
>      |    / \    |
>      |   /   \   |
>    node B     node C
>    /    \     /    \
> leaf D leaf E leaf F leaf G
>
> It's still the same tree Y, snapshotted from tree X.
>
> Apart from creating the new node H, we only need to update the
> references of nodes B and C.
>
> So far so good: with indirect mapping we reduced the modifications to
> the reverse mapping tree from 6 to 2.
> And the reduction becomes even more pronounced as the tree grows.
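The 6-vs-2 bookkeeping contrast can be sketched over the example tree. (The data structures are illustrative stand-ins, not btrfs's on-disk format.)

```python
# Contrast the snapshot-creation cost of the two schemes for the
# example tree (root A, nodes B/C, leaves D..G; new snapshot root H).

# Direct mapping: each extent lists all owning trees.
direct = {x: {"X"} for x in "ABCDEFG"}

# Indirect mapping: each extent lists only its parent node(s).
indirect = {"A": [], "B": ["A"], "C": ["A"],
            "D": ["B"], "E": ["B"], "F": ["C"], "G": ["C"]}

def snapshot_direct():
    """Every shared node/leaf needs a new owner record: 6 updates."""
    updates = 0
    for x in "BCDEFG":          # everything below the root is shared
        direct[x].add("Y")
        updates += 1
    direct["H"] = {"Y"}         # plus 1 new node for the new root
    return updates

def snapshot_indirect():
    """Only the root's direct children gain a new parent: 2 updates."""
    indirect["H"] = []          # plus 1 new node for the new root
    updates = 0
    for x in ("B", "C"):
        indirect[x].append("H")
        updates += 1
    return updates

print(snapshot_direct(), snapshot_indirect())  # 6 2
```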
>
> But the price is paid at snapshot deletion:
>
> [Snapshot deletion: Direct mapping]
>
> To delete snapshot Y:
>
>    Tree X      Tree Y
>    node A     <node H>
>      |  \     /  |
>      |   \   /   |
>      |    \ /    |
>      |     X     |
>      |    / \    |
>      |   /   \   |
>    node B     node C
>    /    \     /    \
> leaf D leaf E leaf F leaf G
>
> Quite straightforward: just check the owner of each node/leaf to see
> whether it can be deleted.
>
> For direct mapping, we do the owner lookup in the reverse mapping
> tree 7 times, and find that only node H can be deleted.
>
> That's all: the same amount of work for snapshot creation and
> deletion. Not bad.
>
> [Snapshot deletion: Indirect mapping]
> Here we still need to do the lookups, 7 of them.
>
> But the difference is that each lookup can trigger extra lookups.
>
> For node H, a single lookup suffices, as it is a root.
> But leaf G needs 4 lookups:
>
>    Tree X      Tree Y
>    node A     <node H>
>        \       /
>         \     /
>          \   /
>          node C
>            |
>          leaf G
>
> One for leaf G itself, one for node C, one for node A (a parent of
> node C), and one for node H (the other parent of node C).
>
> Summing up, indirect mapping needs:
>   1 lookup for node H
>   3 lookups each for nodes B and C
>   4 lookups each for leaves D~G
>
> 23 lookup operations in total.
>
> And it only gets worse with more snapshots, and the increase is not
> linear.
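The 23-lookup tally can be reproduced with a toy recursive walk over the example's parent lists. (Illustrative only; the real btrfs backref walk works on extent tree items, not a Python dict.)

```python
# Toy full backref walk: resolve ALL owners of every node/leaf of
# snapshot Y, counting reverse-mapping lookups with no early exit.

# parents[extent] = the nodes referencing it; roots have no parents.
parents = {
    "H": [],                    # root of tree Y
    "A": [],                    # root of tree X
    "B": ["A", "H"], "C": ["A", "H"],
    "D": ["B"], "E": ["B"], "F": ["C"], "G": ["C"],
}

def owner_walk(extent):
    """Count the lookups needed to resolve every owner of an extent."""
    lookups = 1                 # one lookup for the extent itself
    for p in parents[extent]:
        lookups += owner_walk(p)
    return lookups

# Snapshot Y's 7 nodes/leaves: H, B, C, D, E, F, G.
total = sum(owner_walk(x) for x in "HBCDEFG")
print(total)  # 23: 1 (H) + 3+3 (B, C) + 4*4 (D..G)
```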
>
>
> We can apply some optimization though: for the extent deletion above
> we don't really care about all the owners of a node/leaf, only about
> whether the extent is shared.
>
> In that case, once we find that node C is also shared by tree X, we
> don't need to check node H.
> With this optimization, the lookup count drops to 17.
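The shared-extent shortcut can be sketched by stopping the walk as soon as it reaches a root of another tree. (Illustrative only; with the parent order below, tree X's root is reached before tree Y's, which reproduces the 17 quoted above.)

```python
# Toy shared-or-not check: the walk exits early once a foreign root is
# reached, since that already proves the extent is shared.
from collections import deque

parents = {
    "H": [], "A": [],
    "B": ["A", "H"], "C": ["A", "H"],
    "D": ["B"], "E": ["B"], "F": ["C"], "G": ["C"],
}

def shared_check(extent, deleting_root="H"):
    """Count lookups until the extent is proven shared (or fully walked)."""
    lookups = 0
    queue = deque([extent])
    while queue:
        x = queue.popleft()
        lookups += 1
        if not parents[x] and x != deleting_root:
            break               # a root of another tree: shared, stop
        queue.extend(parents[x])
    return lookups

total = sum(shared_check(x) for x in "HBCDEFG")
print(total)  # 17: 1 (H) + 2+2 (B, C) + 3*4 (D..G)
```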
>
>
> But then come qgroup and balance, which can't use such optimization,
> as they need to find all the owners to handle the ownership change
> (the relocation tree for balance, and the qgroup number changes for
> quota).
>
> That's why quota has such an obvious performance impact.
>
>
> So, in short conclusions:
> 1) A snapshot is not a single uniform workload.
>    Creation and deletion are different workloads, at least for btrfs.
>
> 2) Snapshot deletion and qgroup are the biggest costs, by btrfs
>    design.
>    Either reduce the number of snapshots to reduce branching, or
>    disable quota so backref lookups can take the shared-extent
>    shortcut.
>
> Thanks,
> Qu
>
>
> >
> > The 2007 and 2013 Rodeh papers don't do the thorough practical snapshot
> > performance analysis I would expect to see given the assertions in the
> > latter that "BTRFS...supports efficient snapshots..." The former is
> > sufficiently pre-BTRFS that while it does performance analysis of btree
> > clones, it's unclear (to me at least) if the results can be
> > forward-propagated in some way to real-world performance expectations
> > for BTRFS snapshot creation/deletion/modification.
> >
> > Has this analysis been performed somewhere else and I'm just missing it?
> > Also, I'll be glad to comment on my specific setup, kernel version,
> > etc, and discuss pragmatic work-arounds, but I'd like to better
> > understand the high-level performance implications first.
> >
> > Thanks in advance to anyone who can comment on this. I am very inclined
> > to read anything thrown at me, so if there is documentation I failed to
> > read, please just send the link.
> >
> > Best,
> >
> > ellis
>