From: Stefan Behrens <sbehrens@giantdisaster.de>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v8 0/8] Btrfs: introduce a tree for UUID to subvol ID mapping
Date: Thu, 15 Aug 2013 17:11:16 +0200
Message-ID: <cover.1376575048.git.sbehrens@giantdisaster.de>
Mapping UUIDs to subvolume IDs is an expensive operation today. The
current algorithm even has quadratic cost (in the number of existing
subvolumes), which means that it takes minutes to send/receive a
single subvolume if 10,000 subvolumes exist. Even linear cost would
be too much, since it is wasted work: the data structures that map
UUIDs to subvolume IDs are rebuilt every time a btrfs send/receive
instance is started.
So the issue to address is that Btrfs send/receive, as it is today,
does not work when a large number of subvolumes exist.
The table below shows the time it takes on my testbox to send _one_
empty subvolume, depending on the number of subvolumes that exist in
the filesystem.
# of subvols  |  without   | with
in filesystem | UUID tree  | UUID tree
--------------+------------+----------
            2 |  0m00.004s | 0m00.003s
         1000 |  0m07.010s | 0m00.004s
         2000 |  0m28.210s | 0m00.004s
         3000 |  1m04.872s | 0m00.004s
         4000 |  1m56.059s | 0m00.004s
         5000 |  3m00.489s | 0m00.004s
         6000 |  4m27.376s | 0m00.004s
         7000 |  6m08.938s | 0m00.004s
         8000 |  7m54.020s | 0m00.004s
         9000 | 10m05.108s | 0m00.004s
        10000 | 12m47.406s | 0m00.004s
        11000 | 15m05.800s | 0m00.004s
        12000 | 18m00.170s | 0m00.004s
        13000 | 21m39.438s | 0m00.004s
        14000 | 24m54.681s | 0m00.004s
        15000 | 28m09.096s | 0m00.004s
        16000 | 33m08.856s | 0m00.004s
        17000 | 37m10.562s | 0m00.004s
        18000 | 41m44.727s | 0m00.004s
        19000 | 46m14.335s | 0m00.004s
        20000 | 51m55.100s | 0m00.004s
        21000 | 56m54.346s | 0m00.004s
        22000 | 62m53.466s | 0m00.004s
        23000 | 66m57.328s | 0m00.004s
        24000 | 73m59.687s | 0m00.004s
        25000 | 81m24.476s | 0m00.004s
        26000 | 87m11.478s | 0m00.004s
        27000 | 92m59.225s | 0m00.004s
Or as a chart:
http://btrfs.giantdisaster.de/Btrfs-send-recv-perf.pdf
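
As a rough sanity check on the "quadratic" claim, the sketch below
extrapolates the 1000-subvolume measurement with a pure n^2 model so
that it can be compared against other rows of the table. The model
and the names predicted_seconds() and compare() are illustrative
assumptions and are not part of the patches.

```c
#include <assert.h>
#include <stdio.h>

/* Measured time (seconds) to send one empty subvolume without the UUID
 * tree when 1000 subvolumes exist (taken from the table above). */
#define BASE_SUBVOLS 1000.0
#define BASE_SECONDS 7.010

/* Assumed model: time grows with the square of the subvolume count. */
double predicted_seconds(double nr_subvols)
{
	double scale;

	assert(nr_subvols >= 0.0);
	scale = nr_subvols / BASE_SUBVOLS;
	return BASE_SECONDS * scale * scale;
}

/* Print the model's prediction next to a measured value. */
void compare(double nr_subvols, double measured_seconds)
{
	printf("%6.0f subvols: measured %9.3fs, n^2 model %9.3fs\n",
	       nr_subvols, measured_seconds, predicted_seconds(nr_subvols));
}
```

For example, compare(2000, 28.210) puts the measured 28.2s next to a
predicted 28.0s, and compare(10000, 767.406) puts the measured
12m47s next to a predicted 701s, so the measurements grow at least
as fast as the simple n^2 model suggests.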
It is much more efficient to maintain a searchable persistent data
structure in the filesystem, one that is updated whenever a
subvolume/snapshot is created or deleted, and when the received
subvolume UUID is set by the btrfs-receive tool.
Therefore kernel code is added that maintains data structures in
the filesystem that make it possible to quickly search for a given
UUID and to retrieve the subvol ID.
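
To illustrate the principle only: the following userspace sketch
keeps UUID to subvol ID pairs in a sorted array and looks them up
with a binary search, so a lookup costs O(log n) comparisons instead
of a scan over all subvolumes. The patches use a btree inside the
filesystem, not an array; struct uuid_map_entry and the uuid_map_*()
functions are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define UUID_SIZE 16

/* Hypothetical in-memory stand-in for one UUID -> subvol ID item. */
struct uuid_map_entry {
	uint8_t uuid[UUID_SIZE];
	uint64_t subvol_id;
};

/* Order entries by the raw UUID bytes. */
int uuid_map_cmp(const void *a, const void *b)
{
	const struct uuid_map_entry *ea = a;
	const struct uuid_map_entry *eb = b;

	return memcmp(ea->uuid, eb->uuid, UUID_SIZE);
}

/* Sort once, whenever the map is built or updated. */
void uuid_map_sort(struct uuid_map_entry *map, size_t nr)
{
	qsort(map, nr, sizeof(*map), uuid_map_cmp);
}

/*
 * Binary search in a sorted map.
 * Returns 0 and stores the ID in *subvol_id on success, -1 if the
 * UUID is not present.
 */
int uuid_map_lookup(const struct uuid_map_entry *map, size_t nr,
		    const uint8_t *uuid, uint64_t *subvol_id)
{
	struct uuid_map_entry key;
	const struct uuid_map_entry *hit;

	assert(uuid && subvol_id);
	memcpy(key.uuid, uuid, UUID_SIZE);
	key.subvol_id = 0;
	hit = bsearch(&key, map, nr, sizeof(*map), uuid_map_cmp);
	if (!hit)
		return -1;
	*subvol_id = hit->subvol_id;
	return 0;
}
```

The important difference to this sketch is persistence: the array
would have to be rebuilt by every send/receive instance, while a
structure stored in the filesystem is maintained incrementally.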
Now follows the lengthy justification for why a new tree was added
instead of using the existing root tree:
The first approach was not to create another tree for the UUID
items. Instead, the items were simply to go into the top root tree.
Unfortunately this confused the algorithm that assigns the objectid
of subvolumes and snapshots. The reason is that
btrfs_find_free_objectid() calls btrfs_find_highest_objectid() for
the first subvol or snapshot created after mounting a filesystem,
and this function simply searches for the largest used objectid in
the root tree keys to pick the next objectid to assign. Of course,
the UUID keys were always the ones with the highest offset
value, and the next assigned subvol ID was wastefully huge.
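
The effect can be shown with a toy model. The sketch below is not
the kernel code: struct toy_key and the toy_*() functions are
hypothetical, and it assumes that a UUID item stored in the same
tree carries a large, UUID-derived objectid. It only demonstrates
that "largest objectid plus one" yields a huge next ID as soon as
such a key is present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a tree key; only the objectid matters here. */
struct toy_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Return the largest objectid among the keys, or 0 if there are none. */
uint64_t toy_find_highest_objectid(const struct toy_key *keys, size_t nr)
{
	uint64_t highest = 0;
	size_t i;

	assert(keys || nr == 0);
	for (i = 0; i < nr; i++)
		if (keys[i].objectid > highest)
			highest = keys[i].objectid;
	return highest;
}

/* Next ID to hand out: one past the highest objectid that was found. */
uint64_t toy_next_objectid(const struct toy_key *keys, size_t nr)
{
	return toy_find_highest_objectid(keys, nr) + 1;
}
```

With only subvolume keys 256 and 257 in the array, the next ID is
258. Add a single key whose objectid is derived from a UUID, and the
next ID jumps to just above that value.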
Using any other existing tree did not look proper either. A
workaround such as setting the objectid to zero in the UUID item
key and implementing collision handling would either add
limitations (with a btrfs_extend_item() approach to handle
the collisions) or a lot of complexity and source code (if a
collision-free key had to be looked up). Adding new code
that introduces limitations is not good, and adding code that is
complex and lengthy for no good reason is also not good. That is the
justification for introducing a completely new tree.
v1 -> v2:
- All review comments from David Sterba, Josef Bacik and Jan Schmidt
  are addressed.
  The biggest change was to add a mechanism that handles the case
  where the filesystem has been mounted with an older kernel. That
  case is now detected when the filesystem is mounted with a newer
  kernel again, and the UUID tree is updated in the background.
v2 -> v3:
- All review comments from Liu Bo are addressed:
  - shrank the size of the uuid_item.
  - fixed the issue that the uuid-tree was not using the transaction
    block reserve.
v3 -> v4:
- Fixed a bug. A corrupted UUID tree entry could have caused an endless
  loop in the check+rescan thread.
v4 -> v5:
- At the request of several people, changed the way a umount waits
  for the completion of the uuid tree rescan thread. Now a
  struct completion is used instead of a struct semaphore.
v5 -> v6:
- Iterate through the UUID tree using btrfs_next_item() when possible.
- Use the type field in the key to distinguish the UUID tree item types.
- Removed the lookup functions that are only used in the btrfs-progs
  code.
v6 -> v7:
- WARN_ON_ONCE specifically returns the condition.
- Eliminate the sparse warnings that CF=-D__CHECK_ENDIAN__ produces.
- Have callers pass in the key type to the search functions and remove
  the specific search functions.
v7 -> v8:
- Rebase
- Undo the changes that I had done for v5. Don't use a completion.
  Using a completion meant either manually reimplementing a semaphore
  on top of it, or causing bugs (the latter was the case since v5).
  I changed it back to a semaphore, this time making lockdep et al.
  happy and resetting the semaphore to its initial state at the end.
Stefan Behrens (8):
Btrfs: introduce a tree for items that map UUIDs to something
Btrfs: support printing UUID tree elements
Btrfs: create UUID tree if required
Btrfs: maintain subvolume items in the UUID tree
Btrfs: fill UUID tree initially
Btrfs: introduce uuid-tree-gen field
Btrfs: check UUID tree during mount if required
Btrfs: add mount option to force UUID tree checking
fs/btrfs/Makefile | 3 +-
fs/btrfs/ctree.h | 39 +++++-
fs/btrfs/disk-io.c | 58 ++++++++
fs/btrfs/extent-tree.c | 3 +
fs/btrfs/ioctl.c | 76 +++++++++--
fs/btrfs/print-tree.c | 24 ++++
fs/btrfs/super.c | 8 +-
fs/btrfs/transaction.c | 22 ++-
fs/btrfs/uuid-tree.c | 358 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/volumes.c | 258 +++++++++++++++++++++++++++++++++++
fs/btrfs/volumes.h | 2 +
11 files changed, 837 insertions(+), 14 deletions(-)
create mode 100644 fs/btrfs/uuid-tree.c
--
1.8.3.4