From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: defragmenting best practice?
Date: Tue, 12 Sep 2017 13:27:00 -0400
Message-ID: <1e39d1a1-db3a-1925-2bee-629987b22d3a@gmail.com>
In-Reply-To: <20170912162843.GA32233@rus.uni-stuttgart.de>
On 2017-09-12 12:28, Ulli Horlacher wrote:
> On Thu 2017-08-31 (09:05), Ulli Horlacher wrote:
>> When I do a
>> btrfs filesystem defragment -r /directory
>> does it defragment really all files in this directory tree, even if it
>> contains subvolumes?
>> The man page does not mention subvolumes on this topic.
>
> No answer so far :-(
I hadn't seen your original mail, otherwise I probably would have
responded. Sorry about that.
On the note of the original question:
I'm pretty sure that it does recursively operate on nested subvolumes.
The documentation doesn't say otherwise, and not doing so would be
non-intuitive to people who don't know anything about subvolumes.
>
> But I found another problem in the man-page:
>
> Defragmenting with Linux kernel versions < 3.9 or >= 3.14-rc2 as well as
> with Linux stable kernel versions >= 3.10.31, >= 3.12.12 or >= 3.13.4
> will break up the ref-links of COW data (for example files copied with
> cp --reflink, snapshots or de-duplicated data). This may cause
> considerable increase of space usage depending on the broken up
> ref-links.
>
> I am running Ubuntu 16.04 with Linux kernel 4.10 and I have several
> snapshots.
> Therefore, I better should avoid calling "btrfs filesystem defragment -r"?
>
> What is the defragmenting best practice?
That really depends on what you're doing.
First, you need to understand that defrag won't break _all_ reflinks,
just the particular instances you point it at. So, if you have
subvolume A, and snapshots S1 and S2 of that subvolume A, then running
defrag on _just_ subvolume A will break the reflinks between it and the
snapshots, but S1 and S2 will still share with each other any data they
originally shared. If you then take a third snapshot of A, it will share
data with A, but not with S1 or S2 (because A is no longer sharing data
with S1 or S2).
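To make the sharing behavior concrete, here is a toy model in Python.
It is only a sketch: the list-of-extent-IDs representation and the
function names are made up for illustration and are not how btrfs
stores data. It also assumes the worst case, where defrag rewrites
every extent of A (in practice it may leave some extents alone).

```python
import itertools

# Toy model only: a "subvolume" is a list of extent IDs.
_next_extent = itertools.count(1)

def new_subvolume(num_extents):
    """Create a subvolume whose data lives in fresh extents."""
    return [next(_next_extent) for _ in range(num_extents)]

def snapshot(subvol):
    """A snapshot references the same extents as its source."""
    return list(subvol)

def defragment(subvol):
    """Assumed worst case: every extent is rewritten to a new one."""
    return [next(_next_extent) for _ in subvol]

def shared(a, b):
    """Number of extents two subvolumes have in common."""
    return len(set(a) & set(b))

def unique_extents(*subvols):
    """Total extents on disk, counting shared ones once."""
    return len(set().union(*subvols))

A = new_subvolume(4)
S1 = snapshot(A)
S2 = snapshot(A)
before = unique_extents(A, S1, S2)  # everything shared

A = defragment(A)                   # defrag _just_ A
after = unique_extents(A, S1, S2)   # A now has its own copy

S3 = snapshot(A)                    # third snapshot, taken after defrag
print(before, after)
print(shared(S1, S2), shared(A, S1), shared(A, S3), shared(S3, S1))
```

In this model the extent count doubles after the defrag, S1 and S2
keep sharing with each other, and S3 shares only with A.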
Given this behavior, you have in turn three potential cases when talking
about persistent snapshots:
1. You care about minimizing space used, but aren't as worried about
performance. In this case, the only option is to not run defrag at all.
2. You care about performance, but not space usage. In this case,
defragment everything.
3. You care about both space usage and performance. In this case, I
would personally suggest defragmenting only the source subvolume (so
only subvolume A in the above explanation), and doing so on a schedule
that coincides with snapshot rotation. The idea is to defrag just
before you take a snapshot, and at a frequency that gives a good balance
between space usage and performance. As a general rule, if you take
this route, start by doing the defrag on either a monthly basis if
you're doing daily or weekly snapshots, or with every fourth snapshot if
not, and then adjust the interval based on how that impacts your space
usage.
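As a rough sketch of case 3, the Python below builds (but does not
run) the commands for one snapshot rotation, with the defrag placed
just before the snapshot. The function names and the paths in the
usage lines are hypothetical, and the read-only snapshot (-r) is my
assumption rather than something the approach requires.

```python
import shlex

def starting_interval(snapshot_frequency):
    """Suggested starting point for how often to defrag."""
    if snapshot_frequency in ("daily", "weekly"):
        return "monthly"
    return "every 4th snapshot"

def rotation_commands(source, snapshot_path, defrag_due):
    """Build the command lines for one rotation; nothing is executed."""
    cmds = []
    if defrag_due:
        # Defragment only the source subvolume, before snapshotting it,
        # so the new snapshot shares the freshly defragmented extents.
        cmds.append(["btrfs", "filesystem", "defragment", "-r", source])
    # Assumption: snapshots are taken read-only (-r).
    cmds.append(["btrfs", "subvolume", "snapshot", "-r",
                 source, snapshot_path])
    return [shlex.join(c) for c in cmds]

# Hypothetical paths, for illustration only.
for line in rotation_commands("/mnt/A", "/mnt/snaps/S3", defrag_due=True):
    print(line)
```

Once you've adjusted the interval for your own space usage, only the
decision behind defrag_due needs to change.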
Additionally, you can compact free space without defragmenting data or
breaking reflinks by running a full balance on the filesystem.
The tricky part though is that differing workloads are impacted
differently by fragmentation. Using just four generic examples:
* Mostly sequential write focused workloads (like security recording
systems) tend to be impacted by free space fragmentation more than data
fragmentation. Balancing filesystems used for such workloads is likely
to give a noticeable improvement, but defragmenting probably won't help
much.
* Mostly sequential read focused workloads (like a streaming media
server) tend to be the most impacted by data fragmentation, but aren't
generally impacted by free space fragmentation. As a result, defrag
will help a lot here, but balance won't help as much.
* Mostly random write focused workloads (like most database systems or
virtual machines) are often impacted by both free space and data
fragmentation, and are a pathological case for CoW filesystems. Balance
and defrag will help here, but they won't help for long.
* Mostly random read focused workloads (like most non-multimedia desktop
usage) are not impacted much by either aspect, but if you're on a
traditional hard drive they can be impacted significantly by how the
data is spread across the disk. Balance can help here, but only because
it improves data locality, not because it compacts free space.