linux-btrfs.vger.kernel.org archive mirror
From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: defragmenting best practice?
Date: Tue, 12 Sep 2017 13:27:00 -0400
Message-ID: <1e39d1a1-db3a-1925-2bee-629987b22d3a@gmail.com>
In-Reply-To: <20170912162843.GA32233@rus.uni-stuttgart.de>

On 2017-09-12 12:28, Ulli Horlacher wrote:
> On Thu 2017-08-31 (09:05), Ulli Horlacher wrote:
>> When I do a
>> btrfs filesystem defragment -r /directory
>> does it defragment really all files in this directory tree, even if it
>> contains subvolumes?
>> The man page does not mention subvolumes on this topic.
> 
> No answer so far :-(
I hadn't seen your original mail, otherwise I probably would have 
responded.  Sorry about that.

On the note of the original question:
I'm pretty sure that it does recursively operate on nested subvolumes. 
The documentation doesn't say otherwise, and not doing so would be 
non-intuitive to people who don't know anything about subvolumes.
> 
> But I found another problem in the man-page:
> 
>    Defragmenting with Linux kernel versions < 3.9 or >= 3.14-rc2 as well as
>    with Linux stable kernel versions >= 3.10.31, >= 3.12.12 or >= 3.13.4
>    will break up the ref-links of COW data (for example files copied with
>    cp --reflink, snapshots or de-duplicated data). This may cause
>    considerable increase of space usage depending on the broken up
>    ref-links.
> 
> I am running Ubuntu 16.04 with Linux kernel 4.10 and I have several
> snapshots.
> Therefore, should I avoid calling "btrfs filesystem defragment -r"?
> 
> What is the defragmenting best practice?
That really depends on what you're doing.

First, you need to understand that defrag won't break _all_ reflinks, 
just the particular instances you point it at.  So, if you have 
subvolume A, and snapshots S1 and S2 of that subvolume A, then running 
defrag on _just_ subvolume A will break the reflinks between it and the 
snapshots, but S1 and S2 will still share any data they originally shared 
with each other.  If you then take a third snapshot of A, it will share 
data with A, but not with S1 or S2 (because A is no longer sharing data 
with S1 or S2).
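
To make the sharing behavior above concrete, here is a toy model in 
Python.  It is not btrfs code: extents are just integer IDs, the helpers 
snapshot(), defrag() and shared() are hypothetical, and it assumes the 
worst case where defrag rewrites every extent of the subvolume (a real 
defrag run may leave some extents alone).

```python
from itertools import count

_next_extent = count(100)  # source of fresh, never-before-used extent IDs


def snapshot(subvol):
    """A new snapshot references exactly the same extents as its source."""
    return set(subvol)


def defrag(subvol):
    """Toy defrag: rewrite every extent into a fresh one (worst case)."""
    return {next(_next_extent) for _ in subvol}


def shared(x, y):
    """Number of extents two subvolumes have in common."""
    return len(x & y)


a = {1, 2, 3, 4}   # subvolume A, four extents
s1 = snapshot(a)   # snapshots S1 and S2 taken before the defrag
s2 = snapshot(a)

a = defrag(a)      # defragment _just_ subvolume A
s3 = snapshot(a)   # third snapshot, taken after the defrag

print("A/S1:", shared(a, s1))    # A no longer shares data with S1
print("S1/S2:", shared(s1, s2))  # S1 and S2 still share with each other
print("A/S3:", shared(a, s3))    # the new snapshot shares with A
print("S3/S1:", shared(s3, s1))  # but not with the old snapshots
print("extents on disk:", len(a | s1 | s2 | s3))
```

In this model the total number of distinct extents goes from four to 
eight after the defrag, which is the space cost being discussed.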

Given this behavior, you have in turn three potential cases when talking 
about persistent snapshots:

1. You care about minimizing space used, but aren't as worried about 
performance.  In this case, the only option is to not run defrag at all.
2. You care about performance, but not space usage.  In this case, 
defragment everything.
3. You care about both space usage and performance.  In this case, I 
would personally suggest defragmenting only the source subvolume (so 
only subvolume A in the above explanation), and doing so on a schedule 
that coincides with snapshot rotation.  The idea is to defrag just 
before you take a snapshot, and at a frequency that gives a good balance 
between space usage and performance.  As a general rule, if you take 
this route, start by defragmenting monthly if you take daily or weekly 
snapshots, or with every fourth snapshot otherwise, and then adjust the 
interval based on how that impacts your space usage.
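
As a rough sketch of case 3, the following Python helpers encode that 
starting rule and the "defrag just before the snapshot" ordering.  The 
function names, the 30-day stand-in for "monthly", the reading of "daily 
or weekly" as any interval from 1 to 7 days, and the example paths are 
all assumptions of mine.  The commands are only built as argument lists 
here, not executed.

```python
def defrag_interval_days(snapshot_interval_days):
    """Starting interval between defrag runs, in days.

    Daily or weekly snapshots (1 to 7 days apart, an assumption) start
    with a roughly monthly defrag; any other cadence starts with a
    defrag at every fourth snapshot.
    """
    if 1 <= snapshot_interval_days <= 7:
        return 30
    return 4 * snapshot_interval_days


def rotation_commands(source, snapshot_path, do_defrag):
    """Commands for one snapshot rotation, defrag first if requested."""
    commands = []
    if do_defrag:
        # Defragment only the source subvolume, right before snapshotting,
        # so the new snapshot shares the freshly rewritten extents.
        commands.append(["btrfs", "filesystem", "defragment", "-r", source])
    # -r here creates a read-only snapshot.
    commands.append(
        ["btrfs", "subvolume", "snapshot", "-r", source, snapshot_path]
    )
    return commands


if __name__ == "__main__":
    print(defrag_interval_days(1))
    for argv in rotation_commands("/mnt/A", "/mnt/snapshots/S3", True):
        print(" ".join(argv))
```

The returned interval is only a starting point; the idea is still to 
watch space usage afterwards and lengthen or shorten it as needed.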

Additionally, you can compact free space without defragmenting data or 
breaking reflinks by running a full balance on the filesystem.

The tricky part though is that differing workloads are impacted 
differently by fragmentation.  Using just four generic examples:

* Mostly sequential write focused workloads (like security recording 
systems) tend to be impacted by free space fragmentation more than data 
fragmentation.  Balancing filesystems used for such workloads is likely 
to give a noticeable improvement, but defragmenting probably won't help 
much.
* Mostly sequential read focused workloads (like a streaming media 
server) tend to be the most impacted by data fragmentation, but aren't 
generally impacted by free space fragmentation.  As a result, defrag 
will help here a lot, but balance won't as much.
* Mostly random write focused workloads (like most database systems or 
virtual machines) are often impacted by both free space and data 
fragmentation, and are a pathological case for CoW filesystems.  Balance 
and defrag will help here, but they won't help for long.
* Mostly random read focused workloads (like most non-multimedia desktop 
usage) are not impacted much by either aspect, but if you're on a 
traditional hard drive they can be impacted significantly by how the 
data is spread across the disk.  Balance can help here, but only because 
it improves data locality, not because it compacts free space.
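
The four examples above can be summarized as a small lookup table.  This 
is only a sketch in Python: the workload labels, the function name and 
the mount points are made up for illustration, and the yes/no values are 
a simplification of the descriptions above.  A plain "btrfs balance 
start" with no filters is used for the full balance.  As before, the 
commands are built but not run.

```python
# Which maintenance operation is likely to help, per generic workload.
ADVICE = {
    "sequential-write": {"defrag": False, "balance": True},
    "sequential-read": {"defrag": True, "balance": False},
    # Both help for random writes, but not for long.
    "random-write": {"defrag": True, "balance": True},
    # Balance helps random reads only on rotational disks (data locality).
    "random-read": {"defrag": False, "balance": "rotational-only"},
}


def maintenance_commands(workload, mountpoint, rotational=False):
    """Return the btrfs commands suggested for a workload, as argv lists."""
    advice = ADVICE[workload]
    commands = []
    if advice["defrag"]:
        # Reminder: defrag breaks reflinks to snapshots of this data.
        commands.append(
            ["btrfs", "filesystem", "defragment", "-r", mountpoint]
        )
    balance = advice["balance"]
    if balance is True or (balance == "rotational-only" and rotational):
        # No filters given, so this is a full balance.
        commands.append(["btrfs", "balance", "start", mountpoint])
    return commands


if __name__ == "__main__":
    for name in ADVICE:
        print(name, maintenance_commands(name, "/mnt/data", rotational=True))
```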
