From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs fi defrag -r -t 32M? What is actually happening?
Date: Wed, 27 Jul 2016 01:10:35 +0000 (UTC)
Message-ID: <pan$acc$75dc71b8$532c7a5c$47aab598@cox.net>
In-Reply-To: CAD=QJKhXC4OR1BanS2p1Kb0cHunduY9ETPwwhphkeCfHxOXA0Q@mail.gmail.com
Nicholas D Steeves posted on Tue, 26 Jul 2016 19:03:53 -0400 as excerpted:
> Hi,
>
> I've been using btrfs fi defrag without the "-r -t 32M" options for
> regular maintenance. I just learned, in
> Documentation/btrfs-convert.asciidoc, that there is a recommendation
> to run with "-t 32M" after a conversion from ext2/3/4. I then
> cross-referenced this with btrfs-filesystem(8), and found that:
>
> Extents bigger than value given by -t will be skipped, otherwise
> this value is used as a target extent size, but is only advisory
> and may not be reached if the free space is too fragmented. Use 0
> to take the kernel default, which is 256kB but may change in the
> future.
>
> I understand the default behaviour of target extent size of 256kB to
> mean only defragment small files and metadata. Or does this mean that
> the default behaviour is to defragment extent tree metadata >256kB,
> and then defragment the (larger than 256kB) data from many extents
> into a single extent? I was surprised to read this!
>
> What's really happening with this default behaviour? Should everyone
> be using -t with a much larger value to actually defragment their
> databases?
Something about defrag's -t option should really be in the FAQ, as it is
known to be somewhat confusing and to come up from time to time, tho this
is the first time I've seen it in the context of convert.
In general, you are correct in that the larger the value given to -t, the
more defragging you should ultimately get. There's a practical upper
limit, however: the data chunk size, which is nominally 1 GiB (tho on
tiny btrfs it's smaller, and on TB-scale filesystems it can be larger, up
to 8 or 10 GiB IIRC). 32-bit btrfs-progs defrag also had a bug at one
point that (IIRC) broke the parameter if it was set to 2 GiB or more --
that has since been fixed by hard-coding the 32-bit maximum to 1 GiB, I
believe; 64-bit wasn't affected. In any case, 1 GiB is fine, and often
the largest extent btrfs can create anyway, since as I said that's the
normal data chunk size.
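To make that concrete, a sketch of the invocations being discussed (the
mount point /mnt is a hypothetical example; adjust to taste):

```shell
# Recursively defragment, targeting extents up to 1 GiB -- the nominal
# data chunk size, so larger targets rarely buy anything.  Extents
# already larger than the target are skipped.
btrfs filesystem defragment -r -t 1G /mnt

# -t 0 falls back to the kernel default target (256 KiB at the time of
# writing), which mostly coalesces small extents only.
btrfs filesystem defragment -r -t 0 /mnt
```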
And btrfs defrag only deals with data. There's no metadata defrag, tho
balance -m (or whole filesystem) will normally consolidate the metadata
into the fewest (nominally 256 MiB) metadata chunks possible as it
rewrites them.
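A hedged sketch of the balance side of that, again assuming a
hypothetical /mnt mount point:

```shell
# There's no metadata defrag as such, but a metadata-only balance
# rewrites the metadata chunks, consolidating them into as few
# (nominally 256 MiB) chunks as possible.
btrfs balance start -m /mnt

# A whole-filesystem balance rewrites data chunks as well:
btrfs balance start /mnt
```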
In that regard a defrag -t 32M recommendation is reasonable for a
converted filesystem, tho you can certainly go larger... to 1 GiB as I
said.
On a converted filesystem, however, there's possibly the opposite issue
as well. On btrfs, as stated, extents are normally limited to chunk
size, nominally 1 GiB (the reason being chunk management: the
indirection chunks provide is what allows balance to do all the stuff it
can do, like converting between different raid levels), while ext*
apparently has no such limitation. If the initial defrag and balance
after deleting the saved subvolume don't work correctly -- they've been
buggy at times in the past -- the problem blocking a successful full
balance can be huge files with single extents larger than a GiB, which
never got broken up into btrfs-native chunk-sized extents. At times
people have had to temporarily move such files off the filesystem, thus
freeing the over-1-GiB extents they occupied, and then back on,
recreating them with btrfs-native chunk-sized extents, maximum.
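Put together, the post-convert cleanup might look something like the
sketch below. The subvolume name and file paths are illustrative
examples, not something to paste verbatim:

```shell
# 1. Once satisfied the convert worked, delete the saved ext* image
#    subvolume (btrfs-convert's default name for it):
btrfs subvolume delete /mnt/ext2_saved

# 2. Defragment, targeting chunk-sized extents:
btrfs filesystem defragment -r -t 1G /mnt

# 3. Rebalance so chunks get rewritten in btrfs-native layout:
btrfs balance start /mnt

# If the balance still fails on converted files with extents larger
# than 1 GiB, inspect a suspect file's extent layout...
filefrag -v /mnt/huge.db

# ...and move it off the filesystem and back, recreating it with
# chunk-sized extents at most:
mv /mnt/huge.db /elsewhere/ && mv /elsewhere/huge.db /mnt/
```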
Of course all this is in the context of btrfs-convert. But it's not
really a recommended conversion path anyway, tho it's recognized as
ultimately a pragmatic necessity. The reasoning goes like this. Before
you do anything as major as filesystem conversion in the first place,
full backups of anything you wish to keep are strongly recommended,
because it's always possible something will go wrong during the convert.
No sane sysadmin would attempt the conversion without a backup, unless
the data really was defined as worth less than the cost of the space
required to store that backup -- because sane sysadmins recognize that
attempting that sort of risky operation without a backup, by the very
risk involved, defines that data as simply not worth the trouble: they'd
literally prefer to lose the data rather than pay the time/hassle/
resource cost of keeping a backup.
And once the requirement of a full backup (or alternatively, the
admission that the data really isn't worth the hassle) is recognized,
it's far easier and faster to simply mkfs.btrfs a brand new btrfs and
copy everything over from the old ext*, in the process leaving /it/ in
place as the backup. Compare the convert route: you do the backup
anyway (the same copy step you'd do copying the old data to a new
filesystem), then the convert-in-place, then test that it worked, then
delete the saved subvolume with the ext* metadata, then do the defrag
and balance, before you can be sure your btrfs is properly cleaned up,
ready for normal use, and the convert thus fully successful.
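For illustration, the mkfs-and-copy route sketched as commands (device
names and mount points are hypothetical examples):

```shell
# Brand new btrfs, created with whatever options you actually want:
mkfs.btrfs /dev/sdb1
mount /dev/sdb1 /mnt/new

# The existing ext4 stays untouched and becomes the backup:
mount -o ro /dev/sda1 /mnt/old

# Copy everything over; rsync -aHAX also preserves hardlinks,
# ACLs and xattrs:
rsync -aHAX /mnt/old/ /mnt/new/

# Verify the copy, then keep the old ext4 as the backup until the
# new btrfs has proven itself a full replacement.
```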
Meanwhile, even when functioning perfectly, convert, because it /is/
dealing with the data and metadata in-place, isn't going to give you the
flexibility or choices, nor the performance, of a freshly created btrfs,
created with the options you want, and freshly populated with cleanly
copied and thus fully native and defragged data and metadata from the
old, now backup, copy.
So to a sysadmin considering the risks involved, convert gains you
nothing and loses you lots, compared to starting with a brand new
filesystem and copying everything over, thus letting the old filesystem
remain in place as the initial backup until you're confident the data on
the new filesystem is complete and the filesystem functioning properly as
a full replacement for the old, now backup, copy.
Nevertheless, having a convert utility is "nice", and pragmatically
recognized as a practical necessity if btrfs is to eventually supplant
ext* as the assumed Linux default filesystem, because despite all the
wisdom saying otherwise, and despite the risks and disadvantages of
convert-in-place, some people will never have those backups and will
just take the risk -- and without a convert utility, they'd simply
remain on ext*.
(It's worth noting that arguably, those sorts of people shouldn't be
switching to the still maturing filesystem in the first place, as backups
or only testing with "losable" data is still strongly recommended for
those wishing to try btrfs. People who aren't willing to deal with
backups really should be sticking to a rather more mature and stable
filesystem than btrfs in its current state. And once they /are/ willing
to deal with backups, the choice of a brand new btrfs, copying
everything over from the old filesystem (which then becomes the backup),
becomes so much better than convert that there's simply no sane reason
to use convert in the first place. Thus, arguably, the only people
using convert at this point should be those with the specific purpose of
testing it, in order to be sure it's ready for that day when btrfs
really is finally stable enough that it can in clear conscience be
recommended to people without backups, as at least as stable and
problem-free as whatever they were using previously. Tho that day's
likely some years in the future.)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman