From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Peter Zaitsev <pz@percona.com>, Hugo Mills <hugo@carfax.org.uk>,
linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: BTRFS for OLTP Databases
Date: Tue, 7 Feb 2017 10:43:11 -0500
Message-ID: <716741d9-deef-ebc1-b418-da9f8d23d791@gmail.com>
In-Reply-To: <CAGqmi772WF8KqMWUowtG2_8uS7KTu3u4vJkNbhTHams_x13jWg@mail.gmail.com>
On 2017-02-07 10:20, Timofey Titovets wrote:
>>> I think that you have a problem with extent bookkeeping (if I
>>> understand how btrfs manages extents).
>>> So to deal with it, try enabling compression, as compression will force
>>> all extents to be fragmented with size ~128kb.
>>
>> No, it will compress everything in chunks of 128kB, but it will not fragment
>> things any more than they already would have been (it may actually _reduce_
>> fragmentation because there is less data being stored on disk). This
>> representation is a bug in the FIEMAP ioctl, it doesn't understand the way
>> BTRFS represents things properly. IIRC, there was a patch to fix this, but
>> I don't remember what happened with it.
>>
>> That said, in-line compression can help significantly, especially if you
>> have slow storage devices.
>
>
> I mean that:
> You have a 128MB extent, you rewrite random 4k sectors, btrfs will not
> split the 128MB extent, and not free up data (I don't know the internal
> algorithm, so I can't predict when this will happen), and after some
> time, btrfs will rebuild extents, and split the 128MB extent into
> several smaller ones.
> But when you use compression, the allocator rebuilds extents much
> earlier (I think it's because btrfs also operates on that as 128kb
> extents, even if it's a continuous 128MB chunk of data).
>
The allocator has absolutely nothing to do with this; it's a function of
the COW operation. Unless you're using nodatacow, that 128MB extent
will get split the moment the data hits the storage device (either on
the next commit cycle, which is at most 30 seconds away with the default
commit interval, or when fdatasync is called, whichever is sooner). In
the case of compression, it's still one extent (although on disk it will
be less than 128MB) and will be split at _exactly_ the same time under
_exactly_ the same circumstances as an uncompressed extent. IOW, it has
absolutely nothing to do with the extent handling either.
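For anyone who wants to reproduce the write pattern under discussion,
here is a minimal sketch in Python. It only generates the workload
(random 4k rewrites inside an existing file, followed by fdatasync); it
does not inspect extents, and it behaves the same on any filesystem. The
function name, the seed parameter, and the block constant are made up
for illustration.

```python
import os
import random

BLOCK = 4096  # 4k rewrite size, taken from the quoted example


def rewrite_random_blocks(path, count, seed=0):
    """Overwrite `count` random 4k-aligned blocks of an existing file.

    Returns the list of byte offsets that were rewritten. On a COW
    filesystem each rewritten block lands in a new location once the
    data is flushed; this function only produces the writes.
    """
    rng = random.Random(seed)
    nblocks = os.path.getsize(path) // BLOCK
    offsets = []
    fd = os.open(path, os.O_WRONLY)
    try:
        for _ in range(count):
            offset = rng.randrange(nblocks) * BLOCK
            os.pwrite(fd, os.urandom(BLOCK), offset)
            offsets.append(offset)
        # Force the data out now instead of waiting for the next
        # commit cycle.
        os.fdatasync(fd)
    finally:
        os.close(fd)
    return offsets
```

Calling it on a scratch file on a BTRFS mount and comparing runs with
and without compression would be one way to observe the behaviour
described above, assuming you have a trustworthy way to count extents.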
The difference arises in that compressed data effectively has an on-media
block size of 128k, not 16k (the current default block size) or 4k (the
old default). This means that the smallest fragment possible for a file
with in-line compression enabled is 128k, while for a file without it
it's equal to the filesystem block size. A larger minimum fragment size
means that the maximum number of fragments a given file can have is
smaller (8 times smaller, in fact, than without compression when using
the current default block size), which means that there will be less
fragmentation.
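The arithmetic behind that factor of 8 can be checked with a short
Python sketch. The block sizes are the ones quoted in this message; the
128MB file size is borrowed from the example earlier in the thread, and
the function name is just illustrative.

```python
KIB = 1024
MIB = 1024 * KIB


def max_fragments(file_size, min_fragment):
    """Worst-case fragment count: every fragment is as small as allowed."""
    return -(-file_size // min_fragment)  # ceiling division


file_size = 128 * MIB
compressed = max_fragments(file_size, 128 * KIB)  # 128k minimum fragment
block_16k = max_fragments(file_size, 16 * KIB)    # 16k block size
block_4k = max_fragments(file_size, 4 * KIB)      # 4k block size

print("compressed:", compressed)
print("16k blocks:", block_16k, "ratio:", block_16k // compressed)
print("4k blocks: ", block_4k, "ratio:", block_4k // compressed)
```

This is only an upper bound on the fragment count, not a prediction of
how fragmented a real file will end up.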
Some rather complex and tedious math indicates that this is not the
_only_ thing improving performance when using in-line compression, but
it's probably the biggest thing doing so for the workload being discussed.