linux-btrfs.vger.kernel.org archive mirror
From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Peter Zaitsev <pz@percona.com>, Hugo Mills <hugo@carfax.org.uk>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: BTRFS for OLTP Databases
Date: Tue, 7 Feb 2017 10:43:11 -0500
Message-ID: <716741d9-deef-ebc1-b418-da9f8d23d791@gmail.com>
In-Reply-To: <CAGqmi772WF8KqMWUowtG2_8uS7KTu3u4vJkNbhTHams_x13jWg@mail.gmail.com>

On 2017-02-07 10:20, Timofey Titovets wrote:
>>> I think that you have a problem with extent bookkeeping (if I
>>> understand how btrfs manages extents).
>>> To deal with it, try enabling compression, as compression will force
>>> all extents to be fragmented with a size of ~128kB.
>>
>> No, it will compress everything in chunks of 128kB, but it will not fragment
>> things any more than they already would have been (it may actually _reduce_
>> fragmentation because there is less data being stored on disk).  This
>> representation is a bug in the FIEMAP ioctl; it doesn't properly understand
>> the way BTRFS represents things.  IIRC, there was a patch to fix this, but
>> I don't remember what happened with it.
>>
>> That said, in-line compression can help significantly, especially if you
>> have slow storage devices.
>
>
> I mean that:
> You have a 128MB extent and you rewrite random 4k sectors. btrfs will not
> split the 128MB extent and will not free up data (I don't know the internal
> algorithm, so I can't predict when this will happen), and after some time
> btrfs will rebuild the extents and split the 128MB extent into several
> smaller ones.
> But when you use compression, the allocator rebuilds extents much earlier
> (I think it's because btrfs also treats that data as 128kB
> extents, even if it's a contiguous 128MB chunk of data).
>
The allocator has absolutely nothing to do with this; it's a function of
the COW operation.  Unless you're using nodatacow, that 128MB extent
will get split the moment the data hits the storage device: either on
the next commit cycle (at most 30 seconds with the default commit
interval) or when fdatasync is called, whichever is sooner.  In the case
of compression, it's still one extent (although on disk it will be less
than 128MB), and it will be split at _exactly_ the same time under
_exactly_ the same circumstances as an uncompressed extent.  IOW, it has
absolutely nothing to do with the extent handling either.
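
To make the workload concrete, here is a minimal Python sketch of the
write pattern being discussed: random 4k overwrites inside an existing
file, followed by fdatasync.  The helper name overwrite_random_blocks and
its parameters are made up for illustration, and the script only issues
the writes; it does not inspect extents or depend on BTRFS in any way.

```python
import os
import random

BLOCK = 4096  # 4k, the sector size used in the example above


def overwrite_random_blocks(path, count, seed=0):
    """Overwrite `count` random 4k-aligned blocks of `path` in place.

    Returns the list of byte offsets that were rewritten.
    """
    rng = random.Random(seed)
    blocks = os.path.getsize(path) // BLOCK
    offsets = []
    fd = os.open(path, os.O_WRONLY)
    try:
        for _ in range(count):
            offset = rng.randrange(blocks) * BLOCK
            os.pwrite(fd, b"\xff" * BLOCK, offset)
            offsets.append(offset)
        # Push the data to the storage device now instead of waiting
        # for the filesystem's next periodic commit.
        os.fdatasync(fd)
    finally:
        os.close(fd)
    return offsets
```

Run against a large file on a COW filesystem without nodatacow, each
flushed overwrite would be expected to land in a new location, which is
the splitting described above.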

The difference arises in that compressed data effectively has an on-media
block size of 128k, not 16k (the current default block size) or 4k (the
old default).  This means that the smallest fragment possible for a file
with in-line compression enabled is 128k, while for a file without it
it's equal to the filesystem block size.  A larger minimum fragment size
means that the maximum number of fragments a given file can have is
smaller (8 times smaller, in fact, than without compression when using
the current default block size), which means that there will be less
fragmentation.
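
The arithmetic behind that factor can be sketched in a few lines of
Python.  This takes the 128k, 16k and 4k figures from the paragraph above
as given and applies them to the 128MB file from the example; the
function name max_fragments is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def max_fragments(file_size, min_fragment):
    """Upper bound on the fragment count: ceil(file_size / min_fragment)."""
    return -(-file_size // min_fragment)


FILE_SIZE = 128 * MIB

# Worst-case fragment counts for each minimum fragment size.
compressed = max_fragments(FILE_SIZE, 128 * KIB)
uncompressed_16k = max_fragments(FILE_SIZE, 16 * KIB)
uncompressed_4k = max_fragments(FILE_SIZE, 4 * KIB)

print("128k fragments:", compressed)
print("16k fragments: ", uncompressed_16k, "ratio", uncompressed_16k // compressed)
print("4k fragments:  ", uncompressed_4k, "ratio", uncompressed_4k // compressed)
```

With a 16k minimum the worst case is 8 times the compressed one, which
matches the factor quoted above; with a 4k minimum the same calculation
gives a factor of 32.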

Some rather complex and tedious math indicates that this is not the 
_only_ thing improving performance when using in-line compression, but 
it's probably the biggest thing doing so for the workload being discussed.
