From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Peter Zaitsev <pz@percona.com>
Cc: Jeff Mahoney <jeffm@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: BTRFS for OLTP Databases
Date: Tue, 7 Feb 2017 14:54:16 -0500
Message-ID: <fb3187d8-16f5-9a84-d0ce-d048f1ac5cc7@gmail.com>
In-Reply-To: <CA+RUij2QOP=bJos8p6NswZqSdbhwZ7frTdaLtZaiGBc4vKajCg@mail.gmail.com>
On 2017-02-07 13:59, Peter Zaitsev wrote:
> Jeff,
>
> Thank you very much for the explanations. Indeed, it was not clear in
> the documentation - I read it simply as "if you have snapshots enabled,
> nodatacow makes no difference".
>
> I will rebuild the database in this mode from scratch and see how
> performance changes.
>
> So far the most frustrating thing for me has been periodic stalls of
> many seconds (running a sysbench workload). What is most puzzling is
> that I get them even if I run the workload at 50% or less of full
> load, i.e. the database can handle 1000 transactions/sec, I only
> inject 500/sec, and I still get those stalls.
>
> This is where it looks to me like some work is being delayed and then
> it requires a stall of a few seconds to catch up. I wonder if there
> are some configuration options available to play with.
>
> So far I found BTRFS rather "zero configuration" which is great if it
> works but it is also great to have more levers to pull if you're
> having some troubles.
It's worth keeping in mind that there is more to the storage stack than
just the filesystem, and BTRFS tends to be more sensitive to the
behavior of the other components in that stack than most filesystems
are. The stalls you're describing sound more like a symptom of the
brain-dead writeback buffering defaults used by the VFS layer than an
issue with BTRFS itself (although BTRFS does tend to be hit harder by
them than most other filesystems). Try adjusting the
/proc/sys/vm/dirty_* sysctls (there is some pretty good documentation in
Documentation/sysctl/vm.txt in the kernel source) and see if that helps.
By default, dirty_background_ratio is 10 and dirty_ratio is 20, so
background writeback only starts once dirty pages reach about 10% of
available memory, and writers are not throttled until about 20%. That is
an insane amount of data to buffer when you're talking about systems
with 16GB of RAM.
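
To get a feel for what the current settings amount to, a small Python
sketch along these lines may help. It reads the vm.dirty_* knobs from
/proc/sys/vm and estimates the two thresholds in bytes. The function
names and the 16 GiB figure are made up for illustration, and the
arithmetic is only an approximation: the kernel applies the ratios to
available (free plus reclaimable) memory, not to total RAM.

```python
from pathlib import Path

VM_DIR = Path("/proc/sys/vm")
KNOBS = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_background_bytes",
    "dirty_bytes",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)

# Assumption for illustration only: treat 16 GiB as the memory the ratios apply to.
ASSUMED_MEM_BYTES = 16 * 1024**3


def read_knobs(vm_dir=VM_DIR):
    """Return the current vm.dirty_* values, or None for knobs that cannot be read."""
    values = {}
    for name in KNOBS:
        try:
            values[name] = int((vm_dir / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def threshold_bytes(ratio_percent, bytes_value, mem_bytes):
    """Estimate a dirty threshold in bytes.

    A nonzero *_bytes setting takes precedence over the matching *_ratio;
    otherwise the ratio is taken as a percentage of mem_bytes.
    """
    if bytes_value:
        return bytes_value
    if ratio_percent is None:
        return None
    return mem_bytes * ratio_percent // 100


if __name__ == "__main__":
    knobs = read_knobs()
    for name, value in knobs.items():
        print(f"vm.{name} = {value}")
    for label, prefix in (
        ("background writeback starts", "dirty_background"),
        ("writers are throttled", "dirty"),
    ):
        limit = threshold_bytes(
            knobs[f"{prefix}_ratio"], knobs[f"{prefix}_bytes"], ASSUMED_MEM_BYTES
        )
        if limit is not None:
            print(f"{label} at roughly {limit / 1024**2:.0f} MiB of dirty data")
```

If the estimates come out in the gigabyte range, setting
dirty_background_bytes and dirty_bytes to much smaller absolute values
is one thing to experiment with; what counts as a sensible value depends
on the storage underneath, so treat any number as a starting point.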
>
>
> On Tue, Feb 7, 2017 at 1:27 PM, Jeff Mahoney <jeffm@suse.com> wrote:
>> On 2/7/17 8:53 AM, Peter Zaitsev wrote:
>>> Hi,
>>>
>>> I have tried BTRFS from Ubuntu 16.04 LTS for write intensive OLTP MySQL
>>> Workload.
>>>
>>> It did not go very well, ranging from multi-second stalls where no
>>> transactions are completed to, finally, a kernel oops with a "no space
>>> left on device" error message and the filesystem going read-only.
>>>
>>> I'm complete newbie in BTRFS so I assume I'm doing something wrong.
>>>
>>> Do you have any advice on how BTRFS should be tuned for OLTP workload
>>> (large files having a lot of random writes) ? Or is this the case where
>>> one should simply stay away from BTRFS and use something else ?
>>>
>>> One item recommended in some places is "nodatacow"; this, however,
>>> defeats the main purpose I'm looking at BTRFS for. I am interested in
>>> "free" snapshots, which look very attractive for database recovery
>>> scenarios because they allow instant rollback to the previous state.
>>>
>>
>> Hi Peter -
>>
>> There seems to be some misunderstanding around how nodatacow works.
>> Nodatacow doesn't prohibit snapshot use. Snapshots are still allowed
>> and, of course, will cause CoW to happen when a write occurs, but only
>> on the first write. Subsequent writes will not CoW again. This does
>> mean you don't get CRC protection for data, though. Since most
>> databases do this internally, that is probably no great loss. You will
>> get fragmentation, but that's true of any random-write workload on btrfs.
>>
>> Timothy's comment about how extents are accounted is more-or-less
>> correct. The file extents in the file system trees reference data
>> extents in the extent tree. When portions of the data extent are
>> unreferenced, they're not necessarily released. A balance operation
>> will usually split the data extents so that the unused space is released.
>>
>> As for the Oopses with ENOSPC, that's something we'd want to look into
>> if it can be reproduced with a more recent kernel. We shouldn't be
>> getting ENOSPC anywhere sensitive anymore.
>>
>> -Jeff
>>
>> --
>> Jeff Mahoney
>> SUSE Labs
>>
>
>
>