From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
To: Kai Krakow <hurikhan77@gmail.com>, linux-btrfs@vger.kernel.org
Subject: Re: BTRFS for OLTP Databases
Date: Tue, 7 Feb 2017 23:27:51 +0100
Message-ID: <6d69b44a-f4b6-2f05-260f-7a740cdd29c3@mendix.com>
In-Reply-To: <20170207223538.3c37c840@jupiter.sol.kaishome.de>
On 02/07/2017 10:35 PM, Kai Krakow wrote:
> On Tue, 7 Feb 2017 22:25:29 +0100,
> Lionel Bouton <lionel-subscription@bouton.name> wrote:
>
>> On 07/02/2017 at 21:47, Austin S. Hemmelgarn wrote:
>>> On 2017-02-07 15:36, Kai Krakow wrote:
>>>> On Tue, 7 Feb 2017 09:13:25 -0500,
>>>> Peter Zaitsev <pz@percona.com> wrote:
>>>>
>> [...]
>>>>
>>>> Out of curiosity, I see one problem here:
>>>>
>>>> If you're doing snapshots of the live database, each snapshot
>>>> leaves the database files as if the database had been killed
>>>> in-flight, like shutting the system down in the middle of
>>>> writing data.
>>>>
>>>> This is because I think there's no API for user space to subscribe
>>>> to events like a snapshot - unlike e.g. the VSS API (volume
>>>> snapshot service) in Windows. You should put the database into
>>>> frozen state to prepare it for a hotcopy before creating the
>>>> snapshot, then ensure all data is flushed before continuing.
>>> Correct.
>>>>
>>>> I think I've read that btrfs snapshots do not guarantee single
>>>> point in time snapshots - the snapshot may be smeared across a
>>>> longer period of time while the kernel is still writing data. So
>>>> parts of your writes may still end up in the snapshot after
>>>> issuing the snapshot command, instead of in the working copy as
>>>> expected.
>>> Also correct AFAICT, and this needs to be better documented (for
>>> most people, the term snapshot implies atomicity of the
>>> operation).
>>
>> Atomicity can be a relative term. If the snapshot atomicity is
>> relative to barriers but not relative to individual writes between
>> barriers then AFAICT it's fine because the filesystem doesn't make
>> any promise it won't keep even in the context of its snapshots.
>> Consider a power loss: the filesystem's atomicity guarantees can't go
>> beyond what the hardware guarantees, which means not all in-flight
>> writes will reach the disk and partial writes can happen. Modern
>> filesystems will remain consistent though, and if an application using
>> them makes use of f*sync it can provide its own guarantees too. The
>> same should apply to snapshots: any in-flight write may or may not
>> have completed on disk before the snapshot; what matters is that both
>> the snapshot and these writes will be completed after the next barrier
>> (and any robust application will ignore all the in-flight writes it
>> finds in the snapshot if they were part of a batch that should be
>> atomically committed).
>>
>> This is why AFAIK PostgreSQL or MySQL with their default ACID
>> compliant configuration will recover from a BTRFS snapshot in the
>> same way they recover from a power loss.
>
> This is what I meant in my other reply. But this is also why it should
> be documented. Assuming that snapshots are single point in time
> snapshots is wrong, and it can have horrible side effects one
> wouldn't expect.
It depends on what the definition of time is. (whoa!!) A snapshot is
taken of a single point in the lifetime of a filesystem tree (a
generation, the point where a transaction commits)...?
> Taking a snapshot is like a power loss - even though there is no power
> loss. So the database has to be properly configured. It is simply short
> sighted if you don't think about this fact. The documentation should
> really point that fact out.
I'd almost say that it would be short sighted to assume a btrfs snapshot
would *not* behave like a power loss. At least, to me (thinking as a
sysadmin) it feels really weird to think of it in any other way than that.
Oh wait, that's what you mean, or not? What is the thing that the
documentation should point out? I'm not trying to troll; the piled-up
double negations make this discussion a bit hard to read.
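
To make the power-loss analogy concrete, here is a minimal sketch of how an application can treat a snapshot exactly like a crash: it fsyncs every record it appends, and on recovery it ignores a torn or corrupt tail. This is not how PostgreSQL or MySQL implement their logs; the record layout (length plus CRC32 header) and the names `append_record`, `recover` and `HEADER` are assumptions made up for illustration.

```python
import os
import struct
import zlib

# Hypothetical record header: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def append_record(path, payload):
    """Append one record and fsync it before returning."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
        f.flush()
        os.fsync(f.fileno())


def recover(path):
    """Return every complete record; stop at the first torn or corrupt one."""
    with open(path, "rb") as f:
        data = f.read()
    records = []
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        # A short or mismatching payload is an in-flight write: ignore it
        # and everything after it, as after a power loss.
        if len(payload) < length or zlib.crc32(payload) != crc:
            break
        records.append(payload)
        offset = start + length
    return records
```

Running `recover()` on the copy of the file inside a snapshot should then yield the records that were complete at the time of the snapshot, the same way it would after an unclean shutdown. A real implementation would also have to fsync the containing directory after creating the file, which this sketch leaves out.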
Moo
--
Hans van Kranenburg