linux-btrfs.vger.kernel.org archive mirror
From: Lionel Bouton <lionel-subscription@bouton.name>
To: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: BTRFS for OLTP Databases
Date: Tue, 7 Feb 2017 22:25:29 +0100
Message-ID: <7204f2cf-bcb4-c943-b6d2-f9eb4b5b29cf@bouton.name>
In-Reply-To: <7c1a67ce-a62c-36e1-d228-9a1e15e4d16c@gmail.com>

On 07/02/2017 at 21:47, Austin S. Hemmelgarn wrote:
> On 2017-02-07 15:36, Kai Krakow wrote:
>> On Tue, 7 Feb 2017 09:13:25 -0500,
>> Peter Zaitsev <pz@percona.com> wrote:
>>
>>> Hi Hugo,
>>>
>>> For the use case I have in mind, I'm interested in having snapshot(s)
>>> open at all times. Imagine, for example, a snapshot being created every
>>> hour and several of these snapshots kept at all times, providing quick
>>> recovery points to the state of 1, 2, 3 hours ago. In such a case (as I
>>> think you also describe) nodatacow does not provide any advantage.
>>
>> Out of curiosity, I see one problem here:
>>
>> If you're doing snapshots of the live database, each snapshot leaves
>> the database files as if the database had been killed in flight, like
>> shutting the system down in the middle of writing data.
>>
>> This is because I think there's no API for user space to subscribe to
>> events like a snapshot - unlike e.g. the VSS API (volume snapshot
>> service) in Windows. You should put the database into frozen state to
>> prepare it for a hotcopy before creating the snapshot, then ensure all
>> data is flushed before continuing.
> Correct.
>>
>> I think I've read that btrfs snapshots do not guarantee single point in
>> time snapshots - the snapshot may be smeared across a longer period of
>> time while the kernel is still writing data. So parts of your writes
>> may still end up in the snapshot after issuing the snapshot command,
>> instead of in the working copy as expected.
> Also correct AFAICT, and this needs to be better documented (for most
> people, the term snapshot implies atomicity of the operation).

Atomicity can be a relative term. If snapshot atomicity is relative to
barriers but not to individual writes between barriers, then AFAICT it's
fine, because the filesystem doesn't make any promise it won't keep, even
in the context of its snapshots.
Consider a power loss: the filesystem's atomicity guarantees can't go
beyond what the hardware guarantees, which means not all in-flight
writes will reach the disk and partial writes can happen. Modern
filesystems will remain consistent though, and if an application using
them makes use of f*sync it can provide its own guarantees too. The
same should apply to snapshots: any in-flight write may or may not have
completed on disk before the snapshot. What matters is that both the
snapshot and these writes will be completed after the next barrier (and
any robust application will ignore all the in-flight writes it finds in
the snapshot if they were part of a batch that should have been
atomically committed).

This is why, AFAIK, PostgreSQL or MySQL with their default ACID-compliant
configuration will recover from a BTRFS snapshot in the same way they
recover from a power loss.
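
To make the f*sync point concrete, here is a minimal sketch of how an
application can provide its own guarantee. It is not PostgreSQL's or
MySQL's actual code: the record format (length + CRC32 header) and the
names append_record() and recover() are assumptions made up for this
illustration. A record counts as committed only once fsync has returned,
and recovery drops whatever incomplete tail it finds, whether that tail
comes from a power loss or from a snapshot taken mid-write.

```python
import os
import struct
import zlib

# Hypothetical record header: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def append_record(path, payload):
    """Append one record; return only after fsync has completed."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
        f.flush()
        os.fsync(f.fileno())  # the application's own durability point


def recover(path):
    """Return the fully written records and ignore a torn tail."""
    with open(path, "rb") as f:
        data = f.read()
    records = []
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # write was still in flight at crash or snapshot time
        records.append(payload)
        offset = start + length
    return records
```

Running recover() on the copy of the log inside a snapshot gives the
same answer it would give after a crash at that instant: every record
whose fsync had returned, and nothing from a partially written one. A
real database does much more (for example it also syncs the directory
when the log file is first created), so treat this only as a sketch of
the principle.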

Lionel
