From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Martin Raiber <martin@urbackup.org>,
Peter Zaitsev <pz@percona.com>,
linux-btrfs@vger.kernel.org
Subject: Re: BTRFS for OLTP Databases
Date: Wed, 8 Feb 2017 08:32:11 -0500
Message-ID: <f24de2b5-a4c1-a729-2f46-90a1911ef168@gmail.com>
In-Reply-To: <0102015a1de76a82-da5513d7-1cd8-4eff-9e0a-e34aac752e1f-000000@eu-west-1.amazonses.com>
On 2017-02-08 08:26, Martin Raiber wrote:
> On 08.02.2017 14:08 Austin S. Hemmelgarn wrote:
>> On 2017-02-08 07:14, Martin Raiber wrote:
>>> Hi,
>>>
>>> On 08.02.2017 03:11 Peter Zaitsev wrote:
>>>> Out of curiosity, I see one problem here:
>>>> If you're doing snapshots of the live database, each snapshot leaves
>>>> the database files like killing the database in-flight. Like shutting
>>>> the system down in the middle of writing data.
>>>>
>>>> This is because I think there's no API for user space to subscribe to
>>>> events like a snapshot - unlike e.g. the VSS API (volume snapshot
>>>> service) in Windows. You should put the database into frozen state to
>>>> prepare it for a hotcopy before creating the snapshot, then ensure all
>>>> data is flushed before continuing.
>>>>
>>>> I think I've read that btrfs snapshots do not guarantee single point in
>>>> time snapshots - the snapshot may be smeared across a longer period of
>>>> time while the kernel is still writing data. So parts of your writes
>>>> may still end up in the snapshot after issuing the snapshot command,
>>>> instead of in the working copy as expected.
>>>>
>>>> How is this going to be addressed? Is there some snapshot aware API to
>>>> let user space subscribe to such events and do proper preparation? Is
>>>> this planned? LVM could be a user of such an API, too. I think this
>>>> could have nice enterprise-grade value for Linux.
>>>>
>>>> XFS has xfs_freeze and xfs_thaw for this, to prepare LVM snapshots. But
>>>> still, also this needs to be integrated with MySQL to properly work. I
>>>> once (years ago) researched on this but gave up on my plans when I
>>>> planned database backups for our web server infrastructure. We moved to
>>>> creating SQL dumps instead, although there're binlogs which can be used
>>>> to recover to a clean and stable transactional state after taking
>>>> snapshots. But I simply didn't want to fiddle around with properly
>>>> cleaning up binlogs which accumulate horribly much space usage over
>>>> time. The cleanup process requires to create a cold copy or dump of the
>>>> complete database from time to time, only then it's safe to remove all
>>>> binlogs up to that point in time.
>>>
>>> little bit off topic, but I for one would be on board with such an
>>> effort. It "just" needs coordination between the backup
>>> software/snapshot tools, the backed up software and the various snapshot
>>> providers. If you look at the Windows VSS API, this would be a
>>> relatively large undertaking if all the corner cases are taken into
>>> account, like e.g. a database having the database log on a separate
>>> volume from the data, dependencies between different components etc.
>>>
>>> You'll know more about this, but databases usually fsync quite often in
>>> their default configuration, so btrfs snapshots shouldn't be much behind
>>> the properly snapshotted state, so I see the advantages more with
>>> usability and taking care of corner cases automatically.
>> Just my perspective, but BTRFS (and XFS, and OCFS2) already provide
>> reflinking to userspace, and therefore it's fully possible to
>> implement this in userspace. Having a version of the fsfreeze (the
>> generic form of xfs_freeze) stuff that worked on individual sub-trees
>> would be nice from a practical perspective, but implementing it would
>> not be easy by any means, and would be essentially necessary for a
>> VSS-like API. In the meantime though, it is fully possible for the
>> application software to implement this itself without needing anything
>> more from the kernel.
>
> VSS snapshots whole volumes, not individual files (so comparable to an
> LVM snapshot). The sub-folder freeze would be something useful in some
> situations, but duplicating the files+extents might also take too long
> in a lot of situations. You are correct that the kernel features are
> there and what is missing is a user-space daemon, plus a protocol that
> facilitates/coordinates the backups/snapshots.
>
> Sending a FIFREEZE ioctl, taking a snapshot and then thawing it does not
> really help in some situations as e.g. MySQL InnoDB uses O_DIRECT and
> manages its own buffer pool which won't get the FIFREEZE and flush, but
> as said, the default configuration is to flush/fsync on every commit.
OK, there's part of the misunderstanding. You can't FIFREEZE a BTRFS
filesystem and then take a snapshot in it, because the snapshot requires
writing to the filesystem (which the FIFREEZE would prevent, so a script
that tried to do this would deadlock). A new version of the FIFREEZE
ioctl would be needed that operates on subvolumes.
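
For illustration only, here is a rough sketch of the userspace approach
mentioned earlier in the thread: the application quiesces itself, reflinks
its own files, and then resumes, with no filesystem-wide freeze involved.
The names reflink_copy and snapshot_files, and the quiesce/resume hooks, are
hypothetical and not part of any existing tool. The FICLONE value is the one
from <linux/fs.h>; the fallback to a plain copy when cloning is unsupported
is an assumption added here so the sketch also runs on other filesystems.

```python
import fcntl
import os
import shutil

# FICLONE from <linux/fs.h>: _IOW(0x94, 9, int)
FICLONE = 0x40049409


def reflink_copy(src, dst):
    """Clone src to dst, sharing extents where the filesystem allows it.

    Returns True if the clone ioctl succeeded, False if we fell back to
    an ordinary byte-for-byte copy (the fallback is an assumption of this
    sketch, not something the kernel does for you).
    """
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            # The ioctl is issued on the destination fd; the argument is
            # the source fd.
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return True
        except OSError:
            pass  # e.g. not BTRFS/XFS/OCFS2, or src and dst on different filesystems
    shutil.copyfile(src, dst)
    return False


def snapshot_files(paths, dest_dir, quiesce, resume):
    """Copy the given files into dest_dir while the application is quiesced.

    quiesce() and resume() are hypothetical application hooks: quiesce()
    must flush and pause writes, resume() must let them continue.
    Files are stored under their base names, so names must not collide.
    """
    os.makedirs(dest_dir, exist_ok=True)
    quiesce()
    try:
        return {
            p: reflink_copy(p, os.path.join(dest_dir, os.path.basename(p)))
            for p in paths
        }
    finally:
        # Always let the application continue, even if a copy failed.
        resume()
```

Because only the application pauses, the filesystem stays writable and the
clone itself can proceed, which is exactly what a filesystem-wide FIFREEZE
would prevent.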