linux-btrfs.vger.kernel.org archive mirror
From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Martin Raiber <martin@urbackup.org>,
	Peter Zaitsev <pz@percona.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: BTRFS for OLTP Databases
Date: Wed, 8 Feb 2017 08:32:11 -0500
Message-ID: <f24de2b5-a4c1-a729-2f46-90a1911ef168@gmail.com>
In-Reply-To: <0102015a1de76a82-da5513d7-1cd8-4eff-9e0a-e34aac752e1f-000000@eu-west-1.amazonses.com>

On 2017-02-08 08:26, Martin Raiber wrote:
> On 08.02.2017 14:08 Austin S. Hemmelgarn wrote:
>> On 2017-02-08 07:14, Martin Raiber wrote:
>>> Hi,
>>>
>>> On 08.02.2017 03:11 Peter Zaitsev wrote:
>>>> Out of curiosity, I see one problem here:
>>>> If you're doing snapshots of the live database, each snapshot leaves
>>>> the database files as if the database had been killed in-flight, like
>>>> shutting the system down in the middle of writing data.
>>>>
>>>> This is because I think there's no API for user space to subscribe to
>>>> events like a snapshot - unlike e.g. the VSS API (volume snapshot
>>>> service) in Windows. You should put the database into frozen state to
>>>> prepare it for a hotcopy before creating the snapshot, then ensure all
>>>> data is flushed before continuing.
>>>>
>>>> I think I've read that btrfs snapshots do not guarantee single point in
>>>> time snapshots - the snapshot may be smeared across a longer period of
>>>> time while the kernel is still writing data. So parts of your writes
>>>> may still end up in the snapshot after issuing the snapshot command,
>>>> instead of in the working copy as expected.
>>>>
>>>> How is this going to be addressed? Is there some snapshot aware API to
>>>> let user space subscribe to such events and do proper preparation? Is
>>>> this planned? LVM could be a user of such an API, too. I think this
>>>> could have nice enterprise-grade value for Linux.
>>>>
>>>> XFS has xfs_freeze and xfs_thaw for this, to prepare LVM snapshots. But
>>>> still, also this needs to be integrated with MySQL to properly work. I
>>>> once (years ago) researched on this but gave up on my plans when I
>>>> planned database backups for our web server infrastructure. We moved to
>>>> creating SQL dumps instead, although there're binlogs which can be used
>>>> to recover to a clean and stable transactional state after taking
>>>> snapshots. But I simply didn't want to fiddle around with properly
>>>> cleaning up binlogs, which accumulate a horrible amount of space over
>>>> time. The cleanup process requires creating a cold copy or dump of the
>>>> complete database from time to time; only then is it safe to remove
>>>> all binlogs up to that point in time.
>>>
>>> little bit off topic, but I for one would be on board with such an
>>> effort. It "just" needs coordination between the backup
>>> software/snapshot tools, the backed up software and the various snapshot
>>> providers. If you look at the Windows VSS API, this would be a
>>> relatively large undertaking if all the corner cases are taken into
>>> account, like e.g. a database having the database log on a separate
>>> volume from the data, dependencies between different components etc.
>>>
>>> You'll know more about this, but databases usually fsync quite often in
>>> their default configuration, so btrfs snapshots shouldn't be much behind
>>> the properly snapshotted state, so I see the advantages more with
>>> usability and taking care of corner cases automatically.
>> Just my perspective, but BTRFS (and XFS, and OCFS2) already provide
>> reflinking to userspace, and therefore it's fully possible to
>> implement this in userspace.  Having a version of the fsfreeze (the
>> generic form of xfs_freeze) stuff that worked on individual sub-trees
>> would be nice from a practical perspective, but implementing it would
>> not be easy by any means, and would be essentially necessary for a
>> VSS-like API.  In the meantime though, it is fully possible for the
>> application software to implement this itself without needing anything
>> more from the kernel.
>
> VSS snapshots whole volumes, not individual files (so comparable to an
> LVM snapshot). The sub-folder freeze would be something useful in some
> situations, but duplicating the files+extents might also take too long
> in a lot of situations. You are correct that the kernel features are
> there and what is missing is a user-space daemon, plus a protocol that
> facilitates/coordinates the backups/snapshots.
>
> Sending a FIFREEZE ioctl, taking a snapshot and then thawing it does not
> really help in some situations as e.g. MySQL InnoDB uses O_DIRECT and
> manages its own buffer pool which won't get the FIFREEZE and flush, but
> as said, the default configuration is to flush/fsync on every commit.
OK, that's part of the misunderstanding.  You can't FIFREEZE a BTRFS
filesystem and then take a snapshot in it, because creating the snapshot
requires writing to the filesystem.  The FIFREEZE blocks those writes, so
a script that tried to do this would deadlock.  What would be needed is a
new version of the FIFREEZE ioctl that operates on subvolumes.
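
To make the ordering problem concrete, here is a minimal Python sketch.
It is an illustration under stated assumptions, not an existing tool.
It assumes the generic Linux ioctl number layout (as used on x86 and
ARM), where FIFREEZE and FITHAW are _IOWR('X', 119, int) and
_IOWR('X', 120, int).  The helper names freeze, thaw and
snapshot_with_quiesce, and the quiesce/resume callbacks, are
hypothetical.  freeze and thaw need CAP_SYS_ADMIN.  Rather than
freezing the filesystem, the last function asks the application to
quiesce, takes a read-only snapshot with "btrfs subvolume snapshot -r",
and then lets the application resume.

```python
import fcntl
import os
import subprocess


def _iowr(type_char, nr, size):
    """Encode an _IOWR request number (generic Linux layout; an assumption,
    since a few architectures use different direction bits)."""
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


FIFREEZE = _iowr("X", 119, 4)  # freeze the whole filesystem
FITHAW = _iowr("X", 120, 4)    # thaw it again


def _fs_ioctl(mountpoint, request):
    """Issue a filesystem-wide ioctl on the given mount point."""
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, request, 0)
    finally:
        os.close(fd)


def freeze(mountpoint):
    """Block all new writes to the filesystem (requires CAP_SYS_ADMIN)."""
    _fs_ioctl(mountpoint, FIFREEZE)


def thaw(mountpoint):
    """Allow writes to the filesystem again (requires CAP_SYS_ADMIN)."""
    _fs_ioctl(mountpoint, FITHAW)


def snapshot_with_quiesce(source, dest, quiesce, resume, run=subprocess.run):
    """Quiesce the application, snapshot the subvolume, then resume.

    quiesce and resume are hypothetical application-level hooks, e.g.
    asking the database to flush and pause writes. No FIFREEZE is issued,
    because the snapshot itself has to write to the filesystem.
    """
    quiesce()
    try:
        run(["btrfs", "subvolume", "snapshot", "-r", source, dest], check=True)
    finally:
        resume()  # always let the application continue, even on failure
```

Calling freeze() on the mount point and running the snapshot command
before thaw() is exactly the sequence that would hang, so the sketch
deliberately never combines them.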
