* Using noCow with snapshots ?
@ 2014-04-09 11:15 Swâmi Petaramesh
2014-04-09 11:41 ` Hugo Mills
2014-04-09 11:56 ` Duncan
0 siblings, 2 replies; 8+ messages in thread
From: Swâmi Petaramesh @ 2014-04-09 11:15 UTC (permalink / raw)
To: linux-btrfs
Hi,
In the quest for BTRFS and performance, and having received the advice to
"chattr +C" my akonadi DB directory to make it noCow, I would like to be sure
about what will happen when I take a snapshot of the concerned BTRFS
subvolume.
1/ Being noCow, will the database be modified in the snapshot as well,
effectively defeating the snapshot ?
2/ Being snapshotted, will the database be COWed even though it's supposed to
be noCow ?
3/ Are both options mutually incompatible in some more obscure ways ?
I'd like to know where I'm going with this ;-)
TIA and kind regards.
--
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E
* Re: Using noCow with snapshots ?
2014-04-09 11:15 Using noCow with snapshots ? Swâmi Petaramesh
@ 2014-04-09 11:41 ` Hugo Mills
2014-04-09 11:56 ` Duncan
1 sibling, 0 replies; 8+ messages in thread
From: Hugo Mills @ 2014-04-09 11:41 UTC (permalink / raw)
To: Swâmi Petaramesh; +Cc: linux-btrfs
On Wed, Apr 09, 2014 at 01:15:24PM +0200, Swâmi Petaramesh wrote:
> Hi,
>
> In the quest for BTRFS and performance, and having received the advice to
> "chattr +C" my akonadi DB directory to make it noCow, I would like to be sure
> about what will happen when I take a snapshot of the concerned BTRFS
> subvolume.
>
> 1/ Being noCow, will the database be modified in the snapshot as well,
> effectively defeating the snapshot ?
No (see below)
> 2/ Being snapshotted, will the database be COWed even though it's
> supposed to be noCow ?
Yes -- once.
When you make a snapshot of a nodatacow file, the data is shared
between the snapshot and the original as normal. The extents are
reference counted, so the original data now has two references to it.
When one of these copies is written to, the writes are placed
somewhere else on the disk, still marked as nodatacow, and the
reference count is reduced to 1 for each copy again. (Note that this
is done on a per-block basis, although the 30-second transaction
commit will tend to coalesce adjacent blocks to reduce fragmentation;
autodefrag helps here, too).
Basically, a snapshot of a nodatacow file will increase the
reference count for its blocks. A write to a block with a reference
count of more than one will *always* write a new block elsewhere. A
write to a block with a reference count of exactly one will not do so
if the file is marked nodatacow. I hope that's clear.
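The rule described above can be modelled in a few lines of Python. This is only a toy sketch, not btrfs code: the class `ToyVolume` and its methods are made-up names, one "extent" stands for one block, and the transaction commit and autodefrag are ignored.

```python
class ToyVolume:
    """Toy model of extent reference counting; not real btrfs code."""

    def __init__(self):
        self.refcount = {}    # extent id -> number of files referencing it
        self.next_extent = 0

    def _alloc(self):
        ext = self.next_extent
        self.next_extent += 1
        self.refcount[ext] = 1
        return ext

    def create_file(self, nblocks, nodatacow=False):
        return {"blocks": [self._alloc() for _ in range(nblocks)],
                "nodatacow": nodatacow}

    def snapshot(self, f):
        # A snapshot shares every extent and bumps its reference count.
        for ext in f["blocks"]:
            self.refcount[ext] += 1
        return {"blocks": list(f["blocks"]), "nodatacow": f["nodatacow"]}

    def write(self, f, idx):
        """Write block idx of f; return True if a new extent was allocated."""
        old = f["blocks"][idx]
        if f["nodatacow"] and self.refcount[old] == 1:
            return False      # sole owner and nodatacow: overwrite in place
        # Shared extent, or an ordinary COW file: write elsewhere.
        self.refcount[old] -= 1
        if self.refcount[old] == 0:
            del self.refcount[old]
        f["blocks"][idx] = self._alloc()
        return True


vol = ToyVolume()
db = vol.create_file(4, nodatacow=True)
print(vol.write(db, 0))   # no snapshot yet
snap = vol.snapshot(db)
print(vol.write(db, 0))   # first write after the snapshot
print(vol.write(db, 0))   # second write to the same block
```

In this model the first write after the snapshot is the only one that allocates a new extent, and the snapshot keeps pointing at the old one.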
> 3/ Are both options mutually incompatible in some more obscure ways ?
Only as noted above.
Hugo.
--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- There are three mistaikes in this sentance. ---
* Re: Using noCow with snapshots ?
2014-04-09 11:15 Using noCow with snapshots ? Swâmi Petaramesh
2014-04-09 11:41 ` Hugo Mills
@ 2014-04-09 11:56 ` Duncan
2014-04-10 8:22 ` Swâmi Petaramesh
1 sibling, 1 reply; 8+ messages in thread
From: Duncan @ 2014-04-09 11:56 UTC (permalink / raw)
To: linux-btrfs
Swâmi Petaramesh posted on Wed, 09 Apr 2014 13:15:24 +0200 as excerpted:
> In the quest for BTRFS and performance, and having received the advice
> to "chattr +C" my akonadi DB directory to make it noCow, I would like to
> be sure about what will happen when I take a snapshot of the concerned
> BTRFS subvolume.
>
> 1/ Being noCow, will the database be modified in the snapshot as well,
> effectively defeating the snapshot ?
>
> 2/ Being snapshotted, will the database be COWed even though it's
> supposed to be noCow ?
>
> 3/ Are both options mutually incompatible in some more obscure ways ?
>
> I'd like to know where I'm going with this ;-)
Good questions. =:^)
#2. That's from one of the devs when the question came up perhaps a
couple months ago.
On a NOCOW file the first write to a fileblock (4096 bytes) after a
snapshot must still be COW, because the snapshot locks the old version in
place, and now the fileblock has changed, so it MUST be written elsewhere
despite the NOCOW in order to keep the snapshot as it was. However,
the file does retain the NOCOW attribute and additional writes to the
same fileblock will be in-place... until the next snapshot of course.
This is why on filesystems with scripted snapshots as close as a minute
apart (I even saw one guy say he was doing them every 30 seconds!!),
setting NOCOW has very little value -- they aren't NOCOW on the first
write after a snapshot, and with snapshots happening every minute...,
Hourly snapshots are still likely to be a problem on a regularly changing
file, tho with daily snapshots you'd probably save some fragmentation
over the fairly short term anyway, but it'd still be a problem longer
term.
Which is why I suggest putting such files on a separate subvolume and not
snapshotting that subvolume, since snapshots stop at the subvolume
boundary. That gives NOCOW a chance to actually *BE* NOCOW.
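To put rough numbers on the snapshot-frequency point, here is a toy Python model of a single NOCOW fileblock. It is only a sketch under simplifying assumptions: one block, times in seconds, and a write counts as "forced COW" when at least one snapshot was taken since the previous write. The function name `count_forced_cow` is made up for illustration.

```python
def count_forced_cow(write_times, snapshot_times):
    """Count writes to one NOCOW block that must be COWed anyway.

    A write is forced to COW if at least one snapshot was taken since the
    previous write to that block (toy model, one block, times in seconds).
    """
    forced = 0
    last_write = float("-inf")
    snaps = sorted(snapshot_times)
    for t in sorted(write_times):
        if any(last_write < s <= t for s in snaps):
            forced += 1
        last_write = t
    return forced


# One hour of writes to the same block, one write every 10 seconds.
writes = range(10, 3601, 10)

every_minute = count_forced_cow(writes, range(60, 3601, 60))
one_earlier_snapshot = count_forced_cow(writes, [0])
print(every_minute, one_earlier_snapshot)
```

Each forced COW relocates the block, so in this model snapshots taken every minute defeat NOCOW far more often than a single earlier snapshot does. A real database file has many blocks with different write patterns, so treat the counts as an illustration only.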
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Using noCow with snapshots ?
2014-04-09 11:56 ` Duncan
@ 2014-04-10 8:22 ` Swâmi Petaramesh
2014-04-10 13:19 ` George Eleftheriou
2014-04-10 14:58 ` Duncan
0 siblings, 2 replies; 8+ messages in thread
From: Swâmi Petaramesh @ 2014-04-10 8:22 UTC (permalink / raw)
To: linux-btrfs
Thanks Duncan for the perfect explanations.
From this, I understand that I might get both better performance by setting my
akonadi dir to "nocow", and still be able to take a snapshot from time to
time, which is exactly what I need.
Besides this, I'm still wondering about the changes in data security that
turning a database to "NoCow" would bring, i.e. would the data still be well
protected in case of a system crash or power failure ?
I have precious data in there and wouldn't like to jeopardize its security for
a performance gain...
Kind regards.
On Wednesday 9 April 2014 11:56:20 Duncan wrote:
> Good questions. =:^)
>
> #2. That's from one of the devs when the question came up perhaps a
> couple months ago.
>
> On a NOCOW file the first write to a fileblock (4096 bytes) after a
> snapshot must still be COW, because the snapshot locks the old version in
> place, and now the fileblock has changed, so it MUST be written elsewhere
> despite the NOCOW in order to keep the snapshot as it was. However,
> the file does retain the NOCOW attribute and additional writes to the
> same fileblock will be in-place... until the next snapshot of course.
>
> This is why on filesystems with scripted snapshots as close as a minute
> apart (I even saw one guy say he was doing them every 30 seconds!!),
> setting NOCOW has very little value -- they aren't NOCOW on the first
> write after a snapshot, and with snapshots happening every minute...,
> Hourly snapshots are still likely to be a problem on a regularly changing
> file, tho with daily snapshots you'd probably save some fragmentation
> over the fairly short term anyway, but it'd still be a problem longer
> term.
>
> Which is why I suggest putting such files on a separate subvolume and not
> snapshotting that subvolume, since snapshots stop at the subvolume
> boundary. That gives NOCOW a chance to actually *BE* NOCOW.
--
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E
* Re: Using noCow with snapshots ?
2014-04-10 8:22 ` Swâmi Petaramesh
@ 2014-04-10 13:19 ` George Eleftheriou
2014-04-10 14:58 ` Duncan
1 sibling, 0 replies; 8+ messages in thread
From: George Eleftheriou @ 2014-04-10 13:19 UTC (permalink / raw)
To: Swâmi Petaramesh; +Cc: linux-btrfs
> Besides this, I'm still wondering about the changes in data security that
> turning a database to "NoCow" would bring, i.e. would the data still be well
> protected in case of a system crash or power failure ?
>
> I have precious data in there and wouldn't like to jeopardize its security for
> a performance gain...
Quoting from the mount options of the wiki:
nodatacow
Do not copy-on-write data for newly created files, existing files are
unaffected. This also turns off checksumming! IOW, nodatacow implies
nodatasum. datacow is used to ensure the user either has access to the
old version of a file, or to the newer version of the file. datacow
makes sure we never have partially updated files written to disk.
nodatacow gives slight performance boost by directly overwriting data
(like ext[234]), at the expense of potentially getting partially
updated files on system failures. Performance gain is usually < 5%
unless the workload is random writes to large database files, where
the difference can become very large. NOTE: switches off compression !
I don't know if the above applies for "chattr +C" as well... Perhaps
someone more experienced can help.
In any case, if you have precious data in there then, COW or NOCOW,
you should regularly do backups to a separate, discrete filesystem.
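The "partially updated files" risk in the quoted wiki text can be illustrated with a small Python sketch. It is a toy model, not filesystem code: a file is a list of blocks, a crash is simulated by stopping after `crash_after` blocks, and both function names are made up for illustration.

```python
def overwrite_in_place(disk, new, crash_after=None):
    """Overwrite blocks one by one; stop early to simulate a crash."""
    for i, block in enumerate(new):
        if crash_after is not None and i >= crash_after:
            break
        disk[i] = block
    return disk


def copy_on_write(disk, new, crash_after=None):
    """Write new blocks elsewhere; switch over only if all were written."""
    staged = []
    for i, block in enumerate(new):
        if crash_after is not None and i >= crash_after:
            return disk       # crash before commit: old version intact
        staged.append(block)
    return staged             # commit: the file now refers to the new version


old = ["old"] * 4
new = ["new"] * 4

print(overwrite_in_place(list(old), new, crash_after=2))
print(copy_on_write(list(old), new, crash_after=2))
print(copy_on_write(list(old), new))
```

With the simulated crash, the in-place variant ends up as a mix of old and new blocks, while the copy-on-write variant returns either the complete old version or the complete new one.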
* Re: Using noCow with snapshots ?
2014-04-10 8:22 ` Swâmi Petaramesh
2014-04-10 13:19 ` George Eleftheriou
@ 2014-04-10 14:58 ` Duncan
2014-05-07 5:36 ` Russell Coker
1 sibling, 1 reply; 8+ messages in thread
From: Duncan @ 2014-04-10 14:58 UTC (permalink / raw)
To: linux-btrfs
Swâmi Petaramesh posted on Thu, 10 Apr 2014 10:22:15 +0200 as excerpted:
> Thanks Duncan for the perfect explanations.
>
> From this, I understand that I might get both better performance by
> setting my akonadi dir to "nocow", and still be able to take a snapshot
> from time to time, which is exactly what I need.
>
> Besides this, I'm still wondering about the changes in data security
> that turning a database to "NoCow" would bring, i.e. would the data
> still be well protected in case of a system crash or power failure ?
>
> I have precious data in there and wouldn't like to jeopardize its
> security for a performance gain...
As eleftg suggests, the data integrity (and compression) features are
turned off when something is set NOCOW, whether that "something" is an
individual file or the whole subvolume (mounted with nodatacow).
However, that's not as bad a situation as one might initially think.
Many of the applications that do routine internal-file-writes already
have at least basic data integrity management of their own, as they've
been more or less forced to in order to have any stability at all since
filesystems traditionally do not have data integrity management of their
own. This being the case, they're already somewhat prepared to detect
and do limited recovery from file corruption, perhaps losing the last
couple transactions but preserving the data as a whole.
What can happen in the event of a crash is that btrfs and the application
both handle data integrity, but because they're implemented and work
independently, with no knowledge of what the other one is doing, their
restore efforts can "fight". The recovery snapshot that the app took
internally and tries to recover with can be screwed up by btrfs trying to
do its own recovery: due to timing differences between the two mechanisms,
btrfs can restore an image that's half from before the internal
application snapshot and half after. Either mechanism used on its own
would work, but the combination of the two independent mechanisms working
on the same data is actually worse than either one alone.
This risk is compounded by delicate race/timing issues that develop in an
ongoing incoming write stream scenario, where there's data commits in
various stages of the pipeline at any particular point and strict
ordering must be maintained so that new data doesn't get written to
storage before the checksum for the old data is calculated and written to
storage as well. Normally, COW is designed to ensure all this is handled
atomically, that either the old version or the new version is all there,
not some combination of both such that the checksum doesn't match the mix
of data that is actually in the file as it exists on-device, but NOCOW
bypasses that mechanism and does rewrite-in-place, which dramatically
complicates the ordering and timing issues related to getting everything
written in the correct order so the checksum reflects the data that's
actually there.
And of course if you have compression on, this complicates matters
further with yet another step, plus since the new data probably
compresses to a different size than the old, attempting to rewrite-in-
place is essentially impossible.
So turning both compression and datasum integrity checking off and
bypassing both mechanisms entirely when nocow is set, really is the best
choice, letting the application itself manage data integrity, or not, as
it chooses based on what it considers the risk/value ratio of the data in
question.
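The point about compressed data not fitting back into the same place can be seen with a short Python sketch. It uses the standard-library zlib module purely as a stand-in compressor; the block contents are arbitrary examples and nothing here reflects how btrfs lays out compressed extents.

```python
import zlib

BLOCK = 4096

old_block = bytes(BLOCK)              # 4096 zero bytes
new_block = bytes(range(256)) * 16    # 4096 bytes, less repetitive content

old_len = len(zlib.compress(old_block))
new_len = len(zlib.compress(new_block))

print(len(old_block), len(new_block))  # same size before compression
print(old_len, new_len)                # different sizes after compression
```

Both blocks are 4096 bytes uncompressed, but the new one compresses to more bytes than the old one, so it would not fit into the space the old compressed data occupied.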
Tho based on my own experiences, I don't trust akonadi's data integrity
management any farther than I could throw a multi-ton truck, which is to
say not at all, which is why I switched to a different solution that
seems to work far more reliably for me.
But that said, one thing the kmail and akonadi devs *DID* do right is
that the original messages are still saved in the usual maildir format on-
device. All the akonadi database does is cache that data in a form
that's faster to use. So in theory at least, if the database DOES get
corrupted beyond recovery, it's only caching the data that's still on-
device as plain text files, and thus it's simple enough to simply delete
and rebuild the caching database based on the still existing plain text
files that contain the original data.
And actually, that was indeed my experience. I don't think I ever
actually lost data. I just had to rebuild the cache more frequently than
I thought I should, and that was a hassle I made an executive decision
that I simply wasn't going to go thru any longer, since there were other
alternatives that worked better for me, without that hassle.
So YMMV. You appear to want to keep kmail and akonadi, and that's fine.
But there are certain compromises that must be made in order to do so.
As the saying goes, pick your poison. You can either choose a filesystem
other than btrfs to store the akonadi database on and deal with what it
offers (generally standard rewrite-in-place and no filesystem-level-data-
integrity-management), or choose btrfs, with its other options. If you
choose btrfs, you can choose either the normal COW mode with its
performance issues that go along with this type of usage, or one of the
various NOCOW options. If you take the NOCOW option, you can further
choose to do snapshotting and accept the performance and fragmentation
issues that brings altho they'll be somewhat less than with COW as long
as you limit the snapshotting, or use a subvol for these files and don't
snapshot it, instead doing conventional backups for it.
Here, if I were dealing with that type of file (either because I'd chosen
to keep using akonadi or because something else I was using had the same
access pattern), since I tend to use multi-partitioning more than most
already, I'd probably simply use a dedicated partition for those files
and make it something other than btrfs -- I'm familiar with reiserfs but
for that use-case I'd probably try xfs. If for some reason I didn't want
to do the whole separate partition with its different filesystem thing,
I'd use a dedicated btrfs subvolume for it, set the directory and files
NOCOW, and do traditional backups rather than snapshotting for that
subvolume.
But that's just me. It's your system and priorities you have to deal
with, so your choice. =:^)
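For the dedicated-subvolume option, the commands involved can be sketched as follows. The Python below only builds and prints the command lines, it does not run them. The path is a hypothetical example and the helper name `nocow_subvolume_commands` is made up; check the commands against your own layout before running anything, as they need appropriate privileges.

```python
import shlex


def nocow_subvolume_commands(subvol):
    """Return the commands for a dedicated NOCOW subvolume, as argv lists."""
    return [
        # A separate subvolume, so snapshots of the parent stop at its boundary.
        ["btrfs", "subvolume", "create", subvol],
        # +C on a directory affects files created in it afterwards.
        ["chattr", "+C", subvol],
        # Show the attributes of the directory itself to confirm the C flag.
        ["lsattr", "-d", subvol],
    ]


# Hypothetical location for the database files.
for argv in nocow_subvolume_commands("/home/user/akonadi-db"):
    print(shlex.join(argv))
```

Per the chattr man page, the C flag on btrfs should be set on new or empty files, so existing database files would need to be copied into the directory after the flag is set rather than flagged in place.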
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Using noCow with snapshots ?
2014-04-10 14:58 ` Duncan
@ 2014-05-07 5:36 ` Russell Coker
2014-05-07 11:09 ` Duncan
0 siblings, 1 reply; 8+ messages in thread
From: Russell Coker @ 2014-05-07 5:36 UTC (permalink / raw)
To: Duncan, linux-btrfs
How could BTRFS and a database "fight" about data recovery?
BTRFS offers similar guarantees about data durability etc to other journalled filesystems and only differs by having checksums so that while a snapshot might have half the data that was written by an app you at least know that the half will be consistent.
If you had database files on a separate subvol to the database log then you would be at risk of having problems making any sort of consistent snapshot (the Debian approach of /var/log/mysql and /var/lib/mysql is a bad idea). But there would be no difference with LVM snapshots in that regard.
--
Sent from my Samsung Galaxy Note 2 with K-9 Mail.
* Re: Using noCow with snapshots ?
2014-05-07 5:36 ` Russell Coker
@ 2014-05-07 11:09 ` Duncan
0 siblings, 0 replies; 8+ messages in thread
From: Duncan @ 2014-05-07 11:09 UTC (permalink / raw)
To: linux-btrfs
Russell Coker posted on Wed, 07 May 2014 15:36:15 +1000 as excerpted:
> How could BTRFS and a database "fight" about data recovery?
>
> BTRFS offers similar guarantees about data durability etc to other
> journalled filesystems and only differs by having checksums so that
> while a snapshot might have half the data that was written by an app you
> at least know that the half will be consistent.
>
> If you had database files on a separate subvol to the database log then
> you would be at risk of having problems making any sort of consistent
> snapshot (the Debian approach of /var/log/mysql and /var/lib/mysql is a
> bad idea). But there would be no difference with LVM snapshots in that
> regard.
Race conditions having to do with unsynced checkpoints, primarily. And
it's actually the btrfs checksumming that seems to create the problem.
The symptom being reported (tho I can say I've not seen further reports
recently, maybe it's fixed now) was that the checksummed values btrfs
restored as "correct" were considered corrupted by the database or vm.
If the checksums checked out after btrfs did its replay (as they did or
btrfs would error on access), but the databases and VMs were still
reporting corruption, then the explanation that was left was that the
btrfs replay and checksum validation was screwing up the application's
own checksumming validation, which could be explained if the two were
sufficiently out of sync that btrfs fixing its own view was actually
breaking the view as seen by the data validating app.
Tho as I said I've not seen that sort of report in several kernel cycles
now. But I'm not sure whether that's because the issues have been fixed
or for some other reason (maybe everybody experiencing the problem gave
up and switched to some other filesystem now, and the message is out
there well enough that new people see it before they experience and
report the same thing, or similar but everybody's switched to NOCOW now
and knows not to do snapshotting on the NOCOW files, or...).
Regardless, NOCOW and not doing snapshotting (because it triggers COW
anyway) on gig-plus internal-write files remains a very good idea.
(Also, quotas and quota sequence numbers play into the combinational
explosion problem along with snapshot-aware-defrag, too. See the writeup
on that that Dave wrote while he was on paternity leave.)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman