Using noCow with snapshots ?

linux-btrfs.vger.kernel.org archive mirror

* Using noCow with snapshots ?
@ 2014-04-09 11:15 Swâmi Petaramesh
  2014-04-09 11:41 ` Hugo Mills
  2014-04-09 11:56 ` Duncan
  0 siblings, 2 replies; 8+ messages in thread
From: Swâmi Petaramesh @ 2014-04-09 11:15 UTC
  To: linux-btrfs

Hi,

In the quest for BTRFS and performance, and having received the advice to 
"chattr +C" my akonadi DB directory to make it noCow, I would like to be sure 
about what will happen when I take a snapshot of the concerned BTRFS 
subvolume.

1/ Being noCow, will the database be modified in the snapshot as well, 
effectively defeating the snapshot ?

2/ Being snapshotted, will the database be COWed even though it's supposed to 
be noCow ?

3/ Are both options mutually incompatible in some more obscure ways ?

I'd like to know where I'm going with this ;-)

TIA and kind regards.

-- 
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Using noCow with snapshots ?
  2014-04-09 11:15 Using noCow with snapshots ? Swâmi Petaramesh
@ 2014-04-09 11:41 ` Hugo Mills
  2014-04-09 11:56 ` Duncan
  1 sibling, 0 replies; 8+ messages in thread
From: Hugo Mills @ 2014-04-09 11:41 UTC
  To: Swâmi Petaramesh; +Cc: linux-btrfs

On Wed, Apr 09, 2014 at 01:15:24PM +0200, Swâmi Petaramesh wrote:
> Hi,
> 
> In the quest for BTRFS and performance, and having received the advice to 
> "chattr +C" my akonadi DB directory to make it noCow, I would like to be sure 
> about what will happen when I take a snapshot of the concerned BTRFS 
> subvolume.
> 
> 1/ Being noCow, will the database be modified in the snapshot as well, 
> effectively defeating the snapshot ?

   No (see below)

> 2/ Being snapshotted, will the database be COWed even though it's
> supposed to be noCow ?

   Yes -- once.

   When you make a snapshot of a nodatacow file, the data is shared
between the snapshot and the original as normal. The extents are
reference counted, so the original data now has two references to it.

   When one of these copies is written to, the writes are placed
somewhere else on the disk, still marked as nodatacow, and the
reference count is reduced to 1 for each copy again. (Note that this
is done on a per-block basis, although the 30-second transaction
commit will tend to coalesce adjacent blocks to reduce fragmentation;
autodefrag helps here, too).
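To make the coalescing remark concrete, here is a toy Python sketch (not btrfs code) that groups the block numbers dirtied during one transaction into runs of adjacent blocks, each of which could then be written out as a single new extent. The function name `coalesce_dirty_blocks` and the `(start, length)` representation are assumptions made for illustration.

```python
from typing import List, Tuple


def coalesce_dirty_blocks(dirty: List[int]) -> List[Tuple[int, int]]:
    """Group dirty block numbers into (start, length) runs of adjacent blocks.

    Toy model: assumes all blocks dirtied in one transaction are written out
    together, so each run of adjacent blocks can become one new extent.
    """
    runs: List[Tuple[int, int]] = []
    for block in sorted(set(dirty)):  # repeated writes to a block collapse into one
        if runs and runs[-1][0] + runs[-1][1] == block:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)  # adjacent: extend the current run
        else:
            runs.append((block, 1))  # gap: start a new extent
    return runs


print(coalesce_dirty_blocks([7, 3, 4, 5, 10, 11]))  # [(3, 3), (7, 1), (10, 2)]
```

Six scattered block writes end up as three extents rather than six, which is the fragmentation saving the longer commit interval buys.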

   Basically, a snapshot of a nodatacow file will increase the
reference count for its blocks. A write to a block with a reference
count of more than one will *always* write a new block elsewhere. A
write to a block with a reference count of exactly one will not do so
if the file is marked nodatacow. I hope that's clear.
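The reference-counting rule above can be modelled in a few lines. This is a simplified Python sketch, not the kernel's implementation: the `Disk`, `File`, `snapshot` and `write` names are hypothetical, one "block" stands in for a btrfs extent, and freed blocks are never reclaimed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Disk:
    """Toy block store: physical address -> payload and reference count."""
    data: Dict[int, bytes] = field(default_factory=dict)
    refs: Dict[int, int] = field(default_factory=dict)
    next_free: int = 0

    def alloc(self, payload: bytes) -> int:
        addr = self.next_free
        self.next_free += 1
        self.data[addr] = payload
        self.refs[addr] = 1
        return addr


@dataclass
class File:
    blocks: List[int]  # logical block index -> physical address
    nodatacow: bool


def snapshot(disk: Disk, f: File) -> File:
    """A snapshot shares every block and bumps its reference count."""
    for addr in f.blocks:
        disk.refs[addr] += 1
    return File(list(f.blocks), f.nodatacow)


def write(disk: Disk, f: File, index: int, payload: bytes) -> bool:
    """Write one block; return True if it was rewritten in place."""
    addr = f.blocks[index]
    if f.nodatacow and disk.refs[addr] == 1:
        disk.data[addr] = payload  # sole owner and nodatacow: overwrite in place
        return True
    disk.refs[addr] -= 1  # drop this file's share of the old block
    f.blocks[index] = disk.alloc(payload)  # new block elsewhere, refcount 1
    return False


disk = Disk()
db = File([disk.alloc(b"v1")], nodatacow=True)
snap = snapshot(disk, db)
print(write(disk, db, 0, b"v2"))  # False: first write after the snapshot is COW
print(write(disk, db, 0, b"v3"))  # True: later writes go in place again
```

The snapshot keeps seeing `b"v1"` throughout, and after the first write both copies are back to a reference count of 1.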

> 3/ Are both options mutually incompatible in some more obscure ways ?

   Only as noted above.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
           --- There are three mistaikes in this sentance. ---           

* Re: Using noCow with snapshots ?
  2014-04-09 11:15 Using noCow with snapshots ? Swâmi Petaramesh
  2014-04-09 11:41 ` Hugo Mills
@ 2014-04-09 11:56 ` Duncan
  2014-04-10  8:22   ` Swâmi Petaramesh
  1 sibling, 1 reply; 8+ messages in thread
From: Duncan @ 2014-04-09 11:56 UTC
  To: linux-btrfs

Swâmi Petaramesh posted on Wed, 09 Apr 2014 13:15:24 +0200 as excerpted:

> In the quest for BTRFS and performance, and having received the advice
> to "chattr +C" my akonadi DB directory to make it noCow, I would like to
> be sure about what will happen when I take a snapshot of the concerned
> BTRFS subvolume.
> 
> 1/ Being noCow, will the database be modified in the snapshot as well,
> effectively defeating the snapshot ?
> 
> 2/ Being snapshotted, will the database be COWed even though it's
> supposed to be noCow ?
> 
> 3/ Are both options mutually incompatible in some more obscure ways ?
> 
> I'd like to know where I'm going with this ;-)

Good questions. =:^)

On #2, this comes from one of the devs, when the question came up perhaps a 
couple of months ago.

On a NOCOW file the first write to a fileblock (4096 bytes) after a 
snapshot must still be COW, because the snapshot locks the old version in 
place, and now the fileblock has changed, so it MUST be written elsewhere 
despite the NOCOW in order to keep the snapshot as it was.  However, 
the file does retain the NOCOW attribute and additional writes to the 
same fileblock will be in-place... until the next snapshot of course.

This is why, on filesystems with scripted snapshots as close as a minute 
apart (I even saw one guy say he was doing them every 30 seconds!!), 
setting NOCOW has very little value: the file isn't NOCOW on the first 
write after a snapshot, and with snapshots happening every minute, that 
leaves little room for in-place writes between snapshots.  Hourly 
snapshots are still likely to be a problem on a regularly changing file.  
With daily snapshots you'd probably save some fragmentation over the 
fairly short term, but it'd still be a problem longer term.
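A rough way to see why snapshot frequency matters is to count how many writes to a NOCOW file are forced to COW. The Python sketch below assumes a hypothetical workload of uniformly random single-block rewrites; `cow_fraction` and its parameters are illustrative, not a btrfs tool, and the model ignores everything except the "first write after a snapshot" rule.

```python
import random


def cow_fraction(num_blocks: int, writes_per_tick: int, ticks: int,
                 snapshot_every: int, seed: int = 0) -> float:
    """Fraction of writes to a NOCOW file that snapshots force to COW.

    Hypothetical workload: each tick rewrites `writes_per_tick` random blocks;
    a snapshot is taken every `snapshot_every` ticks (starting at tick 0).
    A block stays shared from a snapshot until its next write, which must COW.
    """
    rng = random.Random(seed)
    shared = [False] * num_blocks
    cow = total = 0
    for tick in range(ticks):
        if tick % snapshot_every == 0:
            shared = [True] * num_blocks  # the snapshot shares every block
        for _ in range(writes_per_tick):
            block = rng.randrange(num_blocks)
            total += 1
            if shared[block]:
                cow += 1  # first write since the snapshot: relocated
                shared[block] = False  # refcount back to 1, later writes in place
    return cow / total


for every in (1, 10, 100):
    print(every, round(cow_fraction(1000, 10, 1000, every), 3))
```

With one write per tick and a snapshot every tick, every write is forced to COW (`cow_fraction(100, 1, 50, 1)` is `1.0`); spacing the snapshots further apart lets a growing share of writes land in place.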

Which is why I suggest putting such files on a separate subvolume and not 
snapshotting that subvolume, since snapshots stop at the subvolume 
boundary.  That gives NOCOW a chance to actually *BE* NOCOW.
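The subvolume boundary behaviour can be sketched with a toy in-memory tree. In this hypothetical Python model a `dict` is a directory, `bytes` is a file, and the `Subvolume` class marks a nested subvolume; `snapshot_subvolume` copies a tree but, as btrfs does, leaves nested subvolumes behind as empty directories.

```python
class Subvolume(dict):
    """Marks a directory that is the root of a nested subvolume."""


def snapshot_subvolume(root: dict) -> dict:
    """Copy a subvolume's tree, stopping at nested subvolume boundaries."""
    copy: dict = {}
    for name, node in root.items():
        if isinstance(node, Subvolume):
            copy[name] = {}  # boundary: shows up as an empty directory
        elif isinstance(node, dict):
            copy[name] = snapshot_subvolume(node)  # ordinary directory: recurse
        else:
            copy[name] = node  # file: the same bytes object is shared
    return copy


home = {
    "docs": {"notes.txt": b"hello"},
    ".local": {"share": {"akonadi": Subvolume({"db": b"cache"})}},
}
print(snapshot_subvolume(home))
```

Putting the akonadi directory in its own subvolume therefore keeps it out of snapshots of the parent, so its NOCOW files never pick up the extra references that force COW.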

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: Using noCow with snapshots ?
  2014-04-09 11:56 ` Duncan
@ 2014-04-10  8:22   ` Swâmi Petaramesh
  2014-04-10 13:19     ` George Eleftheriou
  2014-04-10 14:58     ` Duncan
  0 siblings, 2 replies; 8+ messages in thread
From: Swâmi Petaramesh @ 2014-04-10  8:22 UTC
  To: linux-btrfs

Thanks Duncan for the perfect explanations.

From this, I understand that I can get better performance by setting my 
akonadi dir to "nocow" and still take a snapshot from time to 
time, which is exactly what I need.

Besides this, I'm still wondering about the changes in data security that 
turning a database to "NoCow" would bring, i.e. would the data still be well 
protected in case of a system crash or power failure ?

I have precious data in there and wouldn't like to jeopardize its security for 
a performance gain...

Kind regards.


-- 
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E


* Re: Using noCow with snapshots ?
  2014-04-10  8:22   ` Swâmi Petaramesh
@ 2014-04-10 13:19     ` George Eleftheriou
  2014-04-10 14:58     ` Duncan
  1 sibling, 0 replies; 8+ messages in thread
From: George Eleftheriou @ 2014-04-10 13:19 UTC
  To: Swâmi Petaramesh; +Cc: linux-btrfs

> Besides this, I'm still wondering about the changes in data security that
> turning a database to "NoCow" would bring, i.e. would the data still be well
> protected in case of a system crash or power failure ?
>
> I have precious data in there and wouldn't like to jeopardize its security for
> a performance gain...

Quoting from the mount options section of the wiki:

nodatacow
Do not copy-on-write data for newly created files, existing files are
unaffected. This also turns off checksumming! IOW, nodatacow implies
nodatasum. datacow is used to ensure the user either has access to the
old version of a file, or to the newer version of the file. datacow
makes sure we never have partially updated files written to disk.
nodatacow gives slight performance boost by directly overwriting data
(like ext[234]), at the expense of potentially getting partially
updated files on system failures. Performance gain is usually < 5%
unless the workload is random writes to large database files, where
the difference can become very large. NOTE: switches off compression !

I don't know if the above applies to "chattr +C" as well... Perhaps
someone more experienced can help.

In any case, if you have precious data in there then, COW or NOCOW,
you should regularly do backups to a separate, discrete filesystem.
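The "partially updated files" trade-off in the wiki text quoted above can be illustrated with a toy crash simulation. This Python sketch is a model under simplifying assumptions (equal-length block lists, a single pointer switch as the commit), and the function names are hypothetical: in-place overwriting can leave a mix of old and new blocks, while the copy-on-write path exposes either the complete old version or the complete new one.

```python
from typing import List


def overwrite_in_place(blocks: List[bytes], new: List[bytes],
                       crash_after: int) -> List[bytes]:
    """nodatacow-style update: blocks are overwritten one at a time.

    `crash_after` simulates power loss after that many block writes.
    """
    disk = list(blocks)
    for i, payload in enumerate(new[:crash_after]):
        disk[i] = payload  # each write lands directly on the live copy
    return disk


def copy_on_write(blocks: List[bytes], new: List[bytes],
                  crash_after: int) -> List[bytes]:
    """datacow-style update: write a full new copy, then switch one pointer."""
    staged: List[bytes] = []
    for payload in new[:crash_after]:
        staged.append(payload)  # written elsewhere, not yet visible
    if len(staged) == len(new):
        return staged  # commit reached: the pointer now targets the new copy
    return list(blocks)  # crash before commit: the old copy is untouched


old = [b"A1", b"A2", b"A3"]
new = [b"B1", b"B2", b"B3"]
print(overwrite_in_place(old, new, crash_after=2))  # [b'B1', b'B2', b'A3']
print(copy_on_write(old, new, crash_after=2))       # [b'A1', b'A2', b'A3']
```

Since nodatacow also implies nodatasum, btrfs has no checksum left to flag the mixed result; detecting it is up to the application.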

* Re: Using noCow with snapshots ?
  2014-04-10  8:22   ` Swâmi Petaramesh
  2014-04-10 13:19     ` George Eleftheriou
@ 2014-04-10 14:58     ` Duncan
  2014-05-07  5:36       ` Russell Coker
  1 sibling, 1 reply; 8+ messages in thread
From: Duncan @ 2014-04-10 14:58 UTC
  To: linux-btrfs

Swâmi Petaramesh posted on Thu, 10 Apr 2014 10:22:15 +0200 as excerpted:

> Thanks Duncan for the perfect explanations.
> 
> From this, I understand that I might get both better performance by
> setting my akonadi dir to "nocow", and still be able to take a snapshot
> from time to time, which is exactly what I need.
> 
> Besides this, I'm still wondering about the changes in data security
> that turning a database to "NoCow" would bring, i.e. would the data
> still be well protected in case of a system crash or power failure ?
> 
> I have precious data in there and wouldn't like to jeopardize its
> security for a performance gain...

As eleftg suggests, the data integrity (and compression) features are 
turned off when something is set NOCOW, whether that "something" is an 
individual file or the whole subvolume (mounted with nodatacow).

However, that's not as bad a situation as one might initially think.  
Many of the applications that do routine internal-file-writes already 
have at least basic data integrity management of their own, as they've 
been more or less forced to in order to have any stability at all since
filesystems traditionally do not have data integrity management of their 
own.  This being the case, they're already somewhat prepared to detect 
and do limited recovery from file corruption, perhaps losing the last 
couple transactions but preserving the data as a whole.

What can happen in the event of a crash is that btrfs and the application 
both handle data integrity, but because they're implemented and work 
independently, with no knowledge of what the other one is doing, their 
restore efforts can "fight".  The recovery snapshot that the app took 
internally and tries to recover with can be screwed up by btrfs doing its 
own recovery: because of timing differences between the two recovery 
mechanisms, btrfs may restore an image that's half from before the 
internal application snapshot and half from after.  Either mechanism used 
on its own would work, but the combination of the two independent 
mechanisms working on the same data is actually worse than either one 
alone.

This risk is compounded by delicate race/timing issues that develop in an 
ongoing incoming write stream, where there are data commits in various 
stages of the pipeline at any particular point and strict ordering must 
be maintained, so that new data doesn't get written to storage before the 
checksum for the old data is calculated and written to storage as well.  
Normally, COW is designed to ensure all this is handled atomically: either 
the old version or the new version is all there, never some combination 
of both that doesn't match the checksum as the file actually exists 
on-device.  NOCOW bypasses that mechanism and rewrites in place, which 
dramatically complicates getting everything written in the correct order 
so that the checksum reflects the data that's actually there.
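The ordering problem can be shown with a two-write toy model. In this hypothetical Python sketch a data block and its checksum are updated in place by two separate writes, and a crash between them leaves data that no longer matches its stored checksum. `zlib.crc32` stands in for btrfs's default crc32c, and `Block`, `rewrite_in_place` and `verifies` are illustrative names only.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    data: bytes
    csum: int  # checksum stored alongside the block's metadata


def csum(data: bytes) -> int:
    # Stand-in: btrfs defaults to crc32c; zlib.crc32 keeps this stdlib-only.
    return zlib.crc32(data)


def rewrite_in_place(block: Block, new: bytes, crash_between: bool) -> Block:
    """Update data, then checksum, in the same location (two separate writes)."""
    block.data = new  # write 1: data overwritten in place
    if crash_between:
        return block  # power lost before write 2
    block.csum = csum(new)  # write 2: checksum brought up to date
    return block


def verifies(block: Block) -> bool:
    return csum(block.data) == block.csum


print(verifies(rewrite_in_place(Block(b"old", csum(b"old")), b"new", True)))   # False
print(verifies(rewrite_in_place(Block(b"old", csum(b"old")), b"new", False)))  # True
```

Under COW, the new data and its checksum would both be written elsewhere before the commit switches over, so a crash at the same point would leave the old, still-matching pair in place.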

And of course if you have compression on, this complicates matters 
further with yet another step.  Plus, since the new data probably 
compresses to a different size than the old, rewriting it in place is 
essentially impossible.
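A quick way to see why compressed data can't simply be rewritten in place is to compare compressed sizes. This Python sketch uses `zlib` (one of the algorithms btrfs supports; the level is an arbitrary choice here) on 4096-byte blocks, and the helper names are hypothetical.

```python
import random
import zlib

BLOCK_SIZE = 4096


def compressed_len(data: bytes) -> int:
    # Level 3 is arbitrary; any level shows the same effect.
    return len(zlib.compress(data, 3))


def fits_in_place(old_data: bytes, new_data: bytes) -> bool:
    """Could the compressed new data reuse the space of the compressed old data?"""
    return compressed_len(new_data) <= compressed_len(old_data)


rng = random.Random(0)
zeros = bytes(BLOCK_SIZE)
noisy = bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))
print(compressed_len(zeros), compressed_len(noisy))
print(fits_in_place(zeros, noisy))  # False: the new version needs more room
```

A block of zeros compresses to a few dozen bytes while pseudo-random data barely compresses, so an update can need more space than the old version occupied and has to go somewhere else.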

So turning both compression and datasum integrity checking off, and 
bypassing both mechanisms entirely when NOCOW is set, really is the best 
choice.  It lets the application itself manage data integrity, or not, as 
it chooses based on what it considers the risk/value ratio of the data in 
question.

Tho based on my own experiences, I don't trust akonadi's data integrity 
management any farther than I could throw a multi-ton truck, which is to 
say not at all, which is why I switched to a different solution that 
seems to work far more reliably for me.

But that said, one thing the kmail and akonadi devs *DID* do right is 
that the original messages are still saved in the usual maildir format 
on-device.  All the akonadi database does is cache that data in a form 
that's faster to use.  So in theory at least, if the database DOES get 
corrupted beyond recovery, it's only a cache of data that's still on-device 
as plain text files, and it's simple enough to delete the caching 
database and rebuild it from the plain text files that still hold the 
original data.

And actually, that was indeed my experience.  I don't think I ever 
actually lost data.  I just had to rebuild the cache more frequently than 
I thought I should, and that was a hassle I made an executive decision 
that I simply wasn't going to go thru any longer, since there were other 
alternatives that worked better for me, without that hassle.

So YMMV.  You appear to want to keep kmail and akonadi, and that's fine.  
But there are certain compromises that must be made in order to do so.  
As the saying goes, pick your poison.  You can either choose a filesystem 
other than btrfs to store the akonadi database on and deal with what it 
offers (generally standard rewrite-in-place and no filesystem-level data 
integrity management), or choose btrfs, with its other options.  If you 
choose btrfs, you can choose either the normal COW mode, with the 
performance issues that go along with this type of usage, or one of the 
various NOCOW options.  If you take the NOCOW option, you can further 
choose either to keep snapshotting and accept the performance and 
fragmentation issues that brings (altho they'll be somewhat less than 
with COW as long as you limit the snapshotting), or to put these files on 
a subvol you don't snapshot, doing conventional backups for it instead.

Here, if I were dealing with that type of file (either because I'd chosen 
to keep using akonadi or because something else I was using had the same 
access pattern), since I tend to use multi-partitioning more than most 
already, I'd probably simply use a dedicated partition for those files 
and make it something other than btrfs -- I'm familiar with reiserfs but 
for that use-case I'd probably try xfs.  If for some reason I didn't want 
to do the whole separate partition with its different filesystem thing, 
I'd use a dedicated btrfs subvolume for it, set the directory and files 
NOCOW, and do traditional backups rather than snapshotting for that 
subvolume.
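The dedicated-subvolume layout can be written down as a short command plan. The Python sketch below only builds and prints the commands; `btrfs subvolume create`, `chattr +C` and `lsattr -d` are the real tools, but the example path and the helper name `plan_nocow_subvolume` are hypothetical. Per the chattr(1) man page, the C flag on btrfs should be set on new or empty files, so a flag set on the directory only affects files created in it afterwards (for example, a rebuilt akonadi cache).

```python
import shlex
from typing import List


def plan_nocow_subvolume(path: str) -> List[List[str]]:
    """Return the commands for a NOCOW subvolume, without running them.

    Assumptions: `path` does not exist yet and its parent is on btrfs.
    Existing data must be recreated inside the directory afterwards,
    because +C is only reliable on new or empty files.
    """
    return [
        ["btrfs", "subvolume", "create", path],  # snapshots of the parent stop here
        ["chattr", "+C", path],  # files created inside inherit the NOCOW flag
        ["lsattr", "-d", path],  # check: the 'C' flag should be listed
    ]


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    for argv in plan_nocow_subvolume("/home/user/.local/share/akonadi"):
        print(shlex.join(argv))
```

To apply the plan, each argv list could be passed to `subprocess.run(argv, check=True)` by a user with permission to create the subvolume; the subvolume would then be covered by conventional backups rather than snapshots.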

But that's just me.  It's your system and priorities you have to deal 
with, so your choice.  =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: Using noCow with snapshots ?
  2014-04-10 14:58     ` Duncan
@ 2014-05-07  5:36       ` Russell Coker
  2014-05-07 11:09         ` Duncan
  0 siblings, 1 reply; 8+ messages in thread
From: Russell Coker @ 2014-05-07  5:36 UTC
  To: Duncan, linux-btrfs

How could BTRFS and a database "fight" about data recovery?

BTRFS offers similar guarantees about data durability etc to other journalled filesystems and only differs by having checksums so that while a snapshot might have half the data that was written by an app you at least know that the half will be consistent.

If you had database files on a separate subvol from the database log then you would be at risk of having problems making any sort of consistent snapshot (the Debian approach of /var/log/mysql and /var/lib/mysql is a bad idea). But there would be no difference with LVM snapshots in that regard.
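The split log/data snapshot problem can be illustrated with a toy write-ahead commit. In this hypothetical Python model each commit appends to a log and then updates a data file; if the two subvolumes are snapshotted at different moments, the data snapshot can end up ahead of the log snapshot it depends on. The names and the consistency rule are assumptions for illustration, not MySQL's actual recovery logic.

```python
from typing import Dict, List, Tuple


def commit(log: List[int], data: Dict[str, int], txid: int) -> None:
    """Hypothetical write-ahead commit: log record first, then the data file."""
    log.append(txid)
    data["last_txid"] = txid


def run_with_snapshots(snap_log_after: int, snap_data_after: int,
                       commits: int) -> Tuple[List[int], Dict[str, int]]:
    """Snapshot the log and data subvolumes after different numbers of commits."""
    log: List[int] = []
    data: Dict[str, int] = {"last_txid": 0}
    snap_log, snap_data = list(log), dict(data)
    for txid in range(1, commits + 1):
        commit(log, data, txid)
        if txid == snap_log_after:
            snap_log = list(log)  # snapshot of the log subvolume
        if txid == snap_data_after:
            snap_data = dict(data)  # snapshot of the data subvolume
    return snap_log, snap_data


def consistent(snap_log: List[int], snap_data: Dict[str, int]) -> bool:
    """Data may lag the log (replay can fix that) but must not be ahead of it."""
    last_logged = snap_log[-1] if snap_log else 0
    return snap_data["last_txid"] <= last_logged


print(consistent(*run_with_snapshots(3, 3, 5)))  # True: same moment
print(consistent(*run_with_snapshots(2, 4, 5)))  # False: data ahead of log
```

Snapshotting both at the same instant, which a single snapshot of one subvolume holding both gives you, avoids the mismatch; the model makes no distinction between btrfs and LVM snapshots, matching the point above.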

* Re: Using noCow with snapshots ?
  2014-05-07  5:36       ` Russell Coker
@ 2014-05-07 11:09         ` Duncan
  0 siblings, 0 replies; 8+ messages in thread
From: Duncan @ 2014-05-07 11:09 UTC
  To: linux-btrfs

Russell Coker posted on Wed, 07 May 2014 15:36:15 +1000 as excerpted:

> How could BTRFS and a database "fight" about data recovery?
> 
> BTRFS offers similar guarantees about data durability etc to other
> journalled filesystems and only differs by having checksums so that
> while a snapshot might have half the data that was written by an app you
> at least know that the half will be consistent.
> 
> If you had database files on a separate subvol from the database log then
> you would be at risk of having problems making any sort of consistent
> snapshot (the Debian approach of /var/log/mysql and /var/lib/mysql is a
> bad idea). But there would be no difference with LVM snapshots in that
> regard.

Race conditions having to do with unsynced checkpoints, primarily.  And 
it's actually the btrfs checksumming that seems to create the problem.

The symptom being reported (tho I can say I've not seen further reports 
recently, maybe it's fixed now) was that the checksummed values btrfs 
restored as "correct" were considered corrupted by the database or VM.  
If the checksums checked out after btrfs did its replay (as they must 
have, or btrfs would have returned an error on access), but the databases 
and VMs were still reporting corruption, then the explanation left was 
that the btrfs replay and checksum validation was screwing up the 
application's own checksum validation.  That could happen if the two were 
sufficiently out of sync that btrfs fixing its own view actually broke the 
view as seen by the data-validating app.

Tho as I said I've not seen that sort of report in several kernel cycles 
now.  But I'm not sure whether that's because the issues have been fixed 
or for some other reason.  Maybe everybody experiencing the problem gave 
up and switched to some other filesystem, and the message is out there 
well enough that new people see it before they experience and report the 
same thing.  Or maybe everybody's switched to NOCOW now and knows not to 
snapshot the NOCOW files.  Or...

Regardless, NOCOW and not snapshotting (because snapshots trigger COW 
anyway) on gig-plus internal-write files remains a very good idea.  
(Also, quotas and quota sequence numbers play into the combinatorial 
explosion problem along with snapshot-aware defrag.  See the writeup on 
that which Dave wrote while he was on paternity leave.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
