From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Using noCow with snapshots ?
Date: Thu, 10 Apr 2014 14:58:04 +0000 (UTC)
Message-ID: <pan$9fcd4$1023a4e5$d241126e$285ca14@cox.net>
In-Reply-To: 3839313.LSaoXm11Qk@zafu
Swâmi Petaramesh posted on Thu, 10 Apr 2014 10:22:15 +0200 as excerpted:
> Thanks Duncan for the perfect explanations.
>
> From this, I understand that I might get both better performance by
> setting my akonadi dir to "nocow", and still be able to take a snapshot
> from time to time, which is exactly what I need.
>
> Besides this, I'm still wondering about the changes in data security
> that turning a database to "NoCow" would bring, i.e. would the data
> still be well protected in case of a system crash or power failure ?
>
> I have precious data in there and wouldn't like to jeopardize its
> security for a performance gain...
As eleftg suggests, the data integrity (and compression) features are
turned off when something is set NOCOW, whether that "something" is an
individual file or the whole filesystem (mounted with nodatacow).
However, that's not as bad a situation as one might initially think.
Many of the applications that do routine internal-file-writes already
have at least basic data integrity management of their own, as they've
been more or less forced to in order to have any stability at all, since
filesystems traditionally do not have data integrity management of their
own. This being the case, they're already somewhat prepared to detect
and do limited recovery from file corruption, perhaps losing the last
couple of transactions but preserving the data as a whole.
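To make "detect and do limited recovery" a bit more concrete, here's a minimal Python sketch of the general idea: an append-only log where each record carries its own CRC32, and recovery keeps everything up to the first record that fails to verify. The record layout and the names append_record and recover are made up for illustration. This is not how akonadi or any particular database actually does it.

```python
import struct
import zlib

# Hypothetical record layout: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")

def append_record(log: bytes, payload: bytes) -> bytes:
    """Return the log with one checksummed record appended."""
    return log + HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def recover(log: bytes) -> list:
    """Return every record up to the first torn or corrupt one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn or corrupt record: keep only what verified
        records.append(payload)
        offset = start + length
    return records

log = b""
for txn in (b"txn-1", b"txn-2", b"txn-3"):
    log = append_record(log, txn)

torn = log[:-2]  # simulate a crash that cut the last write short
print(recover(torn))  # the last transaction is lost, the rest survive
```

The point is only that the app loses the tail, not the whole file, and that it does so without any help from the filesystem.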
What can happen in the event of a crash is that btrfs and the application
both try to handle data integrity, but because they're implemented and
work independently, with no knowledge of what the other one is doing,
their restore efforts can "fight". The recovery snapshot that the app
took internally and tries to recover with can be screwed up by btrfs
trying to do its own recovery: due to timing differences between the two
recovery mechanisms, btrfs may restore an image that's half from before
the internal application snapshot and half from after. Were either
mechanism to be used on its own, things would work, but the combination
of the two independent mechanisms working on the same data is actually
worse than either one alone.
This risk is compounded by delicate race/timing issues that develop in an
ongoing incoming write stream scenario, where there are data commits in
various stages of the pipeline at any particular point, and strict
ordering must be maintained so that new data doesn't get written to
storage before the checksum for the old data is calculated and written to
storage as well. Normally, COW is designed to ensure all this is handled
atomically, so that either the old version or the new version is all
there, not some combination of both where the checksum doesn't match the
mix of data that actually exists in the file on-device. NOCOW bypasses
that mechanism and does rewrite-in-place, which dramatically complicates
the ordering and timing issues related to getting everything written in
the correct order so the checksum reflects the data that's actually
there.
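A toy Python simulation of that difference, for anyone who wants to see it rather than take my word for it. This is only a model under simplifying assumptions (two 4-byte "blocks", one CRC32 over the whole file, a crash after the first block write). The function names are hypothetical and none of it reflects the actual btrfs code paths.

```python
import zlib

def checksum(blocks):
    """One checksum over the whole (tiny) file."""
    return zlib.crc32(b"".join(blocks))

def overwrite_in_place(blocks, new_blocks, crash_after):
    """Rewrite blocks one at a time, stopping early to simulate a crash."""
    result = list(blocks)
    for i, blk in enumerate(new_blocks):
        if i == crash_after:
            break
        result[i] = blk
    return result

def cow_update(extents, pointer, new_blocks, crash_before_switch):
    """Write new data elsewhere, then switch the pointer in a single step."""
    extents = dict(extents)
    extents["new"] = list(new_blocks)
    if crash_before_switch:
        return extents, pointer  # the old version is still fully intact
    return extents, "new"

old = [b"AAAA", b"BBBB"]
new = [b"aaaa", b"bbbb"]
valid = {checksum(old), checksum(new)}

# Rewrite-in-place: a crash mid-update leaves a mix of old and new blocks.
torn = overwrite_in_place(old, new, crash_after=1)
torn_ok = checksum(torn) in valid

# COW: a crash before the pointer switch still shows the old version.
extents, ptr = cow_update({"old": old}, "old", new, crash_before_switch=True)
cow_ok = checksum(extents[ptr]) in valid

print("in-place matches a stored checksum:", torn_ok)
print("cow matches a stored checksum:", cow_ok)
```

With rewrite-in-place the on-device content matches neither the old nor the new checksum, which is exactly the mismatch that checksumming NOCOW data would keep tripping over.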
And of course if you have compression on, this complicates matters
further with yet another step, plus since the new data probably
compresses to a different size than the old, attempting to rewrite-in-
place is essentially impossible.
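The size problem is easy to demonstrate. Here's a small Python sketch using zlib, chosen only because it ships in the standard library. The 4096-byte "extent" and the sample data are arbitrary assumptions for illustration.

```python
import zlib

EXTENT = 4096  # arbitrary size picked for the example

old = b"a" * EXTENT                        # highly repetitive data
new = bytes(range(256)) * (EXTENT // 256)  # same length, less repetitive

old_c = zlib.compress(old)
new_c = zlib.compress(new)

# The uncompressed sizes are identical, the compressed sizes are not.
print("uncompressed:", len(old), len(new))
print("compressed:  ", len(old_c), len(new_c))

# The new compressed data would have to fit in the space the old took.
fits_in_place = len(new_c) <= len(old_c)
print("new data fits in the old compressed space:", fits_in_place)
```

Same-size data in, different-size data out, so there's no fixed slot on the device to rewrite into.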
So turning both compression and datasum integrity checking off and
bypassing both mechanisms entirely when nocow is set, really is the best
choice, letting the application itself manage data integrity, or not, as
it chooses based on what it considers the risk/value ratio of the data in
question.
Tho based on my own experiences, I don't trust akonadi's data integrity
management any farther than I could throw a multi-ton truck, which is to
say not at all, which is why I switched to a different solution that
seems to work far more reliably for me.
But that said, one thing the kmail and akonadi devs *DID* do right is
that the original messages are still saved in the usual maildir format on-
device. All the akonadi database does is cache that data in a form
that's faster to use. So in theory at least, if the database DOES get
corrupted beyond recovery, it's only caching the data that's still
on-device as plain text files, and thus it's easy enough to delete the
caching database and rebuild it from the still existing plain text files
that contain the original data.
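As an illustration of why a cache over maildir is disposable, here's a rough Python sketch that throws away and rebuilds a tiny index from the maildir files, using the standard library's mailbox and sqlite3 modules. The msg_cache table and rebuild_cache function are hypothetical and have nothing to do with akonadi's real schema or tooling. The demo maildir is a temporary one created on the spot.

```python
import mailbox
import os
import sqlite3
import tempfile

def rebuild_cache(maildir_path, db):
    """Drop and repopulate a throwaway index from the maildir files."""
    db.execute("DROP TABLE IF EXISTS msg_cache")
    db.execute("CREATE TABLE msg_cache (key TEXT PRIMARY KEY, subject TEXT)")
    box = mailbox.Maildir(maildir_path, create=False)
    count = 0
    for key, msg in box.iteritems():
        subject = str(msg["Subject"] or "")
        db.execute("INSERT INTO msg_cache VALUES (?, ?)", (key, subject))
        count += 1
    db.commit()
    return count

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical demo maildir. The path must not exist yet, so that
    # create=True also creates the tmp/new/cur subdirectories.
    path = os.path.join(tmp, "Maildir")
    box = mailbox.Maildir(path, create=True)
    box.add("Subject: first\n\nbody one\n")
    box.add("Subject: second\n\nbody two\n")

    db = sqlite3.connect(":memory:")
    first_count = rebuild_cache(path, db)
    second_count = rebuild_cache(path, db)  # "corrupted"? just rebuild it
    subjects = sorted(row[0] for row in db.execute("SELECT subject FROM msg_cache"))
    print(first_count, second_count, subjects)
```

The messages themselves never pass thru the cache on their way to storage, so losing the cache costs rebuild time, not mail.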
And actually, that was indeed my experience. I don't think I ever
actually lost data. I just had to rebuild the cache more frequently than
I thought I should, and I made an executive decision that I simply wasn't
going to go thru that hassle any longer, since there were other
alternatives that worked better for me.
So YMMV. You appear to want to keep kmail and akonadi, and that's fine.
But there are certain compromises that must be made in order to do so.
As the saying goes, pick your poison. You can either choose a filesystem
other than btrfs to store the akonadi database on and deal with what it
offers (generally standard rewrite-in-place and no filesystem-level-data-
integrity-management), or choose btrfs, with its other options. If you
choose btrfs, you can choose either the normal COW mode with its
performance issues that go along with this type of usage, or one of the
various NOCOW options. If you take the NOCOW option, you can further
choose to do snapshotting and accept the performance and fragmentation
issues that brings altho they'll be somewhat less than with COW as long
as you limit the snapshotting, or use a subvol for these files and don't
snapshot it, instead doing conventional backups for it.
Here, if I were dealing with that type of file (either because I'd chosen
to keep using akonadi or because something else I was using had the same
access pattern), since I tend to use multi-partitioning more than most
already, I'd probably simply use a dedicated partition for those files
and make it something other than btrfs -- I'm familiar with reiserfs but
for that use-case I'd probably try xfs. If for some reason I didn't want
to do the whole separate partition with its different filesystem thing,
I'd use a dedicated btrfs subvolume for it, set the directory and files
NOCOW, and do traditional backups rather than snapshotting for that
subvolume.
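For the dedicated-subvolume route, the steps would look roughly like the Python sketch below. It only builds and prints the commands by default (dry run). The path is a placeholder I made up, so substitute wherever your akonadi data actually lives. Keep in mind that chattr +C on a directory is only inherited by files created afterward, so existing files would need to be copied in fresh rather than just moved into place.

```python
import subprocess

def nocow_subvolume_commands(path):
    """Commands to create a subvolume, mark it NOCOW, and show its flags."""
    return [
        ["btrfs", "subvolume", "create", path],
        ["chattr", "+C", path],   # new files in here inherit NOCOW
        ["lsattr", "-d", path],   # should list the 'C' attribute
    ]

def run(commands, dry_run=True):
    """Print the commands, or execute them if dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

# Placeholder path, an assumption for illustration only.
commands = nocow_subvolume_commands("/mnt/data/akonadi-nocow")
run(commands)  # dry run: prints the three commands without executing them
```

Backups of that subvolume would then be done with whatever conventional tool you already trust, not with snapshots.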
But that's just me. It's your system and priorities you have to deal
with, so your choice. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman