* Using noCow with snapshots ?
From: Swâmi Petaramesh @ 2014-04-09 11:15 UTC
To: linux-btrfs

Hi,

In the quest for BTRFS and performance, and having received the advice to
"chattr +C" my akonadi DB directory to make it noCow, I would like to be sure
about what will happen when I take a snapshot of the concerned BTRFS
subvolume.

1/ Being noCow, will the database be modified in the snapshot as well,
effectively defeating the snapshot ?

2/ Being snapshotted, will the database be COWed even though it's supposed to
be noCow ?

3/ Are both options mutually incompatible in some more obscure ways ?

I'd like to know where I'm going with this ;-)

TIA and kind regards.

--
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E

* Re: Using noCow with snapshots ?
From: Hugo Mills @ 2014-04-09 11:41 UTC
To: Swâmi Petaramesh; +Cc: linux-btrfs

On Wed, Apr 09, 2014 at 01:15:24PM +0200, Swâmi Petaramesh wrote:
> Hi,
>
> In the quest for BTRFS and performance, and having received the advice to
> "chattr +C" my akonadi DB directory to make it noCow, I would like to be sure
> about what will happen when I take a snapshot of the concerned BTRFS
> subvolume.
>
> 1/ Being noCow, will the database be modified in the snapshot as well,
> effectively defeating the snapshot ?

   No (see below)

> 2/ Being snapshotted, will the database be COWed even though it's
> supposed to be noCow ?

   Yes -- once.

   When you make a snapshot of a nodatacow file, the data is shared
between the snapshot and the original as normal. The extents are
reference counted, so the original data now has two references to it.
When one of these copies is written to, the writes are placed somewhere
else on the disk, still marked as nodatacow, and the reference count is
reduced to 1 for each copy again. (Note that this is done on a per-block
basis, although the 30-second transaction commit will tend to coalesce
adjacent blocks to reduce fragmentation; autodefrag helps here, too).

   Basically, a snapshot of a nodatacow file will increase the
reference count for its blocks. A write to a block with a reference
count of more than one will *always* write a new block elsewhere. A
write to a block with a reference count of exactly one will not do so if
the file is marked nodatacow.

   I hope that's clear.

> 3/ Are both options mutually incompatible in some more obscure ways ?

   Only as noted above.

   Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
      --- There are three mistaikes in this sentance. ---
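
The reference-counting rule Hugo describes can be sketched as a toy model.
This is only an illustration in Python, not btrfs code: the names `Volume`,
`File`, `snapshot` and `write` are hypothetical, and real btrfs tracks
extents rather than single numbered blocks.

```python
class Volume:
    """Toy block store with per-block reference counts (illustrative only)."""

    def __init__(self):
        self.data = {}      # block id -> contents
        self.refs = {}      # block id -> reference count
        self.next_id = 0

    def alloc(self, contents):
        block = self.next_id
        self.next_id += 1
        self.data[block] = contents
        self.refs[block] = 1
        return block


class File:
    """A file is a list of block ids; `nodatacow` stands in for `chattr +C`."""

    def __init__(self, volume, contents, nodatacow=False):
        self.volume = volume
        self.nodatacow = nodatacow
        self.blocks = [volume.alloc(c) for c in contents]

    def snapshot(self):
        """Share every block with a new file and bump the reference counts."""
        snap = File(self.volume, [], self.nodatacow)
        snap.blocks = list(self.blocks)
        for block in self.blocks:
            self.volume.refs[block] += 1
        return snap

    def write(self, index, contents):
        """Write one block; return 'in-place' or 'cow' to show what happened."""
        vol = self.volume
        old = self.blocks[index]
        if self.nodatacow and vol.refs[old] == 1:
            vol.data[old] = contents          # sole owner: overwrite in place
            return "in-place"
        # Shared block, or an ordinary CoW file: write a new block elsewhere.
        vol.refs[old] -= 1
        if vol.refs[old] == 0:
            del vol.refs[old], vol.data[old]
        self.blocks[index] = vol.alloc(contents)
        return "cow"

    def read(self):
        return [self.volume.data[b] for b in self.blocks]


vol = Volume()
db = File(vol, ["a", "b"], nodatacow=True)
first = db.write(0, "a1")    # "in-place": reference count is 1
snap = db.snapshot()         # both blocks now have reference count 2
second = db.write(0, "a2")   # "cow": the block is shared with the snapshot
third = db.write(0, "a3")    # "in-place": the new block has one owner again
print(first, second, third, snap.read(), db.read())
```

In this model the snapshot keeps reading `["a1", "b"]` while the live file
moves on, which matches the answers to questions 1/ and 2/ above.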

* Re: Using noCow with snapshots ?
From: Duncan @ 2014-04-09 11:56 UTC
To: linux-btrfs

Swâmi Petaramesh posted on Wed, 09 Apr 2014 13:15:24 +0200 as excerpted:

> In the quest for BTRFS and performance, and having received the advice
> to "chattr +C" my akonadi DB directory to make it noCow, I would like to
> be sure about what will happen when I take a snapshot of the concerned
> BTRFS subvolume.
>
> 1/ Being noCow, will the database be modified in the snapshot as well,
> effectively defeating the snapshot ?
>
> 2/ Being snapshotted, will the database be COWed even though it's
> supposed to be noCow ?
>
> 3/ Are both options mutually incompatible in some more obscure ways ?
>
> I'd like to know where I'm going with this ;-)

Good questions. =:^)

#2. That's from one of the devs when the question came up perhaps a
couple months ago.

On a NOCOW file the first write to a fileblock (4096 bytes) after a
snapshot must still be COW, because the snapshot locks the old version in
place, and now the fileblock has changed, so it MUST be written elsewhere
despite the NOCOW in order to keep the snapshot as it was. However,
the file does retain the NOCOW attribute and additional writes to the
same fileblock will be in-place... until the next snapshot of course.

This is why on filesystems with scripted snapshots as close as a minute
apart (I even saw one guy say he was doing them every 30 seconds!!),
setting NOCOW has very little value -- the files aren't NOCOW on the first
write after a snapshot, and with snapshots happening every minute, nearly
every write is that first write. Hourly snapshots are still likely to be a
problem on a regularly changing file, though with daily snapshots you'd
probably save some fragmentation over the fairly short term anyway, but
it'd still be a problem longer term.

Which is why I suggest putting such files on a separate subvolume and not
snapshotting that subvolume, since snapshots stop at the subvolume
boundary. That gives NOCOW a chance to actually *BE* NOCOW.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
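
To make the effect of snapshot frequency concrete, here is a small Python
sketch that counts how many writes to a NOCOW file end up being COW anyway.
It is a simplified model under stated assumptions: the function name
`count_cow_writes` and the event encoding are hypothetical, the file is
assumed to be fully allocated before the first event, and snapshots are
assumed never to be deleted.

```python
def count_cow_writes(events):
    """Count writes to a NOCOW file that are forced to be COW.

    `events` is a sequence whose items are either the string "snapshot" or a
    block index (meaning "write to that block"). The first write to a block
    after a snapshot is COW; later writes to the same block are in place
    until the next snapshot.
    """
    snapshot_taken = False
    private = set()   # blocks already rewritten since the last snapshot
    cow = 0
    for event in events:
        if event == "snapshot":
            snapshot_taken = True
            private.clear()       # every block is shared again
        elif snapshot_taken and event not in private:
            cow += 1              # shared block: must be written elsewhere
            private.add(event)
        # otherwise the block is exclusively owned: rewritten in place
    return cow


hot_block_writes = [0] * 6                       # six rewrites of block 0
no_snapshots = count_cow_writes(hot_block_writes)
one_snapshot = count_cow_writes(["snapshot"] + hot_block_writes)
snapshot_every_write = count_cow_writes(
    [e for w in hot_block_writes for e in ("snapshot", w)]
)
print(no_snapshots, one_snapshot, snapshot_every_write)
```

With a snapshot before every write, all six writes are COW and the NOCOW
attribute buys nothing; with a single snapshot only the first rewrite is.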

* Re: Using noCow with snapshots ?
From: Swâmi Petaramesh @ 2014-04-10 8:22 UTC
To: linux-btrfs

Thanks Duncan for the perfect explanations.

From this, I understand that I might both get better performance by setting
my akonadi dir to "nocow", and still be able to take a snapshot from time to
time, which is exactly what I need.

Besides this, I'm still wondering about the changes in data security that
turning a database to "NoCow" would bring, i.e. would the data still be well
protected in case of a system crash or power failure ?

I have precious data in there and wouldn't like to jeopardize its security
for a performance gain...

Kind regards.

On Wednesday 9 April 2014 11:56:20 Duncan wrote:
> Good questions. =:^)
>
> #2. That's from one of the devs when the question came up perhaps a
> couple months ago.
>
> On a NOCOW file the first write to a fileblock (4096 bytes) after a
> snapshot must still be COW, because the snapshot locks the old version in
> place, and now the fileblock has changed, so it MUST be written elsewhere
> despite the NOCOW in order to keep the snapshot as it was. However,
> the file does retain the NOCOW attribute and additional writes to the
> same fileblock will be in-place... until the next snapshot of course.
>
> This is why on filesystems with scripted snapshots as close as a minute
> apart (I even saw one guy say he was doing them every 30 seconds!!),
> setting NOCOW has very little value -- they aren't NOCOW on the first
> write after a snapshot, and with snapshots happening every minute...,
> Hourly snapshots are still likely to be a problem on a regularly changing
> file, though with daily snapshots you'd probably save some fragmentation
> over the fairly short term anyway, but it'd still be a problem longer
> term.
>
> Which is why I suggest putting such files on a separate subvolume and not
> snapshotting that subvolume, since snapshots stop at the subvolume
> boundary. That gives NOCOW a chance to actually *BE* NOCOW.

--
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E

* Re: Using noCow with snapshots ?
From: George Eleftheriou @ 2014-04-10 13:19 UTC
To: Swâmi Petaramesh; +Cc: linux-btrfs

> Besides this, I'm still wondering about the changes in data security that
> turning a database to "NoCow" would bring, i.e. would the data still be well
> protected in case of a system crash or power failure ?
>
> I have precious data in there and wouldn't like to jeopardize its security for
> a performance gain...

Quoting from the mount options of the wiki:

  nodatacow

    Do not copy-on-write data for newly created files, existing files are
    unaffected. This also turns off checksumming! IOW, nodatacow implies
    nodatasum. datacow is used to ensure the user either has access to
    the old version of a file, or to the newer version of the file.
    datacow makes sure we never have partially updated files written to
    disk. nodatacow gives slight performance boost by directly
    overwriting data (like ext[234]), at the expense of potentially
    getting partially updated files on system failures. Performance gain
    is usually < 5% unless the workload is random writes to large
    database files, where the difference can become very large. NOTE:
    switches off compression !

I don't know if the above applies to "chattr +C" as well... Perhaps
someone more experienced can help.

In any case, if you have precious data in there then, COW or NOCOW, you
should regularly do backups to a separate, discrete filesystem.
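
The difference between "old version or new version" and "partially updated
file" in the quoted wiki text can be illustrated with a toy crash model in
Python. This is a sketch under simplifying assumptions, not a description of
how btrfs commits transactions: the function names and the `crash_after`
parameter are hypothetical.

```python
def overwrite_in_place(blocks, new_blocks, crash_after):
    """nodatacow-style update: existing blocks are overwritten one by one.

    `crash_after` is the number of blocks written before the simulated crash.
    Returns what is on disk afterwards.
    """
    on_disk = list(blocks)
    for i, block in enumerate(new_blocks):
        if i == crash_after:
            break                      # power fails here
        on_disk[i] = block
    return on_disk


def copy_on_write(blocks, new_blocks, crash_after):
    """datacow-style update: write new blocks elsewhere, then switch over.

    The file only starts referring to the new blocks once all of them have
    been written, so a crash before that leaves the old version intact.
    """
    staged = []
    for i, block in enumerate(new_blocks):
        if i == crash_after:
            return list(blocks)        # crash: old version still referenced
        staged.append(block)
    return staged                      # all written: new version referenced


old = ["A1", "A2", "A3"]
new = ["B1", "B2", "B3"]
torn = overwrite_in_place(old, new, crash_after=1)
intact = copy_on_write(old, new, crash_after=1)
print(torn, intact)
```

After the simulated crash the in-place file is a mix of old and new blocks,
while the COW file is still entirely the old version.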

* Re: Using noCow with snapshots ?
From: Duncan @ 2014-04-10 14:58 UTC
To: linux-btrfs

Swâmi Petaramesh posted on Thu, 10 Apr 2014 10:22:15 +0200 as excerpted:

> Thanks Duncan for the perfect explanations.
>
> From this, I understand that I might get both better performance by
> setting my akonadi dir to "nocow", and still be able to take a snapshot
> from time to time, which is exactly what I need.
>
> Besides this, I'm still wondering about the changes in data security
> that turning a database to "NoCow" would bring, i.e. would the data
> still be well protected in case of a system crash or power failure ?
>
> I have precious data in there and wouldn't like to jeopardize its
> security for a performance gain...

As eleftg suggests, the data integrity (and compression) features are
turned off when something is set NOCOW, whether that "something" is an
individual file or the whole filesystem (mounted with nodatacow).

However, that's not as bad a situation as one might initially think.
Many of the applications that do routine internal-file-writes already
have at least basic data integrity management of their own. They've been
more or less forced to in order to have any stability at all, since
filesystems traditionally have not had data integrity management of their
own. This being the case, they're already somewhat prepared to detect and
do limited recovery from file corruption, perhaps losing the last couple
of transactions but preserving the data as a whole.

What can happen in the event of a crash is that btrfs and the application
both handle data integrity, but because they're implemented and work
independently, with no knowledge of what the other one is doing, their
restore efforts can "fight". The recovery snapshot that the app took
internally and tries to recover with can be screwed up by btrfs trying to
do its own recovery: due to timing differences between the two
mechanisms, btrfs may restore an image that's half from before the
internal application snapshot and half from after. Either mechanism used
on its own would work, but the combination of the two independent
mechanisms working on the same data is actually worse than either one
alone.

This risk is compounded by delicate race/timing issues that develop in an
ongoing incoming write stream scenario, where there are data commits in
various stages of the pipeline at any particular point and strict
ordering must be maintained so that new data doesn't get written to
storage before the checksum for the old data is calculated and written to
storage as well.

Normally, COW is designed to ensure all this is handled atomically:
either the old version or the new version is all there, not some
combination of both such that the checksum doesn't match the mix of data
that is actually in the file as it exists on-device. NOCOW bypasses that
mechanism and does rewrite-in-place, which dramatically complicates the
ordering and timing issues related to getting everything written in the
correct order so the checksum reflects the data that's actually there.

And of course if you have compression on, this complicates matters
further with yet another step. Plus, since the new data probably
compresses to a different size than the old, attempting to
rewrite-in-place is essentially impossible.

So turning both compression and datasum integrity checking off and
bypassing both mechanisms entirely when nocow is set really is the best
choice, letting the application itself manage data integrity, or not, as
it chooses based on what it considers the risk/value ratio of the data in
question.

Though based on my own experiences, I don't trust akonadi's data
integrity management any farther than I could throw a multi-ton truck,
which is to say not at all, which is why I switched to a different
solution that seems to work far more reliably for me.

But that said, one thing the kmail and akonadi devs *DID* do right is
that the original messages are still saved in the usual maildir format
on-device. All the akonadi database does is cache that data in a form
that's faster to use. So in theory at least, if the database DOES get
corrupted beyond recovery, it's only caching the data that's still
on-device as plain text files, and thus it's simple enough to delete and
rebuild the caching database based on the still existing plain text files
that contain the original data.

And actually, that was indeed my experience. I don't think I ever
actually lost data. I just had to rebuild the cache more frequently than
I thought I should, and that was a hassle I made an executive decision
that I simply wasn't going to go thru any longer, since there were other
alternatives that worked better for me, without that hassle.

So YMMV. You appear to want to keep kmail and akonadi, and that's fine.
But there are certain compromises that must be made in order to do so.
As the saying goes, pick your poison. You can either choose a filesystem
other than btrfs to store the akonadi database on and deal with what it
offers (generally standard rewrite-in-place and no
filesystem-level-data-integrity-management), or choose btrfs, with its
other options.

If you choose btrfs, you can choose either the normal COW mode with the
performance issues that go along with this type of usage, or one of the
various NOCOW options. If you take the NOCOW option, you can further
choose to do snapshotting and accept the performance and fragmentation
issues that brings (although they'll be somewhat less than with COW as
long as you limit the snapshotting), or use a subvol for these files and
don't snapshot it, instead doing conventional backups for it.

Here, if I were dealing with that type of file (either because I'd chosen
to keep using akonadi or because something else I was using had the same
access pattern), since I tend to use multi-partitioning more than most
already, I'd probably simply use a dedicated partition for those files
and make it something other than btrfs -- I'm familiar with reiserfs but
for that use-case I'd probably try xfs. If for some reason I didn't want
to do the whole separate partition with its different filesystem thing,
I'd use a dedicated btrfs subvolume for it, set the directory and files
NOCOW, and do traditional backups rather than snapshotting for that
subvolume.

But that's just me. It's your system and priorities you have to deal
with, so your choice. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
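
The "dedicated subvolume, NOCOW directory, no snapshots" option could be set
up along the following lines. This Python sketch only builds and prints the
commands rather than running them; the function name and the example path
are hypothetical, and it assumes the usual `btrfs subvolume create`,
`chattr +C` and `lsattr -d` tools. Note that setting `+C` on a directory
affects files created in it afterwards, so an existing database would need
to be copied in as new files rather than merely moved.

```python
import shlex


def nocow_subvolume_commands(path):
    """Return the commands for a dedicated NOCOW subvolume at `path`.

    Nothing is executed here; the caller decides whether to run them
    (for example with subprocess.run) or just review them.
    """
    return [
        ["btrfs", "subvolume", "create", path],  # snapshots stop at this boundary
        ["chattr", "+C", path],                  # new files inside inherit NOCOW
        ["lsattr", "-d", path],                  # show the directory's attributes
    ]


# Hypothetical location for the akonadi database directory.
commands = nocow_subvolume_commands("/home/user/akonadi-db")
for cmd in commands:
    print(shlex.join(cmd))
```

Because the subvolume is never snapshotted, it needs conventional backups
instead, as suggested above.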

* Re: Using noCow with snapshots ?
From: Russell Coker @ 2014-05-07 5:36 UTC
To: Duncan, linux-btrfs

How could BTRFS and a database "fight" about data recovery?

BTRFS offers similar guarantees about data durability etc to other
journalled filesystems and only differs by having checksums, so that while a
snapshot might have half the data that was written by an app, you at least
know that the half will be consistent.

If you had database files on a separate subvol to the database log then you
would be at risk of having problems making any sort of consistent snapshot
(the Debian approach of /var/log/mysql and /var/lib/mysql is a bad idea).
But there would be no difference with LVM snapshots in that regard.

--
Sent from my Samsung Galaxy Note 2 with K-9 Mail.

* Re: Using noCow with snapshots ?
From: Duncan @ 2014-05-07 11:09 UTC
To: linux-btrfs

Russell Coker posted on Wed, 07 May 2014 15:36:15 +1000 as excerpted:

> How could BTRFS and a database "fight" about data recovery?
>
> BTRFS offers similar guarantees about data durability etc to other
> journalled filesystems and only differs by having checksums so that
> while a snapshot might have half the data that was written by an app you
> at least know that the half will be consistent.
>
> If you had database files on a separate subvol to the database log then
> you would be at risk of having problems making any sort of consistent
> snapshot (the Debian approach of /var/log/mysql and /var/lib/mysql is a
> bad idea). But there would be no difference with LVM snapshots in that
> regard.

Race conditions having to do with unsynced checkpoints, primarily. And
it's actually the btrfs checksumming that seems to create the problem.

The symptom being reported (though I can say I've not seen further reports
recently, maybe it's fixed now) was that the checksummed values btrfs
restored as "correct" were considered corrupted by the database or VM.
The checksums checked out after btrfs did its replay (they must have, or
btrfs would error on access), yet the databases and VMs were still
reporting corruption. The explanation that was left was that the btrfs
replay and checksum validation was screwing up the application's own
checksum validation, which could be explained if the two were
sufficiently out of sync that btrfs fixing its own view was actually
breaking the view as seen by the data-validating app.

Though as I said I've not seen that sort of report in several kernel
cycles now. But I'm not sure whether that's because the issues have been
fixed or for some other reason (maybe everybody experiencing the problem
gave up and switched to some other filesystem, and the message is out
there well enough that new people see it before they experience and
report the same thing, or similar but everybody's switched to NOCOW now
and knows not to do snapshotting on the NOCOW files, or...).

Regardless, NOCOW and not doing snapshotting (because it triggers COW
anyway) on gig-plus internal-write files remains a very good idea.

(Also, quotas and quota sequence numbers play into the combinational
explosion problem along with snapshot-aware-defrag, too. See the writeup
on that that Dave wrote while he was on paternity leave.)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman