* Re: Fun BTRFS Stuff:
  From: Marc MERLIN
  Date: 2016-02-22 21:11 UTC
  To: Terrance Harris; Cc: linux-btrfs

On Mon, Feb 22, 2016 at 02:45:49PM -0600, Terrance Harris wrote:
> Hello,
>
> I'm a btrfs novice, but I've been using it for a few years now on
> openSUSE Tumbleweed.

Howdy.

First, please use the linux-btrfs@vger.kernel.org mailing list.

> Is there a way to convert snapshots into mountable files using btrfs send?

I am not sure I'm parsing your question. btrfs send/receive copies
read-only snapshots between two btrfs volumes.

If you mean using a non-differential btrfs send to a file, and then using
that file as if it were a filesystem you can read data from, I don't
believe this is easily possible currently (it's possible, just no tool
exists to do that). You're supposed to send it to btrfs receive, have it
saved on a filesystem, and then use that.

> I'm looking to make it easy to copy off btrfs files even if my btrfs
> system is in total fubar mode.

You need to send that btrfs send stream to btrfs receive onto another
filesystem, and use that for restores.

Hope this helps,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
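In concrete terms, the supported workflow Marc describes is a pipe between
two mounted btrfs filesystems. A minimal sketch, with hypothetical mount
points:

    # btrfs send needs a read-only snapshot on the source side
    btrfs subvolume snapshot -r /mnt/data /mnt/data/snap1
    # replicate it onto a second btrfs filesystem
    btrfs send /mnt/data/snap1 | btrfs receive /mnt/backup/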
* Re: Fun BTRFS Stuff:
  From: Hugo Mills
  Date: 2016-02-22 21:18 UTC
  To: Marc MERLIN; Cc: Terrance Harris, linux-btrfs

On Mon, Feb 22, 2016 at 01:11:42PM -0800, Marc MERLIN wrote:
> On Mon, Feb 22, 2016 at 02:45:49PM -0600, Terrance Harris wrote:
> > Hello,
> >
> > I'm a btrfs novice, but I've been using it for a few years now on
> > openSUSE Tumbleweed.
>
> Howdy.
>
> First, please use the linux-btrfs@vger.kernel.org mailing list.
>
> > Is there a way to convert snapshots into mountable files using btrfs send?
>
> I am not sure I'm parsing your question. btrfs send/receive copies
> read-only snapshots between two btrfs volumes.
>
> If you mean using a non-differential btrfs send to a file, and then using
> that file as if it were a filesystem you can read data from, I don't
> believe this is easily possible currently (it's possible, just no tool
> exists to do that). You're supposed to send it to btrfs receive, have it
> saved on a filesystem, and then use that.

It's not really possible with any degree of sanity. There's no
indexing in the send stream, so read accesses would have to scan the
whole file every time.

If you want to read the contents of a send stream in an order other
than the (arbitrary) one it's sent in, you need to replay it on a
filesystem with receive.

Hugo.
-- 
Hugo Mills             | You shouldn't anthropomorphise computers. They
hugo@... carfax.org.uk | really don't like that.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
* Re: Fun BTRFS Stuff:
  From: Duncan
  Date: 2016-02-23 0:19 UTC
  To: linux-btrfs

Hugo Mills posted on Mon, 22 Feb 2016 21:18:45 +0000 as excerpted:

> On Mon, Feb 22, 2016 at 01:11:42PM -0800, Marc MERLIN wrote:
>> On Mon, Feb 22, 2016 at 02:45:49PM -0600, Terrance Harris wrote:
>> > Hello,
>> >
>> > I'm a btrfs novice, but I've been using it for a few years now on
>> > openSUSE Tumbleweed.
>>
>> Howdy.
>>
>> First, please use the linux-btrfs@vger.kernel.org mailing list.
>>
>> > Is there a way to convert snapshots into mountable files using btrfs
>> > send?
>>
>> I am not sure I'm parsing your question. btrfs send/receive copies
>> read-only snapshots between two btrfs volumes.
>>
>> If you mean using a non-differential btrfs send to a file, and then
>> using that file as if it were a filesystem you can read data from, I
>> don't believe this is easily possible currently (it's possible, just
>> no tool exists to do that). You're supposed to send it to btrfs
>> receive, have it saved on a filesystem, and then use that.
>
> It's not really possible with any degree of sanity. There's no
> indexing in the send stream, so read accesses would have to scan the
> whole file every time.
>
> If you want to read the contents of a send stream in an order other
> than the (arbitrary) one it's sent in, you need to replay it on a
> filesystem with receive.

In that way, btrfs send reminds me very much of the old tape-archive
backup method. In general, those were serialized copies of whatever they
were archiving as well, intended primarily to be replayed as a whole onto
a new filesystem, after which individual files could be accessed from the
filesystem, not directly from the tape archive. Although with indexing,
files could be read back/restored directly from tape, neither the format
nor the media was really designed for it.

I've not seen anyone else explicitly list the following as a practical
btrfs send/receive backup strategy, but it does rather directly follow
from the STDOUT/STDIN usage of the tools, at least in theory. My primary
worry would be the general one of btrfs maturity: btrfs and its tools,
including btrfs send and receive, are still stabilizing and maturing,
with occasional bugs being found, and the following strategy won't find
receive bugs until restore time, at which point you might be depending on
it working. So the strategy is really only appropriate once btrfs has
settled down and matured somewhat more.

So here's the idea.

1) btrfs send directly to files on some other filesystem, perhaps xfs,
which is intended to be used with larger files. This can either be
non-incremental, or (much like full and incremental tape backups) an
initial full send plus incremental sends.

2) Store the backups as those send files, much like tape backup
archives. One option would be to do the initial full send, and then
incremental sends as new files, until the multi-TB drive containing the
backups is full, at which point replace it and start with a new full
send to the fresh xfs (or whatever) on the new drive.

3) When a restore is needed, then and only then, play back those backups
to a newly created btrfs using btrfs receive. If the above
full-plus-incrementals-until-the-media-is-full strategy is used, the
incrementals can be played back against the initial full, just as the
send was originally done.
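In command form, steps 1 and 3 might look like the following sketch
(hypothetical snapshot names and mount points; not a tested recipe):

    # step 1: initial full send to a file on the archive filesystem
    btrfs send /snaps/2016-02-21 > /mnt/xfs/full.send
    # later, incrementals relative to the previous snapshot
    btrfs send -p /snaps/2016-02-21 /snaps/2016-02-22 > /mnt/xfs/2016-02-22.send
    # step 3, restore time: replay onto a fresh btrfs, full first,
    # then each incremental in the order it was created
    btrfs receive -f /mnt/xfs/full.send /mnt/newbtrfs/
    btrfs receive -f /mnt/xfs/2016-02-22.send /mnt/newbtrfs/

Note that an incremental only replays if its parent snapshot (here
2016-02-21) has already been received on the target.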
Seems to me this should work fine, except, as I said, that receive errors
would only be caught at the time receive is actually run, which would be
on restore. But as most of those errors tend to be due to incremental
bugs, doing full sends all the time would eliminate them, at the cost of
much higher space usage over time, of course. And if incrementals /are/
done, with any luck, replay won't be for quite some time, and will thus
use a much newer and hopefully more mature btrfs receive, with fewer bugs
due to the bugs caught in the intervening time. Additionally, with any
luck, several generations of full backup plus incrementals will have been
done before the need to replay even one set, thus sparing the need to
replay the intervening sets entirely.

It's an interesting strategy to consider, particularly for long-term
backups, to Amazon Glacier, for instance, where immediate retrieval
and/or retrieval of individual files isn't envisioned. Obviously for
Glacier or similar storage, an intermediate encryption step could be
added, with encryption of whatever strength deemed appropriate, if
considered necessary to thwart the NSA and similar nation-level
advanced-persistent-threats on cloud-hosted storage.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
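One possible shape for the intermediate encryption step Duncan mentions,
assuming gpg is acceptable (a sketch; key handling and cipher choice are
left to the reader):

    # encrypt the stream before it leaves the machine
    btrfs send -p /snaps/prev /snaps/curr \
        | gpg --symmetric --cipher-algo AES256 > /mnt/archive/curr.send.gpg
    # restore: decrypt back into a pipe and replay
    gpg --decrypt /mnt/archive/curr.send.gpg | btrfs receive /mnt/newbtrfs/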
* Re: Fun BTRFS Stuff:
  From: Henk Slager
  Date: 2016-02-25 2:07 UTC
  To: linux-btrfs

On Tue, Feb 23, 2016 at 1:19 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> Hugo Mills posted on Mon, 22 Feb 2016 21:18:45 +0000 as excerpted:
>
>> On Mon, Feb 22, 2016 at 01:11:42PM -0800, Marc MERLIN wrote:
>>> On Mon, Feb 22, 2016 at 02:45:49PM -0600, Terrance Harris wrote:
>>> > Hello,
>>> >
>>> > I'm a btrfs novice, but I've been using it for a few years now on
>>> > openSUSE Tumbleweed.
>>>
>>> Howdy.
>>>
>>> First, please use the linux-btrfs@vger.kernel.org mailing list.
>>>
>>> > Is there a way to convert snapshots into mountable files using btrfs
>>> > send?
>>>
>>> I am not sure I'm parsing your question. btrfs send/receive copies
>>> read-only snapshots between two btrfs volumes.
>>>
>>> If you mean using a non-differential btrfs send to a file, and then
>>> using that file as if it were a filesystem you can read data from, I
>>> don't believe this is easily possible currently (it's possible, just
>>> no tool exists to do that). You're supposed to send it to btrfs
>>> receive, have it saved on a filesystem, and then use that.
>>
>> It's not really possible with any degree of sanity. There's no
>> indexing in the send stream, so read accesses would have to scan the
>> whole file every time.
>>
>> If you want to read the contents of a send stream in an order other
>> than the (arbitrary) one it's sent in, you need to replay it on a
>> filesystem with receive.
>
> In that way, btrfs send reminds me very much of the old tape-archive
> backup method. In general, those were serialized copies of whatever
> they were archiving as well, intended primarily to be replayed as a
> whole onto a new filesystem, after which individual files could be
> accessed from the filesystem, not directly from the tape archive.
> Although with indexing, files could be read back/restored directly
> from tape, neither the format nor the media was really designed for it.
>
> I've not seen anyone else explicitly list the following as a practical
> btrfs send/receive backup strategy, but it does rather directly follow
> from the STDOUT/STDIN usage of the tools, at least in theory. My
> primary worry would be the general one of btrfs maturity: btrfs and
> its tools, including btrfs send and receive, are still stabilizing and
> maturing, with occasional bugs being found, and the following strategy
> won't find receive bugs until restore time, at which point you might
> be depending on it working. So the strategy is really only appropriate
> once btrfs has settled down and matured somewhat more.
>
> So here's the idea.
>
> 1) btrfs send directly to files on some other filesystem, perhaps xfs,
> which is intended to be used with larger files. This can either be
> non-incremental, or (much like full and incremental tape backups) an
> initial full send plus incremental sends.

I had not thought of the tape-archive method, interesting :)
I am using this more or less, although not fully automated. It looks
like:

    btrfs send -p snap_base snap_last \
        | tee /path-on-non-btrfs-fs/snap_base..snap_last.btrfs \
        | btrfs receive /path-on-btrfs-fs/

The key thing is to keep the diff as small as possible so that I can
transport it over ~1 Mbps internet. But sometimes the diff is huge, for
example when an upgrade of an OS in a VM has been done. So then I carry
the snap_last.btrfs 'by hand'.
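On a link that slow, compressing the stream in the same pipeline may also
help (a sketch, assuming xz is available on both ends):

    btrfs send -p snap_base snap_last | xz -9 > snap_base..snap_last.btrfs.xz
    # on the receiving side
    xz -dc snap_base..snap_last.btrfs.xz | btrfs receive /path-on-btrfs-fs/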
If you meant something like "xfs receive /path-on-xfs-fs/" for the last
step in the command-line pipe, then such an 'xfs receive' implementation
would face quite some challenges, I think, but it's not impossible.

> 2) Store the backups as those send files, much like tape backup
> archives. One option would be to do the initial full send, and then
> incremental sends as new files, until the multi-TB drive containing
> the backups is full, at which point replace it and start with a new
> full send to the fresh xfs (or whatever) on the new drive.

The issue here is that at the point you do a new full backup, you will
need more than double the space of the original in order to still have a
valid backup at all times. If it is backing up 'small SSD' to 'big HDD',
it's not such an issue.

> 3) When a restore is needed, then and only then, play back those
> backups to a newly created btrfs using btrfs receive. If the above
> full-plus-incrementals-until-the-media-is-full strategy is used, the
> incrementals can be played back against the initial full, just as the
> send was originally done.

Yes indeed. My motivation for this method was/is that unpacking (i.e.
doing the btrfs receive) takes time if it is a huge number of small
files on a HDD.

> Seems to me this should work fine, except, as I said, that receive
> errors would only be caught at the time receive is actually run, which
> would be on restore. But as most of those errors tend to be due to
> incremental bugs, doing full sends all the time would eliminate them,
> at the cost of much higher space usage over time, of course. And if
> incrementals /are/ done, with any luck, replay won't be for quite some
> time, and will thus use a much newer and hopefully more mature btrfs
> receive, with fewer bugs due to the bugs caught in the intervening
> time. Additionally, with any luck, several generations of full backup
> plus incrementals will have been done before the need to replay even
> one set, thus sparing the need to replay the intervening sets entirely.

On the other hand, not replaying them means that they cannot be used for
a lower-performance backup or clone server, and there is no way to check
the actual state. And there could also be silent send errors.
If you do playback immediately, creating a writable snapshot on the
master and clone(s) sides allows online checking of potential diffs
(rsync -c) and copying over the differences.

Using btrfs sub find-new, I once discovered some 100 MB of difference in
a multi-TB data set. It was only 2 OS/VM image files, on different
clones. It probably happened sometime early 2015, but I'm quite unsure,
so not sure which kernel/tools.

> It's an interesting strategy to consider, particularly for long-term
> backups, to Amazon Glacier, for instance, where immediate retrieval
> and/or retrieval of individual files isn't envisioned. Obviously for
> Glacier or similar storage, an intermediate encryption step could be
> added, with encryption of whatever strength deemed appropriate, if
> considered necessary to thwart the NSA and similar nation-level
> advanced-persistent-threats on cloud-hosted storage.
* Re: Fun BTRFS Stuff:
  From: Duncan
  Date: 2016-02-25 4:18 UTC
  To: linux-btrfs

Henk Slager posted on Thu, 25 Feb 2016 03:07:12 +0100 as excerpted:

> On Tue, Feb 23, 2016 at 1:19 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>>
>> I've not seen anyone else explicitly list the following as a practical
>> btrfs send/receive backup strategy, but it does rather directly follow
>> from the STDOUT/STDIN usage of the tools, at least in theory. My
>> primary worry would be the general one of btrfs maturity: btrfs and
>> its tools, including btrfs send and receive, are still stabilizing and
>> maturing, with occasional bugs being found, and the following strategy
>> won't find receive bugs until restore time, at which point you might
>> be depending on it working. So the strategy is really only appropriate
>> once btrfs has settled down and matured somewhat more.
>>
>> So here's the idea.
>>
>> 1) btrfs send directly to files on some other filesystem, perhaps xfs,
>> which is intended to be used with larger files. This can either be
>> non-incremental, or (much like full and incremental tape backups) an
>> initial full send plus incremental sends.
>
> I had not thought of the tape-archive method, interesting :)
> I am using this more or less, although not fully automated. It looks
> like:
>
>     btrfs send -p snap_base snap_last \
>         | tee /path-on-non-btrfs-fs/snap_base..snap_last.btrfs \
>         | btrfs receive /path-on-btrfs-fs/
>
> The key thing is to keep the diff as small as possible so that I can
> transport it over ~1 Mbps internet. But sometimes the diff is huge,
> for example when an upgrade of an OS in a VM has been done. So then I
> carry the snap_last.btrfs 'by hand'.
>
> If you meant something like "xfs receive /path-on-xfs-fs/" for the
> last step in the command-line pipe, then such an 'xfs receive'
> implementation would face quite some challenges, I think, but it's not
> impossible.

No, I meant pretty much what you are doing, except just directing to a
file, instead of using tee and sending it to btrfs receive as well. The
usage of tee is a variant I hadn't thought of, but it is actually a quite
creative solution to the problem you describe. =:^)

The reason I suggested xfs is that, based on what I know at least, xfs is
supposed to be really good at handling large files, generally using a
large block size, etc. Perfect for long-term storage of likely multi-gig
serialized backup streams.

But something like fat32, set up with a large block size, should work
well too, and its lack of ownership metadata shouldn't really be an issue
when the files are all simply rather large stream backups.

And actually, to make the parallel to tape backup even more direct, I
/believe/ you could potentially use tar or the like for its original
purpose as a tape archiver, feeding the streams via tar directly to the
raw device without a filesystem at all. Just tar, which I /believe/ would
provide indexing and let you write a second (incremental) btrfs send file
after the first one, and later a third after the second, etc. Except I'm
not actually familiar with using tar that way, and it's quite possible
tar doesn't work the way I think it does in that regard and/or simply
isn't the best tool for that job.
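An untested sketch of that idea, using GNU tar flags (a staging file is
needed because tar wants member sizes up front, and real tape drives may
need mt positioning rather than -r; device and snapshot names are
hypothetical):

    # stage each stream as a file, then archive it to the raw device
    btrfs send /snaps/full > /tmp/full.send
    tar -C /tmp -cf /dev/sdx full.send      # first stream creates the archive
    btrfs send -p /snaps/full /snaps/incr1 > /tmp/incr1.send
    tar -C /tmp -rf /dev/sdx incr1.send     # -r appends a new member
    # restore: list the members, extract one to stdout, replay it
    tar -tf /dev/sdx
    tar -xOf /dev/sdx full.send | btrfs receive /mnt/newbtrfs/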
But in theory at least, as long as you either manually tracked the blocks
used for each send stream, or had something like tar doing it
automatically, you wouldn't even need a proper filesystem, and could use
a raw device: either a block device like a big multi-terabyte disk, or
even a char/stream device like an archiving tape drive.

>> 2) Store the backups as those send files, much like tape backup
>> archives. One option would be to do the initial full send, and then
>> incremental sends as new files, until the multi-TB drive containing
>> the backups is full, at which point replace it and start with a new
>> full send to the fresh xfs (or whatever) on the new drive.
>
> The issue here is that at the point you do a new full backup, you will
> need more than double the space of the original in order to still have
> a valid backup at all times. If it is backing up 'small SSD' to 'big
> HDD', it's not such an issue.

The idea here would be to rotate backup media. But you are correct: in
the simplest form you'd need larger backup media than the original being
backed up, though that might be small SSD to big HDD, or simply a 1 TB
HDD to, say, one of those huge 8 TB SMR drives, which I believe are
actually /intended/ for long-term archiving in this manner.

So taking that 1 TB HDD to 8 TB SMR archiving HDD example, you wouldn't
let the 1 TB get totally full, so say 700 GB of data in the original full
send. Then say incrementals average 50 GB. (We're using decimal units
here instead of GiB/TiB just to make the math easier. After all, this is
only an illustration.)

8 TB - 700 GB = 7.3 TB = 7300 GB left.
7300 GB / 50 GB = space for 146 incrementals averaging 50 GB each.

So say that's 50 GB per day average, with daily incrementals. That'll
fill roughly 2.5 of the 8 TB archive drives per year, so to make the
numbers nice and round, say five drives in rotation, keeping two years'
worth of backups. And each time you switch out archive drives, at least
twice a year, you start with a full send, so you have it conveniently
there on the same device as the incrementals, and don't have to worry
about tracking down a second drive with the full send before you can
replay your incrementals.

Of course, if your primary/working and backup media are closer to the
same size, perhaps a 4-device by 4 TB btrfs raid10 (8 TB usable space)
working copy and 8 TB archive backups, with a correspondingly larger
average incremental send size as well, you'd use pairs of backup devices,
one for the full send, one for the incrementals, and rotate in a second
pair of 8 TB devices when the first pair got full. And there's all sorts
of individual variants on the same theme.

>> 3) When a restore is needed, then and only then, play back those
>> backups to a newly created btrfs using btrfs receive. If the above
>> full-plus-incrementals-until-the-media-is-full strategy is used, the
>> incrementals can be played back against the initial full, just as the
>> send was originally done.
>
> Yes indeed. My motivation for this method was/is that unpacking (i.e.
> doing the btrfs receive) takes time if it is a huge number of small
> files on a HDD.

And the advantage is that until a restore is actually needed, no playback
is done. So in the above five-archive-devices-over-two-years case, if the
production copy keeps working for two years, that's four full sends and
146*4 incrementals that never need to be played back at all, thus
reclaiming the time and energy that would have been unnecessarily spent
maintaining the played-back copy over that period.
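A daily incremental under such a rotation could be driven by a small
script along these lines (hypothetical paths; untested and with no error
handling):

    #!/bin/sh
    snapdir=/mnt/data/.snap
    archive=/mnt/archive
    today=$(date +%F)
    prev=$(cat "$archive/last-snap")
    # take today's read-only snapshot, send the delta, record the new base
    btrfs subvolume snapshot -r /mnt/data "$snapdir/$today"
    btrfs send -p "$snapdir/$prev" "$snapdir/$today" > "$archive/$today.send"
    echo "$today" > "$archive/last-snap"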
>> Seems to me this should work fine, except, as I said, that receive
>> errors would only be caught at the time receive is actually run, which
>> would be on restore. But as most of those errors tend to be due to
>> incremental bugs, doing full sends all the time would eliminate them,
>> at the cost of much higher space usage over time, of course. And if
>> incrementals /are/ done, with any luck, replay won't be for quite some
>> time, and will thus use a much newer and hopefully more mature btrfs
>> receive, with fewer bugs due to the bugs caught in the intervening
>> time. Additionally, with any luck, several generations of full backup
>> plus incrementals will have been done before the need to replay even
>> one set, thus sparing the need to replay the intervening sets entirely.
>
> On the other hand, not replaying them means that they cannot be used
> for a lower-performance backup or clone server, and there is no way to
> check the actual state. And there could also be silent send errors.
> If you do playback immediately, creating a writable snapshot on the
> master and clone(s) sides allows online checking of potential diffs
> (rsync -c) and copying over the differences.

That being the primary disadvantage I suggested, and the reason one
probably would not want to use this method until btrfs, including
send/receive, is fully stabilized: simply put, at this point it's not yet
stable enough to trust, without verification, that it would actually
receive properly.

But once btrfs is fully stable and people have been routinely using
send/receive without known bugs for quite some time, this scenario may
well be quite viable.

> Using btrfs sub find-new, I once discovered some 100 MB of difference
> in a multi-TB data set. It was only 2 OS/VM image files, on different
> clones. It probably happened sometime early 2015, but I'm quite
> unsure, so not sure which kernel/tools.

To my knowledge, there has been exactly one such known bug since the
initial feature-introduction bugs were worked through, where a successful
send and receive didn't produce a valid copy. And AFAIK, that didn't
actually turn out to be a send/receive bug, but was ultimately traced to
a more general btrfs bug; it was simply send/receive that happened to
catch it.

So it'd be interesting to have more information about that event and
track down what happened. But it's likely way too late, with way too
little reliable information about it still available, to do anything
about it now.

Meanwhile, most of the bugs I'm aware of have been in receive's handling
of various corner cases. And as I pointed out, if you aren't
replaying/receiving immediately, but instead archiving the raw send
streams to be replayed later, then in the event you /do/ need to replay a
stream, receive will have matured further in the meantime, compared to
playing the stream back into a receive of the same version as the send
that produced it. If it's two years later, that's two years' worth of
further bugs fixed in the meantime, so in theory at least, the chances of
a successful replay and receive should be better after waiting than they
would have been had the replay been done immediately.

Which /somewhat/ counteracts the problem of btrfs receive in particular
not yet being totally mature. However, it doesn't entirely counteract the
problem, and I'd still consider this solution too dangerous to use in
practice at the current time.
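For the kind of online checking Henk describes, a minimal sketch
(hypothetical paths; -c forces checksum comparison, -n keeps rsync
read-only, -i itemizes the differences):

    # compare a replayed clone against the master without copying anything
    rsync -naci /mnt/master/snap_last/ /mnt/clone/snap_last/
    # or ask btrfs which files changed in a subvolume since generation N
    # (the generation number here is only a placeholder)
    btrfs subvolume find-new /mnt/master/snap_last 1234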
Though in, say, five years, it should be a much more reasonably viable
solution to consider.

But I /really/ like your tee idea in this regard. =:^) For people doing
multiple levels of backup, it lets them enjoy the best of both worlds,
while effectively taking care of two levels of backup at the same time.

By teeing and replaying immediately (or simply replaying the stored send
stream immediately, then keeping it instead of deleting it), you test
that it works, and end up with the working btrfs level of backup.

By then continuing to archive the now-tested send stream, you have a
second level of backup that can be replayed again, should something
happen to both the production version and the btrfs-level backup you were
replaying to for testing and primary backup.

That effectively gives you the best of both worlds. =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
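Even without a permanently maintained replayed copy, a stored stream can
be proven replayable now and then with a throwaway receive (a sketch,
with hypothetical paths matching the earlier examples):

    # replay into a scratch btrfs just to prove the streams apply cleanly
    btrfs receive -f /mnt/xfs/full.send /mnt/scratch/
    btrfs receive -f /mnt/xfs/2016-02-22.send /mnt/scratch/
    # then drop the test copies
    btrfs subvolume delete /mnt/scratch/2016-02-21 /mnt/scratch/2016-02-22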
End of thread (newest message: 2016-02-25 4:18 UTC).

Thread overview: 5+ messages
  [not found] <56CB737D.1020403@polarismail.net>
  2016-02-22 21:11 ` Fun BTRFS Stuff: Marc MERLIN
  2016-02-22 21:18   ` Hugo Mills
  2016-02-23  0:19     ` Duncan
  2016-02-25  2:07       ` Henk Slager
  2016-02-25  4:18         ` Duncan