* Re: Fun BTRFS Stuff:
From: Marc MERLIN @ 2016-02-22 21:11 UTC
To: Terrance Harris; +Cc: linux-btrfs
On Mon, Feb 22, 2016 at 02:45:49PM -0600, Terrance Harris wrote:
> Hello,
>
> I'm a btrfs novice, but I've been using it for a few years now on
> openSUSE Tumbleweed.
Howdy.
First, please use the linux-btrfs@vger.kernel.org mailing list.
> Is there a way to convert snapshots into mountable files using btrfs send?
I am not sure I'm parsing your question.
btrfs send/receive copies read-only snapshots between two btrfs volumes.
If you mean using a non-differential btrfs send to a file, and then using
that file to act as if it were a filesystem you can read data from, I
don't believe this is easily possible currently (it's possible, just no
tool exists to do that). You're supposed to send it to btrfs receive,
have it saved on a filesystem, and then use that.
> I'm looking to make it easy to copy off btrfs files even if my btrfs system
> is in total fubar mode.
You need to send that btrfs send stream to btrfs receive onto another
filesystem, and use that for restores.
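For example, a minimal sketch (the mount points are placeholders for
your own source and backup volumes):

  # take a read-only snapshot, then replicate it to a second btrfs volume
  btrfs subvolume snapshot -r /mnt/data /mnt/data/snap.20160222
  btrfs send /mnt/data/snap.20160222 | btrfs receive /mnt/backup/

If /mnt/data later goes into fubar mode, the copy under /mnt/backup/ is
a normal subvolume you can read files from directly.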
Hope this helps
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: Fun BTRFS Stuff:
From: Hugo Mills @ 2016-02-22 21:18 UTC
To: Marc MERLIN; +Cc: Terrance Harris, linux-btrfs
On Mon, Feb 22, 2016 at 01:11:42PM -0800, Marc MERLIN wrote:
> On Mon, Feb 22, 2016 at 02:45:49PM -0600, Terrance Harris wrote:
> > Hello,
> >
> > I'm a btrfs novice, but I've been using it for a few years now on
> > openSUSE Tumbleweed.
>
> Howdy.
>
> First, please use the linux-btrfs@vger.kernel.org mailing list.
>
> > Is there a way to convert snapshots into mountable files using btrfs send?
>
> I am not sure I'm parsing your question.
> btrfs send/receive copies read-only snapshots between two btrfs volumes.
>
> If you mean using a non-differential btrfs send to a file, and then using
> that file to act as if it were a filesystem you can read data from, I
> don't believe this is easily possible currently (it's possible, just no
> tool exists to do that). You're supposed to send it to btrfs receive,
> have it saved on a filesystem, and then use that.
It's not really possible with any degree of sanity. There's no
indexing in the send stream, so read accesses would have to scan the
whole file every time.
If you want to read the contents of a send stream in an order other
than the (arbitrary) one it's sent in, you need to replay it on a
filesystem with receive.
Hugo.
--
Hugo Mills | You shouldn't anthropomorphise computers. They
hugo@... carfax.org.uk | really don't like that.
http://carfax.org.uk/ |
PGP: E2AB1DE4 |
* Re: Fun BTRFS Stuff:
From: Duncan @ 2016-02-23 0:19 UTC
To: linux-btrfs
Hugo Mills posted on Mon, 22 Feb 2016 21:18:45 +0000 as excerpted:
> On Mon, Feb 22, 2016 at 01:11:42PM -0800, Marc MERLIN wrote:
>> On Mon, Feb 22, 2016 at 02:45:49PM -0600, Terrance Harris wrote:
>> > Hello,
>> >
>> > I'm a btrfs novice, but I've been using it for a few years now on
>> > openSUSE Tumbleweed.
>>
>> Howdy.
>>
>> First, please use the linux-btrfs@vger.kernel.org mailing list.
>>
>> > Is there a way to convert snapshots into mountable files using btrfs
>> > send?
>>
>> I am not sure I'm parsing your question.
>> btrfs send/receive copies read-only snapshots between two btrfs volumes.
>>
>> If you mean using a non-differential btrfs send to a file, and then
>> using that file to act as if it were a filesystem you can read data
>> from, I don't believe this is easily possible currently (it's possible,
>> just no tool exists to do that). You're supposed to send it to btrfs
>> receive, have it saved on a filesystem, and then use that.
>
> It's not really possible with any degree of sanity. There's no
> indexing in the send stream, so read accesses would have to scan the
> whole file every time.
>
> If you want to read the contents of a send stream in an order other
> than the (arbitrary) one it's sent in, you need to replay it on a
> filesystem with receive.
In that way, btrfs send reminds me very much of the old tape-archive
backup method. In general, tape archives were serialized copies of
whatever they were archiving as well, intended primarily to be replayed
as a whole onto a new filesystem, after which individual files could be
accessed from the filesystem, not directly from the tape archive.
Although files could, with indexing, be read back / restored directly
from tape, neither the format nor the media was really designed for it.
I've not seen anyone else explicitly list the following as a practical
btrfs send/receive backup strategy, but it does follow rather directly
from the STDOUT/STDIN usage of the tools, at least in theory. My primary
worry is the general one of btrfs maturity: btrfs and its tools,
including btrfs send and receive, are still stabilizing, with occasional
bugs being found, and the following strategy won't expose receive bugs
until restore time, at which point you might be depending on it working.
So the strategy is really only appropriate once btrfs has settled down
and matured somewhat more.
So here's the idea (example commands follow the list).
1) Btrfs send directly to files on some other filesystem, perhaps xfs,
which handles large files well. This can either be non-incremental, or
(much like full and incremental tape backups) an initial full send plus
incremental sends.
2) Store the backups as those send files, much like tape backup
archives. One option would be to do the initial full send, and then
incremental sends as new files, until the multi-TB drive containing the
backups is full, at which point replace it and start with a new full send
to the fresh xfs or whatever on the new drive.
3) When a restore is needed, then and only then, play back those backups
to a newly created btrfs using btrfs receive. If the above initial full
plus incrementals until the backup media is full strategy is used, the
incrementals can be played back against the initial full, just as the
send was originally done.
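A minimal sketch of steps 1 and 3 (snapshot names and mount points are
placeholders, with /backup assumed to be the xfs or whatever non-btrfs
filesystem, and /newpool the freshly created btrfs):

  # 1) initial full send, then incrementals, stored as plain files
  btrfs send /pool/snaps/root.20160223 > /backup/root.20160223.full.btrfs
  btrfs send -p /pool/snaps/root.20160223 /pool/snaps/root.20160224 \
      > /backup/root.20160224.incr.btrfs

  # 3) at restore time only, replay onto the fresh btrfs, in order
  btrfs receive /newpool/ < /backup/root.20160223.full.btrfs
  btrfs receive /newpool/ < /backup/root.20160224.incr.btrfs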
Seems to me this should work fine, except as I said, that receive errors
would only be caught at the time receive is actually run, which would be
on restore. But as most of those errors tend to be due to incremental
bugs, doing full sends all the time would eliminate them, at the cost of
much higher space usage over time, of course. And if incrementals /are/
done, with any luck, replay won't happen for quite some time, and will
thus use a much newer and hopefully more mature btrfs receive, with fewer
bugs due to fixes made in the intervening time. Additionally, with any
luck, several generations of full backup plus incrementals will have been
done before the need to replay even one set, thus sparing the need to
replay the intervening sets entirely.
It's an interesting strategy to consider, particularly for long-term
backups, to, say, Amazon Glacier, where immediate retrieval and/or
retrieval of individual files isn't envisioned. Obviously, for
glacier or similar storage, an intermediate encryption step could be
added, with encryption to whatever strength deemed appropriate, if
considered necessary to thwart the NSA and similar nation-level advanced-
persistent-threats on cloud-hosted storage.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Fun BTRFS Stuff:
From: Henk Slager @ 2016-02-25 2:07 UTC
To: linux-btrfs
On Tue, Feb 23, 2016 at 1:19 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> Hugo Mills posted on Mon, 22 Feb 2016 21:18:45 +0000 as excerpted:
>
>> On Mon, Feb 22, 2016 at 01:11:42PM -0800, Marc MERLIN wrote:
>>> On Mon, Feb 22, 2016 at 02:45:49PM -0600, Terrance Harris wrote:
>>> > Hello,
>>> >
>>> > I'm a btrfs novice, but I've been using it for a few years now on
>>> > openSUSE Tumbleweed.
>>>
>>> Howdy.
>>>
>>> First, please use the linux-btrfs@vger.kernel.org mailing list.
>>>
>>> > Is there a way to convert snapshots into mountable files using btrfs
>>> > send?
>>>
>>> I am not sure I'm parsing your question.
>>> btrfs send/receive copies read-only snapshots between two btrfs volumes.
>>>
>>> If you mean using a non-differential btrfs send to a file, and then
>>> using that file to act as if it were a filesystem you can read data
>>> from, I don't believe this is easily possible currently (it's possible,
>>> just no tool exists to do that). You're supposed to send it to btrfs
>>> receive, have it saved on a filesystem, and then use that.
>>
>> It's not really possible with any degree of sanity. There's no
>> indexing in the send stream, so read accesses would have to scan the
>> whole file every time.
>>
>> If you want to read the contents of a send stream in an order other
>> than the (arbitrary) one it's sent in, you need to replay it on a
>> filesystem with receive.
>
> In that way, btrfs send reminds me very much of the old tape-archive
> backup method. In general, tape archives were serialized copies of
> whatever they were archiving as well, intended primarily to be replayed
> as a whole onto a new filesystem, after which individual files could be
> accessed from the filesystem, not directly from the tape archive.
> Although files could, with indexing, be read back / restored directly
> from tape, neither the format nor the media was really designed for it.
>
> I've not seen anyone else explicitly list the following as a practical
> btrfs send/receive backup strategy, but it does follow rather directly
> from the STDOUT/STDIN usage of the tools, at least in theory. My primary
> worry is the general one of btrfs maturity: btrfs and its tools,
> including btrfs send and receive, are still stabilizing, with occasional
> bugs being found, and the following strategy won't expose receive bugs
> until restore time, at which point you might be depending on it working.
> So the strategy is really only appropriate once btrfs has settled down
> and matured somewhat more.
>
> So here's the idea.
>
> 1) Btrfs send directly to files on some other filesystem, perhaps xfs,
> which handles large files well. This can either be non-incremental, or
> (much like full and incremental tape backups) an initial full send plus
> incremental sends.
I had not thought of the tape-archive method, interesting :)
I am using this more or less, although not fully automated. It looks like:
btrfs send -p snap_base snap_last \
    | tee /path-on-non-btrfs-fs/snap_base..snap_last.btrfs \
    | btrfs receive /path-on-btrfs-fs/
The key thing is to keep the diff as small as possible so that I can
transport it over ~1 Mbps internet. But sometimes the diff is huge, for
example when an OS upgrade in a VM has been done. Then I carry the
snap_last.btrfs 'by hand'.
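(For the slow-link case, one option is to compress the stream in
flight; this is only a sketch, and the host name and path are
placeholders:

  btrfs send -p snap_base snap_last | gzip \
      | ssh backuphost 'gunzip | btrfs receive /path-on-btrfs-fs/'

though how much it helps depends on how compressible the diff is.)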
If you mean sort of an 'xfs receive /path-on-xfs-fs/' for the last step
in the command-line pipe, then such an 'xfs receive' implementation would
face quite some challenges I think, but it is not impossible I think.
> 2) Store the backups as those send files, much like tape backup
> archives. One option would be to do the initial full send, and then
> incremental sends as new files, until the multi-TB drive containing the
> backups is full, at which point replace it and start with a new full send
> to the fresh xfs or whatever on the new drive.
The issue here is that at the point you do a new full backup, you need
more than double the space of the original in order to still have a
valid backup at all times. If it is backing up 'small SSD' to 'big HDD',
that's not such an issue.
>
> 3) When a restore is needed, then and only then, play back those backups
> to a newly created btrfs using btrfs receive. If the above initial full
> plus incrementals until the backup media is full strategy is used, the
> incrementals can be played back against the initial full, just as the
> send was originally done.
Yes indeed. My motivation for this method was/is that unpacking (i.e.
doing the btrfs receive) takes time if it is a huge number of small
files on a HDD.
> Seems to me this should work fine, except as I said, that receive errors
> would only be caught at the time receive is actually run, which would be
> on restore. But as most of those errors tend to be due to incremental
> bugs, doing full sends all the time would eliminate them, at the cost of
> much higher space usage over time, of course. And if incrementals /are/
> done, with any luck, replay won't happen for quite some time, and will
> thus use a much newer and hopefully more mature btrfs receive, with fewer
> bugs due to fixes made in the intervening time. Additionally, with any
> luck, several generations of full backup plus incrementals will have been
> done before the need to replay even one set, thus sparing the need to
> replay the intervening sets entirely.
On the other hand, not replaying them means they cannot be used for a
lower-performance backup or clone server, and there is no way to check
the actual state. There could also be silent send errors.
If you do playback immediately, creating a writable snapshot on both the
master and the clone(s) side allows checking for diffs online (rsync -c)
and copying over the differences.
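A minimal sketch of that check, with both sides shown mounted locally
(the mount points are placeholders):

  # writable snapshots of the 'same' subvolume on both sides
  btrfs subvolume snapshot /mnt/master/snap /mnt/master/check
  btrfs subvolume snapshot /mnt/clone/snap /mnt/clone/check
  # -c compares checksums rather than size/mtime; -n only reports diffs
  rsync -avcn /mnt/master/check/ /mnt/clone/check/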
Using btrfs sub find-new, I once discovered some 100 MB of difference
in a multi-TB data set that way. It was only 2 OS/VM image files, on
different clones. It probably happened sometime early 2015, but I am
quite unsure, so not sure which kernel/tools were involved.
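(For reference, the form is roughly as follows, with the path and the
generation number just examples:

  btrfs subvolume find-new /mnt/clone/check 12345

which lists files changed on that subvolume since that generation.)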
> It's an interesting strategy to consider, particularly for long-term
> backups, to, say, Amazon Glacier, where immediate retrieval and/or
> retrieval of individual files isn't envisioned. Obviously, for
> glacier or similar storage, an intermediate encryption step could be
> added, with encryption to whatever strength deemed appropriate, if
> considered necessary to thwart the NSA and similar nation-level advanced-
> persistent-threats on cloud-hosted storage.
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master." Richard Stallman
>
* Re: Fun BTRFS Stuff:
From: Duncan @ 2016-02-25 4:18 UTC
To: linux-btrfs
Henk Slager posted on Thu, 25 Feb 2016 03:07:12 +0100 as excerpted:
> On Tue, Feb 23, 2016 at 1:19 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>>
>> I've not seen anyone else explicitly list the following as a practical
>> btrfs send/receive backup strategy, but it does follow rather directly
>> from the STDOUT/STDIN usage of the tools, at least in theory. My primary
>> worry is the general one of btrfs maturity: btrfs and its tools,
>> including btrfs send and receive, are still stabilizing, with occasional
>> bugs being found, and the following strategy won't expose receive bugs
>> until restore time, at which point you might be depending on it working.
>> So the strategy is really only appropriate once btrfs has settled down
>> and matured somewhat more.
>>
>> So here's the idea.
>>
>> 1) Btrfs send directly to files on some other filesystem, perhaps xfs,
>> which handles large files well. This can either be non-incremental, or
>> (much like full and incremental tape backups) an initial full send plus
>> incremental sends.
>
> I had not thought of the tape-archive method, interesting :)
> I am using this more or less, although not fully automated. It looks
> like:
>
> btrfs send -p snap_base snap_last \
>     | tee /path-on-non-btrfs-fs/snap_base..snap_last.btrfs \
>     | btrfs receive /path-on-btrfs-fs/
>
> The key thing is to keep the diff as small as possible so that I can
> transport it over ~1 Mbps internet. But sometimes the diff is huge, for
> example when an OS upgrade in a VM has been done. Then I carry the
> snap_last.btrfs 'by hand'.
>
> If you mean sort of an 'xfs receive /path-on-xfs-fs/' for the last step
> in the command-line pipe, then such an 'xfs receive' implementation would
> face quite some challenges I think, but it is not impossible I think.
No, I meant pretty much what you are doing, except just redirecting to a
file, instead of using tee and sending it to btrfs receive as well. The
usage of tee is a variant I hadn't thought of, but it's actually quite a
creative solution to the problem you describe. =:^)
The reason I suggested xfs is that, based on what I know at least, xfs
is supposed to be really good at handling large files, generally using a
large block size, etc. Perfect for long-term storage of likely multi-gig
serialized backup streams. But something like fat32, set up with a large
block size, should work well too, and its lack of ownership metadata
shouldn't really be an issue when the files are all simply rather large
stream backups.
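A quick sketch of setting up such an archive drive (the device name is
a placeholder; double-check it before running mkfs):

  mkfs.xfs -f -L send-archive /dev/sdX
  mount /dev/sdX /backup
  btrfs send /pool/snaps/root.20160224 > /backup/root.20160224.btrfs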
And actually, to make the parallel to tape backup even more direct, I
/believe/ you could potentially use tar or the like for its original
purpose as a tape archive, feeding the streams via tar directly to a raw
device without a filesystem at all. Just tar, which I /believe/ would
provide indexing and let you later write a second (incremental) btrfs
send file after the first one, and later a third after the second, etc.
Except I'm not actually familiar with using tar that way, and it's quite
possible tar doesn't work the way I think it does in that regard and/or
simply isn't the best tool for that job. But in theory at least, as long
as you either manually tracked the blocks used for each send stream, or
had something like tar doing it automatically, you wouldn't even need a
proper filesystem and could use a raw device, either a block device like
a big multi-terabyte disk, or even a char/stream device like an archiving
tape drive.
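If tar does work the way I think it does (and again, I'm not sure it
does), the sketch would be something like this, with /dev/st0 as a
placeholder tape device:

  btrfs send /pool/snaps/root.20160224 > /tmp/root.20160224.btrfs
  tar -cf /dev/st0 /tmp/root.20160224.btrfs   # first stream onto the tape
  tar -rf /dev/st0 /tmp/root.20160225.btrfs   # append the next incremental
  tar -tf /dev/st0                            # list the archived streams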
>> 2) Store the backups as those send files, much like tape backup
>> archives. One option would be to do the initial full send, and then
>> incremental sends as new files, until the multi-TB drive containing the
>> backups is full, at which point replace it and start with a new full
>> send to the fresh xfs or whatever on the new drive.
>
> The issue here is that at the point you do a new full backup, you need
> more than double the space of the original in order to still have a
> valid backup at all times. If it is backing up 'small SSD' to 'big HDD',
> that's not such an issue.
The idea here would be to rotate backup media. But you are correct, in
the simplest form you'd need larger backup media than the original being
backed up, though that might be small ssd to big hdd, or simply a 1 TB
hdd to, say, one of those huge 8 TB SMR drives, which I believe are
actually /intended/ for long-term archiving in this manner.
So taking that 1 TB hdd to 8 TB SMR archiving hdd example, you wouldn't
let the 1 TB get totally full, so say 700 GB of data in the original
full send. Then say incrementals average 50 GB. (We're using
powers-of-ten units here instead of GiB/TiB just to make the arithmetic
easier. After all, this is only an illustration.)
8 TB - 700 GB = 7.3 TB = 7300 GB left. 7300 GB / 50 GB = space for 146
incrementals averaging 50 GB each. So say that's 50 GB per day average
with daily incrementals. That'll fill roughly 2.5 of the 8 TB archive
drives per year, so to make the numbers nice and round, say five drives
in rotation, keeping two years' worth of backups.
And each time you switch out archive drives, at least twice a year, you
start with a full send, so you have it conveniently there on the same
device as the incrementals and don't have to worry about tracking a
second drive with the full send before you can replay your incrementals.
Of course, if your primary/working and backup media are closer to the
same size, perhaps a 4-device 4 TB btrfs raid10 (8 TB usable space) as
the working copy, with 8 TB archive backup drives and a correspondingly
larger average incremental send size as well, you'd use pairs of backup
devices, one for the full send, one for the incrementals, and rotate in
a second pair of 8 TB devices when the first pair got full. And there's
all sorts of individual variants on the same theme.
>
>
>> 3) When a restore is needed, then and only then, play back those
>> backups to a newly created btrfs using btrfs receive. If the above
>> initial full plus incrementals until the backup media is full strategy
>> is used, the incrementals can be played back against the initial full,
>> just as the send was originally done.
>
> Yes indeed. My motivation for this method was/is that unpacking (i.e.
> doing the btrfs receive) takes time if it is a huge number of small
> files on a HDD.
And the advantage is that until a restore is actually needed, no
playback is done. So in the above five-archive-devices-over-two-years
case, if the production copy keeps working for two years, that's, say,
four full sends and 146*4 incrementals that never need to be played back
at all, reclaiming the time and energy that would otherwise have been
spent maintaining the played-back copy over that period.
>> Seems to me this should work fine, except as I said, that receive
>> errors would only be caught at the time receive is actually run, which
>> would be on restore. But as most of those errors tend to be due to
>> incremental bugs, doing full sends all the time would eliminate them,
>> at the cost of much higher space usage over time, of course. And if
>> incrementals /are/ done, with any luck, replay won't happen for quite
>> some time, and will thus use a much newer and hopefully more mature
>> btrfs receive, with fewer bugs due to fixes made in the intervening
>> time. Additionally, with any luck, several generations of full backup
>> plus incrementals will have been done before the need to replay even
>> one set, thus sparing the need to replay the intervening sets entirely.
>
> On the other hand, not replaying them means they cannot be used for a
> lower-performance backup or clone server, and there is no way to check
> the actual state. There could also be silent send errors.
> If you do playback immediately, creating a writable snapshot on both the
> master and the clone(s) side allows checking for diffs online (rsync -c)
> and copying over the differences.
That is the primary disadvantage I suggested, and the reason one
probably would not want to use this method until btrfs, including send/
receive, is fully stabilized: simply put, at this point it's not yet
stable enough to trust, without verification, that it would actually
receive properly.
But once btrfs is fully stable and people are routinely using send/
receive without known bugs for quite some time, then this scenario may
well be quite viable.
> Using btrfs sub find-new, I once discovered some 100 MB of difference
> in a multi-TB data set that way. It was only 2 OS/VM image files, on
> different clones. It probably happened sometime early 2015, but I am
> quite unsure, so not sure which kernel/tools were involved.
To my knowledge, there has been exactly one such known bug (since the
initial feature-introduction bugs were worked through, anyway) where a
successful send and receive didn't produce a valid copy, and AFAIK that
didn't actually turn out to be a send/receive bug, but was ultimately
traced to a more general btrfs bug that send/receive simply happened to
catch.
So it'd be interesting to have more information about that event and
track down what happened. But it's likely to be way too late, with way
too little reliable information about it still available, to do anything
about it now.
Meanwhile, most of the bugs I'm aware of have been in receive processing
various corner cases. And as I pointed out, if you aren't
replaying/receiving immediately, but instead archiving the raw send
streams to be replayed later, then in the event you /do/ need to replay
a stream, receive will have matured further in the meantime, compared to
playing back the stream to a receive of the same version as the send
that produced it. If it's two years later, that's two years' worth of
further bugs that have been fixed, so in theory at least, chances for a
successful replay and receive should be better after waiting than they
would have been had the replay been done immediately.
Which /somewhat/ counteracts the problem of btrfs receive in particular
not yet being totally mature. However, it doesn't entirely counteract
the problem, and I'd still consider this solution to be too dangerous to
use in practice at the current time. Though in, say, five years, it
should be a much more reasonably viable solution to consider.
But I /really/ like your tee idea in this regard. =:^) For people doing
multiple levels of backup, it effectively takes care of two levels at
the same time. By teeing and replaying immediately (or simply replaying
the stored send stream immediately, then keeping it instead of deleting
it), you test that the stream works, and end up with the working
btrfs-level backup. By then continuing to archive the now-tested send
stream, you have a second level of backup, which can be replayed again
should something happen to both the production version and the
btrfs-level backup you were replaying to for testing and primary backup.
That effectively gives you the best of both worlds. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman