From: "João Eduardo Luís" <jecluis@gmail.com>
To: Hugo Mills <hugo@carfax.org.uk>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Snapshot reconciliation
Date: Fri, 1 Jul 2011 15:52:24 +0100
Message-ID: <F02368DC-096F-49B5-A577-2955DE12D6EB@gmail.com>
In-Reply-To: <20110628190632.GB24675@carfax.org.uk>
On Jun 28, 2011, at 8:06 PM, Hugo Mills wrote:
> On Tue, Jun 28, 2011 at 06:55:41PM +0100, João Eduardo Luís wrote:
>> On Jun 28, 2011, at 4:07 PM, C Anthony Risinger wrote:
>>
>>> 2011/6/28 João Eduardo Luís <jecluis@gmail.com>:
>>>> Hello.
>>>>
>>>> Can anyone think of a simple way to copy a set of pages from a given file (which may or may not be scattered throughout multiple extents) from a snapshot to correct pages within another file on another snapshot?
>>>>
>>>> This might sound silly, but the whole purpose is to create some sort of reconciliation method between divergent snapshots taken from the same original subvolume.
>>>
>>> generic deduplication?
>>>
>>
>> I'm not sure if deduplication is what I'm looking for.
>>
>> What I actually want to achieve is to reconstruct a file's data from
>> two diverging files. I.e., two snapshots are taken from the same
>> subvolume and, in each snapshot, a given file A is written
>> to. Assuming different blocks were written on, and no expected
>> semantics are violated, what I aim to achieve is the correct
>> reconciliation of file A in one of the snapshots.
>>
>> Maybe this could be achieved by using deduplication. I'll look into
>> those patches. Even if they are not completely useful, they very
>> well contain some neat concept that may be used to solve this little
>> puzzle of mine. :-)
>
> You would need to enumerate the extents on each representation of
> the file, picking the ones with the latest transid in each case. You
> would then need to work out what the extents on the reconstructed file
> would look like, and glue them all together into a new file.
>
In my case, I don't need to search for the latest transid, since I keep an in-memory log of the changes made within each snapshot. As these snapshots are ephemeral and are created and destroyed on demand by a user-level application, the cost of keeping such a per-snapshot log doesn't seem to have much impact on performance.
However, I log operations on a per-page basis. Glueing the modified extents from each snapshot together doesn't seem viable without deduplicating them first, or I may end up losing updates I did not intend to lose.
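
As a rough illustration of what a per-page log makes possible, the sketch below splits the pages written in two snapshots into those that can be merged safely and those that conflict. It is only a sketch under stated assumptions: the function name and the representation of a log as a collection of page indices are hypothetical, not taken from the actual application.

```python
def reconcile_page_sets(pages_a, pages_b):
    """Classify the pages written in two snapshots of the same file.

    pages_a, pages_b: iterables of page indices written in each snapshot
    (a hypothetical stand-in for the per-snapshot in-memory log).
    Returns (only_a, only_b, conflicts) as sorted lists.
    """
    a, b = set(pages_a), set(pages_b)
    conflicts = sorted(a & b)   # written in both snapshots: needs a policy
    only_a = sorted(a - b)      # safe to take from snapshot A
    only_b = sorted(b - a)      # safe to take from snapshot B
    return only_a, only_b, conflicts
```

If the conflict list is empty, the "different blocks were written" assumption holds and no update should be lost by taking each page from the snapshot that wrote it.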
On the other hand, I'm afraid the deduplication will lead to severe disk fragmentation when performed on a per-page basis (e.g., if changes are made to several non-contiguous pages within several extents of the same file on different snapshots, I would end up with several smaller extents scattered across the disk).
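
To get a feel for how scattered a set of logged pages is, one could coalesce adjacent page indices into contiguous runs before doing anything with them. This is a minimal sketch with a hypothetical helper name; it only counts the separate ranges that would be touched and does not model how btrfs actually allocates extents.

```python
def coalesce_pages(pages):
    """Group page indices into (first_page, page_count) runs of adjacent pages."""
    runs = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][0] + runs[-1][1]:
            # The page directly follows the last run, so extend that run.
            first, count = runs[-1]
            runs[-1] = (first, count + 1)
        else:
            # There is a gap before this page, so start a new run.
            runs.append((page, 1))
    return runs
```

For example, pages 3, 4, 5, 7, 10 and 11 collapse into three runs, so at least three separate ranges of the file would be rewritten.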
This is pretty much why I expected to be able to literally copy the changed pages from one snapshot to another, without deduplicating the extents. However, after spending the last couple of days looking for a simple way to do it, I now believe that achieving this is far more complicated and error-prone (unless I missed something) than deduplicating the extents based on my logged information.
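
For reference, the naive user-space form of "copy the changed pages" looks roughly like the sketch below, which reads each logged page from the file in one snapshot and writes it at the same offset in the other. The function name, the paths and the 4096-byte page size are assumptions made for illustration. Because the data goes through ordinary reads and writes, no extents are shared, and on a copy-on-write filesystem the overwritten pages would be expected to land in newly allocated space, so the fragmentation concern still applies.

```python
import os

PAGE_SIZE = 4096  # assumed page size


def copy_pages(src_path, dst_path, pages, page_size=PAGE_SIZE):
    """Copy the given page indices from src_path to the same offsets in dst_path.

    Returns the number of bytes copied. Pages past the end of the source
    file are skipped.
    """
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY)
        try:
            copied = 0
            for page in sorted(set(pages)):
                offset = page * page_size
                data = os.pread(src, page_size, offset)
                if not data:
                    continue  # nothing to read at this offset
                written = 0
                while written < len(data):
                    # pwrite may write fewer bytes than requested, so loop.
                    written += os.pwrite(dst, data[written:], offset + written)
                copied += len(data)
            return copied
        finally:
            os.close(dst)
    finally:
        os.close(src)
```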
Any thoughts would be helpful.
---
João Eduardo Luís
gpg key: 477C26E5 from pool.keyserver.eu