From: Junio C Hamano <gitster@pobox.com>
To: Mike Hommey <mh@glandium.org>
Cc: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: fast-import deltas
Date: Tue, 01 Apr 2014 10:14:02 -0700
Message-ID: <xmqqk3b90y79.fsf@gitster.dls.corp.google.com>
In-Reply-To: <20140401141856.GA2497@glandium.org> (Mike Hommey's message of "Tue, 1 Apr 2014 23:18:56 +0900")
Mike Hommey <mh@glandium.org> writes:
> On Tue, Apr 01, 2014 at 09:15:12AM -0400, Jeff King wrote:
>> > It seems to me fast-import keeps a kind of human readable format for its
>> > protocol, i wonder if xdelta format would fit the bill. That being said,
>> > I also wonder if i shouldn't just try to write a pack on my own...
>>
>> The fast-import commands are human readable, but the blob contents are
>> included inline. I don't see how sending a binary delta is any worse
>> than sending a literal binary blob over the stream.
>
> OTOH, the xdelta format is not exactly straightforward to produce, with
> the variable length encoding of integers. Not exactly hard, but when
> everything else in fast-import is straightforward, one has to wonder.

Unless you already have your change in xdelta form on hand, or the
format your foreign change is in gives sufficient information to
produce a corresponding xdelta without looking at the content that
your foreign change applies to, it is silly to try to convert your
foreign change into xdelta and feed it to fast-import.

What constitutes "sufficient" information?  The xdelta format is a
series of instructions, each of which either:

 - copies N bytes starting at a given offset in the source material
   to the destination; or

 - copies N literal bytes, carried in the delta itself, to the
   destination.

These instructions are applied to an existing piece of content, the
"source material" identified by its object name, to produce the
result of "applying delta".
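
To make the two instruction kinds concrete, here is a rough Python
sketch of applying such a delta.  It assumes the byte layout Git uses
for deltas in its pack files (two variable-length size headers, then
copy and insert opcodes); the function names are made up for
illustration, and this is not a copy of Git's own patch-delta.c.

```python
def read_varint(buf, pos):
    """Decode a base-128 integer, least significant group first.

    Returns (value, position just past the integer).
    """
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # high bit clear marks the last byte
            return value, pos


def apply_delta(source, delta):
    """Rebuild the target from the source and a delta (hypothetical helper)."""
    src_size, pos = read_varint(delta, 0)
    dst_size, pos = read_varint(delta, pos)
    if src_size != len(source):
        raise ValueError("delta was made against a different source")
    out = bytearray()
    while pos < len(delta):
        op = delta[pos]
        pos += 1
        if op & 0x80:
            # Copy from source: bits 0-3 say which offset bytes follow,
            # bits 4-6 say which size bytes follow.
            offset = size = 0
            for i in range(4):
                if op & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if op & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000  # an omitted size means 64 KiB
            if offset + size > len(source):
                raise ValueError("copy runs past the end of the source")
            out += source[offset:offset + size]
        elif op:
            # Insert: the opcode itself is the count of literal bytes.
            out += delta[pos:pos + op]
            pos += op
        else:
            raise ValueError("opcode 0 is reserved")
    if len(out) != dst_size:
        raise ValueError("result size does not match the delta header")
    return bytes(out)
```

For instance, the delta bytes 0c 0a 90 06 03 "git" 91 0b 01 applied
to "hello world\n" copy "hello ", insert "git", and copy the final
newline.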

As an example, think about the case where you have *,v files used by
RCS (and CVS).  The "foreign changes" given to you by that format are
a series of instructions that roughly corresponds to an "ed" script:
insert these lines at line number L, delete N lines from line
number K, etc.  In order to convert such a change into xdelta, you
would need to know what byte offsets in the original file these line
numbers correspond to.  You may also want to know what the Git object
name for the original is, although in the fast-import stream you
might be able to get away with using the object mark facility.
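
The line-number problem can be sketched in a few lines of Python.
This only illustrates why the original content has to be read; the
helper names and the ("copy", offset, length) tuples are invented
for the example and are not anything RCS or Git defines.

```python
def line_starts(data):
    """Byte offset at which each line starts, plus one entry for end of data."""
    starts = [0]
    pos = data.find(b"\n")
    while pos != -1:
        starts.append(pos + 1)
        pos = data.find(b"\n", pos + 1)
    if starts[-1] != len(data):
        starts.append(len(data))  # last line lacks a trailing newline
    return starts


def delete_lines_as_copies(original, first_line, count):
    """Express "delete COUNT lines from line FIRST_LINE" (1-based)
    as byte-range copies from the original."""
    starts = line_starts(original)
    cut_from = starts[first_line - 1]
    cut_to = starts[first_line - 1 + count]
    ops = []
    if cut_from > 0:
        ops.append(("copy", 0, cut_from))
    if cut_to < len(original):
        ops.append(("copy", cut_to, len(original) - cut_to))
    return ops
```

Note that line_starts() has to scan the whole original; the line
numbers alone say nothing about byte positions.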

Assuming that you do have and are willing to read the original file,
you have three possible (and one impractical) approaches:

 - Apply the foreign changes to the original file yourself (as that
   is the foreign system you are interested in, you know how to do
   that much better than Git does), and produce xdelta between the
   original and the result using only the original and the result.

 - Apply the foreign changes to the original file yourself, and feed
   the resulting content to fast-import in full, letting fast-import
   convert into the format Git understands.

 - Interpret the foreign changes, using the original file as a
   reference, to convert it into xdelta.

 - Teach fast-import how to interpret various formats that are used
   to express foreign changes, and feed that.

In the first approach, this "given the original and the result,
produce xdelta between them" step can be reused by other people's
systems.  You may be able to borrow diff-delta.c from us under our
licensing terms.
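
As a rough idea of what "given the original and the result, produce
a delta" involves, here is a deliberately naive Python sketch.  It
assumes the same pack-file delta layout as Git uses and only reuses
the common prefix and suffix of the two buffers; it is not the
algorithm in diff-delta.c, and all names are made up.

```python
def encode_varint(n):
    """Base-128 integer, least significant group first, high bit = more."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_copy(offset, size):
    """One copy instruction; zero bytes of offset/size are simply omitted."""
    assert 0 < size <= 0xFFFF and 0 <= offset <= 0xFFFFFFFF
    op = 0x80
    args = bytearray()
    for i in range(4):
        b = (offset >> (8 * i)) & 0xFF
        if b:
            op |= 1 << i
            args.append(b)
    for i in range(2):
        b = (size >> (8 * i)) & 0xFF
        if b:
            op |= 0x10 << i
            args.append(b)
    return bytes([op]) + bytes(args)


def naive_delta(source, target):
    """Copy the shared prefix and suffix, insert whatever lies between."""
    limit = min(len(source), len(target))
    prefix = 0
    while prefix < limit and source[prefix] == target[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and source[-1 - suffix] == target[-1 - suffix]):
        suffix += 1

    out = bytearray(encode_varint(len(source)) + encode_varint(len(target)))

    def copy(offset, size):
        while size:  # split long runs into copies of at most 0xFFFF bytes
            n = min(size, 0xFFFF)
            out.extend(encode_copy(offset, n))
            offset += n
            size -= n

    copy(0, prefix)
    middle = target[prefix:len(target) - suffix]
    for i in range(0, len(middle), 127):  # an insert carries at most 127 bytes
        chunk = middle[i:i + 127]
        out.append(len(chunk))
        out.extend(chunk)
    copy(len(source) - suffix, suffix)
    return bytes(out)
```

A real delta generator indexes the source so that it can find
matching runs anywhere, not just at the two ends.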

The second is the most straightforward; eventual deltification will
happen when the resulting repository is repacked, using the same
code from diff-delta.c.
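
Feeding the full content looks roughly like this.  The sketch below
builds a single "blob" command with a mark, using the command layout
described in git-fast-import(1); the function name is hypothetical.

```python
def blob_command(content, mark):
    """Serialize one fast-import "blob" command carrying the full content."""
    header = b"blob\nmark :%d\ndata %d\n" % (mark, len(content))
    # The byte count in "data" covers exactly the content; the extra LF
    # after it is optional but keeps the stream readable.
    return header + content + b"\n"
```

The resulting bytes can be written to the standard input of "git
fast-import", and a later command in the same stream can refer to
the blob as :1 (or whatever mark was given).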

The third would be "*,v expresses the source location and length in
terms of lines, so look at the original to convert these into the
byte offset and byte length xdelta wants", which I would think is
silly.

And the last one is a maintenance nightmare I do not think we would
want to touch with a ten-foot pole.

In short, the most practical solution would be to reconstitute a
full object and feed that to fast-import, unless you already have
xdelta or you can turn your foreign change into xdelta without ever
looking at the original.