* [offtopic?] xdelta patch format wrapper
@ 2008-02-13 1:53 Martin Langhoff
2008-02-13 3:32 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Martin Langhoff @ 2008-02-13 1:53 UTC
To: git; +Cc: jmacd
This is somewhat offtopic but this list is the best-informed crowd on
diff/xdelta matters so here I am, abusing your attention...
I am working on an "incremental content" feature for Moodle - and my
plan is to serve pre-computed "patchfiles" based on the xdelta utility
to the client systems.
Now, I need to provide a wrapper that concats the deltas from the xdelta
utility with a more verbose header - akin to the header in git's unified
diff output. This is because the xdelta utility only handles one file
delta at a time - the file is a pure delta (prefaced with a SHA1 of the
file it applies to, IIRC).
(From xdelta I like that it's one-way-only, and compressed internally --
thus saving a lot of space on large changes. There are small statically
linked binaries for windows, osx and linux. I did consider using git's
own diffs, but it involves significantly more work, and portability to
Win32 is still green for the wide distribution this project is expecting.)
So my question is what is a good format for the header? My thinking so far:
- have a prefix to scan for, such as "xdelta" at the beginning of
the file, or after a newline/whitespace
- keep the <fromsha1> <tosha1> line
- \0 delimited filenames
- filenames as ambiguous bag'o'bytes or utf-8?
(should we have another flamewar on this? ;-) )
- keep file modes and perhaps support copy/move headers
- keep a/ b/ prefixes?
- last line in the header is length: <length-in-bytes>, followed by
a newline and the xdelta itself
- one or more newlines follow the end of the xdelta if there is another
header coming
Something along the lines of
xdelta d065883..74cd8e5 100644
a/foo.zip\0
b/foo.zip\0
length 1024
<1024 bytes of xdelta data>
xdelta d065883..74cd8e5 100644
a/bar.zip\0
b/bar.zip\0
length 92312
<92312 bytes xdelta data>EOF
---
Would the above work as a reasonably solid patch header format? Is there
something else I should be using instead of rolling my own? If there
isn't a more suitable tool, I'm hoping to come up with something
unambiguous and reliable that is generally useful.
[ I have to confess, it came as a bit of a surprise that xdelta didn't
support this out of the box. Haven't seen any existing wrapper for it
that covers this either. ]
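To make the header layout proposed above concrete, here is a minimal Python sketch of a writer and reader for it. It is an illustration under assumptions, not an existing tool: the function names are made up, the "length" line follows the example (no colon), filenames are treated as raw bytes, and the writer always emits one newline after the delta so that another header can follow.

```python
MAGIC = b"xdelta "  # prefix to scan for, as in the proposal


def write_record(out, from_id, to_id, mode, old_name, new_name, delta):
    """Append one per-file record: text header, then the raw delta bytes."""
    out.write(MAGIC + f"{from_id}..{to_id} {mode}\n".encode("ascii"))
    out.write(old_name + b"\0\n")  # names are raw bytes, NUL-terminated
    out.write(new_name + b"\0\n")
    out.write(f"length {len(delta)}\n".encode("ascii"))
    out.write(delta)
    out.write(b"\n")  # separator before a possible next header


def _read_name(stream):
    """Read bytes up to the NUL terminator, then consume the newline."""
    name = bytearray()
    while True:
        ch = stream.read(1)
        if not ch:
            raise ValueError("truncated filename")
        if ch == b"\0":
            break
        name += ch
    if stream.read(1) != b"\n":
        raise ValueError("expected newline after NUL")
    return bytes(name)


def read_records(stream):
    """Yield (from_id, to_id, mode, old_name, new_name, delta) tuples."""
    while True:
        line = stream.readline()
        if not line:
            return  # clean end of file
        if line == b"\n":
            continue  # blank separator lines between records
        if not line.startswith(MAGIC):
            raise ValueError(f"bad record header: {line!r}")
        ids, mode = line[len(MAGIC):].decode("ascii").split()
        from_id, to_id = ids.split("..")
        old_name = _read_name(stream)
        new_name = _read_name(stream)
        fields = stream.readline().decode("ascii").split()
        if len(fields) != 2 or fields[0] != "length":
            raise ValueError("missing length line")
        size = int(fields[1])
        delta = stream.read(size)  # delta is opaque; only its length matters
        if len(delta) != size:
            raise ValueError("truncated delta")
        yield from_id, to_id, mode, old_name, new_name, delta
```

Because the reader stops each filename at the NUL and takes the delta by byte count, neither a newline inside a filename nor the string "xdelta" inside the binary data confuses it.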
As a kludge I could tar both sides and xdelta across the tars, but it is
wasteful, and an xdelta-based diff/patch that can handle multiple files
does seem useful. And I don't know how stable the output of tar is (wrt
file ordering for example).
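On the question of tar stability: the ordering and metadata can be pinned down if the archive is built by hand. The following is a rough Python sketch, assuming the standard tarfile module is acceptable; the function name is hypothetical, and it normalizes only member order, timestamps and ownership (empty directories are dropped).

```python
import io
import os
import tarfile


def deterministic_tar(root):
    """Return tar bytes for the files under root, in a reproducible layout."""
    members = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            members.append((rel, full))
    members.sort()  # fixed member order, independent of filesystem order

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for rel, full in members:
            info = tar.gettarinfo(full, arcname=rel)
            info.mtime = 0                 # drop timestamps
            info.uid = info.gid = 0        # drop ownership
            info.uname = info.gname = ""
            with open(full, "rb") as f:
                tar.addfile(info, f)
    return buf.getvalue()
```

Two trees with the same file contents and modes then produce byte-identical archives, so a delta across the two tars reflects only content changes. It does not address the wastefulness concern.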
TIA for any feedback :-)
m
--
* Re: [offtopic?] xdelta patch format wrapper
2008-02-13 1:53 [offtopic?] xdelta patch format wrapper Martin Langhoff
@ 2008-02-13 3:32 ` Junio C Hamano
2008-02-13 3:46 ` Martin Langhoff
2008-02-13 4:13 ` Martin Langhoff
0 siblings, 2 replies; 7+ messages in thread
From: Junio C Hamano @ 2008-02-13 3:32 UTC
To: Martin Langhoff; +Cc: git, jmacd
Martin Langhoff <martin@catalyst.net.nz> writes:
> So my question is what is a good format for the header? My thinking so far:
>
> - have a prefix to scan for, such as "xdelta" at the beginning of
> the file, or after a newline/whitespace
>
> - keep the <fromsha1> <tosha1> line
>
> - \0 delimited filenames
>
> - filenames as ambiguous bag'o'bytes or utf-8?
> (should we have another flamewar on this? ;-) )
>
> - keep file modes and perhaps support copy/move headers
>
> - keep a/ b/ prefixes?
>
> - last line in the header is length: <length-in-bytes>, followed by
> a newline and the xdelta itself
>
> - one or more newlines follow the end of the xdelta if there is another
> header coming
I am lost as to your objective because you seem to be keeping a
whole LOT more than I would have imagined for a specialized
purpose file format.
If you want to reuse that much of git, maybe our binary patch
format is good enough for you? We always produce two xdelta so
that we can apply in reverse, but it is Ok to add a one-way
option.
* Re: [offtopic?] xdelta patch format wrapper
2008-02-13 3:32 ` Junio C Hamano
@ 2008-02-13 3:46 ` Martin Langhoff
2008-02-13 3:56 ` Junio C Hamano
2008-02-13 11:33 ` Johannes Schindelin
2008-02-13 4:13 ` Martin Langhoff
1 sibling, 2 replies; 7+ messages in thread
From: Martin Langhoff @ 2008-02-13 3:46 UTC
To: Junio C Hamano; +Cc: git, jmacd
Junio C Hamano wrote:
> I am lost as to your objective because you seem to be keeping a
> whole LOT more than I would have imagined for a specialized
> purpose file format.
My source files are 2 zipfiles that I know contain 1 xml file, and then
may contain any arbitrary files. As a specialised file format, this is a
pretty general case ;-) Because of compression, xdeltas of the zipfiles
aren't good. So what I want to do is to diff the 2 unzipped directories
- nothing git-specific, I could use diff -urN.
Git diff *is* better in that it handles binary files, but we pay a
sizable cost in being reversible.
So what I am thinking of doing is writing a wrapper that does the equivalent
of the "urN" flags to diff, but uses xdelta as the diffing algorithm.
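The recursive walk such a wrapper needs (the "urN" part) could look roughly like the Python sketch below. It only decides which files need a delta, an add, or a delete; producing the delta itself is left to xdelta. The function names and the action labels are illustrative assumptions, and whole files are read into memory for brevity.

```python
import hashlib
import os


def list_files(root):
    """Map each relative path under root ('/' separators) to its SHA-1."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                result[rel] = hashlib.sha1(f.read()).hexdigest()
    return result


def plan_tree_diff(old_root, new_root):
    """Return a sorted list of (action, relative_path) for the two trees."""
    old = list_files(old_root)
    new = list_files(new_root)
    plan = []
    for rel in sorted(old.keys() | new.keys()):
        if rel not in new:
            plan.append(("delete", rel))
        elif rel not in old:
            plan.append(("add", rel))    # like diff -N: treat old side as empty
        elif old[rel] != new[rel]:
            plan.append(("delta", rel))  # hand this pair to xdelta
        # identical files produce no entry
    return plan
```

Each "delta" or "add" entry would then become one record in the patch file, with the two hashes available for the header line.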
As my case is rather general I suspect I'm better off biting the bullet
and writing something generally useful - it doesn't take that much more
effort and if it ends up being popular, I'll have some help with its
maintenance ;-)
In other words, I'm trolling for peer review to make sure the tool is
sane, and will be useful to others ;-)
> If you want to reuse that much of git
I don't think I'll use *any* git code at all for the time being. If it
was trivial to produce a statically compiled git-diff.exe and
git-apply-patch.exe that work without funny dependencies on any windows
box then I would. Don't think any of the windows ports of git are there
(even though they are excellent!).
cheers,
m
--
-----------------------------------------------------------------------
Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
NZ: +64(4)916-7224 MOB: +64(21)364-017 UK: 0845 868 5733 ext 7224
Make things as simple as possible, but no simpler - Einstein
-----------------------------------------------------------------------
* Re: [offtopic?] xdelta patch format wrapper
2008-02-13 3:46 ` Martin Langhoff
@ 2008-02-13 3:56 ` Junio C Hamano
2008-02-13 11:33 ` Johannes Schindelin
1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2008-02-13 3:56 UTC
To: Martin Langhoff; +Cc: git, jmacd
Martin Langhoff <martin@catalyst.net.nz> writes:
> Junio C Hamano wrote:
>> I am lost as to your objective because you seem to be keeping a
>> whole LOT more than I would have imagined for a specialized
>> purpose file format.
>
> My source files are 2 zipfiles that I know contain 1 xml file, and then
> may contain any arbitrary files. As a specialised file format, this is a
> pretty general case ;-) Because of compression, xdeltas of the zipfiles
> aren't good. So what I want to do is to diff the 2 unzipped directories
> - nothing git-specific, I could use diff -urN.
>
> Git diff *is* better in that it handles binary files, but we pay a
> sizable cost in being reversible.
Did I forget to say that I am Ok with --oneway option?
In fact, we started as oneway but we _fixed_ it to make it
reversible soon after the initial version ;-) So "git apply"
still can grok oneway format.
* Re: [offtopic?] xdelta patch format wrapper
2008-02-13 3:32 ` Junio C Hamano
2008-02-13 3:46 ` Martin Langhoff
@ 2008-02-13 4:13 ` Martin Langhoff
1 sibling, 0 replies; 7+ messages in thread
From: Martin Langhoff @ 2008-02-13 4:13 UTC
To: Junio C Hamano; +Cc: git, jmacd
Junio C Hamano wrote:
> If you want to reuse that much of git
Wondering about the confusion over this. When I talk about using xdelta,
it's not the implementation in git. I intend to ship this xdelta.exe
http://evanjones.ca/software/xdelta-win32.html (for Windows users at
least!).
What I am sounding out is writing a wrapper in PHP (I'd write it
in Perl, but we're already shipping the PHP interpreter) that does all
the parsing of the file, splits out the actual "xdelta" blobs and calls
xdelta.exe to apply them to the relevant files.
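The apply side of such a wrapper might be structured like the Python sketch below (the plan is PHP; Python is used here only for illustration). It assumes the ids in the header are prefixes of a plain SHA-1 over the file contents, and it takes the actual patching step as a caller-supplied function, `apply_delta`, which is a hypothetical hook standing in for a call to the xdelta binary. All names are made up, and paths are not checked for "../" components.

```python
import hashlib
import os


def sha1_hex(data):
    return hashlib.sha1(data).hexdigest()


def apply_records(records, tree_root, apply_delta):
    """Apply (from_id, to_id, name, delta) records to files under tree_root.

    apply_delta(old_bytes, delta_bytes) -> new_bytes is provided by the
    caller (assumption: in practice it would run the external xdelta tool).
    """
    patched = []
    for from_id, to_id, name, delta in records:
        target = os.path.join(tree_root, *name.split("/"))
        with open(target, "rb") as f:
            old = f.read()
        # Refuse to patch a file that is not the expected source version.
        if not sha1_hex(old).startswith(from_id):
            raise ValueError(f"{name}: source does not match {from_id}")
        new = apply_delta(old, delta)
        # Verify the result before touching the original file.
        if not sha1_hex(new).startswith(to_id):
            raise ValueError(f"{name}: result does not match {to_id}")
        tmp = target + ".tmp"
        with open(tmp, "wb") as f:
            f.write(new)
        os.replace(tmp, target)  # swap in the patched file in one step
        patched.append(name)
    return patched
```

Checking both hashes is what makes a one-way delta safe to ship: a client that is on the wrong base version fails loudly instead of ending up with a corrupted file.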
Someone more talented than me would write it in perfectly portable C so
that on day one it works on Win32, OSX, unices and linuces. I can't so I'll
look like a wimp but I'll deliver something workable ;-) But there's no
reason the PHP or Perl implementation can't be considered a working
prototype for a subsequent C version.
Especially if the file format makes sense. And we've been complaining
about problems and ambiguities in the unified diff header. So... I'll
rephrase my question
"What would a unified diff header that didn't suck look like?"
(Ah, can't find the threads where the ambiguities of diff headers were
discussed. Alas, the Google Gods aren't with me today.)
cheers,
m
--
* Re: [offtopic?] xdelta patch format wrapper
2008-02-13 3:46 ` Martin Langhoff
2008-02-13 3:56 ` Junio C Hamano
@ 2008-02-13 11:33 ` Johannes Schindelin
2008-02-13 17:53 ` Martin Langhoff
1 sibling, 1 reply; 7+ messages in thread
From: Johannes Schindelin @ 2008-02-13 11:33 UTC
To: Martin Langhoff; +Cc: Junio C Hamano, git, jmacd
Hi,
On Wed, 13 Feb 2008, Martin Langhoff wrote:
> I don't think I'll use *any* git code at all for the time being. If it
> was trivial to produce a statically compiled git-diff.exe and
> git-apply-patch.exe that work without funny dependencies on any windows
> box then I would.
It is trivial.
Except that we do not have any git-apply-patch.exe. Maybe you meant
git-apply.exe?
> Don't think any of the windows ports of git are there (even though they
> are excellent!).
How can you say that they are excellent, and then say they are not there
yet?
FWIW I just checked. In msysGit, git-apply.exe and git-diff.exe are
identical (no mystery there: they are both builtins), and weigh in with
2893142 bytes.
If you're serious about wanting something reliable, quick, but smaller
than that, it should be _trivial_ to cut down. For example, a simple
"strip git-diff.exe" brings it down to 821248 bytes.
And that's without removing all the other builtins, which would be
trivial, too (just cull "struct cmd_struct commands" in git.c, and
"BUILT_INS" and "BUILTIN_OBJS" in the Makefile).
Hth,
Dscho
* Re: [offtopic?] xdelta patch format wrapper
2008-02-13 11:33 ` Johannes Schindelin
@ 2008-02-13 17:53 ` Martin Langhoff
0 siblings, 0 replies; 7+ messages in thread
From: Martin Langhoff @ 2008-02-13 17:53 UTC
To: Johannes Schindelin; +Cc: Junio C Hamano, git
Johannes Schindelin wrote:
> FWIW I just checked. In msysGit, git-apply.exe and git-diff.exe are
> identical (no mystery there: they are both builtins), and weigh in with
> 2893142 bytes.
Hmmm. I thought they depended on msys infrastructure. Can I trivially
compile a statically linked git-diff.exe and git-apply.exe and expect
them to just work? How large would they be then?
msysGIT is *excellent* to get a full GIT install for development
purposes. The requirements in this case are different...
cheers,
m
--
-----------------------------------------------------------------------
Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
NZ: +64(4)916-7224 MOB: +64(21)364-017 UK: 0845 868 5733 ext 7224
Make things as simple as possible, but no simpler - Einstein
-----------------------------------------------------------------------