From: Shawn Pearce <spearce@spearce.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Jakub Narebski <jnareb@gmail.com>, git@vger.kernel.org
Subject: Re: Diff format in packs
Date: Mon, 31 Jul 2006 18:32:21 -0400
Message-ID: <20060731223221.GB24888@spearce.org>
In-Reply-To: <9e4733910607311420n8537b76lbde4d60062195403@mail.gmail.com>
Jon Smirl <jonsmirl@gmail.com> wrote:
> On 7/31/06, Jakub Narebski <jnareb@gmail.com> wrote:
> >Jon Smirl wrote:
> >
> >> I'm trying to build a small app that takes a CVS ,v and writes out a
> >> pack corresponding to the versions. Suggestions on the most efficient
> >> strategy for doing this by calling straight into the git C code?
> >> Forking off git commands is not very efficient when done a million
> >> times.
> >
> >Something akin to parsecvs by Keith Packard?
>
> I see the error in my thoughts now, I need the fully expanded delta to
> compute the sha-1 so I might as well use the parsecvs code.
>
> I am working on combining cvs2svn, parsecvs and cvsps into something
> that can handle Mozilla CVS.
I think you sort of have the right idea. Creating a pack file
from scratch without deltas is a trivial operation. The pack
format is documented in Documentation/technical/pack-format.txt.
The actual delta format isn't documented there, and generating a delta
would be somewhat difficult, but creating a pack with no deltas
and only zlib compression is pretty simple. And no, GIT doesn't
use the same (horrible) delta format as RCS, so you are definitely
right: you have to expand it before you can compress it.
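To make the no-delta pack concrete, here is a minimal sketch in Python (chosen only for brevity; the same steps map onto C with zlib and any SHA1 routine). It assumes the version 2 layout from pack-format.txt: a 12-byte header, then one type-and-size header plus a zlib stream per object, then a trailing SHA1 over everything before it. The names entry_header and build_pack are made up for this example and are not functions in GIT.

```python
import hashlib
import struct
import zlib

# Object type codes used in pack entry headers.
OBJ_TYPES = {"commit": 1, "tree": 2, "blob": 3, "tag": 4}


def entry_header(obj_type: str, size: int) -> bytes:
    """Encode the per-object header: type in bits 6-4 of the first byte,
    the uncompressed size in the low 4 bits, then 7 more size bits per
    continuation byte. The high bit marks "another byte follows"."""
    byte = (OBJ_TYPES[obj_type] << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def build_pack(objects) -> bytes:
    """objects: list of (type_name, raw_bytes). No deltas are produced;
    every object is stored whole and zlib-compressed."""
    body = bytearray(b"PACK" + struct.pack(">II", 2, len(objects)))
    for obj_type, data in objects:
        body += entry_header(obj_type, len(data))
        body += zlib.compress(data)
    # The pack ends with the SHA1 of all bytes written so far.
    return bytes(body) + hashlib.sha1(body).digest()
```

A pack built this way still needs an index before GIT can read from it; git-index-pack can generate one from the pack alone.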
Creating trees and commits from scratch is also really easy. Calling
zlib and a SHA1 routine to create the checksum is the hard part.
I think I wrote the tree and commit construction part of jgit in
a few hours, and that was while I was also being distracted by
someone speaking in the front of the room. :-)
It should be reasonably simple to extract each revision from a
single ,v file into its full undeltafied form, compute its SHA1,
compress it with zlib, and append it into a pack file. Do that
for every file and toss the SHA1 values, file names and revision
numbers off into a table somewhere.
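The per-revision bookkeeping might look roughly like the sketch below. One detail worth spelling out: the object name is the SHA1 of a short header ("blob", a space, the decimal size, a NUL byte) followed by the content, not of the raw content alone. The names object_id and index_revisions are hypothetical, and the revisions are assumed to have been expanded to full text already by whatever ,v parser is in use.

```python
import hashlib


def object_id(obj_type: str, data: bytes) -> str:
    """Return the hex object name: SHA1 over "<type> <size>\\0" + data."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def index_revisions(revisions):
    """revisions: iterable of (path, revision_number, full_text_bytes).
    Returns a table mapping (path, revision_number) to the blob's name,
    which is all the later tree-building pass needs."""
    table = {}
    for path, rev, text in revisions:
        table[(path, rev)] = object_id("blob", text)
    return table
```

In a real converter each blob would be compressed and appended to the pack in the same loop, so the full text of a revision never has to be kept around after its name is recorded.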
Then loop back through and generate trees while playing around only
with the RCS file paths, timestamps and SHA1 pointers. Again tree
generation is extremely simple; it would be trivial to generate
tree objects and append them into the same (or another) pack.
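As an illustration of how little there is to a tree, here is a sketch of the serialization. Each entry is the octal mode, a space, the name, a NUL byte, and the 20-byte binary SHA1; entries are sorted by name, with directories compared as if their name ended in "/". The function names tree_payload and tree_id are invented for this example.

```python
import hashlib


def tree_payload(entries) -> bytes:
    """entries: list of (mode, name, hex_sha1), where mode is a string
    such as "100644" for a regular file or "40000" for a subtree."""

    def sort_key(entry):
        mode, name, _ = entry
        # Subtrees sort as though their name had a trailing slash.
        return name.encode("utf-8") + (b"/" if mode == "40000" else b"")

    out = bytearray()
    for mode, name, hex_sha1 in sorted(entries, key=sort_key):
        out += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        out += bytes.fromhex(hex_sha1)
    return bytes(out)


def tree_id(entries) -> str:
    """Hex object name of the tree: SHA1 over "tree <size>\\0" + payload."""
    payload = tree_payload(entries)
    header = b"tree " + str(len(payload)).encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()
```

The sort rule is the one place where a naive implementation tends to go wrong: a directory "foo" sorts after a file "foo.c", because "/" compares greater than ".".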
Finally, writing commit objects that point at the trees is also easy
and doesn't require calling git-commit.
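A commit is just a few text header lines, a blank line, and the message. The sketch below assumes the same identity and timestamp for author and committer, which is a simplification for illustration; commit_payload and commit_id are hypothetical names.

```python
import hashlib


def commit_payload(tree, parents, who, when, tz, message) -> bytes:
    """tree and parents are hex object names; who is "Name <email>";
    when is seconds since the epoch; tz is an offset such as "-0400"."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {who} {when} {tz}")
    lines.append(f"committer {who} {when} {tz}")
    # A blank line separates the headers from the log message.
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


def commit_id(payload: bytes) -> str:
    """Hex object name: SHA1 over "commit <size>\\0" + payload."""
    header = b"commit " + str(len(payload)).encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()
```

Since each commit names its parents by SHA1, commits have to be generated oldest first so that the parent names are known when a child is written.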
When you are all done run a `git-repack -a -d -f` and let the delta
code compress everything down. That first compression might take
a little while but it should do a reasonably good job despite the
input pack(s) being highly unorganized.
So I think I'm suggesting you find a way to generate the base objects
yourself right into a pack file, rather than using the higher level
GIT executables to do it. You may be able to reuse some of the
code in GIT but I know its writer code is organized for writing
loose objects, not for appending new objects into a new pack file,
so some surgery would probably be required.
--
Shawn.