* gitpacker progress report and a question
@ 2012-11-15 21:28 Eric S. Raymond
2012-11-15 22:35 ` Max Horn
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Eric S. Raymond @ 2012-11-15 21:28 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]
Some days ago I reported that I was attempting to write a tool that could
(a) take a git repo and unpack it into a tarball sequence plus a metadata log,
(b) reverse that operation, packing a tarball and log sequence into a repo.
Thanks in part to advice by Andreas Schwab and in part to looking at the
text of the p4 import script, this effort has succeeded. A proof of
concept is enclosed. It isn't documented yet, and has not been tested
on a repository with branches or merges in the history, but I am confident
that the distance from here to a finished and tested tool is short.
The immediate intended use is for importing older projects that are
available only as sequences of release tarballs, but there are other
sorts of repository surgery that would become easier using it.
I'm still looking for a better name for it and would welcome suggestions.
Before I do much further work, I need to determine how this will be shipped.
I see two possibilities: either I ship it as a small standalone project,
or it becomes a git subcommand shipped with the git suite. How I document
it and set up its tests would differ between these two cases.
Is there a process for submitting new subcommands? What are the
test-suite and documentation requirements?
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
[-- Attachment #2: gitpacker --]
[-- Type: text/plain, Size: 12708 bytes --]
#!/usr/bin/env python
"""
gitpacker - assemble tree sequences into repository histories

Requires git and cpio.
"""

import sys, os, getopt, subprocess, time, tempfile

DEBUG_GENERAL = 1
DEBUG_PROGRESS = 2
DEBUG_COMMANDS = 3

class Fatal(Exception):
    "Unrecoverable error."
    def __init__(self, msg):
        Exception.__init__(self)
        self.msg = msg
class Baton:
    "Ship progress indications to stdout."
    def __init__(self, prompt, endmsg='done', enable=False):
        self.prompt = prompt
        self.endmsg = endmsg
        self.countfmt = None
        self.counter = 0
        if enable:
            self.stream = sys.stdout
        else:
            self.stream = None
        self.count = 0
        self.time = 0
    def __enter__(self):
        if self.stream:
            self.stream.write(self.prompt + "...")
            if os.isatty(self.stream.fileno()):
                self.stream.write(" \010")
            self.stream.flush()
        self.count = 0
        self.time = time.time()
        return self
    def startcounter(self, countfmt, initial=1):
        self.countfmt = countfmt
        self.counter = initial
    def bumpcounter(self):
        if self.stream is None:
            return
        if os.isatty(self.stream.fileno()):
            if self.countfmt:
                update = self.countfmt % self.counter
                self.stream.write(update + ("\010" * len(update)))
                self.stream.flush()
            else:
                self.twirl()
        self.counter = self.counter + 1
    def endcounter(self):
        if self.stream:
            w = len(self.countfmt % self.count)
            self.stream.write((" " * w) + ("\010" * w))
            self.stream.flush()
        self.countfmt = None
    def twirl(self, ch=None):
        "One twirl of the baton."
        if self.stream is None:
            return
        if os.isatty(self.stream.fileno()):
            if ch:
                self.stream.write(ch)
                self.stream.flush()
                return
            else:
                update = "-/|\\"[self.count % 4]
                self.stream.write(update + ("\010" * len(update)))
                self.stream.flush()
        self.count = self.count + 1
    def __exit__(self, extype, value_unused, traceback_unused):
        if extype == KeyboardInterrupt:
            self.endmsg = "interrupted"
        if extype == Fatal:
            self.endmsg = "aborted by error"
        if self.stream:
            self.stream.write("...(%2.2f sec) %s.\n" \
                              % (time.time() - self.time, self.endmsg))
        return False
def do_or_die(dcmd, legend=""):
    "Either execute a command or raise a fatal exception."
    if legend:
        legend = " " + legend
    if verbose >= DEBUG_COMMANDS:
        sys.stdout.write("executing '%s'%s\n" % (dcmd, legend))
    try:
        retcode = subprocess.call(dcmd, shell=True)
        if retcode < 0:
            raise Fatal("child was terminated by signal %d." % -retcode)
        elif retcode != 0:
            raise Fatal("child returned %d." % retcode)
    except (OSError, IOError) as e:
        raise Fatal("execution of %s%s failed: %s" % (dcmd, legend, e))
def capture_or_die(dcmd, legend=""):
    "Either execute a command and capture its output or die."
    if legend:
        legend = " " + legend
    if verbose >= DEBUG_COMMANDS:
        sys.stdout.write("executing '%s'%s\n" % (dcmd, legend))
    try:
        return subprocess.check_output(dcmd, shell=True)
    except subprocess.CalledProcessError as e:
        if e.returncode < 0:
            raise Fatal("child was terminated by signal %d." % -e.returncode)
        elif e.returncode != 0:
            sys.stderr.write("gitpacker: child returned %d.\n" % e.returncode)
            sys.exit(1)
def git_pack(indir, outdir, quiet=False):
    "Pack a tree sequence and associated logfile into a repository"
    do_or_die("mkdir %s; git init -q %s" % (outdir, outdir))
    logfile = os.path.join(indir, "log")
    commit_id = [None]
    state = 0
    parents = []
    comment = committername = authorname = ""
    committerdate = authordate = committeremail = authoremail = ""
    commitcount = 1
    linecount = 0
    with Baton("Packing", enable=not quiet) as baton:
        for line in open(logfile):
            linecount += 1
            if verbose > DEBUG_PROGRESS:
                print "Looking at: '%s'" % repr(line)
            if state == 0:
                if line == '\n':
                    state = 1
                else:
                    try:
                        space = line.index(' ')
                        leader = line[:space]
                        follower = line[space:].strip()
                        if leader == "commit":
                            commit = follower
                        elif leader == "parent":
                            parents.append(follower)
                        elif leader not in ("author", "committer"):
                            raise Fatal("unexpected log attribute at %s" \
                                        % repr(line))
                        elif leader == "committer":
                            (committername, committeremail, committerdate) = [x.strip() for x in follower.replace('>','<').split('<')]
                        elif leader == "author":
                            (authorname, authoremail, authordate) = [x.strip() for x in follower.replace('>','<').split('<')]
                    except ValueError:
                        raise Fatal('"%s", line %d: ill-formed log entry' % (logfile, linecount))
            elif state == 1:
                if line == ".\n":
                    if verbose > DEBUG_PROGRESS:
                        print "Interpretation begins"
                    os.chdir(outdir)
                    if commitcount > 1:
                        do_or_die("rm `git ls-tree --name-only HEAD`")
                    if verbose > DEBUG_PROGRESS:
                        print "Copying"
                    os.chdir("%s/%d" % (indir, commitcount))
                    do_or_die("find . -print | cpio -pd --quiet %s" % (outdir,))
                    os.chdir(outdir)
                    do_or_die("git add -A")
                    tree_id = capture_or_die("git write-tree").strip()
                    if verbose > DEBUG_PROGRESS:
                        print "Tree ID is", tree_id
                    (_, commentfile) = tempfile.mkstemp()
                    with open(commentfile, "w") as cfp:
                        cfp.write(comment)
                    command = "git commit-tree %s " % tree_id
                    command += " ".join(map(lambda p: "-p " + commit_id[int(p)],parents))
                    command += "<'%s'" % commentfile
                    environment = ""
                    environment += " GIT_AUTHOR_NAME='%s' " % authorname
                    environment += " GIT_AUTHOR_EMAIL='%s' " % authoremail
                    environment += " GIT_AUTHOR_DATE='%s' " % authordate
                    environment += " GIT_COMMITTER_NAME='%s' " % committername
                    environment += " GIT_COMMITTER_EMAIL='%s' " % committeremail
                    environment += " GIT_COMMITTER_DATE='%s' " % committerdate
                    commit_id.append(capture_or_die(environment + command).strip())
                    do_or_die("git update-ref HEAD %s" % commit_id[-1])
                    os.remove(commentfile)
                    state = 0
                    parents = []
                    comment = committername = authorname = ""
                    committerdate = authordate = committeremail = authoremail = ""
                    commitcount += 1
                    baton.twirl()
                    if maxcommit != 0 and commitcount >= maxcommit:
                        break
                else:
                    if line.startswith("."):
                        line = line[1:]
                    comment += line
def git_unpack(indir, outdir, quiet=False):
    "Unpack a repository into a tree sequence and associated logfile."
    rawlogfile = os.path.join(outdir, "rawlog")
    with Baton("Unpacking", enable=not quiet) as baton:
        do_or_die("rm -fr %s; mkdir %s" % (outdir, outdir))
        baton.twirl()
        do_or_die("cd %s; git log --all --reverse --format=raw >%s" % (indir, rawlogfile))
        baton.twirl()
        commitcount = 1
        commit_map = {}
        os.chdir(indir)
        try:
            for line in open(rawlogfile):
                baton.twirl()
                if line.startswith("commit "):
                    commit = line.split()[1]
                    commit_map[commit] = commitcount
                    do_or_die("git checkout %s 2>/dev/null; mkdir %s/%d" \
                              % (commit, outdir, commitcount))
                    do_or_die("git ls-tree -r --name-only --full-tree %s | cpio -pd --quiet %s/%d"
                              % (commit, outdir, commitcount))
                    commitcount += 1
        finally:
            do_or_die("git reset --hard >/dev/null; git checkout master >/dev/null 2>&1")
        cooked = os.path.join(outdir, "log")
        body_latch = False
        try:
            with open(cooked, "w") as wfp:
                linecount = 0
                for line in open(rawlogfile):
                    linecount += 1
                    if line[0].isspace():
                        if line.startswith(" " * 4):
                            line = line[4:]
                        # Old-school byte stuffing.
                        if line.startswith("."):
                            line = "." + line
                    else:
                        space = line.index(' ')
                        leader = line[:space]
                        follower = line[space:].strip()
                        if leader == "tree":
                            continue
                        if leader == "commit" and linecount > 1:
                            wfp.write(".\n")
                        # FIXME: Check that log raw emits one parent per line
                        if leader in ("commit", "parent"):
                            line = "%s %s\n" % (leader, commit_map[follower])
                            body_latch = False
                        elif leader not in ("author", "committer"):
                            raise Fatal("unexpected log attribute at %s" \
                                        % repr(line))
                    if line == '\n':
                        if not body_latch:
                            body_latch = True
                        else:
                            continue
                    wfp.write(line)
                wfp.write(".\n")
        except (ValueError, IndexError, KeyError):
            raise Fatal("log rewrite failed on %s" % repr(line))
        os.remove(rawlogfile)
if __name__ == '__main__':
    (options, arguments) = getopt.getopt(sys.argv[1:], "ci:m:o:qxv")
    mode = 'auto'
    indir = '.'
    outdir = None
    quiet = False
    maxcommit = 0
    verbose = 0
    for (opt, val) in options:
        if opt == '-x':
            mode = 'unpack'
        elif opt == '-c':
            mode = 'pack'
        elif opt == '-m':
            maxcommit = int(val)
        elif opt == '-i':
            indir = val
        elif opt == '-o':
            outdir = val
        elif opt == '-q':
            quiet = True
        elif opt == '-v':
            verbose += 1
    if not os.path.exists(indir):
        sys.stderr.write("gitpacker: input directory %s must exist.\n" % indir)
        sys.exit(1)
    if mode == 'auto':
        if os.path.exists(os.path.join(indir, ".git")):
            mode = 'unpack'
        else:
            mode = 'pack'
    assert mode == 'pack' or mode == 'unpack'
    if outdir is None:
        if mode == 'pack':
            outdir = indir + "/packed"
        elif mode == 'unpack':
            outdir = indir + "/unpacked"
    if os.path.exists(outdir):
        sys.stderr.write("gitpacker: output directory %s must not exist.\n" % outdir)
        sys.exit(1)
    indir = os.path.abspath(indir)
    outdir = os.path.abspath(outdir)
    if verbose >= DEBUG_PROGRESS:
        sys.stderr.write("gitpacker: %s from %s to %s.\n" % (mode, indir, outdir))
    try:
        try:
            here = os.getcwd()
            if mode == 'pack':
                git_pack(indir, outdir, quiet=quiet)
            elif mode == 'unpack':
                git_unpack(indir, outdir, quiet=quiet)
        finally:
            os.chdir(here)
    except Fatal, e:
        sys.stderr.write(e.msg + "\n")
        sys.exit(1)
    except KeyboardInterrupt:
        pass
# end
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: gitpacker progress report and a question
2012-11-15 21:28 gitpacker progress report and a question Eric S. Raymond
@ 2012-11-15 22:35 ` Max Horn
2012-11-15 23:05 ` Eric S. Raymond
2012-11-16 13:13 ` Andreas Schwab
2012-11-26 20:07 ` Felipe Contreras
2 siblings, 1 reply; 16+ messages in thread
From: Max Horn @ 2012-11-15 22:35 UTC (permalink / raw)
To: esr; +Cc: git
On 15.11.2012, at 22:28, Eric S. Raymond wrote:
> Some days ago I reported that I was attempting to write a tool that could
> (a) take a git repo and unpack it into a tarball sequence plus a metadata log,
> (b) reverse that operation, packing a tarball and log sequence into a repo.
Ah, I could have used such a tool a year or so ago. Sounds useful to me, anyway :)
>
> Thanks in part to advice by Andreas Schwab and in part to looking at the
> text of the p4 import script, this effort has succeeded. A proof of
> concept is enclosed. It isn't documented yet, and has not been tested
> on a repository with branches or merges in the history, but I am confident
> that the distance from here to a finished and tested tool is short.
>
> The immediate intended use is for importing older projects that are
> available only as sequences of release tarballs, but there are other
> sorts of repository surgery that would become easier using it.
>
> I'm still looking for a better name for it and would welcome suggestions.
Isn't "gitar" the kind of natural choice? ;) At least for a stand-alone tool, not for a git subcommand.
Cheers,
Max
>
> Before I do much further work, I need to determine how this will be shipped.
> I see two possibilities: either I ship it as a small standalone project,
> or it becomes a git subcommand shipped with the git suite. How I document
> it and set up its tests would differ between these two cases.
>
> Is there a process for submitting new subcommands? What are the
> test-suite and documentation requirements?
> --
> <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
> <gitpacker.txt>
* Re: gitpacker progress report and a question
2012-11-15 22:35 ` Max Horn
@ 2012-11-15 23:05 ` Eric S. Raymond
0 siblings, 0 replies; 16+ messages in thread
From: Eric S. Raymond @ 2012-11-15 23:05 UTC (permalink / raw)
To: Max Horn; +Cc: git
Max Horn <postbox@quendi.de>:
> > I'm still looking for a better name for it and would welcome suggestions.
>
> Isn't "gitar" the kind of natural choice? ;) At least for a stand-alone tool, not for a git subcommand.
I just renamed it git-weave. I keep talking about tarballs because I keep
thinking about using it archeologically on projects that only exist as
tarball sequences, but the tool actually packs and unpacks *file tree*
sequences.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
* Re: gitpacker progress report and a question
2012-11-15 21:28 gitpacker progress report and a question Eric S. Raymond
2012-11-15 22:35 ` Max Horn
@ 2012-11-16 13:13 ` Andreas Schwab
2012-11-26 20:07 ` Felipe Contreras
2 siblings, 0 replies; 16+ messages in thread
From: Andreas Schwab @ 2012-11-16 13:13 UTC (permalink / raw)
To: esr; +Cc: git
"Eric S. Raymond" <esr@thyrsus.com> writes:
> if commitcount > 1:
> do_or_die("rm `git ls-tree --name-only HEAD`")
This will fail on file names containing whitespace or glob meta
characters. Better use "git rm -qr ." here. You don't have to care
about the index since you are doing "git add -A" later anyway.
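For illustration only, here is a minimal Python sketch that sidesteps the shell entirely: it empties the working tree directly instead of running `git rm -qr .` as suggested above. `clear_worktree` is a hypothetical helper, not part of gitpacker, and it assumes `.git` is the only entry that must survive; the index can still be ignored because `git add -A` runs afterwards.

```python
import os
import shutil


def clear_worktree(worktree: str) -> None:
    """Delete everything under worktree except the .git directory.

    Names come straight from os.listdir(), so no shell ever sees them:
    whitespace and glob metacharacters in file names are harmless.
    """
    for name in os.listdir(worktree):
        if name == ".git":
            continue  # keep the repository itself
        path = os.path.join(worktree, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # sub-directories are removed recursively
        else:
            os.remove(path)  # regular files and symlinks
```

Unlike the `rm` on `git ls-tree` output, this also removes sub-directories, which plain `rm` without `-r` refuses to do.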
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: gitpacker progress report and a question
2012-11-15 21:28 gitpacker progress report and a question Eric S. Raymond
2012-11-15 22:35 ` Max Horn
2012-11-16 13:13 ` Andreas Schwab
@ 2012-11-26 20:07 ` Felipe Contreras
2012-11-26 22:01 ` Eric S. Raymond
2 siblings, 1 reply; 16+ messages in thread
From: Felipe Contreras @ 2012-11-26 20:07 UTC (permalink / raw)
To: esr; +Cc: git
On Thu, Nov 15, 2012 at 10:28 PM, Eric S. Raymond <esr@thyrsus.com> wrote:
> Some days ago I reported that I was attempting to write a tool that could
> (a) take a git repo and unpack it into a tarball sequence plus a metadata log,
> (b) reverse that operation, packing a tarball and log sequence into a repo.
>
> Thanks in part to advice by Andreas Schwab and in part to looking at the
> text of the p4 import script, this effort has succeeded. A proof of
> concept is enclosed. It isn't documented yet, and has not been tested
> on a repository with branches or merges in the history, but I am confident
> that the distance from here to a finished and tested tool is short.
>
> The immediate intended use is for importing older projects that are
> available only as sequences of release tarballs, but there are other
> sorts of repository surgery that would become easier using it.
>
> I'm still looking for a better name for it and would welcome suggestions.
>
> Before I do much further work, I need to determine how this will be shipped.
> I see two possibilities: either I ship it as a small standalone project,
> or it becomes a git subcommand shipped with the git suite. How I document
> it and set up its tests would differ between these two cases.
Please look at Documentation/SubmittingPatches, you should send
patches in inline format, preferably with 'git format-patch -M', and
preferably with 'git send-email' (in which case you don't need
format-patch), otherwise people will have trouble reviewing, or miss
it completely (as was the case for me).
I have many comments, but I'll wait until you send the patch inlined;
for now I'll just address these:
1) I tried it, and it doesn't seem to import (pack?) a repository
with sub-directories in it
2) Using 'git fast-import' is probably simpler, and more efficient
Here is a proof of concept I wrote in ruby that is half the size, and
seems to implement the same functionality. The format is exactly the
same, but I think it should be modified to be more efficient.
Cheers.
>From eb3c34699d7f5d4eec4f088344659b8d9b6a07ea Mon Sep 17 00:00:00 2001
From: Felipe Contreras <felipe.contreras@gmail.com>
Date: Mon, 26 Nov 2012 20:48:38 +0100
Subject: [PATCH] Add new git-weave tool
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
---
contrib/weave/git-weave | 166 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 166 insertions(+)
create mode 100755 contrib/weave/git-weave
diff --git a/contrib/weave/git-weave b/contrib/weave/git-weave
new file mode 100755
index 0000000..3106121
--- /dev/null
+++ b/contrib/weave/git-weave
@@ -0,0 +1,166 @@
+#!/usr/bin/env ruby
+
+require 'optparse'
+require 'find'
+require 'fileutils'
+
+def export(indir = '.', out = STDOUT)
+  open(File.join(indir, 'log')).each("\n.\n") do |data|
+
+    @msg = nil
+    @parents = []
+
+    data.chomp(".\n").each_line do |l|
+      if not @msg
+        case l
+        when /^commit (.+)$/
+          @id = $1
+        when /^author (.+)$/
+          @author = $1
+        when /^committer (.+)$/
+          @committer = $1
+        when /^parent (.+)$/
+          @parents << $1
+        when /^$/
+          @msg = ""
+        end
+      else
+        @msg << l
+      end
+    end
+
+    out.puts "commit refs/heads/master"
+    out.puts "mark :#{@id}"
+    out.puts "author #{@author}"
+    out.puts "committer #{@committer}"
+    out.puts "data #{@msg.bytesize}"
+    out.puts @msg
+
+    @parents.each_with_index do |p, i|
+      if i == 0
+        out.puts "from :%u" % p
+      else
+        out.puts "merge :%u" % p
+      end
+    end
+
+    # files
+    out.puts 'deleteall'
+    FileUtils.cd(File.join(indir, @id)) do
+      Find.find('.') do |e|
+        next unless File.file?(e)
+        content = File.read(e)
+        filename = e.split(File::SEPARATOR).slice(1..-1).join(File::SEPARATOR)
+        mode = File.executable?(e) ? '100755' : '100644'
+        if File.symlink?(e)
+          mode = '120000'
+          content = File.readlink(e)
+        end
+        out.puts 'M %s inline %s' % [mode, filename]
+        out.puts "data #{content.bytesize}"
+        out.puts content
+      end
+    end
+
+  end
+end
+
+def import(outdir, out)
+  format = 'format:commit %H%nauthor %an <%ae> %ad%ncommitter %cn <%ce> %cd%nparents %P%n%n%B'
+  cmd = ['git', 'log', '-z', '-s', '--date=raw', '--format=%s' % format, '--all', '--reverse']
+  commits = {}
+
+  IO.popen(cmd).each_with_index("\0") do |data, i|
+    @msg = nil
+    @parents = []
+    data.chomp("\0").each_line do |l|
+      if not @msg
+        case l
+        when /^commit (.+)$/
+          @id = $1
+        when /^author (.+)$/
+          @author = $1
+        when /^committer (.+)$/
+          @committer = $1
+        when /^parents (.+)$/
+          @parents = $1.split(" ")
+        when /^$/
+          @msg = ""
+        end
+      else
+        @msg << l
+      end
+    end
+
+    num = i + 1
+    commits[@id] = num
+
+    out.puts "commit #{num}"
+    @parents.each do |p|
+      out.puts "parent #{commits[p]}"
+    end
+    out.puts "author #{@author}"
+    out.puts "committer #{@committer}"
+    out.puts
+    out.puts @msg.gsub(/\n\n+/, "\n") # why?
+    out.puts "."
+
+    wd = File.join(outdir, num.to_s)
+    FileUtils.mkdir_p(wd)
+    system('git', '--work-tree', wd, 'checkout', '-f', '-q', @id)
+  end
+end
+
+def git_pack(indir, outdir)
+  indir = File.absolute_path(indir)
+  system('git', 'init', outdir)
+  FileUtils.cd(outdir) do
+    IO.popen(['git', 'fast-import'], 'w') do |io|
+      export(indir, io)
+    end
+    system('git', 'reset', '--hard')
+  end
+end
+
+def git_unpack(indir, outdir)
+  begin
+    FileUtils.mkdir_p(outdir)
+    log = File.open(File.join(outdir, 'log'), 'w')
+    ENV['GIT_DIR'] = File.join(indir, '.git')
+    import(outdir, log)
+  ensure
+    system('git', 'symbolic-ref', 'HEAD', 'refs/heads/master')
+    ENV.delete('GIT_DIR')
+    log.close if log
+  end
+end
+
+$indir = '.'
+
+begin
+  OptionParser.new do |opts|
+    opts.on('-x') do
+      $mode = 'unpack'
+    end
+    opts.on('-c') do
+      $mode = 'pack'
+    end
+    opts.on('-o', '--outdir DIR') do |v|
+      $outdir = v
+    end
+    opts.on('-i', '--indir DIR') do |v|
+      $indir = v
+    end
+  end.parse!
+rescue OptionParser::InvalidOption
+end
+
+$mode = File.exists?(File.join($indir, '.git')) ? 'unpack' : 'pack' unless $mode
+$outdir = File.join($indir, $mode == 'pack' ? 'packed' : 'unpacked2') unless $outdir
+
+case $mode
+when 'pack'
+  git_pack($indir, $outdir)
+when 'unpack'
+  git_unpack($indir, $outdir)
+end
--
1.8.0
--
Felipe Contreras
* Re: gitpacker progress report and a question
2012-11-26 20:07 ` Felipe Contreras
@ 2012-11-26 22:01 ` Eric S. Raymond
2012-11-26 23:14 ` Felipe Contreras
0 siblings, 1 reply; 16+ messages in thread
From: Eric S. Raymond @ 2012-11-26 22:01 UTC (permalink / raw)
To: Felipe Contreras; +Cc: git
Felipe Contreras <felipe.contreras@gmail.com>:
> 1) I tried it, and it doesn't seem to import (pack?) a repository
> with sub-directories in it
I'll make sure my regression test checks this case. The options to git
ls-files are a bit confusing and it's possible my invocation of it
needs to change.
> 2) Using 'git fast-import' is probably simpler, and more efficient
That might well be. I'm not worried about "efficiency" in this context
but reducing the code size is significant and I'm willing to re-code
to do that.
> Here is a proof of concept I wrote in ruby that is half the size, and
> seems to implement the same functionality.
Not anywhere near the same. It only handles commits, not tags. It
doesn't issue delete ops. And it doesn't rebuild branch heads.
If I were willing to omit those features, I'm sure I could halve
the size of my implementation, too. Of course, it would then be
almost completely useless...
> The format is exactly the
> same, but I think it should be modified to be more efficient.
I'm not wedded to the log format as it is, so I'll cheerfully
take suggestions about it.
Be aware, however, that I consider easy editability by human beings
much more important than squeezing the last microsecond out of the
processing time. So, for example, I won't use data byte counts rather
than end delimiters, the way import streams do.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
* Re: gitpacker progress report and a question
2012-11-26 22:01 ` Eric S. Raymond
@ 2012-11-26 23:14 ` Felipe Contreras
2012-11-26 23:43 ` Eric S. Raymond
0 siblings, 1 reply; 16+ messages in thread
From: Felipe Contreras @ 2012-11-26 23:14 UTC (permalink / raw)
To: esr; +Cc: git
On Mon, Nov 26, 2012 at 11:01 PM, Eric S. Raymond <esr@thyrsus.com> wrote:
> Felipe Contreras <felipe.contreras@gmail.com>:
>> 1) I tried it, and it doesn't seem to import (pack?) a repository
>> with sub-directories in it
>
> I'll make sure my regression test checks this case. The options to git
> ls-files are a bit confusing and it's possible my invocation of it
> needs to change.
Might be easier to just call 'git ls-files --with-tree foo', but I
don't see the point of those calls:
% git --work-tree=unpacked/1 checkout master
% git --work-tree=unpacked/1 add -A
Should work just fine.
>> 2) Using 'git fast-import' is probably simpler, and more efficient
>
> That might well be. I'm not worried about "efficiency" in this context
> but reducing the code size is significant and I'm willing to re-code
> to do that.
I don't see how the code-size would increase dramatically.
>> Here is a proof of concept I wrote in ruby that is half the size, and
>> seems to implement the same functionality.
>
> Not anywhere near the same. It only handles commits, not tags.
The attached code doesn't handle tags either.
> It doesn't issue delete ops.
What do you mean?
out.puts 'deleteall' <- All current files are removed
And then added.
> And it doesn't rebuild branch heads.
What do you mean? Your code only exports a single branch, the branch
that is currently checked out. And then:
git reset --hard >/dev/null; git checkout master >/dev/null 2>&1
It's resuming to 'master', which might not be the branch the user had
checked out, and might not even exist.
> If I were willing to omit those features, I'm sure I could halve
> the size of my implementation, too. Of course, it would then be
> almost completely useless...
That's what the code currently does.
Do you want me to show you step by step how they do *exactly the
same*? Of course, I would need to fix your version first so that it
doesn't crash with sub-directories.
>> The format is exactly the
>> same, but I think it should be modified to be more efficient.
>
> I'm not wedded to the log format as it is, so I'll cheerfully
> take suggestions about it.
>
> Be aware, however, that I consider easy editability by human beings
> much more important than squeezing the last microsecond out of the
> processing time. So, for example, I won't use data byte counts rather
> than end delimiters, the way import streams do.
Well, if there's a line with a single dot in the commit message ('.'),
things would go very bad.
Personally I would prefer something like this:
tag v0.1 gst-av-0.1.tar "Release 0.1"
tag v0.2 gst-av-0.2.tar "Release 0.2"
tag v0.3 gst-av-0.3.tar "Release 0.3"
And the script in bash would be very simple:
#!/bin/sh

tag() {
  d=`mktemp -d` &&
  (
    cd $d &&
    tar -xf "$orig/$2" &&
    cd * &&
    git add --all &&
    git commit -q -m "$3" &&
    git tag $1) || error=1
  rm -rf $d
  test -n "$error" && exit 1
}

orig="$PWD"
repo="$1"
git init -q $repo
export GIT_DIR="$orig/$repo/.git"
. "$orig/$2"
cd "$orig/$repo" && git reset -q --hard
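If the release list were read by something other than the shell itself, a minimal Python sketch might parse it with `shlex`, which applies shell-like quoting so each quoted message stays a single word. `parse_releases` and `Release` are hypothetical names, and the sketch assumes every non-blank line has exactly the form `tag NAME TARBALL "MESSAGE"`.

```python
import shlex
from typing import List, NamedTuple


class Release(NamedTuple):
    tag: str
    tarball: str
    message: str


def parse_releases(text: str) -> List[Release]:
    """Parse lines of the form: tag NAME TARBALL "MESSAGE"."""
    releases = []
    for lineno, line in enumerate(text.splitlines(), 1):
        # POSIX-style splitting: the quoted message becomes one word,
        # and '#' starts a comment, much as it would in a shell script.
        words = shlex.split(line, comments=True)
        if not words:
            continue  # blank or comment-only line
        if words[0] != "tag" or len(words) != 4:
            raise ValueError("line %d: expected 'tag NAME TARBALL MESSAGE'" % lineno)
        releases.append(Release(*words[1:]))
    return releases
```

For example, `parse_releases('tag v0.1 gst-av-0.1.tar "Release 0.1"')` yields one `Release` whose `message` is `Release 0.1`.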
--
Felipe Contreras
* Re: gitpacker progress report and a question
2012-11-26 23:14 ` Felipe Contreras
@ 2012-11-26 23:43 ` Eric S. Raymond
2012-11-27 1:29 ` Felipe Contreras
2012-11-27 6:29 ` Felipe Contreras
0 siblings, 2 replies; 16+ messages in thread
From: Eric S. Raymond @ 2012-11-26 23:43 UTC (permalink / raw)
To: Felipe Contreras; +Cc: git
Felipe Contreras <felipe.contreras@gmail.com>:
> Might be easier to just call 'git ls-files --with-tree foo', but I
> don't see the point of those calls:
Ah, much is now explained. You were looking at an old version. I had
in fact already fixed the subdirectories bug (I've updated my
regression test to check) and have full support for branchy repos,
preserving tags and branch heads.
> > It doesn't issue delete ops.
>
> What do you mean?
>
> out.puts 'deleteall' <- All current files are removed
Yours emits no D ops for files removed after a particular snapshot.
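To make the difference concrete: without `deleteall`, a fast-import frontend has to compare each snapshot with the previous one and emit a `D` (filedelete) command for every path that disappeared. A minimal Python sketch of that comparison, with `delete_ops` as a hypothetical helper and assuming paths that need no fast-import quoting (no newline, no leading double quote):

```python
from typing import Iterable, List


def delete_ops(previous: Iterable[str], current: Iterable[str]) -> List[str]:
    """Return fast-import filedelete commands ("D <path>") for paths
    present in the previous snapshot but missing from the current one.

    Assumes the paths need no C-style quoting.
    """
    gone = set(previous) - set(current)
    return ["D %s" % path for path in sorted(gone)]
```

With `previous = ["a", "b/c", "d"]` and `current = ["a", "d", "e"]`, only `D b/c` is emitted; the new file `e` would be added by an ordinary `M` command.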
> > Be aware, however, that I consider easy editability by human beings
> > much more important than squeezing the last microsecond out of the
> > processing time. So, for example, I won't use data byte counts rather
> > than end delimiters, the way import streams do.
>
> Well, if there's a line with a single dot in the commit message ('.'),
> things would go very bad.
Apparently you missed the part where I byte-stuffed the message content.
It's a technique used in a lot of old-school Internet protocols, notably
in SMTP.
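The byte-stuffing scheme described here (the one `git_unpack` and `git_pack` apply line by line) can be sketched as a pair of Python functions. `stuff` and `unstuff` are illustrative names rather than gitpacker functions, and the sketch assumes the log's convention that a line containing only `.` ends a message.

```python
from typing import Iterable


def stuff(message: str) -> str:
    """Byte-stuff a message body and append the "." terminator line.

    Any line beginning with "." gets an extra "." in front, so no body
    line can be mistaken for the terminator (as in SMTP's DATA phase).
    """
    lines = message.splitlines(keepends=True)
    out = ["." + line if line.startswith(".") else line for line in lines]
    if out and not out[-1].endswith("\n"):
        out[-1] += "\n"  # the terminator must start on its own line
    return "".join(out) + ".\n"


def unstuff(lines: Iterable[str]) -> str:
    """Read stuffed lines up to the terminator; return the original body."""
    body = []
    for line in lines:
        if line == ".\n":
            break  # end of this message
        if line.startswith("."):
            line = line[1:]  # undo the stuffing
        body.append(line)
    return "".join(body)
```

A body line consisting of a single `.` is written as `..`, so it can no longer end the record early.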
> Personally I would prefer something like this:
There's a certain elegance to that, but it would be hard to generate by hand.
Remember that a major use case for this tool is making repositories
from projects whose back history exists only as tarballs. So, let's
say you have the following:
foo-1.1.tar.gz
foo-1.2.tar.gz
foo-1.3.tar.gz
What you're going to do before weaving is drop the untarred file trees
in a 'foo' scratch directory, then hand-craft a log file that might
look a bit like this:
-----------------------------------
commit 1
directory foo-1.1

Release 1.1 of project foo
.
commit 2
directory foo-1.2

..This is an example of a byte-stuffed line.
Release 1.2 of project foo
.
commit 3
directory foo-1.3

Release 1.3 of project foo
.
-----------------------------------
The main objective of the logfile design is to make hand-crafting
these easy.
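How little is needed to read such a hand-crafted file can be seen in a minimal Python sketch. It is not gitpacker's own parser; it assumes each record is a set of `key value` header lines, a blank separator line (which gitpacker's `git_pack` looks for before the message), a byte-stuffed message, and a lone `.` terminator. `parse_log` is a hypothetical name, and `directory` is treated as just another header key.

```python
from typing import Any, Dict, Iterable, List


def parse_log(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse a gitpacker-style log into a list of record dictionaries.

    Repeatable "parent" headers are collected into a list; the message
    body is stored under "message" with byte-stuffing undone.
    """
    records: List[Dict[str, Any]] = []
    record: Dict[str, Any] = {}
    body: List[str] = []
    in_body = False
    for line in lines:
        if not in_body:
            if line.strip() == "":
                in_body = True  # a blank line ends the header block
                continue
            key, _, value = line.rstrip("\n").partition(" ")
            if key == "parent":
                record.setdefault("parent", []).append(value)
            else:
                record[key] = value
        elif line.rstrip("\n") == ".":
            record["message"] = "".join(body)  # terminator: close record
            records.append(record)
            record, body, in_body = {}, [], False
        else:
            # Undo byte-stuffing: a leading "." was doubled when written.
            body.append(line[1:] if line.startswith(".") else line)
    return records
```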
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
* Re: gitpacker progress report and a question
2012-11-26 23:43 ` Eric S. Raymond
@ 2012-11-27 1:29 ` Felipe Contreras
2012-11-27 1:38 ` Felipe Contreras
2012-11-27 6:29 ` Felipe Contreras
1 sibling, 1 reply; 16+ messages in thread
From: Felipe Contreras @ 2012-11-27 1:29 UTC (permalink / raw)
To: esr; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 5609 bytes --]
On Tue, Nov 27, 2012 at 12:43 AM, Eric S. Raymond <esr@thyrsus.com> wrote:
> Felipe Contreras <felipe.contreras@gmail.com>:
>> Might be easier to just call 'git ls-files --with-tree foo', but I
>> don't see the point of those calls:
>
> Ah, much is now explained. You were looking at an old version. I had
> in fact already fixed the subdirectories bug (I've updated my
> regression test to check) and have full support for branchy repos,
> preserving tags and branch heads.
So you are criticizing my code saying "it would then be almost
completely useless...", when this is in fact what you sent to the
list.
For the record, here is the output of a test with your script vs.
mine: the output is *exactly the same*:
---
== log ==
* afcbedc (tag: v0.2, master) bump
| * cbd2dce (devel) dev
|/
* 46f1813 (HEAD, test) remove
* df95e41 dot .
* ede0876 with
* d6f10fc extra
* e6362b1 (tag: v0.1) one
== files ==
file
== spaces ==
with
spaces
== dot ==
dot
.
== orig ref ==
refs/heads/test
== script ==
bc9a7d99132f97adeb5d2ca266bd3d8bc64ccb21 /home/felipec/Downloads/gitpacker.txt
Unpacking......(0.13 sec) done.
Packing......(0.28 sec) done.
== log ==
* 5d0b634 (HEAD, master) bump
* 2fe4a6d remove
* 0c27d3b dot .
* 5e36d3f with spaces
* d6f10fc extra
* e6362b1 one
== files ==
file
== spaces ==
with
spaces
== dot ==
dot
.
== orig ref ==
refs/heads/master
== script ==
33edcb28667b683fbb5f8782383f782f73c5e9e1 /home/felipec/bin/git-weave
== log ==
* afcbedc (HEAD, master) bump
* 46f1813 remove
* df95e41 dot .
* ede0876 with
* d6f10fc extra
* e6362b1 one
== files ==
file
== spaces ==
with
spaces
== dot ==
dot
.
== orig ref ==
refs/heads/test
---
Unfortunately, when I enable some testing stuff, this is what your
script throws:
---
== script ==
bc9a7d99132f97adeb5d2ca266bd3d8bc64ccb21 /home/felipec/Downloads/gitpacker.txt
Unpacking......(0.17 sec) done.
Packing......(0.02 sec) done.
Traceback (most recent call last):
File "/home/felipec/Downloads/gitpacker.txt", line 308, in <module>
git_pack(indir, outdir, quiet=quiet)
File "/home/felipec/Downloads/gitpacker.txt", line 171, in git_pack
command += " ".join(map(lambda p: "-p " + commit_id[int(p)],parents))
File "/home/felipec/Downloads/gitpacker.txt", line 171, in <lambda>
command += " ".join(map(lambda p: "-p " + commit_id[int(p)],parents))
IndexError: list index out of range
== log ==
fatal: bad default revision 'HEAD'
== files ==
fatal: tree-ish master not found.
== spaces ==
fatal: ambiguous argument ':/with': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
== dot ==
fatal: ambiguous argument ':/dot': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
== orig ref ==
refs/heads/master
---
I'm attaching it in case you are interested.
Anyway, I can add support for branches and tags in no time, but I
wonder what's the point. Who will take so much time and effort to
generate all the branches and tags, and the log file?
If the goal is as you say "importing older projects that are available
only as sequences of release tarballs", then that code is overkill,
and it's not even making it easier to import the tarballs.
For that case my proposed format:
tag v0.1 gst-av-0.1.tar "Release 0.1"
tag v0.2 gst-av-0.2.tar "Release 0.2"
tag v0.3 gst-av-0.3.tar "Release 0.3"
Would be much more suitable.
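For illustration only, here is how the tarball-extraction step this format relies on could be sketched in Python, the language of gitpacker itself, using the standard tarfile module. The function name extract_release is hypothetical. The sketch assumes every member of a release tarball sits under one top-level directory such as gst-av-0.1/, and it strips that directory much as `tar --strip-components=1` would.

```python
import tarfile

def extract_release(tarball, dest):
    """Unpack tarball into dest, dropping the first path component of every
    member (like `tar --strip-components=1`).

    Assumes all members live under one top-level directory and have no
    leading "./"; hard-link targets inside the archive are not rewritten.
    """
    # Use the safe "data" extraction filter where this Python provides it.
    opts = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tarball) as tar:          # compression is auto-detected
        for member in tar.getmembers():
            _, sep, rest = member.name.partition("/")
            if not sep or not rest:
                continue                        # the top-level entry itself
            member.name = rest
            tar.extract(member, dest, **opts)
```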
>> > It doesn't issue delete ops.
>>
>> What do you mean?
>>
>> out.puts 'deleteall' <- All current files are removed
>
> Yours emits no D ops for files removed after a particular snapshot.
man git fast-import
---
This command is extremely useful if the frontend does not know (or
does not care to know) what files are currently on the branch, and
therefore cannot generate the proper filedelete commands to update the
content.
---
Why, again, would I want to emit D operations? deleteall takes care of that.
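A minimal Python sketch of the deleteall approach, assuming one snapshot directory per commit. It emits a fast-import commit that clears the tree and re-adds every file inline, so removed files need no explicit D commands. The function name and its parameters are hypothetical. Symlinks are followed rather than stored as mode 120000, and paths are written unquoted, so unusual file names would need fast-import's C-style quoting.

```python
import os

def snapshot_commit(ref, mark, committer, when, message, tree_dir, parent=None):
    """Return a fast-import 'commit' command whose tree is exactly tree_dir.

    committer is "Name <email>", when is a raw git date such as
    "1112911993 -0700", parent is the mark of the previous commit (or None).
    """
    msg = message.encode("utf-8")
    out = [b"commit " + ref.encode() + b"\n",
           b"mark :%d\n" % mark,
           b"committer %s %s\n" % (committer.encode(), when.encode()),
           b"data %d\n" % len(msg), msg, b"\n"]
    if parent is not None:
        out.append(b"from :%d\n" % parent)
    # deleteall clears the inherited tree, so any file absent from tree_dir
    # is dropped without a separate "D <path>" command.
    out.append(b"deleteall\n")
    for root, dirs, files in os.walk(tree_dir):
        dirs.sort()                               # deterministic output order
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, tree_dir).replace(os.sep, "/")
            with open(path, "rb") as f:
                data = f.read()
            mode = b"100755" if os.access(path, os.X_OK) else b"100644"
            out += [b"M %s inline %s\n" % (mode, rel.encode()),
                    b"data %d\n" % len(data), data, b"\n"]
    out.append(b"\n")
    return b"".join(out)
```

The returned bytes could be written to the stdin of `git fast-import`, one call per snapshot.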
>> > Be aware, however, that I consider easy editability by human beings
>> > much more important than squeezing the last microsecond out of the
>> > processing time. So, for example, I won't use data byte counts rather
>> > than end delimiters, the way import streams do.
>>
>> Well, if there's a line with a single dot in the commit message ('.'),
>> things would go very bad.
>
> Apparently you missed the part where I byte-stuffed the message content.
> It's a technique used in a lot of old-school Internet protocols, notably
> in SMTP.
You might have done that, but the user who generated the log file
might not have.
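For readers unfamiliar with byte-stuffing, here is a sketch of the SMTP-style rule (RFC 5321, section 4.5.2), assuming gitpacker's log follows it as the `..This is an example of a byte-stuffed line.` example quoted later in the thread suggests: a line that starts with a dot gets one extra dot, and a lone dot ends the message. The function names are hypothetical. Note that unstuff() silently drops the leading dot of any hand-written line that was never stuffed, which is the hazard raised above.

```python
def stuff(message):
    """Dot-stuff message and append the lone '.' that terminates it."""
    lines = message.split("\n")
    # Any line starting with '.' gets one extra '.', so no message line
    # can ever be mistaken for the terminator.
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    return "\n".join(stuffed) + "\n.\n"

def unstuff(lines):
    """Consume lines up to the lone '.', removing one leading '.' from each."""
    body = []
    for line in lines:
        if line == ".":
            return "\n".join(body)
        body.append(line[1:] if line.startswith(".") else line)
    raise ValueError("message is not terminated by a lone '.'")
```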
>> Personally I would prefer something like this:
>
> There's a certain elegance to that, but it would be hard to generate by hand.
You think this is hard to generate by hand:
---
tag v0.1 gst-av-0.1.tar "Release 0.1"
tag v0.2 gst-av-0.2.tar "Release 0.2"
tag v0.3 gst-av-0.3.tar "Release 0.3"
---
Than this?
---
commit 1
directory gst-av-0.1
Release 0.1
.
commit 2
directory gst-av-0.2
Release 0.2
.
commit 3
directory gst-av-0.3
Release 0.3
.
---
After, of course, extracting the tarballs, which my script already does
automatically.
> Remember that a major use case for this tool is making repositories
> from projects whose back history exists only as tarballs.
Which is exactly what my script does, except even easier, because it
extracts the tarballs automatically.
> The main objective of the logfile design is to make hand-crafting
> these easy.
What does the above log file achieve, that my log file doesn't?
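For comparison, a hypothetical Python reader for the record-based log: header lines such as `commit 1` and `directory foo-1.1`, a blank line, then the message up to a lone dot. It assumes the blank-line layout of Eric's own example elsewhere in the thread and the same dot-unstuffing rule; any header keys beyond these two are not known from this thread and are simply collected as strings.

```python
def parse_commit_log(lines):
    """Yield (headers, message) for each record of a commit/directory log."""
    it = iter(lines)
    for line in it:
        if not line.strip():
            continue                        # blank lines between records
        headers = {}
        while line.strip():                 # header block ends at a blank line
            key, _, value = line.partition(" ")
            headers[key] = value
            line = next(it, "")             # "" also ends the block at EOF
        body = []
        for line in it:
            if line == ".":
                break                       # a lone dot terminates the message
            body.append(line[1:] if line.startswith(".") else line)
        else:
            raise ValueError("unterminated message in commit %s"
                             % headers.get("commit", "?"))
        yield headers, "\n".join(body)
```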
--
Felipe Contreras
[-- Attachment #2: test-gitpacker --]
[-- Type: application/octet-stream, Size: 2534 bytes --]
#!/bin/sh
rm -rf test test-unpacked* test-new*
test_date=1
test_subdir=1
test_tick () {
if test -z "${test_tick+set}"
then
test_tick=1112911993
else
test "$test_date" -eq 1 || \
test_tick=$(($test_tick + 60))
fi
GIT_COMMITTER_DATE="$test_tick -0700"
GIT_AUTHOR_DATE="$test_tick -0700"
export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
}
(
git init -q test
cd test
echo one > file
git add file
test_tick
git commit -q -m one
git tag v0.1
echo extra > extra
git add extra
test_tick
git commit -q -m extra
echo spaces >> file
test_tick
git commit -q -a -m "$(printf 'with\n\nspaces')"
echo dot >> file
test_tick
git commit -q -a -m "$(printf 'dot\n.\n')"
if test "$test_subdir" -eq 1
then
mkdir subdir
echo subdir > subdir/file
git add subdir/file
test_tick
git commit -q -m dir
echo subdir2 >> file
test_tick
git commit -q -a -m subdir2
fi
git rm -q extra
test_tick
git commit -q -m remove
git checkout -q -b devel
echo dev >> file
test_tick
git commit -q -a -m dev
git checkout -q master
echo bump >> file
test_tick
git commit -q -a -m bump
git tag v0.2
git checkout -q -b test master^
echo "== log =="
git log --oneline --graph --decorate --all
echo "== files =="
git ls-files --with-tree master
echo "== spaces =="
git show --quiet --format='%B' :/with
echo "== dot =="
git show --quiet --format='%B' :/dot
)
echo "== orig ref =="
git --git-dir=test/.git symbolic-ref HEAD
git --git-dir=test/.git symbolic-ref HEAD refs/heads/test
script="/home/felipec/Downloads/gitpacker.txt"
echo
echo "== script =="
sha1sum $script
$PYTHON_PATH $script -x -i test -o test-unpacked-1
$PYTHON_PATH $script -c -i test-unpacked-1 -o test-new-1
(
cd test-new-1
echo "== log =="
git log --oneline --graph --decorate --all
echo "== files =="
git ls-files --with-tree master
echo "== spaces =="
git show --quiet --format='%B' :/with
echo "== dot =="
git show --quiet --format='%B' :/dot
)
echo "== orig ref =="
git --git-dir=test/.git symbolic-ref HEAD
git --git-dir=test/.git symbolic-ref HEAD refs/heads/test
script="$HOME/bin/git-weave"
echo
echo "== script =="
sha1sum $script
$script -x -i test -o test-unpacked-2
$script -c -i test-unpacked-2 -o test-new-2
(
cd test-new-2
echo "== log =="
git log --oneline --graph --decorate --all
echo "== files =="
git ls-files --with-tree master
echo "== spaces =="
git show --quiet --format='%B' :/with
echo "== dot =="
git show --quiet --format='%B' :/dot
)
echo "== orig ref =="
git --git-dir=test/.git symbolic-ref HEAD
git --git-dir=test/.git symbolic-ref HEAD refs/heads/test
* Re: gitpacker progress report and a question
2012-11-27 1:29 ` Felipe Contreras
@ 2012-11-27 1:38 ` Felipe Contreras
0 siblings, 0 replies; 16+ messages in thread
From: Felipe Contreras @ 2012-11-27 1:38 UTC
To: esr; +Cc: git
On Tue, Nov 27, 2012 at 2:29 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
Actually no, they are not exactly the same, your version has a bug
when dealing with spaces in a commit message (which pretty much all
proper multi-line commit messages have).
> == spaces ==
> with
> spaces
>
Mine doesn't:
> == spaces ==
> with
>
> spaces
>
--
Felipe Contreras
* Re: gitpacker progress report and a question
2012-11-26 23:43 ` Eric S. Raymond
2012-11-27 1:29 ` Felipe Contreras
@ 2012-11-27 6:29 ` Felipe Contreras
2012-11-27 7:27 ` Eric S. Raymond
2012-11-27 7:30 ` Eric S. Raymond
1 sibling, 2 replies; 16+ messages in thread
From: Felipe Contreras @ 2012-11-27 6:29 UTC
To: esr; +Cc: git
On Tue, Nov 27, 2012 at 12:43 AM, Eric S. Raymond <esr@thyrsus.com> wrote:
> -----------------------------------
> commit 1
> directory foo-1.1
>
> Release 1.1 of project foo
> .
> commit 2
> directory foo-1.2
>
> ..This is an example of a byte-stuffed line.
>
> Release 1.2 of project foo
> .
> commit 3
> directory foo-1.3
>
> Release 1.3 of project foo
> .
> -----------------------------------
>
> The main objective of the logfile design is to make hand-crafting
> these easy.
Here's another version with YAML:
---
-
author: &me Felipe Contreras <felipe.contreras@gmail.com>
date: 2011-1-1
msg: one
- tag v0.1
-
author: *me
date: 2011-1-2
msg: extra
-
author: *me
date: 2011-1-3
msg: |
with
spaces
-
author: *me
date: 2011-1-4
msg: |
dot
.
-
author: *me
date: 2011-1-5
msg: remove
ref: remove
- checkout devel
-
author: *me
date: 2011-1-6
msg: dev
- checkout master
-
author: *me
date: 2011-1-7
msg: bump
- tag v0.2
- checkout test remove
---
I believe that log file is much more human readable. Yet I still fail
to see why anybody would want so much detail only to import tarballs.
diff --git a/contrib/weave/git-weave b/contrib/weave/git-weave
new file mode 100755
index 0000000..646aeaa
--- /dev/null
+++ b/contrib/weave/git-weave
@@ -0,0 +1,234 @@
+#!/usr/bin/env ruby
+
+require 'optparse'
+require 'find'
+require 'fileutils'
+require 'yaml'
+
+$last = nil
+$branches = {}
+$branch = 'master'
+$refs = {}
+
+class Commit
+
+ attr_reader :id, :parents, :author, :committer, :date, :msg, :ref
+
+ @@num = 0
+
+ def initialize(args)
+ @id = @@num += 1
+ @parents = []
+ args.each do |key, value|
+ instance_variable_set("@#{key}", value)
+ end
+ if @author =~ /(.+ <.+>) (.+)/
+ @author = $1
+ end
+ if @committer =~ /(.+ <.+>) (.+)/
+ @committer = $1
+ @date = DateTime.strptime($2, '%s %z')
+ end
+ $refs[@ref] = @id if @ref
+ end
+
+end
+
+def export_commit(cmd, indir, out)
+
+ c = Commit.new(cmd)
+ $last = c.id
+
+ # commit
+ out.puts 'commit refs/heads/%s' % $branch
+ out.puts 'mark :%u' % c.id
+ if c.author and c.committer
+ out.puts 'author %s %s' % [c.author, c.date.strftime('%s %z')]
+ out.puts 'committer %s %s' % [c.committer, c.date.strftime('%s %z')]
+ else
+ out.puts 'committer %s %s' % [c.author, c.date.strftime('%s %z')]
+ end
+ out.puts 'data %u' % c.msg.bytesize
+ out.puts c.msg
+
+ # parents
+ c.parents.each_with_index do |p, i|
+ ref = $refs[p]
+ if i == 0
+ out.puts 'from :%u' % ref
+ else
+ out.puts 'merge :%u' % ref
+ end
+ end
+
+ # files
+ out.puts 'deleteall'
+ FileUtils.cd(File.join(indir, c.id.to_s)) do
+ Find.find('.') do |e|
+ next unless File.file?(e)
+ content = File.read(e)
+ filename = e.split(File::SEPARATOR).slice(1..-1).join(File::SEPARATOR)
+ if File.symlink?(e)
+ mode = '120000'
+ content = File.readlink(e)
+ else
+ mode = File.executable?(e) ? '100755' : '100644'
+ end
+ out.puts 'M %s inline %s' % [mode, filename]
+ out.puts 'data %u' % content.bytesize
+ out.puts content
+ end
+ end
+ out.puts
+
+end
+
+def do_reset(out, ref, from)
+ out.puts "reset %s" % ref
+ out.puts "from :%u" % from
+ out.puts
+end
+
+def export_reset(cmd, indir, out)
+ _, ref, from = cmd.split
+ do_reset(out, ref, from)
+end
+
+def export_checkout(cmd, indir, out)
+ _, $branch, from = cmd.split
+ from = ':%u' % $last if not $branches[$branch]
+ do_reset(out, 'refs/heads/%s' % $branch, from) if from
+ $branches[$branch] = true
+end
+
+def export_tag(cmd, indir, out)
+ _, tag = cmd.split
+ do_reset(out, 'refs/tags/%s' % tag, $last)
+end
+
+def export(indir = '.', out = STDOUT)
+
+ $branches['master'] = true
+
+ YAML.load_file(File.join(indir, 'log')).each do |e|
+ case e
+ when Hash
+ export_commit(e, indir, out)
+ when /^checkout /
+ export_checkout(e, indir, out)
+ when /^tag /
+ export_tag(e, indir, out)
+ when /^reset /
+ export_reset(e, indir, out)
+ end
+ end
+
+end
+
+def import(outdir, out)
+ format = 'format:commit %H%nauthor %an <%ae> %ad%ncommitter %cn <%ce> %cd%nparents %P%n%n%B'
+ cmd = ['git', 'log', '-z', '-s', '--date=raw', '--format=%s' % format, '--reverse', '--all']
+ commits = {}
+
+ cmds = []
+
+ IO.popen(cmd).each_with_index("\0") do |data, i|
+ @msg = nil
+ @parents = []
+ data.chomp("\0").each_line do |l|
+ if not @msg
+ case l
+ when /^commit (.+)$/
+ @id = $1
+ when /^author (.+)$/
+ @author = $1
+ when /^committer (.+)$/
+ @committer = $1
+ when /^parents (.+)$/
+ @parents = $1.split(" ")
+ when /^$/
+ @msg = ""
+ end
+ else
+ @msg << l
+ end
+ end
+
+ num = i + 1
+ commits[@id] = num
+
+ cmds << {
+ :author => @author,
+ :committer => @committer,
+ :msg => @msg,
+ :ref => num,
+ :parents => @parents.map { |e| commits[e] },
+ }
+
+ wd = File.join(outdir, num.to_s)
+ FileUtils.mkdir_p(wd)
+ system('git', '--work-tree', wd, 'checkout', '-f', '-q', @id)
+ end
+
+ IO.popen(['git', 'show-ref', '--tags', '--heads']).each do |e|
+ id, ref = e.chomp.split
+ cmds << 'reset %s %s' % [ref, commits[id]]
+ end
+
+ out.write(cmds.to_yaml)
+end
+
+def git_pack(indir, outdir)
+ indir = File.absolute_path(indir)
+ system('git', 'init', '--quiet', outdir)
+ FileUtils.cd(outdir) do
+ IO.popen(['git', 'fast-import', '--quiet'], 'w') do |io|
+ export(indir, io)
+ end
+ system('git', 'reset', '--quiet', '--hard')
+ end
+end
+
+def git_unpack(indir, outdir)
+ begin
+ FileUtils.mkdir_p(outdir)
+ log = File.open(File.join(outdir, 'log'), 'w')
+ ENV['GIT_DIR'] = File.join(indir, '.git')
+ oldref = %x[git symbolic-ref HEAD]
+ import(outdir, log)
+ ensure
+ system('git', 'symbolic-ref', 'HEAD', oldref) if oldref
+ ENV.delete('GIT_DIR')
+ log.close if log
+ end
+end
+
+$indir = '.'
+
+begin
+ OptionParser.new do |opts|
+ opts.on('-x') do
+ $mode = 'unpack'
+ end
+ opts.on('-c') do
+ $mode = 'pack'
+ end
+ opts.on('-o', '--outdir DIR') do |v|
+ $outdir = v
+ end
+ opts.on('-i', '--indir DIR') do |v|
+ $indir = v
+ end
+ end.parse!
+rescue OptionParser::InvalidOption
+end
+
+$mode = File.exist?(File.join($indir, '.git')) ? 'unpack' : 'pack' unless $mode
+$outdir = File.join($indir, $mode == 'pack' ? 'packed' : 'unpacked2') unless $outdir
+
+case $mode
+when 'pack'
+ git_pack($indir, $outdir)
+when 'unpack'
+ git_unpack($indir, $outdir)
+end
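The import() method above reads `git log -z` output: one NUL-separated record per commit, with header lines, a blank line, and the raw body (%B). A rough Python equivalent using the same format string might look like this. The function names are hypothetical, and messages are decoded as UTF-8 with replacement, which may not hold for older repositories. Splitting each record only at its first blank line keeps blank lines inside the message, which is the property the `== spaces ==` test in this thread checks.

```python
import subprocess

# Same pretty-format string as import() in the Ruby script above.
LOG_FORMAT = ("format:commit %H%nauthor %an <%ae> %ad%n"
              "committer %cn <%ce> %cd%nparents %P%n%n%B")

def parse_record(record):
    """Split one `git log -z` record into a dict of header fields plus 'msg'."""
    # Headers never contain a blank line, so the first one ends them; blank
    # lines inside the message body are kept intact.
    head, _, body = record.partition("\n\n")
    fields = {}
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["parents"] = fields.get("parents", "").split()   # [] for a root commit
    fields["msg"] = body
    return fields

def read_history(git_dir):
    """Return every commit reachable from any ref, oldest first."""
    raw = subprocess.run(
        ["git", "--git-dir", git_dir, "log", "-z", "-s", "--date=raw",
         "--format=" + LOG_FORMAT, "--reverse", "--all"],
        check=True, capture_output=True).stdout
    text = raw.decode("utf-8", errors="replace")
    return [parse_record(r) for r in text.split("\0") if r]
```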
--
Felipe Contreras
* Re: gitpacker progress report and a question
2012-11-27 6:29 ` Felipe Contreras
@ 2012-11-27 7:27 ` Eric S. Raymond
2012-11-27 8:20 ` Felipe Contreras
2012-11-27 7:30 ` Eric S. Raymond
1 sibling, 1 reply; 16+ messages in thread
From: Eric S. Raymond @ 2012-11-27 7:27 UTC
To: Felipe Contreras; +Cc: git
Felipe Contreras <felipe.contreras@gmail.com>:
> I believe that log file is much more human readable. Yet I still fail
> to see why anybody would want so much detail only to import tarballs.
The first time I needed such a tool (and I really should have built it then)
was during the events I wrote up in 2010 as the INTERCAL Reconstruction Massacree;
full story at <http://esr.ibiblio.org/?p=2491>. Note in particular the
following paragraphs:
Reconstructing the history of C-INTERCAL turned out to be something of
an epic in itself. 1990 was back in the Dark Ages as far as version
control and release-management practices go; our tools were
paleolithic and our procedures likewise. The earliest versions of
C-INTERCAL were so old that even CVS wasn’t generally available yet
(CVS 1.0 didn’t even ship until six months after C-INTERCAL 0.3, my
first public release). SCCS had existed since the early 1980s but was
proprietary; the only game in town was RCS. Primitive, file-oriented
RCS.
I was a very early adopter of version control; when I wrote
Emacs’s VC mode in 1992 the idea of integrating version control
into normal workflow that closely was way out in front of current
practice. Today’s routine use of such tools wasn’t even a gleam in
anyone’s eye then, if only because disks were orders of magnitude
smaller and there was a lot of implied pressure to actually throw
away old versions of stuff. So I only RCSed some of the files in
the project at the time, and didn’t think much about that.
As a result, reconstructing C-INTERCAL’s history turned into about two
weeks of work. A good deal of it was painstaking digital archeology,
digging into obscure corners of the net for ancient release tarballs
Alex and I didn’t have on hand any more. I ended up stitching together
material from 18 different release tarballs, 11 unreleased snapshot
tarballs, one release tarball I could reconstruct, one release tarball
mined out of an obsolete Red Hat source RPM, two shar archives, a pax
archive, five published patches, two zip files, a darcs archive, and
my partial RCS history, and that’s before we got to the aerial
photography. To perform the surgery needed to integrate this, I wrote
a custom Python program assisted by two shellscripts, topping out at a
hair over 1200 lines of code.
The second time was much more recent and concerned a project called
(seriously) robotfindskitten. This code existed as a partial CVS
repository created by someone other than the original author,
and some disconnected tarballs from before the repo. The author
has requested that I knit the tarballs and the CVS history (which
is now in git) into one repository.
In both cases the object was to assemble a coherent history
from all the available metadata as if the projects had been using
version control all along.
I know of at least one other group of disconnected tarballs, of a
program called xlife, that is likely to need similar treatment. It's
not an uncommon situation for projects over a certain age, and there is
lots of code like xlife dating from before the mid-1990s waiting for
someone to pick up the pieces.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
* Re: gitpacker progress report and a question
2012-11-27 6:29 ` Felipe Contreras
2012-11-27 7:27 ` Eric S. Raymond
@ 2012-11-27 7:30 ` Eric S. Raymond
1 sibling, 0 replies; 16+ messages in thread
From: Eric S. Raymond @ 2012-11-27 7:30 UTC
To: Felipe Contreras; +Cc: git
Felipe Contreras <felipe.contreras@gmail.com>:
> > The main objective of the logfile design is to make hand-crafting
> > these easy.
>
> Here's another version with YAML:
Clever.
Now I have to decide if I should allow my aesthetic dislike of YAML to
prevail despite the fact that it's pretty well suited to this job. There
is definitely a case for applying a standard metaprotocol like YAML (ugh)
or XML (double ugh).
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
* Re: gitpacker progress report and a question
2012-11-27 7:27 ` Eric S. Raymond
@ 2012-11-27 8:20 ` Felipe Contreras
2012-11-27 8:36 ` Eric S. Raymond
0 siblings, 1 reply; 16+ messages in thread
From: Felipe Contreras @ 2012-11-27 8:20 UTC
To: esr; +Cc: git
On Tue, Nov 27, 2012 at 8:27 AM, Eric S. Raymond <esr@thyrsus.com> wrote:
> Felipe Contreras <felipe.contreras@gmail.com>:
>> I believe that log file is much more human readable. Yet I still fail
>> to see why anybody would want so much detail only to import tarballs.
> In both cases the object was to assemble a coherent history
> from all the available metadata as if the projects had been using
> version control all along.
I didn't say I couldn't see why somebody would need such a tool, I
said I couldn't see why somebody would need such a tool _with so much
detail_.
Most of those old projects have a linear history, so a log file like
this would suffice:
tag v0.1 gst-av-0.1.tar "Release 0.1"
tag v0.2 gst-av-0.2.tar "Release 0.2"
tag v0.3 gst-av-0.3.tar "Release 0.3"
And if they really had release branches, it shouldn't be difficult to
modify it like this:
tag v0.1 gst-av-0.1.tar "Release 0.1"
tag v0.2 gst-av-0.2.tar "Release 0.2"
tag v0.2.1 gst-av-0.2.tar "Release 0.2.1"
checkout v0.2
tag v0.3 gst-av-0.3.tar "Release 0.3"
But different commit/author and respective dates, and merges? Sounds
like overkill.
Cheers.
--
Felipe Contreras
* Re: gitpacker progress report and a question
2012-11-27 8:20 ` Felipe Contreras
@ 2012-11-27 8:36 ` Eric S. Raymond
2012-11-27 8:51 ` Felipe Contreras
0 siblings, 1 reply; 16+ messages in thread
From: Eric S. Raymond @ 2012-11-27 8:36 UTC
To: Felipe Contreras; +Cc: git
Felipe Contreras <felipe.contreras@gmail.com>:
> Most of those old projects have a linear history,
INTERCAL didn't. There were two branches for platform ports.
> But different commit/author and respective dates, and merges? Sounds
> like overkill.
I felt it was important that the metadata format be able to specify
git's entire metadata and DAG semantics. Otherwise, as sure as the
sun rises, *somebody* would run into a corner case not covered, and
(quite rightly) curse me for a shortsighted fool who had done a
half-assed job.
I don't do half-assed jobs. Not ever, no way, nohow.
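As one way to picture "git's entire metadata and DAG semantics" in a log format: a hypothetical Python record carrying separate author and committer identities and dates plus a list of parent marks (two or more meaning a merge), with a check that every parent is defined before it is used, which is the order git fast-import requires for marks. A check like this would turn a dangling parent reference into a readable error rather than something like the IndexError in the traceback earlier in the thread, though that traceback's actual cause is not established here.

```python
from dataclasses import dataclass, field

@dataclass
class CommitRecord:
    mark: int               # fast-import style mark, unique per commit
    author: str             # "Name <email>"
    author_date: str        # raw git date, e.g. "1112911993 -0700"
    committer: str
    committer_date: str
    message: str
    parents: list = field(default_factory=list)  # parent marks; 2+ = merge

def check_dag(commits):
    """Raise ValueError unless every parent mark names an earlier commit."""
    seen = set()
    for c in commits:
        for p in c.parents:
            if p not in seen:
                raise ValueError("commit :%d names parent :%d before it is defined"
                                 % (c.mark, p))
        if c.mark in seen:
            raise ValueError("duplicate mark :%d" % c.mark)
        seen.add(c.mark)
```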
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
* Re: gitpacker progress report and a question
2012-11-27 8:36 ` Eric S. Raymond
@ 2012-11-27 8:51 ` Felipe Contreras
0 siblings, 0 replies; 16+ messages in thread
From: Felipe Contreras @ 2012-11-27 8:51 UTC
To: esr; +Cc: git
On Tue, Nov 27, 2012 at 9:36 AM, Eric S. Raymond <esr@thyrsus.com> wrote:
> Felipe Contreras <felipe.contreras@gmail.com>:
>> Most of those old projects have a linear history,
>
> INTERCAL didn't. There were two branches for platform ports.
Fine:
tag v0.1 gst-av-0.1.tar "Release 0.1"
tag v0.2 gst-av-0.2.tar "Release 0.2"
checkout port1
tag v0.2-p1 gst-av-0.2-p1.tar "Release 0.2 p1"
checkout port2 v0.2
tag v0.2-p2 gst-av-0.2-p2.tar "Release 0.2 p2"
checkout master
tag v0.3 gst-av-0.3.tar "Release 0.3"
Problem solved.
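To show that this simple format can express the branch layout above, here is a hypothetical planner in Python. It assumes the semantics implied by the example, not those of any existing script: `checkout NAME START` creates a branch at an earlier tag, `checkout NAME` with a new name branches from the most recent commit, checking out an existing branch just switches to it, and each `tag` line becomes one commit on the current branch.

```python
import shlex

def plan_history(text):
    """Map tag/checkout lines to (tag, tarball, message, branch, parent_tag)."""
    tips = {"master": None}   # branch -> tag of its newest commit (None: empty)
    branch = "master"
    last = None               # newest commit on any branch
    plan = []
    for line in text.splitlines():
        fields = shlex.split(line)          # handles the quoted message
        if not fields:
            continue
        if fields[0] == "checkout":
            branch = fields[1]
            if branch not in tips:
                # New branch: start at the named tag, else at the newest commit.
                tips[branch] = fields[2] if len(fields) > 2 else last
        elif fields[0] == "tag":
            tag, tarball, message = fields[1:4]
            plan.append((tag, tarball, message, branch, tips[branch]))
            tips[branch] = last = tag
        else:
            raise ValueError("unknown command: %r" % line)
    return plan
```

Each plan entry could then drive one commit built from the extracted tarball, with the parent tag resolved to a mark.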
>> But different commit/author and respective dates, and merges? Sounds
>> like overkill.
>
> I felt it was important that the metadata format be able to specify
> git's entire metadata and DAG semantics. Otherwise, as sure as the
> sun rises, *somebody* would run into a corner case not covered, and
> (quite rightly) curse me for a shortsighted fool who had done a
> half-assed job.
I'm willing to bet that won't happen.
> I don't do half-assed jobs. Not ever, no way, nohow.
So you prefer code that is way more complicated than it needs to be,
and with a higher likelihood of introducing bugs? There's a point of
diminishing returns where the code that nobody uses causes bugs for
real use-cases. That's not good.
I prefer code that does one thing, and does it well. And when the need
arises, evolve.
Cheers.
--
Felipe Contreras
Thread overview: 16+ messages
2012-11-15 21:28 gitpacker progress report and a question Eric S. Raymond
2012-11-15 22:35 ` Max Horn
2012-11-15 23:05 ` Eric S. Raymond
2012-11-16 13:13 ` Andreas Schwab
2012-11-26 20:07 ` Felipe Contreras
2012-11-26 22:01 ` Eric S. Raymond
2012-11-26 23:14 ` Felipe Contreras
2012-11-26 23:43 ` Eric S. Raymond
2012-11-27 1:29 ` Felipe Contreras
2012-11-27 1:38 ` Felipe Contreras
2012-11-27 6:29 ` Felipe Contreras
2012-11-27 7:27 ` Eric S. Raymond
2012-11-27 8:20 ` Felipe Contreras
2012-11-27 8:36 ` Eric S. Raymond
2012-11-27 8:51 ` Felipe Contreras
2012-11-27 7:30 ` Eric S. Raymond