* [PATCH] archimport improvements
@ 2005-11-12 9:23 Eric Wong
From: Eric Wong @ 2005-11-12 9:23 UTC
To: Martin Langhoff; +Cc: git list
Hello,
I'm another Arch user trying out git. Unfortunately, I encountered
several problems with git-archimport that I needed fixed before my
development trees could be imported into git.
Here's a summary of the changes:
Bug Fixes:
* Support for '--branch'-less Arch version names.
Encoding '/' as '--' (as was previously done) is not 100% reversible
because the "--branch" portion of a fully-qualified Arch version name
is optional (though not many people or Arch-related tools know this).
* I now encode the '/' in the fully-qualified name as ',' so as not to
confuse other porcelains, though leaving '/' in branch names may be
all right provided porcelains can support them.
* Identify each git branch by its full Arch
"archive,category<--branch>--version" name. Anything less than that is
ambiguous as far as history and patch relationships go.
* Renamed directories that contained renamed or moved files were not
tracked properly. The original code was inadequate for this, and
making it support every rename case that Arch supports is too much
work. Instead, I maintain full Arch trees in the temporary directory,
apply patches there, and rsync the result into the working tree.
Imports are slightly slower than before, but accuracy is more
important to me.
* Permission tracking (only the execute bit, since that is all git
records) comes as a side effect of the above.
* Tracking changes from branches that are only cherry-picked now works.
* Pika-escaped filenames were not handled. This seems fixed in the latest
git, but I fixed it more generally and removed the ShellQuote module
dependency along the way.
* Don't die() when a merge-base can't be found. Arch supports
merging between unrelated trees.
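To make the first two bug-fix items concrete, here is a minimal Perl sketch comparing the old '--' encoding (paired with the "3rd '--' from the end" decoder that patch 1 removes) against the ',' encoding. The archive name arch@example.com--2005 and the helper names are made up for illustration, and, like the s|,|/| reversal in patch 1, the new decoder assumes the archive name contains no ','.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Old scheme: every '/' became '--'. Decoding turned the 3rd '--' from
# the end of a full revision name back into '/', which silently assumes
# that a "--branch" component is present.
sub old_encode {
    my ($name) = @_;
    $name =~ s{/}{--}g;
    return $name;
}

sub old_decode {
    my ($name) = @_;
    $name = reverse $name;
    $name =~ s!^(.+?--.+?--.+?--.+?)--(.+)$!$1/$2!;
    return scalar reverse $name;
}

# New scheme: only the single '/' after the archive name becomes ','.
# Assumes the archive name itself never contains ','.
sub new_encode {
    my ($name) = @_;
    $name =~ s{/}{,};
    return $name;
}

sub new_decode {
    my ($name) = @_;
    $name =~ s{,}{/};
    return $name;
}

for my $rev ('arch@example.com--2005/foo--bar--0.0--patch-1',
             'arch@example.com--2005/foo--0.0--patch-1') {
    printf "%s\n  old: %s\n  new: %s\n", $rev,
        old_decode(old_encode($rev)), new_decode(new_encode($rev));
}
```

Both schemes round-trip the revision that has a "--bar" branch, but for the branch-less revision the old decoder returns arch@example.com/2005--foo--0.0--patch-1, splitting the archive name in the wrong place, while the ',' scheme returns the original name.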
Usability enhancements:
* Optionally detect merged branches and attempt to import their history,
too. Use the -D <depth> option for this. Specifying a <depth>
greater than 1 is usually not needed unless the tree you're tracking
has had history pruned.
* With the -a (boolean) switch, optionally attempt to auto-register
unknown Arch archives from mirrors.sourcecontrol.net so their history
can be pulled. I'm not sure how useful users will find this.
* Removed -A <archive> usage (unnecessary in all cases) and made all
Arch calls and output parsing compatible with both tla (tested with
1.3.3) and baz (1.4.2). The default is still tla; set the ARCH_CLIENT
environment variable to baz to use baz instead.
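The -D <depth> option can be pictured as a level-by-level walk over merge relationships: the branches named on the command line are level 1, and each stage records any not-yet-seen branch merged into the current level one level deeper. Below is a minimal Perl sketch of that idea, loosely modelled on the %arch_branches levels in patch 5. The %merges_of table and discover_branches() are hypothetical stand-ins for parsed `abrowse --merges` output, and the exact stage loop in the patch may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for parsed abrowse output: each Arch version
# maps to the versions it has merged patches from.
my %merges_of = (
    'me@example.com--2005/proj--main--1'  => ['you@example.org--2005/proj--feat--1'],
    'you@example.org--2005/proj--feat--1' => ['lib@example.net--2005/lib--1.0'],
    'lib@example.net--2005/lib--1.0'      => [],
);

# Returns version => level. Level 1 is what the user asked for; a branch
# first seen while processing level N is recorded at level N + 1.
sub discover_branches {
    my ($depth, @roots) = @_;
    my %level = map { $_ => 1 } @roots;
    for my $stage (1 .. $depth) {
        # Snapshot this stage's branches before adding deeper ones.
        my @current = grep { $level{$_} == $stage } sort keys %level;
        for my $branch (@current) {
            for my $merged (@{ $merges_of{$branch} || [] }) {
                $level{$merged} = $stage + 1 unless exists $level{$merged};
            }
        }
    }
    return %level;
}

my %found = discover_branches(1, 'me@example.com--2005/proj--main--1');
print "$_ (level $found{$_})\n" for sort keys %found;
```

With a depth of 1 only the directly merged feat branch is added (at level 2); a depth of 2 would also reach lib--1.0 at level 3.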
Current weaknesses:
* (Present in the original code as well.)
The code still assumes that dates in commit logs can be trusted. This is
fine in most cases, but a wayward branch can screw up git-archimport and
cause parents to be missed.
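One way around trusting commit-log dates, which the TODO list in patch 5 also hints at, is to order patchsets by ancestry and use dates only to break ties. Below is a minimal Perl sketch using Kahn's topological sort. It assumes each patchset is a hash with a hypothetical `deps` list (its previous revision on the same branch plus any revisions it merged) and a `date` string in the sortable "YYYY-MM-DD HH:MM:SS" form that abrowse prints; it is an illustration, not what git-archimport currently does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Kahn's algorithm: a patchset becomes "ready" only once every dependency
# inside the import set has been emitted. Among ready patchsets the oldest
# date wins, so a bad date can no longer put a child before its parent.
sub ancestry_order {
    my @psets = @_;
    my %by_id = map { $_->{id} => $_ } @psets;
    my (%pending, %children);
    for my $ps (@psets) {
        # Dependencies outside the set (e.g. unreachable archives) are ignored.
        my @deps = grep { exists $by_id{$_} } @{ $ps->{deps} || [] };
        $pending{ $ps->{id} } = scalar @deps;
        push @{ $children{$_} }, $ps->{id} for @deps;
    }
    my @ready = grep { $pending{ $_->{id} } == 0 } @psets;
    my @order;
    while (@ready) {
        # Re-sorting each round is O(n^2 log n); fine for a sketch.
        @ready = sort { $a->{date} cmp $b->{date} } @ready;
        my $ps = shift @ready;
        push @order, $ps->{id};
        for my $child (@{ $children{ $ps->{id} } || [] }) {
            push @ready, $by_id{$child} if --$pending{$child} == 0;
        }
    }
    die "cycle in ancestry\n" if @order != @psets;
    return @order;
}

# patch-1 carries a wayward date that is earlier than its own base-0.
my @example = (
    { id => 'foo--0.0--base-0',  date => '2005-01-05 00:00:00', deps => [] },
    { id => 'foo--0.0--patch-1', date => '2005-01-03 00:00:00', deps => ['foo--0.0--base-0'] },
    { id => 'foo--0.0--patch-2', date => '2005-01-06 00:00:00', deps => ['foo--0.0--patch-1'] },
);
print join("\n", ancestry_order(@example)), "\n";
```

This prints base-0 first even though patch-1 has the earlier date, whereas a plain date sort would try to apply patch-1 before its base.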
--
Eric Wong
^ permalink raw reply [flat|nested] 39+ messages in thread* [PATCH 1/5] remove shellquote usage for tags 2005-11-12 9:23 [PATCH] archimport improvements Eric Wong @ 2005-11-12 9:25 ` Eric Wong 2005-11-12 9:27 ` [PATCH 2/5] archimport: don't die on merge-base failure Eric Wong 2005-11-12 11:54 ` [PATCH] archimport improvements Martin Langhoff 2005-11-17 9:26 ` [PATCH] archimport improvements Martin Langhoff 2 siblings, 1 reply; 39+ messages in thread From: Eric Wong @ 2005-11-12 9:25 UTC (permalink / raw) To: Martin Langhoff; +Cc: git list use ',' to encode '/' in "archivename/foo--bar--0.0" so we can allow "--branch"-less trees which are valid in Arch ("archivename/foo--0.0") Signed-off-by: Eric Wong <normalperson@yhbt.net> --- git-archimport.perl | 55 ++++++++++++++++++++++++++------------------------- 1 files changed, 28 insertions(+), 27 deletions(-) applies-to: 76d3d1c302c20b82fd976e958aabd19f7f01e7b5 28d4f9ee8ba83b35eea66d4dd19b8ec26a0218c7 diff --git a/git-archimport.perl b/git-archimport.perl index e22c816..7c15184 100755 --- a/git-archimport.perl +++ b/git-archimport.perl @@ -52,6 +52,7 @@ $ENV{'TZ'}="UTC"; my $git_dir = $ENV{"GIT_DIR"} || ".git"; $ENV{"GIT_DIR"} = $git_dir; +my $ptag_dir = "$git_dir/archimport/tags"; our($opt_h,$opt_v, $opt_T, $opt_C,$opt_t); @@ -195,16 +196,19 @@ unless (-d $git_dir) { # initial import opendir(DIR, "$git_dir/archimport/tags") || die "can't opendir: $!"; while (my $file = readdir(DIR)) { - # skip non-interesting-files - next unless -f "$git_dir/archimport/tags/$file"; - next if $file =~ m/--base-0$/; # don't care for base-0 + # skip non-interesting-files + next unless -f "$ptag_dir/$file"; + + # convert first '--' to '/' from old git-archimport to use + # as an archivename/c--b--v private tag + if ($file !~ m!,!) 
{ + my $oldfile = $file; + $file =~ s!--!,!; + print STDERR "converting old tag $oldfile to $file\n"; + rename("$ptag_dir/$oldfile", "$ptag_dir/$file") or die $!; + } my $sha = ptag($file); chomp $sha; - # reconvert the 3rd '--' sequence from the end - # into a slash - # $file = reverse $file; - # $file =~ s!^(.+?--.+?--.+?--.+?)--(.+)$!$1/$2!; - # $file = reverse $file; $rptags{$sha} = $file; } closedir DIR; @@ -582,19 +586,20 @@ sub parselog { # write/read a tag sub tag { my ($tag, $commit) = @_; - $tag =~ s|/|--|g; - $tag = shell_quote($tag); + + # don't use subdirs for tags yet, it could screw up other porcelains + $tag =~ s|/|,|; if ($commit) { - open(C,">$git_dir/refs/tags/$tag") + open(C,">","$git_dir/refs/tags/$tag") or die "Cannot create tag $tag: $!\n"; print C "$commit\n" or die "Cannot write tag $tag: $!\n"; close(C) or die "Cannot write tag $tag: $!\n"; - print " * Created tag ' $tag' on '$commit'\n" if $opt_v; + print " * Created tag '$tag' on '$commit'\n" if $opt_v; } else { # read - open(C,"<$git_dir/refs/tags/$tag") + open(C,"<","$git_dir/refs/tags/$tag") or die "Cannot read tag $tag: $!\n"; $commit = <C>; chomp $commit; @@ -609,15 +614,16 @@ sub tag { # reads fail softly if the tag isn't there sub ptag { my ($tag, $commit) = @_; - $tag =~ s|/|--|g; - $tag = shell_quote($tag); + + # don't use subdirs for tags yet, it could screw up other porcelains + $tag =~ s|/|,|g; - unless (-d "$git_dir/archimport/tags") { - mkpath("$git_dir/archimport/tags"); - } + my $tag_file = "$ptag_dir/$tag"; + my $tag_branch_dir = dirname($tag_file); + mkpath($tag_branch_dir) unless (-d $tag_branch_dir); if ($commit) { # write - open(C,">$git_dir/archimport/tags/$tag") + open(C,">",$tag_file) or die "Cannot create tag $tag: $!\n"; print C "$commit\n" or die "Cannot write tag $tag: $!\n"; @@ -627,10 +633,10 @@ sub ptag { unless $tag =~ m/--base-0$/; } else { # read # if the tag isn't there, return 0 - unless ( -s "$git_dir/archimport/tags/$tag") { + unless ( -s $tag_file) 
{ return 0; } - open(C,"<$git_dir/archimport/tags/$tag") + open(C,"<",$tag_file) or die "Cannot read tag $tag: $!\n"; $commit = <C>; chomp $commit; @@ -780,12 +786,7 @@ sub commitid2pset { chomp $commitid; my $name = $rptags{$commitid} || die "Cannot find reverse tag mapping for $commitid"; - # the keys in %rptag are slightly munged; unmunge - # reconvert the 3rd '--' sequence from the end - # into a slash - $name = reverse $name; - $name =~ s!^(.+?--.+?--.+?--.+?)--(.+)$!$1/$2!; - $name = reverse $name; + $name =~ s|,|/|; my $ps = $psets{$name} || (print Dumper(sort keys %psets)) && die "Cannot find patchset for $name"; return $ps; --- 0.99.9.GIT ^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 2/5] archimport: don't die on merge-base failure
@ 2005-11-12 9:27 Eric Wong
From: Eric Wong @ 2005-11-12 9:27 UTC
To: Martin Langhoff; +Cc: git list
Don't die if we can't find a merge base. Arch allows arbitrary
cherry-picks between unrelated branches, and we should not die when
that happens.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 git-archimport.perl |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)
applies-to: 07dfd96ba53890d6a20fa0b028cf96e0e49bc027
7d099adadc041d74a0defc107656f273b35f57cb
diff --git a/git-archimport.perl b/git-archimport.perl
index 7c15184..699d5f6 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -693,7 +693,13 @@ sub find_parents {
         next unless -e "$git_dir/refs/heads/$branch";
 
         my $mergebase = `git-merge-base $branch $ps->{branch}`;
-        die "Cannot find merge base for $branch and $ps->{branch}" if $?;
+        if ($?) {
+            # Don't die here, Arch supports one-way cherry-picking
+            # between branches with no common base (or any relationship
+            # at all beforehand)
+            warn "Cannot find merge base for $branch and $ps->{branch}";
+            next;
+        }
         chomp $mergebase;
 
         # now walk up to the mergepoint collecting what patches we have
---
0.99.9.GIT
* [PATCH 3/5] Disambiguate the term 'branch' in Arch vs git 2005-11-12 9:27 ` [PATCH 2/5] archimport: don't die on merge-base failure Eric Wong @ 2005-11-12 9:29 ` Eric Wong 2005-11-12 9:30 ` [PATCH 4/5] Overhaul of changeset application Eric Wong 0 siblings, 1 reply; 39+ messages in thread From: Eric Wong @ 2005-11-12 9:29 UTC (permalink / raw) To: Martin Langhoff; +Cc: git list [-- Attachment #1: Type: text/plain, Size: 3647 bytes --] Disambiguate the term 'branch' in Arch vs git, and start using fully-qualified names. Signed-off-by: Eric Wong <normalperson@yhbt.net> --- git-archimport.perl | 65 ++++++++++++++++++++++++++++++++++++++++++--------- 1 files changed, 54 insertions(+), 11 deletions(-) applies-to: bbfe032e4900efc45bb94fb687af0140ccb0a858 ede672b4cd544b5e5418cc5088e92f2e0d2f7394 diff --git a/git-archimport.perl b/git-archimport.perl index 699d5f6..f2bcbb4 100755 --- a/git-archimport.perl +++ b/git-archimport.perl @@ -30,6 +30,24 @@ See man (1) git-archimport for more deta Add print in front of the shell commands invoked via backticks. +=head1 Devel Notes + +There are several places where Arch and git terminology are intermixed +and potentially confused. + +The notion of a "branch" in git is approximately equivalent to +a "archive/category--branch--version" in Arch. Also, it should be noted +that the "--branch" portion of "archive/category--branch--version" is really +optional in Arch although not many people (nor tools!) seem to know this. +This means that "archive/category--version" is also a valid "branch" +in git terms. + +We always refer to Arch names by their fully qualified variant (which +means the "archive" name is prefixed. + +For people unfamiliar with Arch, an "archive" is the term for "repository", +and can contain multiple, unrelated branches. 
+ =cut use strict; @@ -215,9 +233,41 @@ unless (-d $git_dir) { # initial import } # process patchsets -foreach my $ps (@psets) { +# extract the Arch repository name (Arch "archive" in Arch-speak) +sub extract_reponame { + my $fq_cvbr = shift; # archivename/[[[[category]branch]version]revision] + return (split(/\//, $fq_cvbr))[0]; +} + +sub extract_versionname { + my $name = shift; + $name =~ s/--(?:patch|version(?:fix)?|base)-\d+$//; + return $name; +} - $ps->{branch} = branchname($ps->{id}); +# convert a fully-qualified revision or version to a unique dirname: +# normalperson@yhbt.net-05/mpd--uclinux--1--patch-2 +# becomes: normalperson@yhbt.net-05,mpd--uclinux--1 +# +# the git notion of a branch is closer to +# archive/category--branch--version than archive/category--branch, so we +# use this to convert to git branch names. +# Also, keep archive names but replace '/' with ',' since it won't require +# subdirectories, and is safer than swapping '--' which could confuse +# reverse-mapping when dealing with bastard branches that +# are just archive/category--version (no --branch) +sub tree_dirname { + my $revision = shift; + my $name = extract_versionname($revision); + $name =~ s#/#,#; + return $name; +} + +*git_branchname = *tree_dirname; + +# process patchsets +foreach my $ps (@psets) { + $ps->{branch} = git_branchname($ps->{id}); # # ensure we have a clean state @@ -429,16 +479,9 @@ foreach my $ps (@psets) { $opt_v && print " + parents: $par \n"; } -sub branchname { - my $id = shift; - $id =~ s#^.+?/##; - my @parts = split(m/--/, $id); - return join('--', @parts[0..1]); -} - sub apply_import { my $ps = shift; - my $bname = branchname($ps->{id}); + my $bname = git_branchname($ps->{id}); `mkdir -p $tmp`; @@ -669,7 +712,7 @@ sub find_parents { # simple loop to split the merges # per branch foreach my $merge (@{$ps->{merges}}) { - my $branch = branchname($merge); + my $branch = git_branchname($merge); unless (defined $branches{$branch} ){ $branches{$branch} = []; } 
---
0.99.9.GIT
--
Eric Wong
* [PATCH 4/5] Overhaul of changeset application 2005-11-12 9:29 ` [PATCH 3/5] Disambiguate the term 'branch' in Arch vs git Eric Wong @ 2005-11-12 9:30 ` Eric Wong 2005-11-12 9:32 ` [PATCH 5/5] -D <depth> option to recurse into merged branches Eric Wong 2005-11-12 12:07 ` [PATCH 4/5] Overhaul of changeset application Martin Langhoff 0 siblings, 2 replies; 39+ messages in thread From: Eric Wong @ 2005-11-12 9:30 UTC (permalink / raw) To: Martin Langhoff; +Cc: git list Overhaul of changeset application to use native Arch tree operations. This results in: - reliable rename handling (esp. when dealing with renamed with files that already got renamed) - permissions tracking (execute only for git). - no need to shell-escape or pika-unescape anything. All arguments to external programs are always passed as an array. File modifications are automatically tracked using git (no need to parse Arch patch-log to look for modified files). - Correctly parse multi-line summary text in patch-logs Signed-off-by: Eric Wong <normalperson@yhbt.net> --- git-archimport.perl | 381 ++++++++++++++++++++------------------------------- 1 files changed, 146 insertions(+), 235 deletions(-) applies-to: 12cd9f2d764e50ae4fe2c6cd8b64fc72c668e0dd d3cbba7b8e8e3db61dac685ab55055d360e6138d diff --git a/git-archimport.perl b/git-archimport.perl index f2bcbb4..5616d42 100755 --- a/git-archimport.perl +++ b/git-archimport.perl @@ -55,7 +55,7 @@ use warnings; use Getopt::Std; use File::Spec; use File::Temp qw(tempfile tempdir); -use File::Path qw(mkpath); +use File::Path qw(mkpath rmtree); use File::Basename qw(basename dirname); use String::ShellQuote; use Time::Local; @@ -90,16 +90,17 @@ usage if $opt_h; @ARGV >= 1 or usage(); my @arch_roots = @ARGV; -my ($tmpdir, $tmpdirname) = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1); -my $tmp = $opt_t || 1; -$tmp = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1); -$opt_v && print "+ Using $tmp as temporary directory\n"; +my $tmptree; 
+$ENV{'TMPDIR'} = $opt_t if $opt_t; +$tmptree = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1); +$opt_v && print "+ Using $tmptree to store temporary trees\n"; my @psets = (); # the collection my %psets = (); # the collection, by name my %rptags = (); # my reverse private tags # to map a SHA1 to a commitid +my $TLA = $ENV{'ARCH_CLIENT'} || 'tla'; foreach my $root (@arch_roots) { my ($arepo, $abranch) = split(m!/!, $root); @@ -211,7 +212,7 @@ unless (-d $git_dir) { # initial import } } else { # progressing an import # load the rptags - opendir(DIR, "$git_dir/archimport/tags") + opendir(DIR, $ptag_dir) || die "can't opendir: $!"; while (my $file = readdir(DIR)) { # skip non-interesting-files @@ -288,26 +289,37 @@ foreach my $ps (@psets) { print " * Starting to work on $ps->{id}\n"; - # - # create the branch if needed - # - if ($ps->{type} eq 'i' && !$import) { - die "Should not have more than one 'Initial import' per GIT import: $ps->{id}"; + # switch to that branch if we're not already in that branch: + if (-e "$git_dir/refs/heads/$ps->{branch}") { + system('git-checkout','-f',$ps->{branch}) == 0 or die "$! $?\n"; + + # remove any old stuff that got leftover: + chomp(my @rm = safe_pipe_capture('git-ls-files','--others')); + rmtree(\@rm) if @rm; } - - unless ($import) { # skip for import - if ( -e "$git_dir/refs/heads/$ps->{branch}") { - # we know about this branch - `git checkout $ps->{branch}`; - } else { - # new branch! we need to verify a few things - die "Branch on a non-tag!" unless $ps->{type} eq 't'; - my $branchpoint = ptag($ps->{tag}); - die "Tagging from unknown id unsupported: $ps->{tag}" - unless $branchpoint; + + # Apply the import/changeset/merge into the working tree + my $dir = sync_to_ps($ps); + # read the new log entry: + my @commitlog = safe_pipe_capture($TLA,'cat-log','-d',$dir,$ps->{id}); + die "Error in cat-log: $!" 
if $?; + chomp @commitlog; + + # grab variables we want from the log, new fields get added to $ps: + # (author, date, email, summary, message body ...) + parselog($ps, \@commitlog); + + if ($ps->{id} =~ /--base-0$/ && $ps->{id} ne $psets[0]{id}) { + # this should work when importing continuations + if ($ps->{tag} && (my $branchpoint = eval { ptag($ps->{tag}) })) { # find where we are supposed to branch from - `git checkout -b $ps->{branch} $branchpoint`; + system('git-checkout','-f','-b',$ps->{branch}, + $branchpoint) == 0 or die "$! $?\n"; + + # remove any old stuff that got leftover: + chomp(my @rm = safe_pipe_capture('git-ls-files','--others')); + rmtree(\@rm) if @rm; # If we trust Arch with the fact that this is just # a tag, and it does not affect the state of the tree @@ -316,95 +328,26 @@ foreach my $ps (@psets) { ptag($ps->{id}, $branchpoint); print " * Tagged $ps->{id} at $branchpoint\n"; next; - } - die $! if $?; + } else { + warn "Tagging from unknown id unsupported\n" if $ps->{tag}; + } + # allow multiple bases/imports here since Arch supports cherry-picks + # from unrelated trees } - - # - # Apply the import/changeset/merge into the working tree - # - if ($ps->{type} eq 'i' || $ps->{type} eq 't') { - apply_import($ps) or die $!; - $import=0; - } elsif ($ps->{type} eq 's') { - apply_cset($ps); - } - - # - # prepare update git's index, based on what arch knows - # about the pset, resolve parents, etc - # - my $tree; - my $commitlog = `tla cat-archive-log -A $ps->{repo} $ps->{id}`; - die "Error in cat-archive-log: $!" if $?; - - # parselog will git-add/rm files - # and generally prepare things for the commit - # NOTE: parselog will shell-quote filenames! - my ($sum, $msg, $add, $del, $mod, $ren) = parselog($commitlog); - my $logmessage = "$sum\n$msg"; - - - # imports don't give us good info - # on added files. Shame on them - if ($ps->{type} eq 'i' || $ps->{type} eq 't') { - `find . 
-type f -print0 | grep -zv '^./$git_dir' | xargs -0 -l100 git-update-index --add`; - `git-ls-files --deleted -z | xargs --no-run-if-empty -0 -l100 git-update-index --remove`; - } - - if (@$add) { - while (@$add) { - my @slice = splice(@$add, 0, 100); - my $slice = join(' ', @slice); - `git-update-index --add $slice`; - die "Error in git-update-index --add: $!" if $?; - } - } - if (@$del) { - foreach my $file (@$del) { - unlink $file or die "Problems deleting $file : $!"; - } - while (@$del) { - my @slice = splice(@$del, 0, 100); - my $slice = join(' ', @slice); - `git-update-index --remove $slice`; - die "Error in git-update-index --remove: $!" if $?; - } - } - if (@$ren) { # renamed - if (@$ren % 2) { - die "Odd number of entries in rename!?"; - } - ; - while (@$ren) { - my $from = pop @$ren; - my $to = pop @$ren; - - unless (-d dirname($to)) { - mkpath(dirname($to)); # will die on err - } - #print "moving $from $to"; - `mv $from $to`; - die "Error renaming $from $to : $!" if $?; - `git-update-index --remove $from`; - die "Error in git-update-index --remove: $!" if $?; - `git-update-index --add $to`; - die "Error in git-update-index --add: $!" if $?; - } - - } - if (@$mod) { # must be _after_ renames - while (@$mod) { - my @slice = splice(@$mod, 0, 100); - my $slice = join(' ', @slice); - `git-update-index $slice`; - die "Error in git-update-index: $!" if $?; - } - } - - # warn "errors when running git-update-index! $!"; - $tree = `git-write-tree`; + # update the index with all the changes we got + system('git-ls-files --others -z | '. + 'git-update-index --add -z --stdin') == 0 or die "$! $?\n"; + system('git-ls-files --deleted -z | '. + 'git-update-index --remove -z --stdin') == 0 or die "$! $?\n"; + + # just brute force this and update everything, it's faster than + # parsing the Modified-files header and then having to pika-unescape + # each one in case it has weird characters + system('git-ls-files -z | '. + 'git-update-index -z --stdin') == 0 or die "$! 
$?\n"; + + my $tree = `git-write-tree`; die "cannot write tree $!" if $?; chomp $tree; @@ -414,7 +357,7 @@ foreach my $ps (@psets) { # my @par; if ( -e "$git_dir/refs/heads/$ps->{branch}") { - if (open HEAD, "<$git_dir/refs/heads/$ps->{branch}") { + if (open HEAD, "<","$git_dir/refs/heads/$ps->{branch}") { my $p = <HEAD>; close HEAD; chomp $p; @@ -429,7 +372,6 @@ foreach my $ps (@psets) { if ($ps->{merges}) { push @par, find_parents($ps); } - my $par = join (' ', @par); # # Commit, tag and clean state @@ -442,13 +384,14 @@ foreach my $ps (@psets) { $ENV{GIT_COMMITTER_EMAIL} = $ps->{email}; $ENV{GIT_COMMITTER_DATE} = $ps->{date}; - my ($pid, $commit_rh, $commit_wh); - $commit_rh = 'commit_rh'; - $commit_wh = 'commit_wh'; - - $pid = open2(*READER, *WRITER, "git-commit-tree $tree $par") + my $pid = open2(*READER, *WRITER, 'git-commit-tree',$tree,@par) or die $!; - print WRITER $logmessage; # write + print WRITER $ps->{summary},"\n"; + print WRITER $ps->{message},"\n"; + + # make it easy to backtrack and figure out which Arch revision this was: + print WRITER 'git-archimport-id: ',$ps->{id},"\n"; + close WRITER; my $commitid = <READER>; # read chomp $commitid; @@ -461,7 +404,7 @@ foreach my $ps (@psets) { # # Update the branch # - open HEAD, ">$git_dir/refs/heads/$ps->{branch}"; + open HEAD, ">","$git_dir/refs/heads/$ps->{branch}"; print HEAD $commitid; close HEAD; unlink ("$git_dir/HEAD"); @@ -476,71 +419,41 @@ foreach my $ps (@psets) { print " + tree $tree\n"; print " + commit $commitid\n"; $opt_v && print " + commit date is $ps->{date} \n"; - $opt_v && print " + parents: $par \n"; + $opt_v && print " + parents: ".join(' ',@par)."\n"; } -sub apply_import { +sub sync_to_ps { my $ps = shift; - my $bname = git_branchname($ps->{id}); + my $tree_dir = $tmptree.'/'.tree_dirname($ps->{id}); - `mkdir -p $tmp`; - - `tla get -s --no-pristine -A $ps->{repo} $ps->{id} $tmp/import`; - die "Cannot get import: $!" 
if $?; - `rsync -v --archive --delete --exclude '$git_dir' --exclude '.arch-ids' --exclude '{arch}' $tmp/import/* ./`; - die "Cannot rsync import:$!" if $?; - - `rm -fr $tmp/import`; - die "Cannot remove tempdir: $!" if $?; - - - return 1; -} - -sub apply_cset { - my $ps = shift; - - `mkdir -p $tmp`; - - # get the changeset - `tla get-changeset -A $ps->{repo} $ps->{id} $tmp/changeset`; - die "Cannot get changeset: $!" if $?; - - # apply patches - if (`find $tmp/changeset/patches -type f -name '*.patch'`) { - # this can be sped up considerably by doing - # (find | xargs cat) | patch - # but that cna get mucked up by patches - # with missing trailing newlines or the standard - # 'missing newline' flag in the patch - possibly - # produced with an old/buggy diff. - # slow and safe, we invoke patch once per patchfile - `find $tmp/changeset/patches -type f -name '*.patch' -print0 | grep -zv '{arch}' | xargs -iFILE -0 --no-run-if-empty patch -p1 --forward -iFILE`; - die "Problem applying patches! $!" if $?; - } - - # apply changed binary files - if (my @modified = `find $tmp/changeset/patches -type f -name '*.modified'`) { - foreach my $mod (@modified) { - chomp $mod; - my $orig = $mod; - $orig =~ s/\.modified$//; # lazy - $orig =~ s!^\Q$tmp\E/changeset/patches/!!; - #print "rsync -p '$mod' '$orig'"; - `rsync -p $mod ./$orig`; - die "Problem applying binary changes! $!" 
if $?; + if (-d $tree_dir) { + if ($ps->{type} eq 't' && defined $ps->{tag}) { + # looks like a tag-only or (worse,) a mixed tags/changeset branch, + # can't rely on replay to work correctly on these + rmtree($tree_dir); + safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir); + } else { + my $tree_id = arch_tree_id($tree_dir); + if ($ps->{parent_id} eq $tree_id) { + safe_pipe_capture($TLA,'replay','-d',$tree_dir,$ps->{id}); + } else { + safe_pipe_capture($TLA,'apply-delta','-d',$tree_dir, + $tree_id, $ps->{id}); + } } + } else { + safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir); } - - # bring in new files - `rsync --archive --exclude '$git_dir' --exclude '.arch-ids' --exclude '{arch}' $tmp/changeset/new-files-archive/* ./`; - - # deleted files are hinted from the commitlog processing - - `rm -fr $tmp/changeset`; + + # added -I flag to rsync since we're going to fast! AIEEEEE!!!! + system('rsync','-aI','--delete','--exclude',$git_dir, +# '--exclude','.arch-inventory', + '--exclude','.arch-ids','--exclude','{arch}', + '--exclude','+*','--exclude',',*', + "$tree_dir/",'./') == 0 or die "Cannot rsync $tree_dir: $! 
$?"; + return $tree_dir; } - # =for reference # A log entry looks like # Revision: moodle-org--moodle--1.3.3--patch-15 @@ -560,70 +473,42 @@ sub apply_cset { # admin/editor.html backup/lib.php backup/restore.php # New-patches: arch-eduforge@catalyst.net.nz--2004/moodle-org--moodle--1.3.3--patch-15 # Summary: Updating to latest from MOODLE_14_STABLE (1.4.5+) +# summary can be multiline with a leading space just like the above fields # Keywords: # # Updating yadda tadda tadda madda sub parselog { - my $log = shift; - #print $log; - - my (@add, @del, @mod, @ren, @kw, $sum, $msg ); - - if ($log =~ m/(?:\n|^)New-files:(.*?)(?=\n\w)/s ) { - my $files = $1; - @add = split(m/\s+/s, $files); - } - - if ($log =~ m/(?:\n|^)Removed-files:(.*?)(?=\n\w)/s ) { - my $files = $1; - @del = split(m/\s+/s, $files); - } - - if ($log =~ m/(?:\n|^)Modified-files:(.*?)(?=\n\w)/s ) { - my $files = $1; - @mod = split(m/\s+/s, $files); - } - - if ($log =~ m/(?:\n|^)Renamed-files:(.*?)(?=\n\w)/s ) { - my $files = $1; - @ren = split(m/\s+/s, $files); - } - - $sum =''; - if ($log =~ m/^Summary:(.+?)$/m ) { - $sum = $1; - $sum =~ s/^\s+//; - $sum =~ s/\s+$//; - } - - $msg = ''; - if ($log =~ m/\n\n(.+)$/s) { - $msg = $1; - $msg =~ s/^\s+//; - $msg =~ s/\s+$//; - } - - - # cleanup the arrays - foreach my $ref ( (\@add, \@del, \@mod, \@ren) ) { - my @tmp = (); - while (my $t = pop @$ref) { - next unless length ($t); - next if $t =~ m!\{arch\}/!; - next if $t =~ m!\.arch-ids/!; - next if $t =~ m!\.arch-inventory$!; - # tla cat-archive-log will give us filenames with spaces as file\(sp)name - why? - # we can assume that any filename with \ indicates some pika escaping that we want to get rid of. 
- if ($t =~ /\\/ ){ - $t = `tla escape --unescaped '$t'`; - } - push (@tmp, shell_quote($t)); + my ($ps, $log) = @_; + my $key = undef; + while ($_ = shift @$log) { + if (/^Continuation-of:\s*(.*)/) { + $ps->{tag} = $1; + $key = undef; + } elsif (/^Summary:\s*(.*)$/ ) { + # summary can be multiline as long as it has a leading space + $ps->{summary} = [ $1 ]; + $key = 'summary'; + } elsif (/^Creator: (.*)\s*<([^\>]+)>/) { + $ps->{author} = $1; + $ps->{email} = $2; + $key = undef; + } elsif (/^$/) { + last; # remainder of @$log that didn't get shifted off is message + } elsif ($key) { + if (/^\s+(.*)$/) { + if ($key eq 'summary') { + push @{$ps->{$key}}, $1; + } else { + push @{$ps->{$key}}, split(/\s+/, $1); + } + } else { + $key = undef; + } } - @$ref = @tmp; } - #print Dumper [$sum, $msg, \@add, \@del, \@mod, \@ren]; - return ($sum, $msg, \@add, \@del, \@mod, \@ren); + $ps->{summary} = join("\n",@{$ps->{summary}})."\n"; + $ps->{message} = join("\n",@$log); } # write/read a tag @@ -816,8 +701,11 @@ sub find_parents { } } } - @parents = keys %parents; - @parents = map { " -p " . ptag($_) } @parents; + + @parents = (); + foreach (keys %parents) { + push @parents, '-p', ptag($_); + } return @parents; } @@ -840,3 +728,26 @@ sub commitid2pset { || (print Dumper(sort keys %psets)) && die "Cannot find patchset for $name"; return $ps; } + + +# an alterative to `command` that allows input to be passed as an array +# to work around shell problems with weird characters in arguments +sub safe_pipe_capture { + my @output; + if (my $pid = open my $child, '-|') { + @output = (<$child>); + close $child or die join(' ',@_).": $! $?"; + } else { + exec(@_) or die $?; # exec() can fail the executable can't be found + } + return wantarray ? 
@output : join('',@output); +} + +# `tla logs -rf -d <dir> | head -n1` or `baz tree-id <dir>` +sub arch_tree_id { + my $dir = shift; + chomp( my $ret = (safe_pipe_capture($TLA,'logs','-rf','-d',$dir))[0] ); + return $ret; +} + + --- 0.99.9.GIT ^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 5/5] -D <depth> option to recurse into merged branches 2005-11-12 9:30 ` [PATCH 4/5] Overhaul of changeset application Eric Wong @ 2005-11-12 9:32 ` Eric Wong 2005-11-14 2:01 ` Eric Wong 2005-11-12 12:07 ` [PATCH 4/5] Overhaul of changeset application Martin Langhoff 1 sibling, 1 reply; 39+ messages in thread From: Eric Wong @ 2005-11-12 9:32 UTC (permalink / raw) To: Martin Langhoff; +Cc: git list -D <depth> option to recurse into merged branches -a auto-register Arch archive if it's on mirrors.sourcecontrol.net fix for dealing with tag revisions remove unused module loading (no more String::ShellQuote dep) Signed-off-by: Eric Wong <normalperson@yhbt.net> --- git-archimport.perl | 257 ++++++++++++++++++++++++++++----------------------- 1 files changed, 141 insertions(+), 116 deletions(-) applies-to: d6d3e5272bc39ea086e5c1b0b39ceb5b51ade1ff 2fe160b44c5e5da1a139668767ba184b6b63f605 diff --git a/git-archimport.perl b/git-archimport.perl index 5616d42..a0ea016 100755 --- a/git-archimport.perl +++ b/git-archimport.perl @@ -22,9 +22,10 @@ See man (1) git-archimport for more deta =head1 TODO - create tag objects instead of ref tags - - audit shell-escaping of filenames - hide our private tags somewhere smarter - - find a way to make "cat *patches | patch" safe even when patchfiles are missing newlines + - sort and apply patches by graphing ancestry relations instead of just + relying in dates supplied in the changeset itself. + tla ancestry-graph -m could be helpful here... 
=head1 Devel tricks @@ -53,15 +54,9 @@ and can contain multiple, unrelated bran use strict; use warnings; use Getopt::Std; -use File::Spec; -use File::Temp qw(tempfile tempdir); +use File::Temp qw(tempdir); use File::Path qw(mkpath rmtree); use File::Basename qw(basename dirname); -use String::ShellQuote; -use Time::Local; -use IO::Socket; -use IO::Pipe; -use POSIX qw(strftime dup2); use Data::Dumper qw/ Dumper /; use IPC::Open2; @@ -72,29 +67,35 @@ my $git_dir = $ENV{"GIT_DIR"} || ".git"; $ENV{"GIT_DIR"} = $git_dir; my $ptag_dir = "$git_dir/archimport/tags"; -our($opt_h,$opt_v, $opt_T, - $opt_C,$opt_t); +our($opt_h,$opt_v,$opt_T,$opt_t,$opt_D,$opt_a); sub usage() { print STDERR <<END; Usage: ${\basename $0} # fetch/update GIT from Arch - [ -h ] [ -v ] [ -T ] [ -t tempdir ] + [ -h ] [ -v ] [ -T ] [ -a ] [ -D depth ] [ -t tempdir ] repository/arch-branch [ repository/arch-branch] ... END exit(1); } -getopts("Thvt:") or usage(); +getopts("Thvat:D:") or usage(); usage if $opt_h; @ARGV >= 1 or usage(); -my @arch_roots = @ARGV; +# $arch_branches: +# values associated with keys: +# =1 - Arch version / git 'branch' detected via abrowse on a limit +# >1 - Arch version / git 'branch' of an auxilliary branch we've merged +my %arch_branches = map { $_ => 1 } @ARGV; + my $tmptree; $ENV{'TMPDIR'} = $opt_t if $opt_t; $tmptree = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1); $opt_v && print "+ Using $tmptree to store temporary trees\n"; +my %reachable = (); # Arch repositories we can access +my %unreachable = (); # Arch repositories we can't access :< my @psets = (); # the collection my %psets = (); # the collection, by name @@ -102,114 +103,117 @@ my %rptags = (); # my rev # to map a SHA1 to a commitid my $TLA = $ENV{'ARCH_CLIENT'} || 'tla'; -foreach my $root (@arch_roots) { - my ($arepo, $abranch) = split(m!/!, $root); - open ABROWSE, "tla abrowse -f -A $arepo --desc --merges $abranch |" - or die "Problems with tla abrowse: $!"; - - my %ps = (); # the current one 
- my $mode = ''; - my $lastseen = ''; - - while (<ABROWSE>) { - chomp; - - # first record padded w 8 spaces - if (s/^\s{8}\b//) { - - # store the record we just captured - if (%ps) { - my %temp = %ps; # break references - push (@psets, \%temp); - $psets{$temp{id}} = \%temp; - %ps = (); - } - - my ($id, $type) = split(m/\s{3}/, $_); - $ps{id} = $id; - $ps{repo} = $arepo; - - # deal with types - if ($type =~ m/^\(simple changeset\)/) { - $ps{type} = 's'; - } elsif ($type eq '(initial import)') { - $ps{type} = 'i'; - } elsif ($type =~ m/^\(tag revision of (.+)\)/) { - $ps{type} = 't'; - $ps{tag} = $1; - } else { - warn "Unknown type $type"; - } - $lastseen = 'id'; - } - - if (s/^\s{10}//) { - # 10 leading spaces or more - # indicate commit metadata - - # date & author - if ($lastseen eq 'id' && m/^\d{4}-\d{2}-\d{2}/) { +sub do_abrowse { + my $stage = shift; + while (my ($limit, $level) = each %arch_branches) { + next unless $level == $stage; + + open ABROWSE, "$TLA abrowse -fkD --merges $limit |" + or die "Problems with tla abrowse: $!"; + + my %ps = (); # the current one + my $lastseen = ''; + + while (<ABROWSE>) { + chomp; + + # first record padded w 8 spaces + if (s/^\s{8}\b//) { + my ($id, $type) = split(m/\s+/, $_, 2); + + my %last_ps; + # store the record we just captured + if (%ps && !exists $psets{ $ps{id} }) { + %last_ps = %ps; # break references + push (@psets, \%last_ps); + $psets{ $last_ps{id} } = \%last_ps; + } - my ($date, $authoremail) = split(m/\s{2,}/, $_); - $ps{date} = $date; - $ps{date} =~ s/\bGMT$//; # strip off trailign GMT - if ($ps{date} =~ m/\b\w+$/) { - warn 'Arch dates not in GMT?! 
- imported dates will be wrong'; + my $branch = extract_versionname($id); + %ps = ( id => $id, branch => $branch ); + if (%last_ps && ($last_ps{branch} eq $branch)) { + $ps{parent_id} = $last_ps{id}; + } + + $arch_branches{$branch} = 1; + $lastseen = 'id'; + + # deal with types (should work with baz or tla): + if ($type =~ m/\(.*changeset\)/) { + $ps{type} = 's'; + } elsif ($type =~ /\(.*import\)/) { + $ps{type} = 'i'; + } elsif ($type =~ m/\(tag.*\)/) { + $ps{type} = 't'; + # read which revision we've tagged when we parse the log + #$ps{tag} = $1; + } else { + warn "Unknown type $type"; + } + + $arch_branches{$branch} = 1; + $lastseen = 'id'; + } elsif (s/^\s{10}//) { + # 10 leading spaces or more + # indicate commit metadata + + # date + if ($lastseen eq 'id' && m/^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)/){ + $ps{date} = $1; + $lastseen = 'date'; + } elsif ($_ eq 'merges in:') { + $ps{merges} = []; + $lastseen = 'merges'; + } elsif ($lastseen eq 'merges' && s/^\s{2}//) { + my $id = $_; + push (@{$ps{merges}}, $id); + + # aggressive branch finding: + if ($opt_D) { + my $branch = extract_versionname($id); + my $repo = extract_reponame($branch); + + if (archive_reachable($repo) && + !defined $arch_branches{$branch}) { + $arch_branches{$branch} = $stage + 1; + } + } + } else { + warn "more metadata after merges!?: $_\n" unless /^\s*$/; } - - $authoremail =~ m/^(.+)\s(\S+)$/; - $ps{author} = $1; - $ps{email} = $2; - - $lastseen = 'date'; - - } elsif ($lastseen eq 'date') { - # the only hint is position - # subject is after date - $ps{subj} = $_; - $lastseen = 'subj'; - - } elsif ($lastseen eq 'subj' && $_ eq 'merges in:') { - $ps{merges} = []; - $lastseen = 'merges'; - - } elsif ($lastseen eq 'merges' && s/^\s{2}//) { - push (@{$ps{merges}}, $_); - } else { - warn 'more metadata after merges!?'; } - } - } - if (%ps) { - my %temp = %ps; # break references - push (@psets, \%temp); - $psets{ $temp{id} } = \%temp; - %ps = (); - } - close ABROWSE; + if (%ps && !exists $psets{ 
$ps{id} }) { + my %temp = %ps; # break references + if ($psets[$#psets]{branch} eq $ps{branch}) { + $temp{parent_id} = $psets[$#psets]{id}; + } + push (@psets, \%temp); + $psets{ $temp{id} } = \%temp; + } + + close ABROWSE or die "$TLA abrowse failed on $limit\n"; + } } # end foreach $root +do_abrowse(1); +my $depth = 2; +$opt_D ||= 0; +while ($depth <= $opt_D) { + do_abrowse($depth); + $depth++; +} + ## Order patches by time +# FIXME see if we can find a more optimal way to do this by graphing +# the ancestry data and walking it, that way we won't have to rely on +# client-supplied dates @psets = sort {$a->{date}.$b->{id} cmp $b->{date}.$b->{id}} @psets; -#print Dumper \@psets; - -## -## TODO cleanup irrelevant patches -## and put an initial import -## or a full tag -my $import = 0; unless (-d $git_dir) { # initial import - if ($psets[0]{type} eq 'i' || $psets[0]{type} eq 't') { - print "Starting import from $psets[0]{id}\n"; - `git-init-db`; - die $! if $?; - $import = 1; - } else { - die "Need to start from an import or a tag -- cannot use $psets[0]{id}"; - } + print "Starting import from $psets[0]{id}\n"; + system('git-init-db') == 0 or die "$! $?\n"; } else { # progressing an import # load the rptags opendir(DIR, $ptag_dir) @@ -233,7 +237,6 @@ unless (-d $git_dir) { # initial import closedir DIR; } -# process patchsets # extract the Arch repository name (Arch "archive" in Arch-speak) sub extract_reponame { my $fq_cvbr = shift; # archivename/[[[[category]branch]version]revision] @@ -266,21 +269,21 @@ sub tree_dirname { *git_branchname = *tree_dirname; -# process patchsets +# process patchsets in ancestry order foreach my $ps (@psets) { $ps->{branch} = git_branchname($ps->{id}); # # ensure we have a clean state # - if (`git diff-files`) { + if (`git-diff-files`) { die "Unclean tree when about to process $ps->{id} " . " - did we fail to commit cleanly before?"; } die $! 
if $?; # - # skip commits already in repo + # skip commits already in git repo # if (ptag($ps->{id})) { $opt_v && print " * Skipping already imported: $ps->{id}\n"; @@ -427,7 +430,7 @@ sub sync_to_ps { my $tree_dir = $tmptree.'/'.tree_dirname($ps->{id}); if (-d $tree_dir) { - if ($ps->{type} eq 't' && defined $ps->{tag}) { + if ($ps->{type} eq 't') { # looks like a tag-only or (worse,) a mixed tags/changeset branch, # can't rely on replay to work correctly on these rmtree($tree_dir); @@ -435,13 +438,16 @@ sub sync_to_ps { } else { my $tree_id = arch_tree_id($tree_dir); if ($ps->{parent_id} eq $tree_id) { + # the common case (hopefully) safe_pipe_capture($TLA,'replay','-d',$tree_dir,$ps->{id}); } else { + # this can happen if branches cherry-pick safe_pipe_capture($TLA,'apply-delta','-d',$tree_dir, $tree_id, $ps->{id}); } } } else { + # new branch work safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir); } @@ -750,4 +756,23 @@ sub arch_tree_id { return $ret; } +sub archive_reachable { + my $archive = shift; + return 1 if $reachable{$archive}; + return 0 if $unreachable{$archive}; + + if (system "$TLA whereis-archive $archive >/dev/null") { + if ($opt_a && (system($TLA,'register-archive', + "http://mirrors.sourcecontrol.net/$archive") == 0)) { + $reachable{$archive} = 1; + return 1; + } + print STDERR "Archive is unreachable: $archive\n"; + $unreachable{$archive} = 1; + return 0; + } else { + $reachable{$archive} = 1; + return 1; + } +} --- 0.99.9.GIT ^ permalink raw reply related [flat|nested] 39+ messages in thread
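Patch 5/5 orders patchsets chronologically before importing them, which is the date-trust weakness discussed later in the thread. Below is a minimal sketch of that ordering. It assumes dates are zero-padded `YYYY-MM-DD HH:MM:SS` strings (the form `do_abrowse` captures), so a plain string comparison follows time order, with the patchset id used only to break ties. `order_patchsets` is a hypothetical helper, not a function in git-archimport.

```perl
use strict;
use warnings;

# Sort patchset hashrefs by date, then by id on equal dates.
# Assumes zero-padded "YYYY-MM-DD HH:MM:SS" dates, so string
# comparison matches chronological order.
sub order_patchsets {
    my @psets = @_;
    return sort {
        $a->{date} cmp $b->{date}
            or $a->{id} cmp $b->{id}    # each side uses its own id
    } @psets;
}

my @sorted = order_patchsets(
    { id => 'arch@example.com--2005/foo--1.0--patch-2', date => '2005-11-02 10:00:00' },
    { id => 'arch@example.com--2005/foo--1.0--base-0',  date => '2005-11-01 09:00:00' },
    { id => 'arch@example.com--2005/foo--1.0--patch-1', date => '2005-11-02 10:00:00' },
);
print join("\n", map { $_->{id} } @sorted), "\n";
```

Each side of the tiebreak must use its own id. The comparator in the patch as posted concatenates `$b->{id}` on both sides, so patchsets that share a timestamp are not ordered reliably by id.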
* Re: [PATCH 5/5] -D <depth> option to recurse into merged branches 2005-11-12 9:32 ` [PATCH 5/5] -D <depth> option to recurse into merged branches Eric Wong @ 2005-11-14 2:01 ` Eric Wong 0 siblings, 0 replies; 39+ messages in thread From: Eric Wong @ 2005-11-14 2:01 UTC (permalink / raw) To: Martin Langhoff; +Cc: git list One small fix on top of this one: Don't check for parents if the only revision we have is a base-0 and @psets is empty. Signed-off-by: Eric Wong <normalperson@yhbt.net> --- git-archimport.perl | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) applies-to: 8a7e18ff0884cae74a1127d5c96577a85acca3f4 5f2896558284724bcc87eb64daa0933b544ec20d diff --git a/git-archimport.perl b/git-archimport.perl index a0ea016..b624ba6 100755 --- a/git-archimport.perl +++ b/git-archimport.perl @@ -186,7 +186,7 @@ sub do_abrowse { if (%ps && !exists $psets{ $ps{id} }) { my %temp = %ps; # break references - if ($psets[$#psets]{branch} eq $ps{branch}) { + if (@psets && $psets[$#psets]{branch} eq $ps{branch}) { $temp{parent_id} = $psets[$#psets]{id}; } push (@psets, \%temp); --- 0.99.9.GIT ^ permalink raw reply related [flat|nested] 39+ messages in thread
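This follow-up guards the parent lookup against an empty `@psets`. When the array is empty, `$#psets` is -1, and the nested lookup `$psets[-1]{branch}` tries to autovivify an element that does not exist. Perl can reject that with a fatal "Modification of non-creatable array value attempted" error instead of returning undef. The sketch below isolates the guarded lookup. `parent_for` is a hypothetical name.

```perl
use strict;
use warnings;

# Return the id of the last recorded patchset if it is on the same Arch
# version as $ps, else undef. Also returns undef when nothing has been
# recorded yet, which is the case the follow-up patch guards against.
sub parent_for {
    my ($psets, $ps) = @_;
    return undef unless @$psets;    # empty: no parent and no $psets->[-1]
    my $last = $psets->[-1];
    return $last->{branch} eq $ps->{branch} ? $last->{id} : undef;
}

my @psets;
my %base = (id => 'me@example.com--2005/foo--1.0--base-0',
            branch => 'me@example.com--2005/foo--1.0');
my $p0 = parent_for(\@psets, \%base);    # undef: only a base-0 so far
push @psets, +{ %base };                 # store a copy, like %temp in the patch

my %next = (id => 'me@example.com--2005/foo--1.0--patch-1',
            branch => 'me@example.com--2005/foo--1.0');
my $p1 = parent_for(\@psets, \%next);    # the base-0 id
```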
* Re: [PATCH 4/5] Overhaul of changeset application 2005-11-12 9:30 ` [PATCH 4/5] Overhaul of changeset application Eric Wong 2005-11-12 9:32 ` [PATCH 5/5] -D <depth> option to recurse into merged branches Eric Wong @ 2005-11-12 12:07 ` Martin Langhoff 2005-11-12 20:49 ` Eric Wong 1 sibling, 1 reply; 39+ messages in thread From: Martin Langhoff @ 2005-11-12 12:07 UTC (permalink / raw) To: Eric Wong; +Cc: git list Eric, I'd actually like to improve the script to handle directory renames and file modes correctly so we don't need to ever call the glacially slow `tla get` -- I don't think it's that much work, all I need is a sample repo. OTOH, if you think (or can convince me) that there are more serious problems ahead, perhaps we can have this as an alternative import mechanism? On 11/12/05, Eric Wong <normalperson@yhbt.net> wrote: > - Correctly parse multi-line summary text in patch-logs Was this broken!? I'm sure I've imported multiline summaries! cheers, martin ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 4/5] Overhaul of changeset application 2005-11-12 12:07 ` [PATCH 4/5] Overhaul of changeset application Martin Langhoff @ 2005-11-12 20:49 ` Eric Wong 0 siblings, 0 replies; 39+ messages in thread From: Eric Wong @ 2005-11-12 20:49 UTC (permalink / raw) To: Martin Langhoff; +Cc: git list Martin Langhoff <martin.langhoff@gmail.com> wrote: > Eric, > > I'd actually like to improve the script to handle directory renames > and file modes correctly so we don't need to ever call the glacially > slow `tla get` -- I don't think it's that much work, all I need is a > sample repo. OTOH, if you think (or can convince me) that there are > more serious problems ahead, perhaps we can have this as an > alternative import mechanism? tla get is hardly ever called; I don't think it's called any more than before, even. tla replay is by far the most common case and is still reasonably fast. I had to add the -I (--ignore-times) flag to rsync because it was going at > 1 patch per second, so matching size and mtime could not be trusted to mean a file was unchanged; that speed is plenty fast enough for me. I also had an alternate implementation using the revision library, but it was slower than the current strategy because it had to do two full Arch tree integrity checks for each patch applied. Even with a hot (fully filled) revlib, where all I had to do was tla library-find + rsync, it ran more slowly, probably because rsync couldn't take advantage of kernel/fs-level caching when it had to work on a different directory each time. Tracking renamed directories (especially when nested subdirectories are also renamed) is very, very far from pleasant. > On 11/12/05, Eric Wong <normalperson@yhbt.net> wrote: > > - Correctly parse multi-line summary text in patch-logs > > Was this broken!? I'm sure I've imported multiline summaries! It only got the first summary line when I tried it. 
Also, hand-made message bodies can fool archimport if they contain "headers" after the first \n\n. IIRC, some old tools copied entire logs of merged changesets into the message body.
-- Eric Wong ^ permalink raw reply [flat|nested] 39+ messages in thread
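Eric names two parsing problems: only the first summary line was kept, and header-like lines in the message body could be mistaken for real headers. Below is a minimal sketch of a header parser that avoids both. It assumes an RFC 822-style patch log, in which continuation lines start with whitespace and the header block ends at the first empty line. `parse_patch_log` is a hypothetical name, not git-archimport's `parselog`. It also joins continuation lines with "\n", which is an arbitrary choice.

```perl
use strict;
use warnings;

# Parse a patch log: "Name: value" headers, whitespace-indented
# continuation lines, then a blank line and a free-form body.
# Header-like lines after the first blank line stay in the body.
sub parse_patch_log {
    my ($text) = @_;
    my ($head, $body) = split /\n\n/, $text, 2;
    $head = '' unless defined $head;
    $body = '' unless defined $body;

    my (%hdr, $last);
    for my $line (split /\n/, $head) {
        if (defined $last && $line =~ /^\s+(.*)$/) {
            $hdr{$last} .= "\n$1";          # continuation of previous header
        } elsif ($line =~ /^([^:\s]+):\s*(.*)$/) {
            $last = lc $1;                  # header names are case-insensitive
            $hdr{$last} = $2;
        }
    }
    return (\%hdr, $body);
}

my $log = "Revision: foo--1.0--patch-3\n"
        . "Summary: first line of summary\n"
        . "  second line of summary\n"
        . "\n"
        . "Body text.\n"
        . "Summary: not a real header\n";
my ($hdr, $body) = parse_patch_log($log);
print "$hdr->{summary}\n";
```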
* Re: [PATCH] archimport improvements 2005-11-12 9:23 [PATCH] archimport improvements Eric Wong 2005-11-12 9:25 ` [PATCH 1/5] remove shellquote usage for tags Eric Wong @ 2005-11-12 11:54 ` Martin Langhoff 2005-11-12 20:21 ` Eric Wong 2005-11-17 9:26 ` [PATCH] archimport improvements Martin Langhoff 2 siblings, 1 reply; 39+ messages in thread From: Martin Langhoff @ 2005-11-12 11:54 UTC (permalink / raw) To: Eric Wong; +Cc: git list Eric, On 11/12/05, Eric Wong <normalperson@yhbt.net> wrote: > I'm another Arch-user trying out git. Unfortunately, I encountered > several problems with git-archimport that I needed fixed before my > development trees could be imported into git. Welcome and good stuff! I'll give your patches a try when I sober up. In the meantime, some notes after having read the patches a bit... > Bug Fixes: > > * Support for '--branch'-less Arch version names. > Encoding '/' to '--' (as was previously done) is not 100% reversible > because the "--branch" portion of a fully-qualified Arch version name > is optional (though not many people or Arch-related tools know this). > > * I'm encoding the '/' in the fully-qualified name as ',' to not confuse > other porcelains, but leaving '/' in branch names may be alright > provided porcelains can support them. > > * Identify git branches as an Arch "archive,category<--branch>--version" > Anything less than that is ambiguous as far as history and patch > relationships go. These bug/sanity fixes are _good_. As you mention, I wasn't aware that patchnames could show up not having a --branch part. Tricky... > * Renamed directories containing renamed/moved files inside didn't get > tracked properly. The original code was inadequate for this, and > making it support all rename cases that Arch supports is too much > work. Instead, I maintain full-blown Arch trees in the temp dir and > replay patches + rsync based on that. 
Performance is slightly slower > than before, but accuracy is more important to me.
> > * Permission (execute bit only because of git) tracking as a side effect > of the above. Hmmm. I understand what you are doing, but I'm not sure we'd want to replace the current code with this strategy. Importing large trees with hundreds (thousands) of commits is so slow it is just a no go. Renames are described quite well in the 'commit log', and the current code does handle file renames... > * Tracking changes from branches that are only cherry-picked now works Can you elaborate a bit more on this? > * Pika-escaped filenames unhandled. This seems fixed in the latest > git, but I fixed it more generally and removed the ShellQuote module > dependency along the way. Yes, this got fixed recently. Your change here goes together with the 'tla get' + rsync strategy which I'm not sure about. > * Don't die() when a merge-base can't be found. Arch supports > merging between unrelated trees. Fair enough. Does it result in a good graft in git? > Usability enhancements: > > * Optionally detect merged branches and attempt to import their history, > too. Use the -D <depth> option for this. Specifying a <depth> > greater than 1 is usually not needed unless the tree you're tracking > has had history pruned. > > * Optionally attempt to auto-register unknown Arch archives from > mirrors.sourcecontrol.net to pull their history with the -a (boolean) > switch. Not sure how useful users will find this. Those two are interesting! > * Removed -A <archive> usage (unnecessary in all cases) and made all > Arch calls and output parsing to be compatible with both tla (tested > 1.3.3) and baz (1.4.2). Default is still tla, but the ARCH_CLIENT > environment variable may be changed to baz. That's excellent -- thanks! > Current weaknesses: > > * (Present in the original code as well). > The code still assumes that dates in commit logs can be trusted, which is > fine in most cases, but a wayward branch can screw up git-archimport and > cause parents to be missed. Fair enough.
You mention an alternative strategy (tla ancestry) -- have you tried it at all? cheers, martin ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH] archimport improvements 2005-11-12 11:54 ` [PATCH] archimport improvements Martin Langhoff @ 2005-11-12 20:21 ` Eric Wong 2005-11-14 22:38 ` Martin Langhoff 0 siblings, 1 reply; 39+ messages in thread From: Eric Wong @ 2005-11-12 20:21 UTC (permalink / raw) To: Martin Langhoff; +Cc: git list Martin Langhoff <martin.langhoff@gmail.com> wrote: > Eric, > > > On 11/12/05, Eric Wong <normalperson@yhbt.net> wrote: > > I'm another Arch-user trying out git. Unfortunately, I encountered > > several problems with git-archimport that I needed fixed before my > > development trees could be imported into git. > > Welcome and good stuff! I'll give your patches a try when I sober up. > In the meantime, some notes after having read the patches a bit... > > > Bug Fixes: > > > > * Support for '--branch'-less Arch version names. > > Encoding '/' to '--' (as was previously done) is not 100% reversible > > because the "--branch" portion of a fully-qualified Arch version name > > is optional (though not many people or Arch-related tools know this). > > > > * I'm encoding the '/' in the fully-qualified name as ',' to not confuse > > other porcelains, but leaving '/' in branch names may be alright > > provided porcelains can support them. > > > > * Identify git branches as an Arch "archive,category<--branch>--version" > > Anything less than that is ambiguous as far as history and patch > > relationships go. > > These bug/sanity fixes are _good_. As you mention, I wasn't aware that > patchnames could show up not having a --branch part. Tricky... Thanks. I got lazy one day and started ignoring --branch on some of my personal projects to save my fingers :) > > * Renamed directories containing renamed/moved files inside didn't get > > tracked properly. The original code was inadequate for this, and > > making it support all rename cases that Arch supports is too much > > work.
Instead, I maintain full-blown Arch trees in the temp dir and > > replay patches + rsync based on that. Performance is slightly slower > > than before, but accuracy is more important to me. > > > > * Permission (execute bit only because of git) tracking as a side effect > > of the above. > > Hmmm. I understand what you are doing, but I'm not sure we'd want to > replace the current code with this strategy. Importing large trees > with hundreds (thousands) of commits is so slow it is just a no go. > Renames are described quite well in the 'commit log', and the current > code does handle file renames... Untouched files inside renamed directories aren't explicitly tracked. Renamed directories are especially a pain when a renamed one contains sub-directories that are also renamed. > > * Tracking changes from branches that are only cherry-picked now works > > Can you elaborate a bit more on this? Basically, don't die when merge-base fails, look a few lines down. > > * Pika-escaped filenames unhandled. This seems fixed in the latest > > git, but I fixed it more generally and removed the ShellQuote module > > dependency along the way. > > Yes, this got fixed recently. Your change here goes together with the > 'tla get' + rsync strategy which I'm not sure about. > > > * Don't die() when a merge-base can't be found. Arch supports > > merging between unrelated trees. > > Fair enough. Does it result in a good graft in git? Right now I end up with separate branches that are imported (according to git-branch), but git-log and gitk don't seem to show relationships between the unrelated trees. I think find_parents() may need to use an alternate strategy instead of warning and skipping if a merge-base can't be found. > > Usability enhancements: > > > > * Optionally detect merged branches and attempt to import their history, > > too. Use the -D <depth> option for this.
Specifying a <depth> > > greater than 1 is usually not needed unless the tree you're tracking > > has had history pruned. > > > > * Optionally attempt to auto-register unknown Arch archives from > > mirrors.sourcecontrol.net to pull their history with the -a (boolean) > > switch. Not sure how useful users will find this. > > Those two are interesting! > > > * Removed -A <archive> usage (unnecessary in all cases) and made all > > Arch calls and output parsing to be compatible with both tla (tested > > 1.3.3) and baz (1.4.2). Default is still tla, but the ARCH_CLIENT > > environment variable may be changed to baz. > > That's excellent -- thanks! > > > Current weaknesses: > > > > * (Present in the original code as well). > > The code still assumes that dates in commit logs can be trusted, which is > > fine in most cases, but a wayward branch can screw up git-archimport and > > cause parents to be missed. > > Fair enough. You mention an alternative strategy (tla ancestry) -- > have you tried it at all? No, not yet. -- Eric Wong ^ permalink raw reply [flat|nested] 39+ messages in thread
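The `-D <depth>` option discovers merged branches in stages. Branches named on the command line are level 1. Any branch first seen in the merges of a level-n branch is queued as level n+1, and levels up to the requested depth get browsed. Below is a minimal sketch of that breadth-first walk over an in-memory merge graph. `%merges_of` stands in for what `do_abrowse` parses from `tla abrowse --merges`, `discover_branches` is a hypothetical name, and the `archive_reachable` check is omitted.

```perl
use strict;
use warnings;

# Assign each branch the stage at which it is first discovered.
#   $roots     - arrayref of starting branches (stage 1)
#   $merges_of - hashref: branch => arrayref of branches it merged from
#   $depth     - highest stage to include (like -D; 0 or 1 = roots only)
sub discover_branches {
    my ($roots, $merges_of, $depth) = @_;
    my %level = map { $_ => 1 } @$roots;
    for my $stage (1 .. $depth - 1) {
        # the key list is computed up front, so adding entries is safe
        for my $branch (grep { $level{$_} == $stage } sort keys %level) {
            for my $merged (@{ $merges_of->{$branch} || [] }) {
                $level{$merged} = $stage + 1 unless exists $level{$merged};
            }
        }
    }
    return \%level;
}

my %merges_of = (
    'a/foo--1.0'        => ['b/foo--devel--1.0'],
    'b/foo--devel--1.0' => ['c/foo--fix--1.0'],
);
my $lv = discover_branches(['a/foo--1.0'], \%merges_of, 2);
print join(' ', map { "$_=$lv->{$_}" } sort keys %$lv), "\n";
```

The `exists` check records each branch only once. This keeps the walk from looping when two branches merge from each other.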
* Re: [PATCH] archimport improvements 2005-11-12 20:21 ` Eric Wong @ 2005-11-14 22:38 ` Martin Langhoff 2005-11-15 8:03 ` Eric Wong 0 siblings, 1 reply; 39+ messages in thread From: Martin Langhoff @ 2005-11-14 22:38 UTC (permalink / raw) To: Eric Wong; +Cc: git list Eric, thanks for resending those so quickly. I think I'm going to sit on the 'overhaul of changeset application' patch a bit -- I'll test & ack your other patches for merge soonish but I want to review and test this one carefully. My main concern is that it seems to be calling tla get for each revision that it imports. For large trees, this is slow. I would be much happier with a fast Perl-based approach. Have you got a public repo with directory renames? Additional comments follow... On 11/13/05, Eric Wong <normalperson@yhbt.net> wrote: > > > * Identify git branches as an Arch "archive,category<--branch>--version" > > > Anything less than that is ambiguous as far as history and patch > > > relationships go. > > > > These bug/sanity fixes are _good_. As you mention, I wasn't aware that > > patchnames could show up not having a --branch part. Tricky... > > Thanks. I got lazy one day and started ignoring --branch on some of my > personal projects to save my fingers :) Yup, makes sense. My concern now is that existing imports will change the name of branches and tags going forward. Can I ask you to resend that patch with the new branchname mangling as default, and the old one as optional? I know it'll force us to go back to using shellquote, but I am not too worried by that dependency at the moment. > > > Current weaknesses: > > > > > > * (Present in the original code as well). > > > The code still assumes that dates in commit logs can be trusted, which is > > > fine in most cases, but a wayward branch can screw up git-archimport and > > > cause parents to be missed. > > > > Fair enough. You mention an alternative strategy (tla ancestry) -- > > have you tried it at all? > > No, not yet. 
Also interested in this if you get around to it. cheers, martin ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH] archimport improvements 2005-11-14 22:38 ` Martin Langhoff @ 2005-11-15 8:03 ` Eric Wong 2005-11-15 8:05 ` [PATCH 1/2] archimport: allow for old style branch and public tag names Eric Wong 0 siblings, 1 reply; 39+ messages in thread From: Eric Wong @ 2005-11-15 8:03 UTC (permalink / raw) To: Martin Langhoff; +Cc: git list Martin Langhoff <martin.langhoff@gmail.com> wrote: > Eric, > > thanks for resending those so quickly. I think I'm going to sit on the > 'overhaul of changeset application' patch a bit -- I'll test & ack > your other patches for merge soonish but I want to review and test > this one carefully. > > My main concern is that it seems to be calling tla get for each > revision that it imports. For large trees, this is slow. I would be > much happier with a fast Perl-based approach. Have you got a public > repo with directory renames? Please read my sync_to_ps() function very carefully. Next is a patch that helps you track which Arch command (get/replay/apply-delta) is used for each changeset. tla replay is by far the most common for any halfway normal (changeset-based) tree. tla get is not called any more often than before. apply-delta is hardly ever called, if at all. It may not even be reachable unless somebody commits revisions to the same tree with clocks out of order from patchlevel order. Heck, if it's ever called, it's most likely faster just to rmtree and tla get again. Unfortunately, my heavily used and abused trees are private. > Additional comments follow... > > On 11/13/05, Eric Wong <normalperson@yhbt.net> wrote: > > > > * Identify git branches as an Arch "archive,category<--branch>--version" > > > > Anything less than that is ambiguous as far as history and patch > > > > relationships go. > > > > > > These bug/sanity fixes are _good_. As you mention, I wasn't aware that > > > patchnames could show up not having a --branch part. Tricky... > > > > Thanks.
I got lazy one day and started ignoring --branch on some of my > > personal projects to save my fingers :) > > Yup, makes sense. My concern now is that existing imports will change > the name of branches and tags going forward. Can I ask you to resend > that patch with the new branchname mangling as default, and the old > one as optional? Ok, good idea. My previous patch already automatically converted the private tags, which we actually need to parse, and I see no reason to change that, but branch names and public tags which affect non-gitarchimport users can be preserved with the -o flag. > I know it'll force us to go back to using shellquote, but I am not too > worried by that dependency at the moment. Actually, usage of shell_quote() in git-archimport was always unnecessary. Passing arguments to external programs as an array, using the 3-argument version of open() for files, and using -z in git-commands with pipes are better ways to go. > > > > Current weaknesses: > > > > > > > > * (Present in the original code as well). > > > > The code still assumes that dates in commit logs can be trusted, which is > > > > fine in most cases, but a wayward branch can screw up git-archimport and > > > > cause parents to be missed. > > > > > > Fair enough. You mention an alternative strategy (tla ancestry) -- > > > have you tried it at all? > > > > No, not yet. > > Also interested in this if you get around to it. It's not a high priority for me and I probably don't have time to do this. -- Eric Wong ^ permalink raw reply [flat|nested] 39+ messages in thread
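Eric's point is that shell quoting becomes unnecessary once arguments never pass through a shell. Below is a minimal sketch of a capture helper in that spirit. It uses Perl's list form of `open` with `'-|'`, which runs the command directly without a shell. `capture_cmd` is a hypothetical name, not the `safe_pipe_capture` defined in git-archimport.

```perl
use strict;
use warnings;

# Run a command without a shell and return its standard output.
# Each element of @cmd reaches the child process as-is, so spaces,
# quotes, semicolons and "$" need no escaping.
sub capture_cmd {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "cannot run $cmd[0]: $!\n";
    local $/;                          # slurp mode: read all output at once
    my $out = <$fh>;
    close($fh) or die "$cmd[0] exited with status " . ($? >> 8) . "\n";
    return defined $out ? $out : '';
}

# The last argument contains shell metacharacters but arrives unchanged,
# because no shell ever parses it.
my $out = capture_cmd($^X, '-e', 'print $ARGV[0]', 'a b; echo "$HOME"');
print "$out\n";
```

The list form needs more than one element to skip the shell. With a single string, Perl may hand the command to `/bin/sh`, and quoting matters again.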
* [PATCH 1/2] archimport: allow for old style branch and public tag names 2005-11-15 8:03 ` Eric Wong @ 2005-11-15 8:05 ` Eric Wong 2005-11-15 8:06 ` [PATCH 2/2] archimport: sync_to_ps() messages for tracking tla methods Eric Wong 2005-11-15 8:07 ` [PATCH 1/2] archimport: allow for old style branch and public tag names Eric Wong 0 siblings, 2 replies; 39+ messages in thread From: Eric Wong @ 2005-11-15 8:05 UTC (permalink / raw) To: Martin Langhoff; +Cc: git list This patch adds the -o switch, which lets old trees tracked by git-archimport continue working with their old branch and tag names to make life easier for people tracking your tree. Private tags that are only used internally by git-archimport continue to be new-style and are automatically converted upon first run. Signed-off-by: Eric Wong <normalperson@yhbt.net> --- git-archimport.perl | 22 +++++++++++++++++----- 1 files changed, 17 insertions(+), 5 deletions(-) applies-to: 44d831812786f4dfbf54a67b51e5f48c7d5afd66 4b341dd903883db0a89fe2f04e93dab053beb045 diff --git a/git-archimport.perl b/git-archimport.perl index 1f721f6..304d462 100755 --- a/git-archimport.perl +++ b/git-archimport.perl @@ -67,12 +67,12 @@ my $git_dir = $ENV{"GIT_DIR"} || ".git"; $ENV{"GIT_DIR"} = $git_dir; my $ptag_dir = "$git_dir/archimport/tags"; -our($opt_h,$opt_v,$opt_T,$opt_t,$opt_D,$opt_a); +our($opt_h,$opt_v,$opt_T,$opt_t,$opt_D,$opt_a,$opt_o); sub usage() { print STDERR <<END; Usage: ${\basename $0} # fetch/update GIT from Arch - [ -h ] [ -v ] [ -T ] [ -a ] [ -D depth ] [ -t tempdir ] + [ -o ] [ -h ] [ -v ] [ -T ] [ -a ] [ -D depth ] [ -t tempdir ] repository/arch-branch [ repository/arch-branch] ...
END exit(1); @@ -267,7 +267,15 @@ sub tree_dirname { return $name; } -*git_branchname = *tree_dirname; +# old versions of git-archimport just use the <category--branch> part: +sub old_style_branchname { + my $id = shift; + my $ret = safe_pipe_capture($TLA,'parse-package-name','-p',$id); + chomp $ret; + return $ret; +} + +*git_branchname = $opt_o ? *old_style_branchname : *tree_dirname; # process patchsets in ancestry order foreach my $ps (@psets) { @@ -527,8 +535,12 @@ sub parselog { sub tag { my ($tag, $commit) = @_; - # don't use subdirs for tags yet, it could screw up other porcelains - $tag =~ s|/|,|; + if ($opt_o) { + $tag =~ s|/|--|g; + } else { + # don't use subdirs for tags yet, it could screw up other porcelains + $tag =~ s|/|,|g; + } if ($commit) { open(C,">","$git_dir/refs/tags/$tag") --- 0.99.9.GIT ^ permalink raw reply related [flat|nested] 39+ messages in thread
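Patch 1/2 writes public tag names in one of two encodings. The new default turns every '/' in the fully qualified Arch name into ',', and `-o` brings back the old '--' form, matching the `s|/|,|g` and `s|/|--|g` substitutions in the patch's `tag()` hunk. The old form cannot be decoded reliably. Archive names already contain '--' (as in `me@example.com--2005`), and the `--branch` part is optional, so there is no way to tell which '--' replaced the '/'. Below is a minimal sketch of both encodings and the decode. It assumes ',' never appears in Arch names, and `tag_refname` and `arch_name_from_tag` are hypothetical helpers.

```perl
use strict;
use warnings;

# Map a fully qualified Arch name to a git tag name.
#   new style: "/" -> ","   (decodable, assuming "," never occurs in Arch names)
#   old style (-o): "/" -> "--", which collides with Arch's own "--" separators
sub tag_refname {
    my ($name, $old_style) = @_;
    my $sep = $old_style ? '--' : ',';
    (my $tag = $name) =~ s{/}{$sep}g;
    return $tag;
}

# Recover the Arch name from a new-style tag: split once on the first ",".
sub arch_name_from_tag {
    my ($tag) = @_;
    my ($archive, $rest) = split /,/, $tag, 2;
    return "$archive/$rest";
}

my $id  = 'me@example.com--2005/foo--bar--1.0';
my $new = tag_refname($id, 0);   # me@example.com--2005,foo--bar--1.0
my $old = tag_refname($id, 1);   # me@example.com--2005--foo--bar--1.0
print "$new\n$old\n";
```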
* [PATCH 2/2] archimport: sync_to_ps() messages for tracking tla methods 2005-11-15 8:05 ` [PATCH 1/2] archimport: allow for old style branch and public tag names Eric Wong @ 2005-11-15 8:06 ` Eric Wong 2005-11-15 8:07 ` [PATCH 1/2] archimport: allow for old style branch and public tag names Eric Wong 1 sibling, 0 replies; 39+ messages in thread From: Eric Wong @ 2005-11-15 8:06 UTC (permalink / raw) To: Martin Langhoff; +Cc: git list This patch adds debug messages (enabled with the usual -v switch) for tracking how often each tla command is called. Signed-off-by: Eric Wong <normalperson@yhbt.net> --- git-archimport.perl | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) applies-to: 8a7cc429d0fd935805851ac5ac10941d0bd86e94 b4de7920e0116afb35016435131a404658818ced diff --git a/git-archimport.perl b/git-archimport.perl index b624ba6..1f721f6 100755 --- a/git-archimport.perl +++ b/git-archimport.perl @@ -429,18 +429,23 @@ sub sync_to_ps { my $ps = shift; my $tree_dir = $tmptree.'/'.tree_dirname($ps->{id}); + $opt_v && print "sync_to_ps($ps->{id}) method: "; + if (-d $tree_dir) { if ($ps->{type} eq 't') { # looks like a tag-only or (worse,) a mixed tags/changeset branch, # can't rely on replay to work correctly on these rmtree($tree_dir); + $opt_v && print "get (tag)\n"; safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir); } else { my $tree_id = arch_tree_id($tree_dir); if ($ps->{parent_id} eq $tree_id) { # the common case (hopefully) + $opt_v && print "replay\n"; safe_pipe_capture($TLA,'replay','-d',$tree_dir,$ps->{id}); } else { + $opt_v && print "apply-delta\n"; # this can happen if branches cherry-pick safe_pipe_capture($TLA,'apply-delta','-d',$tree_dir, $tree_id, $ps->{id}); @@ -448,6 +453,7 @@ sub sync_to_ps { } } else { # new branch work + $opt_v && print "get (new tree)\n"; safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir); } --- 0.99.9.GIT ^ permalink raw reply related [flat|nested] 39+ messages in thread
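With `-v`, patch 2/2 prints which tla strategy `sync_to_ps()` picked for each patchset. That choice depends on three facts: whether a temporary tree already exists, whether the patchset is a tag, and whether the tree currently sits at the patchset's parent. Below is a sketch of the decision table with those facts passed in as arguments rather than read from disk. `choose_sync_method` is hypothetical, and unlike the patch it treats a missing `parent_id` as a mismatch instead of comparing undef.

```perl
use strict;
use warnings;

# Return the label sync_to_ps() prints under -v for a given situation.
#   $tree_exists - an Arch tree for this version already exists in the temp dir
#   $type        - 's', 'i' or 't' as parsed from abrowse output
#   $parent_id   - previous patchset on the same version (may be undef)
#   $tree_id     - revision the existing tree is currently at
sub choose_sync_method {
    my ($tree_exists, $type, $parent_id, $tree_id) = @_;
    return 'get (new tree)' unless $tree_exists;
    return 'get (tag)' if $type eq 't';        # replay is unreliable on tags
    return (defined $parent_id && $parent_id eq $tree_id)
        ? 'replay'                             # the common case
        : 'apply-delta';                       # e.g. after cherry-picks
}

print choose_sync_method(1, 's', 'foo--1.0--patch-1', 'foo--1.0--patch-1'), "\n";
```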
* Re: [PATCH 1/2] archimport: allow for old style branch and public tag names 2005-11-15 8:05 ` [PATCH 1/2] archimport: allow for old style branch and public tag names Eric Wong 2005-11-15 8:06 ` [PATCH 2/2] archimport: sync_to_ps() messages for tracking tla methods Eric Wong @ 2005-11-15 8:07 ` Eric Wong 1 sibling, 0 replies; 39+ messages in thread From: Eric Wong @ 2005-11-15 8:07 UTC (permalink / raw) To: Martin Langhoff; +Cc: git list Oops, I sent these two patches out of order. They should apply fine without conflicts either way. -- Eric Wong ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH] archimport improvements 2005-11-12 9:23 [PATCH] archimport improvements Eric Wong 2005-11-12 9:25 ` [PATCH 1/5] remove shellquote usage for tags Eric Wong 2005-11-12 11:54 ` [PATCH] archimport improvements Martin Langhoff @ 2005-11-17 9:26 ` Martin Langhoff 2005-11-24 7:46 ` Eric Wong 2 siblings, 1 reply; 39+ messages in thread From: Martin Langhoff @ 2005-11-17 9:26 UTC (permalink / raw) To: Eric Wong; +Cc: git list Eric, I've merged and pushed out to http://locke.catalyst.net.nz/git/git-martinlanghoff.git/#tojunio [PATCH 1/5] remove shellquote usage for tags [PATCH 2/5] archimport: don't die on merge-base failure [PATCH 3/5] Disambiguate the term 'branch' in Arch vs git [PATCH 1/2] archimport: allow for old style branch and public tag names That last one had a small edit to rebase it to the top of the head -- will probably have a small conflict for you on the usage line and getopts() line. What is pending is... * [PATCH 4/5] Overhaul of changeset application I am testing it right now. Finding it rather slow on an idle linux workstation with fast IDE disks, no X.org loaded and 1GB of RAM. iowait is pegged at 90%. Wonder what will happen on a system with slow disk access. tla/baz are unusable under any OS where the fs stack is not _that_ polished (OSX and friends). The early versions of the import also used $TLA for all ops, and I was forced to change it to get my repos transformed in a reasonable time. Can you send me a patch that makes it optional, so users can choose fast or correct? I don't want to force glacial imports on anyone, especially me. Testing an import of a reasonably sized repo must be a quick operation or I won't do it ;-) And I do work on OSX too. On the other hand, I might just implement renamed-directory tracking separately, especially if someone can point me to a public repo with some interesting cases of renamed directories. These patches seem to hang from 4/5 so will need rebasing after a rework.
The first one seems to be 3 or 4 patches in one. It'd be good to break it up. * [ PATCH 5/5] -D <depth> option to recurse into merged branches * Re: [PATCH 5/5] -D <depth> option to recurse into merged branches * [PATCH 2/2] archimport: sync_to_ps() messages for tracking tla methods If you want to see the repos I'm testing with, register arch-eduforge@catalyst.net.nz--2004 http://nzvle.eduforge.org/arch-mirror/ and try: ~/local/git/git-archimport.perl -v \ arch-eduforge@catalyst.net.nz--2004/moodle-org--moodle \ arch-eduforge@catalyst.net.nz--2004/moodle--local \ arch-eduforge@catalyst.net.nz--2004/moodle--local-forum-types \ arch-eduforge@catalyst.net.nz--2004/moodle--local-lock-content \ arch-eduforge@catalyst.net.nz--2004/moodle--nmit \ arch-eduforge@catalyst.net.nz--2004/moodle--topnz cheers, martin ^ permalink raw reply [flat|nested] 39+ messages in thread
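Martin's request amounts to choosing an import strategy once, at startup, so the per-patchset loop never needs to know which one is active. The following is a minimal Perl sketch of that dispatch; `import_fast`, `import_accurate`, and `choose_importer` are hypothetical names, not functions in git-archimport.perl.

```perl
use strict;
use warnings;

# Hypothetical strategies; in an importer these would drive tla and git.
# Here they only report which path was taken.
sub import_fast     { my ($id) = @_; return "fast:$id" }
sub import_accurate { my ($id) = @_; return "accurate:$id" }

# Decide once, up front, which strategy to use.
sub choose_importer {
    my ($want_fast) = @_;
    return $want_fast ? \&import_fast : \&import_accurate;
}

my $importer = choose_importer(0);
print $importer->('proj--main--1.0--patch-1'), "\n";
```

The script already uses a related idiom, typeglob aliasing (`*git_branchname = $opt_o ? *old_style_branchname : *tree_dirname;`). A code reference achieves the same thing without touching the symbol table.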
* Re: [PATCH] archimport improvements
2005-11-17 9:26 ` [PATCH] archimport improvements Martin Langhoff
@ 2005-11-24 7:46 ` Eric Wong
2005-11-24 7:47 ` [PATCH 1/9] archimport: first, make sure it still compiles Eric Wong
2005-11-24 9:25 ` [PATCH] archimport improvements Martin Langhoff
0 siblings, 2 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-24 7:46 UTC
To: Martin Langhoff; +Cc: git list
Martin Langhoff <martin.langhoff@gmail.com> wrote:
> Eric,
>
> I've merged and pushed out to
> http://locke.catalyst.net.nz/git/git-martinlanghoff.git/#tojunio
>
> [PATCH 1/5] remove shellquote usage for tags
> [PATCH 2/5] archimport: don't die on merge-base failure
> [PATCH 3/5] Disambiguate the term 'branch' in Arch vs git
> [PATCH 1/2] archimport: allow for old style branch and public tag names
>
> That last one had a small edit to rebase it to the top of the head --
> will probably have a small conflict for you on the usage line and
> getopts() line.
>
> What is pending is...
>
> * [PATCH 4/5] Overhaul of changeset application
>
> I am testing it right now. Finding it rather slow on an idle linux
> workstation with fast IDE disks, no X.org loaded and 1GB or RAM.
> iowait is pegged at 90%. Wonder what will happen on a system with slow
> disk access. tla/baz are unusable under any OS where the fs stack is
> not _that_ polished (OSX and friends).
Ok, I didn't expect you guys to have 12k of files in your trees. None
of your source trees are remotely close to that size (but I have many
more changesets). I'm surprised you guys were able to put up with Arch
in the first place!
125m58.431s with my method.
8m24.504s with yours :)
All of my usual source trees imported 1k changesets in 10-15 minutes
> The early versions of the import also used $TLA for all ops, and I was
> forced to change it to get my repos transformed in a reasonable time.
>
> Can you send me a patch that makes it optional, so users can choose
> fast or correct? I don't want to force glacial imports on anyone,
> specially me. Testing an import of a reasonably sized repo must be a
> quick operation or I won't do it ;-) And I do work on OSX too.
Patches on the way. OTOH, the time spent importing the bulk of the
history is a one-time operation for most people and I'd much rather it
get things as right as possible and move on.
> On the other hand, I might just implement renamed directories tracking
> separately, specially if someone can point me to a public repo with
> some interesting cases of renamed directories.
IIRC, there are several nasty cases all of which are ordering-related,
especially with regard to nested directories or file renames inside
directories that are also renamed. It should be noted that not even
tla gets all the possible directory rename cases right (baz seems
better from my observations).
> These patches seem to hang from 4/5 so will need rebasing after a
> reworked. The first one seems to be 3 or 4 patches in one. It'd be
> good to break it up.
Sorry, I rushed through the initial overhaul and didn't generate neat
patches because I wanted to get some of my work moved to git ASAP.
> * [ PATCH 5/5] -D <depth> option to recurse into merged branches
> * Re: [PATCH 5/5] -D <depth> option to recurse into merged branches
> * [PATCH 2/2] archimport: sync_to_ps() messages for tracking tla methods
--
Eric Wong
* [PATCH 1/9] archimport: first, make sure it still compiles
2005-11-24 7:46 ` Eric Wong
@ 2005-11-24 7:47 ` Eric Wong
2005-11-24 7:48 ` [PATCH 2/9] remove String::ShellQuote dependency Eric Wong
2005-11-24 18:54 ` [PATCH 1/9] archimport: first, make sure it still compiles Linus Torvalds
2005-11-24 9:25 ` [PATCH] archimport improvements Martin Langhoff
1 sibling, 2 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-24 7:47 UTC
To: Martin Langhoff; +Cc: git list, Martin Langhoff
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 git-archimport.perl |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)
applies-to: a17c1f442587b9c7d68b4f7e08c5f6786599c61e
119b07aa2bdb23d5f4977c4d696dd5e7eea56ca6
diff --git a/git-archimport.perl b/git-archimport.perl
index c3bed08..b5f8a2c 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -99,6 +99,7 @@ my %psets = (); # the co
 
 my %rptags = ();                # my reverse private tags
                                 # to map a SHA1 to a commitid
+my $TLA = $ENV{'ARCH_CLIENT'} || 'tla';
 
 foreach my $root (@arch_roots) {
     my ($arepo, $abranch) = split(m!/!, $root);
@@ -850,3 +851,18 @@ sub commitid2pset {
         || (print Dumper(sort keys %psets)) && die "Cannot find patchset for $name";
     return $ps;
 }
+
+# an alterative to `command` that allows input to be passed as an array
+# to work around shell problems with weird characters in arguments
+sub safe_pipe_capture {
+    my @output;
+    if (my $pid = open my $child, '-|') {
+        @output = (<$child>);
+        close $child or die join(' ',@_).": $! $?";
+    } else {
+        exec(@_) or die $?; # exec() can fail the executable can't be found
+    }
+    return wantarray ? @output : join('',@output);
+}
+
+
---
0.99.9.GIT
--
Eric Wong
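One caveat about the posted `safe_pipe_capture`: `open my $child, '-|'` returns the child's PID in the parent, 0 in the child, and undef if the fork fails. The `if`/`else` treats undef like 0, so a failed fork would `exec` the command in the importer process itself. Below is a hedged sketch that separates the three cases; it keeps the name and calling convention, but is otherwise illustrative rather than the script's code.

```perl
use strict;
use warnings;
use POSIX ();

# Run a command without a shell and capture its standard output.  Because
# the command is passed as a list, spaces, quotes and backslashes in
# arguments reach the program unchanged.
sub safe_pipe_capture {
    my @cmd = @_;
    my $pid = open(my $child, '-|');
    die "cannot fork: $!\n" unless defined $pid;   # fork failed: no child
    if ($pid == 0) {                                # in the child
        exec { $cmd[0] } @cmd
            or do {
                print STDERR "cannot exec $cmd[0]: $!\n";
                POSIX::_exit(127);                  # skip the parent's cleanup
            };
    }
    my @output = <$child>;                          # in the parent
    close $child
        or die join(' ', @cmd) . ": exit status $?\n";
    return wantarray ? @output : join('', @output);
}

my $out = safe_pipe_capture($^X, '-e', 'print join("|", @ARGV)', 'a b', "c'd");
print "$out\n";
```

`$^X` (the running perl) stands in for `tla` or `git-*` so the example needs neither installed. A non-zero exit status from the command surfaces as a `die` from `close`.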
* [PATCH 2/9] remove String::ShellQuote dependency.
2005-11-24 7:47 ` [PATCH 1/9] archimport: first, make sure it still compiles Eric Wong
@ 2005-11-24 7:48 ` Eric Wong
2005-11-24 7:50 ` [PATCH 3/9] fix -t tmpdir switch Eric Wong
2005-11-24 18:54 ` [PATCH 1/9] archimport: first, make sure it still compiles Linus Torvalds
1 sibling, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-24 7:48 UTC
To: Martin Langhoff; +Cc: git list, Martin Langhoff
use safe_pipe_capture() or system() over backticks where shellquoting
may have been necessary.
More changes planned, so I'm not touching the parts I'm planning on
replacing entirely.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 git-archimport.perl |   51 ++++++++++++++++++++++++++++-----------------------
 1 files changed, 28 insertions(+), 23 deletions(-)
applies-to: 83307766d30e928179b9aa85a3d7bb906cc08846
80494a7d496ab9f6e0a76a60b1f0b4215fdff442
diff --git a/git-archimport.perl b/git-archimport.perl
index b5f8a2c..b7e2480 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -55,9 +55,8 @@ use warnings;
 use Getopt::Std;
 use File::Spec;
 use File::Temp qw(tempfile tempdir);
-use File::Path qw(mkpath);
+use File::Path qw(mkpath rmtree);
 use File::Basename qw(basename dirname);
-use String::ShellQuote;
 use Time::Local;
 use IO::Socket;
 use IO::Pipe;
@@ -306,7 +305,7 @@ foreach my $ps (@psets) {
     unless ($import) { # skip for import
         if ( -e "$git_dir/refs/heads/$ps->{branch}") {
             # we know about this branch
-            `git checkout $ps->{branch}`;
+            system('git-checkout',$ps->{branch});
         } else {
             # new branch! we need to verify a few things
             die "Branch on a non-tag!" unless $ps->{type} eq 't';
@@ -315,7 +314,7 @@ foreach my $ps (@psets) {
                 unless $branchpoint;
 
             # find where we are supposed to branch from
-            `git checkout -b $ps->{branch} $branchpoint`;
+            system('git-checkout','-b',$ps->{branch},$branchpoint);
 
             # If we trust Arch with the fact that this is just
             # a tag, and it does not affect the state of the tree
@@ -344,7 +343,7 @@ foreach my $ps (@psets) {
     #
     my $tree;
 
-    my $commitlog = `tla cat-archive-log -A $ps->{repo} $ps->{id}`;
+    my $commitlog = safe_pipe_capture($TLA,'cat-archive-log',$ps->{id});
     die "Error in cat-archive-log: $!" if $?;
 
     # parselog will git-add/rm files
@@ -422,7 +421,7 @@ foreach my $ps (@psets) {
     #
     my @par;
     if ( -e "$git_dir/refs/heads/$ps->{branch}") {
-        if (open HEAD, "<$git_dir/refs/heads/$ps->{branch}") {
+        if (open HEAD, "<","$git_dir/refs/heads/$ps->{branch}") {
             my $p = <HEAD>;
             close HEAD;
             chomp $p;
@@ -437,7 +436,6 @@ foreach my $ps (@psets) {
     if ($ps->{merges}) {
         push @par, find_parents($ps);
     }
-    my $par = join (' ', @par);
 
     #
     # Commit, tag and clean state
@@ -454,7 +452,7 @@ foreach my $ps (@psets) {
     $commit_rh = 'commit_rh';
     $commit_wh = 'commit_wh';
 
-    $pid = open2(*READER, *WRITER, "git-commit-tree $tree $par")
+    $pid = open2(*READER, *WRITER,'git-commit-tree',$tree,@par)
         or die $!;
     print WRITER $logmessage; # write
     close WRITER;
@@ -469,7 +467,7 @@ foreach my $ps (@psets) {
     #
     # Update the branch
     #
-    open HEAD, ">$git_dir/refs/heads/$ps->{branch}";
+    open HEAD, ">","$git_dir/refs/heads/$ps->{branch}";
     print HEAD $commitid;
     close HEAD;
     system('git-update-ref', 'HEAD', "$ps->{branch}");
@@ -483,21 +481,23 @@ foreach my $ps (@psets) {
     print " + tree $tree\n";
     print " + commit $commitid\n";
     $opt_v && print " + commit date is $ps->{date} \n";
-    $opt_v && print " + parents: $par \n";
+    $opt_v && print " + parents: ",join(' ',@par),"\n";
 }
 
 sub apply_import {
     my $ps = shift;
     my $bname = git_branchname($ps->{id});
 
-    `mkdir -p $tmp`;
+    mkpath($tmp);
 
-    `tla get -s --no-pristine -A $ps->{repo} $ps->{id} $tmp/import`;
+    safe_pipe_capture($TLA,'get','-s','--no-pristine',$ps->{id},"$tmp/import");
     die "Cannot get import: $!" if $?;
-    `rsync -v --archive --delete --exclude '$git_dir' --exclude '.arch-ids' --exclude '{arch}' $tmp/import/* ./`;
+    system('rsync','-aI','--delete', '--exclude',$git_dir,
+        '--exclude','.arch-ids','--exclude','{arch}',
+        "$tmp/import/", './');
     die "Cannot rsync import:$!" if $?;
 
-    `rm -fr $tmp/import`;
+    rmtree("$tmp/import");
     die "Cannot remove tempdir: $!" if $?;
 
 
@@ -507,10 +507,10 @@ sub apply_import {
 sub apply_cset {
     my $ps = shift;
 
-    `mkdir -p $tmp`;
+    mkpath($tmp);
 
     # get the changeset
-    `tla get-changeset -A $ps->{repo} $ps->{id} $tmp/changeset`;
+    safe_pipe_capture($TLA,'get-changeset',$ps->{id},"$tmp/changeset");
     die "Cannot get changeset: $!" if $?;
 
     # apply patches
@@ -534,17 +534,20 @@ sub apply_cset {
             $orig =~ s/\.modified$//; # lazy
             $orig =~ s!^\Q$tmp\E/changeset/patches/!!;
             #print "rsync -p '$mod' '$orig'";
-            `rsync -p $mod ./$orig`;
+            system('rsync','-p',$mod,"./$orig");
             die "Problem applying binary changes! $!" if $?;
         }
     }
 
     # bring in new files
-    `rsync --archive --exclude '$git_dir' --exclude '.arch-ids' --exclude '{arch}' $tmp/changeset/new-files-archive/* ./`;
+    system('rsync','-aI','--exclude',$git_dir,
+        '--exclude','.arch-ids',
+        '--exclude', '{arch}',
+        "$tmp/changeset/new-files-archive/",'./');
 
     # deleted files are hinted from the commitlog processing
 
-    `rm -fr $tmp/changeset`;
+    rmtree("$tmp/changeset");
 }
 
 
@@ -622,9 +625,9 @@ sub parselog {
             # tla cat-archive-log will give us filenames with spaces as file\(sp)name - why?
             # we can assume that any filename with \ indicates some pika escaping that we want to get rid of.
             if ($t =~ /\\/ ){
-                $t = `tla escape --unescaped '$t'`;
+                $t = (safe_pipe_capture($TLA,'escape','--unescaped',$t))[0];
             }
-            push (@tmp, shell_quote($t));
+            push (@tmp, $t);
         }
         @$ref = @tmp;
     }
@@ -827,8 +830,10 @@ sub find_parents {
             }
         }
     }
-    @parents = keys %parents;
-    @parents = map { " -p " . ptag($_) } @parents;
+    @parents = ();
+    foreach (keys %parents) {
+        push @parents, '-p', ptag($_);
+    }
     return @parents;
 }
 
---
0.99.9.GIT
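The common thread in this patch is passing argument lists instead of interpolated strings, so branch names and filenames never pass through a shell. The small Perl sketch below shows both halves: building the `-p` pairs as separate list elements, then handing them to list-form `system()`. `parent_args` is a hypothetical helper, and `$^X` stands in for `git-commit-tree`.

```perl
use strict;
use warnings;

# Turn parent commit IDs into separate ('-p', $sha) list elements, as
# find_parents does after this patch, rather than a string like
# " -p abc -p def" that only works once a shell re-splits it.
sub parent_args {
    my @parents = @_;
    return map { ('-p', $_) } @parents;
}

my @args   = parent_args('1111', '2222');
my $tricky = "it's a \$file; with spaces";

# List-form system() bypasses the shell, so each element arrives as exactly
# one argv entry.  "--" stops perl from treating "-p" as one of its own
# switches; the child exits 0 only if it received all five arguments.
my $status = system($^X, '-e', 'exit(@ARGV == 5 ? 0 : 1)', '--', @args, $tricky);
print $status == 0 ? "child saw 5 arguments\n" : "argument splitting went wrong\n";
```

With the old backtick form, the `;` and `'` in `$tricky` would have been interpreted by `/bin/sh`. Avoiding that was the reason String::ShellQuote was needed in the first place.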
* [PATCH 3/9] fix -t tmpdir switch
2005-11-24 7:48 ` [PATCH 2/9] remove String::ShellQuote dependency Eric Wong
@ 2005-11-24 7:50 ` Eric Wong
2005-11-24 7:51 ` [PATCH 4/9] remove git wrapper dependency Eric Wong
0 siblings, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-24 7:50 UTC
To: Martin Langhoff; +Cc: git list, Martin Langhoff
set TMPDIR env correctly if -t <tmpdir> is passed from the command-line.
setting TMPDIR => 1 as an argument to tempdir() has no effect otherwise
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 git-archimport.perl |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)
applies-to: 015fcfae8cdd564f0993940c5bac303c41913b1a
25aadaf3ebc18fcc3c7948dc831d3f93447b03b6
diff --git a/git-archimport.perl b/git-archimport.perl
index b7e2480..2ed2e3c 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -88,9 +88,8 @@ usage if $opt_h;
 @ARGV >= 1 or usage();
 my @arch_roots = @ARGV;
 
-my ($tmpdir, $tmpdirname) = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1);
-my $tmp = $opt_t || 1;
-$tmp = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1);
+$ENV{'TMPDIR'} = $opt_t if $opt_t; # $ENV{TMPDIR} will affect tempdir() calls:
+my $tmp = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1);
 $opt_v && print "+ Using $tmp as temporary directory\n";
 
 my @psets = ();                 # the collection
---
0.99.9.GIT
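The old code computed `$opt_t || 1` and then immediately overwrote `$tmp` with a fresh `tempdir()` call, so `-t` was silently ignored. With `TMPDIR => 1`, File::Temp places the template under `File::Spec->tmpdir`, which consults `$ENV{TMPDIR}`. The following is a minimal sketch of the fixed behaviour, wrapped in a hypothetical `make_workdir` helper.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create the importer's scratch directory.  If a base directory is given
# (as with "-t <tmpdir>"), export it as TMPDIR first: tempdir() with
# TMPDIR => 1 puts the template under File::Spec->tmpdir, which reads
# $ENV{TMPDIR} and otherwise uses a system default such as /tmp.
sub make_workdir {
    my ($base) = @_;
    $ENV{TMPDIR} = $base if defined $base;
    return tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1);
}

my $dir = make_workdir(undef);
print "scratch dir: $dir\n";
```

Note that `File::Spec->tmpdir` skips a `TMPDIR` that does not exist or is not writable and quietly falls back to the default. A stricter `-t` might therefore check `-d $opt_t && -w _` itself and die with a clear message.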
* [PATCH 4/9] remove git wrapper dependency
2005-11-24 7:50 ` [PATCH 3/9] fix -t tmpdir switch Eric Wong
@ 2005-11-24 7:51 ` Eric Wong
2005-11-24 7:52 ` [PATCH 5/9] add -D <depth> and -a switch Eric Wong
2005-11-24 8:20 ` [PATCH 4/9] remove git wrapper dependency Andreas Ericsson
0 siblings, 2 replies; 39+ messages in thread
From: Eric Wong @ 2005-11-24 7:51 UTC
To: Martin Langhoff; +Cc: git list, Martin Langhoff
use git-diff-files instead of git diff-files so we don't rely on the
wrapper being installed (some people may have git as GNU interactive
tools :)
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 git-archimport.perl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
applies-to: 36c4ad6c2442e7700e5f1435d834f86d2680dd7f
40b5378433aa6b09dd358d482d7713c2db6a5d92
diff --git a/git-archimport.perl b/git-archimport.perl
index 2ed2e3c..938fa2b 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -278,7 +278,7 @@ foreach my $ps (@psets) {
     #
     # ensure we have a clean state
     #
-    if (`git diff-files`) {
+    if (`git-diff-files`) {
         die "Unclean tree when about to process $ps->{id} " .
             " - did we fail to commit cleanly before?";
     }
---
0.99.9.GIT
* [PATCH 5/9] add -D <depth> and -a switch
2005-11-24 7:51 ` [PATCH 4/9] remove git wrapper dependency Eric Wong
@ 2005-11-24 7:52 ` Eric Wong
2005-11-24 7:53 ` [PATCH 6/9] safer log file parsing Eric Wong
2005-11-24 8:20 ` [PATCH 4/9] remove git wrapper dependency Andreas Ericsson
1 sibling, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-24 7:52 UTC
To: Martin Langhoff; +Cc: git list, Martin Langhoff
add -D <depth> option to abrowse
add -a switch to attempt to auto-register archives at
mirrors.sourcecontrol.net
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 git-archimport.perl |  227 +++++++++++++++++++++++++++++++--------------------
 1 files changed, 136 insertions(+), 91 deletions(-)
applies-to: 4fd19922030196431179453910cb2b850b54c17d
c52baf62ec29772f7a4e5b0aa2b605bf0f6aa8f7
diff --git a/git-archimport.perl b/git-archimport.perl
index 938fa2b..3968740 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -53,14 +53,9 @@ and can contain multiple, unrelated bran
 use strict;
 use warnings;
 use Getopt::Std;
-use File::Spec;
-use File::Temp qw(tempfile tempdir);
+use File::Temp qw(tempdir);
 use File::Path qw(mkpath rmtree);
 use File::Basename qw(basename dirname);
-use Time::Local;
-use IO::Socket;
-use IO::Pipe;
-use POSIX qw(strftime dup2);
 use Data::Dumper qw/ Dumper /;
 use IPC::Open2;
 
@@ -71,27 +66,33 @@ my $git_dir = $ENV{"GIT_DIR"} || ".git";
 $ENV{"GIT_DIR"} = $git_dir;
 my $ptag_dir = "$git_dir/archimport/tags";
 
-our($opt_h,$opt_v, $opt_T,$opt_t,$opt_o);
+our($opt_h,$opt_v,$opt_T,$opt_t,$opt_D,$opt_a,$opt_o);
 
 sub usage() {
     print STDERR <<END;
 Usage: ${\basename $0}     # fetch/update GIT from Arch
-       [ -o ] [ -h ] [ -v ] [ -T ] [ -t tempdir ]
+       [ -o ] [ -h ] [ -v ] [ -T ] [ -a ] [ -D depth ] [ -t tempdir ]
        repository/arch-branch [ repository/arch-branch] ...
 END
     exit(1);
 }
 
-getopts("Thvt:") or usage();
+getopts("Thvat:D:") or usage();
 usage if $opt_h;
 
 @ARGV >= 1 or usage();
-my @arch_roots = @ARGV;
+# $arch_branches:
+# values associated with keys:
+#   =1 - Arch version / git 'branch' detected via abrowse on a limit
+#   >1 - Arch version / git 'branch' of an auxilliary branch we've merged
+my %arch_branches = map { $_ => 1 } @ARGV;
 
 $ENV{'TMPDIR'} = $opt_t if $opt_t; # $ENV{TMPDIR} will affect tempdir() calls:
 my $tmp = tempdir('git-archimport-XXXXXX', TMPDIR => 1, CLEANUP => 1);
 $opt_v && print "+ Using $tmp as temporary directory\n";
 
+my %reachable = ();             # Arch repositories we can access
+my %unreachable = ();           # Arch repositories we can't access :<
 my @psets = ();                 # the collection
 my %psets = ();                 # the collection, by name
 
@@ -99,96 +100,112 @@ my %rptags = (); # my rev
                                 # to map a SHA1 to a commitid
 my $TLA = $ENV{'ARCH_CLIENT'} || 'tla';
 
-foreach my $root (@arch_roots) {
-    my ($arepo, $abranch) = split(m!/!, $root);
-    open ABROWSE, "tla abrowse -f -A $arepo --desc --merges $abranch |"
-        or die "Problems with tla abrowse: $!";
-
-    my %ps = ();         # the current one
-    my $mode = '';
-    my $lastseen = '';
-
-    while (<ABROWSE>) {
-        chomp;
+sub do_abrowse {
+    my $stage = shift;
+    while (my ($limit, $level) = each %arch_branches) {
+        next unless $level == $stage;
-        # first record padded w 8 spaces
-        if (s/^\s{8}\b//) {
-
-            # store the record we just captured
-            if (%ps) {
-                my %temp = %ps; # break references
-                push (@psets, \%temp);
-                $psets{$temp{id}} = \%temp;
-                %ps = ();
-            }
-
-            my ($id, $type) = split(m/\s{3}/, $_);
-            $ps{id} = $id;
-            $ps{repo} = $arepo;
-
-            # deal with types
-            if ($type =~ m/^\(simple changeset\)/) {
-                $ps{type} = 's';
-            } elsif ($type eq '(initial import)') {
-                $ps{type} = 'i';
-            } elsif ($type =~ m/^\(tag revision of (.+)\)/) {
-                $ps{type} = 't';
-                $ps{tag} = $1;
-            } else {
-                warn "Unknown type $type";
-            }
-            $lastseen = 'id';
-        }
-
-        if (s/^\s{10}//) {
-            # 10 leading spaces or more
-            # indicate commit metadata
+        open ABROWSE, "$TLA abrowse -fkD --merges $limit |"
+            or die "Problems with tla abrowse: $!";
+
+        my %ps = ();         # the current one
+        my $lastseen = '';
+
+        while (<ABROWSE>) {
+            chomp;
-            # date & author
-            if ($lastseen eq 'id' && m/^\d{4}-\d{2}-\d{2}/) {
+            # first record padded w 8 spaces
+            if (s/^\s{8}\b//) {
+                my ($id, $type) = split(m/\s+/, $_, 2);
+
+                my %last_ps;
+                # store the record we just captured
+                if (%ps && !exists $psets{ $ps{id} }) {
+                    %last_ps = %ps; # break references
+                    push (@psets, \%last_ps);
+                    $psets{ $last_ps{id} } = \%last_ps;
+                }
-                my ($date, $authoremail) = split(m/\s{2,}/, $_);
-                $ps{date} = $date;
-                $ps{date} =~ s/\bGMT$//; # strip off trailign GMT
-                if ($ps{date} =~ m/\b\w+$/) {
-                    warn 'Arch dates not in GMT?! - imported dates will be wrong';
+                my $branch = extract_versionname($id);
+                %ps = ( id => $id, branch => $branch );
+                if (%last_ps && ($last_ps{branch} eq $branch)) {
+                    $ps{parent_id} = $last_ps{id};
+                }
+
+                $arch_branches{$branch} = 1;
+                $lastseen = 'id';
+
+                # deal with types (should work with baz or tla):
+                if ($type =~ m/\(.*changeset\)/) {
+                    $ps{type} = 's';
+                } elsif ($type =~ /\(.*import\)/) {
+                    $ps{type} = 'i';
+                } elsif ($type =~ m/\(tag.*\)/) {
+                    $ps{type} = 't';
+                    # read which revision we've tagged when we parse the log
+                    #$ps{tag} = $1;
+                } else {
+                    warn "Unknown type $type";
+                }
+
+                $arch_branches{$branch} = 1;
+                $lastseen = 'id';
+            } elsif (s/^\s{10}//) {
+                # 10 leading spaces or more
+                # indicate commit metadata
+
+                # date
+                if ($lastseen eq 'id' && m/^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)/){
+                    $ps{date} = $1;
+                    $lastseen = 'date';
+                } elsif ($_ eq 'merges in:') {
+                    $ps{merges} = [];
+                    $lastseen = 'merges';
+                } elsif ($lastseen eq 'merges' && s/^\s{2}//) {
+                    my $id = $_;
+                    push (@{$ps{merges}}, $id);
+
+                    # aggressive branch finding:
+                    if ($opt_D) {
+                        my $branch = extract_versionname($id);
+                        my $repo = extract_reponame($branch);
+
+                        if (archive_reachable($repo) &&
+                                !defined $arch_branches{$branch}) {
+                            $arch_branches{$branch} = $stage + 1;
+                        }
+                    }
+                } else {
+                    warn "more metadata after merges!?: $_\n" unless /^\s*$/;
                 }
-
-                $authoremail =~ m/^(.+)\s(\S+)$/;
-                $ps{author} = $1;
-                $ps{email} = $2;
-
-                $lastseen = 'date';
-
-            } elsif ($lastseen eq 'date') {
-                # the only hint is position
-                # subject is after date
-                $ps{subj} = $_;
-                $lastseen = 'subj';
-
-            } elsif ($lastseen eq 'subj' && $_ eq 'merges in:') {
-                $ps{merges} = [];
-                $lastseen = 'merges';
-
-            } elsif ($lastseen eq 'merges' && s/^\s{2}//) {
-                push (@{$ps{merges}}, $_);
-            } else {
-                warn 'more metadata after merges!?';
             }
-        }
-    }
-    if (%ps) {
-        my %temp = %ps; # break references
-        push (@psets, \%temp);
-        $psets{ $temp{id} } = \%temp;
-        %ps = ();
-    }
-    close ABROWSE;
+        }
+
+        if (%ps && !exists $psets{ $ps{id} }) {
+            my %temp = %ps; # break references
+            if (@psets && $psets[$#psets]{branch} eq $ps{branch}) {
+                $temp{parent_id} = $psets[$#psets]{id};
+            }
+            push (@psets, \%temp);
+            $psets{ $temp{id} } = \%temp;
+        }
+
+        close ABROWSE or die "$TLA abrowse failed on $limit\n";
+    }
 } # end foreach $root
+do_abrowse(1);
+my $depth = 2;
+$opt_D ||= 0;
+while ($depth <= $opt_D) {
+    do_abrowse($depth);
+    $depth++;
+}
+
 ## Order patches by time
+# FIXME see if we can find a more optimal way to do this by graphing
+# the ancestry data and walking it, that way we won't have to rely on
+# client-supplied dates
 @psets = sort {$a->{date}.$b->{id} cmp $b->{date}.$b->{id}} @psets;
 #print Dumper \@psets;
@@ -209,7 +226,7 @@ unless (-d $git_dir) { # initial import
     }
 } else {    # progressing an import
     # load the rptags
-    opendir(DIR, "$git_dir/archimport/tags")
+    opendir(DIR, $ptag_dir)
         || die "can't opendir: $!";
     while (my $file = readdir(DIR)) {
         # skip non-interesting-files
@@ -829,6 +846,7 @@ sub find_parents {
             }
         }
     }
+
     @parents = ();
     foreach (keys %parents) {
         push @parents, '-p', ptag($_);
@@ -856,6 +874,7 @@ sub commitid2pset {
     return $ps;
 }
 
+
 # an alterative to `command` that allows input to be passed as an array
 # to work around shell problems with weird characters in arguments
 sub safe_pipe_capture {
@@ -869,4 +888,30 @@ sub safe_pipe_capture {
     return wantarray ? @output : join('',@output);
 }
 
 
+# `tla logs -rf -d <dir> | head -n1` or `baz tree-id <dir>`
+sub arch_tree_id {
+    my $dir = shift;
+    chomp( my $ret = (safe_pipe_capture($TLA,'logs','-rf','-d',$dir))[0] );
+    return $ret;
+}
+
+sub archive_reachable {
+    my $archive = shift;
+    return 1 if $reachable{$archive};
+    return 0 if $unreachable{$archive};
+
+    if (system "$TLA whereis-archive $archive >/dev/null") {
+        if ($opt_a && (system($TLA,'register-archive',
+                "http://mirrors.sourcecontrol.net/$archive") == 0)) {
+            $reachable{$archive} = 1;
+            return 1;
+        }
+        print STDERR "Archive is unreachable: $archive\n";
+        $unreachable{$archive} = 1;
+        return 0;
+    } else {
+        $reachable{$archive} = 1;
+        return 1;
+    }
+}
---
0.99.9.GIT
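`do_abrowse` discovers branches in stages. Stage 1 is the set of versions named on the command line; with `-D`, each merge it encounters is queued at `$stage + 1` if its archive is reachable, and the main loop browses stages up to `$opt_D`. The sketch below reproduces that breadth-first, depth-limited walk over an in-memory merge table instead of `tla abrowse` output. `%merged_from`, `archive_ok`, and `discover` are hypothetical, and the archive and version names are made up.

```perl
use strict;
use warnings;

# Hypothetical merge table standing in for "tla abrowse --merges" output:
# each Arch version maps to the versions it has merged from.
my %merged_from = (
    'a@x--2005/proj--main--1.0' => ['b@y--2005/proj--feat--1.0'],
    'b@y--2005/proj--feat--1.0' => ['c@z--2005/proj--exp--0.1'],
    'c@z--2005/proj--exp--0.1'  => [],
);

# Memoised reachability check in the spirit of archive_reachable(): here an
# archive counts as reachable if the table knows any version in it.
my %reachable_cache;
sub archive_ok {
    my ($archive) = @_;
    $reachable_cache{$archive} //=
        (grep { m{^\Q$archive\E/} } keys %merged_from) ? 1 : 0;
    return $reachable_cache{$archive};
}

# Stage 1 holds the requested versions; each later stage adds versions
# merged into the previous one, stopping after $max_depth stages (like -D).
sub discover {
    my ($max_depth, @roots) = @_;
    my %level = map { $_ => 1 } @roots;
    for my $stage (1 .. $max_depth - 1) {
        for my $version (grep { $level{$_} == $stage } sort keys %level) {
            for my $merge (@{ $merged_from{$version} || [] }) {
                my ($archive) = split m{/}, $merge;
                next unless archive_ok($archive);
                $level{$merge} //= $stage + 1;    # keep the shallowest stage
            }
        }
    }
    return \%level;
}

my $found = discover(2, 'a@x--2005/proj--main--1.0');
print "$_ => $found->{$_}\n" for sort keys %$found;
```

Caching both positive and negative answers is what keeps repeated merges from the same unreachable archive from triggering one `tla whereis-archive` (or `register-archive`) call per merge.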
* [PATCH 6/9] safer log file parsing
2005-11-24 7:52 ` [PATCH 5/9] add -D <depth> and -a switch Eric Wong
@ 2005-11-24 7:53 ` Eric Wong
2005-11-24 7:55 ` [PATCH 7/9] Add the accurate changeset applyer Eric Wong
0 siblings, 1 reply; 39+ messages in thread
From: Eric Wong @ 2005-11-24 7:53 UTC
To: Martin Langhoff; +Cc: git list, Martin Langhoff
Better logfile parsing, no longer confused by 'headers' after the first
blank line.
Re-enabled tag-reading with abrowse (baz and tla compatible)
Remove need to quote args to external processes
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 git-archimport.perl |  211 +++++++++++++++++++++++++++------------------------
 1 files changed, 112 insertions(+), 99 deletions(-)
applies-to: 1633bcf09400e93aca2eb335181db298a5f49350
3e12af1d958e2d631e27a2f696ca71f83094c7c3
diff --git a/git-archimport.perl b/git-archimport.perl
index 3968740..8676f35 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -140,10 +140,10 @@ sub do_abrowse {
                     $ps{type} = 's';
                 } elsif ($type =~ /\(.*import\)/) {
                     $ps{type} = 'i';
-                } elsif ($type =~ m/\(tag.*\)/) {
+                } elsif ($type =~ m/\(tag.*?(\S+\@\S+).*?\)/) {
                     $ps{type} = 't';
                     # read which revision we've tagged when we parse the log
-                    #$ps{tag} = $1;
+                    $ps{tag} = $1;
                 } else {
                     warn "Unknown type $type";
                 }
@@ -359,78 +359,73 @@ foreach my $ps (@psets) {
     #
     my $tree;
 
-    my $commitlog = safe_pipe_capture($TLA,'cat-archive-log',$ps->{id});
+    my @commitlog = safe_pipe_capture($TLA,'cat-archive-log',$ps->{id});
     die "Error in cat-archive-log: $!" if $?;
 
-    # parselog will git-add/rm files
-    # and generally prepare things for the commit
-    # NOTE: parselog will shell-quote filenames!
-    my ($sum, $msg, $add, $del, $mod, $ren) = parselog($commitlog);
-    my $logmessage = "$sum\n$msg";
-
+    parselog($ps,\@commitlog);
     # imports don't give us good info
     # on added files. Shame on them
-    if ($ps->{type} eq 'i' || $ps->{type} eq 't') {
-        `find . -type f -print0 | grep -zv '^./$git_dir' | xargs -0 -l100 git-update-index --add`;
-        `git-ls-files --deleted -z | xargs --no-run-if-empty -0 -l100 git-update-index --remove`;
+    if ($ps->{type} eq 'i' || $ps->{type} eq 't') {
+        system('git-ls-files --others -z | '.
+            'git-update-index --add -z --stdin') == 0 or die "$! $?\n";
+        system('git-ls-files --deleted -z | '.
+            'git-update-index --remove -z --stdin') == 0 or die "$! $?\n";
     }
-    if (@$add) {
+    # TODO: handle removed_directories and renamed_directories:
+
+    if (my $add = $ps->{new_files}) {
         while (@$add) {
             my @slice = splice(@$add, 0, 100);
-            my $slice = join(' ', @slice);
-            `git-update-index --add $slice`;
-            die "Error in git-update-index --add: $!" if $?;
+            system('git-update-index','--add','--',@slice) == 0 or
+                die "Error in git-update-index --add: $! $?\n";
         }
     }
-    if (@$del) {
-        foreach my $file (@$del) {
-            unlink $file or die "Problems deleting $file : $!";
-        }
+
+    if (my $del = $ps->{removed_files}) {
+        unlink @$del;
         while (@$del) {
             my @slice = splice(@$del, 0, 100);
-            my $slice = join(' ', @slice);
-            `git-update-index --remove $slice`;
-            die "Error in git-update-index --remove: $!" if $?;
+            system('git-update-index','--remove','--',@slice) == 0 or
+                die "Error in git-update-index --remove: $! $?\n";
         }
     }
-    if (@$ren) { # renamed
+
+    if (my $ren = $ps->{renamed_files}) { # renamed
         if (@$ren % 2) {
             die "Odd number of entries in rename!?";
         }
-        ;
+
         while (@$ren) {
-            my $from = pop @$ren;
-            my $to = pop @$ren;
+            my $from = shift @$ren;
+            my $to = shift @$ren;
             unless (-d dirname($to)) {
                 mkpath(dirname($to)); # will die on err
             }
-            #print "moving $from $to";
-            `mv $from $to`;
-            die "Error renaming $from $to : $!" if $?;
-            `git-update-index --remove $from`;
-            die "Error in git-update-index --remove: $!" if $?;
-            `git-update-index --add $to`;
-            die "Error in git-update-index --add: $!" if $?;
+            print "moving $from $to";
+            rename($from, $to) or die "Error renaming '$from' '$to': $!\n";
+            system('git-update-index','--remove','--',$from) == 0 or
+                die "Error in git-update-index --remove: $! $?\n";
+            system('git-update-index','--add','--',$to) == 0 or
+                die "Error in git-update-index --add: $! $?\n";
         }
     }
-    if (@$mod) { # must be _after_ renames
+
+    if (my $mod = $ps->{modified_files}) {
         while (@$mod) {
             my @slice = splice(@$mod, 0, 100);
-            my $slice = join(' ', @slice);
-            `git-update-index $slice`;
-            die "Error in git-update-index: $!" if $?;
+            system('git-update-index','--',@slice) == 0 or
+                die "Error in git-update-index: $! $?\n";
         }
     }
-
+
     # warn "errors when running git-update-index! $!";
     $tree = `git-write-tree`;
     die "cannot write tree $!" if $?;
     chomp $tree;
-
     #
     # Who's your daddy?
@@ -464,13 +459,14 @@ foreach my $ps (@psets) {
     $ENV{GIT_COMMITTER_EMAIL} = $ps->{email};
     $ENV{GIT_COMMITTER_DATE} = $ps->{date};
 
-    my ($pid, $commit_rh, $commit_wh);
-    $commit_rh = 'commit_rh';
-    $commit_wh = 'commit_wh';
-
-    $pid = open2(*READER, *WRITER,'git-commit-tree',$tree,@par)
+    my $pid = open2(*READER, *WRITER,'git-commit-tree',$tree,@par)
         or die $!;
-    print WRITER $logmessage; # write
+    print WRITER $ps->{summary},"\n";
+    print WRITER $ps->{message},"\n";
+
+    # make it easy to backtrack and figure out which Arch revision this was:
+    print WRITER 'git-archimport-id: ',$ps->{id},"\n";
+
     close WRITER;
     my $commitid = <READER>; # read
     chomp $commitid;
@@ -568,7 +564,9 @@ sub apply_cset {
 #
 
 =for reference
-# A log entry looks like
+# notes: *-files/-directories keys cannot have spaces, they're always
+# pika-escaped. Everything after the first newline
+# A log entry looks like:
 # Revision: moodle-org--moodle--1.3.3--patch-15
 # Archive: arch-eduforge@catalyst.net.nz--2004
 # Creator: Penny Leach <penny@catalyst.net.nz>
@@ -586,70 +584,85 @@ sub apply_cset {
 # admin/editor.html backup/lib.php backup/restore.php
 # New-patches: arch-eduforge@catalyst.net.nz--2004/moodle-org--moodle--1.3.3--patch-15
 # Summary: Updating to latest from MOODLE_14_STABLE (1.4.5+)
+# summary can be multiline with a leading space just like the above fields
 # Keywords:
 #
 # Updating yadda tadda tadda madda
 sub parselog {
-    my $log = shift;
-    #print $log;
-
-    my (@add, @del, @mod, @ren, @kw, $sum, $msg );
-
-    if ($log =~ m/(?:\n|^)New-files:(.*?)(?=\n\w)/s ) {
-        my $files = $1;
-        @add = split(m/\s+/s, $files);
-    }
-
-    if ($log =~ m/(?:\n|^)Removed-files:(.*?)(?=\n\w)/s ) {
-        my $files = $1;
-        @del = split(m/\s+/s, $files);
-    }
-
-    if ($log =~ m/(?:\n|^)Modified-files:(.*?)(?=\n\w)/s ) {
-        my $files = $1;
-        @mod = split(m/\s+/s, $files);
-    }
-
-    if ($log =~ m/(?:\n|^)Renamed-files:(.*?)(?=\n\w)/s ) {
-        my $files = $1;
-        @ren = split(m/\s+/s, $files);
-    }
-
-    $sum ='';
-    if ($log =~ m/^Summary:(.+?)$/m ) {
-        $sum = $1;
-        $sum =~ s/^\s+//;
-        $sum =~ s/\s+$//;
-    }
+    my ($ps, $log) = @_;
+    my $key = undef;
-    $msg = '';
-    if ($log =~ m/\n\n(.+)$/s) {
-        $msg = $1;
-        $msg =~ s/^\s+//;
-        $msg =~ s/\s+$//;
+
+    # headers we want that contain filenames:
+    my %want_headers = (
+        new_files => 1,
+        modified_files => 1,
+        renamed_files => 1,
+        renamed_directories => 1,
+        removed_files => 1,
+        removed_directories => 1,
+    );
+
+    chomp (@$log);
+    while ($_ = shift @$log) {
+        if (/^Continuation-of:\s*(.*)/) {
+            $ps->{tag} = $1;
+            $key = undef;
+        } elsif (/^Summary:\s*(.*)$/ ) {
+            # summary can be multiline as long as it has a leading space
+            $ps->{summary} = [ $1 ];
+            $key = 'summary';
+        } elsif (/^Creator: (.*)\s*<([^\>]+)>/) {
+            $ps->{author} = $1;
+            $ps->{email} = $2;
+            $key = undef;
+        # any *-files or *-directories can be read here:
+        } elsif (/^([A-Z][a-z\-]+):\s*(.*)$/) {
+            my $val = $2;
+            $key = lc $1;
+            $key =~ tr/-/_/; # too lazy to quote :P
+            if ($want_headers{$key}) {
+                push @{$ps->{$key}}, split(/\s+/, $val);
+            } else {
+                $key = undef;
+            }
+        } elsif (/^$/) {
+            last; # remainder of @$log that didn't get shifted off is message
+        } elsif ($key) {
+            if (/^\s+(.*)$/) {
+                if ($key eq 'summary') {
+                    push @{$ps->{$key}}, $1;
+                } else { # files/directories:
+                    push @{$ps->{$key}}, split(/\s+/, $1);
+                }
+            } else {
+                $key = undef;
+            }
+        }
     }
-
-
-    # cleanup the arrays
-    foreach my $ref ( (\@add, \@del, \@mod, \@ren) ) {
-        my @tmp = ();
-        while (my $t = pop @$ref) {
-            next unless length ($t);
-            next if $t =~ m!\{arch\}/!;
-            next if $t =~ m!\.arch-ids/!;
-            next if $t =~ m!\.arch-inventory$!;
+
+    # post-processing:
+    $ps->{summary} = join("\n",@{$ps->{summary}})."\n";
+    $ps->{message} = join("\n",@$log);
+
+    # skip Arch control files, unescape pika-escaped files
+    foreach my $k (keys %want_headers) {
+        next unless (defined $ps->{$k});
+        my @tmp;
+        foreach my $t (@{$ps->{$k}}) {
+            next unless length ($t);
+            next if $t =~ m!\{arch\}/!;
+            next if $t =~ m!\.arch-ids/!;
+            # should we skip this?
+            next if $t =~ m!\.arch-inventory$!;
             # tla cat-archive-log will give us filenames with spaces as file\(sp)name - why?
             # we can assume that any filename with \ indicates some pika escaping that we want to get rid of.
-            if ($t =~ /\\/ ){
+            if ($t =~ /\\/ ){
                 $t = (safe_pipe_capture($TLA,'escape','--unescaped',$t))[0];
             }
-            push (@tmp, $t);
+            push @tmp, $t;
         }
-        @$ref = @tmp;
+        $ps->{$k} = \@tmp if scalar @tmp;
     }
-
-    #print Dumper [$sum, $msg, \@add, \@del, \@mod, \@ren];
-    return ($sum, $msg, \@add, \@del, \@mod, \@ren);
 }
 
 # write/read a tag
---
0.99.9.GIT
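The new `parselog` treats the log as mail-style headers (with indented continuation lines), followed by a blank line and a free-form body. That is why header-like text in the body no longer confuses it: the old version ran its `New-files:` and similar regexes over the whole log. Below is a simplified, self-contained sketch of the same idea. `parse_arch_log` and `file_list` are hypothetical helpers. Unlike archimport, which calls `tla escape --unescaped`, this sketch decodes only the `\(sp)` escape mentioned in the patch.

```perl
use strict;
use warnings;

# Split an Arch-style log into headers and body.  Headers look like
# "Name: value"; lines starting with whitespace continue the previous
# header; the first empty line ends the headers and the rest is the body.
sub parse_arch_log {
    my @lines = @_;
    chomp @lines;
    my (%hdr, $key);
    while (defined(my $line = shift @lines)) {
        last if $line eq '';                       # end of headers
        if ($line =~ /^([A-Za-z][A-Za-z-]*):\s*(.*)$/) {
            my ($name, $value) = ($1, $2);
            ($key = lc $name) =~ tr/-/_/;          # "New-files" -> "new_files"
            push @{ $hdr{$key} }, $value;
        } elsif (defined $key && $line =~ /^\s+(.*)$/) {
            push @{ $hdr{$key} }, $1;              # continuation line
        }
    }
    return (\%hdr, join("\n", @lines));            # remaining lines = body
}

# Whitespace-separated file list for one header.  Arch escapes spaces in
# names, so splitting is safe; only the "\(sp)" escape is decoded here.
sub file_list {
    my ($hdr, $key) = @_;
    return map  { my $f = $_; $f =~ s/\\\(sp\)/ /g; $f }
           grep { length }
           map  { split /\s+/ } @{ $hdr->{$key} || [] };
}

my ($hdr, $body) = parse_arch_log(
    "Summary: fix build\n",
    "New-files: a.c b\\(sp)c.h\n",
    "  d.txt\n",
    "\n",
    "Modified-files: this is body text, not a header\n",
);
print join('|', file_list($hdr, 'new_files')), "\n";
```

The `Modified-files:` line after the blank line ends up in `$body` rather than in the header hash, which is the behaviour the patch description calls "no longer confused by 'headers' after the first blank line".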
* [PATCH 7/9] Add the accurate changeset applyer 2005-11-24 7:53 ` [PATCH 6/9] safer log file parsing Eric Wong @ 2005-11-24 7:55 ` Eric Wong 2005-11-24 7:56 ` [PATCH 8/9] Fix a bug I introduced in the new log parser Eric Wong ` (2 more replies) 0 siblings, 3 replies; 39+ messages in thread From: Eric Wong @ 2005-11-24 7:55 UTC (permalink / raw) To: Martin Langhoff; +Cc: git list, Martin Langhoff And make it the default. This includes stats tracking to verbose mode Signed-off-by: Eric Wong <normalperson@yhbt.net> --- git-archimport.perl | 201 ++++++++++++++++++++++++++++++++++++++++++++------- 1 files changed, 172 insertions(+), 29 deletions(-) applies-to: aa9140057c95e59f65de0794f9054796fbfc96e5 32e5887eedb01ac4c398a06b0a1433ff6f4599fe diff --git a/git-archimport.perl b/git-archimport.perl index 8676f35..1cf1261 100755 --- a/git-archimport.perl +++ b/git-archimport.perl @@ -25,6 +25,9 @@ See man (1) git-archimport for more deta - audit shell-escaping of filenames - hide our private tags somewhere smarter - find a way to make "cat *patches | patch" safe even when patchfiles are missing newlines + - sort and apply patches by graphing ancestry relations instead of just + relying in dates supplied in the changeset itself. + tla ancestry-graph -m could be helpful here... =head1 Devel tricks @@ -66,18 +69,18 @@ my $git_dir = $ENV{"GIT_DIR"} || ".git"; $ENV{"GIT_DIR"} = $git_dir; my $ptag_dir = "$git_dir/archimport/tags"; -our($opt_h,$opt_v,$opt_T,$opt_t,$opt_D,$opt_a,$opt_o); +our($opt_h,$opt_f,$opt_v,$opt_T,$opt_t,$opt_D,$opt_a,$opt_o); sub usage() { print STDERR <<END; Usage: ${\basename $0} # fetch/update GIT from Arch - [ -o ] [ -h ] [ -v ] [ -T ] [ -a ] [ -D depth ] [ -t tempdir ] + [ -f ] [ -o ] [ -h ] [ -v ] [ -T ] [ -a ] [ -D depth ] [ -t tempdir ] repository/arch-branch [ repository/arch-branch] ... 
END exit(1); } -getopts("Thvat:D:") or usage(); +getopts("fThvat:D:") or usage(); usage if $opt_h; @ARGV >= 1 or usage(); @@ -95,6 +98,10 @@ my %reachable = (); # Arch r my %unreachable = (); # Arch repositories we can't access :< my @psets = (); # the collection my %psets = (); # the collection, by name +my %stats = ( # Track which strategy we used to import: + get_tag => 0, replay => 0, get_new => 0, get_delta => 0, + simple_changeset => 0, import_or_tag => 0 +); my %rptags = (); # my reverse private tags # to map a SHA1 to a commitid @@ -288,29 +295,69 @@ sub old_style_branchname { *git_branchname = $opt_o ? *old_style_branchname : *tree_dirname; -# process patchsets -foreach my $ps (@psets) { - $ps->{branch} = git_branchname($ps->{id}); - - # - # ensure we have a clean state - # - if (`git-diff-files`) { - die "Unclean tree when about to process $ps->{id} " . - " - did we fail to commit cleanly before?"; - } - die $! if $?; - - # - # skip commits already in repo - # - if (ptag($ps->{id})) { - $opt_v && print " * Skipping already imported: $ps->{id}\n"; - next; +sub process_patchset_accurate { + my $ps = shift; + + # switch to that branch if we're not already in that branch: + if (-e "$git_dir/refs/heads/$ps->{branch}") { + system('git-checkout','-f',$ps->{branch}) == 0 or die "$! $?\n"; + + # remove any old stuff that got leftover: + my $rm = safe_pipe_capture('git-ls-files','--others','-z'); + rmtree(split(/\0/,$rm)) if $rm; } + + # Apply the import/changeset/merge into the working tree + my $dir = sync_to_ps($ps); + # read the new log entry: + my @commitlog = safe_pipe_capture($TLA,'cat-log','-d',$dir,$ps->{id}); + die "Error in cat-log: $!" if $?; + chomp @commitlog; + + # grab variables we want from the log, new fields get added to $ps: + # (author, date, email, summary, message body ...) 
+ parselog($ps, \@commitlog); + + if ($ps->{id} =~ /--base-0$/ && $ps->{id} ne $psets[0]{id}) { + # this should work when importing continuations + if ($ps->{tag} && (my $branchpoint = eval { ptag($ps->{tag}) })) { + + # find where we are supposed to branch from + system('git-checkout','-f','-b',$ps->{branch}, + $branchpoint) == 0 or die "$! $?\n"; + + # remove any old stuff that got leftover: + my $rm = safe_pipe_capture('git-ls-files','--others','-z'); + rmtree(split(/\0/,$rm)) if $rm; - print " * Starting to work on $ps->{id}\n"; + # If we trust Arch with the fact that this is just + # a tag, and it does not affect the state of the tree + # then we just tag and move on + tag($ps->{id}, $branchpoint); + ptag($ps->{id}, $branchpoint); + print " * Tagged $ps->{id} at $branchpoint\n"; + return 0; + } else { + warn "Tagging from unknown id unsupported\n" if $ps->{tag}; + } + # allow multiple bases/imports here since Arch supports cherry-picks + # from unrelated trees + } + + # update the index with all the changes we got + system('git-ls-files --others -z | '. + 'git-update-index --add -z --stdin') == 0 or die "$! $?\n"; + system('git-ls-files --deleted -z | '. + 'git-update-index --remove -z --stdin') == 0 or die "$! $?\n"; + system('git-ls-files -z | '. + 'git-update-index -z --stdin') == 0 or die "$! $?\n"; + return 1; +} +# the native changeset processing strategy. This is very fast, but +# does not handle permissions or any renames involving directories +sub process_patchset_fast { + my $ps = shift; # # create the branch if needed # @@ -338,7 +385,7 @@ foreach my $ps (@psets) { tag($ps->{id}, $branchpoint); ptag($ps->{id}, $branchpoint); print " * Tagged $ps->{id} at $branchpoint\n"; - next; + return 0; } die $! 
if $?; } @@ -348,16 +395,17 @@ foreach my $ps (@psets) { # if ($ps->{type} eq 'i' || $ps->{type} eq 't') { apply_import($ps) or die $!; + $stats{import_or_tag}++; $import=0; } elsif ($ps->{type} eq 's') { apply_cset($ps); + $stats{simple_changeset}++; } # # prepare update git's index, based on what arch knows # about the pset, resolve parents, etc # - my $tree; my @commitlog = safe_pipe_capture($TLA,'cat-archive-log',$ps->{id}); die "Error in cat-archive-log: $!" if $?; @@ -404,14 +452,13 @@ foreach my $ps (@psets) { unless (-d dirname($to)) { mkpath(dirname($to)); # will die on err } - print "moving $from $to"; + # print "moving $from $to"; rename($from, $to) or die "Error renaming '$from' '$to': $!\n"; system('git-update-index','--remove','--',$from) == 0 or die "Error in git-update-index --remove: $! $?\n"; system('git-update-index','--add','--',$to) == 0 or die "Error in git-update-index --add: $! $?\n"; } - } if (my $mod = $ps->{modified_files}) { @@ -421,9 +468,46 @@ foreach my $ps (@psets) { die "Error in git-update-index: $! $?\n"; } } + return 1; # we successfully applied the changeset +} + +if ($opt_f) { + print "Will import patchsets using the fast strategy\n", + "Renamed directories and permission changes will be missed\n"; + *process_patchset = *process_patchset_fast; +} else { + print "Using the default (accurate) import strategy.\n", + "Things may be a bit slow\n"; + *process_patchset = *process_patchset_accurate; +} + +foreach my $ps (@psets) { + # process patchsets + $ps->{branch} = git_branchname($ps->{id}); + + # + # ensure we have a clean state + # + if (my $dirty = `git-diff-files`) { + die "Unclean tree when about to process $ps->{id} " . + " - did we fail to commit cleanly before?\n$dirty"; + } + die $! 
if $?; + # + # skip commits already in repo + # + if (ptag($ps->{id})) { + $opt_v && print " * Skipping already imported: $ps->{id}\n"; + return 0; + } + + print " * Starting to work on $ps->{id}\n"; + + process_patchset($ps) or next; + # warn "errors when running git-update-index! $!"; - $tree = `git-write-tree`; + my $tree = `git-write-tree`; die "cannot write tree $!" if $?; chomp $tree; @@ -494,6 +578,65 @@ foreach my $ps (@psets) { print " + commit $commitid\n"; $opt_v && print " + commit date is $ps->{date} \n"; $opt_v && print " + parents: ",join(' ',@par),"\n"; + if (my $dirty = `git-diff-files`) { + die "22 Unclean tree when about to process $ps->{id} " . + " - did we fail to commit cleanly before?\n$dirty"; + } +} + +if ($opt_v) { + foreach (sort keys %stats) { + print" $_: $stats{$_}\n"; + } +} +exit 0; + +# used by the accurate strategy: +sub sync_to_ps { + my $ps = shift; + my $tree_dir = $tmp.'/'.tree_dirname($ps->{id}); + + $opt_v && print "sync_to_ps($ps->{id}) method: "; + + if (-d $tree_dir) { + if ($ps->{type} eq 't') { + $opt_v && print "get (tag)\n"; + # looks like a tag-only or (worse,) a mixed tags/changeset branch, + # can't rely on replay to work correctly on these + rmtree($tree_dir); + safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir); + $stats{get_tag}++; + } else { + my $tree_id = arch_tree_id($tree_dir); + if ($ps->{parent_id} && ($ps->{parent_id} eq $tree_id)) { + # the common case (hopefully) + $opt_v && print "replay\n"; + safe_pipe_capture($TLA,'replay','-d',$tree_dir,$ps->{id}); + $stats{replay}++; + } else { + # getting one tree is usually faster than getting two trees + # and applying the delta ... 
+ rmtree($tree_dir); + $opt_v && print "apply-delta\n"; + safe_pipe_capture($TLA,'get','--no-pristine', + $ps->{id},$tree_dir); + $stats{get_delta}++; + } + } + } else { + # new branch work + $opt_v && print "get (new tree)\n"; + safe_pipe_capture($TLA,'get','--no-pristine',$ps->{id},$tree_dir); + $stats{get_new}++; + } + + # added -I flag to rsync since we're going to fast! AIEEEEE!!!! + system('rsync','-aI','--delete','--exclude',$git_dir, +# '--exclude','.arch-inventory', + '--exclude','.arch-ids','--exclude','{arch}', + '--exclude','+*','--exclude',',*', + "$tree_dir/",'./') == 0 or die "Cannot rsync $tree_dir: $! $?"; + return $tree_dir; } sub apply_import { @@ -896,7 +1039,7 @@ sub safe_pipe_capture { @output = (<$child>); close $child or die join(' ',@_).": $! $?"; } else { - exec(@_) or die $?; # exec() can fail the executable can't be found + exec(@_) or die "$! $?"; # exec() can fail the executable can't be found } return wantarray ? @output : join('',@output); } --- 0.99.9.GIT ^ permalink raw reply related [flat|nested] 39+ messages in thread
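In the accurate strategy, sync_to_ps() picks one of four ways to bring the temporary Arch tree up to a patchset and counts each one in %stats. The sketch below isolates just that decision as a pure function; choose_sync_method() is a hypothetical name, not part of the patch. Note that the branch counted as get_delta does not apply a delta: it removes the tree and runs a fresh tla get.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the branch structure of sync_to_ps().
# Returns the %stats key that the corresponding tla method is counted under.
sub choose_sync_method {
    my ($tree_exists, $ps_type, $parent_id, $tree_id) = @_;
    return 'get_new' unless $tree_exists;         # no tree yet: tla get
    return 'get_tag' if $ps_type eq 't';          # tags: rmtree + fresh tla get
    return 'replay'                               # tree is at the parent:
        if defined $parent_id && defined $tree_id # replay one changeset
        && $parent_id eq $tree_id;
    return 'get_delta';                           # out of date: rmtree + tla get
}
```

A caller would then do something like $stats{ choose_sync_method(...) }++ before running the matching tla command.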
* [PATCH 8/9] Fix a bug I introduced in the new log parser
From: Eric Wong @ 2005-11-24 7:56 UTC
To: Martin Langhoff; +Cc: git list, Martin Langhoff

This fixes the case (that worked originally in Martin's version)
where the only new/modified files are Arch control files.

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

applies-to: db1362fe6567e349ff9dd9d70ce23c88a42a0ff2
ebe0689722f6c1440e680ec9a235b3dd571c7de0
diff --git a/git-archimport.perl b/git-archimport.perl
index 1cf1261..0080850 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -790,7 +790,7 @@ sub parselog {
     # skip Arch control files, unescape pika-escaped files
     foreach my $k (keys %want_headers) {
         next unless (defined $ps->{$k});
-        my @tmp;
+        my @tmp = ();
         foreach my $t (@{$ps->{$k}}) {
             next unless length ($t);
             next if $t =~ m!\{arch\}/!;
@@ -804,7 +804,7 @@ sub parselog {
             }
             push @tmp, $t;
         }
-        $ps->{$k} = \@tmp if scalar @tmp;
+        $ps->{$k} = \@tmp;
     }
 }

---
0.99.9.GIT
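Why the one-line change matters: with the "if scalar @tmp" guard, a header whose entries were all Arch control files kept its original, unfiltered list. (The "= ()" initialisation is redundant, since my @tmp already starts empty on each iteration; the substantive change is dropping the guard.) A minimal sketch reproducing both versions, using the control-file regexes from the patch and a hypothetical drop_control_files() helper; pika unescaping is left out:

```perl
use strict;
use warnings;

# Drop empty names and Arch control files (same regexes as parselog()).
sub drop_control_files {
    return grep {
        length($_)
            && $_ !~ m!\{arch\}/!
            && $_ !~ m!\.arch-ids/!
            && $_ !~ m!\.arch-inventory$!
    } @_;
}

# A changeset whose only "new files" are Arch control files.
my %ps = (new_files => ['{arch}/=tagging-method', 'src/.arch-ids/foo.id']);

# Buggy version: every entry is filtered out, so the assignment is skipped
# and the unfiltered list (control files included) survives.
my @tmp = drop_control_files(@{ $ps{new_files} });
my %buggy = %ps;
$buggy{new_files} = \@tmp if scalar @tmp;

# Fixed version: always store the filtered, possibly empty, list.
my %fixed = %ps;
$fixed{new_files} = [ drop_control_files(@{ $ps{new_files} }) ];
```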
* [PATCH 9/9] fix a in new changeset applyer addition
From: Eric Wong @ 2005-11-24 7:58 UTC
To: Martin Langhoff; +Cc: git list, Martin Langhoff

Fix a stupid bug I introduced when splitting the accurate and fast
changeset appliers. Also, remove an old debugging statement I added

Signed-off-by: Eric Wong <normalperson@yhbt.net>

---

 git-archimport.perl | 6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

applies-to: 6dfed0cb7c209cf47902d6dfcd02a974d252041b
b081cb1e0f79f1a290bcf1f2161d63415ec5e2a9
diff --git a/git-archimport.perl b/git-archimport.perl
index 0080850..aab4e38 100755
--- a/git-archimport.perl
+++ b/git-archimport.perl
@@ -499,7 +499,7 @@ foreach my $ps (@psets) {
     #
     if (ptag($ps->{id})) {
         $opt_v && print " * Skipping already imported: $ps->{id}\n";
-        return 0;
+        next;
     }

     print " * Starting to work on $ps->{id}\n";
@@ -578,10 +578,6 @@ foreach my $ps (@psets) {
     print " + commit $commitid\n";
     $opt_v && print " + commit date is $ps->{date} \n";
     $opt_v && print " + parents: ",join(' ',@par),"\n";
-    if (my $dirty = `git-diff-files`) {
-        die "22 Unclean tree when about to process $ps->{id} " .
-            " - did we fail to commit cleanly before?\n$dirty";
-    }
 }

 if ($opt_v) {
---
0.99.9.GIT
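The return 0 removed here sat directly in the top-level foreach loop rather than inside a sub, so Perl would die with "Can't return outside a subroutine" the first time an already-imported patchset was found on a re-run; next is the way to skip one iteration. A minimal sketch of the corrected loop shape, with a hypothetical %imported hash standing in for ptag() lookups:

```perl
use strict;
use warnings;

# %imported stands in for ptag(): ids that already have a private tag.
my %imported = ('proj--main--1.0--base-0' => 1);

my @todo;
foreach my $id ('proj--main--1.0--base-0', 'proj--main--1.0--patch-1') {
    # `return` here would be a fatal error: this loop is not inside a sub.
    next if $imported{$id};
    push @todo, $id;
}
```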
* Re: [PATCH 7/9] Add the accurate changeset applyer
From: Martin Langhoff @ 2005-11-27 4:24 UTC
To: Eric Wong; +Cc: git list, Martin Langhoff

On 11/24/05, Eric Wong <normalperson@yhbt.net> wrote:
> And make it the default.

Cheeky, but right ;-)

Would it be a good idea to read the log entry and decide what kind of
smarts do we need to apply the changeset? If the log entry looks
plain, use process_patchset_fast(), else invoke $TLA?

cheers,


martin
* Re: [PATCH 7/9] Add the accurate changeset applyer
From: Eric Wong @ 2005-11-27 5:43 UTC
To: Martin Langhoff; +Cc: git list, Martin Langhoff

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> On 11/24/05, Eric Wong <normalperson@yhbt.net> wrote:
> > And make it the default.
>
> Cheeky, but right ;-)
>
> Would it be a good idea to read the log entry and decide what kind of
> smarts do we need to apply the changeset? If the log entry looks
> plain, use process_patchset_fast(), else invoke $TLA?

This could work. For it to work efficiently, process_patchset_fast()
should probably be modified to work on real Arch trees and rsync with
the git one. Basically, we can replace the bulk of the tla replay calls
with your fast changeset applier. Once the fast mode hits a changeset
it can't handle, it can do a tla replay on a single changeset instead
of having to do a slow get/apply-delta on an out-of-date tree.

process_patchset_fast() must understand how to handle permissions
changes, though, as Arch log entries are completely useless for that.

Unfortunately, doing this right and fast probably still requires more
time than it's worth. Let's face it, trees with 12k files are
extremely rare in the Arch world (as are trees constantly reorganized
by obsessive-compulsives :), but many trees do get a small handful of
directory renames in their lifetime.

--
Eric Wong
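Martin's suggestion and Eric's refinement amount to a per-changeset choice between the fast applier and a single tla replay. The sketch below shows one possible routing rule under stated assumptions: $ps is the hash that parselog() fills in, and pick_applier() and its return values are hypothetical, not code from the thread. As Eric points out, permission changes cannot be detected from the log, so a rule like this cannot route them.

```perl
use strict;
use warnings;

# Hypothetical router for the hybrid scheme: fast applier by default,
# a single `tla replay` when the log shows something the fast path mishandles.
sub pick_applier {
    my ($ps) = @_;
    # The fast applier does not handle renames involving directories.
    my $dirs = $ps->{renamed_directories};
    return 'tla_replay' if $dirs && @$dirs;
    # Caveat: Arch log entries say nothing about permission changes,
    # so no log-based check can send those to tla.
    return 'fast';
}
```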
* Re: [PATCH 7/9] Add the accurate changeset applyer
From: Martin Langhoff @ 2005-12-01 17:02 UTC
To: Eric Wong; +Cc: git list, Martin Langhoff

Eric,

My test results are a bit of a mixed bag. On one hand, I'm satisfied
that both fast and correct imports reach the same tree (minus file
modes) for the same commit with the arch repos I imported.

On the other hand, with my "moodle" repo, the 'correct' import seems
to have stopped importing a lot earlier than it should have. I am
re-running it now to try to continue from where it left off, but it's
unclear why it abandoned -- I didn't see any error. How widely have
you tested this method?

cheers,


martin
* Re: [PATCH 7/9] Add the accurate changeset applyer
From: Eric Wong @ 2005-12-03 2:51 UTC
To: Martin Langhoff; +Cc: git list, Martin Langhoff

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> Eric,
>
> My test results are a bit of a mixed bag. On one hand, I'm satisfied
> that both fast and correct imports reach the same tree (minus file
> modes) for the same commit with the arch repos I imported.
>
> On the other hand, with my "moodle" repo, the 'correct' import seems
> to have stopped importing a lot earlier than it should have. I am
> re-running it now to try to continue from where it left off, but it's
> unclear why it abandoned -- I didn't see any error. How widely have
> you tested this method?

This was from the moodle repo I archive-mirrored locally a few weeks
ago for testing:

  get_new: 6
  get_tag: 0
  import_or_tag: 0
  replay: 356

Rerunning it doesn't seem to pull anymore. IIRC, my previous times
only imported around ~150 patchsets. The time it took to run this
was certainly longer than the last run (~4 hours here, vs ~2 hours
I mentioned in <20051124074605.GA4789@mail.yhbt.net>), so there may
be a bug somewhere... Unfortunately, I no longer have those old
trees around.

I've imported several trees with >1000 revisions without problems,
mpd-uclinux is among them:

  http://mpd.bogomips.org/mpd-uclinux.git/

--
Eric Wong
* Re: [PATCH 7/9] Add the accurate changeset applyer
From: Martin Langhoff @ 2005-12-05 18:53 UTC
To: Eric Wong; +Cc: git list, Martin Langhoff

On 12/3/05, Eric Wong <normalperson@yhbt.net> wrote:
> Rerunning it doesn't seem to pull anymore. IIRC, my previous times
> only imported around ~150 patchsets. The time it took to run this
> was certainly longer than the last run (~4 hours here, vs ~2 hours
> I mentioned in <20051124074605.GA4789@mail.yhbt.net>), so there may
> be a bug somewhere... Unfortunately, I no longer have those old
> trees around.
>
> I've imported several trees with >1000 revisions without problems,
> mpd-uclinux is among them:
>
> http://mpd.bogomips.org/mpd-uclinux.git/

Haven't been able to retest an import and have it finish without my
ssh session dropping (should have used GNU screen). I'll be able to
test it more thoroughly in a couple of days. Very sorry about the delay.

cheers,


martin
* Re: [PATCH 4/9] remove git wrapper dependency
From: Andreas Ericsson @ 2005-11-24 8:20 UTC
To: git list

Eric Wong wrote:
> use git-diff-files instead of git diff-files so we don't rely on the
> wrapper being installed (some people may have git as GNU interactive
> tools :)
>

This one should do

	git --exec-path

first to get the proper path to git-diff-files. Fall back to it being in
the path if finding out fails.

--
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
* Re: [PATCH 4/9] remove git wrapper dependency
From: Junio C Hamano @ 2005-11-24 8:35 UTC
To: git

Andreas Ericsson <ae@op5.se> writes:

> Eric Wong wrote:
>> use git-diff-files instead of git diff-files so we don't rely on the
>> wrapper being installed (some people may have git as GNU interactive
>> tools :)
>>
>
> This one should do
> 	git --exec-path
>
> first to get the proper path to git-diff-files. Fall back to it being in
> the path if finding out fails.

Eric is worried about the case where git on your PATH is GNU
interactive tools, so "git --exec-path" would not give you what
you want ;-).
* Re: [PATCH 4/9] remove git wrapper dependency
From: Eric Wong @ 2005-11-24 8:50 UTC
To: git

Junio C Hamano <junkio@cox.net> wrote:
> Andreas Ericsson <ae@op5.se> writes:
>
> > Eric Wong wrote:
> >> use git-diff-files instead of git diff-files so we don't rely on the
> >> wrapper being installed (some people may have git as GNU interactive
> >> tools :)
> >>
> >
> > This one should do
> > 	git --exec-path
> >
> > first to get the proper path to git-diff-files. Fall back to it being in
> > the path if finding out fails.
>
> Eric is worried about the case where git on your PATH is GNU
> interactive tools, so "git --exec-path" would not give you what
> you want ;-).

Right on. I'm actually not a GNU interactive tools user, but I do have
empathy for them, being a cg (cgvg) user myself for many, many years.

--
Eric Wong
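For reference, Andreas's suggestion with Junio's caveat folded in could look like the sketch below: accept the directory printed by git --exec-path only if it really contains an executable git-diff-files, and otherwise fall back to a plain PATH lookup. find_git_tool() is a hypothetical name; running and chomping git --exec-path is left to the caller.

```perl
use strict;
use warnings;
use File::Spec;

# Return a full path to $tool inside $exec_path if it exists and is
# executable there; otherwise return the bare name for a $PATH lookup.
# If `git` on $PATH is GNU Interactive Tools, its output will not name a
# directory containing git-diff-files, so the -x check fails safely.
sub find_git_tool {
    my ($tool, $exec_path) = @_;
    if (defined $exec_path && length $exec_path) {
        my $candidate = File::Spec->catfile($exec_path, $tool);
        return $candidate if -x $candidate;
    }
    return $tool;
}
```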
* Re: [PATCH 1/9] archimport: first, make sure it still compiles
From: Linus Torvalds @ 2005-11-24 18:54 UTC
To: Eric Wong; +Cc: Martin Langhoff, git list, Martin Langhoff


Eric,
I don't know about Junio, but if I were him, I'd have preferred that all
your patches had a

	archimport: ..

prefix in the subject line, not just the first one.

For example, if you just merge the patches as-is now, and then look at the
end result with gitk (or any of the tools that show the shortlog format:
just the first line of the commit), you get explanations like

	fix -t tmpdir switch

which is clearly _correct_, but it's much nicer if they show which area
was implied, ie

	archimport: fix -t tmpdir switch

so that you can tell from the shortlog whether it was a "global" change,
or something that affected a specific program.

Just a suggestion,

		Linus
* Re: [PATCH 1/9] archimport: first, make sure it still compiles
From: Martin Langhoff @ 2005-11-26 10:51 UTC
To: Linus Torvalds; +Cc: Eric Wong, git list, Martin Langhoff

On 11/24/05, Linus Torvalds <torvalds@osdl.org> wrote:
> I don't know about Junio, but if I were him, I'd have preferred that all
> your patches had a
>
> 	archimport: ..
>
> prefix in the subject line, not just the first one.

Good catch -- I'll prefix them all as I merge them. If Junio pulls
from my tree, he'll get them prefixed.

cheers,


martin
* Re: [PATCH 1/9] archimport: first, make sure it still compiles
From: Eric Wong @ 2005-11-26 20:43 UTC
To: Linus Torvalds; +Cc: Martin Langhoff, git list, Martin Langhoff

Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> Eric,
> I don't know about Junio, but if I were him, I'd have preferred that all
> your patches had a
>
> 	archimport: ..
>
> prefix in the subject line, not just the first one.

Good idea, will do for future patches.

--
Eric Wong
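The subject convention Linus describes is easy to apply mechanically when preparing a patch series. A small sketch, with a hypothetical add_area_prefix() helper that leaves already-prefixed subjects alone:

```perl
use strict;
use warnings;

# Prepend "<area>: " to a one-line commit subject unless it is already there,
# so shortlog/gitk summaries show which program a change touched.
sub add_area_prefix {
    my ($area, $subject) = @_;
    return $subject if $subject =~ /^\Q$area\E:\s/;   # already prefixed
    return "$area: $subject";
}
```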
* Re: [PATCH] archimport improvements
From: Martin Langhoff @ 2005-11-24 9:25 UTC
To: Eric Wong; +Cc: git list

On 11/24/05, Eric Wong <normalperson@yhbt.net> wrote:
> Ok, I didn't expect you guys to have 12k of files in your trees. None
> of your source trees are remotely close to that size (but I have many
> more changesets). I'm surprised you guys were able to put up
> with Arch in the first place!
>
> 125m58.431s with my method.
> 8m24.504s with yours :)
>
> All of my usual source trees imported 1k changesets in 10-15 minutes :-)

I'm happy that you managed to wait patiently for it to complete -- all
my attempts to run your import code were ended by a sleepy ctrl-c.

> Patches on the way.

Cool -- will review, but may take a couple days, as I'm away from
home this week.

> OTOH, the time spent importing the bulk of the history is a one-time
> operation for most people and I'd much rather it get things as right as
> possible and move on.

Hmmm. Some teams -- such as mine -- just run it every couple hours to
maintain an Arch2cvs gateway.

More later,


martin