* [PATCH] cvsimport: keep one index per branch during import
@ 2006-06-12 11:50 Martin Langhoff
From: Martin Langhoff @ 2006-06-12 11:50 UTC
To: junkio, git; +Cc: Martin Langhoff
With this patch we have a speedup and much lower IO when
importing trees with many branches. Instead of forcing
index re-population for each branch switch, we keep
many index files around, one per branch.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
---
This patch should get some review. It is trivial, but not fully tested.
I am testing it on the Mozilla repo (which will take a while) to check that I get
the same result with and without it.
Performance-wise, it seems to be doing ~15K commits per hour on the
Mozilla repo, up from ~6K commits per hour on the same hardware (roughly
2.5x). Of course, this is only noticeable in projects with lots of
concurrent branches; linear projects gain little from this patch.
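The patch repeats the same tempfile-then-set-GIT_INDEX_FILE sequence in three places. Here is a minimal sketch of how that bookkeeping could be folded into one helper. It assumes a hypothetical `switch_index` function that is not part of git-cvsimport. The helper only manages the temp files and GIT_INDEX_FILE, and leaves running git-read-tree to the caller. The caller needs git-read-tree only when the helper reports a freshly created index, as in the third hunk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Spec;

# Branch name => path of that branch's private index file.
my %index;

# Hypothetical helper (not in git-cvsimport): point GIT_INDEX_FILE at the
# index kept for $branch, creating an empty temp file the first time the
# branch is seen. Returns true when the index was just created, i.e. when
# the caller still has to populate it with git-read-tree.
sub switch_index {
    my ($branch) = @_;
    my $is_new = 0;
    unless (exists $index{$branch}) {
        my ($fh, $fn) = tempfile('gitXXXXXX', SUFFIX => '.idx',
                                 DIR => File::Spec->tmpdir());
        close($fh);
        $index{$branch} = $fn;
        $is_new = 1;
    }
    $ENV{GIT_INDEX_FILE} = $index{$branch};
    return $is_new;
}

# Remove every per-branch index once the import has finished.
sub cleanup_indexes {
    unlink(values %index);
    %index = ();
}

# Usage: only the first visit to a branch needs a read-tree; later
# switches just repoint GIT_INDEX_FILE at the existing index.
for my $branch (qw(origin topic origin topic)) {
    if (switch_index($branch)) {
        print "new index for $branch: would run git-read-tree $branch\n";
    } else {
        print "reusing index for $branch\n";
    }
}
cleanup_indexes();
```

Returning a flag instead of calling git-read-tree inside the helper keeps the sketch free of git calls. It also keeps the existing `die "read-tree failed"` handling at the call site.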
With this change, we are now truly waiting on cvs to hand over the
files! Running locally, it is apparent that what makes this slow isn't
IO wait but the latency of the chatty cvs protocol.
Forking 2 or 3 processes to prefetch file revisions from cvs and put
them in a queue directory for the main process to pick up would
probably work wonders. Better still, the workers could call
git-hash-object themselves and put only the file metadata in the queue
directory.
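Here is a rough sketch of that prefetch idea, under stated assumptions. `fetch_filerev` is a hypothetical stub standing in for the cvs round trip. `blob_id` only computes the id git-hash-object would print (SHA-1 of "blob <size>\0<data>") and does not write the object. Forked workers split the file revisions. Each queue entry is written under a temporary name and then renamed, so the main process never reads a half-written file.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Digest::SHA qw(sha1_hex);
use POSIX ();

# Hypothetical stub: the real importer would fetch this file revision
# over the cvs protocol, which is the slow, chatty part.
sub fetch_filerev {
    my ($path, $rev) = @_;
    return "contents of $path at $rev\n";
}

# Id that git-hash-object prints for $data: SHA-1 of "blob <size>\0<data>".
# Unlike git-hash-object -w, this does not store the object.
sub blob_id {
    my ($data) = @_;
    return sha1_hex("blob " . length($data) . "\0" . $data);
}

# Fork $nworkers children; worker $w handles jobs $w, $w+$nworkers, ...
# and publishes one "path<TAB>rev<TAB>sha" entry per job in $queue,
# writing to a temp name first and renaming it into place.
sub prefetch {
    my ($queue, $nworkers, @jobs) = @_;
    my @pids;
    for my $w (0 .. $nworkers - 1) {
        my $pid = fork();
        die "fork failed: $!\n" unless defined $pid;
        if ($pid == 0) {
            for (my $j = $w; $j < @jobs; $j += $nworkers) {
                my ($path, $rev) = @{ $jobs[$j] };
                my $sha = blob_id(fetch_filerev($path, $rev));
                my $tmp = File::Spec->catfile($queue, ".tmp.$j");
                open(my $fh, '>', $tmp) or die "open $tmp: $!\n";
                print $fh "$path\t$rev\t$sha\n";
                close($fh) or die "close $tmp: $!\n";
                rename($tmp, File::Spec->catfile($queue, "job.$j"))
                    or die "rename $tmp: $!\n";
            }
            POSIX::_exit(0);   # skip END blocks and inherited stdio buffers
        }
        push @pids, $pid;
    }
    return @pids;
}

# Main-process side: wait (up to ~10 s) for the entry of job $j,
# consume it, and return (path, rev, sha).
sub take_job {
    my ($queue, $j) = @_;
    my $file = File::Spec->catfile($queue, "job.$j");
    for (1 .. 200) {
        if (-e $file) {
            open(my $fh, '<', $file) or die "open $file: $!\n";
            chomp(my $line = <$fh>);
            close($fh);
            unlink($file);
            return split /\t/, $line;
        }
        select(undef, undef, undef, 0.05);   # sleep 50 ms
    }
    die "timed out waiting for job $j\n";
}

# Usage: two workers prefetch while the main process consumes in order.
my $queue = tempdir(CLEANUP => 1);
my @jobs = (['Makefile', '1.96'], ['Makefile', '1.96.2.1'], ['README', '1.3']);
my @pids = prefetch($queue, 2, @jobs);
for my $j (0 .. $#jobs) {
    my ($path, $rev, $sha) = take_job($queue, $j);
    print "$path $rev $sha\n";
}
waitpid($_, 0) for @pids;
```

The main process still consumes entries in commit order. It only blocks when the worker responsible for the next file revision has not caught up yet.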
---
git-cvsimport.perl | 37 ++++++++++++++++++++++++++++++-------
1 files changed, 30 insertions(+), 7 deletions(-)
diff --git a/git-cvsimport.perl b/git-cvsimport.perl
old mode 100755
new mode 100644
index 76f6246..9c4588f
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -465,10 +465,15 @@ my $git_dir = $ENV{"GIT_DIR"} || ".git";
$ENV{"GIT_DIR"} = $git_dir;
my $orig_git_index;
$orig_git_index = $ENV{GIT_INDEX_FILE} if exists $ENV{GIT_INDEX_FILE};
-my ($git_ih, $git_index) = tempfile('gitXXXXXX', SUFFIX => '.idx',
- DIR => File::Spec->tmpdir());
-close ($git_ih);
-$ENV{GIT_INDEX_FILE} = $git_index;
+
+my %index; # holds filenames of one index per branch
+{ # init with an index for origin
+ my ($fh, $fn) = tempfile('gitXXXXXX', SUFFIX => '.idx',
+ DIR => File::Spec->tmpdir());
+ close ($fh);
+ $index{$opt_o} = $fn;
+}
+$ENV{GIT_INDEX_FILE} = $index{$opt_o};
unless(-d $git_dir) {
system("git-init-db");
die "Cannot init the GIT db at $git_tree: $?\n" if $?;
@@ -496,6 +501,13 @@ unless(-d $git_dir) {
$tip_at_start = `git-rev-parse --verify HEAD`;
# populate index
+ unless ($index{$last_branch}) {
+ my ($fh, $fn) = tempfile('gitXXXXXX', SUFFIX => '.idx',
+ DIR => File::Spec->tmpdir());
+ close ($fh);
+ $index{$last_branch} = $fn;
+ }
+ $ENV{GIT_INDEX_FILE} = $index{$last_branch};
system('git-read-tree', $last_branch);
die "read-tree failed: $?\n" if $?;
@@ -776,8 +788,17 @@ while(<CVS>) {
}
if(($ancestor || $branch) ne $last_branch) {
print "Switching from $last_branch to $branch\n" if $opt_v;
- system("git-read-tree", $branch);
- die "read-tree failed: $?\n" if $?;
+ unless ($index{$branch}) {
+ my ($fh, $fn) = tempfile('gitXXXXXX', SUFFIX => '.idx',
+ DIR => File::Spec->tmpdir());
+ close ($fh);
+ $index{$branch} = $fn;
+ $ENV{GIT_INDEX_FILE} = $index{$branch};
+ system("git-read-tree", $branch);
+ die "read-tree failed: $?\n" if $?;
+ } else {
+ $ENV{GIT_INDEX_FILE} = $index{$branch};
+ }
}
$last_branch = $branch if $branch ne $last_branch;
$state = 9;
@@ -841,7 +862,9 @@ # VERSION:1.96->1.96.2.1
}
commit() if $branch and $state != 11;
-unlink($git_index);
+foreach my $git_index (values %index) {
+ unlink($git_index);
+}
if (defined $orig_git_index) {
$ENV{GIT_INDEX_FILE} = $orig_git_index;
--
1.4.0.g5fba