* [PATCH] git-svn: cache max revision in rev_db databases
From: Sam Vilain @ 2007-06-10 9:04 UTC
  To: Eric Wong; +Cc: git

Cache the maximum revision for each rev_db URL rather than looking it
up each time.  This saves a lot of time when rebuilding indexes on a
freshly cloned repository.

Signed-off-by: Sam Vilain <sam@vilain.net>
---
 git-svn.perl |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 610563c..c9758a0 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -804,6 +804,7 @@ sub working_head_info {
 	my ($head, $refs) = @_;
 	my ($fh, $ctx) = command_output_pipe('log', $head);
 	my $hash;
+	my %max;
 	while (<$fh>) {
 		if ( m{^commit ($::sha1)$} ) {
 			$hash = $1;
@@ -812,11 +813,14 @@ sub working_head_info {
 		next unless s{^\s+(git-svn-id:)}{$1};
 		my ($url, $rev, $uuid) = extract_metadata($_);
 		if (defined $url && defined $rev) {
+			next if $max{$url} and $max{$url} < $rev;
 			if (my $gs = Git::SVN->find_by_url($url)) {
 				my $c = $gs->rev_db_get($rev);
 				if ($c && $c eq $hash) {
 					close $fh; # break the pipe
 					return ($url, $rev, $uuid, $gs);
+				} else {
+					$max{$url} ||= $gs->rev_db_max;
 				}
 			}
 		}
-- 
1.5.0.4.210.gf8a7c-dirty
* Re: [PATCH] git-svn: cache max revision in rev_db databases
From: Eric Wong @ 2007-06-10 21:47 UTC
  To: Sam Vilain; +Cc: git

Sam Vilain <sam@vilain.net> wrote:
> Cache the maximum revision for each rev_db URL rather than looking it
> up each time.  This saves a lot of time when rebuilding indexes on a
> freshly cloned repository.

It looks correct, but I'm not sure how often we can even hit this
optimization to make it worth it.

Any live repositories and benchmarks you've run this on?
(without the rev-list => log patch applied, which would be a big
performance improvement if all log messages were under 16k).

> Signed-off-by: Sam Vilain <sam@vilain.net>
> ---
>  git-svn.perl |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/git-svn.perl b/git-svn.perl
> index 610563c..c9758a0 100755
> --- a/git-svn.perl
> +++ b/git-svn.perl
> @@ -804,6 +804,7 @@ sub working_head_info {
>  	my ($head, $refs) = @_;
>  	my ($fh, $ctx) = command_output_pipe('log', $head);
>  	my $hash;
> +	my %max;
>  	while (<$fh>) {
>  		if ( m{^commit ($::sha1)$} ) {
>  			$hash = $1;
> @@ -812,11 +813,14 @@ sub working_head_info {
>  		next unless s{^\s+(git-svn-id:)}{$1};
>  		my ($url, $rev, $uuid) = extract_metadata($_);
>  		if (defined $url && defined $rev) {
> +			next if $max{$url} and $max{$url} < $rev;
>  			if (my $gs = Git::SVN->find_by_url($url)) {
>  				my $c = $gs->rev_db_get($rev);
>  				if ($c && $c eq $hash) {
>  					close $fh; # break the pipe
>  					return ($url, $rev, $uuid, $gs);
> +				} else {
> +					$max{$url} ||= $gs->rev_db_max;
>  				}
>  			}
>  		}

-- 
Eric Wong
* Re: [PATCH] git-svn: cache max revision in rev_db databases
From: Sam Vilain @ 2007-06-11 11:43 UTC
  To: Eric Wong; +Cc: git

Eric Wong wrote:
> Sam Vilain <sam@vilain.net> wrote:
>
>> Cache the maximum revision for each rev_db URL rather than looking it
>> up each time.  This saves a lot of time when rebuilding indexes on a
>> freshly cloned repository.
>>
>
> It looks correct, but I'm not sure how often we can even hit this
> optimization to make it worth it.
>
> Any live repositories and benchmarks you've run this on?
> (without the rev-list => log patch applied, which would be a big
> performance improvement if all log messages were under 16k).
>

It may not make a huge difference with the current code, but it might
with a different rev_db back-end with higher connection cost, and it
seemed like an obvious and innocuous enough memoization.

Sam
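The memoization discussed in the messages above can be sketched in isolation. This is only an illustration under stated assumptions: `FakeRevDB` and `find_known_rev` are hypothetical stand-ins invented here and are not part of git-svn; only the two `%max` lines mirror the patch. The stand-in counts its calls so the effect of the cache is visible.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a rev_db back-end; NOT the real Git::SVN class.
package FakeRevDB;

sub new {
    my ($class, %map) = @_;
    return bless { map => {%map}, get_calls => 0, max_calls => 0 }, $class;
}

# Look up the commit hash recorded for a revision (undef if unknown).
sub rev_db_get {
    my ($self, $rev) = @_;
    $self->{get_calls}++;
    return $self->{map}{$rev};
}

# Highest revision recorded in the database.
sub rev_db_max {
    my ($self) = @_;
    $self->{max_calls}++;
    my ($max) = sort { $b <=> $a } keys %{ $self->{map} };
    return $max;
}

package main;

# Walk [url, rev, hash] records newest-first and return the first one
# whose hash matches the database, skipping revisions above the cached max.
sub find_known_rev {
    my ($dbs, @records) = @_;
    my %max;    # per-URL cache of rev_db_max, as in the patch
    for my $r (@records) {
        my ($url, $rev, $hash) = @$r;
        next if $max{$url} and $max{$url} < $rev;    # cannot be in the db
        my $db = $dbs->{$url} or next;
        my $c = $db->rev_db_get($rev);
        return ($url, $rev) if $c && $c eq $hash;
        $max{$url} ||= $db->rev_db_max;              # looked up once per URL
    }
    return;
}

my $db = FakeRevDB->new(1 => 'aaa', 2 => 'bbb');
my @found = find_known_rev(
    { 'svn://example.invalid/trunk' => $db },
    [ 'svn://example.invalid/trunk', 5, 'eee' ],
    [ 'svn://example.invalid/trunk', 4, 'ddd' ],
    [ 'svn://example.invalid/trunk', 2, 'bbb' ],
);
print "found r$found[1] after $db->{get_calls} lookups\n";
```

In this sketch r4 is skipped without a lookup because the cached maximum (2) is lower. As in the patch, `||=` caches only a true value, so a maximum of 0 would be fetched again on the next miss.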
* a bunch of outstanding updates
From: Sam Vilain @ 2007-06-30 8:56 UTC
  To: Junio C Hamano; +Cc: git

Following up to this e-mail are a whole load of outstanding feature
requests of mine.

These changes are relatively mundane:

  * repack: improve documentation on -a option
  * git-remote: document -n
  * git-remote: allow 'git-remote fetch' as a synonym for 'git fetch'
  * git-svn: use git-log rather than rev-list | xargs cat-file
  * git-svn: cache max revision in rev_db databases

This one will impact on the version displayed by "git --version", but
I think this is for the better:

  * GIT-VERSION-GEN: don't convert - delimiter to .'s

These ones are really only very minor updates based on feedback so
far:

  * git-merge-ff: fast-forward only merge
  * git-mergetool: add support for ediff

This one is just the previously posted hook script put into the
templates directory, let me know if you'd rather I reshaped it to go
into contrib/hooks:

  * contrib/hooks: add post-update hook for updating working copy

This one probably needs a bit more consideration and review, could
perhaps sit on pu.

  * git-repack: generational repacking (and example hook script)
* [PATCH] repack: improve documentation on -a option
From: Sam Vilain @ 2007-06-30 8:56 UTC
  To: Junio C Hamano; +Cc: git, Sam Vilain

Some minor enhancements to the git-repack manual page.

Signed-off-by: Sam Vilain <sam.vilain@catalyst.net.nz>
---
 Documentation/git-repack.txt |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index c33a512..be8e5f8 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -14,7 +14,8 @@ DESCRIPTION
 -----------
 
 This script is used to combine all objects that do not currently
-reside in a "pack", into a pack.
+reside in a "pack", into a pack.  It can also be used to re-organise
+existing packs into a single, more efficient pack.
 
 A pack is a collection of objects, individually compressed, with
 delta compression applied, stored in a single file, with an
@@ -28,11 +29,13 @@ OPTIONS
 
 -a::
 	Instead of incrementally packing the unpacked objects,
-	pack everything available into a single pack.
+	pack everything referenced into a single pack.
 	Especially useful when packing a repository that is used
-	for private development and there is no need to worry
-	about people fetching via dumb file transfer protocols
-	from it.  Use with '-d'.
+	for private development and there no need to worry
+	about people fetching via dumb protocols from it.  Use
+	with '-d'.  This will clean up the objects that `git prune`
+	leaves behind, but `git fsck-objects --full` shows as
+	dangling.
 
 -d::
 	After packing, if the newly created packs make some
-- 
1.5.2.1.1131.g3b90
* [PATCH] git-svn: use git-log rather than rev-list | xargs cat-file
From: Sam Vilain @ 2007-06-30 8:56 UTC
  To: Junio C Hamano; +Cc: git, Sam Vilain, Sam Vilain

From: Sam Vilain <sam@vilain.net>

This saves a bit of time when rebuilding the git-svn index.

Signed-off-by: Sam Vilain <sam.vilain@catalyst.net.nz>
---
 git-svn.perl |   36 ++++++++++++++++++++++--------------
 1 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 3033b50..556cd7d 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -782,12 +782,12 @@ sub read_repo_config {
 
 sub extract_metadata {
 	my $id = shift or return (undef, undef, undef);
-	my ($url, $rev, $uuid) = ($id =~ /^git-svn-id:\s(\S+?)\@(\d+)
+	my ($url, $rev, $uuid) = ($id =~ /^\s*git-svn-id:\s+(.*)\@(\d+)
 							\s([a-f\d\-]+)$/x);
 	if (!defined $rev || !$uuid || !$url) {
 		# some of the original repositories I made had
 		# identifiers like this:
-		($rev, $uuid) = ($id =~/^git-svn-id:\s(\d+)\@([a-f\d\-]+)/);
+		($rev, $uuid) = ($id =~/^\s*git-svn-id:\s(\d+)\@([a-f\d\-]+)/);
 	}
 	return ($url, $rev, $uuid);
 }
@@ -799,10 +799,16 @@ sub cmt_metadata {
 
 sub working_head_info {
 	my ($head, $refs) = @_;
-	my ($fh, $ctx) = command_output_pipe('rev-list', $head);
-	while (my $hash = <$fh>) {
-		chomp($hash);
-		my ($url, $rev, $uuid) = cmt_metadata($hash);
+	my ($fh, $ctx) = command_output_pipe('log', $head);
+	my $hash;
+	while (<$fh>) {
+		if ( m{^commit ($::sha1)$} ) {
+			unshift @$refs, $hash if $hash and $refs;
+			$hash = $1;
+			next;
+		}
+		next unless s{^\s*(git-svn-id:)}{$1};
+		my ($url, $rev, $uuid) = extract_metadata($_);
 		if (defined $url && defined $rev) {
 			if (my $gs = Git::SVN->find_by_url($url)) {
 				my $c = $gs->rev_db_get($rev);
@@ -812,7 +818,6 @@ sub working_head_info {
 				}
 			}
 		}
-		unshift @$refs, $hash if $refs;
 	}
 	command_close_pipe($fh, $ctx);
 	(undef, undef, undef, undef);
@@ -2019,16 +2024,19 @@ sub rebuild {
 		return;
 	}
 	print "Rebuilding $db_path ...\n";
-	my ($rev_list, $ctx) = command_output_pipe("rev-list", $self->refname);
+	my ($log, $ctx) = command_output_pipe("log", $self->refname);
 	my $latest;
 	my $full_url = $self->full_url;
 	remove_username($full_url);
 	my $svn_uuid;
-	while (<$rev_list>) {
-		chomp;
-		my $c = $_;
-		die "Non-SHA1: $c\n" unless $c =~ /^$::sha1$/o;
-		my ($url, $rev, $uuid) = ::cmt_metadata($c);
+	my $c;
+	while (<$log>) {
+		if ( m{^commit ($::sha1)$} ) {
+			$c = $1;
+			next;
+		}
+		next unless s{^\s*(git-svn-id:)}{$1};
+		my ($url, $rev, $uuid) = ::extract_metadata($_);
 		remove_username($url);
 
 		# ignore merges (from set-tree)
@@ -2046,7 +2054,7 @@ sub rebuild {
 		$self->rev_db_set($rev, $c);
 		print "r$rev = $c\n";
 	}
-	command_close_pipe($rev_list, $ctx);
+	command_close_pipe($log, $ctx);
 	print "Done rebuilding $db_path\n";
 }
 
-- 
1.5.2.1.1131.g3b90
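The single-pass parsing that the patch above introduces can be sketched on canned input. This is a simplified illustration, not the git-svn code: `parse_log` and the sample text are hypothetical, `$sha1` is assumed to have the same shape as `$::sha1` in git-svn.perl, and the fallback for old-style identifiers is left out. The regular expressions follow the ones in the patch.

```perl
use strict;
use warnings;

# Assumed to have the same shape as $::sha1 in git-svn.perl.
my $sha1 = qr/[a-f\d]{40}/;

# Read "git log"-style text once, pairing each "commit <sha1>" header
# with the git-svn-id line found in that commit's message.
sub parse_log {
    my ($fh) = @_;
    my ($hash, @out);
    while (<$fh>) {
        if (/^commit ($sha1)$/) {
            $hash = $1;    # remember which commit the next lines belong to
            next;
        }
        # Strip the message indentation; skip anything that is not metadata.
        next unless s{^\s*(git-svn-id:)}{$1};
        my ($url, $rev, $uuid) =
            /^git-svn-id:\s+(.*)\@(\d+)\s([a-f\d\-]+)$/;
        push @out, [$hash, $url, $rev, $uuid] if defined $rev;
    }
    return @out;
}

# Hypothetical sample in the default "git log" layout.
my $sample = <<'EOT';
commit 1111111111111111111111111111111111111111
Author: A U Thor <author@example.com>
Date:   Sat Jun 30 08:56:00 2007 +0000

    second change

    git-svn-id: svn://example.invalid/trunk@42 0f0f0f0f-1234-5678-9abc-def012345678

commit 2222222222222222222222222222222222222222
Author: A U Thor <author@example.com>
Date:   Fri Jun 29 08:56:00 2007 +0000

    first change

    git-svn-id: svn://example.invalid/trunk@41 0f0f0f0f-1234-5678-9abc-def012345678
EOT

open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my @commits = parse_log($fh);
close $fh;
print "r$_->[2] = $_->[0]\n" for @commits;
```

Reading one stream this way avoids spawning a separate process per commit to fetch its message, which is presumably where the time saving mentioned in the patch description comes from.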
* [PATCH] git-svn: cache max revision in rev_db databases
From: Sam Vilain @ 2007-06-30 8:56 UTC
  To: Junio C Hamano; +Cc: git, Sam Vilain, Sam Vilain

From: Sam Vilain <sam@vilain.net>

Cache the maximum revision for each rev_db URL rather than looking it
up each time.  This saves a lot of time when rebuilding indexes on a
freshly cloned repository.

Signed-off-by: Sam Vilain <sam.vilain@catalyst.net.nz>
---
 git-svn.perl |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 556cd7d..a8b6669 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -801,6 +801,7 @@ sub working_head_info {
 	my ($head, $refs) = @_;
 	my ($fh, $ctx) = command_output_pipe('log', $head);
 	my $hash;
+	my %max;
 	while (<$fh>) {
 		if ( m{^commit ($::sha1)$} ) {
 			unshift @$refs, $hash if $hash and $refs;
@@ -810,11 +811,14 @@ sub working_head_info {
 		next unless s{^\s*(git-svn-id:)}{$1};
 		my ($url, $rev, $uuid) = extract_metadata($_);
 		if (defined $url && defined $rev) {
+			next if $max{$url} and $max{$url} < $rev;
 			if (my $gs = Git::SVN->find_by_url($url)) {
 				my $c = $gs->rev_db_get($rev);
 				if ($c && $c eq $hash) {
 					close $fh; # break the pipe
 					return ($url, $rev, $uuid, $gs);
+				} else {
+					$max{$url} ||= $gs->rev_db_max;
 				}
 			}
 		}
-- 
1.5.2.1.1131.g3b90
* Re: [PATCH] git-svn: cache max revision in rev_db databases
From: Junio C Hamano @ 2007-07-01 3:50 UTC
  To: Eric Wong; +Cc: git, Sam Vilain

Sam Vilain <sam.vilain@catalyst.net.nz> writes:

> Cache the maximum revision for each rev_db URL rather than looking it
> up each time.  This saves a lot of time when rebuilding indexes on a
> freshly cloned repository.
>
> Signed-off-by: Sam Vilain <sam.vilain@catalyst.net.nz>

I think both the previous one from Sam that makes it use git-log
instead of git-rev-list and this one looks sane.  Ack/Nack is
appreciated.

> ---
>  git-svn.perl |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/git-svn.perl b/git-svn.perl
> index 556cd7d..a8b6669 100755
> --- a/git-svn.perl
> +++ b/git-svn.perl
> @@ -801,6 +801,7 @@ sub working_head_info {
>  	my ($head, $refs) = @_;
>  	my ($fh, $ctx) = command_output_pipe('log', $head);
>  	my $hash;
> +	my %max;
>  	while (<$fh>) {
>  		if ( m{^commit ($::sha1)$} ) {
>  			unshift @$refs, $hash if $hash and $refs;
> @@ -810,11 +811,14 @@ sub working_head_info {
>  		next unless s{^\s*(git-svn-id:)}{$1};
>  		my ($url, $rev, $uuid) = extract_metadata($_);
>  		if (defined $url && defined $rev) {
> +			next if $max{$url} and $max{$url} < $rev;
>  			if (my $gs = Git::SVN->find_by_url($url)) {
>  				my $c = $gs->rev_db_get($rev);
>  				if ($c && $c eq $hash) {
>  					close $fh; # break the pipe
>  					return ($url, $rev, $uuid, $gs);
> +				} else {
> +					$max{$url} ||= $gs->rev_db_max;
>  				}
>  			}
>  		}
> -- 
> 1.5.2.1.1131.g3b90
* Re: [PATCH] git-svn: cache max revision in rev_db databases
From: Eric Wong @ 2007-07-01 5:31 UTC
  To: Junio C Hamano; +Cc: git, Sam Vilain

Junio C Hamano <gitster@pobox.com> wrote:
> Sam Vilain <sam.vilain@catalyst.net.nz> writes:
>
> > Cache the maximum revision for each rev_db URL rather than looking it
> > up each time.  This saves a lot of time when rebuilding indexes on a
> > freshly cloned repository.
> >
> > Signed-off-by: Sam Vilain <sam.vilain@catalyst.net.nz>
>
> I think both the previous one from Sam that makes it use git-log
> instead of git-rev-list and this one looks sane.  Ack/Nack is
> appreciated.

Now that 80583c0ef61cc966c7eee79cf3623a83197e19b8 is in,
both patches are:

Acked-by: Eric Wong <normalperson@yhbt.net>

> > ---
> >  git-svn.perl |    4 ++++
> >  1 files changed, 4 insertions(+), 0 deletions(-)
> >
> > diff --git a/git-svn.perl b/git-svn.perl
> > index 556cd7d..a8b6669 100755
> > --- a/git-svn.perl
> > +++ b/git-svn.perl
> > @@ -801,6 +801,7 @@ sub working_head_info {
> >  	my ($head, $refs) = @_;
> >  	my ($fh, $ctx) = command_output_pipe('log', $head);
> >  	my $hash;
> > +	my %max;
> >  	while (<$fh>) {
> >  		if ( m{^commit ($::sha1)$} ) {
> >  			unshift @$refs, $hash if $hash and $refs;
> > @@ -810,11 +811,14 @@ sub working_head_info {
> >  		next unless s{^\s*(git-svn-id:)}{$1};
> >  		my ($url, $rev, $uuid) = extract_metadata($_);
> >  		if (defined $url && defined $rev) {
> > +			next if $max{$url} and $max{$url} < $rev;
> >  			if (my $gs = Git::SVN->find_by_url($url)) {
> >  				my $c = $gs->rev_db_get($rev);
> >  				if ($c && $c eq $hash) {
> >  					close $fh; # break the pipe
> >  					return ($url, $rev, $uuid, $gs);
> > +				} else {
> > +					$max{$url} ||= $gs->rev_db_max;
> >  				}
> >  			}
> >  		}
> > -- 
> > 1.5.2.1.1131.g3b90
>

-- 
Eric Wong
* Re: [PATCH] git-svn: cache max revision in rev_db databases
From: Junio C Hamano @ 2007-07-01 6:49 UTC
  To: Eric Wong; +Cc: git, Sam Vilain

Thanks.