* git-svn performance
From: Fabian Schmied @ 2014-10-17 20:47 UTC
To: git
Hi,
I'm currently migrating an SVN repository to Git using git-svn (Git
for Windows 1.8.3-preview20130601), and I'm experiencing severe
performance problems with "git svn fetch". Commits to the SVN "trunk"
are fetched very fast (a few seconds or so per SVN revision), but
commits to some branches ("hotfix" branches) are currently taking
about 9 minutes per revision. I fear that the time per commit is
increasing, and that the migration might never finish at all.
For the commits that take such a long time, git-svn always outputs
lots of warnings about ignored SVN cherry-picks, and it tells me it
can't find a revmap for the path being imported. (See [1].)
AFAICS, the offending commits take place on some branches that include
a lot of manually merged ("SVN cherry-picked") revisions. Git-svn
seems to be checking something (though I don't know what) that makes
importing these revisions really slow, and it repeats this for every
revision on these branches, with more work to do each time.
Is there anything I can do to speed this up? (I already tried
increasing --log-window-size to 500; it didn't have any effect.)
Thank you, best regards,
Fabian
[1]
M foo/bar/XXX.xml
M foo/bar/YYY.xml
W:svn cherry-pick ignored (/branches/frob:6940-7068) - missing 12
commit(s) (eg abeaece820ceae44ebf2c06011cf43bbcbf4b1ce)
W:svn cherry-pick ignored (/branches/feature:3316-4798,4811,4827) -
missing 10 commit(s) (eg e255fff14ab1e581f21671ca8b36c0747869cf8c)
W:svn cherry-pick ignored
(/hotfixes/ZZZ.159:2131,2133,2145-2146,2148,2169) - missing 10
commit(s) (eg e04b0326c998f0611c18144b3ed8f686d3b52f4c)
W:svn cherry-pick ignored
(/hotfixes/ZZZ.333:4536,4610-4611,4625,4665,4669,4685,4713,4745,4785,4788,4908-4917,4920,4933-4944,4955,5003,5103,5174,5222,5227,
5261,5267,5306,5310,5321,5360,5416,5467,5501,5508,5599-5614,5650-5651,5757,5761-5762,5764,5778-5779,5784,5811,5814,5819,5823,5825,5836-5838,5860,5862,5873,5889,
5910,5924,5948) - missing 137 commit(s) (eg
9daec24cbdf55200d2cdfb0cd6b3f10485e296ac)
C:\Program Files (x86)\Git\bin\perl.exe: *** WFSO timed out
W:svn cherry-pick ignored (/hotfixes/ZZZ.333.39:5696,5847) - missing
84 commit(s) (eg 9daec24cbdf55200d2cdfb0cd6b3f10485e296ac)
W:svn cherry-pick ignored (/hotfixes/AAA:5905,6095) - missing 119
commit(s) (eg 9daec24cbdf55200d2cdfb0cd6b3f10485e296ac)
W:svn cherry-pick ignored (/hotfixes/BBB_1.1:6971) - missing 198
commit(s) (eg 9daec24cbdf55200d2cdfb0cd6b3f10485e296ac)
W:svn cherry-pick ignored
(/hotfixes/CCC:6134,6164,6168,6174,6206,6211,6237,6239,6244-6245,6250,6257,6269,6271,6276,6289-6292,6294,6296,6301-6302,6313,6315-6316,6329,6333,6379,6383,6394,6405,6411,6456,6478,6483,6491,6519,6537,6557)
- missing 194 commit(s) (eg 9daec24cbdf55200d2cdfb0cd6b3f10485e296ac)
W:svn cherry-pick ignored (/hotfixes/DDD:7635) - missing 1 commit(s)
(eg 6a3ba817635eb3a9411a307924dec393311d93be)
W:svn cherry-pick ignored
(/hotfixes/EEE_1.2:7786,7794,7797,7803,7829-7830,7843,7886,7889,7933,7937,7949,7953)
- missing 80 commit(s) (eg e78b1bc68f7a9b041588a39f3fa5e1a61f98942b)
W:svn cherry-pick ignored
(/hotfixes/EEE_1.3:8159,8170,8173-8174,8177,8181-8182,8185,8187,8194-8195,8201,8203,8206,8251,8255,8257,8259-8262,8265,8280,8286,8294,8296,8304-8305,8312,8318,8323,8327,8363,8387-8388,8390,8422-8423,8432,8446,8536-8537,8548-8549,8556,8559,8566,8569,8572,8578,8597-8598,8602,8617,8619,8655,8687,8720)
- missing 104 commit(s) (eg 33febd4591f42a9d871ba330432840917b157f9e)
W:svn cherry-pick ignored
(/hotfixes/EEE_1.4:8766,8768,8770,8777-8779,8795-8796,8802-8809,8812-8814,8816-8817,8820,8823,8825,8827,8831,8836,8841,8845,8848-8852,8854-8855,8866,8868-8869,8871-8873,8875-8878,8880,8888,8892,8911-8912,8917-8918,8946,8956-8957,8964,8984,8994,9003,9008,9011,9029,9038,9040,9046-9048,9055,9086,9101,9108,9111,9113,9124,9129,9133,9138-9139,9150,9152,9154,9156,9172,9174,9188-9189,9208,9211,9217)
- missing 44 commit(s) (eg 0621fb44de682650d762c707b102bc2472c088f8)
W:svn cherry-pick ignored
(/hotfixes/EEE_1.5:9412,9421,9430,9433-9436,9439,9441,9449,9459,9468,9529,9548,9561,9568,9605-9606,9612,9614,9617,9628,9630-9631,9637,9687,9807)
- missing 41 commit(s) (eg 1bd1a9b72336bf4d3839a00348b7f2a52368c16c)
W:svn cherry-pick ignored
(/trunk:9852-9853,9857,9859,9862,9868,9872,9876,9879,9890,9895,9926-9927,9933,9953,9956,9960-9962)
- missing 60 commit(s) (eg 3322e7ffc6ab49181976d9e94c91a4556951f38a)
Couldn't find revmap for https://the-svn-server/svn/something/trunk/foo
r9963 = 597df48cb830825f9029d1cfdf45df024d7fd3dd (refs/remotes/EEE_1.6)
* Re: git-svn performance
From: Eric Wong @ 2014-10-19 0:32 UTC
To: Fabian Schmied; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick
Fabian Schmied <fabian.schmied@gmail.com> wrote:
> [...]
> Is there anything I can do to speed this up? (I already tried
> increasing the --log-window-size to 500, didn't have any effect.)
Can you take a look at the following two "mergeinfo-speedups"
in my repo? (git://bogomips.org/git-svn)
Jakob Stoklund Olesen (2):
git-svn: only look at the new parts of svn:mergeinfo
git-svn: only look at the root path for svn:mergeinfo
Also downloadable here:
http://bogomips.org/git-svn.git/patch?id=9b258e721b30785357535
http://bogomips.org/git-svn.git/patch?id=73409a2145e93b436d74a
Hin-Tak (Cc-ed) reported good improvements with them, but also
a large memory increase:
http://mid.gmane.org/1412706046.90413.YahooMailBasic@web172303.mail.ir2.yahoo.com
Jakob (or anybody else): I suppose we could tie the new
cached_mergeinfo* caches to disk-backed storage to avoid the memory
bloat.
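The idea behind Jakob's first patch ("only look at the new parts of
svn:mergeinfo") can be sketched as follows. This is a minimal
illustration, not the patch's code: it assumes svn:mergeinfo holds one
"path:ranges" entry per line, and the helper names (parse_mergeinfo,
common_prefix_len, mergeinfo_delta) are hypothetical. For each merge
source it skips range lists that did not change and keeps only the tail
after the longest shared prefix, backing up to a comma so a partially
shared range is kept whole. It errs on the side of re-checking a range
rather than missing one.

```perl
use strict;
use warnings;

# Parse an svn:mergeinfo value ("path:ranges", one entry per line)
# into a hash of merge source path => range list.
sub parse_mergeinfo {
    my ($prop) = @_;
    my %minfo = map { split /:/, $_, 2 } grep { length } split /\n/, $prop;
    return \%minfo;
}

# Number of leading characters two strings have in common.
sub common_prefix_len {
    my ($x, $y) = @_;
    my $max = length($x) < length($y) ? length($x) : length($y);
    my $n = 0;
    $n++ while $n < $max && substr($x, $n, 1) eq substr($y, $n, 1);
    return $n;
}

# For each merge source, keep only the part of the new range list that
# was not already present in the old one; unchanged sources are dropped.
sub mergeinfo_delta {
    my ($old, $new) = @_;
    my %changes;
    for my $path (keys %$new) {
        my $before = defined $old->{$path} ? $old->{$path} : "";
        my $after  = $new->{$path};
        next if $before eq $after;
        my $n = common_prefix_len($before, $after);
        # Back up to the last comma inside the shared prefix, so a range that
        # is only partially shared (e.g. "15" vs "150") is kept whole.
        my $cut = $n > 0 ? rindex($after, ",", $n - 1) : -1;
        $changes{$path} = substr($after, $cut + 1);
    }
    return \%changes;
}
```

For example, /trunk going from 1-10,15 to 1-10,15,20-22 yields
15,20-22 (the last shared range is re-checked), and a source whose list
is unchanged drops out entirely, so per-revision work should track what
changed rather than the total size of the mergeinfo.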
* Re: git-svn performance
From: Eric Wong @ 2014-10-19 2:29 UTC
To: Fabian Schmied; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick
Eric Wong <normalperson@yhbt.net> wrote:
> Hin-Tak (Cc-ed) reported good improvements with them, but also
> a large memory increase:
This might reduce the pathname and internal hash overheads:
------------------------8<-----------------------
From: Eric Wong <normalperson@yhbt.net>
Date: Sun, 19 Oct 2014 02:26:53 +0000
Subject: [PATCH] git-svn: simplify cached_mergeinfo layout
This reduces hash lookups for looking up cache data and will
simplify tying data to disk in the next commit.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
perl/Git/SVN.pm | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index b1a84d0..25dbcd5 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1708,15 +1708,17 @@ sub mergeinfo_changes {
my %minfo = map {split ":", $_ } split "\n", $mergeinfo_prop;
my $old_minfo = {};
+ # layout: $path => [ $rev, \%mergeinfo ]
+ my $cached_mergeinfo = $self->{cached_mergeinfo};
+
# Initialize cache on the first call.
- unless (defined $self->{cached_mergeinfo_rev}) {
- $self->{cached_mergeinfo_rev} = {};
- $self->{cached_mergeinfo} = {};
+ unless (defined $cached_mergeinfo) {
+ $cached_mergeinfo = $self->{cached_mergeinfo} = {};
}
- my $cached_rev = $self->{cached_mergeinfo_rev}{$old_path};
- if (defined $cached_rev && $cached_rev == $old_rev) {
- $old_minfo = $self->{cached_mergeinfo}{$old_path};
+ my $cached = $cached_mergeinfo->{$old_path};
+ if (defined $cached && $cached->[0] == $old_rev) {
+ $old_minfo = $cached->[1];
} else {
my $ra = $self->ra;
# Give up if $old_path isn't in the repo.
@@ -1733,13 +1735,11 @@ sub mergeinfo_changes {
$props->{"svn:mergeinfo"};
$old_minfo = \%omi;
}
- $self->{cached_mergeinfo}{$old_path} = $old_minfo;
- $self->{cached_mergeinfo_rev}{$old_path} = $old_rev;
+ $cached_mergeinfo->{$old_path} = [ $old_rev, $old_minfo ];
}
# Cache the new mergeinfo.
- $self->{cached_mergeinfo}{$path} = \%minfo;
- $self->{cached_mergeinfo_rev}{$path} = $rev;
+ $cached_mergeinfo->{$path} = [ $rev, \%minfo ];
my %changes = ();
foreach my $p (keys %minfo) {
--
EW
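The layout change in this patch can be seen in isolation below: a
single hash maps each path to a [ $rev, \%mergeinfo ] pair, so one
lookup answers both "is it cached?" and "for which revision?", where the
old code kept two parallel hashes. This is a hypothetical stand-alone
sketch: cache_put and cache_get are illustrative names, and a file-level
hash stands in for $self->{cached_mergeinfo}.

```perl
use strict;
use warnings;

# Layout: $path => [ $rev, \%mergeinfo ]
my %cached_mergeinfo;

sub cache_put {
    my ($path, $rev, $minfo) = @_;
    $cached_mergeinfo{$path} = [ $rev, $minfo ];
}

# Return the cached mergeinfo only if it was recorded for $rev;
# a single hash lookup covers both the presence and revision checks.
sub cache_get {
    my ($path, $rev) = @_;
    my $cached = $cached_mergeinfo{$path};
    return undef unless defined $cached && $cached->[0] == $rev;
    return $cached->[1];
}
```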
* Re: git-svn performance
From: Eric Wong @ 2014-10-19 2:33 UTC
To: Fabian Schmied; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick
Eric Wong <normalperson@yhbt.net> wrote:
> This reduces hash lookups for looking up cache data and will
> simplify tying data to disk in the next commit.
I considered the following, but GDBM might not be readily available on
non-POSIX platforms. I think the other problem is that the existing caches
are still in memory (whether YAML or Storable) even if disk-backed,
causing a large amount of memory usage anyway.
(Both patches on top of Jakob's)
-------------------------
Subject: [RFC] git-svn: tie cached_mergeinfo to a GDBM_File store
This should reduce per-instance memory usage by allowing
serialization to disk. Using the existing Memoize::Storable
or YAML backends does not allow fast lookups.
GDBM_File should be available in most Perl installations
and should not pose an unnecessary burden.
---
perl/Git/SVN.pm | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 25dbcd5..3e477c7 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -14,6 +14,7 @@ use IPC::Open3;
use Memoize; # core since 5.8.0, Jul 2002
use Memoize::Storable;
use POSIX qw(:signal_h);
+use Storable qw(freeze thaw);
use Git qw(
command
@@ -1713,10 +1714,21 @@ sub mergeinfo_changes {
# Initialize cache on the first call.
unless (defined $cached_mergeinfo) {
- $cached_mergeinfo = $self->{cached_mergeinfo} = {};
+ my %hash;
+ eval '
+ require File::Temp;
+ use GDBM_File;
+ my $fh = File::Temp->new(TEMPLATE => "mergeinfo.XXXXXXXX");
+ $self->{cached_mergeinfo_fh} = $fh;
+ $fh->unlink_on_destroy(1);
+ tie %hash => "GDBM_File", $fh->filename, GDBM_WRCREAT, 0600;
+ ';
+ $cached_mergeinfo = $self->{cached_mergeinfo} = \%hash;
}
my $cached = $cached_mergeinfo->{$old_path};
+ $cached = thaw($cached) if defined $cached;
+
if (defined $cached && $cached->[0] == $old_rev) {
$old_minfo = $cached->[1];
} else {
@@ -1735,11 +1747,12 @@ sub mergeinfo_changes {
$props->{"svn:mergeinfo"};
$old_minfo = \%omi;
}
- $cached_mergeinfo->{$old_path} = [ $old_rev, $old_minfo ];
+ $cached_mergeinfo->{$old_path} =
+ freeze([ $old_rev, $old_minfo ]);
}
# Cache the new mergeinfo.
- $cached_mergeinfo->{$path} = [ $rev, \%minfo ];
+ $cached_mergeinfo->{$path} = freeze([ $rev, \%minfo ]);
my %changes = ();
foreach my $p (keys %minfo) {
--
EW
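For readers unfamiliar with tied DBM hashes, the pattern the RFC relies
on looks roughly like this: the hash is tied to an on-disk file, and
because DBM values must be flat strings, each [ $rev, \%mergeinfo ] pair
goes through Storable's freeze on write and thaw on read. This sketch
assumes SDBM_File, which ships with Perl, in place of GDBM_File; that
sidesteps the availability concern above, but SDBM limits each
key+value pair to roughly 1 KB, which is too small for large real
mergeinfo. The helper names are hypothetical. Note also that the RFC
wraps its tie in a string eval, so if GDBM_File is missing it quietly
falls back to an ordinary in-memory hash.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use SDBM_File;
use Storable qw(freeze thaw);

# SDBM_File stands in for GDBM_File because it is bundled with Perl.
my $dir = tempdir(CLEANUP => 1);    # directory removed at program exit
tie my %cache, 'SDBM_File', "$dir/mergeinfo", O_RDWR | O_CREAT, 0600
    or die "cannot tie cache: $!";

# DBM values must be plain strings, so serialize on write...
sub cache_put {
    my ($path, $rev, $minfo) = @_;
    $cache{$path} = freeze([ $rev, $minfo ]);
}

# ...and deserialize on read, then check the revision.
sub cache_get {
    my ($path, $rev) = @_;
    my $frozen = $cache{$path};
    return undef unless defined $frozen;
    my $cached = thaw($frozen);
    return $cached->[0] == $rev ? $cached->[1] : undef;
}
```

Every read pays for a thaw, which is the price of keeping the entries
out of the Perl heap.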
* Re: git-svn performance
From: Fabian Schmied @ 2014-10-19 9:38 UTC
To: Eric Wong; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick
On Sun, Oct 19, 2014 at 2:32 AM, Eric Wong <normalperson@yhbt.net> wrote:
> Fabian Schmied <fabian.schmied@gmail.com> wrote:
>> Hi,
>>
>> I'm currently migrating an SVN repository to Git using git-svn (Git
>> for Windows 1.8.3-preview20130601), and I'm experiencing severe
>> performance problems with "git svn fetch". Commits to the SVN "trunk"
>> are fetched very fast (a few seconds or so per SVN revision), but
>> commits to some branches ("hotfix" branches) are currently taking
>> about 9 minutes per revision. I fear that the time per these commits
>> is increasing and that indeed the migration might not be finishable at
>> all.
[...]
>> Is there anything I can do to speed this up? (I already tried
>> increasing the --log-window-size to 500, didn't have any effect.)
>
> Can you take a look at the following two "mergeinfo-speedups"
> in my repo? (git://bogomips.org/git-svn)
>
> Jakob Stoklund Olesen (2):
> git-svn: only look at the new parts of svn:mergeinfo
> git-svn: only look at the root path for svn:mergeinfo
>
> Also downloadable here:
>
> http://bogomips.org/git-svn.git/patch?id=9b258e721b30785357535
> http://bogomips.org/git-svn.git/patch?id=73409a2145e93b436d74a
[...]
Thank you _very_ much; the performance increase is tremendous: from,
ATM, 15 minutes per commit (with large mergeinfo) down to 15 seconds
each. This means that instead of taking weeks, the migration will now
complete in hours! Memory consumption might be a bit higher, but that
is not a problem for me at all.
(I didn't apply the two additional patches you supplied, only the two
linked above.)
Thanks again, you saved my deadline :)
Fabian
* Re: git-svn performance
From: Jakob Stoklund Olesen @ 2014-10-19 14:56 UTC
To: Eric Wong
Cc: Fabian Schmied, git@vger.kernel.org, sam@vilain.net,
stevenrwalter@gmail.com, waste.manager@gmx.de, amyrick@apple.com
> On Oct 18, 2014, at 19:33, Eric Wong <normalperson@yhbt.net> wrote:
>
> Eric Wong <normalperson@yhbt.net> wrote:
>> This reduces hash lookups for looking up cache data and will
>> simplify tying data to disk in the next commit.
>
> I considered the following, but GDBM might not be readily available on
> non-POSIX platforms. I think the other problem is the existing caches
> are still in memory (whether YAML or Storable) even if disk-backed,
> causing a large amount of memory usage anyways.
If cached_mergeinfo is using too much memory, you can probably drop that cache entirely. IIRC, it didn't give that much of a speedup.
I am surprised that it is using a lot of memory, though. There is only one entry per SVN branch.
> (Both patches on top of Jakob's)
> [...]
* Re: git-svn performance
From: Eric Wong @ 2014-10-20 1:16 UTC
To: Jakob Stoklund Olesen
Cc: Fabian Schmied, git, sam, stevenrwalter, waste.manager, amyrick,
Hin-Tak Leung
Jakob Stoklund Olesen <stoklund@2pi.dk> wrote:
> If cached_mergeinfo is using too much memory, you can probably drop
> that cache entirely. IIRC, it didn't give that much of a speed up.
>
> I am surprised that it is using a lot of memory, though. There is only
> one entry per SVN branch.
Something like the below? (on top of your original two patches)
Pushed to my master @ git://bogomips.org/git-svn.git
Eric Wong (2):
git-svn: reduce check_cherry_pick cache overhead
git-svn: cache only mergeinfo revisions
Jakob Stoklund Olesen (2):
git-svn: only look at the new parts of svn:mergeinfo
git-svn: only look at the root path for svn:mergeinfo
git-svn still seems to have some excessive memory usage problems,
even independently of the mergeinfo stuff.
--------------------------8<----------------------------
From: Eric Wong <normalperson@yhbt.net>
Date: Mon, 20 Oct 2014 01:02:53 +0000
Subject: [PATCH] git-svn: cache only mergeinfo revisions
This should reduce excessive memory usage from the new mergeinfo
caches without hurting performance too much, assuming reasonable
latency to the SVN server.
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Suggested-by: Jakob Stoklund Olesen <stoklund@2pi.dk>
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
perl/Git/SVN.pm | 22 ++++++++--------------
1 file changed, 8 insertions(+), 14 deletions(-)
diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 171af37..f8a75b1 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1713,13 +1713,10 @@ sub mergeinfo_changes {
# Initialize cache on the first call.
unless (defined $self->{cached_mergeinfo_rev}) {
$self->{cached_mergeinfo_rev} = {};
- $self->{cached_mergeinfo} = {};
}
my $cached_rev = $self->{cached_mergeinfo_rev}{$old_path};
- if (defined $cached_rev && $cached_rev == $old_rev) {
- $old_minfo = $self->{cached_mergeinfo}{$old_path};
- } else {
+ unless (defined $cached_rev && $cached_rev == $old_rev) {
my $ra = $self->ra;
# Give up if $old_path isn't in the repo.
# This is probably a merge on a subtree.
@@ -1728,19 +1725,16 @@ sub mergeinfo_changes {
"directory didn't exist in r$old_rev\n";
return {};
}
- my (undef, undef, $props) =
- $self->ra->get_dir($old_path, $old_rev);
- if (defined $props->{"svn:mergeinfo"}) {
- my %omi = map {split ":", $_ } split "\n",
- $props->{"svn:mergeinfo"};
- $old_minfo = \%omi;
- }
- $self->{cached_mergeinfo}{$old_path} = $old_minfo;
- $self->{cached_mergeinfo_rev}{$old_path} = $old_rev;
}
+ my (undef, undef, $props) = $self->ra->get_dir($old_path, $old_rev);
+ if (defined $props->{"svn:mergeinfo"}) {
+ my %omi = map {split ":", $_ } split "\n",
+ $props->{"svn:mergeinfo"};
+ $old_minfo = \%omi;
+ }
+ $self->{cached_mergeinfo_rev}{$old_path} = $old_rev;
# Cache the new mergeinfo.
- $self->{cached_mergeinfo}{$path} = \%minfo;
$self->{cached_mergeinfo_rev}{$path} = $rev;
my %changes = ();
--
EW
* Re: git-svn performance
From: Jakob Stoklund Olesen @ 2014-10-20 13:46 UTC
To: Eric Wong
Cc: Fabian Schmied, git@vger.kernel.org, sam@vilain.net,
stevenrwalter@gmail.com, waste.manager@gmx.de, amyrick@apple.com,
Hin-Tak Leung
> On Oct 19, 2014, at 18:16, Eric Wong <normalperson@yhbt.net> wrote:
>
> Jakob Stoklund Olesen <stoklund@2pi.dk> wrote:
>> If cached_mergeinfo is using too much memory, you can probably drop
>> that cache entirely. IIRC, it didn't give that much of a speed up.
>>
>> I am surprised that it is using a lot of memory, though. There is only
>> one entry per SVN branch.
>
> Something like the below? (on top of your original two patches)
> Pushed to my master @ git://bogomips.org/git-svn.git
Yes, but I think you can remove cached_mergeinfo_rev too.
Thanks
/Jakob
> [...]
* Re: git-svn performance
From: Eric Wong @ 2014-10-21 9:00 UTC
To: Jakob Stoklund Olesen
Cc: Fabian Schmied, git@vger.kernel.org, sam@vilain.net,
stevenrwalter@gmail.com, waste.manager@gmx.de, amyrick@apple.com,
Hin-Tak Leung
Jakob Stoklund Olesen <stoklund@2pi.dk> wrote:
> Yes, but I think you can remove cached_mergeinfo_rev too.
Thanks, pushed the patch at the bottom, too.
Also started working on some memory reductions here:
http://mid.gmane.org/20141021033912.GA27462@dcvr.yhbt.net
But there seem to be more problems :<
----------------------------8<-----------------------------
From: Eric Wong <normalperson@yhbt.net>
Date: Tue, 21 Oct 2014 06:23:22 +0000
Subject: [PATCH] git-svn: remove mergeinfo rev caching
This should further reduce memory usage from the new mergeinfo
speedups without hurting performance too much, assuming
reasonable latency to the SVN server.
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Suggested-by: Jakob Stoklund Olesen <stoklund@2pi.dk>
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
perl/Git/SVN.pm | 30 +++++++++---------------------
1 file changed, 9 insertions(+), 21 deletions(-)
diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index f8a75b1..4364506 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1710,32 +1710,20 @@ sub mergeinfo_changes {
my %minfo = map {split ":", $_ } split "\n", $mergeinfo_prop;
my $old_minfo = {};
- # Initialize cache on the first call.
- unless (defined $self->{cached_mergeinfo_rev}) {
- $self->{cached_mergeinfo_rev} = {};
- }
-
- my $cached_rev = $self->{cached_mergeinfo_rev}{$old_path};
- unless (defined $cached_rev && $cached_rev == $old_rev) {
- my $ra = $self->ra;
- # Give up if $old_path isn't in the repo.
- # This is probably a merge on a subtree.
- if ($ra->check_path($old_path, $old_rev) != $SVN::Node::dir) {
- warn "W: ignoring svn:mergeinfo on $old_path, ",
- "directory didn't exist in r$old_rev\n";
- return {};
- }
- }
- my (undef, undef, $props) = $self->ra->get_dir($old_path, $old_rev);
+ my $ra = $self->ra;
+ # Give up if $old_path isn't in the repo.
+ # This is probably a merge on a subtree.
+ if ($ra->check_path($old_path, $old_rev) != $SVN::Node::dir) {
+ warn "W: ignoring svn:mergeinfo on $old_path, ",
+ "directory didn't exist in r$old_rev\n";
+ return {};
+ }
+ my (undef, undef, $props) = $ra->get_dir($old_path, $old_rev);
if (defined $props->{"svn:mergeinfo"}) {
my %omi = map {split ":", $_ } split "\n",
$props->{"svn:mergeinfo"};
$old_minfo = \%omi;
}
- $self->{cached_mergeinfo_rev}{$old_path} = $old_rev;
-
- # Cache the new mergeinfo.
- $self->{cached_mergeinfo_rev}{$path} = $rev;
my %changes = ();
foreach my $p (keys %minfo) {
--
EW
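With both caches gone, each call simply asks the server again: one
check_path and one get_dir round trip per revision, in exchange for
keeping nothing in memory between revisions. That flow can be exercised
offline with a mock. Everything here is hypothetical: FakeRa only
imitates the shape of the two calls used in the patch, and unlike the
real SVN::Ra object it reports directories with a string rather than
the numeric $SVN::Node::dir constant. old_mergeinfo is an illustrative
name, and it returns {} where the patch warns and returns early.

```perl
use strict;
use warnings;

# Hypothetical stand-in for SVN::Ra, answering only the two calls the
# patch makes, from a fixed table keyed by "path@rev".
package FakeRa;
use constant DIR => 'dir';
sub new { my ($class, %props) = @_; return bless { props => \%props }, $class }
sub check_path {
    my ($self, $path, $rev) = @_;
    return exists $self->{props}{"$path\@$rev"} ? DIR : 'none';
}
# Like get_dir, return a list whose third element is the property hash.
sub get_dir {
    my ($self, $path, $rev) = @_;
    return (undef, undef, $self->{props}{"$path\@$rev"});
}

package main;

# Uncached lookup: two server calls every time, no state kept.
sub old_mergeinfo {
    my ($ra, $old_path, $old_rev) = @_;
    return {} if $ra->check_path($old_path, $old_rev) ne FakeRa::DIR();
    my (undef, undef, $props) = $ra->get_dir($old_path, $old_rev);
    my $prop = $props->{"svn:mergeinfo"};
    return {} unless defined $prop;
    my %omi = map { split /:/, $_, 2 } grep { length } split /\n/, $prop;
    return \%omi;
}
```

Whether this is acceptable depends on server latency, which is exactly
the assumption stated in the commit message.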
* Re: git-svn performance
From: Hin-Tak Leung @ 2014-10-22 17:38 UTC
To: normalperson, stoklund
Cc: fabian.schmied, git, sam, stevenrwalter, waste.manager, amyrick
------------------------------
On Tue, Oct 21, 2014 10:00 BST Eric Wong wrote:
>Jakob Stoklund Olesen <stoklund@2pi.dk> wrote:
>> Yes, but I think you can remove cached_mergeinfo_rev too.
>
>Thanks, pushed the patch at the bottom, too.
>Also started working on some memory reductions here:
> http://mid.gmane.org/20141021033912.GA27462@dcvr.yhbt.net
>But there seem to be more problems :<
>[...]
I'll have a look at the new changes at some point. I am still keeping
the old clone and the new clone, and just fetching from time to time to
keep them in sync. I just tried that: fetching the same 50 commits took
1.7 GB of memory on the old clone vs. 1.0 GB on the new one. Details below.
This is with just the 2 earliest patches; I'll put the 3 new ones in at some point.
So I see some need for retroactively fixing old clones (maybe as part
of garbage collection?), since most people would simply keep using an old
clone for ages...
Comparing trunk in the old and new clones, I see one difference: one short
commit message is missing in the *old* one (the "Add checkPoFiles etc." part),
and so all the SHA-1s afterwards differ. Is that an old bug that has since
been fixed, meaning I should throw away the old clone?
Date: Wed Apr 25 18:21:29 2012 +0000
Add checkPoFiles etc.
git-svn-id: https://svn.r-project.org/R/trunk@59188
Here are the details of fetching in the old and new clones:
<---
$ /usr/bin/time -v git svn fetch --all
M doc/manual/R-admin.texi
r66784 = fc20374f26f8e03bb88c00933982e29138a6f929 (refs/remotes/trunk)
...
M configure
r66834 = d8d1876f6aa71b3fe3773cd28a760ff945d30bdf (refs/remotes/R-3-1-branch)
Command being timed: "git svn fetch --all"
User time (seconds): 1520.77
System time (seconds): 156.32
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 28:15.82
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1738276
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 613
Minor (reclaiming a frame) page faults: 2039305
Voluntary context switches: 11243
Involuntary context switches: 181507
Swaps: 0
File system inputs: 658328
File system outputs: 754688
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
$ cd ../R-2/
[Hin-Tak@localhost R-2]$ /usr/bin/time -v git svn fetch --all
M doc/manual/R-admin.texi
r66784 = 6a08d94b456d33d85add914a1b780a972689443a (refs/remotes/trunk)
...
M configure
r66834 = 370a6484c2a65be78dfae184b50d8f08685d389c (refs/remotes/R-3-1-branch)
Command being timed: "git svn fetch --all"
User time (seconds): 1507.89
System time (seconds): 134.25
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 27:38.49
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1026656
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1110
Minor (reclaiming a frame) page faults: 1630150
Voluntary context switches: 10280
Involuntary context switches: 176444
Swaps: 0
File system inputs: 361472
File system outputs: 477912
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
---->
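To put numbers on the comparison, here is a small Python sketch that pulls fields out of the two GNU time -v reports above. The helper names are made up for this example; only the report lines come from the output quoted above.

```python
def parse_time_v(text):
    """Collect "Label: value" pairs from GNU time's -v report."""
    fields = {}
    for line in text.splitlines():
        # Split on the last ": " because some labels, such as
        # "Elapsed (wall clock) time (h:mm:ss or m:ss)", contain colons.
        label, sep, value = line.strip().rpartition(": ")
        if sep:
            fields[label] = value
    return fields


def to_seconds(clock):
    """Convert "h:mm:ss" or "m:ss.ss" to seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Excerpts of the two reports above.
OLD_RUN = """\
Elapsed (wall clock) time (h:mm:ss or m:ss): 28:15.82
Maximum resident set size (kbytes): 1738276
"""
NEW_RUN = """\
Elapsed (wall clock) time (h:mm:ss or m:ss): 27:38.49
Maximum resident set size (kbytes): 1026656
"""

WALL = "Elapsed (wall clock) time (h:mm:ss or m:ss)"
RSS = "Maximum resident set size (kbytes)"

old_report, new_report = parse_time_v(OLD_RUN), parse_time_v(NEW_RUN)
old_rss, new_rss = int(old_report[RSS]), int(new_report[RSS])
old_wall, new_wall = to_seconds(old_report[WALL]), to_seconds(new_report[WALL])
print(f"wall clock: {old_wall:.0f}s old vs {new_wall:.0f}s new")
# -> wall clock: 1696s old vs 1658s new
print(f"peak RSS: old/new = {old_rss / new_rss:.2f}, saving {1 - new_rss / old_rss:.0%}")
# -> peak RSS: old/new = 1.69, saving 41%
```

The run time is nearly unchanged, while the peak memory of the old clone is roughly 1.7 times that of the new one.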
* Re: git-svn performance
From: Eric Wong @ 2014-10-25 0:02 UTC
To: Hin-Tak Leung
Cc: stoklund, fabian.schmied, git, sam, stevenrwalter, waste.manager,
amyrick
Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
> Comparing trunk of old and new, I see one difference - One short
> commit message is missing in the *old* (the "Add checkPoFiles etc." part)
> and so all the sha1 afterwards differed. Is that an old bug that's fixed
> and therefore I should throw away the old clone?
I don't recall a bug which would cause a revision to be skipped.
I suppose it's alright now the new clone has that revision.
Perhaps there was a power outage or improper shutdown?
At least we can be glad current versions see this revision...
* Re: git-svn performance
From: Hin-Tak Leung @ 2014-10-25 5:23 UTC
To: normalperson
Cc: stoklund, fabian.schmied, git, sam, stevenrwalter, waste.manager,
amyrick
------------------------------
On Sat, Oct 25, 2014 01:02 BST Eric Wong wrote:
>Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>> Comparing trunk of old and new, I see one difference - One short
>> commit message is missing in the *old* (the "Add checkPoFiles etc." part)
>> and so all the sha1 afterwards differed. Is that an old bug that's fixed
>> and therefore I should throw away the old clone?
>
>I don't recall a bug which would cause a revision to be skipped.
>I suppose it's alright now the new clone has that revision.
>Perhaps there was a power outage or improper shutdown?
>
>At least we can be glad current versions see this revision...
The old clone didn't miss a revision, just a revision's message: it was blank instead of three words, above the git-svn-id line. I suppose it is possible that a power problem or something similar caused this. I'll check the other branches as well, and possibly clone again to be sure. (The new clone did have one break...)
* Re: git-svn performance
From: Eric Wong @ 2014-10-25 5:32 UTC
To: Hin-Tak Leung
Cc: stoklund, fabian.schmied, git, sam, stevenrwalter, waste.manager,
amyrick
Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
> The old clone didn't miss a revision, just a revision's message: it was
> blank instead of three words, above the git-svn-id line. I suppose it is
> possible that a power problem or something similar caused this. I'll
> check the other branches as well, and possibly clone again to be sure.
> (The new clone did have one break...)
Oh, there's a possibility the commit message in SVN was edited/added
after-the-fact, but that depends on the SVN admin (most never allow
or do it).
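For reference, Subversion refuses edits to revision properties such as an existing log message unless the repository has a pre-revprop-change hook that exits with status 0; Subversion passes it the repository path, revision, user, property name and action (A, M or D). Below is a minimal Python sketch of such a hook, assuming a policy similar to Subversion's sample template (only modifications of svn:log are allowed). The function name is hypothetical.

```python
import sys


def allow_revprop_change(propname, action):
    """Permit only modifications ("M") of the log message, like the sample hook."""
    return action == "M" and propname == "svn:log"


# Subversion invokes the hook as: pre-revprop-change REPOS REV USER PROPNAME ACTION
if __name__ == "__main__" and len(sys.argv) == 6:
    repos, rev, user, propname, action = sys.argv[1:]
    if not allow_revprop_change(propname, action):
        # stderr is passed back to the svn client when the hook fails.
        sys.stderr.write(f"Changing revision property {propname} is prohibited\n")
        sys.exit(1)
    sys.exit(0)
```

If the log message of r59188 was edited this way after the old clone had already fetched that revision, the old clone would presumably keep the blank message it saw at the time.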
* Re: git-svn performance
From: Hin-Tak Leung @ 2014-10-25 5:47 UTC
To: normalperson
Cc: stoklund, fabian.schmied, git, sam, stevenrwalter, waste.manager,
amyrick
------------------------------
On Sat, Oct 25, 2014 06:32 BST Eric Wong wrote:
>Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>> The old clone didn't miss a revision, just a revision's message: it was
>> blank instead of three words, above the git-svn-id line. I suppose it is
>> possible that a power problem or something similar caused this. I'll
>> check the other branches as well, and possibly clone again to be sure.
>> (The new clone did have one break...)
>
>Oh, there's a possibility the commit message in SVN was edited/added
>after-the-fact, but that depends on the SVN admin (most never allow
>or do it).
That's a possibility: the old clone was created by fetching every few days, so the author may have committed with a blank message and edited it after I had already fetched that revision.
BTW, git svn seems to disallow single-word commit messages (or is that an SVN config?). I found that I could not "git svn dcommit" when I had merely done "git commit -m 'typos'", for example, for an SVN repo I have write access to. (I don't have many such repos, so it is difficult to tell whether it is a repo config or a git-svn quirk.) In the past I just rebased, changed the message to 'typo correction' or something, and re-ran dcommit.
* Re: git-svn performance
From: Eric Wong @ 2014-10-25 6:01 UTC
To: Hin-Tak Leung
Cc: stoklund, fabian.schmied, git, sam, stevenrwalter, waste.manager,
amyrick
Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
> BTW, git svn seems to disallow single-word commit messages (or is that an
> SVN config?). I found that I could not "git svn dcommit" when I had
> merely done "git commit -m 'typos'", for example, for an SVN repo I have
> write access to. (I don't have many such repos, so it is difficult to
> tell whether it is a repo config or a git-svn quirk.) In the past I just
> rebased, changed the message to 'typo correction' or something, and
> re-ran dcommit.
Probably an SVN hook is preventing it. git-svn test cases such as
t/t9118-git-svn-funky-branch-names.sh make single-word commits.
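A hook of the kind Eric suspects might look roughly like the following Python sketch. It is hypothetical, not the hook installed on any repository discussed here. Subversion runs hooks/pre-commit with the repository path and transaction name, "svnlook log -t TXN REPOS" prints the pending log message, and a non-zero exit rejects the commit. The two-word minimum is an assumed policy.

```python
import subprocess
import sys

MIN_WORDS = 2  # assumed policy: reject one-word messages such as "typos"


def message_ok(message, min_words=MIN_WORDS):
    """Return True if the log message has at least min_words words."""
    return len(message.split()) >= min_words


def main(repos, txn):
    # Read the log message of the transaction being committed.
    message = subprocess.run(
        ["svnlook", "log", "-t", txn, repos],
        capture_output=True, text=True, check=True,
    ).stdout
    if not message_ok(message):
        # stderr is passed back to the svn client when the hook fails.
        sys.stderr.write(f"Log message must have at least {MIN_WORDS} words.\n")
        return 1
    return 0


# Subversion invokes the hook as: pre-commit REPOS TXN-NAME
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

With such a hook, "git svn dcommit" of a commit made with "git commit -m 'typos'" would be rejected, while a message like 'typo correction' would pass.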
Thread overview: 15+ messages
2014-10-17 20:47 git-svn performance Fabian Schmied
2014-10-19 0:32 ` Eric Wong
2014-10-19 2:29 ` Eric Wong
2014-10-19 2:33 ` Eric Wong
2014-10-19 14:56 ` Jakob Stoklund Olesen
2014-10-20 1:16 ` Eric Wong
2014-10-20 13:46 ` Jakob Stoklund Olesen
2014-10-21 9:00 ` Eric Wong
2014-10-19 9:38 ` Fabian Schmied
-- strict thread matches above, loose matches on Subject: below --
2014-10-22 17:38 Hin-Tak Leung
2014-10-25 0:02 ` Eric Wong
2014-10-25 5:23 Hin-Tak Leung
2014-10-25 5:32 ` Eric Wong
2014-10-25 5:47 Hin-Tak Leung
2014-10-25 6:01 ` Eric Wong