* "git svn fetch" slow
@ 2009-01-31 13:14 Markus Heidelberg
2009-01-31 16:23 ` Sverre Rabbelier
2009-02-01 2:18 ` [PATCH] git-svn: allow disabling expensive broken symlink checks Eric Wong
0 siblings, 2 replies; 11+ messages in thread
From: Markus Heidelberg @ 2009-01-31 13:14 UTC (permalink / raw)
To: git, Eric Wong
Hi,
For several days, "git svn fetch" didn't seem to work any more. I
bisected it down to
commit dbc6c74d0858d77e61e092a48d467e725211f8e9
git-svn: handle empty files marked as symlinks in SVN
2009-01-11
In the new function _mark_empty_symlinks() there is a loop that takes
about 36 seconds for me. That means each svn revision takes 36+x seconds
to download. So it still works, but I always aborted it before waiting
that long, which is why I thought it didn't work any more.
The loop runs once per blob ("git ls-tree -r git-svn | wc -l" times).
The project I'm using git-svn with is Buildroot, which currently has
3074 blobs in the tree. Printing a loop counter on every iteration, I
can see that most iterations are really fast, but some files take a
long time.
Could there be a way to avoid this time-consuming step?
Markus
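
A rough way to gauge how much work the loop has to do: the patch later
in this thread shows that only entries matching "100644 blob
<empty-blob-hash>" reach the expensive per-file request, so counting
those entries gives a hint. This is a minimal sketch, not part of
git-svn; the helper name count_empty_blobs is hypothetical, and it reads
"git ls-tree -r" output on stdin.

```shell
# count_empty_blobs EMPTY_HASH
# Reads `git ls-tree -r` output on stdin and prints how many regular
# (mode 100644) blob entries have an object name equal to EMPTY_HASH.
count_empty_blobs() {
    awk -v h="$1" '
        $1 == "100644" && $2 == "blob" && $3 == h { n++ }
        END { print n + 0 }
    '
}

# Usage inside a git-svn clone (the ref name "git-svn" is the one used
# in the report above):
#   h=$(git hash-object -t blob --stdin </dev/null)
#   git ls-tree -r git-svn | count_empty_blobs "$h"
```

A large count suggests many per-file round trips to the SVN server for
every fetched revision.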
* Re: "git svn fetch" slow
2009-01-31 13:14 "git svn fetch" slow Markus Heidelberg
@ 2009-01-31 16:23 ` Sverre Rabbelier
2009-01-31 17:01 ` Markus Heidelberg
2009-02-01 2:18 ` [PATCH] git-svn: allow disabling expensive broken symlink checks Eric Wong
1 sibling, 1 reply; 11+ messages in thread
From: Sverre Rabbelier @ 2009-01-31 16:23 UTC (permalink / raw)
To: Eric Wong; +Cc: git, markus.heidelberg
On Sat, Jan 31, 2009 at 14:14, Markus Heidelberg
<markus.heidelberg@web.de> wrote:
> since several days "git svn fetch" didn't seem to work any more. I
> bisected it down to
I noticed it too, it's horribly slow; I can't really revert the patch
since it conflicts, and I'm not familiar with the code, so I don't
know how to resolve the conflict :(.
--
Cheers,
Sverre Rabbelier
* Re: "git svn fetch" slow
2009-01-31 16:23 ` Sverre Rabbelier
@ 2009-01-31 17:01 ` Markus Heidelberg
2009-01-31 17:31 ` Sverre Rabbelier
0 siblings, 1 reply; 11+ messages in thread
From: Markus Heidelberg @ 2009-01-31 17:01 UTC (permalink / raw)
To: Sverre Rabbelier; +Cc: Eric Wong, git
Sverre Rabbelier, 31.01.2009:
> On Sat, Jan 31, 2009 at 14:14, Markus Heidelberg
> <markus.heidelberg@web.de> wrote:
> > since several days "git svn fetch" didn't seem to work any more. I
> > bisected it down to
>
> I noticed it too, it's horribly slow; I can't really revert the patch
> since it conflicts, and I'm not familiar with the code, so I don't
> know how to resolve the conflict :(.
The following should work around it:
diff --git a/git-svn.perl b/git-svn.perl
index 79888a0..bc7bd21 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -3255,7 +3255,6 @@ sub new {
 	bless $self, $class;
 	if (exists $git_svn->{last_commit}) {
 		$self->{c} = $git_svn->{last_commit};
-		$self->{empty_symlinks} = _mark_empty_symlinks($git_svn);
 	}
 	$self->{empty} = {};
 	$self->{dir_prop} = {};
* Re: "git svn fetch" slow
2009-01-31 17:01 ` Markus Heidelberg
@ 2009-01-31 17:31 ` Sverre Rabbelier
0 siblings, 0 replies; 11+ messages in thread
From: Sverre Rabbelier @ 2009-01-31 17:31 UTC (permalink / raw)
To: markus.heidelberg; +Cc: Eric Wong, git
On Sat, Jan 31, 2009 at 18:01, Markus Heidelberg
<markus.heidelberg@web.de> wrote:
> The following should work around it:
Awesome! I tested it and it does indeed work around the issue for me, thanks!
--
Cheers,
Sverre Rabbelier
* [PATCH] git-svn: allow disabling expensive broken symlink checks
2009-01-31 13:14 "git svn fetch" slow Markus Heidelberg
2009-01-31 16:23 ` Sverre Rabbelier
@ 2009-02-01 2:18 ` Eric Wong
2009-02-02 3:03 ` Junio C Hamano
1 sibling, 1 reply; 11+ messages in thread
From: Eric Wong @ 2009-02-01 2:18 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Markus Heidelberg
Since dbc6c74d0858d77e61e092a48d467e725211f8e9, git-svn has had
an expensive check for broken symlinks that exist in some
repositories. This leads to a heavy performance hit on
repositories with many empty blobs that are not supposed to be
symlinks.
The workaround is enabled by default and may be disabled via:
git config svn.brokenSymlinkWorkaround false
Reported by Markus Heidelberg.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
Markus Heidelberg <markus.heidelberg@web.de> wrote:
> Hi,
>
> since several days "git svn fetch" didn't seem to work any more. I
> bisected it down to
>
> commit dbc6c74d0858d77e61e092a48d467e725211f8e9
> git-svn: handle empty files marked as symlinks in SVN
> 2009-01-11
>
> In the new function _mark_empty_symlinks() there is a loop that takes
> about 36 seconds for me. That means each svn revision takes 36+x seconds
> for downloading. So it still works, but I aborted it before waiting so
> much time, so I thought, it didn't work any more.
>
> The loop loops over each blob ("git ls-tree -r git-svn | wc -l" times).
> The project I'm using git-svn with is Buildroot and it has currently
> 3074 blobs in the tree. Printing a loop counter every time the loop is
> executed, I can see that it mostly goes really fast, but there are
> files, where it needs much time then.
>
> Could there be a way to avoid this time consuming step?
>
> Markus
Documentation/git-svn.txt | 8 ++++++++
git-svn.perl | 20 ++++++++++++++++++++
t/t9131-git-svn-empty-symlink.sh | 10 ++++++++++
3 files changed, 38 insertions(+), 0 deletions(-)
diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index 7b654f7..3d45654 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -499,6 +499,14 @@ svn-remote.<name>.rewriteRoot::
the repository with a public http:// or svn:// URL in the
metadata so users of it will see the public URL.
+svn.brokenSymlinkWorkaround::
+This controls potentially expensive checks that work around broken symlinks
+checked into SVN by broken clients. Set this option to "false" if you
+track an SVN repository with many empty blobs that are not symlinks.
+This option may be changed while "git-svn" is running and takes effect on
+the next revision fetched. If unset, git-svn assumes this option to be
+"true".
+
--
Since the noMetadata, rewriteRoot, useSvnsyncProps and useSvmProps
diff --git a/git-svn.perl b/git-svn.perl
index 79888a0..bebcbde 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -3271,10 +3271,18 @@ sub new {
 # do_{switch,update}
 sub _mark_empty_symlinks {
 	my ($git_svn) = @_;
+	my $bool = Git::config_bool('svn.brokenSymlinkWorkaround');
+	return {} if (defined($bool) && ! $bool);
+
 	my %ret;
 	my ($rev, $cmt) = $git_svn->last_rev_commit;
 	return {} unless ($rev && $cmt);
 
+	# allow the warning to be printed for each revision we fetch to
+	# ensure the user sees it. The user can also disable the workaround
+	# on the repository even while git svn is running and the next
+	# revision fetched will skip this expensive function.
+	my $printed_warning;
 	chomp(my $empty_blob = `git hash-object -t blob --stdin < /dev/null`);
 	my ($ls, $ctx) = command_output_pipe(qw/ls-tree -r -z/, $cmt);
 	local $/ = "\0";
@@ -3283,6 +3291,18 @@ sub _mark_empty_symlinks {
 	while (<$ls>) {
 		chomp;
 		s/\A100644 blob $empty_blob\t//o or next;
+		unless ($printed_warning) {
+			print STDERR "Scanning for empty symlinks, ",
+			             "this may take a while if you have ",
+			             "many empty files\n",
+			             "You may disable this with `",
+			             "git config svn.brokenSymlinkWorkaround ",
+			             "false'.\n",
+			             "This may be done in a different ",
+			             "terminal without restarting ",
+			             "git svn\n";
+			$printed_warning = 1;
+		}
 		my $path = $_;
 		my (undef, $props) =
 		    $git_svn->ra->get_file($pfx.$path, $rev, undef);
diff --git a/t/t9131-git-svn-empty-symlink.sh b/t/t9131-git-svn-empty-symlink.sh
index 704a4f8..c3c3e42 100755
--- a/t/t9131-git-svn-empty-symlink.sh
+++ b/t/t9131-git-svn-empty-symlink.sh
@@ -87,4 +87,14 @@ test_expect_success '"bar" is an empty file' 'test -f x/bar && ! test -s x/bar'
test_expect_success 'get "bar" => symlink fix from svn' \
'(cd x && git svn rebase)'
test_expect_success '"bar" becomes a symlink' 'test -L x/bar'
+
+
+test_expect_success 'clone using git svn' 'git svn clone -r1 "$svnrepo" y'
+test_expect_success 'disable broken symlink workaround' \
+ '(cd y && git config svn.brokenSymlinkWorkaround false)'
+test_expect_success '"bar" is an empty file' 'test -f y/bar && ! test -s y/bar'
+test_expect_success 'get "bar" => symlink fix from svn' \
+ '(cd y && git svn rebase)'
+test_expect_success '"bar" does not become a symlink' '! test -L y/bar'
+
test_done
--
Eric Wong
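
The default handling in the Perl hunk above (an unset option is treated
as enabled; only an explicit false skips the scan) can be sketched in
shell. This is an illustration under stated assumptions, not code from
git-svn: the function name workaround_enabled is hypothetical, and it
takes the already-normalized config value as an argument so it can be
tried without a repository.

```shell
# workaround_enabled VALUE
# VALUE is expected to be the output of
#   git config --bool --get svn.brokenSymlinkWorkaround
# i.e. "true", "false", or empty when the key is unset.
# Mirrors the patch: only an explicit "false" disables the scan.
workaround_enabled() {
    case "$1" in
        false) return 1 ;;
        *)     return 0 ;;
    esac
}

# Usage:
#   if workaround_enabled "$(git config --bool --get svn.brokenSymlinkWorkaround)"; then
#       echo "empty-symlink scan will run on the next fetched revision"
#   fi
```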
* Re: [PATCH] git-svn: allow disabling expensive broken symlink checks
2009-02-01 2:18 ` [PATCH] git-svn: allow disabling expensive broken symlink checks Eric Wong
@ 2009-02-02 3:03 ` Junio C Hamano
2009-02-03 4:45 ` Eric Wong
0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2009-02-02 3:03 UTC (permalink / raw)
To: Eric Wong; +Cc: git, Markus Heidelberg
Eric Wong <normalperson@yhbt.net> writes:
> Since dbc6c74d0858d77e61e092a48d467e725211f8e9, git-svn has had
> an expensive check for broken symlinks that exist in some
> repositories. This leads to a heavy performance hit on
> repositories with many empty blobs that are not supposed to be
> symlinks.
>
> The workaround is enabled by default; and may be disabled via:
>
> git config svn.brokenSymlinkWorkaround false
>
> Reported by Markus Heidelberg.
>
> Signed-off-by: Eric Wong <normalperson@yhbt.net>
How common is this breakage in people's subversion repositories that
dbc6c74d (git-svn: handle empty files marked as symlinks in SVN,
2009-01-11) works around?
What's the way to recover from a broken import, when the subversion
repository does have such a breakage, and the user used git-svn that
predates dbc6c74? Is it very involved, and is it much better to have the
safety by default than to force everybody else who interacts with
non-broken subversion repositories to suffer from this performance penalty?
Because the fix (that is broken from the performance angle) is relatively
recent, I am wondering if it makes more sense to turn it off by default,
and allow people with such a broken history to optionally turn it on.
* Re: [PATCH] git-svn: allow disabling expensive broken symlink checks
2009-02-02 3:03 ` Junio C Hamano
@ 2009-02-03 4:45 ` Eric Wong
2009-02-03 6:52 ` Junio C Hamano
0 siblings, 1 reply; 11+ messages in thread
From: Eric Wong @ 2009-02-03 4:45 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Markus Heidelberg
Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <normalperson@yhbt.net> writes:
>
> > Since dbc6c74d0858d77e61e092a48d467e725211f8e9, git-svn has had
> > an expensive check for broken symlinks that exist in some
> > repositories. This leads to a heavy performance hit on
> > repositories with many empty blobs that are not supposed to be
> > symlinks.
> >
> > The workaround is enabled by default; and may be disabled via:
> >
> > git config svn.brokenSymlinkWorkaround false
> >
> > Reported by Markus Heidelberg.
> >
> > Signed-off-by: Eric Wong <normalperson@yhbt.net>
>
> How common is this breakage in people's subversion repositories that
> dbc6c74d (git-svn: handle empty files marked as symlinks in SVN,
> 2009-01-11) works around?
It's not common at all. Some broken Windows clients were able to
create it.
> What's the way to recover from a broken import, when the subversion
> repository does have such a breakage, and the user used git-svn that
> predates dbc6c74? Is it very involved, and it is much better to have the
> safety by default than to force everybody else who interacts with
> non-broken subversion repository suffer from this performance penalty?
Previously, git-svn would just stop importing and refuse to continue.
So allowing the user to enable it would be a problem, too. I don't
recall the error being easy to distinguish from other errors.
> Because the fix (that is broken from the performance angle) is relatively
> recent, I am wondering if it makes more sense to turn it off by default,
> and allow people with such a broken history to optionally turn it on.
I'm considering disabling it by default, too. It only gets triggered if
there are any empty blobs at all in the repository (which are fairly
rare, but not as rare as broken symlinks in SVN). So for a "normal"
repository it would just be the (low) overhead of calling ls-tree for
every revision we fetch (and the hash-object </dev/null, which could
even be hard-coded).
Perhaps just having a "try enabling this ..." type option on any
fetch failure would be better...
--
Eric Wong
* Re: [PATCH] git-svn: allow disabling expensive broken symlink checks
2009-02-03 4:45 ` Eric Wong
@ 2009-02-03 6:52 ` Junio C Hamano
2009-02-03 19:10 ` Eric Wong
0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2009-02-03 6:52 UTC (permalink / raw)
To: Eric Wong; +Cc: git, Markus Heidelberg
Eric Wong <normalperson@yhbt.net> writes:
>> How common is this breakage in people's subversion repositories that
>> dbc6c74d (git-svn: handle empty files marked as symlinks in SVN,
>> 2009-01-11) works around?
>
> It's not common at all. Some broken Windows clients were able to
> create it.
>
>> What's the way to recover from a broken import, when the subversion
>> repository does have such a breakage, and the user used git-svn that
>> predates dbc6c74? Is it very involved, and it is much better to have the
>> safety by default than to force everybody else who interacts with
>> non-broken subversion repository suffer from this performance penalty?
>
> Previously, git-svn would just stop importing and refuse to continue.
> So allowing the user to enable it would be a problem; too. I don't
> recall the error being easy to distinguish from other errors.
>
>> Because the fix (that is broken from the performance angle) is relatively
>> recent, I am wondering if it makes more sense to turn it off by default,
>> and allow people with such a broken history to optionally turn it on.
>
> I'm considering disabling it by default, too.
I leave it entirely up to you to choose whichever default you find
sensible (I do not think I have to say this). I wasn't complaining about your
original choice to stay on the safer side, with an option to trigger a
faster but potentially riskier behaviour.
I was curious how black-and-white the deciding factor for a sensible
default would be for this particular case.
* Re: [PATCH] git-svn: allow disabling expensive broken symlink checks
2009-02-03 6:52 ` Junio C Hamano
@ 2009-02-03 19:10 ` Eric Wong
2009-02-05 7:42 ` Eric Wong
0 siblings, 1 reply; 11+ messages in thread
From: Eric Wong @ 2009-02-03 19:10 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Markus Heidelberg
Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <normalperson@yhbt.net> writes:
>
> >> How common is this breakage in people's subversion repositories that
> >> dbc6c74d (git-svn: handle empty files marked as symlinks in SVN,
> >> 2009-01-11) works around?
> >
> > It's not common at all. Some broken Windows clients were able to
> > create it.
> >
> >> What's the way to recover from a broken import, when the subversion
> >> repository does have such a breakage, and the user used git-svn that
> >> predates dbc6c74? Is it very involved, and it is much better to have the
> >> safety by default than to force everybody else who interacts with
> >> non-broken subversion repository suffer from this performance penalty?
> >
> > Previously, git-svn would just stop importing and refuse to continue.
> > So allowing the user to enable it would be a problem; too. I don't
> > recall the error being easy to distinguish from other errors.
> >
> >> Because the fix (that is broken from the performance angle) is relatively
> >> recent, I am wondering if it makes more sense to turn it off by default,
> >> and allow people with such a broken history to optionally turn it on.
> >
> > I'm considering disabling it by default, too.
>
> I leave it entirely up to you to choose whichever default you find
> sensible (I do not think I have to say this). I wasn't complaining your
> original choice to stay on the safer side, with an option to trigger a
> faster but potentially riskier behaviour.
>
> I was curious how black-and-white the deciding factor for a sensible
> default would be for this particular case.
If there are no objections by the time I get home tonight I'll
disable the workaround by default.
--
Eric Wong
* Re: [PATCH] git-svn: allow disabling expensive broken symlink checks
2009-02-03 19:10 ` Eric Wong
@ 2009-02-05 7:42 ` Eric Wong
2009-02-05 8:02 ` Junio C Hamano
0 siblings, 1 reply; 11+ messages in thread
From: Eric Wong @ 2009-02-05 7:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Markus Heidelberg
Eric Wong <normalperson@yhbt.net> wrote:
> Junio C Hamano <gitster@pobox.com> wrote:
> > Eric Wong <normalperson@yhbt.net> writes:
> >
> > >> How common is this breakage in people's subversion repositories that
> > >> dbc6c74d (git-svn: handle empty files marked as symlinks in SVN,
> > >> 2009-01-11) works around?
> > >
> > > It's not common at all. Some broken Windows clients were able to
> > > create it.
> > >
> > >> What's the way to recover from a broken import, when the subversion
> > >> repository does have such a breakage, and the user used git-svn that
> > >> predates dbc6c74? Is it very involved, and it is much better to have the
> > >> safety by default than to force everybody else who interacts with
> > >> non-broken subversion repository suffer from this performance penalty?
> > >
> > > Previously, git-svn would just stop importing and refuse to continue.
> > > So allowing the user to enable it would be a problem; too. I don't
> > > recall the error being easy to distinguish from other errors.
Actually I was wrong on git-svn refusing to continue. git-svn will
create a regular 100644 file with "link path/to/dest" as its contents.
git-svn only croaks on checksum differences with the blob itself; it
does not have an easy way to verify the mode change => 120000 if it
happened previously.
> > >> Because the fix (that is broken from the performance angle) is relatively
> > >> recent, I am wondering if it makes more sense to turn it off by default,
> > >> and allow people with such a broken history to optionally turn it on.
> > >
> > > I'm considering disabling it by default, too.
I'm considering keeping it on by default given the above
(re)discovery...
--
Eric Wong
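
To look for files that may have been imported in the way described
above, one could check whether a regular blob's content starts with
"link ". This is a minimal sketch with a hypothetical function name; it
reads blob content on stdin, and a match is only a hint, since a
legitimate file can begin with the same text.

```shell
# starts_with_svn_link
# Succeeds if stdin begins with the five bytes "link ", the form
# described above for symlinks imported as regular 100644 files.
starts_with_svn_link() {
    [ "$(dd bs=1 count=5 2>/dev/null)" = "link " ]
}

# Usage (PATH_IN_TREE is a placeholder for a file in the tree):
#   git cat-file blob HEAD:PATH_IN_TREE | starts_with_svn_link &&
#       echo "PATH_IN_TREE may be a mis-imported symlink"
```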
* Re: [PATCH] git-svn: allow disabling expensive broken symlink checks
2009-02-05 7:42 ` Eric Wong
@ 2009-02-05 8:02 ` Junio C Hamano
0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2009-02-05 8:02 UTC (permalink / raw)
To: Eric Wong; +Cc: git, Markus Heidelberg
Eric Wong <normalperson@yhbt.net> writes:
> Eric Wong <normalperson@yhbt.net> wrote:
>
>> > > Previously, git-svn would just stop importing and refuse to continue.
>> > > So allowing the user to enable it would be a problem; too. I don't
>> > > recall the error being easy to distinguish from other errors.
>
> Actually I was wrong on git-svn refusing to continue. git-svn will
> create a regular 100644 file with "link path/to/dest" as its contents.
> git-svn only croaks on checksum differences with the blob itself; it
> does not have an easy way to verify the mode change => 120000 if it
> happened previously.
Thanks for being thorough.