* following untracked parents in git-svn
@ 2009-12-22 10:28 Robert Schiele
2009-12-22 18:38 ` Eric Wong
0 siblings, 1 reply; 2+ messages in thread
From: Robert Schiele @ 2009-12-22 10:28 UTC
To: Eric Wong; +Cc: git
[-- Attachment #1.1: Type: text/plain, Size: 3680 bytes --]
Hi Eric et al.,
While using git-svn to work with a repository with a very complex history I
discovered a very unfortunate behavior:
In general, when a branch was derived (copied) from somewhere else, git-svn
follows this parent branch and imports it. If multiple branches do that,
git-svn detects that the corresponding parent branch has already been
imported and reuses the imported data. Unfortunately, when the parent
directory in the svn repository is not tracked as a branch in the svn-remote
section of the config file (for instance when it is just a subdirectory of a
tracked branch), this situation is no longer detected and the parent is
imported multiple times with the same result. In a large repository this can
increase import time drastically.
My analysis (as far as I understand the code) is that this is because the map
files in .git/svn are indexed by their ref name in the git repository.
Untracked parents are indexed by the ref name of the branch that follows
them, with @XX appended, where XX is the revision number of the branch point.
With that scheme the index names for two branches following a common parent
tree differ, and thus the already imported tree is not detected.
My thought was that this could potentially be fixed by indexing those map
files not by their ref name in the git repository but by their location in
the original svn repository. Since my understanding of the git-svn code is
not good enough to judge all the consequences of such a design change, I'd
like to ask whether you think this change would be a good idea, or whether I
might have overlooked a fundamental problem that makes it impossible (or at
least hard) to implement.
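To make the difference between the two indexing schemes concrete, here is a
toy Python sketch. It is not git-svn code (git-svn is written in Perl) and it
only models my understanding described above. The names key_by_ref,
key_by_svn_path and import_parents are hypothetical, and a plain dict stands
in for the map files in .git/svn. The two branch points are the ones from the
example log further down.

```python
# Toy model of the two indexing schemes; NOT git-svn's actual implementation.
# All function names here are hypothetical and exist only for illustration.

def key_by_ref(child_ref, parent_path, rev):
    """Scheme as described above: ref name of the following branch + @rev."""
    return f"{child_ref}@{rev}"

def key_by_svn_path(child_ref, parent_path, rev):
    """Proposed scheme: location in the svn repository + @rev."""
    return f"{parent_path}@{rev}"

def import_parents(branch_points, make_key):
    """Return how many times a parent history actually gets imported."""
    imported = {}  # stands in for the map files in .git/svn
    imports = 0
    for child_ref, parent_path, rev in branch_points:
        key = make_key(child_ref, parent_path, rev)
        if key not in imported:
            # No entry under this key, so the parent is imported (again).
            imported[key] = True
            imports += 1
    return imports

# Both branches were copied from trunk/x at revision 2.
branch_points = [
    ("refs/remotes/foo1", "trunk/x", 2),
    ("refs/remotes/foo2", "trunk/x", 2),
]

imports_by_ref = import_parents(branch_points, key_by_ref)
imports_by_path = import_parents(branch_points, key_by_svn_path)
print(imports_by_ref, imports_by_path)
```

With ref-based keys the two lookups use foo1@2 and foo2@2 and never meet;
with path-based keys both resolve to trunk/x@2, so the second branch would
find the first import. Whether the real code can be changed this easily is
exactly what I cannot judge.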
Since my description of the problem might be a bit confusing without an
example, I created a very small svn repository that shows this problem. An svn
repository dump for it is attached. When importing this repository using the
svn-remote section
[svn-remote "svn"]
	url = file:///dev/shm/x/svn1
	fetch = trunk:refs/remotes/trunk
	branches = branches/*:refs/remotes/*
	tags = tags/*:refs/remotes/tags/*
you will get the following behavior during the import:
$ git svn init -s file:///dev/shm/x/svn1
Initialized empty Git repository in /dev/shm/x/git2/.git/
$ git svn fetch
r1 = 7920f3e7e70c9bb9d8a7caf28830c7ed205c20c6 (refs/remotes/trunk)
A x/alpha
r2 = db7ad1b41f1d2ad18d198b9a80d2606b27557faf (refs/remotes/trunk)
A x/beta
r3 = a35cab9c510f66d96437f21ecb738c93e0c6b793 (refs/remotes/trunk)
Found possible branch point: file:///dev/shm/x/svn1/trunk/x => file:///dev/shm/x/svn1/branches/foo1, 2
Initializing parent: refs/remotes/foo1@2
A alpha
r2 = 5584693b5216dc1fa05f56455c67dfd61093ee43 (refs/remotes/foo1@2)
Found branch parent: (refs/remotes/foo1) 5584693b5216dc1fa05f56455c67dfd61093ee43
Following parent with do_switch
A beta
Successfully followed parent
r4 = d0cb7cfc1f69e52ecd39d8eb67518abe136b53d3 (refs/remotes/foo1)
Found possible branch point: file:///dev/shm/x/svn1/trunk/x => file:///dev/shm/x/svn1/branches/foo2, 2
Initializing parent: refs/remotes/foo2@2
A alpha
r2 = 5584693b5216dc1fa05f56455c67dfd61093ee43 (refs/remotes/foo2@2)
Found branch parent: (refs/remotes/foo2) 5584693b5216dc1fa05f56455c67dfd61093ee43
Following parent with do_switch
A beta
Successfully followed parent
r5 = 181cb81070b816bef74adefa1bc4c451100a5eef (refs/remotes/foo2)
Checked out HEAD:
file:///dev/shm/x/svn1/trunk r3
As you can see, file:///dev/shm/x/svn1/trunk/x is imported twice. For this
small repository that is not a big issue, but when the tree has a deep history
in a large repository you want to avoid it.
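To check how often this happens in a larger import, the fetch output can be
scanned for parents that get initialized more than once. The following Python
sketch assumes the log format shown above (a "Found possible branch point"
line followed by an "Initializing parent" line) and URLs without spaces; the
function name duplicated_parents is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "Found possible branch point: <parent> => <child>, <rev>"
BRANCH_POINT = re.compile(r"^Found possible branch point: (\S+) => (\S+), (\d+)$")

def duplicated_parents(log_text):
    """Return {(parent_url, rev): count} for parents initialized more than once."""
    counts = Counter()
    pending = None  # branch point seen, waiting for "Initializing parent:"
    for raw in log_text.splitlines():
        line = raw.strip()
        m = BRANCH_POINT.match(line)
        if m:
            pending = (m.group(1), int(m.group(3)))
        elif line.startswith("Initializing parent:") and pending is not None:
            # A fresh import of the parent was started for this branch point.
            counts[pending] += 1
            pending = None
    return {key: n for key, n in counts.items() if n > 1}

sample_log = """\
Found possible branch point: file:///dev/shm/x/svn1/trunk/x => file:///dev/shm/x/svn1/branches/foo1, 2
Initializing parent: refs/remotes/foo1@2
Found possible branch point: file:///dev/shm/x/svn1/trunk/x => file:///dev/shm/x/svn1/branches/foo2, 2
Initializing parent: refs/remotes/foo2@2
"""

dups = duplicated_parents(sample_log)
print(dups)
```

Any entry in the result is a parent path and revision that was imported once
per following branch instead of once overall.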
Robert
[-- Attachment #1.2: svndump.gz --]
[-- Type: application/x-gzip, Size: 616 bytes --]
[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]
* Re: following untracked parents in git-svn
2009-12-22 10:28 following untracked parents in git-svn Robert Schiele
@ 2009-12-22 18:38 ` Eric Wong
0 siblings, 0 replies; 2+ messages in thread
From: Eric Wong @ 2009-12-22 18:38 UTC
To: Robert Schiele; +Cc: git
Robert Schiele <rschiele@gmail.com> wrote:
> Hi Eric et al.,
>
> While using git-svn to work with a repository with a very complex history I
> discovered a very unfortunate behavior:
>
> In general when a branch was derived (copied) from somewhere else git-svn
> follows this parent branch and imports it. If multiple branches do that
> git-svn detects that the corresponding parent branch already had been
> imported and reuses the imported data. Unfortunately when the parent
> directory in the svn repository is not tracked as a branch in the svn-remote
> section of the config file (for instance when it is just a subdirectory of a
> tracked branch) this situation is no longer detected and this parent branch is
> imported multiple times with the same result. In a large repository this can
> increase importing time drastically.
>
> My analysis (as far as I understand the code) is that this is because the map
> files in .git/svn are indexed by their ref name in the git repository.
> Untracked branches are indexed by the name of their following branch ref name
> followed by @XX where XX is the revision number of the branch point.
> Obviously with that scheme the index name for two branches following a common
> parent tree is different and thus an already imported tree is not correctly
> detected.
Hi Robert, I'm aware of this problem. It's not hit too often, but the
occasional repository I follow does run into it.
> My thoughts were now that this could potentially be fixed by not indexing
> those map files by their ref name in the git repository but by their location
> in the original svn repository. Given that my understanding of the git-svn
> code is not good enough to decide about all the consequences of such a design
> change I'd like to ask you whether you think this change would be a good idea
> or whether I might have overlooked a fundamental problem that makes it
> impossible (or at least hard) to implement this idea.
Your idea sounds like it should work. Unfortunately the code is a mess
and I've been lazy and lacking time/sufficient motivation to clean it
up, but I'd be glad to accept patches since the test coverage is pretty
good.
--
Eric Wong