Git development
From: Robert Schiele <rschiele@gmail.com>
To: Eric Wong <normalperson@yhbt.net>
Cc: git@vger.kernel.org
Subject: following untracked parents in git-svn
Date: Tue, 22 Dec 2009 11:28:17 +0100
Message-ID: <20091222102815.GA12259@sigfpe.ibm.com>


[-- Attachment #1.1: Type: text/plain, Size: 3680 bytes --]

Hi Eric et al.,

While using git-svn to work with a repository with a very complex history I
discovered a very unfortunate behavior:

In general, when a branch was derived (copied) from somewhere else, git-svn
follows this parent branch and imports it.  If multiple branches do that,
git-svn detects that the corresponding parent branch has already been
imported and reuses the imported data.  Unfortunately, when the parent
directory in the svn repository is not tracked as a branch in the svn-remote
section of the config file (for instance when it is just a subdirectory of a
tracked branch), this situation is no longer detected and the parent is
imported multiple times with the same result.  In a large repository this can
increase import time drastically.

My analysis (as far as I understand the code) is that this happens because
the map files in .git/svn are indexed by their ref name in the git
repository.  An untracked parent is indexed by the ref name of the branch
that follows it, with @XX appended, where XX is the revision number of the
branch point.  With that scheme the index names for two branches following a
common parent tree differ, and thus an already imported tree is not detected.
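
To make the mismatch concrete, here is a minimal Python sketch of the naming
scheme as I understand it.  It is not git-svn's actual code (git-svn is
written in Perl); the function name parent_map_key is hypothetical, and only
the "<ref>@<rev>" form is taken from what git-svn prints during the import.
The ref names are the ones from the example further down in this mail.

```python
def parent_map_key(following_ref: str, branch_rev: int) -> str:
    """Hypothetical helper: build the index name for an untracked parent
    from the ref of the branch that follows it plus the branch-point revision."""
    return f"{following_ref}@{branch_rev}"


# Both branches were copied from the same svn directory at the same revision.
key_foo1 = parent_map_key("refs/remotes/foo1", 2)
key_foo2 = parent_map_key("refs/remotes/foo2", 2)

print(key_foo1)              # refs/remotes/foo1@2
print(key_foo2)              # refs/remotes/foo2@2
print(key_foo1 == key_foo2)  # False
```

The two keys differ although they describe the same svn tree, so a lookup
under the second key cannot find what was stored under the first.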

My thought was that this could potentially be fixed by indexing those map
files not by their ref name in the git repository but by their location in
the original svn repository.  Since my understanding of the git-svn code is
not good enough to judge all the consequences of such a design change, I'd
like to ask whether you think this change would be a good idea, or whether I
might have overlooked a fundamental problem that makes it impossible (or at
least hard) to implement.
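
As a rough illustration of the idea only, the following Python sketch keys
already imported parents by their svn location and revision.  Everything in
it is an assumption made for illustration: the names lookup_or_import and
fake_import, the in-memory dict standing in for the map files, and the choice
of (path, revision) as the key.  It says nothing about how this would fit
into git-svn's real data structures.

```python
from typing import Callable, Dict, Tuple

# Hypothetical index: (path inside the svn repository, revision) -> git commit
ImportedIndex = Dict[Tuple[str, int], str]


def lookup_or_import(index: ImportedIndex, svn_path: str, rev: int,
                     do_import: Callable[[str, int], str]) -> Tuple[str, bool]:
    """Return (commit, reused). do_import is called only when the key is missing."""
    key = (svn_path.strip("/"), rev)  # normalize leading/trailing slashes
    if key in index:
        return index[key], True
    commit = do_import(svn_path, rev)
    index[key] = commit
    return commit, False


calls = []

def fake_import(path: str, rev: int) -> str:
    """Stand-in for the expensive import; records each call."""
    calls.append((path, rev))
    return "5584693b5216dc1fa05f56455c67dfd61093ee43"


index: ImportedIndex = {}
first = lookup_or_import(index, "trunk/x", 2, fake_import)    # parent of foo1
second = lookup_or_import(index, "/trunk/x", 2, fake_import)  # parent of foo2
print(first[1], second[1], len(calls))  # False True 1
```

With the key derived from the svn side, the second branch finds the tree that
was imported for the first one, and the import runs only once.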

Since my description of the problem might be a bit confusing without an
example, I created a very small svn repository that shows this problem.  An
svn repository dump for it is attached.  When importing this repository using
the svn-remote section

[svn-remote "svn"]
	url = file:///dev/shm/x/svn1
	fetch = trunk:refs/remotes/trunk
	branches = branches/*:refs/remotes/*
	tags = tags/*:refs/remotes/tags/*

you will get the following behavior during the import:

$ git svn init -s file:///dev/shm/x/svn1
Initialized empty Git repository in /dev/shm/x/git2/.git/
$ git svn fetch
r1 = 7920f3e7e70c9bb9d8a7caf28830c7ed205c20c6 (refs/remotes/trunk)
	A	x/alpha
r2 = db7ad1b41f1d2ad18d198b9a80d2606b27557faf (refs/remotes/trunk)
	A	x/beta
r3 = a35cab9c510f66d96437f21ecb738c93e0c6b793 (refs/remotes/trunk)
Found possible branch point: file:///dev/shm/x/svn1/trunk/x => file:///dev/shm/x/svn1/branches/foo1, 2
Initializing parent: refs/remotes/foo1@2
	A	alpha
r2 = 5584693b5216dc1fa05f56455c67dfd61093ee43 (refs/remotes/foo1@2)
Found branch parent: (refs/remotes/foo1) 5584693b5216dc1fa05f56455c67dfd61093ee43
Following parent with do_switch
	A	beta
Successfully followed parent
r4 = d0cb7cfc1f69e52ecd39d8eb67518abe136b53d3 (refs/remotes/foo1)
Found possible branch point: file:///dev/shm/x/svn1/trunk/x => file:///dev/shm/x/svn1/branches/foo2, 2
Initializing parent: refs/remotes/foo2@2
	A	alpha
r2 = 5584693b5216dc1fa05f56455c67dfd61093ee43 (refs/remotes/foo2@2)
Found branch parent: (refs/remotes/foo2) 5584693b5216dc1fa05f56455c67dfd61093ee43
Following parent with do_switch
	A	beta
Successfully followed parent
r5 = 181cb81070b816bef74adefa1bc4c451100a5eef (refs/remotes/foo2)
Checked out HEAD:
  file:///dev/shm/x/svn1/trunk r3

As you can see, file:///dev/shm/x/svn1/trunk/x is imported twice.  For this
small repository this is not a big issue, but if this tree had a deep history
in a large repository you would want to avoid that.
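
For a larger import it can be hard to spot such repetitions by eye.  The
following Python sketch scans saved "git svn fetch" output for commits that
were produced under more than one ref.  It assumes the "rN = <sha> (<ref>)"
line format shown above; the function name duplicate_imports is made up for
this example, and the sample text is a shortened copy of the log from this
mail.

```python
import re
from collections import defaultdict

# Matches lines such as: r2 = 5584693b... (refs/remotes/foo1@2)
REV_LINE = re.compile(r"^r(\d+) = ([0-9a-f]{40}) \((\S+)\)$")


def duplicate_imports(log_text):
    """Map (revision, commit sha) -> list of refs, keeping only pairs
    that were reported under more than one ref."""
    seen = defaultdict(list)
    for line in log_text.splitlines():
        m = REV_LINE.match(line.strip())
        if m:
            rev, sha, ref = int(m.group(1)), m.group(2), m.group(3)
            seen[(rev, sha)].append(ref)
    return {key: refs for key, refs in seen.items() if len(refs) > 1}


sample = """\
r2 = db7ad1b41f1d2ad18d198b9a80d2606b27557faf (refs/remotes/trunk)
r2 = 5584693b5216dc1fa05f56455c67dfd61093ee43 (refs/remotes/foo1@2)
r4 = d0cb7cfc1f69e52ecd39d8eb67518abe136b53d3 (refs/remotes/foo1)
r2 = 5584693b5216dc1fa05f56455c67dfd61093ee43 (refs/remotes/foo2@2)
"""

for (rev, sha), refs in duplicate_imports(sample).items():
    print(f"r{rev} {sha} imported under: {', '.join(refs)}")
```

Identical commit ids under different "@2" refs are exactly the repeated
import of trunk/x described above.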

Robert

[-- Attachment #1.2: svndump.gz --]
[-- Type: application/x-gzip, Size: 616 bytes --]

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]


Thread overview: 2+ messages
2009-12-22 10:28 Robert Schiele [this message]
2009-12-22 18:38 ` following untracked parents in git-svn Eric Wong
