git.vger.kernel.org archive mirror
From: Eric Wong <normalperson@yhbt.net>
To: Avery Pennarun <apenwarr@gmail.com>
Cc: Geert Bosch <bosch@adacore.com>,
	Steven Grimm <koreth@midwinter.com>,
	"git@vger.kernel.org List" <git@vger.kernel.org>
Subject: Re: Excruciatingly slow git-svn imports
Date: Mon, 5 May 2008 21:25:08 -0700
Message-ID: <20080506042508.GA23465@untitled>
In-Reply-To: <32541b130805052056g450b69cfg46693bc3c0c5a1ed@mail.gmail.com>

Avery Pennarun <apenwarr@gmail.com> wrote:
> On 5/5/08, Eric Wong <normalperson@yhbt.net> wrote:
> > Interesting.  By  "These commits seemed all to have thousands of files",
> >  you mean the first 35K that took up most of the time?  If so, yes,
> >  that's definitely a problem...
> >
> >  git-svn requests a log from SVN containing a list of all paths modified
> >  in each revision.  By default, git-svn only requests log entries for up
> >  to 100 revisions at a time to reduce memory usage.  However, having
> >  thousands of files modified for each revision would still be
> >  problematic, as would having insanely long commit messages.
> 
> On my system, any branch that was created using "svn cp" of a toplevel
> directory seems to cause git-svn to (rather slowly) download every
> single file in the entire branch for the first commit on that branch,
> giving a symptom that sounds a lot like the above "commits with
> thousands of files".  I assumed this was just an intentional design
> decision in git-svn, to be slow and safe instead of fast and loose.
> Is it actually supposed to do something smarter than that?

When using "svn cp" on a top-level directory, it *should*
just show up as a single path change in the log entry.
Something like:

  A /project/branch/my-new-branch (from /project/trunk:1234)

This would not take much memory at all.
However, I've also occasionally seen stuff like this:

  A /project/branch/my-new-branch
  A /project/branch/my-new-branch/file1 (from /project/trunk/file1:1234)
  A /project/branch/my-new-branch/file2 (from /project/trunk/file2:1234)
  A /project/branch/my-new-branch/file3 (from /project/trunk/file3:1234)
  .... many more files and directories along the same lines ...

This is what I suspect Geert is seeing in his repository, and what is
causing the problems.  Perhaps it was caused by cvs2svn importing those
tags into SVN originally?
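
To make the difference between the two log shapes concrete, here is a
minimal Python sketch that classifies a branch-creating revision from
its changed-path lines. It is not git-svn code: the function names, the
regular expression, and the assumption that paths contain no
whitespace are all illustrative. The line format is the one shown in
the examples above, which resembles verbose "svn log -v" output.

```python
import re

# Matches one changed-path line as shown in the examples above, e.g.
#   A /project/branch/my-new-branch (from /project/trunk:1234)
# Assumption: paths contain no whitespace.
PATH_LINE = re.compile(
    r"^\s*(?P<action>[AMDR])\s+(?P<path>/\S+)"
    r"(?:\s+\(from\s+(?P<src>/\S+):(?P<rev>\d+)\))?\s*$"
)


def parse_changed_paths(lines):
    """Return (action, path, copy_source, copy_rev) tuples; skip other lines."""
    entries = []
    for line in lines:
        m = PATH_LINE.match(line)
        if m:
            rev = int(m.group("rev")) if m.group("rev") else None
            entries.append((m.group("action"), m.group("path"), m.group("src"), rev))
    return entries


def classify_branch_creation(lines, branch):
    """Hypothetical helper: say how `branch` was created in this revision."""
    root = branch.rstrip("/")
    entries = parse_changed_paths(lines)
    # Cheap case: the branch directory itself carries the copy source.
    for action, path, src, _rev in entries:
        if path == root and action == "A" and src is not None:
            return "single-copy"
    # Expensive case: every file below the branch has its own copy source.
    if any(p.startswith(root + "/") and src is not None for _a, p, src, _r in entries):
        return "per-file-copies"
    return "unknown"
```

Fed the first example above, classify_branch_creation() returns
"single-copy"; fed the second, it returns "per-file-copies", and the
number of parsed entries grows with the number of files in the branch.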


But the symptom you're seeing with git-svn downloading every file seems
to be the result of using a pre-1.4.3 version of the Perl SVN bindings
which lacked a working do_switch() function.  I fall back to using
do_update() and checking out a new tree for SVN 1.4.2 and earlier.
So yes, I'm definitely safe, slow and _lazy_ by falling back to
do_update() instead of doing something fancy to work around something
that's already fixed in SVN :)
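
The version-based fallback described above can be sketched roughly as
follows. This is an illustration in Python, not the actual git-svn
code (which is Perl): parse_version() and choose_fetch_strategy() are
hypothetical names, and the returned strings merely label which of the
two binding calls mentioned above would be used.

```python
def parse_version(text):
    """Turn a dotted version string such as "1.4.2" into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def choose_fetch_strategy(svn_version):
    """Pick the cheap path only when do_switch() is known to work.

    Assumption taken from the message above: the bindings have a working
    do_switch() from 1.4.3 onward; anything older falls back to
    do_update(), which checks out a whole new tree.
    """
    if parse_version(svn_version) >= (1, 4, 3):
        return "do_switch"
    return "do_update"
```

With these definitions, choose_fetch_strategy("1.4.2") selects the slow
do_update() path, while "1.4.3" and later select do_switch().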

-- 
Eric Wong
