git.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@osdl.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: "linux@horizon.com" <linux@horizon.com>, git@vger.kernel.org
Subject: Re: svn to git, N-squared?
Date: Mon, 12 Jun 2006 10:08:05 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0606120958230.5498@g5.osdl.org>
In-Reply-To: <9e4733910606120944p4deb170ejc2863846685917f6@mail.gmail.com>



On Mon, 12 Jun 2006, Jon Smirl wrote:
> 
> The svn repository was built by cvs2svn, none of the git tools were involved.

Ok, so that part is purely an SVN issue.

Having that many files in a single directory (or two) is a total disaster. 
That said, it works well enough if you don't create new files very often 
(and _preferably_ don't look them up either, although that is effectively 
helped by indexing). I _suspect_ that 

 - the "cvs->svn" import process was probably optimized so that it did one 
   file at a time (your "eight stages" description certainly sounds as if 
   it could do it), and in that case it's entirely possible that that can 
   be done efficiently (ie you still do file creates and lookups in an 
   increasingly big directory, but you do it only _once_ per file, rather 
   than look up old files all the time). So your lookup ratio would be 1:1 
   with the files.

   Doing a git-cvsimport would then do basically random lookups in that 
   _huge_ directory, and instead of reading the files one at a time (and 
   fully) and never again, I assume it opens them, reads one revision, 
   closes it, and then goes on to the next revision, so it will have a 
   much higher lookup ratio (you'd look up every file several times).

 - I suspect the SVN people must be hurting for performance themselves. I 
   guess they don't expect to be able to do 5-10 commits per second, the 
   way git was designed to do. So they optimized the cvs import part, but 
   their actual regular live usage is probably hitting this same directory 
   inefficiency.
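
A rough sketch of the lookup-ratio difference between the two access 
patterns described above. It only counts directory lookups and does no 
real I/O. The function names and the sample revision counts are made up 
for illustration, and none of this is taken from how cvs2svn or 
git-cvsimport are actually implemented.

```python
from collections import Counter


def lookups_file_at_a_time(revisions_per_file):
    """Open each file once and read all of its revisions in one pass."""
    lookups = Counter()
    for name in revisions_per_file:
        lookups[name] += 1  # one directory lookup per file
    return lookups


def lookups_revision_at_a_time(revisions_per_file):
    """Open, read one revision, close; repeat for every revision."""
    lookups = Counter()
    for name, nrevs in revisions_per_file.items():
        lookups[name] += nrevs  # one directory lookup per revision
    return lookups


# Hypothetical repository: file name -> number of revisions (made-up numbers).
repo = {"Makefile": 40, "main.c": 120, "README": 5}

once = sum(lookups_file_at_a_time(repo).values())
many = sum(lookups_revision_at_a_time(repo).values())

# Lookup ratio = directory lookups per file.
print("file at a time:    ", once, "lookups, ratio", once / len(repo))
print("revision at a time:", many, "lookups, ratio", many / len(repo))
```

With these made-up numbers the first pattern does 3 lookups (ratio 1:1) 
and the second does 165 (ratio 55:1). Each of those lookups gets more 
expensive as the directory grows, which is where the pain would come from.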

Of course, the old SVN Berkeley DB usage was probably even worse (not in 
system time, but I'd expect the access patterns within the BDB file to be 
pretty nasty, and probably a lot of user time spent seeking around it). 
But in this particular case, it might even have been better.

Maybe we could teach the SVN people about pack-files? ;)

			Linus

