From: Linus Torvalds <torvalds@osdl.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: "linux@horizon.com" <linux@horizon.com>, git@vger.kernel.org
Subject: Re: svn to git, N-squared?
Date: Mon, 12 Jun 2006 10:08:05 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0606120958230.5498@g5.osdl.org>
In-Reply-To: <9e4733910606120944p4deb170ejc2863846685917f6@mail.gmail.com>

On Mon, 12 Jun 2006, Jon Smirl wrote:
>
> The svn repository was built by cvs2svn, none of the git tools were involved.

Ok, so that part is purely a SVN issue.

Having that many files in a single directory (or two) is a total disaster.
That said, it works well enough if you don't create new files very often
(and _preferably_ don't look them up either, although that is effectively
helped by indexing). I _suspect_ that

- the "cvs->svn" import process was probably optimized so that it did one
  file at a time (your "eight stages" description certainly sounds as if
  it could do it), and in that case it's entirely possible that that can
  be done efficiently (ie you still do file creates and lookups in an
  increasingly big directory, but you do it only _once_ per file, rather
  than look up old files all the time). So your lookup ratio would be 1:1
  with the files.

  Doing a git-cvsimport would then do basically random lookups in that
  _huge_ directory, and instead of reading the files one at a time (and
  fully) and never again, I assume it opens them, reads one revision,
  closes it, and then goes on to the next revision, so it will have a
  much higher lookup ratio (you'd look up every file several times).

- I suspect the SVN people must be hurting for performance themselves. I
  guess they don't expect to be able to do 5-10 commits per second, the
  way git was designed to do. So they optimized the cvs import part, but
  their actual regular live usage is probably hitting this same directory
  inefficiency.
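
To make the lookup-ratio argument concrete, here is a toy cost model in Python. It is only a sketch under loud assumptions: it models a directory whose name lookups are a plain linear scan over its entries (filesystems that hash or index directory entries behave much better), it counts entry comparisons rather than touching a disk, and the names `LinearDirectory`, `one_pass_cost` and `per_revision_cost` are hypothetical, not taken from cvs2svn, SVN or git.

```python
import random


class LinearDirectory:
    """Hypothetical toy directory whose lookups scan every entry in order.

    Assumption: this stands in for an unindexed directory; it is not a model
    of any particular filesystem or of SVN's on-disk repository layout.
    """

    def __init__(self, names):
        self.entries = list(names)
        self.comparisons = 0  # total entry comparisons across all lookups

    def lookup(self, name):
        # Linear scan: finding the entry at position i costs i + 1 comparisons.
        for entry in self.entries:
            self.comparisons += 1
            if entry == name:
                return True
        return False


def _names(num_files):
    return [f"file{i}" for i in range(num_files)]


def one_pass_cost(num_files):
    """Open each file once and read it fully: lookup ratio 1:1."""
    names = _names(num_files)
    d = LinearDirectory(names)
    for name in names:
        d.lookup(name)
    return d.comparisons


def per_revision_cost(num_files, revisions_per_file, seed=0):
    """Reopen a file once per revision, in shuffled (commit-like) order."""
    names = _names(num_files)
    d = LinearDirectory(names)
    requests = names * revisions_per_file
    random.Random(seed).shuffle(requests)
    for name in requests:
        d.lookup(name)
    return d.comparisons


if __name__ == "__main__":
    for n in (250, 500, 1000):
        print(n, one_pass_cost(n), per_revision_cost(n, 5))
```

In this model the one-pass pattern costs n(n+1)/2 comparisons (500,500 for 1,000 files), so doubling the directory roughly quadruples the work even at a 1:1 lookup ratio. Reopening every file once per revision multiplies that total by the number of revisions per file (2,502,500 with five per file), regardless of the order of the lookups.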

Of course, the old SVN Berkeley DB usage was probably even worse (not in
system time, but I'd expect the access patterns within the BDB file to be
pretty nasty, and probably a lot of user time spent seeking around it).
But in this particular case, it might even have been better.

Maybe we could teach the SVN people about pack-files? ;)

Linus