From: Linus Torvalds <torvalds@linux-foundation.org>
To: Anton Tropashko <atropashko@yahoo.com>
Cc: git@vger.kernel.org
Subject: Re: Errors cloning large repo
Date: Mon, 12 Mar 2007 11:40:38 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0703121057530.9690@woody.linux-foundation.org>
In-Reply-To: <315943.12751.qm@web52606.mail.yahoo.com>
On Mon, 12 Mar 2007, Anton Tropashko wrote:
>
> > It's very likely this did fit in just under 4 GiB of packed data,
> > but as you said, without O_LARGEFILE we can't work with it.
>
> .git is 3.5GB according to du -H :)
Ok, that's good. That means that we really can use git without any major
issues, and that it's apparently only receive-pack that has
problems.
I didn't even realize that we have

    #define _FILE_OFFSET_BITS 64

in the header file, but not only is that a glibc-specific thing, it also
won't really even cover all issues.
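A minimal sketch of what that define buys a program, assuming a glibc- or POSIX-style system with a writable /tmp: when `_FILE_OFFSET_BITS` is 64 before any system header is included, `off_t` is 64 bits wide and offsets past 2 GiB are representable. The helper names `off_t_bits` and `can_seek_past_2gib` are hypothetical and this is not git's code; it only illustrates the macro's effect on the program that defines it.

```c
/* Must be defined before any system header is included to take effect. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: width of off_t in bits (64 with large-file support). */
static int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Hypothetical helper: create a scratch file and seek it to 3 GiB, an offset
 * that a 32-bit off_t cannot hold. Seeking past EOF allocates no space. */
static int can_seek_past_2gib(void)
{
    char path[] = "/tmp/lfs-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return 0;
    unlink(path); /* the file goes away once fd is closed */

    off_t target = (off_t)3 << 30; /* 3 GiB */
    off_t got = lseek(fd, target, SEEK_SET);
    close(fd);
    return got == target;
}
```

On a traditional 32-bit glibc target, compiling the same file without the first define would leave `off_t` at 32 bits. The define only affects descriptors this program opens itself, which is exactly why it does not help with files opened elsewhere.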
For example, if a file is opened from the shell (ie we're talking shell
redirection etc), then the program that used _FILE_OFFSET_BITS wasn't
the one that opened it, so it was opened without O_LARGEFILE, and a
write() will hit the 31-bit non-LFS limit.
That said, I'm not quite seeing why the _FILE_OFFSET_BITS trick doesn't
help. We don't have any shell redirection in that path.
I just did an "strace -f" on a git clone on x86, and all the open() calls
git made seemed to use O_LARGEFILE, but that's with a very recent git.
I think you said that you had git-1.4.1 on the client, and I think that
the _FILE_OFFSET_BITS=64 hack went in after that. If your client just
upgrades to the current 1.5.x release, it will all "just work" for you.
> Just curious why won't you use something like
> PostgreSQL for data storage at this point, but, then
> I know nothing about git internals :)
I can pretty much guarantee that if we used a "real" database, we'd have
- really really horrendously bad performance
- total inability to actually recover from errors.
Other SCM projects have used databases, and it *always* boils down to that.
Most either die off, or decide to just do their own homegrown database (eg
switching to FSFS for SVN).
Even database people seem to have figured it out lately: relational
databases are starting to lose ground to specialized ones. These days you
can google for something like

    relational specialized database performance

and you'll see real papers that are actually finally being taken seriously
about how specialized databases often have performance-advantages of
orders of magnitude. There's a paper (the above will find it, but if you
add "one size fits all" you'll probably find it even better) that talks
about benchmarking specialized databases against RDBMS, and they are
*literally* talking about three and four *orders*of*magnitude* speedups
(ie not factors of 2 or three, but factors of _seven_hundred_).
In other words, the whole relational database hype is so seventies and
eighties. People have since figured out that yeah, they are convenient to
program in if you want to do Visual Basic kind of things, but they really
are *not* a replacement for good data structures.
So git has ended up writing its own data structures, but git is a lot
better for it.
Linus