From: Linus Torvalds <torvalds@osdl.org>
To: Nix <nix@esperi.org.uk>
Cc: git@vger.kernel.org
Subject: Re: Handling very large numbers of symbolic references?
Date: Tue, 25 Jul 2006 15:23:57 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0607251508540.29649@g5.osdl.org>
In-Reply-To: <87psfteb4l.fsf@hades.wkstn.nix>
On Tue, 25 Jul 2006, Nix wrote:
>
> However, this causes a potential problem. There are tens of thousands of
> these bugs, and the .git/refs/heads directory gets *enormous* and thus
> the system gets terribly terribly slow (crappy old Solaris filesystem
> syndrome).
I would really suggest you use some lookup logic of your own to handle
this, because having that many refs will slow down a lot of things.
That said, you can certainly use a hierarchy of refs, and just have them
as

    .git/refs/heads/00/000-999
                    01/000-999
                    02/000-999
                    ...

if you want to avoid the dreaded filesystem meltdown.
I suspect it would suck, though. You'd still end up with tens of thousands
of small files, with no good way to pack them together.
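
As a rough illustration of the two-level layout described above, here is a minimal Python sketch that maps a bug number to a sharded ref name. The function name, the fan-out of 1000 and the idea of keying on a numeric bug id are assumptions made for the example; git itself provides nothing like this helper.

```python
def sharded_ref(bug_id: int, fanout: int = 1000) -> str:
    """Map a numeric bug id to a two-level ref name (hypothetical layout).

    With the default fan-out, bug 1234 lands in bucket 01 as leaf 234,
    so no single directory holds more than `fanout` entries.
    """
    if bug_id < 0:
        raise ValueError("bug_id must be non-negative")
    bucket, leaf = divmod(bug_id, fanout)
    # Zero-pad so names sort naturally, matching the 00/000-999 layout.
    return f"refs/heads/{bucket:02d}/{leaf:03d}"
```

This only bounds the size of each directory. It does nothing about the total number of small files, which is the point made above.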
> It seems to me there are two ways to fix this:
>
> - restructure .git/refs/* in a similar way to .git/objects, i.e. as a
> one- or two-level tree.
So this already works.
> - the vast majority of these bugs are closed. They still need to be got
> at now and again for branch merges, but they could be got out of
> .refs/heads at delete_branch time, and pushed into a tree consisting
> entirely of deleted branches, which would in turn be pointed at from
> some new place under .refs; perhaps .refs/heads/heavy (by analogy to
> non-lightweight tags). The problem here is that whenever we delete
> a tag, we'll leak that tree (at least we will if it's in a pack), and
> that leakage really could add up in the end.
Well, the problem to some degree is that a number of git routines will
look up all heads (eg things like "git pull" and "git ls-remote" and "git
push", not to mention all the visualizers that want to show all the heads).
So if you really end up doing them as individual heads, I'm afraid that
performance will suck big-time. And it wouldn't really help to put them
under .git/refs/heads/heavy, you'd still be in trouble.
> I'm not sure which way is preferable. Suggestions? Is the entire idea
> lunatic?
I think you _can_ use git in the way you propose, but it's going to be
fundamentally pretty inefficient. The diskspace usage will be inefficient
(tens of thousands of files, all just 41 characters in size), but even
more importantly, as mentioned, you'll have things like cloning or pulling
a repository always having to get tens of thousands of references, and
that's just going to be very very slow.
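
To put a rough number on the disk-space point, the following Python sketch compares the bytes actually stored in loose ref files with the space a filesystem might allocate for them. The 4096-byte block size, and the assumption that every file occupies at least one whole block, are illustrative only; filesystems that pack small files or store them inline will behave differently.

```python
import math

def loose_ref_disk_usage(n_refs: int, ref_bytes: int = 41, block_bytes: int = 4096):
    """Return (payload_bytes, allocated_bytes) for n_refs one-line ref files.

    ref_bytes=41 is a 40-character hex object name plus a newline.
    block_bytes is an assumed filesystem allocation unit.
    """
    payload = n_refs * ref_bytes
    blocks_per_file = math.ceil(ref_bytes / block_bytes)
    allocated = n_refs * blocks_per_file * block_bytes
    return payload, allocated

payload, allocated = loose_ref_disk_usage(50_000)
print(f"payload: {payload} bytes, allocated: {allocated} bytes")
```

Under these assumptions, 50,000 refs hold about 2 MB of real data but occupy roughly 100 times that on disk, before counting inodes and directory entries.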
So yes, I think it's a bit lunatic.
Git scales much better in _other_ ways. For example, one thing you could
do is to have each bug-report be described as a _file_ instead of as a
tag, and then have just one (or a few) branches, and you'd have nice
naming of bugs just because the filenames can be nice. That would allow
git to shine because it scales well in things git is good at, ie the
database itself.
You'd probably want to introduce the notion of a nice specialized "merge"
for those files (assuming you really want to do _distributed_ reporting,
and actually merge two different databases that have the same bugs), but
git should actually be quite good at supporting something like that, even
if you might have to do some infrastructure yourself.
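
One possible shape for such a specialized merge, sketched in Python: treat each bug file as a set of named fields and do a three-way merge per field. The field-based record format, the function name and the "keep ours on conflict" policy are all assumptions for illustration, not something git ships or that this thread specifies.

```python
def merge_bug(base: dict, ours: dict, theirs: dict):
    """Three-way merge of bug records given as dicts of field -> value.

    Returns (merged, conflicts). A field changed on only one side takes
    that side's value; a field changed differently on both sides is
    reported as a conflict and provisionally keeps our value.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o            # both sides agree (or both deleted)
        elif o == b:
            value = t            # only their side changed
        elif t == b:
            value = o            # only our side changed
        else:
            conflicts.append(key)
            value = o            # assumed policy: keep ours, flag it
        if value is not None:    # None means the field was deleted
            merged[key] = value
    return merged, conflicts
```

For example, if one database closes a bug while the other reassigns it, the merged record picks up both changes with no conflict.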
OR, you could actually teach git about other ways of looking up names. So
if you decide that you do want to have one branch per bug, you might want
to teach git about a new "ref" file format that has multiple name/ref
translations in the same file. That would solve the disk usage problem,
even if it would _not_ solve the inefficiency of tools that might be
slightly unhappy to see thousands and thousands of refs.
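
A minimal sketch of what reading such a combined file could look like, in Python. The line format assumed here (a 40-character hex object name, a space, then the ref name, with `#` comment lines) is hypothetical and chosen only to make the idea concrete.

```python
HEX_DIGITS = set("0123456789abcdef")

def parse_ref_table(text: str) -> dict:
    """Parse a hypothetical multi-ref file into {ref_name: object_name}.

    Each non-blank, non-comment line is "<40-hex-sha> <refname>".
    """
    refs = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sha, sep, name = line.partition(" ")
        if not sep or len(sha) != 40 or not set(sha) <= HEX_DIGITS:
            raise ValueError(f"line {lineno}: malformed entry")
        refs[name] = sha
    return refs
```

One file with tens of thousands of lines replaces tens of thousands of 41-byte files, which addresses the disk usage but, as noted above, not the cost of tools that still have to walk every ref.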
Anyway, whatever approach you select, send patches to Junio. I'm sure that
we can try to make git support even some rather strange models.
Linus