From: "Dana How" <danahow@gmail.com>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: "Junio C Hamano" <junkio@cox.net>,
git@vger.kernel.org, danahow@gmail.com
Subject: Re: [PATCH 1/3] Lazily open pack index files on demand
Date: Sat, 26 May 2007 21:40:42 -0700
Message-ID: <56b7f5510705262140rea5e1e5r49bdd5e99c466daa@mail.gmail.com>
In-Reply-To: <20070527033429.GY28023@spearce.org>
On 5/26/07, Shawn O. Pearce <spearce@spearce.org> wrote:
> Dana How <danahow@gmail.com> wrote:
> > Shawn: When I first saw the index-loading code, my first
> > thought was that all the index tables should be
> > merged (easy since sorted) so callers only need to do one search.
>
> Yes; in fact this has been raised on the list before. The general
> idea was to create some sort of "super index" that had a list of
> all objects and which packfile they could be found in. This way the
> running process doesn't have to search multiple indexes, and the
> process doesn't have to be responsible for the merging itself.
>
> See the thing is, if you read all of every .idx file on a simple
> `git-log` operation you've already lost. The number of trees and
> blobs tends to far outweigh the number of commits and they really
> outweigh the number of commits the average user looks at in a
> `git-log` session before they abort their pager. So sorting all
> of the available .idx files before we produce even the first commit
> is a horrible thing to do.
>
> But the problem with a super index is repacking. Every time the user
> repacks their recent loose objects (or recently fetched packs) we are
> folding some packfiles together, but may be leaving others alone.
> The super index would need to account for the packfiles we aren't
> looking at or repacking. It gets complicated fast.
>
> There's also the problem of alternate ODBs; do we fold the indexes
> of our alternates into our own super index? Or does each ODB get
> its own super index and we still have to load multiple super index
> files?
Yes, the problem is that even an on-demand, "lazy" merge
is likely to require far more work than the expected number of index probes.
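
For a rough sense of that trade-off, here is a minimal Python sketch. It
assumes each .idx can be modeled as a sorted list of object names; the
function names and the toy object names are hypothetical, and this is not
git's actual lookup code. Probing P packs costs about P binary searches,
while a merged table has to touch every entry of every index before it
can answer the first lookup.

```python
import bisect
import heapq

def find_in_packs(indexes, name):
    """Probe each sorted index in turn; return (pack_no, position) or None."""
    for pack_no, idx in enumerate(indexes):
        pos = bisect.bisect_left(idx, name)
        if pos < len(idx) and idx[pos] == name:
            return pack_no, pos
    return None

def tag(idx, pack_no):
    """Pair every name in one index with the number of the pack holding it."""
    return [(name, pack_no) for name in idx]

def build_super_index(indexes):
    """Merge all sorted indexes into one sorted list of (name, pack_no)."""
    return list(heapq.merge(*(tag(idx, n) for n, idx in enumerate(indexes))))

def find_in_super(super_index, name):
    """Single binary search in the merged table; return pack_no or None."""
    # (name,) sorts just before any (name, pack_no) entry.
    pos = bisect.bisect_left(super_index, (name,))
    if pos < len(super_index) and super_index[pos][0] == name:
        return super_index[pos][1]
    return None

# Three toy "packs" with shortened, made-up object names.
packs = [["0a", "3c", "9f"], ["1b", "7d"], ["2e", "5a", "c4"]]
super_index = build_super_index(packs)
```

With these toy packs, find_in_packs(packs, "5a") runs at most three small
binary searches, whereas build_super_index(packs) has to visit all eight
entries before find_in_super() can be called even once.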
> In pack v4 we're likely to move the SHA-1 table from the .idx file
> into the front of the .pack file. This makes the .idx file hold
> only the offsets and the CRC checksums of each object. If we start
> making a super index, we have to duplicate the SHA-1 table twice
> (once in the .pack, again in the super index).
Hmm, hopefully the SHA-1 table can go at the _end_
since with split packs that's the only time we know the number
of objects in the pack... ;-)
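
To illustrate the point about a trailing table, here is a minimal Python
sketch. The layout is entirely hypothetical (it is not the pack v4 format,
and the names are plain SHA-1s of the data rather than real git object
names): the writer streams objects without knowing how many there will be,
then appends the sorted name table and the count, and the reader finds the
table by seeking back from the end.

```python
import hashlib
import io
import struct

def write_pack(stream, objects):
    """Stream objects out, then append the sorted name table and its count."""
    names = []
    for data in objects:  # may be a generator whose length is unknown up front
        names.append(hashlib.sha1(data).digest())
        stream.write(struct.pack(">I", len(data)) + data)
    stream.write(b"".join(sorted(names)))
    # The object count is only known here, so it goes last.
    stream.write(struct.pack(">I", len(names)))

def read_name_table(stream):
    """Locate the trailing table by reading the count from the last 4 bytes."""
    stream.seek(-4, io.SEEK_END)
    (count,) = struct.unpack(">I", stream.read(4))
    stream.seek(-4 - 20 * count, io.SEEK_END)
    table = stream.read(20 * count)
    return [table[i:i + 20] for i in range(0, len(table), 20)]

buf = io.BytesIO()
write_pack(buf, iter([b"a", b"b", b"c"]))
names = read_name_table(buf)
```

A table at the front would instead need either a second pass over the file
or a reserved, fixed-size region, since the writer cannot size it before
the last object has been written.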
Thanks,
--
Dana L. How danahow@gmail.com +1 650 804 5991 cell