From: "Shawn O. Pearce" <spearce@spearce.org>
To: Dana How <danahow@gmail.com>
Cc: Junio C Hamano <junkio@cox.net>, git@vger.kernel.org
Subject: Re: [PATCH 1/3] Lazily open pack index files on demand
Date: Sat, 26 May 2007 23:34:29 -0400
Message-ID: <20070527033429.GY28023@spearce.org>
In-Reply-To: <56b7f5510705261031o311b89bapd730374cbc063931@mail.gmail.com>

Dana How <danahow@gmail.com> wrote:
> Shawn:  When I first saw the index-loading code,  my first
> thought was that all the index tables should be
> merged (easy since sorted) so callers only need to do one search.

Yes; in fact this has been raised on the list before.  The general
idea was to create some sort of "super index" that had a list of
all objects and which packfile they could be found in.  This way the
running process doesn't have to search multiple indexes, and the
process doesn't have to be responsible for the merging itself.
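
As a rough illustration of the "super index" idea, here is a minimal
sketch that merges several already-sorted per-pack object tables into
one sorted table and then answers a lookup with a single binary search.
The names build_super_index and find_pack, and the use of short strings
in place of real SHA-1s, are assumptions made for this example; this is
not Git's code or on-disk format.

```python
import bisect
import heapq
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, str]  # (object name, pack name)


def build_super_index(indexes: Dict[str, List[str]]) -> List[Entry]:
    """Merge sorted per-pack name tables into one sorted (name, pack) list.

    Each input list is assumed to be sorted already, as the object
    names in a pack index are.
    """
    tagged = [[(name, pack) for name in names]
              for pack, names in indexes.items()]
    return list(heapq.merge(*tagged))


def find_pack(super_index: List[Entry], name: str) -> Optional[str]:
    """Return the pack holding `name`, using one binary search."""
    # (name, "") sorts at or before every (name, pack) entry.
    i = bisect.bisect_left(super_index, (name, ""))
    if i < len(super_index) and super_index[i][0] == name:
        return super_index[i][1]
    return None
```

With this shape a caller does one search no matter how many packs
exist, but someone still has to pay for the merge up front.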

See, the thing is, if you read all of every .idx file on a simple
`git-log` operation you've already lost.  The number of trees and
blobs tends to far outweigh the number of commits and they really
outweigh the number of commits the average user looks at in a
`git-log` session before they abort their pager.  So sorting all
of the available .idx files before we produce even the first commit
is a horrible thing to do.
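
To make the cost argument concrete, the sketch below defers loading
each index until a lookup actually reaches it, so a lookup satisfied by
the first pack never touches the others. LazyPackIndex, its loader
callback, and find_pack are hypothetical names for this illustration;
it is not the code from the patch under discussion.

```python
import bisect
from typing import Callable, List, Optional


class LazyPackIndex:
    """Holds a pack's sorted object names, loaded on first use only."""

    def __init__(self, pack_name: str, loader: Callable[[], List[str]]):
        self.pack_name = pack_name
        self._loader = loader  # stands in for opening and reading an .idx
        self._names: Optional[List[str]] = None

    @property
    def loaded(self) -> bool:
        return self._names is not None

    def contains(self, name: str) -> bool:
        if self._names is None:
            self._names = self._loader()  # pay the cost only now
        i = bisect.bisect_left(self._names, name)
        return i < len(self._names) and self._names[i] == name


def find_pack(indexes: List[LazyPackIndex], name: str) -> Optional[str]:
    """Search packs in order, stopping at the first one that matches."""
    for idx in indexes:
        if idx.contains(name):
            return idx.pack_name
    return None
```

If the commits a user pages through all live in the first pack
searched, the remaining indexes are never read at all.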

But the problem with a super index is repacking.  Every time the user
repacks their recent loose objects (or recently fetched packs) we are
folding some packfiles together, but may be leaving others alone.
The super index would need to account for the packfiles we aren't
looking at or repacking.  It gets complicated fast.
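
A minimal sketch of the bookkeeping a repack would force on a super
index, assuming the index is a sorted list of (object name, pack name)
pairs: entries for the packs that were folded together must be dropped,
entries for untouched packs must be carried over, and the new pack's
entries merged in. The function name and data layout are assumptions
for illustration only.

```python
import heapq
from typing import Iterable, List, Set, Tuple

Entry = Tuple[str, str]  # (object name, pack name)


def update_after_repack(super_index: List[Entry],
                        folded: Set[str],
                        new_pack: str,
                        new_names: Iterable[str]) -> List[Entry]:
    """Rebuild a sorted super index after some packs were folded together."""
    # Keep entries for packs the repack did not touch (still sorted).
    kept = [entry for entry in super_index if entry[1] not in folded]
    # Entries for the pack the repack produced.
    added = [(name, new_pack) for name in sorted(new_names)]
    return list(heapq.merge(kept, added))
```

Even in this toy form, every repack rewrites the whole table, including
the entries for packs that were left alone.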

There's also the problem of alternate ODBs; do we fold the indexes
of our alternates into our own super index?  Or does each ODB get
its own super index and we still have to load multiple super index
files?

In pack v4 we're likely to move the SHA-1 table from the .idx file
into the front of the .pack file.  This makes the .idx file hold
only the offsets and the CRC checksums of each object.  If we start
making a super index, we have to store the SHA-1 table twice
(once in the .pack, again in the super index).
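
For a sense of scale, here is a small back-of-the-envelope sketch. A
SHA-1 is 20 bytes; the 4-byte offset and 4-byte CRC per object are
assumptions borrowed from the existing index layout, since the pack v4
layout is not settled. The function name is hypothetical.

```python
SHA1_BYTES = 20    # size of one SHA-1 object name
OFFSET_BYTES = 4   # assumption: one 32-bit pack offset per object
CRC_BYTES = 4      # assumption: one 32-bit CRC per object


def table_sizes(num_objects: int) -> dict:
    """Estimate table sizes in bytes for a pack with `num_objects` objects."""
    sha1_table = num_objects * SHA1_BYTES
    return {
        "sha1_table": sha1_table,
        "offsets_and_crcs": num_objects * (OFFSET_BYTES + CRC_BYTES),
        # A super index would need its own copy of every SHA-1.
        "extra_for_super_index": sha1_table,
    }
```

Under these assumptions, a repository with one million objects carries
a 20,000,000-byte SHA-1 table, and a super index would add another copy
of the same size.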

-- 
Shawn.