Re: [PATCH 1/3] Lazily open pack index files on demand

From: A Large Angry SCM <gitzilla@gmail.com>
To: Dana How <danahow@gmail.com>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
	Nicolas Pitre <nico@cam.org>, Junio C Hamano <junkio@cox.net>,
	git@vger.kernel.org
Subject: Re: [PATCH 1/3] Lazily open pack index files on demand
Date: Sun, 27 May 2007 22:30:17 -0400
Message-ID: <465A3EB9.7090403@gmail.com>
In-Reply-To: <56b7f5510705271835m5a375324p3a908fe766fdf902@mail.gmail.com>

Dana How wrote:
[...]
> 
> Some history of what I've been doing with git:
> First I simply had to import the repo,
> which led to split packs (this was before index v2).
> Then maintaining the repo led to the unfinished maxblobsize stuff.
> Distributing the repo included users pulling (usually) from the
> central repo, which would be trivial since it was also an alternate.
> Local repacking would avoid heavy load on it.
> 
> Now I've started looking into how to push back into the
> central repo from a user's repo (not everything will be central;
> some pulling between users will occur
> otherwise I wouldn't be as interested).
> 
> It looks like the entire sequence is:
> A. git add file [compute SHA-1 & compress file into objects/xx]
> B. git commit [write some small objects locally]
> C. git push {using PROTO_LOCAL}:
> 1. read & uncompress objects
> 2. recompress objects into a pack and send through a pipe
> 3. read pack on other end of pipe and uncompress each object
> 4. compute SHA-1 for each object and compress file into objects/xx
> 
> So, after creating an object in the local working tree,
> to get it into the central repo,  we must:
> compress -> uncompress -> compress -> uncompress -> compress.
> In responsiveness this won't compare very well to Perforce,
> which has only one compress step.
> 
> The sequence above could be somewhat different currently in git.
> The user might have repacked their repo before pushing,
> but this just moves C1 and C2 back earlier in time,
> it doesn't remove the need for them.  Besides,  the blobs in
> a push are more likely to be recent and hence unpacked.
> 
> Also,  C3 and C4 might not happen if more than 100 blobs get pushed.
> But this seems very unusual; only 0.3% of commits in the history
> had 100+ new files/file contents.  If the 100 level is reduced,
> then the central repo fills up with packfiles and their index files,
> reducing performance for everybody (using the central repo as an alternate).
> 
> Thus there really is 5X more compression activity going on
> compared to Perforce.  How can this be reduced?
> 
> One way is to restore the ability to write the "new" loose object format.
> Then C1, C2, and C4 disappear.  C3 must remain because we need
> to uncompress the object to compute its SHA-1;  we don't need
> to recompress since we were already given the compressed form.
> 
> And that final sentence is why I sent this email:  if the packfile
> contained the SHA-1s,  either at the beginning or before each object,
> then they wouldn't need to be recomputed at the receiving end
> and the extra decompression could be skipped as well.  This would
> make the total zlib effort the same as Perforce.
> 
> The fact that a loose object is never overwritten would still be retained.
> Is that sufficient security?  Or does the SHA-1 always need to be
> recomputed on the receiving end?  Could that be skipped just for
> specific connections and/or protocols (presumably "trusted" ones)?
[...]
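
For readers following the quoted sequence, step A can be sketched roughly as below. This is a minimal illustration, not git's own code: it assumes the legacy loose-object layout, where the object ID is the SHA-1 of a "blob <size>\0" header followed by the content, and the stored file is the zlib-deflated form of those same bytes. The function name make_loose_blob is hypothetical.

```python
import hashlib
import zlib
from typing import Tuple


def make_loose_blob(content: bytes) -> Tuple[str, bytes]:
    """Return (object ID, deflated bytes) for a blob, legacy loose format."""
    # The ID covers the header plus the uncompressed content.
    store = b"blob %d\0" % len(content) + content
    oid = hashlib.sha1(store).hexdigest()
    # Deflating is a separate pass over the same bytes.
    return oid, zlib.compress(store)


oid, deflated = make_loose_blob(b"")
print(oid, len(deflated))
```

Because the ID is computed over the uncompressed bytes, anything that only holds the deflated form has to inflate it before it can recompute the ID, which is where the extra zlib pass in C3 comes from.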

So how do you want to decide when to trust the sender and when to
validate that the objects received have the SHA-1s claimed? A _central_
repository, being authoritative, would need to _always_ validate _all_
objects it receives. And since, with a central repository setup, the
central repository is where the CPU resources are most in demand,
validating the object IDs when received at the developers' repositories
should not be a problem. And just to be fair, how does Perforce
guarantee that the retrieved version of a file matches what was checked in?
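
To make the validation concrete, here is a rough sketch of the check a receiver would run if the sender supplied a claimed SHA-1 next to each deflated object. It is an assumption-laden illustration, not git's implementation: it handles only a whole, non-delta object in the "type size\0content" layout, and verify_object is a hypothetical name.

```python
import hashlib
import zlib


def verify_object(claimed_oid: str, deflated: bytes) -> bool:
    """Inflate the received bytes and compare their SHA-1 to the claimed ID."""
    try:
        store = zlib.decompress(deflated)
    except zlib.error:
        # Corrupt or truncated data can never match the claimed ID.
        return False
    return hashlib.sha1(store).hexdigest() == claimed_oid


# An empty blob, as a sender might transmit it.
payload = zlib.compress(b"blob 0\0")
claimed = hashlib.sha1(b"blob 0\0").hexdigest()
print(verify_object(claimed, payload))
print(verify_object("0" * 40, payload))
```

Skipping this check for "trusted" senders saves the inflate and the hash, but the receiver then stores whatever ID the sender claimed without ever confirming it.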
