git.vger.kernel.org archive mirror
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Geert Bosch <bosch@adacore.com>
Cc: Nicolas Pitre <nico@cam.org>,
	Troy Telford <ttelford.groups@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH] Support 64-bit indexes for pack files.
Date: Tue, 27 Feb 2007 11:11:22 -0500
Message-ID: <20070227161122.GE3230@spearce.org>
In-Reply-To: <5FE0C988-0DA8-4BFB-8F0C-42F97808E6F8@adacore.com>

Geert Bosch <bosch@adacore.com> wrote:
> When I import a large code-base (such as a *.tar.gz), I don't know
> beforehand how many objects I'm going to create. Ideally, I'd like
> to stream them directly into a new pack without ever having to write
> the expanded source to the filesystem.

See git-fast-import.  If you are coming from a tar, also see
contrib/fast-import/import-tars.perl.  :-)
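
As a rough illustration of the streaming approach (this is not the
import-tars.perl script itself), the sketch below reads a tarball from a
file object and emits a fast-import stream with one commit, using the
"M <mode> inline <path>" form so no file is ever written to disk.  The
function names, ref, committer identity, timestamp, and commit message
are all placeholders chosen for the example.  It assumes paths need no
quoting and gives every file mode 100644.

```python
import tarfile


def tar_files(fileobj):
    """Yield (path, content) for each regular file in a tar stream."""
    # "r|*" reads the archive sequentially with transparent decompression,
    # so this also works on a pipe.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()


def fast_import_stream(files, ref="refs/heads/import",
                       committer="Importer <importer@example.com>",
                       when="1172592682 -0500",
                       message=b"Import tarball\n"):
    """Yield byte chunks of a fast-import stream creating one commit."""
    yield b"commit " + ref.encode() + b"\n"
    yield b"committer " + committer.encode() + b" " + when.encode() + b"\n"
    yield b"data %d\n" % len(message) + message
    for path, content in files:
        # Assumption: path contains no characters that require quoting,
        # and the executable bit is ignored (everything is 100644).
        yield b"M 100644 inline " + path.encode() + b"\n"
        yield b"data %d\n" % len(content) + content + b"\n"
```

One way to use it would be to write each chunk of
fast_import_stream(tar_files(sys.stdin.buffer)) to sys.stdout.buffer and
pipe the result into "git fast-import".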
 
> So for creating a large pack from a stream of data, you have to do  
> the following:
>   1. write out a temporary pack file to disk without correct count
>   2. fix-up the count
>   3. read the entire temporary pack file to compute the final SHA-1
>   4. fix-up the SHA1 at the end of the file
>   5. construct and write out the index

Yes, this is exactly what git-fast-import does.  Yes, it sort of
sucks.  But it's not as bad as you think.
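
To make steps 1 through 4 concrete, here is a minimal sketch (not the
fast-import code) that writes blobs into a version-2 pack.  It assumes
the standard pack layout: a 12-byte header of "PACK", version, and
object count, then one entry per object, then a trailing SHA-1 of
everything before it.  The function names are made up for the example,
only undeltified blobs are handled, and step 5 (the index) is left out.

```python
import hashlib
import struct
import zlib

OBJ_BLOB = 3  # pack object type code for a blob


def entry_header(obj_type, size):
    """Encode the variable-length type/size header of a pack entry."""
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # high bit set: more size bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def stream_blobs_to_pack(f, blobs):
    """Write blobs to seekable read/write file f; return (count, sha1)."""
    # Step 1: write the pack with a placeholder object count of 0.
    f.write(struct.pack(">4sII", b"PACK", 2, 0))
    count = 0
    for data in blobs:
        f.write(entry_header(OBJ_BLOB, len(data)))
        f.write(zlib.compress(data))
        count += 1
    # Step 2: seek back and fix up the count in the header.
    f.seek(8)
    f.write(struct.pack(">I", count))
    # Step 3: re-read the whole file to compute the final SHA-1.
    f.seek(0)
    sha = hashlib.sha1()
    for chunk in iter(lambda: f.read(65536), b""):
        sha.update(chunk)
    # Step 4: the read left us at end of file, so this appends the trailer.
    f.write(sha.digest())
    return count, sha.digest()
```

The re-read in step 3 is the cost being discussed: the header is hashed
first, so its final contents must be known before hashing can start.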
 
> There are a few ways of fixing this:
>   - Have a count of 0xffffffff mean: look in the index for the count.
>     Pulling/pushing would still use regular counted pack files.
>   - Have the pack file checksum be the SHA1 of (the count followed
>     by the SHA1 of the compressed data of each object). This would
>     allow step 3 to be done without reading back all data.

I don't think it is worth it.  Aside from git-fast-import, we
always know the object count before we start writing any data.
But despite that, fast-import runs quite well.
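
For reference, the second proposal quoted above could be sketched as
follows.  This is the hypothetical scheme from the quoted text, not
what Git's pack format does.  The class name and the encoding of the
count as a 4-byte big-endian integer are assumptions made for the
example.

```python
import hashlib
import struct


class ProposedPackChecksum:
    """Checksum = SHA1(count || SHA1(compressed obj 1) || SHA1(obj 2) ...)."""

    def __init__(self):
        self.digests = []

    def add(self, compressed):
        # Hash each object's compressed bytes as they are written out.
        self.digests.append(hashlib.sha1(compressed).digest())

    def finish(self):
        # The count is only needed here, once every object has been seen,
        # so the pack data never has to be read back.
        outer = hashlib.sha1(struct.pack(">I", len(self.digests)))
        for digest in self.digests:
            outer.update(digest)
        return outer.digest()
```

The trade-off in this sketch is 20 bytes of memory per object held
until finish() is called.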

-- 
Shawn.