From: Nicolas Pitre <nico@cam.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Nix <nix@esperi.org.uk>, Steven Grimm <koreth@midwinter.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Subject: [PATCH] git-merge-pack
Date: Thu, 06 Sep 2007 20:51:58 -0400 (EDT)
Message-ID: <alpine.LFD.0.9999.0709061942320.21186@xanadu.home>
In-Reply-To: <7v1wdb9ymf.fsf_-_@gitster.siamese.dyndns.org>

On Thu, 6 Sep 2007, Junio C Hamano wrote:

> This is a beginning of "git-merge-pack" that combines smaller
> packs into one.  Currently it does not actually create a new
> pack, but pretends that it is a (dumb) "git-rev-list --objects"
> that lists the objects in the affected packs.  You have to pipe
> its output to "git-pack-objects".
> 
> The command reads names of pack-*.pack files from the standard
> input, outputs the objects' names in the order they are stored
> in the original packs (i.e. the offset order).  This sorting is
> done in order to emulate the traversal order the original
> "git-rev-list --objects" that was used to create the existing
> pack listed the objects.
> 
> While this approach would give the resulting packfile very
> similar locality of access as the original, it does not give the
> "name" component you would see in "git-rev-list --objects"
> output.  This information is used as the clustering cue while
> computing delta, and the lack of it means you can get horrible
> delta selection.  You do _not_ want to run the downstream
> "git-pack-objects" without the optimization/heuristics to reuse
> delta.  IOW, do not run it with --no-reuse-delta.

I wonder if this is the best way to go.  In the context of a really fast 
repack happening automatically after (or during) interactive user 
operations, the above seems a bit heavyweight and slow to me.

I would have concatenated all packs provided on the command line into a 
single one, simply by reading data from existing packs and writing it 
back without any processing at all.  The base offset stored in an 
OBJ_OFS_DELTA object is relative to the position of the delta object 
itself, so a simple concatenation will just work.
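
As an illustrative sketch only (Python, not code from git), the
concatenation could look like the following.  It assumes the standard
pack layout of a 12-byte header ("PACK", version, object count) and a
20-byte SHA-1 trailer, both of which have to be dropped from each input
and regenerated once for the output.  The names concat_packs and bases
are hypothetical.

```python
import hashlib
import struct

PACK_HEADER = struct.Struct(">4sII")  # signature, version, object count
TRAILER_LEN = 20  # SHA-1 over everything that precedes it


def concat_packs(packs):
    """Concatenate raw pack files (bytes objects) into a single pack.

    Returns (new_pack, bases), where bases[i] is the position in the new
    pack at which the object data copied from packs[i] begins.
    """
    if not packs:
        raise ValueError("need at least one pack")
    bodies, total, version = [], 0, None
    for raw in packs:
        sig, ver, count = PACK_HEADER.unpack_from(raw, 0)
        if sig != b"PACK":
            raise ValueError("not a pack file")
        if version is None:
            version = ver
        elif ver != version:
            raise ValueError("mixed pack versions")
        total += count
        # Keep only the object data: strip the header and the trailer.
        bodies.append(raw[PACK_HEADER.size:-TRAILER_LEN])

    out = bytearray(PACK_HEADER.pack(b"PACK", version, total))
    bases = []
    for body in bodies:
        bases.append(len(out))
        # Copied verbatim: nothing is inflated and no delta is recomputed.
        # Each body moves as one unit, so the distance between a delta
        # and its base inside that body is unchanged.
        out += body
    out += hashlib.sha1(bytes(out)).digest()
    return bytes(out), bases
```

The returned bases are what an index rewrite would need in order to
shift the object offsets of each input pack.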

Then the index for that pack can be created just as easily by reading 
the existing pack index files, storing the data into an array of struct 
pack_idx_entry, adding the appropriate offset to each object offset, and 
then calling write_idx_file().
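
The offset adjustment can be sketched as follows (Python, for
illustration only).  IdxEntry is a hypothetical stand-in for struct
pack_idx_entry that carries only the object name and offset, and
merge_idx_entries is a made-up helper.  The sketch assumes each pack's
object data was copied as one unit, with bases[i] giving where the data
from pack i now starts in the combined pack.

```python
from typing import NamedTuple

PACK_HEADER_LEN = 12  # object data starts right after the pack header


class IdxEntry(NamedTuple):
    sha1: bytes   # 20-byte object name
    offset: int   # position of the object within its pack


def merge_idx_entries(per_pack_entries, bases):
    """Rebase the entries of several pack indexes onto one combined pack.

    per_pack_entries[i] holds the index entries of the i-th input pack and
    bases[i] is where that pack's object data starts in the combined pack.
    """
    merged = []
    for entries, base in zip(per_pack_entries, bases):
        # In its original pack the data started at PACK_HEADER_LEN, so
        # every object from that pack moves by the same amount.
        shift = base - PACK_HEADER_LEN
        merged.extend(IdxEntry(e.sha1, e.offset + shift) for e in entries)
    # A pack index lists objects sorted by name.
    merged.sort(key=lambda e: e.sha1)
    return merged
```

Note that an object present in two input packs would show up twice in
the result; the sketch makes no attempt to drop such duplicates.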

All data is read once and written once, making it no more costly than a 
simple file copy.  On the flip side, it wouldn't get rid of duplicated 
objects (I don't know if that matters, i.e. whether something might 
break with the same object appearing twice in a pack).

> To consolidate all packs that are smaller than a megabyte into
> one, you would use it in its current form like this:
> 
>     $ old=$(find .git/objects/pack -type f -name '*.pack' -size 1M)
>     $ new=$(echo "$old" | git merge-pack | git pack-objects pack)
>     $ for p in $old; do rm -f $p ${p%.pack}.idx; done
>     $ for s in pack idx; do mv pack-$new.$s .git/objects/pack/; done

You might want to move the new pack before removing the old ones though.
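
The ordering suggested here can be sketched like this (Python, for
illustration only).  install_then_prune and its parameters are
hypothetical names; the point is only that the old packs are removed
after the new pack and its index are already in place, so the objects
never disappear from the object directory in between.

```python
import os
import shutil


def install_then_prune(src_dir, pack_dir, new_name, old_packs):
    """Move pack-<new_name>.{pack,idx} into pack_dir, then delete old packs.

    old_packs is a list of paths to the *.pack files being replaced.
    """
    installed = set()
    # Step 1: make the new pack and its index available first.
    for ext in ("pack", "idx"):
        fname = f"pack-{new_name}.{ext}"
        dest = os.path.join(pack_dir, fname)
        shutil.move(os.path.join(src_dir, fname), dest)
        installed.add(os.path.abspath(dest))
    # Step 2: only now remove the packs that were merged.
    for pack in old_packs:
        for path in (pack, pack[: -len(".pack")] + ".idx"):
            # Never delete what was just installed, in case the new pack
            # ended up with the same name as one of the old ones.
            if os.path.abspath(path) in installed:
                continue
            if os.path.exists(path):
                os.remove(path)
```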

> Obvious next steps that can be done in parallel by interested
> parties would be:
> 
>  (1) come up with a way to give "name" aka "clustering cue" (I
>      think this is very hard);

It is, and IMHO not worth it.  If you do it separately from the usual 
pack-objects process you'll perform extra IO and decompression when 
walking tree objects just to reconstruct those paths, which would be 
really slow by the standard of the context I described above.

If you really want to do it then the best way might simply be to reverse 
your find result above, in order to use pack-objects as if the larger 
packs, i.e. the ones that you don't want to merge, simply had an 
associated .keep file.
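
"Reversing the find result" amounts to selecting the packs to leave
alone rather than the packs to merge.  A minimal sketch of that
selection (Python, for illustration only) follows.  split_packs_by_size
and the threshold parameter are hypothetical, and how the resulting
list is handed to pack-objects is deliberately left out.

```python
import os


def split_packs_by_size(pack_dir, threshold):
    """Split the *.pack files in pack_dir into (small, large) path lists.

    Packs of at least `threshold` bytes are the ones to leave untouched,
    as if each had an associated .keep file; the rest are merge candidates.
    """
    small, large = [], []
    for name in sorted(os.listdir(pack_dir)):
        if not name.endswith(".pack"):
            continue
        path = os.path.join(pack_dir, name)
        if os.path.getsize(path) >= threshold:
            large.append(path)
        else:
            small.append(path)
    return small, large
```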

In fact, since we want to _also_ perform a repack of loose objects in 
the context of automatic repacking, I wonder why we wouldn't use that 
--unpacked= argument to also repack smallish packs at the same time in 
only one pack-objects pass.  Or maybe I'm missing something?


Nicolas
