From: Shawn Pearce <spearce@spearce.org>
To: Eran Tromer <git2eran@tromer.org>
Cc: Nicolas Pitre <nico@cam.org>, Junio C Hamano <junkio@cox.net>,
	git@vger.kernel.org
Subject: Re: fetching packs and storing them as packs
Date: Fri, 27 Oct 2006 00:42:34 -0400
Message-ID: <20061027044233.GA29057@spearce.org>
In-Reply-To: <4541850B.8060608@tromer.org>

Eran Tromer <git2eran@tromer.org> wrote:
> On 2006-10-27 05:00, Shawn Pearce wrote:
> >> Change git-repack to follow references under $GIT_DIR/tmp/refs/ too.
> >> To receive or fetch a pack:
> >> 1. Add references to the new heads in
> >>    `mktemp $GIT_DIR/tmp/refs/XXXXXX`.
> >> 2. Put the new .pack under $GIT_DIR/objects/pack/.
> >> 3. Put the new .idx under $GIT_DIR/objects/pack/.
> >> 4. Update the relevant heads under $GIT_DIR/refs/.
> >> 5. Delete the references from step 1.
> 
> > That was actually my (and also Sean's) solution.  Except I would
> > put the temporary refs as "$GIT_DIR/refs/ref_XXXXXX" as this is
> > less code to change and it's consistent with how temporary loose
> > objects are created.
> 
> If you do that, other programs (e.g., anyone who uses rev-list --all)
> may try to walk those heads or consider them available before the pack
> is really there. The point about $GIT_DIR/tmp/refs is that only programs
> meddling with physical packs (git-fetch, git-receive-pack, git-repack)
> will know about it.
 
Doh.  Yes, of course, that makes a lot of sense.

Hmm... Looking at git-repack we have two things currently pending
to rework in there:

  - Historical vs. active packs.
  - Don't delete a possibly still incoming pack during -d.

These have a lot of the same implementation issues.  We need to
be able to identify a set of packs which should be allowed for
repack with -a, and allowed for removal with -d if -a was also used.
A newly uploaded pack cannot be in that list unless its contents are
referenced by one or more refs (which implies that the receive-pack
process has completed).

I'm thinking that the ref thing might be unnecessary.  We just
need to fix repack so it builds a list of "active packs" whose
objects should be copied into the new pack, and then packs only
loose objects and those objects contained in an active pack.

So the receive-pack process becomes:

  a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
  b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.
  c. Write pack and index.
  d. Move pack to $GIT_DIR/objects/pack/...
  e. Move index to $GIT_DIR/objects/pack/...
  f. Update refs.
  g. Arrange for new pack and index to be considered active.
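
A minimal Python sketch of steps a through e, only to make the
ordering concrete.  The function name and the base_name parameter are
hypothetical, and steps f and g are left out because they depend on
how "active" ends up being represented:

```python
import os
import tempfile


def store_incoming_pack(git_dir, pack_bytes, idx_bytes, base_name):
    """Write a pack and its index to temporary files, then move them into place.

    base_name (for example "pack-1234") is a hypothetical parameter used
    only for this illustration.
    """
    objects_dir = os.path.join(git_dir, "objects")
    pack_dir = os.path.join(objects_dir, "pack")
    os.makedirs(pack_dir, exist_ok=True)

    # Steps a and b: create the temporary files outside objects/pack/,
    # so nothing scanning the pack directory sees a half-written pack.
    pack_fd, tmp_pack = tempfile.mkstemp(prefix="pack_", dir=objects_dir)
    idx_fd, tmp_idx = tempfile.mkstemp(prefix="index_", dir=objects_dir)

    # Step c: write both files completely before either one is moved.
    with os.fdopen(pack_fd, "wb") as f:
        f.write(pack_bytes)
    with os.fdopen(idx_fd, "wb") as f:
        f.write(idx_bytes)

    # Steps d and e: move the pack first, then the index.
    final_pack = os.path.join(pack_dir, base_name + ".pack")
    final_idx = os.path.join(pack_dir, base_name + ".idx")
    os.rename(tmp_pack, final_pack)
    os.rename(tmp_idx, final_idx)
    return final_pack, final_idx
```

The temporary files live on the same filesystem as objects/pack/, so
each rename is a single atomic step on POSIX systems.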

And the repack -a -d process becomes:

  1. List all active packs and store in memory.
  2. Repack only loose objects and objects contained in active packs.
  3. Move new pack and idx into $GIT_DIR/objects/pack/...
  4. Arrange for new pack and idx to be considered active.
  5. Delete active packs found by step #1.
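
The point of step 1 is that step 5 deletes only what was seen at
the start.  A rough Python sketch, where list_active_packs and
write_new_pack are hypothetical callbacks standing in for whatever
mechanism is chosen and for the actual packing work:

```python
import os


def repack_all_and_delete(pack_dir, list_active_packs, write_new_pack):
    """Sketch of `repack -a -d` restricted to active packs.

    list_active_packs(pack_dir) returns a list of .pack file names.
    write_new_pack(pack_dir, old_packs) creates the new pack and index in
    pack_dir and returns the new .pack file name.
    Both callbacks are hypothetical stand-ins.
    """
    # Step 1: snapshot the active packs before doing any work.
    old_packs = list(list_active_packs(pack_dir))

    # Steps 2-4: build and install the replacement pack.
    new_pack = write_new_pack(pack_dir, old_packs)

    # Step 5: delete only the packs from the snapshot.  A pack that
    # arrived while we were repacking is not in old_packs and survives.
    for name in old_packs:
        if name == new_pack:
            continue
        base = name[:-len(".pack")]
        for ext in (".idx", ".pack"):
            path = os.path.join(pack_dir, base + ext)
            if os.path.exists(path):
                os.remove(path)
    return new_pack
```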

Junio was originally considering marking packs as historical by
placing their names into an information file (such as
`$GIT_DIR/objects/info/historical-packs`) and then considering all other
packs active.  Thus step #1 lists all packs and removes those
whose names appear in historical-packs, while step #4 is unnecessary.
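
With the information file, step #1 might look roughly like the
Python sketch below.  The file format (one pack file name per line)
and the function name are my assumptions; nothing has been decided:

```python
import os


def active_packs_from_info_file(git_dir):
    """List packs in objects/pack/ that are not named in historical-packs.

    Assumes historical-packs holds one pack file name per line
    (for example "pack-1234.pack"); that format is hypothetical.
    """
    pack_dir = os.path.join(git_dir, "objects", "pack")
    info_path = os.path.join(git_dir, "objects", "info", "historical-packs")

    # A missing file simply means no pack has been marked historical.
    historical = set()
    if os.path.exists(info_path):
        with open(info_path) as f:
            historical = {line.strip() for line in f if line.strip()}

    all_packs = sorted(n for n in os.listdir(pack_dir) if n.endswith(".pack"))
    return [n for n in all_packs if n not in historical]
```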

I was thinking about just changing the "pack-" prefix to "hist-" for
the historical packs and assuming all "pack-*.pack" to be active.
Thus step #1 is a simple glob on the pack directory and step #4
is unnecessary.

In the latter case it's easy to mark an existing pack as historical
(just hardlink hist- names for pack, then idx, then unlink previous
names) and it's also easy to mark new incoming packs as non-active
by using a different prefix (e.g. "incm-") during step #d/#e and
then relinking them as "pack-" during step #g.  It's also very safe
on systems that support hardlinks.
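
To show how little code the prefix scheme needs, here is a Python
sketch.  All function names are hypothetical, and unlinking the old
.idx before the old .pack is my own choice; the only ordering the
scheme itself specifies is pack, then idx, for the new links:

```python
import glob
import os


def _relink(pack_dir, base, old_prefix, new_prefix):
    """Give pack `base` a new prefix via hardlinks, then drop the old names."""
    if not base.startswith(old_prefix):
        raise ValueError("%r does not start with %r" % (base, old_prefix))
    new_base = new_prefix + base[len(old_prefix):]

    # Link the new names first: pack, then idx.  Until the unlinks below,
    # the same data is reachable under both the old and the new names.
    for ext in (".pack", ".idx"):
        os.link(os.path.join(pack_dir, base + ext),
                os.path.join(pack_dir, new_base + ext))

    # Only after both new names exist do we remove the previous names.
    for ext in (".idx", ".pack"):
        os.unlink(os.path.join(pack_dir, base + ext))
    return new_base


def mark_historical(pack_dir, base):
    """Turn an active pack ("pack-...") into a historical one ("hist-...")."""
    return _relink(pack_dir, base, "pack-", "hist-")


def activate_incoming(pack_dir, base):
    """Step g: turn an incoming pack ("incm-...") into an active one."""
    return _relink(pack_dir, base, "incm-", "pack-")


def list_active_packs(pack_dir):
    """Step 1 of repack: a simple glob for pack-*.pack."""
    pattern = os.path.join(pack_dir, "pack-*.pack")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))
```

Because hist-* and incm-* never match the glob, repack -a -d would
neither copy nor delete them.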

We shouldn't ever need to worry about race conditions with repacking
historical packs.  For starters historical packs will tend to be
several years' worth of object accumulation and will be so large
that repacking them might take 45 minutes or more.  Thus they
probably will never get repacked.  An active pack will simply move
into historical status after it gets so large that it's no longer
worthwhile to keep repacking it.  They also will tend to have objects
that are so old that at least one ref in the repository will point
at their entire DAG and thus everything would carry over on a repack.

So this would be cleaner than messing around with temporary refs and
gets us the historical pack feature we've been looking to implement.

-- 
