git.vger.kernel.org archive mirror
From: Ted Ts'o <tytso@mit.edu>
To: Jeff King <peff@peff.net>
Cc: Thomas Rast <trast@student.ethz.ch>,
	Hallvard B Furuseth <h.b.furuseth@usit.uio.no>,
	git@vger.kernel.org, Nicolas Pitre <nico@fluxnic.net>
Subject: Re: Keeping unreachable objects in a separate pack instead of loose?
Date: Mon, 11 Jun 2012 17:14:01 -0400
Message-ID: <20120611211401.GA21775@thunk.org>
In-Reply-To: <20120611183414.GD20134@sigill.intra.peff.net>

On Mon, Jun 11, 2012 at 02:34:14PM -0400, Jeff King wrote:
> You _could_ make a separate cruft pack for each pack that you repack. So
> if I have A.pack and B.pack, I'd pack all of the reachable objects into
> C.pack, and then make D.pack containing the unreachable objects from
> A.pack, and E.pack with the unreachable objects from B.pack. And then
> set the mtime of the cruft packs to that of their parent packs.
> 
> And then the next time you pack, repacking D and E would probably be a
> no-op that preserves mtime, but might create a new pack that ejects some
> now-reachable object.
> 
> To implement that, I think your --list-unreachable would just have to
> print a list of "<pack-mtime> <sha1>" pairs, and then you would pack
> each set with an identical mtime (or even a "close enough" mtime within
> some slop)....

How about this instead?  We distinguish between cruft packs and "real"
packs by the filename.  So we have "cruft-<SHA1>.{idx,pack}" and
"pack-<SHA1>.{idx,pack}".

Normally, git will look at any pack in the pack directory that has an
.idx and .pack extension, but during a repack operation, it will look
only in the pack-* packs first.  If it can't find an object there, it
will then fall back to trying to fetch the object from the cruft-*
packs, and if it finds the object, it copies it into the new pack
which it is creating, thus "rescuing" an object which reappears during
the expiry window.  This should be a relatively rare event, and if it
happens, the object will be in two packs, a pack-* pack and a cruft-*
pack, but that's OK.
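
To make the lookup order concrete, here is a rough sketch in Python.
It is not git code: packs are modelled as plain dicts from object id
to data, and the names find_object and repack are made up for
illustration.  It only shows the "real packs first, cruft packs as a
fallback" order and the resulting duplication of a rescued object.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical stand-in for a pack: object id -> object data.
Pack = Dict[str, bytes]


def find_object(oid: str, packs: Iterable[Pack]) -> Optional[bytes]:
    """Return the object's data from the first pack that contains it."""
    for pack in packs:
        if oid in pack:
            return pack[oid]
    return None


def repack(reachable: Iterable[str],
           real_packs: List[Pack],
           cruft_packs: List[Pack]) -> Tuple[Pack, List[str]]:
    """Build a new pack from the reachable objects.

    The pack-* packs are searched first; the cruft-* packs are only
    consulted when an object is missing there.  Returns the new pack
    and the list of object ids rescued from cruft packs.
    """
    new_pack: Pack = {}
    rescued: List[str] = []
    for oid in reachable:
        data = find_object(oid, real_packs)
        if data is None:
            # Not in any pack-* pack: fall back to the cruft-* packs.
            data = find_object(oid, cruft_packs)
            if data is None:
                raise KeyError(f"missing object {oid}")
            rescued.append(oid)
        new_pack[oid] = data
    return new_pack, rescued
```

Objects that are only in a cruft pack and are not reachable are never
touched by this loop, and the cruft packs themselves are left
unmodified, so a rescued object ends up in both places.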

So since git pack-objects isn't even looking in the cruft-* packs
except when it needs to rescue an object, the objects in the cruft-*
packs won't get copied, and we won't need to have per-object mtimes.
It also means it will go faster since it's not copying the cruft-*
packs at all, and possibly not even looking at them.

Now all we need to do is delete any cruft-* packs which are older than
the expiry window.  We don't even need to look at their contents.
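
As a rough sketch of that expiry step (again Python, not git code):
the function name expire_cruft_packs, the expiry_seconds parameter and
the use of the .pack file's mtime as the pack's age are assumptions
made for illustration.  Nothing inside the packs is read.

```python
import time
from pathlib import Path
from typing import List, Optional


def expire_cruft_packs(pack_dir: Path, expiry_seconds: float,
                       now: Optional[float] = None) -> List[str]:
    """Delete cruft-* packs whose mtime is older than the expiry window.

    Only file names and mtimes are examined.  Returns the names of the
    .pack files that were removed.
    """
    now = time.time() if now is None else now
    removed: List[str] = []
    for pack in sorted(pack_dir.glob("cruft-*.pack")):
        if now - pack.stat().st_mtime <= expiry_seconds:
            continue  # still inside the expiry window, keep it
        # Remove the pack together with its index.
        for path in (pack.with_suffix(".idx"), pack):
            if path.exists():
                path.unlink()
        removed.append(pack.name)
    return removed
```

Because the glob only matches cruft-*.pack, the pack-* packs are never
candidates for deletion here, whatever their age.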

It does imply that we may accumulate a new cruft-<SHA1> pack each time
we run git gc, but users shouldn't be running git gc all that often
anyway.  And even if they do run it all the time, it will still be
more efficient than keeping the unreachable objects as loose objects.

                                        - Ted
