From: Ted Ts'o <tytso@mit.edu>
To: Jeff King <peff@peff.net>
Cc: Thomas Rast <trast@student.ethz.ch>,
	Hallvard B Furuseth <h.b.furuseth@usit.uio.no>,
	git@vger.kernel.org, Nicolas Pitre <nico@fluxnic.net>
Subject: Re: Keeping unreachable objects in a separate pack instead of loose?
Date: Mon, 11 Jun 2012 13:27:32 -0400
Message-ID: <20120611172732.GB16086@thunk.org>
In-Reply-To: <20120611160824.GB12773@sigill.intra.peff.net>

On Mon, Jun 11, 2012 at 12:08:24PM -0400, Jeff King wrote:
> On Mon, Jun 11, 2012 at 11:31:03AM -0400, Ted Ts'o wrote:
> 
> > I'm currently using 1.7.10.2.552.gaa3bb87, and a "git gc" still kicked
> > loose a little over 4.5 megabytes of loose objects were not pruned via
> > "git prune" (since they hadn't yet expired).  These loose objects
> > could be stored in a 244k pack file.
> 
> Out of curiosity, what is the size of the whole repo? If it's a 500M
> kernel repository, then 4.5M is not all _that_ worrisome. Not that it
> could not be better, or that it's not worth addressing (since there are
> corner cases that behave way worse). But it gives a sense of the urgency
> of the problem, if that is the scope of the issue for average use.

It's my e2fsprogs development repo.  I have my "base" repo, which is
what has been pushed out to the public (including a rewinding pu
branch).  The total size of that repo is a little over 15 megs:

<tytso@tytso-glaptop.cam.corp.google.com> {/usr/projects/e2fsprogs/e2fsprogs}  [maint]
899% ls ../base/objects/pack/
total 16156
  908 pack-6964a1516433f16e43dcdf4fcec1996052099f31.idx
15248 pack-6964a1516433f16e43dcdf4fcec1996052099f31.pack

I then have my development repo, which uses a
.git/objects/info/alternates file pointing at the bare "base" repo, so
the only things in this repo are my private development branches and
other things that haven't been pushed for public consumption.
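For readers unfamiliar with alternates: objects/info/alternates lists other object directories, one path per line, that git also searches when looking up objects. That is how the development repo can use everything in "base" without copying it. Below is a minimal Python sketch of reading that file. `parse_alternates` and `read_alternates` are hypothetical helpers, not part of git; relative entries are assumed to resolve against the objects directory, and any comment or special-line handling git may do is not modeled.

```python
import os
from pathlib import Path


def parse_alternates(text, objects_dir):
    """Parse the contents of an objects/info/alternates file.

    Each non-blank line names another object directory. Relative entries
    are resolved against objects_dir (the directory that contains info/).
    """
    base = Path(objects_dir)
    alternates = []
    for line in text.splitlines():
        entry = line.strip()  # simplification: trim stray whitespace
        if not entry:
            continue  # skip blank lines
        path = Path(entry)
        if not path.is_absolute():
            path = base / path
        alternates.append(Path(os.path.normpath(path)))
    return alternates


def read_alternates(objects_dir):
    """Read <objects_dir>/info/alternates, returning [] if it is absent."""
    alt_file = Path(objects_dir) / "info" / "alternates"
    if not alt_file.is_file():
        return []
    return parse_alternates(alt_file.read_text(), objects_dir)
```

In a layout like the one above, the development repo's alternates file would hold a single line naming the bare repo's objects directory, e.g. an absolute path ending in base/objects.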

<tytso@tytso-glaptop.cam.corp.google.com> {/usr/projects/e2fsprogs/e2fsprogs}  [maint]
900% ls .git/objects/pack/
total 1048
 28 5a486e6c2156109f7dfc725b36a201c10652803d.idx    28 pack-7b2a9cccab669338f61a681e34c39362976fb5de.idx
224 5a486e6c2156109f7dfc725b36a201c10652803d.pack  768 pack-7b2a9cccab669338f61a681e34c39362976fb5de.pack

The 4.5 megabytes of loose objects packed down to a 224k "cruft" pack;
the other pack holds 768k worth of private development objects.

So depending on how you want to do the comparison, probably the
fairest thing to say is that my "good" packs totaled about 16 megs,
and the loose cruft objects took up an additional 4.5 megabytes.
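The arithmetic behind that comparison, as a small Python sketch. It assumes the `ls` figures above are GNU `ls -s` allocation sizes in 1 KiB blocks (so they approximate, rather than equal, the file sizes) and treats "4.5 megabytes" as 4.5 MiB.

```python
# Sizes in KiB as listed by `ls -s` above (assumed 1024-byte blocks).
base_pack = {"idx": 908, "pack": 15248}    # public "base" repo
private_pack = {"idx": 28, "pack": 768}    # private development objects
cruft_pack = {"idx": 28, "pack": 224}      # unreachable objects, packed
loose_cruft_kib = 4.5 * 1024               # "4.5 megabytes", assumed MiB

good_kib = sum(base_pack.values()) + sum(private_pack.values())
packed_cruft_kib = sum(cruft_pack.values())

ratio = loose_cruft_kib / cruft_pack["pack"]   # loose size vs. packed size
loose_overhead = loose_cruft_kib / good_kib    # cruft cost while loose
packed_overhead = packed_cruft_kib / good_kib  # cruft cost once packed

print(f"good packs:        {good_kib} KiB (~{good_kib / 1024:.1f} MiB)")
print(f"loose cruft:       {loose_cruft_kib:.0f} KiB = {loose_overhead:.0%} of good packs")
print(f"packed cruft:      {packed_cruft_kib} KiB = {packed_overhead:.1%} of good packs")
print(f"compression ratio: ~{ratio:.0f}x")
```

Under these assumptions the loose cruft adds roughly a quarter to the size of the good packs, while the same objects in a pack (including its index) add about one and a half percent.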

> I don't think that will work, because we will keep repacking the
> unreachable bits into new packs. And the 2-week expiration is based on
> the pack timestamp. So if your "repack -Ad" ends in two packs (the one
> you actually want, and the pack of expired crap), then you would get
> into this cycle:
> 
>   1. You run "git repack -Ad". It makes A.pack, with stuff you want, and
>      B.pack, with unreachable junk. They both get a timestamp of "now".
> 
>   2. A day passes. You run "git repack -Ad" again. It makes C.pack, the
>      new stuff you want, and repacks all of B.pack along with the
>      new expired cruft from A.pack, making D.pack. B.pack can go away.
>      D.pack gets a timestamp of "now".

Hmm, yes.  What we'd really want to do is to make D.pack contain only
those objects that are newly unreachable, not including the objects in
B.pack, and keep B.pack around until the expiry window goes by.  But
that's a much more complicated thing, and the proof-of-concept
algorithm I had outlined wouldn't do that.
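A toy Python simulation makes the difference concrete. It assumes a 14-day window measured from each pack's write time and a hypothetical workload in which one object becomes unreachable per daily `git repack -Ad`. `naive_repack` models the cycle Peff describes; `cruft_aware_repack` models writing only newly unreachable objects while leaving older cruft packs (like B.pack) alone until they expire. Neither function corresponds to real git code.

```python
from dataclasses import dataclass, field

EXPIRY_DAYS = 14  # assumed window, measured from each pack's write time


@dataclass
class CruftPack:
    mtime: int                                 # day the pack was written
    objects: set = field(default_factory=set)


def naive_repack(cruft_packs, newly_unreachable, today):
    """Peff's cycle: merge old cruft and new cruft into one fresh pack."""
    merged = set(newly_unreachable)
    for pack in cruft_packs:
        if today - pack.mtime < EXPIRY_DAYS:  # unexpired packs are copied
            merged |= pack.objects
    # The merged pack is stamped "today", so its age is reset to zero.
    return [CruftPack(today, merged)] if merged else []


def cruft_aware_repack(cruft_packs, newly_unreachable, today):
    """Keep old cruft packs untouched until they expire; pack only new cruft."""
    kept = [p for p in cruft_packs if today - p.mtime < EXPIRY_DAYS]
    if newly_unreachable:
        kept.append(CruftPack(today, set(newly_unreachable)))
    return kept


def simulate(repack, days=30):
    """Run one repack per day; return how many cruft objects survive."""
    packs = []
    for day in range(days):
        # Hypothetical workload: one object becomes unreachable each day.
        packs = repack(packs, {f"obj-{day}"}, day)
    return sum(len(p.objects) for p in packs)


print(simulate(naive_repack), simulate(cruft_aware_repack))
```

With these assumptions, `simulate(naive_repack)` retains all 30 objects, because the single cruft pack is rewritten every day and never ages, while `simulate(cruft_aware_repack)` retains only the 14 objects from the last two weeks.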

> I think solving it for good would involve a separate list of per-object
> expiration dates. Obviously we get that easily with loose objects (since
> it is one object per file).
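Peff's per-object alternative could be sketched as a side table mapping each unreachable object id to the time it was first seen unreachable, carried forward across repacks so that rewriting a pack does not reset any object's clock. The table, the two-week constant, and `update_expiry_list` are hypothetical names for illustration and do not describe any git file format.

```python
DAY = 24 * 3600
EXPIRY_SECONDS = 14 * DAY  # assumed two-week window


def update_expiry_list(expiry, unreachable_now, now):
    """Carry per-object timestamps forward across a repack.

    expiry: dict mapping object id -> time it was first seen unreachable
    unreachable_now: ids found unreachable during this repack
    Returns (new_expiry, pruned): entries to keep and ids to drop.
    """
    new_expiry = {}
    pruned = set()
    for oid in unreachable_now:
        first_seen = expiry.get(oid, now)  # keep the original clock if known
        if now - first_seen >= EXPIRY_SECONDS:
            pruned.add(oid)
        else:
            new_expiry[oid] = first_seen
    # Ids absent from unreachable_now (reachable again) simply drop out.
    return new_expiry, pruned
```

Because the timestamp lives per object rather than per pack, the cruft could be repacked every day without delaying its expiry.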

Well, either that, or we need to teach git-repack to distinguish
between packs that are expected to contain good stuff and packs that
contain cruft, and to not copy "old cruft" into new packs, so the old
cruft pack can finally get nuked 2 weeks (or whatever the expiry
window happens to be) later.
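Ted's alternative can be sketched as a planning step that tags each pack as good or cruft. In this hypothetical Python model, `Pack.is_cruft` is an assumed flag, the window is assumed to be 14 days from the pack's write time, and loose objects are ignored. Good packs are rewritten and split into reachable and newly unreachable objects; the cruft in old cruft packs is never copied, and those packs are deleted whole once they pass the window; any object that became reachable again is first copied out of its cruft pack so nothing reachable is lost.

```python
from dataclasses import dataclass

EXPIRY_DAYS = 14  # assumed expiry window


@dataclass(frozen=True)
class Pack:
    name: str
    objects: frozenset
    is_cruft: bool  # hypothetical flag: pack holds only unreachable objects
    mtime: int      # day the pack was written


def plan_repack(packs, reachable, today):
    """Return (good_objects, new_cruft, delete) for a cruft-aware repack."""
    good_objects = set()
    new_cruft = set()
    delete = []
    for pack in packs:
        # Reachable objects always go into the new good pack.
        good_objects |= pack.objects & reachable
        if pack.is_cruft:
            if today - pack.mtime >= EXPIRY_DAYS:
                delete.append(pack.name)  # old cruft expires as a unit
            # Otherwise leave the pack alone so its mtime keeps aging.
        else:
            new_cruft |= pack.objects - reachable  # only newly unreachable
            delete.append(pack.name)  # rewritten into the new packs
    return good_objects, new_cruft, delete
```

For example, with good pack A written on day 10 and cruft packs B (day 0) and C (day 5), a repack on day 15 rewrites A, deletes the expired B, and keeps C while rescuing any of its objects that are reachable again.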

					- Ted
