From: Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>
To: Oswald Buddenhagen <ossi@kde.org>
Cc: git@vger.kernel.org
Subject: GC of alternate object store (was: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?)
Date: Tue, 28 Aug 2012 21:19:53 +0200 (CEST)
Message-ID: <hbf.20120828vnfp@bombur.uio.no>
In-Reply-To: <loom.20120827T233125-780@post.gmane.org>

Oswald Buddenhagen wrote:
> (...)so the second approach is the "bare aggregator repo" which adds
> all other repos as remotes, and the other repos link back via
> alternates. problems:
> 
> - to actually share objects, one always needs to push to the aggregator

Run a cron job which frequently does that?

> - tags having a shared namespace doesn't actually work, because the
> repos have the same tags on different commits (they are independent
> repos, after all)

Junio's proposal partially fixes that: It pushes refs/* instead of
refs/heads/*, to refs/remotes/<borrowing repo>/.  However...

> - one still cannot safely garbage-collect the aggregator, as the refs
> don't include the stashes and the index, so rebasing may invalidate
> these more transient objects.

Also if you copy a repo (e.g. making a backup) instead of cloning it,
and then start using both, they'll push into the same namespace -
overwriting each other's refs.  Non-fast-forward pushes can thus lose
refs to objects needed by the other repo.

receive.denyNonFastForwards only rejects pushes to refs/heads/ or
something.  (A feature, as I learned when I reported it as a bug:-)
IIRC Git has no config option to reject all non-fast-forward pushes.
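
A minimal sketch of such a cron job, using the refs/* ->
refs/remotes/<borrowing repo>/ mapping from Junio's proposal.  The
function names, the repo name and the aggregator path are made up for
illustration; the only Git feature assumed is plain `git push` with a
refspec.  It does nothing about the overwriting problem above: two
copies of a repo configured with the same name would still push into
the same namespace.

```python
import subprocess

def aggregator_push_command(repo_name, aggregator_url):
    """Build the `git push` argv that sends every ref of one borrowing
    repo into its own namespace in the aggregator."""
    # refs/* rather than refs/heads/*, so tags and other refs go along too.
    refspec = f"refs/*:refs/remotes/{repo_name}/*"
    return ["git", "push", "--quiet", aggregator_url, refspec]

def push_to_aggregator(repo_dir, repo_name, aggregator_url):
    """Run the push from inside repo_dir; intended to be called from cron.
    Returns git's exit status so the caller can log failures."""
    cmd = aggregator_push_command(repo_name, aggregator_url)
    return subprocess.run(cmd, cwd=repo_dir, check=False).returncode
```

A crontab entry would then call push_to_aggregator() once per
borrowing repo, with a distinct repo_name for each.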

> i would re-propose hallvard's "volatile" alternates (at least i think that's
> what he was talking about two weeks ago): they can be used to obtain
> objects, but every object which is in any way referenced from the current
> clone must be available locally (or from a "regular" alternate). that means
> that diffing, etc.  would get objects only temporarily, while cherry-picking
> would actually copy (some of) the objects. this would make it possible to
> "cross-link" repositories, safely and without any "3rd parties".

I'm afraid that idea by itself won't work:-(  Either you borrow from a
store or not.  If Git uses an object from the volatile store, it can't
always know if the caller needs the object to be copied.

OTOH volatile stores which you do *not* borrow from would be useful:
Let fetch/repack/gc/whatever copy missing objects from there.
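
To illustrate the "copy missing objects" part, here is a rough sketch
which handles loose objects only.  It assumes the caller already knows
which object ids it needs, and the function names are hypothetical.
Objects which sit in a pack (in either store) are not looked at, so an
object that is packed locally may get copied redundantly; a real
implementation would have to ask Git itself.

```python
import os
import shutil

def loose_object_path(objects_dir, oid):
    # Loose objects live at <objects_dir>/<first 2 hex digits>/<the rest>.
    return os.path.join(objects_dir, oid[:2], oid[2:])

def copy_missing_loose_objects(needed_oids, local_objects, volatile_objects):
    """Copy each needed object that is absent as a loose object locally
    but present as one in the volatile store.  Returns the copied ids."""
    copied = []
    for oid in needed_oids:
        dst = loose_object_path(local_objects, oid)
        src = loose_object_path(volatile_objects, oid)
        if os.path.exists(dst) or not os.path.exists(src):
            continue  # already local, or not available as a loose object
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(oid)
    return copied
```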


Second attempt at a way to gc the alternate repo:  Copy the
to-be-removed objects into each borrowing repo, then gc them.  Like this:

1. gc, but pack all to-be-removed objects into a "removable" pack.

2. Hardlink/copy the removable pack - with a .keep file - into
   borrowing repos when feasible:  I.e. repos you can find and
   have write access to.  Update their .git/objects/info/packs.
   (Is there a Git command for this?)  Repeat until nothing to do,
   in case someone created a new repo during this step.

3. Move the pack from the alternate repo to a backup object store
   which will keep it for a while.

4. Delete the .keep files from step (2).  They were needed in case
   a user gc'ed away an object from the pack and then added an
   identical object - borrowed from the to-be-removed pack.

5. gc/repack the other repos at your leisure.

666. Repos you could not update in step (2), can get temporarily
   broken.  Their owners must link the pack from the backup store by
   hand, or use that store as a volatile store and then gc/repack.
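
Steps (2) and (4) could be scripted roughly as below.  This is only a
sketch: the function names are invented, and it assumes the removable
pack from step (1) already exists as <pack_base>.pack and
<pack_base>.idx.  It does not touch objects/info/packs; as far as I
know `git update-server-info`, run in each borrowing repo, regenerates
that file.

```python
import os
import shutil

# .pack before .idx, so no reader finds an index without its pack.
PACK_SUFFIXES = (".pack", ".idx")

def _link_or_copy(src, dst):
    try:
        os.link(src, dst)        # hardlink when on the same filesystem
    except OSError:
        shutil.copy2(src, dst)   # otherwise fall back to a plain copy

def install_removable_pack(pack_base, borrower_objects):
    """Step 2: put the pack into the borrower's objects/pack directory,
    guarded by a .keep file.  pack_base is the pack path without suffix.
    Returns the path of the .keep file."""
    pack_dir = os.path.join(borrower_objects, "pack")
    os.makedirs(pack_dir, exist_ok=True)
    name = os.path.basename(pack_base)
    keep = os.path.join(pack_dir, name + ".keep")
    # Write .keep first so the pack is guarded from the moment it appears.
    with open(keep, "w") as f:
        f.write("removable pack from alternate store\n")
    for suffix in PACK_SUFFIXES:
        dst = os.path.join(pack_dir, name + suffix)
        if not os.path.exists(dst):
            _link_or_copy(pack_base + suffix, dst)
    return keep

def release_keep(keep_path):
    """Step 4: remove the .keep file so a later gc/repack may repack
    or prune the pack's objects."""
    if os.path.exists(keep_path):
        os.remove(keep_path)
```

Calling install_removable_pack() for every borrowing repo you can
write to, then moving the pack out of the alternate repo, then calling
release_keep() on each returned path gives the order of steps 2-4.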

Loose objects are a problem:  If a repo has longer expiry time(s)
than the alternate store, it will get loads of loose objects from all
repos which push into the alternate store.  Worse, gc can *unpack*
those objects, consuming a lot of space.  See threads "git gc == git
garbage-create from removed branch" (3 May) and "Keeping unreachable
objects in a separate pack instead of loose?" (10 Jun).

Presumably the work-arounds are:
- Use long expiry times in the alternate repo.  I don't know which
  expiration config settings are relevant, or how.
- Add some command which checks and warns if the repo has a longer
  expiry time than the repo it borrows from.
Also I hope Git will be changed to instead pack such loose objects
somewhere, as discussed in the above threads.
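
The checking command from the second work-around might look
something like this.  It is a sketch under a big assumption: that one
setting, say gc.pruneExpire, is the one to compare, which is exactly
what I am unsure about above.  The two values would come from
`git config gc.pruneExpire` in each repo.  The parser is a toy which
understands only "now", "never" and forms like "2.weeks.ago" or
"90 days", not Git's full date syntax, and the function names are
invented.

```python
import re

# Rough day counts; good enough to compare two settings.
_UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}

def expiry_days(value):
    """Convert a simple expiry setting to a number of days."""
    v = value.strip().lower()
    if v == "now":
        return 0.0
    if v == "never":
        return float("inf")
    m = re.fullmatch(r"(\d+)[. ](day|week|month|year)s?(?:[. ]ago)?", v)
    if not m:
        raise ValueError(f"unsupported expiry value: {value!r}")
    return int(m.group(1)) * _UNIT_DAYS[m.group(2)]

def expiry_warning(borrower_expiry, alternate_expiry):
    """Return a warning string if the borrowing repo keeps objects
    longer than the store it borrows from, else None."""
    if expiry_days(borrower_expiry) > expiry_days(alternate_expiry):
        return (f"borrower expiry {borrower_expiry!r} is longer than "
                f"alternate expiry {alternate_expiry!r}")
    return None
```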

All in all, this isn't something you'd want to do every day.  But it
looks doable and can be scripted.
