From: Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?
Date: Sat, 11 Aug 2012 11:35:53 +0200 (CEST)
Message-ID: <hbf.20120811d15z@bombur.uio.no>
In-Reply-To: <7vmx2a3pif.fsf@alter.siamese.dyndns.org>

Junio C Hamano wrote:
>    Some ideas:
> 
>    - Make "clone --reference" without "-s" not to borrow from the
>      reference repository.  (...)

Generalize: Introduce volatile alternate object stores.  Commands like
(remote) fetch, repack, gc will copy desired objects they see there.

That allows pruneable alternates if people want them: Make every
borrowing repo also borrow from a companion volatile store.  To prune
some shared objects:  Move them from the alternate to the volatile.
Repack or gc all borrowing repos.  Empty the volatile alternate.
Similarly, to detach from one alternate repo while keeping others:
gc with the to-be-dropped alternate as a volatile.
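
For comparison, a rough sketch of how detaching looks by hand with
today's tools, with no volatile stores involved.  It covers only the
simple case of a single alternate; the directory names "donor" and
"borrower" are made up for the example.  "git repack -a -d" without
"-l" copies borrowed objects into a local pack, after which the
alternates file can go away:

```shell
set -e
work=$(mktemp -d)

# An alternate ("donor", hypothetical name) with one commit.
git init -q "$work/donor"
echo hello >"$work/donor/file"
git -C "$work/donor" add file
git -C "$work/donor" -c user.name=Example -c user.email=example@example.com \
    commit -q -m initial

# A borrowing repository: "clone -s" records the donor's object
# directory in objects/info/alternates.
git clone -q -s "$work/donor" "$work/borrower"

# Copy every reachable object into a local pack (no -l, so borrowed
# objects are included), then stop borrowing.
git -C "$work/borrower" repack -a -d -q
rm "$work/borrower/.git/objects/info/alternates"

# The borrower should still be complete once the donor is gone.
rm -rf "$work/donor"
git -C "$work/borrower" fsck --strict
```

A volatile alternate would let gc do the copy step itself, and would
also handle dropping one alternate while still borrowing from others,
which this sketch does not attempt.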

Also it gives a simple way to try to repair a repo with missing
objects, if you have some other repositories which might have the
objects: Repack with the other repositories as volatile alternates.
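
A minimal sketch of that repair, again done by hand with an ordinary
(non-volatile) alternate since volatile ones do not exist.  The repo
names "good" and "broken" and the simulated damage are assumptions
made up for the example:

```shell
set -e
work=$(mktemp -d)

# A healthy repository with one commit.
git init -q "$work/good"
echo hello >"$work/good/file"
git -C "$work/good" add file
git -C "$work/good" -c user.name=Example -c user.email=example@example.com \
    commit -q -m initial

# A full copy of it, damaged by deleting the loose blob behind "file".
cp -R "$work/good" "$work/broken"
blob=$(git -C "$work/broken" rev-parse HEAD:file)
dir=$(echo "$blob" | cut -c1-2)
rest=$(echo "$blob" | cut -c3-)
rm -f "$work/broken/.git/objects/$dir/$rest"

# Borrow from the healthy repository, repack so the missing object is
# copied into a local pack, then stop borrowing again.
echo "$work/good/.git/objects" >"$work/broken/.git/objects/info/alternates"
git -C "$work/broken" repack -a -d -q
rm "$work/broken/.git/objects/info/alternates"

# The repaired repository should now pass fsck on its own.
git -C "$work/broken" fsck --strict
```

This only helps for objects that the other repositories really have;
anything still missing would make the repack fail.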

BTW, if a wanted object disappears from the volatile alternate while
fetch is running, fetch should get it from the remote after all.

>    - Make the distinction between a regular repository and an object
>      store that is meant to be used for object sharing stronger.
> 
>      Perhaps a configuration item "core.objectstore = readonly" can
>      be introduced, and we forbid "clone -s" from pointing at a
>      repository without such a configuration.  We also forbid object
>      pruning operations such as "gc" and "repack" from being run in
>      a repository marked as such.

I hope Michael's "append-only"/"donor" is feasible instead.  In which
case safer gc/repack are needed, like you outline:

>      It may be necessary to allow some special kind of repacking of
>      such a "readonly" object store, in order to reduce the number
>      of packfiles (and get rid of loose object files); it needs to
>      be implemented carefully not to lose any object, regardless of
>      local reachability.

And it needs to be default behavior in such stores, so users won't
need don't-shoot-myself-in-foot options.

>    - It might not be a bad idea to have a dedicated new command to
>      help users manage alternates ("git alternates"?); obviously
>      this will be one of its subcommand "git alternates detach" if
>      we go that route.

"git object-store <subcommand>  -- manage alternates & object stores"?

>    - Or just an entry in the documentation is sufficient?

Better doc would be useful anyway, and this command gives a place to
put it:-)  I had no idea alternates were intended to be read-only,
but that does explain some seeming defects I'd wondered about.

>  - When you have two or more repositories that do not share objects,
>    you may want to rearrange things so that they share their objects
>    from a single common object store.
> 
>    There is no direct UI to do this, as far as I know.  You can
>    obviously create a new bare repository, push there from all
>    of these repositories, and then borrow from there, e.g.
>    
>     git --bare init shared.git &&
>     for r in a.git b.git c.git ...
>     do
>         (
>             cd "$r" &&
>             git push ../shared.git "refs/*:refs/remotes/$r/*" &&
>             echo ../../../shared.git/objects >.git/objects/info/alternates
>         )
>     done
> 
>    And then repack shared.git once.

...and finally gc the other repositories.
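
Put together, the whole sequence might look like the sketch below.
It sets up two throwaway repositories first so it can be run as is;
the names "a" and "b" and their contents are made up, and the rest
follows the quoted recipe plus the final gc:

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Two independent repositories (hypothetical names) with one commit each.
for r in a b
do
    git init -q "$r"
    echo "content of $r" >"$r/file"
    git -C "$r" add file
    git -C "$r" -c user.name=Example -c user.email=example@example.com \
        commit -q -m "initial commit in $r"
done

# Push everything into a new shared store and start borrowing from it.
git init -q --bare shared.git
for r in a b
do
    (
        cd "$r" &&
        git push -q ../shared.git "refs/*:refs/remotes/$r/*" &&
        # A relative alternates path is resolved from the objects directory.
        echo ../../../shared.git/objects >.git/objects/info/alternates
    )
done

# Repack the shared store once, then gc each borrowing repository so
# it can drop objects it now gets from the shared store.
git -C shared.git repack -a -d -q
for r in a b
do
    git -C "$r" gc -q
    git -C "$r" fsck
done
```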

The refs/remotes/$r/ namespace becomes misleading if the user renames
or copies the corresponding Git repository, and then cleverly does
something to the shared repo and to whatever repo (if any) now lives
in directory $r.

I suggest refs/remotes/$unique_number/ and note $unique_number
somewhere in the borrowing repo.  If someone insists on being clever,
this may force them to read up on what they're doing first.
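
Something like the following sketch, perhaps.  The config key
"objectstore.id" and the way the ID is generated are both made up
for illustration; nothing in Git knows about them:

```shell
set -e
work=$(mktemp -d)
cd "$work"

git init -q --bare shared.git
git init -q project
echo hello >project/file
git -C project add file
git -C project -c user.name=Example -c user.email=example@example.com \
    commit -q -m initial

# Pick an ID that does not depend on the directory name, and note it
# in the borrowing repo ("objectstore.id" is a hypothetical key).
id=$(date +%Y%m%d%H%M%S)-$$
git -C project config objectstore.id "$id"

# Use the ID, not the directory name, as the namespace in the shared repo.
git -C project push -q ../shared.git "refs/*:refs/remotes/$id/*"

# Renaming the repository does not make the namespace misleading.
mv project renamed
id_after=$(git -C renamed config objectstore.id)
git -C shared.git for-each-ref "refs/remotes/$id_after/"
```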

Or store no refs, since the shared repo shouldn't lose objects anyway.

If we're sure objects won't be lost: Create a proper remote with the
shared repo.  That way the user can push into it once in a while, and
he can configure just which refs should be shared.
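
As a rough sketch, using only existing configuration: the remote name
"shared", the repo name "project" and the choice to share branches
only are assumptions for the example.

```shell
set -e
work=$(mktemp -d)
cd "$work"

git init -q --bare shared.git
git init -q project
echo hello >project/file
git -C project add file
git -C project -c user.name=Example -c user.email=example@example.com \
    commit -q -m initial

# Register the shared repo as an ordinary remote.
git -C project remote add shared ../shared.git

# Configure which refs get shared: here only branches, placed in a
# namespace of their own inside the shared repo.
git -C project config remote.shared.push \
    "+refs/heads/*:refs/remotes/project/heads/*"

# From now on a plain push to the remote uses that refspec.
git -C project push -q shared
git -C shared.git for-each-ref refs/remotes/project/
```

Tags or other refs stay out of the shared repo unless the user adds
further remote.shared.push lines for them.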

> 
>    Some ideas:
> 
>    - (obvious: give a canned command to do the above, perhaps then
>      set the core.objectstore=readonly in the resulting shared.git)

That's getting closer to 'bzr init-repository': One dir with the
shared repo and all borrowing repositories.  A simple model which Git
can track and the user need not think further about.

This way, git clone/init of a new repo in this dir can learn to notice
and use the shared repo.

We can also have a command (git object-store?) to maintain the
repository collection, since Git knows where to find them all:
Push from all repos into the shared repo, gc all repos, even prune
unused objects from the shared repo - after implementing sufficient
paranoia.

>  - When you have one object store and a repository that does not yet
>    borrow from it, you may want to make the repository borrow from
>    the object store.  Obviously you can run "echo" like the sample
>    script in the previous item above, but it is not obvious how to
>    perform the logical next step of shrinking $GIT_DIR/objects of
>    the repository that now borrows the objects.
> 
>    I think "git repack -a -d" is the way to do this, but if you
>    compare this command to "git repack -a -d -f" we saw previously
>    in this message, it is not surprising that the users would be
>    confused---it is not obvious at all.

Hopefully users only need to know "git gc".

> [Footnote]
> 
> *1* Making the borrowed object store aware of all the repositories
> that borrow from it, so that operations like "gc" and "repack" in
> the repository with the borrowed object store can keep objects that
> are needed by borrowing repositories, is theoretically possible, but
> is not a workable approach in practice, as (1) borrowers may not
> have a write access to the shared object store to add such a back
> pointer to begin with,

Thus this can only be an optional feature.
Not via direct backrefs though, see above about refs/remotes/$r/.

> (2) "gc"/"repack" in the borrowed object
> store and normal operations in the borrowing repositories can easily
> race with each other, without any coordination between the users,

This sounds like a bug to me, unless you refer to deleting objects
from the shared store.  The doc does not warn that we can't even
maintain the shared store while using Git commands in borrowing
repositories.

> and (3) a casual "borrowing" can simply be done with a simple "echo"
> as shown in the main text of this message, and there is no way to
> ensure a backpointer from the borrowed object store to such a
> borrowing repository.

-- 
Hallvard

