From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Subject: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?
Date: Sat, 04 Aug 2012 21:56:56 -0700
Message-ID: <7vmx2a3pif.fsf@alter.siamese.dyndns.org>

The "alternates" mechanism lets you keep a single object store (not
necessarily a git repository on its own, but just the objects/ part
of it) on a machine and have multiple repositories on the same
machine share objects from it, saving both network bandwidth when
cloning from remote repositories and the disk space used by the
local repositories. A repository created by "clone --reference" or
"clone -s" uses this mechanism to borrow objects from the object
store of another repository. A user can also manually add new
entries to $GIT_DIR/objects/info/alternates to borrow from other
object stores.

The UI for this mechanism, however, has some room for improvement, and
we may want to start improving it for the next release after the
upcoming Git 1.7.12 (or even Git 2.0 if the change is a large one
that may be backward incompatible but gives us a vast improvement).

Here are some random thoughts as a discussion starter.

 - By design, the borrowed object store MUST NOT ever lose any
   object, as such a loss can corrupt the borrowing repositories.
   In theory, it is OK for the object store whose objects are
   borrowed by repositories to acquire new objects, but losing
   existing objects is an absolute no-no.

   But the UI of "clone -s" encourages users to borrow from the
   object store of a repository that the user may actively develop
   in. It is perfectly normal for users to perform operations that
   make objects that used to be reachable from tips of its branches
   unreachable (e.g. rebase, reset, "branch -d") in a repository
   that is used for active development, but a "gc" after such an
   operation will lose objects that were originally available in the
   repository. If objects lost that way were still needed by the
   repositories that borrow from it, the borrowing repositories
   become corrupt immediately.

   In practice, this means that users who use "clone -s" to make a
   new repository can *never* prune the original repository without
   risking corruption of the repositories that borrow from it [*1*].

   Some ideas:

   - Make "clone --reference" without "-s" not borrow from the
     reference repository. E.g. if you have a clone of Linus's
     repository at /git/linux.git/, cloning a related repository
     using it as --reference:

       $ git clone --reference /git/linux.git git://k.org/linux-next.git

     should still take advantage of /git/linux.git/{refs,objects} to
     reduce the transfer cost of fetching from k.org, but the
     resulting repository should not point at /git/linux.git with its
     objects/info/alternates file.

   - Make the distinction between a regular repository and an object
     store that is meant to be used for object sharing stronger.

     Perhaps a configuration item "core.objectstore = readonly" can
     be introduced, and we forbid "clone -s" from pointing at a
     repository without such a configuration. We also forbid object
     pruning operations such as "gc" and "repack" from being run in
     a repository marked as such.

     It may be necessary to allow some special kind of repacking of
     such a "readonly" object store, in order to reduce the number
     of packfiles (and get rid of loose object files); it needs to
     be implemented carefully not to lose any object, regardless of
     local reachability.
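
To illustrate the failure mode described in the item above, here is a
minimal sketch that uses throwaway repositories under a temporary
directory. The repository names (original, borrower), the branch name
(topic) and the identity variables are made up for the example. The
explicit reflog expiry and "--prune=now" are only there so that "gc"
drops the unreachable objects right away instead of after the usual
grace period.

```shell
#!/bin/sh
# Sketch: pruning a repository that others borrow from can break a borrower.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)
cd "$work"

# An actively developed repository with a side branch "topic".
git init -q original
(
    cd original &&
    echo one >file && git add file && git commit -q -m one &&
    git checkout -q -b topic &&
    echo two >file && git commit -q -a -m two &&
    git checkout -q -
)

# "clone -s" records original/.git/objects in the new alternates file,
# so the borrower holds no copy of these objects itself.
git clone -q -s original borrower

# Ordinary development in the original: drop the branch, then prune.
(
    cd original &&
    git branch -q -D topic &&
    git reflog expire --expire=now --all &&
    git gc -q --prune=now
)

# The borrower still has refs/remotes/origin/topic; check whether the
# commit that ref points at can still be read.
if (cd borrower && git cat-file -e "refs/remotes/origin/topic^{commit}") 2>/dev/null
then
    borrower_intact=yes
else
    borrower_intact=no
fi
echo "borrower can still read origin/topic: $borrower_intact"
```

Nothing in the original repository records that a borrower exists, which
is why the "gc" there has no reason to keep the commit.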

 - When you have a repository and one or more repositories that
   borrow from it, you may want to dissociate the borrowing
   repositories from the borrowed one (e.g. so that you can repack
   or prune the original repository safely, or you may even want to
   remove it).

   I think "git repack -a -d -f" in the borrowing repository happens
   to be the way to do this, but it is not clear to the users why.

   Some ideas:

   - It might not be a bad idea to have a dedicated new command to
     help users manage alternates ("git alternates"?); obviously
     this will be one of its subcommands, "git alternates detach",
     if we go that route.

   - Or just an entry in the documentation is sufficient?
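
A minimal sketch of the dissociation described in the item above, again
with made-up repository names under a temporary directory. The repack
command is the one named in the message. Removing the alternates file
afterwards is an assumption of this sketch (the message does not spell
that step out), and deleting the original at the end is only there to
show that the borrower no longer depends on it.

```shell
#!/bin/sh
# Sketch: make a borrowing repository self-contained, then drop the link.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)
cd "$work"

git init -q original
(
    cd original &&
    echo hello >file && git add file && git commit -q -m initial
)
git clone -q -s original borrower

(
    cd borrower &&
    # Pack every object the borrower needs into a pack of its own,
    # including the ones it currently reads through the alternate.
    git repack -a -d -f -q &&
    # Assumption: with a full local pack in place, the alternates
    # entry can simply be removed.
    rm -f .git/objects/info/alternates
)

# The original can now go away without affecting the borrower.
rm -rf original
if (cd borrower && git fsck) >/dev/null 2>&1
then
    detached=yes
else
    detached=no
fi
echo "borrower is self-contained: $detached"
```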

 - When you have two or more repositories that do not share objects,
   you may want to rearrange things so that they share their objects
   from a single common object store.

   There is no direct UI to do this, as far as I know. You can
   obviously create a new bare repository, push there from all
   of these repositories, and then borrow from there, e.g.

       git --bare init shared.git &&
       for r in a.git b.git c.git ...
       do
           (
               cd "$r" &&
               # keep each repository's refs under its own namespace
               git push ../shared.git "refs/*:refs/remotes/$r/*" &&
               # the path is relative to this repository's objects/ directory
               echo ../../../shared.git/objects >.git/objects/info/alternates
           )
       done

   And then repack shared.git once.

   Some ideas:

   - (obvious: give a canned command to do the above, perhaps then
     set the core.objectstore=readonly in the resulting shared.git)

 - When you have one object store and a repository that does not yet
   borrow from it, you may want to make the repository borrow from
   the object store. Obviously you can run "echo" like the sample
   script in the previous item above, but it is not obvious how to
   perform the logical next step of shrinking $GIT_DIR/objects of
   the repository that now borrows the objects.

   I think "git repack -a -d" is the way to do this, but if you
   compare this command to "git repack -a -d -f" we saw previously
   in this message, it is not surprising that the users would be
   confused---it is not obvious at all.

   Some ideas:

   - (obvious: give a canned subcommand to do this)
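
A minimal sketch of that shrinking step, using a throwaway repository
and a hypothetical shared store (store.git) that already holds the same
objects. One assumption to flag loudly: the sketch passes -l in addition
to "-a -d". git-repack(1) documents -l as passing --local to
pack-objects, which leaves out objects that are available from an
alternate. The message itself only names "repack -a -d", so treat the
extra flag as an assumption of this sketch rather than as the recipe
given above. The local_objects helper is also made up for the example.

```shell
#!/bin/sh
# Sketch: start borrowing from an existing store, then shrink local storage.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)
cd "$work"

git init -q repo
(
    cd repo &&
    echo hello >file && git add file && git commit -q -m initial
)

# Hypothetical shared store: a bare copy of the same history, packed once.
git clone -q --bare repo store.git
git --git-dir=store.git repack -a -d -q

# Count the objects the repository stores itself (loose plus packed).
local_objects () {
    (cd repo && git count-objects -v) |
    awk '/^count:|^in-pack:/ { n += $2 } END { print n + 0 }'
}

before=$(local_objects)

# Start borrowing, then repack. -l (--local) is the assumption noted above.
echo "$work/store.git/objects" >repo/.git/objects/info/alternates
(cd repo && git repack -a -d -l -q)

after=$(local_objects)
if (cd repo && git fsck) >/dev/null 2>&1
then
    intact=yes
else
    intact=no
fi
echo "local objects: $before before, $after after; fsck ok: $intact"
```

Comparing the two counts printed at the end is a quick way to see
whether the repack actually dropped the local copies.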

[Footnote]

*1* Making the borrowed object store aware of all the repositories
that borrow from it, so that operations like "gc" and "repack" in
the repository with the borrowed object store can keep objects that
are needed by borrowing repositories, is theoretically possible, but
is not a workable approach in practice, as (1) borrowers may not
have write access to the shared object store to add such a
backpointer to begin with, (2) "gc"/"repack" in the borrowed object
store and normal operations in the borrowing repositories can easily
race with each other, without any coordination between the users,
and (3) a casual "borrowing" can simply be done with a simple "echo"
as shown in the main text of this message, and there is no way to
ensure a backpointer from the borrowed object store to such a
borrowing repository.