From: "Shawn O. Pearce" <spearce@spearce.org>
To: Brandon Casey <casey@nrlssc.navy.mil>
Cc: git@vger.kernel.org
Subject: Re: clarify git clone --local --shared --reference
Date: Tue, 5 Jun 2007 00:50:08 -0400
Message-ID: <20070605045008.GC9513@spearce.org>
In-Reply-To: <4664A5FE.30208@nrlssc.navy.mil>
Brandon Casey <casey@nrlssc.navy.mil> wrote:
>
> I think the goal of these three options is space savings (and speed),
> but I don't understand when I should prefer one option over another, or
> when/whether to use a combination of them. And I am unsure (SCARED)
> about any side effects they may have.
Yes, they are mainly about saving time setting up the new clone,
and about disk space required by the new clone.
> 1) What does local mean?
> --local says repository must be on the "local" machine and claims it
> attempts to make hardlinks when possible. Of course hard links cannot
> be created across filesystems, so are there other speedups/space
> savings when repository is on local machine but not on the same
> filesystem? Is this option still valid then?
Basically --local means instead of using the native Git transport to
copy object data from one repository to another we shortcut and use
`find . | cpio -lpumd` or somesuch, so that cpio can use hardlinks if
possible (same filesystem) but fall back to a whole copy if it cannot.
This is usually faster than the native Git transport as we copy
every file, without first trying to compute if the file would be
needed by the new clone or not.
So --local may copy garbage that git-prune would have removed,
or that git-repack/git-gc might have eliminated from a packfile.
But generally that's such a small amount of data that the faster
cpio path is still worthwhile, and the hardlinks (even better) save
disk space.
Note we only hardlink the immutable data under .git/objects; the
mutable data and the working directory files that are checked out
are *not* hardlinked.
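
To see whether --local actually hardlinked the object files, a small
throwaway experiment like the one below may help. It is only a sketch:
the temporary directory and the names src and dst are made up for the
example, it assumes bash and a git binary on the PATH, and it reports
"copied" rather than failing if the filesystem cannot hardlink.

```shell
#!/usr/bin/env bash
# Sketch only: the temp directory and the names src/dst are hypothetical.
set -eu
tmp=$(mktemp -d)

# Build a tiny source repository with one commit (its objects stay loose).
git init -q "$tmp/src"
echo hello > "$tmp/src/file.txt"
git -C "$tmp/src" add file.txt
git -C "$tmp/src" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial"

# Clone with --local; object files are hardlinked when the filesystem allows.
git clone -q --local "$tmp/src" "$tmp/dst"

# Locate the loose object file for HEAD's commit in both repositories.
oid=$(git -C "$tmp/src" rev-parse HEAD)
rel="objects/${oid:0:2}/${oid:2}"
src_obj="$tmp/src/.git/$rel"
dst_obj="$tmp/dst/.git/$rel"

# -ef is true when both paths refer to the same inode, i.e. a hardlink.
if [ "$src_obj" -ef "$dst_obj" ]; then
    link_state="hardlinked"
else
    link_state="copied"
fi
echo "object $oid was $link_state"
```

Either way the object ends up in the clone's own .git/objects, which
is why the clone does not depend on the source afterwards.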
> 2) Does --shared imply shared write access? Does --local?
> I'll point out that git-init has an option with the same name.
No. --shared means something entirely different in git-clone
than it does in git-init.
The --shared here adds the source repository's object directory to
the new repository's .git/objects/info/alternates. This means that
the new clone doesn't copy the object database; instead it just
accesses the source repository when it needs data.
This exposes two risks:
a) Don't delete the source repository. If you delete the source
repository then the clone repository is "corrupt" as it won't be
able to access object data.
b) Don't repack the source repository without accounting for the
refs and reflogs of all --shared repositories that came from it.
Otherwise you may delete objects that the source repository no
longer needs, but that one or more of the --shared repositories
still needs.
Objects that are newly created in a --shared repository are written
to that repository's own object database, not to the source
repository. Hence the source repository can be read-only to the
current user.
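
The alternates mechanism can be observed with a throwaway experiment
along these lines. This is a sketch, not a recipe: the temporary
directory and the names src and shared are invented for the example,
and it assumes bash and a git binary on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: the temp directory and the names src/shared are hypothetical.
set -eu
tmp=$(mktemp -d)

# Source repository with a single commit.
git init -q "$tmp/src"
echo hello > "$tmp/src/file.txt"
git -C "$tmp/src" add file.txt
git -C "$tmp/src" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial"

# A --shared clone borrows objects from the source instead of copying them.
git clone -q --shared "$tmp/src" "$tmp/shared"

# The alternates file lists the object directory being borrowed from.
alternates="$tmp/shared/.git/objects/info/alternates"
cat "$alternates"

# A commit made in the clone is stored in the clone, not in the source.
echo more >> "$tmp/shared/file.txt"
git -C "$tmp/shared" -c user.name=Example -c user.email=example@example.com \
    commit -q -a -m "second"
new_oid=$(git -C "$tmp/shared" rev-parse HEAD)
new_rel="objects/${new_oid:0:2}/${new_oid:2}"

if [ -f "$tmp/shared/.git/$new_rel" ]; then
    echo "new commit is stored in the --shared clone"
fi
if [ ! -e "$tmp/src/.git/$new_rel" ]; then
    echo "source repository was not written to"
fi
```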
> 3) --shared seems like a special case of --reference? Are there
> differences?
--reference is actually a special case of --shared. --reference is
meant for cloning a remote repository over the network, where you
already have an existing local repository that has most of the
objects you need to successfully clone the remote repository.
With --reference we set up a temporary copy of refs from the
--reference repository in the new repository, so that during the
network transfer from the remote system we don't download things
the --reference repository already has.
But --reference implies --shared, and has the same issues as above.
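
A typical --reference invocation might look like the sketch below.
Everything named here is an assumption for illustration: the
directory "remote" stands in for the repository across the network
(reached through a file:// URL so the native Git transport is used),
and "existing" plays the local repository you already have. It
assumes bash and a git binary on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: remote/existing/new are hypothetical names; file:// stands in
# for a real network URL.
set -eu
tmp=$(mktemp -d)

# The "remote" repository we want to clone.
git init -q "$tmp/remote"
echo hello > "$tmp/remote/file.txt"
git -C "$tmp/remote" add file.txt
git -C "$tmp/remote" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial"

# An existing local repository that already holds most of the objects.
git clone -q "$tmp/remote" "$tmp/existing"

# Clone the remote again, borrowing objects from the existing repository.
git clone -q --reference "$tmp/existing" "file://$tmp/remote" "$tmp/new"

# As with --shared, the new clone records the borrowed object directory.
ref_alternates="$tmp/new/.git/objects/info/alternates"
cat "$ref_alternates"
```

The alternates entry points at the existing local repository, not at
the remote, so risks (a) and (b) apply to that local repository.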
> 4) What happens if the source repository disappears? Is --local ok
> but --shared screwed?
Correct.
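
The difference can be reproduced with a throwaway experiment such as
the following sketch. The temporary directory, the clone names, and
the helper function check are all made up for the example; it assumes
bash and a git binary on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: directory names and the check() helper are hypothetical.
set -eu
tmp=$(mktemp -d)

# Source repository with a single commit.
git init -q "$tmp/src"
echo hello > "$tmp/src/file.txt"
git -C "$tmp/src" add file.txt
git -C "$tmp/src" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial"

# One clone of each kind.
git clone -q --local  "$tmp/src" "$tmp/local_clone"
git clone -q --shared "$tmp/src" "$tmp/shared_clone"

# Make the source repository disappear.
rm -rf "$tmp/src"

# Report whether the commit that HEAD points at can still be read.
check() {
    if git -C "$1" cat-file -e 'HEAD^{commit}' 2>/dev/null; then
        echo "ok"
    else
        echo "broken"
    fi
}

local_state=$(check "$tmp/local_clone")
shared_state=$(check "$tmp/shared_clone")
echo "--local clone:  $local_state"
echo "--shared clone: $shared_state"
```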
> 5) Are space savings obtained only at initial clone, or is it ongoing?
> Does a future git pull from the source repository create new hard
> links where possible?
Only on initial clone. Later pulls will copy. You can try using
git-relink to redo the hardlinks after the pull.
> Can --shared be used with --reference? Can --reference be used multiple
> times (and would I want to)? Does -l with -s get you anything? (The
> examples use this.)
--reference can only be given once in a git-clone; we only set up
one set of temporary references during the network transfer.
And as I said above, --reference implies --shared.
--
Shawn.