* clarify git clone --local --shared --reference
From: Brandon Casey @ 2007-06-04 23:53 UTC
To: git
I think the goal of these three options is space savings (and speed),
but I don't understand when I should prefer one option over another, or
when/whether to use a combination of them. And I am unsure (SCARED)
about any side effects they may have.
This is all based on the information in git-clone.txt. If there is more
detail someplace else please let me know.
1) What does local mean?
--local says repository must be on the "local" machine and claims it
attempts to make hardlinks when possible. Of course hard links cannot
be created across filesystems, so are there other speedups/space
savings when repository is on local machine but not on the same
filesystem? Is this option still valid then?
2) Does --shared imply shared write access? Does --local?
I'll point out that git-init has an option with the same name.
3) --shared seems like a special case of --reference? Are there
differences?
4) What happens if the source repository disappears? Is --local ok
but --shared screwed?
5) Are space savings obtained only at the initial clone, or are they ongoing?
Does a future git pull from the source repository create new hard
links where possible?
Can --shared be used with --reference? Can --reference be used multiple
times (and would I want to)? Does -l with -s get you anything? (the
examples use this)
thanks,
-brandon
* Re: clarify git clone --local --shared --reference
From: Shawn O. Pearce @ 2007-06-05 4:50 UTC
To: Brandon Casey; +Cc: git
Brandon Casey <casey@nrlssc.navy.mil> wrote:
>
> I think the goal of these three options is space savings (and speed),
> but I don't understand when I should prefer one option over another, or
> when/whether to use a combination of them. And I am unsure (SCARED)
> about any side effects they may have.
Yes, they are mainly about saving time setting up the new clone,
and about disk space required by the new clone.
> 1) What does local mean?
> --local says repository must be on the "local" machine and claims it
> attempts to make hardlinks when possible. Of course hard links cannot
> be created across filesystems, so are there other speedups/space
> savings when repository is on local machine but not on the same
> filesystem? Is this option still valid then?
Basically --local means instead of using the native Git transport to
copy object data from one repository to another we shortcut and use
`find . | cpio -lpumd` or somesuch, so that cpio can use hardlinks if
possible (same filesystem) but fallback to whole copy if it cannot.
This is usually faster than the native Git transport as we copy
every file, without first trying to compute if the file would be
needed by the new clone or not.
So --local may copy garbage that git-prune would have removed,
or that git-repack/git-gc might have eliminated from a packfile.
But generally that's such a small amount of data that the faster
cpio path is still a win, and the hardlinks (when possible) save disk.
Note we only hardlink the immutable data under .git/objects; the
mutable data and the working directory files that are checked out
are *not* hardlinked.
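Not part of the original reply: a minimal sketch for checking which files --local actually shares, by comparing inode numbers. It assumes a present-day Git (`git -C` needs 1.8.5 or later), a POSIX shell, and a scratch directory from `mktemp -d` on a filesystem that supports hard links; `source` and `clone` are invented names.

```shell
# Scratch setup (not from the thread): invented repo names under a temp dir,
# plus a throwaway identity so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/source"
echo hello > "$work/source/file.txt"
git -C "$work/source" add file.txt
git -C "$work/source" commit -q -m initial

# Clone on the same filesystem; --local hardlinks the files under .git/objects.
git clone -q --local "$work/source" "$work/clone"

# Path (relative to .git/) of the loose object holding file.txt's contents.
blob=$(git -C "$work/source" rev-parse HEAD:file.txt)
obj="objects/$(echo "$blob" | cut -c1-2)/$(echo "$blob" | cut -c3-)"

# Same inode number means both repositories point at one file on disk.
inode() { ls -i "$1" | awk '{print $1}'; }
src_obj=$(inode "$work/source/.git/$obj")
dst_obj=$(inode "$work/clone/.git/$obj")
# Checked-out working-tree files are written fresh, never hardlinked.
src_wt=$(inode "$work/source/file.txt")
dst_wt=$(inode "$work/clone/file.txt")
echo "object file inodes:  $src_obj $dst_obj"
echo "working file inodes: $src_wt $dst_wt"
```

Matching object inodes mean the clone shares that file with the source; the two checked-out copies of file.txt are separate files.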
> 2) Does --shared imply shared write access? Does --local?
> I'll point out that git-init has an option with the same name.
No. --shared means something entirely different in git-clone
than it does in git-init.
The --shared here adds the source repository to the new
repository's .git/objects/info/alternates. This means that the
new clone doesn't copy the object database; instead it just accesses
the source repository when it needs data.
This exposes two risks:
a) Don't delete the source repository. If you delete the source
repository then the clone repository is "corrupt" as it won't be
able to access object data.
b) Don't repack the source repository without accounting for the
refs and reflogs of all --shared repositories that came from it.
Otherwise you may delete objects that the source repository no
longer needs, but that one or more of the --shared repositories
still needs.
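A minimal sketch reproducing risk (a), under the same kind of assumptions (present-day Git, POSIX shell, scratch directory from `mktemp -d`, invented repository names): the --shared clone can read its HEAD commit while the source exists, and cannot once the source is deleted.

```shell
# Scratch setup (not from the thread): invented repo names under a temp dir,
# plus a throwaway identity so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/source"
echo hello > "$work/source/file.txt"
git -C "$work/source" add file.txt
git -C "$work/source" commit -q -m initial

git clone -q --shared "$work/source" "$work/shared"

# While the source exists, the clone reads HEAD's commit through alternates.
if git -C "$work/shared" cat-file -e HEAD; then before=ok; else before=missing; fi

# Delete the source repository out from under the clone.
rm -rf "$work/source"

# The clone still has its refs, but the objects they name are gone.
if git -C "$work/shared" cat-file -e HEAD 2>/dev/null; then after=ok; else after=missing; fi
echo "before: $before, after: $after"
```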
Objects that are newly created in a --shared repository are written
to that repository's own object directory, not to the source repository.
Hence the source repository can be read-only to the current user.
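A hypothetical sketch showing both the alternates entry that --shared writes and where newly created objects end up. It assumes a present-day Git, a POSIX shell and a scratch directory with invented names; it makes the source's object directory read-only before committing in the clone, which only demonstrates anything when not running as root.

```shell
# Scratch setup (not from the thread): invented repo names under a temp dir,
# plus a throwaway identity so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/source"
echo hello > "$work/source/file.txt"
git -C "$work/source" add file.txt
git -C "$work/source" commit -q -m initial

git clone -q --shared "$work/source" "$work/shared"

# --shared records the source's object directory in the clone's alternates file.
alternates=$(cat "$work/shared/.git/objects/info/alternates")
echo "alternates: $alternates"

# Make the source's object store read-only, then commit in the clone.
chmod -R a-w "$work/source/.git/objects"
echo change >> "$work/shared/file.txt"
git -C "$work/shared" commit -q -a -m "local change"
chmod -R u+w "$work/source/.git/objects"   # restore so the scratch dir can be removed

# Where was the new commit object written?
new=$(git -C "$work/shared" rev-parse HEAD)
obj="objects/$(echo "$new" | cut -c1-2)/$(echo "$new" | cut -c3-)"
if [ -f "$work/shared/.git/$obj" ]; then in_clone=yes; else in_clone=no; fi
if [ -f "$work/source/.git/$obj" ]; then in_source=yes; else in_source=no; fi
echo "new commit in clone: $in_clone, in source: $in_source"
```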
> 3) --shared seems like a special case of --reference? Are there
> differences?
--reference is actually a special case of --shared. --reference is
meant for cloning a remote repository over the network, where you
already have an existing local repository that has most of the
objects you need to successfully clone the remote repository.
With --reference we set up a temporary copy of refs from the
--reference repository in the new repository, so that during the
network transfer from the remote system we don't download things
the --reference repository already has.
But --reference implies --shared, and has the same issues as above.
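A sketch of the intended --reference use, assuming a present-day Git and invented names: `source` stands in for the remote and is reached through a `file://` URL so the normal pack transport is used, while `nearby` plays the existing local repository that already holds the objects.

```shell
# Scratch setup (not from the thread): invented repo names under a temp dir,
# plus a throwaway identity so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/source"
echo hello > "$work/source/file.txt"
git -C "$work/source" add file.txt
git -C "$work/source" commit -q -m initial

# An existing local repository that already has the objects.
git clone -q "$work/source" "$work/nearby"

# Clone the "remote" through the pack transport, borrowing from nearby.
git clone -q --reference "$work/nearby" "file://$work/source" "$work/new"

# The new clone keeps nearby's object directory as an alternate.
alternates=$(cat "$work/new/.git/objects/info/alternates")
echo "alternates: $alternates"

# Packfiles the transfer wrote into the new clone itself.
packs=$(find "$work/new/.git/objects/pack" -name '*.pack' | wc -l | tr -d ' ')
echo "packs received: $packs"
```

Because nearby already had every object the remote advertised, nothing needed to be downloaded here; the clone depends on nearby exactly as a --shared clone depends on its source.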
> 4) What happens if the source repository disappears? Is --local ok
> but --shared screwed?
Correct.
> 5) Are space savings obtained only at the initial clone, or are they ongoing?
> Does a future git pull from the source repository create new hard
> links where possible?
Only on initial clone. Later pulls will copy. You can try using
git-relink to redo the hardlinks after the pull.
> Can --shared be used with --reference? Can --reference be used multiple
> times (and would I want to)? Does -l with -s get you anything? (the
> examples use this)
--reference can only be given once in a git-clone; we only set up
one set of temporary references during the network transfer.
And as I said above, --reference implies --shared.
--
Shawn.
* Re: clarify git clone --local --shared --reference
From: Brandon Casey @ 2007-06-05 16:30 UTC
To: Shawn O. Pearce; +Cc: git
Shawn O. Pearce wrote:
> Brandon Casey <casey@nrlssc.navy.mil> wrote:
[snip]
>> 2) Does --shared imply shared write access? Does --local?
>> I'll point out that git-init has an option with the same name.
>
> No. --shared means something entirely different in git-clone
> than it does in git-init.
This made me think that git-init --shared and git-clone --shared
might be meant to be used together in some special way.
ok. Rather selfish "sharing" in my opinion :)
--reference did not cause any confusion and implied to me exactly
what it does: Use supplied repository as a reference for objects
which cannot be resolved locally.
> The --shared here adds the source repository to the new
> repository's .git/objects/info/alternates. This means that the
> new clone doesn't copy the object database; instead it just accesses
> the source repository when it needs data.
>
> This exposes two risks:
>
> a) Don't delete the source repository. If you delete the source
> repository then the clone repository is "corrupt" as it won't be
> able to access object data.
>
> b) Don't repack the source repository without accounting for the
> refs and reflogs of all --shared repositories that came from it.
> Otherwise you may delete objects that the source repository no
> longer needs, but that one or more of the --shared repositories
> still needs.
How should this be accomplished? Does this mean never run
git-gc/git-repack on the source repository? Or is there a way to
cause the sharing repositories to copy over objects no longer
required by the source repository?
[snip]
>> 5) Are space savings obtained only at the initial clone, or are they ongoing?
>> Does a future git pull from the source repository create new hard
>> links where possible?
>
> Only on initial clone. Later pulls will copy. You can try using
> git-relink to redo the hardlinks after the pull.
How about with --shared? Particularly with a fast-forward not much
would need to be copied over. Do later pulls into a repository with
configured objects/info/alternates take advantage of space savings
when possible?
If the answer above is "yes", then this brings up an interesting use
case. I assume that clone, fetch, etc follow the alternates of the
source repository? Otherwise a --shared repository would be unclone-able
right? And only pull-able from the source repository? So if that is the
case (that remote alternates are followed), then a group of developers
could add all of the other developers to their alternates list (if
multiple alternates are supported) and reference their objects when
possible. To the extent that it is possible, each developer would end up
only storing their commit objects. This would then create a distributed
repository.
Of course, this new distributed repository may be somewhat fragile since
the entire thing could become unusable if any portion was corrupted.
Just because you can do a thing, doesn't mean you should.
thanks for your excellent reply,
-brandon
* Re: clarify git clone --local --shared --reference
From: Shawn O. Pearce @ 2007-06-06 5:11 UTC
To: Brandon Casey; +Cc: git
Brandon Casey <casey@nrlssc.navy.mil> wrote:
> Shawn O. Pearce wrote:
> >
> > b) Don't repack the source repository without accounting for the
> > refs and reflogs of all --shared repositories that came from it.
> > Otherwise you may delete objects that the source repository no
> > longer needs, but that one or more of the --shared repositories
> > still needs.
>
> How should this be accomplished? Does this mean never run
> git-gc/git-repack on the source repository? Or is there a way to
> cause the sharing repositories to copy over objects no longer
> required by the source repository?
Well, you can repack, but only if you account for everything.
The easiest way to do this is push every branch from the --shared
repos to the source repository, repack the source repository, then
you can run `git prune-packed` in the --shared repos to remove
loose objects that the source repository now has.
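The push, repack, prune-packed sequence might look like the following. This is a sketch under assumptions: a present-day Git, a scratch directory, invented names, and a separate `refs/shared/*` namespace (also invented) chosen so the push does not touch the branch checked out in the non-bare source.

```shell
# Scratch setup (not from the thread): invented repo names under a temp dir,
# plus a throwaway identity so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/source"
echo hello > "$work/source/file.txt"
git -C "$work/source" add file.txt
git -C "$work/source" commit -q -m initial
git clone -q --shared "$work/source" "$work/shared"

# New work in the --shared clone is stored as loose objects in the clone.
echo more >> "$work/shared/file.txt"
git -C "$work/shared" commit -q -a -m "work in the shared clone"
before=$(git -C "$work/shared" count-objects | cut -d' ' -f1)

# 1. Push every branch of the clone into the source, under a separate
#    (hypothetical) namespace so the source's checked-out branch is untouched.
git -C "$work/shared" push -q "$work/source" 'refs/heads/*:refs/shared/*'

# 2. Repack the source; the clone's commits are now reachable there.
git -C "$work/source" repack -q -a -d

# 3. Drop the clone's loose objects that the source's pack now holds.
git -C "$work/shared" prune-packed
after=$(git -C "$work/shared" count-objects | cut -d' ' -f1)
echo "loose objects in clone: before=$before after=$after"
```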
You can account for the refs by hand when you run pack-objects
by hand, but it's horribly difficult compared to the push and then
repack I just described. I think that long-lived --shared isn't that
common of a workflow; most people use --shared for short-term things.
For example contrib/continuous uses --shared when it clones the
repository to create a temporary build area.
> >>5) Are space savings obtained only at the initial clone, or are they ongoing?
> >> Does a future git pull from the source repository create new hard
> >> links where possible?
> >
> >Only on initial clone. Later pulls will copy. You can try using
> >git-relink to redo the hardlinks after the pull.
>
> How about with --shared? Particularly with a fast-forward not much
> would need to be copied over. Do later pulls into a repository with
> configured objects/info/alternates take advantage of space savings
> when possible?
Yes. Recent versions of Git avoid copying objects into a --shared
repository if at all possible. This makes fetches from the source
repository into the --shared repository very, very fast, and uses no
additional disk.
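A small sketch of that fast path, assuming a present-day Git and invented repository names: the source gets a new commit, the --shared clone fetches it, and the clone's own object directory stays empty.

```shell
# Scratch setup (not from the thread): invented repo names under a temp dir,
# plus a throwaway identity so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/source"
echo hello > "$work/source/file.txt"
git -C "$work/source" add file.txt
git -C "$work/source" commit -q -m initial
git clone -q --shared "$work/source" "$work/shared"

# New history appears in the source...
echo update >> "$work/source/file.txt"
git -C "$work/source" commit -q -a -m "new upstream commit"
upstream=$(git -C "$work/source" rev-parse HEAD)

# ...and the --shared clone fetches it.
git -C "$work/shared" fetch -q origin
fetched=$(git -C "$work/shared" rev-parse '@{upstream}')

# The objects were already reachable through alternates, so nothing was
# copied: no loose objects and no packfiles in the clone's own object store.
loose=$(git -C "$work/shared" count-objects | cut -d' ' -f1)
packs=$(find "$work/shared/.git/objects/pack" -name '*.pack' | wc -l | tr -d ' ')
echo "fetched $fetched; loose=$loose packs=$packs"
```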
> If the answer above is "yes", then this brings up an interesting use
> case. I assume that clone, fetch, etc follow the alternates of the
> source repository? Otherwise a --shared repository would be unclone-able
> right? And only pull-able from the source repository? So if that is the
> case (that remote alternates are followed),
Alternates are followed up to 5 levels deep. So you can do something like
this:
git clone --shared source share1
git clone --shared share1 share2
git clone --shared share2 share3
git clone --shared share3 share4
git clone --shared share4 share5
git clone --shared share5 corrupt
I think corrupt is corrupt; it doesn't have access to the source anymore
and therefore is missing 90%+ of the object database. To help make this
case work the objects/info/alternates should always contain absolute paths;
we store them absolute in git-clone by default but you could set them up
by hand. The other repositories should however be intact and usable, but
you cannot clone from share5.
Normal fetch/push/pull will work fine against any of those working
repos, as they are all using the normal Git object transport methods,
which means we copy objects unless they are available to us already
(see above).
> then a group of developers
> could add all of the other developers to their alternates list (if
> multiple alternates are supported)
Yes, they are. I don't think we have a limit on the number of
alternates you are allowed to have. However each additional
alternate adds some cost to starting up any given Git process.
The more alternates you have (or the more deeply nested they are)
the slower Git will initialize itself. For 1 or 2 alternates it's
within the fork+exec noise of any good UNIX system; for 50 alternates
I think you would notice it.
> and reference their objects when
> possible. To the extent that it is possible, each developer would end up
> only storing their commit objects. This would then create a distributed
> repository.
Yes, but that has very high risk. If developer Joe Smith quits and
then the administrator `rm -rf /home/jsmith` everyone is hosed as
they can no longer access the objects that were originally created
by Joe. Then the administrator is off looking for backup tapes,
assuming he has them and they are valid. One nice property of Git
(really any DVCS) is that the data is automatically backed up by
every developer participating in the project. It's unlikely you
will lose the project that way.
Also this scheme doesn't really work well for packing. I don't
think we'll pack the loose objects that we borrow from the other
developers, and Git packfiles are a major performance improvement
for all Git operations. Plus they are very small, so they save a
lot of disk.
You might find that it takes up less total disk to have everyone
keep a complete (non --shared) copy of the project, but repack
regularly, than to have everyone using alternates against each
other and nobody repacks.
> Of course, this new distributed repository may be somewhat fragile since
> the entire thing could become unusable if any portion was corrupted.
> Just because you can do a thing, doesn't mean you should.
Yes, exactly. ;-)
In my day-job repositories I have about 150 MiB of blobs that
are very common across a number of Git repositories. I've made a
single repo that has all of those packed, and then set that up as an
alternate for everything else. It saves a huge chunk of disk for us.
But that common-blob.git thing that I created never gets changed,
and never gets repacked. It's sort of a "historical archive" for us.
Works very nicely. Alternates have their uses...
--
Shawn.
* Re: clarify git clone --local --shared --reference
From: Brandon Casey @ 2007-06-06 18:50 UTC
To: Shawn O. Pearce; +Cc: git
Shawn O. Pearce wrote:
> Brandon Casey <casey@nrlssc.navy.mil> wrote:
>> Shawn O. Pearce wrote:
>>> b) Don't repack the source repository without accounting for the
>>> refs and reflogs of all --shared repositories that came from it.
>>> Otherwise you may delete objects that the source repository no
>>> longer needs, but that one or more of the --shared repositories
>>> still needs.
>> How should this be accomplished? Does this mean never run
>> git-gc/git-repack on the source repository? Or is there a way to
>> cause the sharing repositories to copy over objects no longer
>> required by the source repository?
>
> Well, you can repack, but only if you account for everything.
> The easiest way to do this is push every branch from the --shared
> repos to the source repository, repack the source repository, then
> you can run `git prune-packed` in the --shared repos to remove
> loose objects that the source repository now has.
>
> You can account for the refs by hand when you run pack-objects
> by hand, but it's horribly difficult compared to the push and then
> repack I just described. I think that long-lived --shared isn't that
> common of a workflow; most people use --shared for short-term things.
> For example contrib/continuous uses --shared when it clones the
> repository to create a temporary build area.
>
ok. I just want to make sure this is not really about prune'ing.
In the following, source and --shared repos are identical except...
1) Source repo contains loose objects which are new commits.
--shared repo does git-pull.
we fast-forward, copying very little.
success.
2) Source repo contains loose objects which are new commits.
Source repo does git-gc, which repacks but doesn't prune.
--shared repo does git-pull.
success?
3) Source repo deletes a branch that --shared repo also has.
This deletion creates dangling unreferenced objects.
** --shared repo still ok here, right?
Source repo does git-gc, which repacks but doesn't prune.
** Is --shared screwed at this point? (This is what I understand
you to say above) Or do the dangling objects still exist,
so --shared is still ok?
git gc --prune
--shared is fubar, at least on the deleted branch.
The docs (git-repack.txt) seem to suggest that git-repack (without -d)
does not delete any objects. And if -d is used, then at most objects
already referenced in other packs will be deleted. This makes me think
that repack is safe on the source repository.
If the above is wrong, then I'm missing a clue about git-rebase. And
don't feel like you have to explain it to me. Do feel free to give a
short *NO!! REPACK IS DANGEROUS ON SHARED REPOS YOU IDIOT!!* or other
such beating. But if you see what I'm misunderstanding please let me know.
If the above is right, then it seems like the source repo developer
should be able to go about his developing, and git-gc'ing without
regard for other developers who may be --share'ing. And only when the
source developer wants to do a prune of dangling objects must something
special be done. git-prune.txt suggests:
git prune $(cd ../another && git rev-parse --all)
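A sketch of the git-prune.txt suggestion in action, assuming a present-day Git and invented names: `another` is a --shared clone, and the `topic` branch exists only to produce a commit that the source later stops referencing. The clone's `git rev-parse --all` output is passed straight to `git prune` in the source.

```shell
# Scratch setup (not from the thread): invented repo names under a temp dir,
# plus a throwaway identity so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/source"
echo hello > "$work/source/file.txt"
git -C "$work/source" add file.txt
git -C "$work/source" commit -q -m initial

# A commit reachable only from a hypothetical "topic" branch (built with
# commit-tree so HEAD's reflog never mentions it).
c=$(git -C "$work/source" commit-tree -p HEAD -m topic 'HEAD^{tree}')
git -C "$work/source" branch topic "$c"
git clone -q --shared "$work/source" "$work/another"

# The source drops the branch; the commit is now an unreferenced loose object there.
git -C "$work/source" branch -q -D topic

# Prune the source, but keep everything reachable from the clone's refs.
# The substitution is deliberately unquoted: one argument per ref.
git -C "$work/source" prune $(git -C "$work/another" rev-parse --all)

if git -C "$work/another" cat-file -e "$c"; then state=intact; else state=broken; fi
echo "clone after prune: $state"
```

This only protects the loose objects that prune would remove; as discussed later in the thread, a repack with -d is a separate hazard.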
> In my day-job repositories I have about 150 MiB of blobs that
> are very common across a number of Git repositories. I've made a
> single repo that has all of those packed, and then set that up as an
> alternate for everything else. It saves a huge chunk of disk for us.
> But that common-blob.git thing that I created never gets changed,
> and never gets repacked. It's sort of a "historical archive" for us.
> Works very nicely. Alternates have their uses...
Ahh, now that's interesting.
Could I create something like that by doing something like this:
(in a cloned repository with only a master branch)
git reset --hard HEAD^ # I know, HEAD is still in the history
git gc --prune # log so it doesn't get pruned, but
# it's just an example
(now back in my devel repo)
<add archive repo to alternates>
git prune-packed
Now most everything is likely referenced in the archive repo except the
last commit. Well, maybe even the HEAD commit due to the history log.
Depending on how you reply above, a periodic pull into the archive repo
(and a repack?), then a 'git prune-packed' from the sharers could allow
good sharing? <waiting for a "NO, YOU IDIOT!"> If not, then I guess the
archive creation steps could be repeated periodically.
-brandon
* Re: clarify git clone --local --shared --reference
From: Brandon Casey @ 2007-06-06 18:55 UTC
To: Shawn O. Pearce; +Cc: git
Brandon Casey wrote:
> If the above is wrong, then I'm missing a clue about git-rebase.
I mean repack, not rebase.
-brandon
* Re: clarify git clone --local --shared --reference
From: Shawn O. Pearce @ 2007-06-08 5:37 UTC
To: Brandon Casey; +Cc: git
Brandon Casey <casey@nrlssc.navy.mil> wrote:
> ok. I just want to make sure this is not really about prune'ing.
>
> In the following, source and --shared repos are identical except...
> 1) Source repo contains loose objects which are new commits.
> --shared repo does git-pull.
> we fast-forward, copying very little.
> success.
Copying nothing actually. All of the objects required are in the
source repository, so --shared needs nothing additional.
> 2) Source repo contains loose objects which are new commits.
> Source repo does git-gc, which repacks but doesn't prune.
> --shared repo does git-pull.
> success?
Yes, same as above, with one exception: suppose --shared has a branch
that points at a commit that came from source (hence its data isn't in
--shared), that data was packed, and source no longer has a reference
to it. When you repack source we won't include that commit in the new
pack, but we will delete the old pack. That means the commit goes away.
> 3) Source repo deletes a branch that --shared repo also has.
> This deletion creates dangling unreferenced objects.
Yes... see above about the repack problem.
> ** --shared repo still ok here, right?
> Source repo does git-gc, which repacks but doesn't prune.
Broken. The repack didn't include the dangling objects.
> ** Is --shared screwed at this point? (This is what I understand
> you to say above) Or do the dangling objects still exist,
> so --shared is still ok?
--shared is fubar, on the deleted branch.
> git gc --prune
> --shared is fubar, at least on the deleted branch.
also fubar, on the deleted branch.
> The docs (git-repack.txt) seem to suggest that git-repack (without -d)
> does not delete any objects. And if -d is used, then at most objects
> already referenced in other packs will be deleted. This makes me think
> that repack is safe on the source repository.
You are correct that leaving off the '-d' won't delete objects.
But a pack is created by listing the objects we need, and if we don't
need the object in source, we don't include it into the new pack.
-a -d means: delete all packs that existed when we started the
repack. So if an object was in the old packfile, and we didn't
copy it to the new packfile, it gets deleted. ;-)
Packfiles are immutable. We cannot delete something from one without
deleting the entire packfile.
> If the above is wrong, then I'm missing a clue about git-rebase. And
> don't feel like you have to explain it to me. Do feel free to give a
> short *NO!! REPACK IS DANGEROUS ON SHARED REPOS YOU IDIOT!!* or other
> such beating. But if you see what I'm misunderstanding please let me know.
;-)
NO!! REPACK IS DANGEROUS ON SHARED REPOS YOU IDIOT!!
Exactly because of what I'm saying above, and what you mention about
source deleting a branch and repacking or pruning, and --shared
then being corrupt because it's now missing at least one object it
wants to have.
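The scenario can be reproduced with a short sketch (present-day Git, scratch directory, invented names). The hypothetical `topic` commit is packed before the --shared clone is made, matching the "used to be packed" case. It runs plain `git repack -a -d`; present-day `git gc` keeps recently unreachable objects around for a grace period, so it may not break the clone immediately.

```shell
# Scratch setup (not from the thread): invented repo names under a temp dir,
# plus a throwaway identity so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/source"
echo hello > "$work/source/file.txt"
git -C "$work/source" add file.txt
git -C "$work/source" commit -q -m initial

c=$(git -C "$work/source" commit-tree -p HEAD -m topic 'HEAD^{tree}')
git -C "$work/source" branch topic "$c"
# Pack everything, so the topic commit now lives in a packfile in the source.
git -C "$work/source" repack -q -a -d

git clone -q --shared "$work/source" "$work/shared"
if git -C "$work/shared" cat-file -e "$c"; then before=intact; else before=broken; fi

# The source deletes the branch and repacks: the new pack holds only what the
# source itself still references, and -d deletes the old pack wholesale.
git -C "$work/source" branch -q -D topic
git -C "$work/source" repack -q -a -d

# The clone's origin/topic still names the commit, but its data is gone.
if git -C "$work/shared" cat-file -e "$c" 2>/dev/null; then after=intact; else after=broken; fi
echo "before: $before, after: $after"
```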
> If the above is right, then it seems like the source repo developer
> should be able to go about his developing, and git-gc'ing without
> regard for other developers who may be --share'ing. And only when the
> source developer wants to do a prune of dangling objects must something
> special be done. git-prune.txt suggests:
>
> git prune $(cd ../another && git rev-parse --all)
Right; but the thing that is shocking to some people (me in my
early days with Git) is that a repack is *also* doing a prune if
you add -d. It's not quite the same type of prune as it prunes only
previously packed objects, but its still a prune.
> >In my day-job repositories I have about 150 MiB of blobs that
> >are very common across a number of Git repositories. I've made a
> >single repo that has all of those packed, and then set that up as an
> >alternate for everything else. It saves a huge chunk of disk for us.
> >But that common-blob.git thing that I created never gets changed,
> >and never gets repacked. It's sort of a "historical archive" for us.
> >Works very nicely. Alternates have their uses...
>
> Ahh, now that's interesting.
>
> Could I create something like that by doing something like this:
>
> (in a cloned repository with only a master branch)
> git reset --hard HEAD^ # I know, HEAD is still in the history
> git gc --prune # log so it doesn't get pruned, but
> # it's just an example
>
> (now back in my devel repo)
> <add archive repo to alternates>
> git prune-packed
Actually you want `git gc` in your --shared repo now. That way it
minimizes the current packfiles. prune-packed only applies to loose
objects, not to already-packed stuff. Since you added a repo with a
large packfile to your alternates list you probably want to shrink
your own packfile down as much as possible. That works because
`git gc` is actually running `git repack -a -d -l`. The -l means
"only things that are local to this --shared repository".
> Now most everything is likely referenced in the archive repo except the
> last commit. Well, maybe even the HEAD commit due to the history log.
Right.
> Depending on how you reply above, a periodic pull into the archive repo
> (and a repack?), then a 'git prune-packed' from the sharers could allow
> good sharing? <waiting for a "NO, YOU IDIOT!"> If not, then I guess the
> archive creation steps could be repeated periodically.
Yes, exactly. The trick here is that once something enters the
shared repo IT MUST STAY THERE. You cannot allow a branch to be
deleted from the shared repo, unless it is fully merged into another
branch (or tag) that is staying. You also cannot rewind a branch
(e.g. push with --force).
But I do exactly what you are suggesting. I pull every so often
into the shared common repo and repack it. And then everyone
else can repack and their repack deletes things that are in the
shared common repo.
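A sketch of that arrangement, assuming a present-day Git and invented names (`archive.git` as the shared common repo, `devel` as one developer's repository). The developer's repack uses `git repack -a -d -l`, the command Shawn says `git gc` runs; present-day `git gc` passes somewhat different options to repack, so this calls repack directly.

```shell
# Scratch setup (not from the thread): invented repo names under a temp dir,
# plus a throwaway identity so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/source"
echo hello > "$work/source/file.txt"
git -C "$work/source" add file.txt
git -C "$work/source" commit -q -m initial

# A bare shared common repo holding one packfile with the history.
git clone -q --bare "$work/source" "$work/archive.git"
git -C "$work/archive.git" repack -q -a -d

# A developer repository with a full, independent copy (its own packfile).
git clone -q --no-local "$work/source" "$work/devel"

# Borrow from the archive, then repack keeping only objects the archive lacks:
# -l leaves out objects found in an alternate's pack, and -d deletes the
# developer's old pack that held them.
echo "$work/archive.git/objects" >> "$work/devel/.git/objects/info/alternates"
git -C "$work/devel" repack -q -a -d -l

stats=$(git -C "$work/devel" count-objects -v)
echo "$stats"
```

Everything the developer needs now lives in the archive's pack, which is exactly why the archive must never lose objects.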
Just make sure you have a good backup of the shared common repo. :)
--
Shawn.
* RE: clarify git clone --local --shared --reference
From: Loeliger Jon-LOELIGER @ 2007-06-08 15:57 UTC
To: Shawn O. Pearce, Brandon Casey; +Cc: git
Shawn O. Pearce wrote:
>
> Brandon Casey <casey@nrlssc.navy.mil> wrote:
> > ok. I just want to make sure this is not really about prune'ing.
> >
> > In the following, source and --shared repos are identical except...
> > 1) Source repo contains loose objects which are new commits.
> > --shared repo does git-pull.
> > we fast-forward, copying very little.
> > success.
>
> Copying nothing actually. All of the objects required are in the
> source repository, so --shared needs nothing additional.
So the thing I find myself wanting to do is
a "crib from local copy". That is, the network
cost is large, so when cloning, point to a local
(i.e., already on the same filesystem) clone that is
similar, use it as a reference, but, in the end,
create a complete copy in the new repository.
I don't want it hard linked with --local.
I don't want it shared with --shared.
I don't want it as an alternate source with --reference.
What I want is a new, clean, complete, unshared repository.
But for efficiency reasons, I want to grab objects
from a different, filesystem-local clone if possible.
Does that work?
jdl
* Re: clarify git clone --local --shared --reference
From: Brandon Casey @ 2007-06-08 18:35 UTC
To: Loeliger Jon-LOELIGER; +Cc: Shawn O. Pearce, git
Loeliger Jon-LOELIGER wrote:
> Shawn O. Pearce wrote:
>> Brandon Casey <casey@nrlssc.navy.mil> wrote:
>>> ok. I just want to make sure this is not really about prune'ing.
>>>
>>> In the following, source and --shared repos are identical except...
>>> 1) Source repo contains loose objects which are new commits.
>>> --shared repo does git-pull.
>>> we fast-forward, copying very little.
>>> success.
>> Copying nothing actually. All of the objects required are in the
>> source repository, so --shared needs nothing additional.
>
> So the thing I find myself wanting to do is
> a "crib from local copy". That is, the network
> cost is large, so when cloning, point to a local
> (i.e., already on the same filesystem) clone that is
> similar, use it as a reference, but, in the end,
> create a complete copy in the new repository.
>
> I don't want it hard linked with --local.
> I don't want it shared with --shared.
> I don't want it as an alternate source with --reference.
>
> What I want is a new, clean, complete, unshared repository.
>
> But for efficiency reasons, I want to grab objects
> from a different, filesystem-local clone if possible.
>
> Does that work?
I don't think that exact behavior is implemented yet, but...
If the filesystem-local repo is a pure subset of the source repo
you could do this:
(assuming the filesystem-local repo is on branch master, and that is
what you want)
git clone -l <filesystem-local repo> <my_new_repo>
cd <my_new_repo>
git pull <source-repo>
No reason not to use -l on clone in this case IMO.
Otherwise...
If the filesystem-local repo has changes past the master HEAD on source
repo that you are not necessarily interested in...
1) git clone -l -n <filesystem-local repo> <my_new_repo>
2) cd <my_new_repo>
3) git fetch <source_repo> master:tmp
4) git branch -M tmp master
5) git checkout master
1) Here we use -l to encourage hard linking (no reason not to IMO),
and tell clone not (-n) to checkout the active branch.
3) Now fetch the master branch from the source_repo and store into
a new branch named tmp.
4) Rename tmp to master.
5) Checkout the files.
- Now the HEAD of master branch is at the same commit as the
source_repo.
One drawback is that origin is now tracking the filesystem-local repo,
so a git pull without supplying a repo will pull from filesystem-local repo.
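Not available in 2007, but worth noting: later Git releases (2.3 and newer) added `git clone --reference <repo> --dissociate`, which borrows objects from the reference repository during the transfer and then copies what it borrowed into the new clone and removes the alternates entry. That is close to what Jon asks for. A hypothetical sketch, assuming a present-day Git: `source` stands in for the expensive remote, reached through a `file://` URL, and `nearby` for the filesystem-local clone; both names are invented.

```shell
# Scratch setup (not from the thread): invented repo names under a temp dir,
# plus a throwaway identity so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/source"
echo hello > "$work/source/file.txt"
git -C "$work/source" add file.txt
git -C "$work/source" commit -q -m initial
git clone -q "$work/source" "$work/nearby"

# Borrow from nearby during the transfer, then stop depending on it.
git clone -q --reference "$work/nearby" --dissociate "file://$work/source" "$work/new"

if [ -e "$work/new/.git/objects/info/alternates" ]; then alt=present; else alt=absent; fi

# The new clone stays complete even after the repository it borrowed from is deleted.
rm -rf "$work/nearby"
if git -C "$work/new" fsck >/dev/null 2>&1; then state=complete; else state=broken; fi
echo "alternates file: $alt, after deleting nearby: $state"
```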
-brandon
* Re: clarify git clone --local --shared --reference
From: Brandon Casey @ 2007-06-13 23:07 UTC
To: Shawn O. Pearce; +Cc: git
Shawn O. Pearce wrote:
> Brandon Casey <casey@nrlssc.navy.mil> wrote:
>> The docs (git-repack.txt) seem to suggest that git-repack (without -d)
>> does not delete any objects. And if -d is used, then at most objects
>> already referenced in other packs will be deleted. This makes me think
>> that repack is safe on the source repository.
>
> You are correct that leaving off the '-d' won't delete objects.
> But a pack is created by listing the objects we need, and if we don't
> need the object in source, we don't include it into the new pack.
>
> -a -d implies delete all packs that existed when we started the
> repack. So if an object was in the old packfile, and we didn't
> copy it to the new packfile, it gets deleted. ;-)
Ok. There is the connection I did not make. repack -d is NOT harmless,
since a pack that contains only objects referenced in other packs and
dangling unreferenced objects will be deleted. With git-gc that means
all of the preexisting packs since, as you mentioned, it repacks using
-a -d -l.
Thanks for taking the time.
-brandon