* sharing object packs
@ 2008-06-18 19:57 marc.zonzon+git
2008-06-19 9:01 ` Shawn O. Pearce
0 siblings, 1 reply; 3+ messages in thread
From: marc.zonzon+git @ 2008-06-18 19:57 UTC (permalink / raw)
To: git
Hello
I have a big bare repository 'main.git' and many small git repositories sub1, sub2, ... subn.
All repositories live on the same file system, and each subx repository tracks and fetches main.git as a remote branch.
I would like to avoid duplicating main.git's objects. I have tried several approaches:
- Putting a hard link to main.git's pack into the subx object store before fetching the main.git remote.
It works well... until the first repack on either side.
Note that the problem is the same for any clone of a local repository: the hard links to the packs vanish on the first repack.
You end up with a pack containing the same objects, and therefore the same name, but organized differently, so with a different associated .idx file and often a different pack size.
- Using an objects/info/alternates file with the path of main.git's object store.
It works well too, but I import objects from main.git into subx, and they don't have the same lifetime as those in main.git. So they can disappear during a git-prune-packed or gc. (The same problem we have with git clone --shared.)
- I also tried using git-relink to synchronize diverging repositories. But git-relink sees different packs with the same name (because they were repacked differently) and refuses to hard-link the packs, yet it hard-links the .idx files, which also have identical names and had the same size in my experiments.
So the .idx no longer agrees with the .pack and git fsck fails. Of course we can recover the repository by generating a new idx, but git-relink is a dangerous tool to use here.
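For the recovery step mentioned in the last point, a minimal sketch of regenerating an index with git index-pack. The scratch repository, file name, and commit identity are placeholders, and deleting the .idx only simulates an index that no longer matches its pack:

```shell
#!/bin/sh
# Sketch: rebuild a pack index from the pack itself.
# Everything under $work is a throwaway example, not part of the thread.
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
echo "hello" > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com commit -q -m "initial"
git repack -q -a -d                    # put every object into a single pack

pack=$(ls "$work"/repo/.git/objects/pack/pack-*.pack)
idx=${pack%.pack}.idx
rm -f "$idx"                           # simulate a lost or mismatched index
git index-pack "$pack" > /dev/null     # writes a fresh .idx next to the pack
git fsck                               # the repository should check out again
```

Without -o, git index-pack derives the index name from the pack name, so the new .idx lands beside the pack.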
If we unpack all objects we have none of these problems, but main.git is then quite big. Of course it may be better to have one big repository than 20 copies of the packed repository.
But I suppose that there is some way to share an object repository safely, even if it includes some packs.
I am quite new to git core and internals, so I may have missed the point.
Marc
* Re: sharing object packs
From: Shawn O. Pearce @ 2008-06-19 9:01 UTC (permalink / raw)
To: marc.zonzon+git; +Cc: git
marc.zonzon+git@gmail.com wrote:
> I have a big bare repository 'main.git' and many small git repositories sub1, sub2, ... subn.
>
> All repositories live on the same file system, and each subx
> repository tracks and fetches main.git as a remote branch.
>
> I would like to avoid duplicating main.git objects
...
> - Using an objects/info/alternates file with the path of main.git's object store.
> It works well too, but I import objects from main.git into subx,
> and they don't have the same lifetime as those in main.git. So
> they can disappear during a git-prune-packed or gc. (The same
> problem we have with git clone --shared.)
This is the approach you want to use. The catch is that you must
not allow objects that were added to main.git to later be deleted from
main.git. This means main.git cannot rewind/reset/delete a branch.
If that is not acceptable perhaps you could instead create 3 tiers:
  main.git ---
              \
               shared.git
              /
  subx.git ---
Have main.git and subx.git both use shared.git as an alternate
(place path of shared.git/objects in their objects/info/alternates).
You can still allow subx.git to fetch main.git.
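As a rough sketch of that wiring (the scratch directory and the non-bare "subx" checkout are made up for illustration), each alternates file simply lists one object directory per line:

```shell
#!/bin/sh
# Sketch: point main.git and subx at shared.git's object store.
# All paths live under a throwaway directory created here.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare shared.git
git init -q --bare main.git
git init -q subx

# One absolute object-directory path per line.
echo "$work/shared.git/objects" >> main.git/objects/info/alternates
echo "$work/shared.git/objects" >> subx/.git/objects/info/alternates

# An object written only into shared.git is now readable from the others.
blob=$(echo "shared content" | git -C shared.git hash-object -w --stdin)
git -C main.git cat-file -e "$blob"
git -C subx cat-file -p "$blob"
```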
Push only stable commits to shared.git that will never be
rewound/reset/deleted. Once something enters shared.git it should
never be deleted. This way shared objects will not be removed by
git-prune or git-gc. Every so often push newer stable branches from
main.git to shared.git, once they can no longer be rewound/reset/deleted.
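One possible way to do that push, sketched under assumptions: "main" is a non-bare stand-in for main.git, the ref name "stable" is hypothetical, and the two receive.* settings are an extra safeguard not mentioned above that makes shared.git refuse rewinds and deletions arriving by push.

```shell
#!/bin/sh
# Sketch: publish a stable commit into shared.git and lock it in place.
# Repository names, the ref name, and the identity are placeholders.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare shared.git
git -C shared.git config receive.denyNonFastForwards true   # refuse rewinds
git -C shared.git config receive.denyDeletes true           # refuse ref deletion

git init -q main
cd main
echo "stable work" > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com commit -q -m "stable work"

# Push the current commit under a ref that is meant to live forever.
git push -q "$work/shared.git" HEAD:refs/heads/stable
```

These settings only guard against pushes; someone working directly inside shared.git could still delete a ref by hand.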
Repack main.git and subx.git using `git gc` as that includes the
-l flag to `git repack`. Any objects which are now available from
shared.git will not be included in main.git or subx.git, so their
usage will shrink after shared.git is updated.
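To watch the effect of -l in isolation, here is a small sketch that calls git repack directly instead of going through git gc. The repository "sub" stands in for a subx repository, and all names and the commit identity are placeholders:

```shell
#!/bin/sh
# Sketch: repack a repository while leaving out objects that the
# alternate (shared.git) already provides.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare shared.git
git init -q sub
cd sub
echo "history" > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com commit -q -m "initial"

git push -q "$work/shared.git" HEAD:refs/heads/stable
git -C "$work/shared.git" repack -q -a -d     # pack the shared copy
echo "$work/shared.git/objects" >> .git/objects/info/alternates

git count-objects -v      # before: sub still holds its own loose copies
git repack -q -a -d -l    # -l: skip objects borrowed from the alternate
git prune-packed          # drop loose objects that already exist in a pack
git count-objects -v      # after
git fsck
```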
If you also configure gc.packrefs to false in shared.git (so its refs
are never packed), symlink shared.git/refs into main.git/refs/shared and
also into subx.git/refs/shared, and do this configuration on both server
and client systems, you can have everyone transfer only the minimal
objects necessary.
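A sketch of that configuration on one machine, with made-up paths. The idea, as far as I can tell, is that refs showing up under refs/shared get advertised during fetch and push, so the other side can see that those objects are already present. Whether a given Git version follows a symlinked ref directory is not verified here.

```shell
#!/bin/sh
# Sketch: keep shared.git's refs loose and expose them in the other
# repositories through a symlink. Paths are placeholders.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare shared.git
git init -q --bare main.git
git init -q --bare subx.git

# Loose ref files stay visible through the symlink; packed refs would not.
git -C shared.git config gc.packrefs false

ln -s "$work/shared.git/refs" main.git/refs/shared
ln -s "$work/shared.git/refs" subx.git/refs/shared
```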
I use basically that arrangement at day-job to avoid 7 copies of
roughly 150 MB of shared history across 8 repositories. This
reduces the amount of data the OS needs to store in buffer cache
by nearly 1050 MB and thus makes things run rather quickly. (Yes,
many operations hit all 8 repositories in rather rapid succession,
it's a submodule sort of arrangement.)
--
Shawn.
* Re: sharing object packs
From: marc zonzon @ 2008-06-19 17:01 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git
Thank you, Shawn.
On Thu, Jun 19, 2008 at 11:01 AM, Shawn O. Pearce <spearce@spearce.org> wrote:
> marc.zonzon+git@gmail.com wrote:
>> I have a big bare repository 'main.git' and many small git repositories sub1, sub2, ... subn.
>>
>> All repositories live on the same file system, and each subx
>> repository tracks and fetches main.git as a remote branch.
>>
>> I would like to avoid duplicating main.git objects
> ...
>> - Using an objects/info/alternates file with the path of main.git's object store.
>> It works well too, but I import objects from main.git into subx,
>> and they don't have the same lifetime as those in main.git. So
>> they can disappear during a git-prune-packed or gc. (The same
>> problem we have with git clone --shared.)
>
> This is the approach you want to use. The catch is that you must
> not allow objects that were added to main.git to later be deleted from
> main.git. This means main.git cannot rewind/reset/delete a branch.
>
> If that is not acceptable perhaps you could instead create 3 tiers:
>
>   main.git ---
>               \
>                shared.git
>               /
>   subx.git ---
>
> Have main.git and subx.git both use shared.git as an alternate
> (place path of shared.git/objects in their objects/info/alternates).
> You can still allow subx.git to fetch main.git.
>
Your 3-tier solution seems to solve the problems I met when trying
to use main.git as an alternate.
But I feel we can make it even safer than what you describe:
> Push only stable commits to shared.git that will never be
> rewound/reset/deleted. Once something enters shared.git it should
> never be deleted. This way shared objects will not be removed by
> git-prune or git-gc. Every so often push newer stable branches from
> main.git to shared.git, once they can no longer be rewound/reset/deleted.
>
My option is to fetch into shared.git not only the branches of main, but
all the branches of all the subx repositories.
So shared.git hosts all the objects of main plus those of every subx.
Then there is no problem resetting, deleting, or rewinding a
branch in main, even if you fetch the reset branch into shared (a
non-fast-forward fetch). The objects of the deleted branch are either
not used by any subx repository, in which case nothing is lost when
they are pruned, or they have been imported into some branch and will be
kept, since there is a reference to them.
> Repack main.git and subx.git using `git gc` as that includes the
> -l flag to `git repack`. Any objects which are now available from
> shared.git will not be included in main.git or subx.git, so their
> usage will shrink after shared.git is updated.
>
Yes, I tested that. With git gc I saw no immediate shrinking; I suppose
we have to wait for gc.pruneExpire to see the result.
But:
* fetching the remotes of shared.git,
* packing shared.git,
* packing and pruning (with git prune) the subx repositories and
main.git
immediately reduces the object store of the subx repositories to nearly nothing.
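The sequence, sketched for a single repository. Here "sub1" stands in for any subx, and the remote name, file name, and commit identity are placeholders. shared.git fetches the branches of sub1 as remote-tracking refs, so every object stays referenced there:

```shell
#!/bin/sh
# Sketch: fetch into shared.git, pack it, then repack and prune a
# repository that borrows from it. Names are illustrative only.
set -e
work=$(mktemp -d)
cd "$work"
git init -q sub1
cd sub1
echo "work" > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com commit -q -m "initial"
cd "$work"

git init -q --bare shared.git
git -C shared.git remote add sub1 "$work/sub1"
git -C shared.git fetch -q sub1        # 1. fetch the remotes of shared.git
git -C shared.git repack -q -a -d      # 2. pack shared.git

echo "$work/shared.git/objects" >> sub1/.git/objects/info/alternates
git -C sub1 repack -q -a -d -l         # 3. pack sub1 without borrowed objects
git -C sub1 prune                      #    prune also removes loose objects found in packs
git -C sub1 count-objects -v
git -C sub1 fsck
```

With several subx repositories the same three steps would be repeated for each one after a single fetch and repack in shared.git.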
> If you also configure gc.packrefs to false in shared.git (so its refs
> are never packed), symlink shared.git/refs into main.git/refs/shared and
> also into subx.git/refs/shared, and do this configuration on both server
> and client systems, you can have everyone transfer only the minimal
> objects necessary.
Thank you also for this setting. My knowledge of git's transfer
mechanism is still too limited to understand it without further
explanation/reading. Any pointers are welcome.
This solution seems great for implementing some kind of submodules.
I suppose we could also use this 3-tier solution to do a smarter
clone --shared with the following scheme:
# mkdir shared
# cd shared
# git init
# cd ..
# git clone --no-hardlinks --bare shared shared.git
# rm -rf shared
# cd shared.git
# git remote add -f repo ../repo
* [new branch] master -> repo/master
# cd ..
# git clone repo repo_copy
# echo "$PWD/shared.git/objects" >> repo/.git/objects/info/alternates
# cd shared.git
# git remote add -f repo_copy ../repo_copy
* [new branch] master -> repo_copy/master
# cd ..
# echo "$PWD/shared.git/objects" >> repo_copy/.git/objects/info/alternates
then the sequence of repack, gc, prune outlined above.
But I do not yet have enough experience with git to foresee the
consequences of these settings.
All criticisms are welcome.
Marc