* Clone a repository with only the objects needed for a single tag
From: Ben Lau @ 2005-11-02  2:02 UTC
To: git

Hi,

   Is there any method to clone/copy a repository with only the
objects needed for a single tag in order to save disk space?

   For example, if I want to start a new project based on a
specific version of the kernel, like v2.6.14, I would run
`git-clone` and then check out a new branch based on the tag.
However, one of my development hosts is a notebook computer with
only 2GB of space left. I therefore hope the space occupied by the
repository can be minimized while it remains able to fetch/update
from my master repository.

   Thanks for any advice.

* Re: Clone a repository with only the objects needed for a single tag
From: Junio C Hamano @ 2005-11-02  7:01 UTC
To: Ben Lau; +Cc: git

Ben Lau <benlau@ust.hk> writes:

>    Is there any method to clone/copy a repository with only the
> objects needed for a single tag in order to save disk space?
>    For example, if I want to start a new project based on a
> specific version of the kernel, like v2.6.14, I would run
> `git-clone` and then check out a new branch based on the tag.

Depends on what you want to do in that shallow copy.

If the only thing you would want to do is to build it, then you
could 'git-tar-tree v2.6.14' and extract that on your notebook.
The output is just a tar so there will be no history, though.

If you want to develop while on the road, but do not
particularly need to be able to inspect the history beyond the
point you started, you could create a deliberately broken
repository, using the git-shallow-pack script (attached), like
this:

    $ git clone -n $mothership satellite
    $ cd satellite
    $ git-shallow-pack --all
    $ rm -f .git/objects/pack/pack-*
    $ mv pack-* .git/objects/pack/.

If the original repository you are cloning from is local, you
could instead do:

    $ git clone -l -s -n $mothership satellite
    $ cd satellite
    $ git-shallow-pack --all
    $ rm .git/objects/info/alternates
    $ mv pack-* .git/objects/pack/.

You could develop in this repository, even build up your own
commit chains, and when you come back you could push from this
repository back to your 'mothership' repository. In essence,
any operation that does not require you to have full history
should work.

One important thing that would not always work would be to pull
into this repository over git-aware protocols, although pulling
from your 'mothership' repository would probably work most of
the time.

One case that would probably break is if the mothership side
reverted a commit beyond this shallow-pack boundary and then you
try to pull from there. After the revert, the trees and blobs
in that new commit you will be pulling from the mothership are
likely to be the same as the ones contained in commits before
the shallow clone is made. Because your satellite repo would
claim to have everything that is reachable from the tip (as of
the time the clone was made) of the branch, you cannot complain
if the mothership side assumes you must have those blobs and
trees and did not send them to you when you pull.

---
#!/bin/sh
# git-shallow-pack

# Resolve the arguments (e.g. --all) to object names, one per line.
git-rev-parse --revs-only --no-flags --default HEAD "$@" |
while read sha1
do
    echo "$sha1"
    # Peel tag objects: emit each tag and the object it points at.
    while type=`git-cat-file -t "$sha1"` &&
          case "$type" in tag) ;; *) break ;; esac
    do
        next=`git-cat-file tag "$sha1" |
              sed -ne 's/^object //p' -e q`
        echo "$next"
        sha1="$next"
    done
    # Emit the tree, then its entries with mode and type stripped
    # so that each line starts with an object name.
    git-rev-parse --verify "$sha1^{tree}" 2>/dev/null &&
    git-ls-tree -r "$sha1" | sed -e 's/^[0-7]* [^ ]* //'
done |
# Keep one line per object name and feed the list to the packer.
sort -k 1,1 -u |
git-pack-objects pack

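The last stages of the git-shallow-pack script can be tried in isolation, without a repository. The following is only a sketch: the object names, the paths, and the helper `sample_ls_tree` are made up for illustration, and the input merely imitates the "<mode> <type> <sha1><TAB><path>" lines that `git-ls-tree -r` prints. It shows how the `sed` expression strips the mode and type, and how `sort -k 1,1 -u` then keeps a single line per object name.

```shell
#!/bin/sh
# Hypothetical stand-in for `git-ls-tree -r` output:
# each line is "<mode> <type> <sha1><TAB><path>".
sample_ls_tree() {
    printf '100644 blob %s\t%s\n' \
        1111111111111111111111111111111111111111 Makefile \
        2222222222222222222222222222222222222222 README \
        1111111111111111111111111111111111111111 Makefile.copy
}

# Same filters as in the script: drop mode and type, then keep
# one line per object name (the first whitespace-separated field).
object_list=$(sample_ls_tree |
    sed -e 's/^[0-7]* [^ ]* //' |
    sort -k 1,1 -u)

printf '%s\n' "$object_list"
```

Two of the three sample entries share an object name, so only two lines survive; which of the two paths is kept for the shared name may depend on the `sort` implementation.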
* Re: Clone a repository with only the objects needed for a single tag
From: Ben Lau @ 2005-11-02  8:27 UTC
To: Junio C Hamano; +Cc: git

Hi Junio,

   It works! Thanks a lot.

   However, it has a problem when gitk/git-log is involved. Because
the parent commit is missing from the shallow repository, gitk
complains that the object is missing and exits immediately. To solve
the problem, I added a pair of IDs of the new root object to
.git/info/grafts.

Example:

    $ cat .git/info/grafts
    741b2252a5e14d6c60a913c77a6099abe73a854a 741b2252a5e14d6c60a913c77a6099abe73a854a

   git-log/gitk do not complain afterward, but gitk then shows
nothing when run. Any solution?

   By the way, although I am not sure whether other people also need
this feature, I wish the process could be smoother. As the
git-shallow-pack script does not destroy or modify anything in the
cloned repository besides the newly created pack file, I would
suggest running the script in the 'mothership' repository and taking
the pack file to another directory or remote machine, then building
a new git repository based on the pack file.

   To do that, another script is needed that can create a git
repository from a pack file. Does any similar script exist?

* Re: Clone a repository with only the objects needed for a single tag
From: Andreas Ericsson @ 2005-11-02  8:49 UTC
To: git

Ben Lau wrote:
> Hi Junio,
>
>    It works! Thanks a lot.
>
>    However, it has a problem when gitk/git-log is involved.

Both those programs use the history, which you don't have. With a
shallow repository like this some commands just won't work. git-*log
and gitk are among those.

>
>    git-log/gitk do not complain afterward, but gitk then shows
> nothing when run. Any solution?
>

* Get a second machine with more disk space and run the history tools
  there. If you use it as mothership and push your commits to it you'll
  be able to track your changes from the laptop as well.

* Get a larger disk. They're not terribly expensive nowadays.

* Check out one tag (i.e. release) more than you need. Then you'll get
  history back to that tag.

--
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

* Re: Clone a repository with only the objects needed for a single tag
From: Ben Lau @ 2005-11-02  9:10 UTC
To: Andreas Ericsson; +Cc: git

Andreas Ericsson wrote:

> Ben Lau wrote:
>
>> Hi Junio,
>>
>>    It works! Thanks a lot.
>>
>>    However, it has a problem when gitk/git-log is involved.
>
> Both those programs use the history, which you don't have. With a
> shallow repository like this some commands just won't work. git-*log
> and gitk are among those.

Yes, that is expected. However, I think a fake parent like the one
.git/info/grafts can provide may solve the issue. At least it makes
git-log happy to show the only log message in the shallow repository
without any error.

The remaining problem is that gitk does not show any item the way
git-log does. I am not sure whether it is right to put a pair of the
same ID into .git/info/grafts, or whether it should be solved by a
small patch to gitk.

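On the grafts question: as far as the documented file format goes, each line of .git/info/grafts names a commit followed by the parents it should be treated as having, so a line holding only the commit ID records that commit as parentless, whereas repeating the same ID makes the commit its own parent. The sketch below only writes such a line. It assumes a scratch directory (`gitdir`, hypothetical) standing in for a real .git directory, and it makes no claim about how gitk of that era reacts to the result.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for a repository's .git.
gitdir=$(mktemp -d)
mkdir -p "$gitdir/info"

# The commit that should become the new root (ID quoted in the thread).
root=741b2252a5e14d6c60a913c77a6099abe73a854a

# Graft line format: "<commit> <parent> <parent>...".
# Listing only the commit, with no parents, records it as a root.
printf '%s\n' "$root" > "$gitdir/info/grafts"

cat "$gitdir/info/grafts"
```

In a real repository the same `printf` line would target .git/info/grafts directly.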
* Re: Clone a repository with only the objects needed for a single tag
From: Junio C Hamano @ 2005-11-02  9:20 UTC
To: Ben Lau; +Cc: git

Ben Lau <benlau@ust.hk> writes:

>    However, it has a problem when gitk/git-log is involved.

That's why I said anything that requires you to have a complete
history would not work.

The shallow repository is by definition a *broken* repository;
the refs are supposed to mean the repository has everything
reachable from them, and many tools rely on that assumption, but
the shallow setup deliberately breaks that assumption, so you
need to be aware of what operations you can and cannot do
without having the full history. Although using grafts to
cauterize somewhat helps as you discovered, you are operating in
"do it at your own risk" territory.

Having said that, here are the things that *should* work without
having full history (not many):

 . git diff between your index, working tree, and commits your
   shallow setup happens to have.

 . git commit on top of any of the tip of branches your shallow
   copy started out with, including git am and git applymbox.

 . git fetch/pull over commit walker from a remote repository.

 . git push to send the work done in the shallow repository back
   into your mothership repository (running git pull on the
   mothership to fetch from the shallow copy probably would not
   work, unless you use commit walkers).

 . git whatchanged, git log, and gitk to view the work you did
   in your shallow repository.

Creating packs in the mothership repository is certainly
possible. You could instead do this:

    $ git-shallow-pack --all ;# in mothership
    $ mkdir -p /var/tmp/shallow
    $ tar cf - .git/HEAD .git/refs/ |
      (cd /var/tmp/shallow; git-init-db; tar xf -)
    $ mv pack-* /var/tmp/shallow/.git/objects/pack

If you want a bit deeper history, instead of giving '--all' to
git-shallow-pack, you could probably say something silly like
this:

    $ (git-rev-parse --all; git-rev-list --max-count=20 HEAD) |
      xargs git-shallow-pack

The shallow-pack script I sent earlier is probably not very
useful in practice. To polish it to be somewhat more useful, it
would probably need the following enhancements, at least:

 . Instead of taking 'git-rev-parse' arguments, take the names
   of refs;

 . Instead of packing the objects contained in the commits and
   trees the named refs reference, include all
   blobs/trees/commits between the given refs and their common
   ancestor to create a pack;

 . Create a tarball that contains:

   (1) the pack file created by the above procedure, stored in
       .git/objects/pack/.;

   (2) .git/refs/ to be used in the shallow copy; this should
       include only the refs given to the command to create the
       above pack.

   (3) .git/info/grafts to cauterize the common ancestor commit
       and side branches merged into the lines you are taking
       (computing the latter may be somewhat expensive).

Then the command can be run in the mothership repository like
this:

    $ cd linux-2.6
    $ git-shallow-pack v2.6.14 v2.6.14-rc2 master

to produce a tarball, which can be taken to another location.
When extracted, it would contain all commits between the three
named refs, and you could view the history across them.

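The tar pipeline used to seed the new repository with HEAD and refs can be exercised on its own. This is a rough sketch under stated assumptions: the scratch directories under `work` are hypothetical stand-ins for the mothership and for /var/tmp/shallow, the ref content is the ID quoted earlier in the thread, and the `git-init-db` step is left out so that the example runs without git installed.

```shell
#!/bin/sh
# Hypothetical scratch directories standing in for the mothership
# repository and the new shallow repository.
work=$(mktemp -d)
mkdir -p "$work/mothership/.git/refs/heads" "$work/shallow"
echo 'ref: refs/heads/master' > "$work/mothership/.git/HEAD"
echo 741b2252a5e14d6c60a913c77a6099abe73a854a \
    > "$work/mothership/.git/refs/heads/master"

# Stream HEAD and refs/ through tar into the destination,
# preserving their paths relative to the repository root.
(cd "$work/mothership" && tar cf - .git/HEAD .git/refs/) |
(cd "$work/shallow" && tar xf -)

# List what arrived on the receiving side.
copied=$(cd "$work/shallow" && find .git -type f | sort)
printf '%s\n' "$copied"
```

Only the ref files travel this way; the objects themselves still have to arrive as the pack file moved into .git/objects/pack.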