* [QUESTION] about .git/info/grafts file
From: Franck @ 2006-01-17 17:32 UTC
To: Git Mailing List

Hi,

I'm wondering why the "grafts" file is not involved during
push/pull/clone operations?

Another question regarding a grafting use case. Let's say my
origin branch looks like:

origin ---0---1---<snip>---300,000---300,001---300,002

Let's say that the 300,000th commit is where I started my work by using:

$ git-checkout -b master <300,000 shaid>

I do some work on the master branch and get the following:

                                a---b---c---d master
                               /
origin ---0---1---...---300,000---300,001---300,002

Now, I would like to make my own public repository based on my work,
but before pushing the master branch to that repo I would like to get
rid of all the unused commits [0, 299,999]. None of these commits has
useful history for my work. So I used grafts to get:

               a---b---c---d master
              /
origin 300,000---300,001---300,002

But now if I ask git for:

$ git-merge-base master origin
# nothing

So git failed to find the common commit object, which should be 300,000. Why?

On the other hand, if I use grafting to get:

                         a---b---c---d master
                        /
origin 299,999---300,000---300,001---300,002

$ git-merge-base master origin
2dcaaf2decd31ac9a21d616604c0a7c1fa65d5a4

So now git found the common commit. Can anybody explain to me why?

Do you think it's a good usage of git? Or should I do otherwise to
set up my public repository?

Thanks
-- 
Franck
* Re: [QUESTION] about .git/info/grafts file
From: Franck @ 2006-01-18 17:47 UTC
To: Git Mailing List

Hi,

Could anybody shed some light there? It would be very nice.

Thanks
Franck
* Re: [QUESTION] about .git/info/grafts file
From: Junio C Hamano @ 2006-01-19 0:40 UTC
To: Franck; +Cc: Git Mailing List

Franck <vagabon.xyz@gmail.com> writes:

> I'm wondering why the "grafts" file is not involved during
> push/pull/clone operations?

Commit ancestry grafting is a local repository issue and even if
you manage to lie to your local git that 300,000th commit is the
epoch, the commit object you send out to the downloader would
record its true parent (or parents, if it is a merge), so the
downloader would want to go further back.  And no, rewriting
that commit and feeding a parentless commit to the downloader is
not an option, because such a commit object would have a different
object name and unpack-objects would be unhappy.

If you choose not to have full history in your public repository
for whatever reason (ISP server diskquota comes to mind) that is
OK, but be honest about it to your downloaders.  Tell them that
you do not have the full history, and they first need to clone
from some other repository you started your development upon, in
order to use what you added upon.  "This repository does not
have all the history -- please first clone from XX repository
(you need at least xxx commit), and then do another 'git pull'
from here", or something like that.

It _might_ work if you tell your downloader to have a proper
graft file in his repository to cauterize the commit ancestry
chain _before_ he pulls from you, though.  I haven't tried it
(and honestly I did not feel that is something important to
support, so it might work by accident but that is not by
design).

> $ git-merge-base master origin
> # nothing

Maybe you did not use grafts properly to cauterize?  I tried the
following and am getting expected results.  I did not have
patience to do 300,000, so I cut things at #4, though.

-- 8< --
#!/bin/sh

rm -fr .git
git init-db
echo 0 >path
git add path
for i in 1 2 3 4 5 6 7
do
	echo $i >path
	git commit -a -m "Iteration #$i"
	git tag "iter#$i"
done
git checkout -b mine iter#4
for i in A B C D
do
	echo $i >path
	git commit -a -m "Alternate #$i"
	git tag "alt#$i"
done
git log --pretty=oneline --topo-order
echo merge base is `git merge-base master mine` | git name-rev --stdin

git-rev-parse iter#4 >.git/info/grafts
echo "Cauterize away history before #4"
git log --pretty=oneline --topo-order
echo merge base is `git merge-base master mine` | git name-rev --stdin
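Junio's point that a rewritten, parentless copy of the 300,000th commit would get a different object name follows from how git names objects: the name is the SHA-1 of a short header (`<type> <size>` plus a NUL byte) followed by the raw object content. The sketch below computes such names without git itself. It assumes a POSIX shell with `sha1sum` (GNU coreutils; some systems call it `shasum`), and the commit text, author and parent name are made up for illustration; only the empty-tree name is real.

```shell
#!/bin/sh
# Sketch: compute git-style object names without using git.
# Assumes sha1sum (GNU coreutils) is available.

# object_name TYPE FILE: SHA-1 over "TYPE SIZE\0" followed by FILE's bytes.
object_name() {
	size=$(wc -c <"$2" | tr -d ' ')
	{ printf '%s %s\000' "$1" "$size"; cat "$2"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A hypothetical commit object: the empty tree, a made-up parent name.
cat >"$tmp/with-parent" <<'EOF'
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent 1111111111111111111111111111111111111111
author A U Thor <author@example.com> 1137519120 +0000
committer A U Thor <author@example.com> 1137519120 +0000

Iteration #4
EOF
# The same commit with its parent line dropped, i.e. "rewritten as a root".
grep -v '^parent ' "$tmp/with-parent" >"$tmp/parentless"

echo "with parent: $(object_name commit "$tmp/with-parent")"
echo "parentless:  $(object_name commit "$tmp/parentless")"
```

Because the parent line is part of the hashed content, the two names differ, so the downloader cannot accept the rewritten commit as the one its other references point to.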
* Re: [QUESTION] about .git/info/grafts file
From: Franck @ 2006-01-19 10:51 UTC
To: Junio C Hamano; +Cc: Git Mailing List

Thanks Junio for answering

2006/1/19, Junio C Hamano <junkio@cox.net>:
> Franck <vagabon.xyz@gmail.com> writes:
>
> > I'm wondering why the "grafts" file is not involved during
> > push/pull/clone operations?
>
> Commit ancestry grafting is a local repository issue and even if
> you manage to lie to your local git that 300,000th commit is the
> epoch, the commit object you send out to the downloader would
> record its true parent (or parents, if it is a merge), so the
> downloader would want to go further back.  And no, rewriting
> that commit and feeding a parentless commit to the downloader is
> not an option, because such a commit object would have a different
> object name and unpack-objects would be unhappy.
>
> If you choose not to have full history in your public repository
> for whatever reason (ISP server diskquota comes to mind)

Well, dealing with a repo that has more than 300,000 objects becomes a
burden. A lot of git commands are slow, and cloning it takes a while!

> that is
> OK, but be honest about it to your downloaders.  Tell them that
> you do not have the full history, and they first need to clone
> from some other repository you started your development upon, in
> order to use what you added upon.  "This repository does not
> have all the history -- please first clone from XX repository
> (you need at least xxx commit), and then do another 'git pull'
> from here", or something like that.
>

I'm not trying to hide anything from my downloaders or lie to them. I
just want them to avoid dealing with totally pointless history. My work
was started recently and is based on the current XX repository. IMHO
storing and dealing with objects which are more than 10 years old is
useless. I don't see why it is so bad to create a "grafted" repository?
I want it to be small but still want to merge with the XX repository
by using git-resolve.

> > $ git-merge-base master origin
> > # nothing
>
> Maybe you did not use grafts properly to cauterize?

Well, in my graft file I did:

$ cat > .git/info/grafts
<shaid> <shaid>

$

By reading "Documentation/repository-layout.txt", I thought it would
have been the right thing to do. If I do the same as you did, i.e.:

$ cat > .git/info/grafts
<shaid>
$

it works.

Thanks
-- 
Franck
* Re: [QUESTION] about .git/info/grafts file
From: Petr Baudis @ 2006-01-19 13:09 UTC
To: Franck; +Cc: Junio C Hamano, Git Mailing List

Dear diary, on Thu, Jan 19, 2006 at 11:51:22AM CET, I got a letter
where Franck <vagabon.xyz@gmail.com> said that...
> Well, dealing with a repo that has more than 300,000 objects becomes a
> burden. A lot of git commands are slow, and cloning it takes a while!

Were the objects packed? It would be interesting to have some data about
how GIT performs with that many objects...

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams
* Re: [QUESTION] about .git/info/grafts file
From: Linus Torvalds @ 2006-01-19 16:58 UTC
To: Petr Baudis; +Cc: Franck, Junio C Hamano, Git Mailing List

On Thu, 19 Jan 2006, Petr Baudis wrote:
>
> Dear diary, on Thu, Jan 19, 2006 at 11:51:22AM CET, I got a letter
> where Franck <vagabon.xyz@gmail.com> said that...
> > Well, dealing with a repo that has more than 300,000 objects becomes a
> > burden. A lot of git commands are slow, and cloning it takes a while!
>
> Were the objects packed? It would be interesting to have some data about
> how GIT performs with that many objects...

The historical linux archive has a lot more than 300,000 objects. In fact,
even the _current_ kernel archive has almost 200,000 objects.

Maybe somebody was thinking "commits", not "objects". Something with
300,000 commits is indeed a pretty big project.

Anyway, from a scalability standpoint, git should have no problem at all
with tons of objects, as long as you pack the old history. There are a few
things that get slower:

 - if you end up doing things that look at history, they are obviously at
   least linear in history size. Often there are other downsides too
   (using lots of memory).

   Example: try even just a simple "gitk" on the (regular, new) kernel
   archive, and it will take a while before the whole thing has been done.
   Of course, you'll see the top entries interactively, so mostly you
   won't care, but I routinely limit it some way just to make it not make
   the CPU fans come on. So I do something like

	gitk --since=1.week.ago
	gitk v2.6.15..

   instead of plain gitk, just because it makes operations cheaper.

 - a full clone takes a long time. Git _could_ fairly easily have an
   extension to add a date specifier to clone too:

	git clone --since=1.month.ago <source> <dst>

   and just leave any older stuff (you could always fetch it later), but
   we've just never done it. Maybe we should. It _should_ be pretty simple
   to do from a conceptual standpoint.

but "everyday" operations shouldn't slow down from having a long history.
I can still apply 4-5 patches a second to the kernel archive, for example,
as you can see from

	git log --pretty=fuller | grep CommitDate | less -S

and looking for one of the patch series I've applied from Andrew..

		Linus
* Re: [QUESTION] about .git/info/grafts file
From: Petr Baudis @ 2006-01-19 17:30 UTC
To: Linus Torvalds; +Cc: Franck, Junio C Hamano, Git Mailing List

Dear diary, on Thu, Jan 19, 2006 at 05:58:09PM CET, I got a letter
where Linus Torvalds <torvalds@osdl.org> said that...
> On Thu, 19 Jan 2006, Petr Baudis wrote:
> >
> > Dear diary, on Thu, Jan 19, 2006 at 11:51:22AM CET, I got a letter
> > where Franck <vagabon.xyz@gmail.com> said that...
> > > Well, dealing with a repo that has more than 300,000 objects becomes a
> > > burden. A lot of git commands are slow, and cloning it takes a while!
> >
> > Were the objects packed? It would be interesting to have some data about
> > how GIT performs with that many objects...
>
> The historical linux archive has a lot more than 300,000 objects. In fact,
> even the _current_ kernel archive has almost 200,000 objects.

Eek. I was burnt by git-count-objects' misleading name. I guess

	git-rev-list --objects --all | wc -l

should give accurate results - 145941 for the kernel repository back from
December. I will follow up later with a patch for git-count-objects.

>  - a full clone takes a long time. Git _could_ fairly easily have an
>    extension to add a date specifier to clone too:
>
> 	git clone --since=1.month.ago <source> <dst>
>
>    and just leave any older stuff (you could always fetch it later), but
>    we've just never done it. Maybe we should. It _should_ be pretty simple
>    to do from a conceptual standpoint.

Yes. I receive wishes for this from time to time and it is buried
somewhere deep in my TODO list. I'm not sure how happy the GIT tools will
be about invalid parent references.

-- 
				Petr "Pasky" Baudis
* Re: [QUESTION] about .git/info/grafts file
From: Franck @ 2006-01-19 17:33 UTC
To: Linus Torvalds; +Cc: Petr Baudis, Junio C Hamano, Git Mailing List

2006/1/19, Linus Torvalds <torvalds@osdl.org>:
>  - a full clone takes a long time. Git _could_ fairly easily have an
>    extension to add a date specifier to clone too:
>
> 	git clone --since=1.month.ago <source> <dst>
>
>    and just leave any older stuff (you could always fetch it later), but
>    we've just never done it. Maybe we should. It _should_ be pretty simple
>    to do from a conceptual standpoint.
>

That would be great! Something like:

	git clone --since=v2.6.15 <src> <dst>

would be very useful for me. How would it work? Does it automatically
set up a graft file for me?

> but "everyday" operations shouldn't slow down from having a long history.

But it's really a pain to run, for example, the git-repack or git-prune
commands.

Thanks
-- 
Franck
* Re: [QUESTION] about .git/info/grafts file
From: Linus Torvalds @ 2006-01-19 17:49 UTC
To: Franck; +Cc: Petr Baudis, Junio C Hamano, Git Mailing List

On Thu, 19 Jan 2006, Franck wrote:
>
> That would be great! Something like:
>
> 	git clone --since=v2.6.15 <src> <dst>
>
> would be very useful for me. How would it work? Does it automatically
> set up a graft file for me?

I think we'd have to set up the grafts file, yes.

However, it's actually less of an advantage than you'd think: especially
for long development histories, the incremental packing is very very
efficient. In contrast, if you only get recent versions, there's nothing
to be incremental against, so the size of the pack will not be that much
smaller.

So getting just a tenth of the development history will _not_ cause the
pack to be just a tenth in size. It's probably closer to half the size of
the full history.

Anyway, it's _conceptually_ something that git wouldn't have any problems
with, but that doesn't mean that it's totally trivial either. The easiest
way to do it (by far) would be to expand the native git protocol with a
"get all objects of this one version" or something like that, and then
you'd just do a "pull and mark all unknown commits in the grafts file".

So in effect, instead of getting the whole history pack, you'd get a pack
that contains _one_ version (no history at all), and then (if you want
to) you can get a pack that gets all stuff that isn't reachable from that
one (ie "newer").

That would have the advantage that it's quite possible that many users
might want to do just

	git clone --only=v2.6.15 <source> <target>

which would do that "one single version" variant of the clone. Then,
later on, you could just do

	git pull --graft-unknown <source> <target>

to update the history.

Anybody want to try that? It would be a new command to "git-daemon"
(instead of "git-upload-pack", you'd do a new "git-upload-version"
command internally: it would look a lot like upload-pack, and use the
same unpacking protocol).

> But it's really a pain to run, for example, the git-repack or git-prune
> commands.

Well, you really don't need to do that very often.

		Linus
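Linus's "pull and mark all unknown commits in the grafts file" step can be sketched in a few lines of awk. This is purely illustrative: `cauterize` is a hypothetical helper, not a git command, and its two inputs are assumptions: one file listing the commits present locally, and one file of `commit parent1 parent2 ...` lines (the shape `git rev-list --parents` prints). For every commit with at least one missing parent it prints a grafts entry that keeps only the parents actually present, so a commit whose parents are all missing becomes a root.

```shell
#!/bin/sh
# Hypothetical helper, not part of git.
# cauterize HAVE_FILE PARENTS_FILE
#   HAVE_FILE:    one commit name per line, for commits present locally
#   PARENTS_FILE: lines of "commit parent1 parent2 ..."
# Prints one grafts line per commit that has any parent we lack,
# listing only the parents that are present.
cauterize() {
	awk '
	FILENAME == ARGV[1] { have[$1] = 1; next }
	{
		line = $1
		cut = 0
		for (i = 2; i <= NF; i++) {
			if ($i in have)
				line = line " " $i
			else
				cut = 1
		}
		if (cut)
			print line
	}' "$1" "$2"
}
```

For example, if the local commits are C, B, A and M, and the parent lines are `M C Y`, `C B`, `B A` and `A Z`, the output is `M C` (the merge keeps its known parent) followed by `A` (which becomes a root). Appending that output to `.git/info/grafts` would be the "mark" step.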
* Re: [QUESTION] about .git/info/grafts file
From: Junio C Hamano @ 2006-01-19 18:24 UTC
To: Linus Torvalds; +Cc: Petr Baudis, Franck, Git Mailing List

Linus Torvalds <torvalds@osdl.org> writes:

>  - a full clone takes a long time. Git _could_ fairly easily have an
>    extension to add a date specifier to clone too:
>
> 	git clone --since=1.month.ago <source> <dst>
>
>    and just leave any older stuff (you could always fetch it later), but
>    we've just never done it. Maybe we should. It _should_ be pretty simple
>    to do from a conceptual standpoint.

True, except for some implementation details you forgot to mention in
your other message, where you talked about upload-version.

Both commit walkers and git native transfer fundamentally operate by
trusting that our current refs are complete, which makes the "could
always fetch it later" part a bit involved.

Fortunately it would not be rocket science.  We would need to
have a mode "do not trust our current refs are complete" with an
explicit command line option, or automatically fall back to that
mode when seeing the $GIT_DIR/info/grafts file has changed, and
revalidate the commit ancestry chain we have in a repository
cloned that way.
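Falling back to a "do not trust our current refs" mode whenever `$GIT_DIR/info/grafts` changes first needs a way to notice the change. A minimal sketch, assuming a hypothetical stamp file `info/grafts.stamp` that git itself does not keep: compare the `cksum` of the grafts file with the value recorded at the previous check, treating a missing grafts file as "none".

```shell
#!/bin/sh
# Hypothetical check, not part of git.
# grafts_changed GIT_DIR: exit 0 if GIT_DIR/info/grafts differs from what
# was recorded at the previous call (a missing file counts as "none"),
# then record the current state for next time.
grafts_changed() {
	grafts=$1/info/grafts
	stamp=$1/info/grafts.stamp    # hypothetical bookkeeping file
	if [ -f "$grafts" ]; then
		new=$(cksum <"$grafts")
	else
		new=none
	fi
	old=$(cat "$stamp" 2>/dev/null || echo none)
	mkdir -p "$1/info"
	printf '%s\n' "$new" >"$stamp"
	[ "$new" != "$old" ]
}
```

A fetch wrapper could call `grafts_changed "${GIT_DIR:-.git}"` before fetching and, when it succeeds, revalidate the ancestry reachable from the local refs instead of assuming they are complete.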
* Re: [QUESTION] about .git/info/grafts file
From: Junio C Hamano @ 2006-01-19 18:24 UTC
To: Franck; +Cc: Git Mailing List

Franck <vagabon.xyz@gmail.com> writes:

> I don't see why it is so bad to create a "grafted" repository?
> I want it to be small but still want to merge with the XX repository
> by using git-resolve.

Franck, and people on the list,

I have a bad habit of responding to a "call for help" request by
stating how things are currently done and why, sometimes with an
outline of how the limitation in the current way can be (or at
least I think it could be, without testing that solution myself)
worked around, but without making it explicit if the limitation
is something that should not be there or if it is something
fundamental.  This often makes it sound as if I am saying I
think the original request is unreasonable, and/or the current
state of affairs is perfect.

This is one such case.  I agree it would be nice to support
"strictly speaking, the repository is incomplete but has
everything necessary as long as you operate near the tip of the
development" mode of operation.  It only has never been a high
priority.

> Well, in my graft file I did:
>
> $ cat > .git/info/grafts
> <shaid> <shaid>
>
> $

The trailing empty line at the end is discarded as a comment, I
think, so that should be fine.  "terminated by a newline" in the
documentation talks about each line being terminated by a LF,
not about terminating the file itself with an extra newline.

I think you spotted a bug in the documentation and another in
the code.

I presume these two <shaid> are the same in what you did; you
are saying "this commit has itself as its parent", but that can
never be the case and the graft parser should reject such a line
and complain, but I do not think the current code does so.

The documentation says "a commit and its fake parents ...
separated by a space and terminated by a newline".  We should at
least say "zero or more fake parents", or make it even clearer
by giving a couple of examples.
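The entry format discussed here (a commit followed by zero or more fake parents, one entry per line) and the self-parent case that the parser should reject can be checked mechanically. A sketch, assuming SHA-1 names of 40 lowercase hex digits, fields split on whitespace, and blank lines and lines starting with `#` treated as comments; `check_graft_line` and `check_grafts_file` are hypothetical names, not git commands.

```shell
#!/bin/sh
# Hypothetical validator for .git/info/grafts entries.

# check_graft_line LINE: succeed if LINE is "<commit> [<parent>...]" with
# 40-hex names and the commit not listed as one of its own parents.
check_graft_line() {
	set -f            # no pathname expansion while splitting LINE
	set -- $1         # split into fields on whitespace
	set +f
	[ $# -ge 1 ] || return 1
	commit=$1
	for name in "$@"; do
		case $name in
		*[!0-9a-f]*) return 1 ;;
		esac
		[ ${#name} -eq 40 ] || return 1
	done
	shift
	for parent in "$@"; do
		[ "$parent" != "$commit" ] || return 1
	done
	return 0
}

# check_grafts_file FILE: report every bad entry; skip blank and '#' lines.
check_grafts_file() {
	n=0
	status=0
	while IFS= read -r line || [ -n "$line" ]; do
		n=$((n + 1))
		case $line in
		''|'#'*) continue ;;
		esac
		if ! check_graft_line "$line"; then
			echo "$1:$n: bad graft entry: $line" >&2
			status=1
		fi
	done <"$1"
	return $status
}
```

With this, a file containing a single commit name passes (the commit becomes a root, as in Junio's script), while `<shaid> <shaid>` with the same name twice is reported with its line number.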
* Re: [QUESTION] about .git/info/grafts file
From: Franck @ 2006-01-20 13:43 UTC
To: Junio C Hamano; +Cc: Git Mailing List

2006/1/19, Junio C Hamano <junkio@cox.net>:
> Franck <vagabon.xyz@gmail.com> writes:
>
> > I don't see why it is so bad to create a "grafted" repository?
> > I want it to be small but still want to merge with the XX repository
> > by using git-resolve.
>
> Franck, and people on the list,
>
> I have a bad habit of responding to a "call for help" request by
> stating how things are currently done and why, sometimes with an

What? Hey, I would say that you, Linus and other people on the list
have a GREAT habit of spending time answering others about how things
work. And they are usually accurate explanations, with examples and a
lot of details.

Thanks!
-- 
Franck
* Re: [QUESTION] about .git/info/grafts file
From: Andreas Ericsson @ 2006-01-19 11:10 UTC
To: Junio C Hamano; +Cc: Franck, Git Mailing List

Junio C Hamano wrote:
> Franck <vagabon.xyz@gmail.com> writes:
>
>
>>I'm wondering why the "grafts" file is not involved during
>>push/pull/clone operations?
>
>
> Commit ancestry grafting is a local repository issue and even if
> you manage to lie to your local git that 300,000th commit is the
> epoch, the commit object you send out to the downloader would
> record its true parent (or parents, if it is a merge), so the
> downloader would want to go further back.  And no, rewriting
> that commit and feeding a parentless commit to the downloader is
> not an option, because such a commit object would have a different
> object name and unpack-objects would be unhappy.
>

I'm a bit curious about how this was done for the public kernel repo.
I'd like to import glibc to git, but keeping history since 1972 seems a
bloody waste, really.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
* Re: [QUESTION] about .git/info/grafts file
From: Petr Baudis @ 2006-01-19 13:05 UTC
To: Andreas Ericsson; +Cc: Junio C Hamano, Franck, Git Mailing List

Dear diary, on Thu, Jan 19, 2006 at 12:10:23PM CET, I got a letter
where Andreas Ericsson <ae@op5.se> said that...
> I'm a bit curious about how this was done for the public kernel repo.
> I'd like to import glibc to git, but keeping history since 1972 seems a
> bloody waste, really.

FWIW, with the ELinks GIT repository we just started from scratch and
then converted the old CVS repository, and provided this script in
contrib/grafthistory.sh:

#!/bin/sh
#
# Graft the ELinks development history to the current tree.
#
# Note that this will download about 80M.

if [ -z "`which wget 2>/dev/null`" ]; then
	echo "Error: You need to have wget installed so that I can fetch the history." >&2
	exit 1
fi

[ "$GIT_DIR" ] || GIT_DIR=.git
if ! [ -d "$GIT_DIR" ]; then
	echo "Error: You must run this from the project root (or set GIT_DIR to your .git directory)." >&2
	exit 1
fi
cd "$GIT_DIR"

echo "[grafthistory] Downloading the history"
mkdir -p objects/pack
cd objects/pack
wget -c http://elinks.cz/elinks-history.git/objects/pack/pack-0d6c5c67aab3b9d5d9b245da5929c15d79124a48.idx
wget -c http://elinks.cz/elinks-history.git/objects/pack/pack-0d6c5c67aab3b9d5d9b245da5929c15d79124a48.pack

echo "[grafthistory] Setting up the grafts"
cd ../..
mkdir -p info
# master
echo 0f6d4310ad37550be3323fab80456e4953698bf0 06135dc2b8bb7ed2e441305bdaa82048396de633 >>info/grafts
# REL_0_10
echo 43a9a406737fd22a8558c47c74b4ad04d4c92a2b 730242dcf2cdeed13eae7e8b0c5f47bb03326792 >>info/grafts

echo "[grafthistory] Refreshing the dumb server info wrt. new packs"
cd ..
git-update-server-info

So you check out the ELinks repository, and if you want the full history
you just run this script and it does everything for you.

-- 
				Petr "Pasky" Baudis
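Pasky's script appends its two graft entries with `>>`, so running it a second time writes the same lines again. A small guard, sketched here as a hypothetical `add_graft` helper that is not part of the ELinks script, appends an entry only when an identical line is not already present:

```shell
#!/bin/sh
# Hypothetical helper: append a grafts entry unless it is already there.
# add_graft GRAFTS_FILE "COMMIT [PARENT...]"
add_graft() {
	mkdir -p "$(dirname "$1")"
	[ -f "$1" ] || : >"$1"
	# -x: whole-line match, -F: fixed string, -q: no output
	grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >>"$1"
}
```

In the script, `add_graft info/grafts "0f6d4310ad37550be3323fab80456e4953698bf0 06135dc2b8bb7ed2e441305bdaa82048396de633"` could replace the corresponding `echo ... >>info/grafts` line, and likewise for the REL_0_10 entry.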
* Re: [QUESTION] about .git/info/grafts file
From: Franck @ 2006-01-19 13:31 UTC
To: Andreas Ericsson; +Cc: Junio C Hamano, Git Mailing List

2006/1/19, Andreas Ericsson <ae@op5.se>:
> I'm a bit curious about how this was done for the public kernel repo.
> I'd like to import glibc to git, but keeping history since 1972 seems a
> bloody waste, really.
>

That's exactly my point. Furthermore, making your downloaders import that
useless history spreads this waste.

I guess the kernel repo will encounter this problem in the short term.
It's getting bigger and bigger and developers may get bored dealing with
so many useless objects. But I'm not saying that it's a bad thing to keep
that history. It just would be nice to allow developers who don't care
about old history to get rid of it.

Thanks
-- 
Franck
* Re: [QUESTION] about .git/info/grafts file
From: Andreas Ericsson @ 2006-01-19 13:44 UTC
To: Git Mailing List

Franck wrote:
> 2006/1/19, Andreas Ericsson <ae@op5.se>:
>>
>>I'm a bit curious about how this was done for the public kernel repo.
>>I'd like to import glibc to git, but keeping history since 1972 seems a
>>bloody waste, really.
>>
>
>
> That's exactly my point. Furthermore, making your downloaders import that
> useless history spreads this waste.
>
> I guess the kernel repo will encounter this problem in the short term.
> It's getting bigger and bigger and developers may get bored dealing with
> so many useless objects.

Ach, no. The current kernel repo only has history since April 17 (around
155 MB of objects, with less than optimal packing), when it started
using git for versioning. The kernel repo also sees a lot of very rapid
development.

The full kernel tree, with history since 1991 or some such, is about 3.2
GB. It was for this reason that the early history was dropped. I don't
think another drop will be necessary any time soon, since incremental
updates are fairly cheap over git and git+ssh. Only gitk suffers, but
that's just for a short while.

> But I'm not saying that it's a bad thing to keep
> that history. It just would be nice to allow developers who don't care
> about old history to get rid of it.
>

You could of course create a new repository with the files from the
version you want, but then you'd have a hard time merging the two repos
if you ever want to import the old history.

Linus, is this what you did with the public kernel repo?

-- 
Andreas Ericsson
* Re: [QUESTION] about .git/info/grafts file
From: Petr Baudis @ 2006-01-19 17:45 UTC
To: Andreas Ericsson, torvalds; +Cc: Git Mailing List

Dear diary, on Thu, Jan 19, 2006 at 02:44:15PM CET, I got a letter
where Andreas Ericsson <ae@op5.se> said that...
> Ach, no. The current kernel repo only has history since April 17 (around
> 155 MB of objects, with less than optimal packing), when it started
> using git for versioning. The kernel repo also sees a lot of very rapid
> development.
>
> The full kernel tree, with history since 1991 or some such, is about 3.2
> GB.

There is some "accurate" history only from the moment the kernel got
tracked in BK, and it is certainly far less.

The question is, what is the "official" kernel history repository? There
is at least

	http://www.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

with a 251M pack and

	http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/old-2.6-bkcvs.git

with a 165M pack - IIRC the latter is obsoleted by the former and perhaps
should be blasted to prevent confusion?

Getting a little offtopic here... Linus, would it be deemed useful to
have the script I've pasted in <20060119130519.GB28365@pasky.or.cz>
(earlier in this thread) in the kernel's scripts/ directory, pointing at
the canonical history repository?

-- 
				Petr "Pasky" Baudis
* Re: [QUESTION] about .git/info/grafts file
From: Ryan Anderson @ 2006-01-20 20:48 UTC
To: Andreas Ericsson; +Cc: Git Mailing List

On Thu, Jan 19, 2006 at 02:44:15PM +0100, Andreas Ericsson wrote:
>
> The full kernel tree, with history since 1991 or some such, is about 3.2
> GB. It was for this reason that the early history was dropped. I don't
> think another drop will be necessary any time soon, since incremental
> updates are fairly cheap over git and git+ssh. Only gitk suffers, but
> that's just for a short while.

Just to make sure this is corrected: the 3.2GB was for a fully unpacked
tree, which is still fairly bad in the current tree. The historical
tree, packed, runs about 266M in a single pack.

Admittedly, I still refuse to try to run gitk on it.

> >But I'm not saying that it's a bad thing to keep
> >that history. It would just be nice to allow developers who don't
> >care about old history to get rid of it.
>
> You could of course create a new repository with the files from the
> version you want, but then you'd have a hard time merging the two repos
> if you ever want to import the old history.

It's always possible to use a "graft" to tie the history together, and
if you really need to merge changes across the boundary, my graft-ripple
tool (in the archives) can make it happen, though it does some ...
nasty things to the history tree in the process. (It might be useful
for doing the merge on a throwaway tree, from which a set of diffs
could then be taken and applied back onto an un-messy tree.)

-- 
Ryan Anderson
  sometimes Pug Majere
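Ryan's point that a graft can tie a truncated history back onto the old one can be illustrated with a small throwaway setup. This is a hedged sketch, not Ryan's graft-ripple tool: it assumes a git that still honours `.git/info/grafts` (recent versions do, although they print a hint that the grafts file is deprecated in favour of `git replace`), and the repositories under a temporary directory are hypothetical. A graft line lists a commit followed by the parents git should pretend it has, so `<new root> <old tip>` splices the two histories together.

```shell
#!/bin/sh
# Sketch (hypothetical throwaway repos): splice a truncated history
# back onto its old history with one graft line.
set -e

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
git init -q "$top/old"
git init -q "$top/new"

# Old history: three commits.
for i in 1 2 3; do
    git -C "$top/old" commit -q --allow-empty -m "old $i"
done

# New history: two commits; its root has no parent at all.
git -C "$top/new" commit -q --allow-empty -m "new root"
new_root=$(git -C "$top/new" rev-parse HEAD)
git -C "$top/new" commit -q --allow-empty -m "new 2"

cd "$top/new"
# Copy the old objects in; the fetched tip is recorded in FETCH_HEAD.
git fetch -q "$top/old" HEAD
old_tip=$(git rev-parse FETCH_HEAD)

echo "before: $(git rev-list --count HEAD) commits," \
     "merge-base: $(git merge-base HEAD "$old_tip" || echo none)"

# Graft format: "<commit> <parent>...". Pretend the new root's
# parent is the old tip.
mkdir -p .git/info
echo "$new_root $old_tip" >.git/info/grafts

echo "after:  $(git rev-list --count HEAD) commits," \
     "merge-base: $(git merge-base HEAD "$old_tip")"
```

Before the graft, `git merge-base` finds no common commit (the same symptom Franck hit); afterwards the walk from the new tip continues into the old history, so the old tip becomes the merge base.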
* Re: [QUESTION] about .git/info/grafts file
From: Junio C Hamano @ 2006-01-20 1:14 UTC
To: Franck; +Cc: Git Mailing List

Junio C Hamano <junkio@cox.net> writes:

> It _might_ work if you tell your downloader to have a proper
> graft file in his repository to cauterize the commit ancestry
> chain _before_ he pulls from you, though.  I haven't tried it
> (and honestly I did not feel that is something important to
> support, so it might work by accident but that is not by
> design).

I just tried it and it actually works.

$ git clone git.git junk
$ cd junk ;# I am not brave enough to risk the real thing ;-)
$ git rev-parse master~4 >.git/refs/info/grafts
$ cd ..
$ mkdir cloned
$ cd cloned
$ git init-db
$ cp ../junk/.git/info/grafts .git/info/
$ git clone-pack ../baz
$ git fsck-objects --full
$ git log --pretty=short | cat

This "only the tip of the git.git" repository has about 450
objects in it, fully packed because of clone-pack, with one 680K
packfile.  I think the true full history of git.git/ packed into
one is around a 5MB packfile.  I suspect a bigger repository
would not see that much size reduction, as Linus already
explained here.

You could emulate what I just did above to prepare the
equivalent of "baz" above, and make it available over git://
protocol, say at git://franck.example.com/franck.git/.  Then you
tell your downloaders something like this:

    This repository has been cauterized, and cannot be cloned
    in the usual manner, but once you make a clone everything,
    including further incremental updates, should work.

    To clone this repository:

        $ mkdir franckproject ;# make a new repository
        $ cd franckproject && git init-db
        $ echo 'XXxxxxXXxxx' >.git/info/grafts
        $ git clone-pack git://franck.example.com/franck.git/
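The effect of the parentless graft line that Junio writes with `git rev-parse master~4` can be checked locally without any cloning. The following is a minimal sketch assuming a modern git on PATH that still reads `.git/info/grafts`; it builds a hypothetical ten-commit throwaway repository and cauterizes it at `HEAD~4`.

```shell
#!/bin/sh
# Sketch (hypothetical throwaway repo): a graft line holding only a
# commit id makes that commit look like a root commit.
set -e

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)/cauterize-demo
git init -q "$repo"
cd "$repo"

# Linear history of 10 empty commits.
i=1
while [ "$i" -le 10 ]; do
    git commit -q --allow-empty -m "commit $i"
    i=$((i + 1))
done
echo "before graft: $(git rev-list --count HEAD) commits"

# Cauterize at HEAD~4, like 'git rev-parse master~4' in the thread:
# HEAD, HEAD~1 .. HEAD~4 remain reachable, i.e. 5 commits.
mkdir -p .git/info
git rev-parse HEAD~4 >.git/info/grafts
echo "after graft:  $(git rev-list --count HEAD) commits"
```

Running it should report 10 commits before the graft and 5 after: everything behind the grafted commit simply drops out of history walks, which is what lets a cauterized repository be packed and served without its old objects.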
* Re: [QUESTION] about .git/info/grafts file
From: Franck @ 2006-01-20 10:07 UTC
To: Junio C Hamano; +Cc: Git Mailing List

2006/1/20, Junio C Hamano <junkio@cox.net>:
> Junio C Hamano <junkio@cox.net> writes:
>
> > It _might_ work if you tell your downloader to have a proper
> > graft file in his repository to cauterize the commit ancestry
> > chain _before_ he pulls from you, though.  I haven't tried it
> > (and honestly I did not feel that is something important to
> > support, so it might work by accident but that is not by
> > design).
>
> I just tried it and it actually works.
>
> $ git clone git.git junk
> $ cd junk ;# I am not brave enough to risk the real thing ;-)
> $ git rev-parse master~4 >.git/refs/info/grafts
> $ cd ..
> $ mkdir cloned
> $ cd cloned
> $ git init-db
> $ cp ../junk/.git/info/grafts .git/info/
> $ git clone-pack ../baz
> $ git fsck-objects --full
> $ git log --pretty=short | cat
>

Just to be sure: what you call baz is actually junk?

> This "only the tip of the git.git" repository has about 450
> objects in it, fully packed because of clone-pack, with one 680K
> packfile.

I tried that but I don't get the same results. Did you delete all
branches except master before running clone-pack? In my case I cloned
the whole thing, so the junk and cloned repos are the same size.

> I think the true full history of git.git/ packed into
> one is around a 5MB packfile.  I suspect a bigger repository
> would not see that much size reduction, as Linus already
> explained here.

Sorry, but I didn't understand his explanation, surely because of my
very limited knowledge of git internals...

>
> You could emulate what I just did above to prepare the
> equivalent of "baz" above, and make it available over git://
> protocol, say at git://franck.example.com/franck.git/.
>

Is the git protocol really needed in your example? Or would rsync work
fine too? Since the "franck.git" repo is cauterized, none of its
objects should be part of the old history, so they should all be
useful, no?

Thanks.
-- 
Franck
* Re: [QUESTION] about .git/info/grafts file
From: Junio C Hamano @ 2006-01-20 17:59 UTC
To: Franck; +Cc: Git Mailing List

Franck <vagabon.xyz@gmail.com> writes:

> 2006/1/20, Junio C Hamano <junkio@cox.net>:
>
>> $ git clone git.git junk
>> $ cd junk ;# I am not brave enough to risk the real thing ;-)
>> $ git rev-parse master~4 >.git/refs/info/grafts

Typo: 's|.git/refs/info/grafts|.git/info/grafts|'

BTW, the above exact sequence will not work with my "master" today,
since I merged up a bunch of things last night.  You have to cauterize
all the paths that lead to earlier history.  For example, if I have
this:

    ---o---o---x---o---o---o---o (master)
        \         /
         o---o---o

cauterizing at master~4 ('x') will still leak history via the side
branch, if you follow the history from the tip and go backwards.  I
have to also cauterize the merge commit after that to remove the side
branch, or cauterize the leftmost branch point and live with a bit
deeper history.  The choice depends on how much real history I want
to keep in the pruned history.

For example, to pretend the history was like this:

    ---o---o   x---o---o---o---o (master)
        \
         o---o---o

$ git rev-parse master~4 >.git/info/grafts ;# 'x'
$ echo $(git rev-parse master~3 master~4) >>.git/info/grafts

The second line says master~3 (the one that comes after 'x') has
only a single parent, which is master~4, in order to throw the side
branch away [*1*].

Back to the original example...

>> $ cd ..
>> $ mkdir cloned
>> $ cd cloned
>> $ git init-db
>> $ cp ../junk/.git/info/grafts .git/info/
>> $ git clone-pack ../baz

There are a couple of typos here and that was the reason your
experiment did not work.  Sorry.  The "clone-pack" should have been
like this:

$ git clone-pack ../junk master
Packing 471 objects
e7555785f4edcf4988c53305349e3f525216e2cb refs/heads/master
$ git-rev-parse e7555785f >.git/refs/heads/master

This 'cloned' is the lightweight one.

> Is the git protocol really needed in your example? Or would rsync work
> fine too? Since the "franck.git" repo is cauterized, none of its
> objects should be part of the old history, so they should all be
> useful, no?

rsync may work for the initial clone, but its use afterwards is
frowned upon for other reasons these days.

[Footnote]

*1* There still is an anomaly if you look at "git log" after
pruning the side branch this way; the master~3 commit is still shown
as a "merge".  I think you could call it a bug, but I am not sure it
is worth fixing.
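Junio's side-branch example can be reproduced in a throwaway repository. The sketch below is an illustration under stated assumptions: a modern git on PATH that still honours `.git/info/grafts`, and hypothetical commit names `r`, `a`, `b`, `x`, `s1`..`s3`, `c1`..`c3` chosen to match the diagram (the side branch forks at `a`, the leftmost branch point). It writes the second graft line with the commit and its parent on one line, which is the format the grafts file expects.

```shell
#!/bin/sh
# Sketch (hypothetical throwaway repo): cauterizing only at master~4
# leaks history through a merged side branch; a second graft line
# that gives the merge a single parent closes the leak.
set -e

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)/side-branch-demo
git init -q "$repo"
cd "$repo"
commit() { git commit -q --allow-empty -m "$1"; }

# Main line r---a---b---x; the side branch will fork at 'a'.
commit r
commit a
fork=$(git rev-parse HEAD)
commit b
commit x
main=$(git symbolic-ref --short HEAD)

git checkout -q -b side "$fork"
commit s1
commit s2
commit s3

# Merge the side branch right after 'x', then add three commits,
# so that $main~3 is the merge and $main~4 is 'x'. 11 commits total.
git checkout -q "$main"
git merge -q --no-ff --no-edit -m "merge side" side
commit c1
commit c2
commit c3

merge=$(git rev-parse "$main~3")
x=$(git rev-parse "$main~4")
mkdir -p .git/info

# 1) Cauterize at 'x' only: the merge still reaches s3..s1, a and r.
echo "$x" >.git/info/grafts
echo "graft at x only:      $(git rev-list --count "$main") commits"

# 2) Also give the merge a single parent, 'x', dropping the side branch.
echo "$merge $x" >>.git/info/grafts
echo "graft at x and merge: $(git rev-list --count "$main") commits"
```

The expected counts are 10 with the single graft (only `b` is cut off, since the merge still reaches `a` and `r` through the side branch) and 5 with both graft lines (`c3`, `c2`, `c1`, the merge and `x`).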
Thread overview: 21+ messages
[not found] <cda58cb80601170928r252a6e34y@mail.gmail.com>
2006-01-17 17:32 ` [QUESTION] about .git/info/grafts file Franck
2006-01-18 17:47 ` Franck
2006-01-19 0:40 ` Junio C Hamano
2006-01-19 10:51 ` Franck
2006-01-19 13:09 ` Petr Baudis
2006-01-19 16:58 ` Linus Torvalds
2006-01-19 17:30 ` Petr Baudis
2006-01-19 17:33 ` Franck
2006-01-19 17:49 ` Linus Torvalds
2006-01-19 18:24 ` Junio C Hamano
2006-01-19 18:24 ` Junio C Hamano
2006-01-20 13:43 ` Franck
2006-01-19 11:10 ` Andreas Ericsson
2006-01-19 13:05 ` Petr Baudis
2006-01-19 13:31 ` Franck
2006-01-19 13:44 ` Andreas Ericsson
2006-01-19 17:45 ` Petr Baudis
2006-01-20 20:48 ` Ryan Anderson
2006-01-20 1:14 ` Junio C Hamano
2006-01-20 10:07 ` Franck
2006-01-20 17:59 ` Junio C Hamano