[QUESTION] about .git/info/grafts file

* [QUESTION] about .git/info/grafts file
       [not found] <cda58cb80601170928r252a6e34y@mail.gmail.com>
@ 2006-01-17 17:32 ` Franck
  2006-01-18 17:47   ` Franck
  2006-01-19  0:40   ` Junio C Hamano
  0 siblings, 2 replies; 21+ messages in thread
From: Franck @ 2006-01-17 17:32 UTC (permalink / raw)
  To: Git Mailing List

Hi,

I'm wondering why the "grafts" file is not involved in
push/pull/clone operations?

Another question, regarding a grafting use case. Let's say my
origin branch looks like this:

               origin ---0---1---<snip>---300,000---300,001---300,002

Let's say that the 300,000th commit is where I started my work, using:

               $ git-checkout -b master <300,000 shaid>

I do some work on the master branch and get the following:

                                                 a---b---c---d master
                                                /
               origin ---0---1---...---300,000---300,001---300,002

Now I would like to create my own public repository based on my work,
but before pushing the master branch to that repo I would like to get
rid of all the unused commits [0, 299,999]. None of these commits
carries history that is useful for my work. So I used grafts to get:

                              a---b---c---d master
                             /
               origin 300,000---300,001---300,002

But now if I ask git for:

               $ git-merge-base master origin
               # nothing

So git failed to find the common commit object, which should be 300,000. Why?

On the other hand, if I use grafting to get:

                                               a---b---c---d master
                                              /
               origin 299,999---300,000---300,001---300,002

              $ git-merge-base master origin
              2dcaaf2decd31ac9a21d616604c0a7c1fa65d5a4

So now git finds the common commit. Can anybody explain why?

Do you think this is a good use of git? Or should I set up my
public repository some other way?

Thanks
--
               Franck

* Re: [QUESTION] about .git/info/grafts file
  2006-01-17 17:32 ` [QUESTION] about .git/info/grafts file Franck
@ 2006-01-18 17:47   ` Franck
  2006-01-19  0:40   ` Junio C Hamano
  1 sibling, 0 replies; 21+ messages in thread
From: Franck @ 2006-01-18 17:47 UTC (permalink / raw)
  To: Git Mailing List

Hi,

Could anybody shed some light on this? It would be much appreciated.

Thanks
                  Franck

* Re: [QUESTION] about .git/info/grafts file
  2006-01-17 17:32 ` [QUESTION] about .git/info/grafts file Franck
  2006-01-18 17:47   ` Franck
@ 2006-01-19  0:40   ` Junio C Hamano
  2006-01-19 10:51     ` Franck
                       ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Junio C Hamano @ 2006-01-19  0:40 UTC (permalink / raw)
  To: Franck; +Cc: Git Mailing List

Franck <vagabon.xyz@gmail.com> writes:

> I'm wondering why the "grafts" files is not involved during
> push/pull/clone operations ?

Commit ancestry grafting is a local repository issue and even if
you manage to lie to your local git that 300,000th commit is the
epoch, the commit object you send out to the downloader would
record its true parent (or parents, if it is a merge), so the
downloader would want to go further back.  And no, rewriting
that commit and feeding a parentless commit to the downloader is
not an option, because such a commit object would have different
object name and unpack-objects would be unhappy.
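
To make the point about object names concrete, here is a minimal
sketch, assuming a throwaway repository in a temporary directory (the
identity, dates and file name are made up for illustration). It builds
two commits and then writes a parentless copy of the second one with
git commit-tree; because the parent line is part of the commit object,
the copy gets a different name.

```shell
#!/bin/sh
# Sketch: a commit's object name covers its parent lines, so a
# parentless copy of the same commit cannot keep the same name.
# Identity, dates and paths below are illustrative assumptions.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export GIT_AUTHOR_DATE='2006-01-19T00:00:00Z'
export GIT_COMMITTER_DATE='2006-01-19T00:00:00Z'

demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .

echo 0 >path
git add path
git commit -q -m "first"
echo 1 >path
git commit -q -a -m "second"

# The real commit records its parent ...
real=$(git rev-parse HEAD)
git cat-file -p "$real" | grep '^parent '

# ... while a commit with the same tree, identity, date and message
# but no parent line is a different object.
tree=$(git rev-parse 'HEAD^{tree}')
rootless=$(git commit-tree -m "second" "$tree")

echo "real:     $real"
echo "rootless: $rootless"
```

The two names printed at the end differ, which is why a downloader
who asked for the real commit cannot be handed the rootless one.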

If you choose not to have full history in your public repository
for whatever reason (ISP server disk quota comes to mind) that is
OK, but be honest about it to your downloaders.  Tell them that
you do not have the full history, and they first need to clone
from some other repository you started your development upon, in
order to use what you added upon.  "This repository does not
have all the history -- please first clone from XX repository
(you need at least xxx commit), and then do another 'git pull'
from here", or something like that.

It _might_ work if you tell your downloader to have a proper
graft file in his repository to cauterize the commit ancestry
chain _before_ he pulls from you, though.  I haven't tried it
(and honestly I did not feel that is something important to
support, so it might work by accident but that is not by
design).

>                $ git-merge-base master origin
>                # nothing

Maybe you did not use grafts properly to cauterize?  I tried the
following and am getting expected results.  I did not have
patience to do 300,000, so I cut things at #4, though.

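-- 8< --

#!/bin/sh
# Build a small linear history, fork a side branch at iteration #4,
# then cauterize everything before #4 with a graft and compare results.

# Start from an empty repository.
rm -fr .git
git init-db
echo 0 >path
git add path

# Seven commits on master, each tagged iter#N.
for i in 1 2 3 4 5 6 7
do
	echo $i >path
	git commit -a -m "Iteration #$i"
	git tag "iter#$i"
done

# Fork "mine" at iter#4 and add four commits, each tagged alt#X.
git checkout -b mine iter#4

for i in A B C D
do
	echo $i >path
	git commit -a -m "Alternate #$i"
	git tag "alt#$i"
done

# Before grafting: full history is shown.
git log --pretty=oneline --topo-order
echo merge base is `git merge-base master mine` | git name-rev --stdin

# A graft line with a commit and no parents makes that commit a root.
git rev-parse iter#4 >.git/info/grafts
echo "Cauterize away history before #4"

# After grafting: history stops at iter#4.
git log --pretty=oneline --topo-order
echo merge base is `git merge-base master mine` | git name-rev --stdin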
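
Git versions much newer than this thread also offer "git replace
--graft", which gives the same kind of local cut without editing the
grafts file. The following is only a sketch under that assumption; the
temporary directory, the tag names (iter1..iter7) and the "mine" branch
are illustrative, and the first branch is looked up rather than assumed
to be called master.

```shell
#!/bin/sh
# Sketch: cut history before the 4th commit with "git replace --graft".
# Directory, branch and tag names are illustrative assumptions.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
# Name of the initial branch ("master" or "main", depending on config).
trunk=$(git symbolic-ref --short HEAD)

for i in 1 2 3 4 5 6 7
do
	echo "$i" >path
	git add path
	git commit -q -m "Iteration #$i"
	git tag "iter$i"
done

git checkout -q -b mine iter4
for i in A B C D
do
	echo "$i" >path
	git commit -q -a -m "Alternate #$i"
done

before=$(git rev-list --count mine)

# Record a parentless replacement for iter4; the original commit
# object itself is left untouched in the object store.
git replace --graft iter4

after=$(git rev-list --count mine)
base=$(git merge-base "$trunk" mine)

echo "commits reachable from mine: $before before, $after after"
echo "merge base: $base"
```

Like a grafts file, the replacement only changes what the local
repository sees when it walks history.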

* Re: [QUESTION] about .git/info/grafts file
  2006-01-19  0:40   ` Junio C Hamano
@ 2006-01-19 10:51     ` Franck
  2006-01-19 13:09       ` Petr Baudis
  2006-01-19 18:24       ` Junio C Hamano
  2006-01-19 11:10     ` Andreas Ericsson
  2006-01-20  1:14     ` Junio C Hamano
  2 siblings, 2 replies; 21+ messages in thread
From: Franck @ 2006-01-19 10:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

Thanks Junio for answering

2006/1/19, Junio C Hamano <junkio@cox.net>:
> Franck <vagabon.xyz@gmail.com> writes:
>
> > I'm wondering why the "grafts" files is not involved during
> > push/pull/clone operations ?
>
> Commit ancestry grafting is a local repository issue and even if
> you manage to lie to your local git that 300,000th commit is the
> epoch, the commit object you send out to the downloader would
> record its true parent (or parents, if it is a merge), so the
> downloader would want to go further back.  And no, rewriting
> that commit and feeding a parentless commit to the downloader is
> not an option, because such a commit object would have different
> object name and unpack-objects would be unhappy.
>
> If you choose not to have full history in your public repository
> for whatever reason (ISP server diskquota comes to mind)

Well, dealing with a repo that has more than 300,000 objects becomes a
burden. A lot of git commands are slow, and cloning it takes a while!

> that is
> OK, but be honest about it to your downloaders.  Tell them that
> you do not have the full history, and they first need to clone
> from some other repository you started your development upon, in
> order to use what you added upon.  "This repository does not
> have all the history -- please first clone from XX repository
> (you need at least xxx commit), and then do another 'git pull'
> from here", or something like that.
>

I'm not trying to hide anything from, or lie to, my downloaders. I just
want to spare them from dealing with totally pointless history. My work
started recently and is based on the current XX repository. IMHO storing
and dealing with objects that are more than 10 years old is useless.

I don't see why it is so bad to create a "grafted" repository. I want
it to be small, but I still want to be able to merge with the XX
repository using git-resolve.

>
> >                $ git-merge-base master origin
> >                # nothing
>
> Maybe you did not use grafts properly to cauterize?

Well, in my graft file I did:

                    $ cat > .git/info/grafts
                    <shaid> <shaid>

                    $

From reading "Documentation/repository-layout.txt", I thought that was
the right thing to do. If I do the same as you did, i.e.:

                    $ cat > .git/info/grafts
                    <shaid>

                    $

It works.

Thanks
--
               Franck

* Re: [QUESTION] about .git/info/grafts file
  2006-01-19 10:51     ` Franck
@ 2006-01-19 13:09       ` Petr Baudis
  2006-01-19 16:58         ` Linus Torvalds
  2006-01-19 18:24       ` Junio C Hamano
  1 sibling, 1 reply; 21+ messages in thread
From: Petr Baudis @ 2006-01-19 13:09 UTC (permalink / raw)
  To: Franck; +Cc: Junio C Hamano, Git Mailing List

Dear diary, on Thu, Jan 19, 2006 at 11:51:22AM CET, I got a letter
where Franck <vagabon.xyz@gmail.com> said that...
> well, dealing with a repo that has more than 300,000 objects becomes a
> burden. A lots of git commands are slow, and cloning it take a while !

Were the objects packed? It would be interesting to have some data about
how GIT performs with that many objects...

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams

* Re: [QUESTION] about .git/info/grafts file
  2006-01-19 13:09       ` Petr Baudis
@ 2006-01-19 16:58         ` Linus Torvalds
  2006-01-19 17:30           ` Petr Baudis
                             ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Linus Torvalds @ 2006-01-19 16:58 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Franck, Junio C Hamano, Git Mailing List

On Thu, 19 Jan 2006, Petr Baudis wrote:
>
> Dear diary, on Thu, Jan 19, 2006 at 11:51:22AM CET, I got a letter
> where Franck <vagabon.xyz@gmail.com> said that...
> > well, dealing with a repo that has more than 300,000 objects becomes a
> > burden. A lots of git commands are slow, and cloning it take a while !
> 
> Were the objects packed? It would be interesting to have some data about
> how GIT performs with that much objects...

The historical linux archive has a lot more than 300,000 objects. In fact, 
even the _current_ kernel archive has almost 200,000 objects.

Maybe somebody was thinking "commits", not "objects". Something with 
300,000 commits is indeed a pretty big project.

Anyway, from a scalability standpoint, git should have no problem at all 
with tons of objects, as long as you pack the old history. There are a few 
things that get slower:

 - if you end up doing things that look at history, they are obviously at 
   least linear in history size. Often there are other downsides too 
   (using lots of memory).

   Example: try even just a simple "gitk" on the (regular, new) kernel 
   archive, and it will take a while before the whole thing has been done. 
   Of course, you'll see the top entries interactively, so mostly you 
   won't care, but I routinely limit it some way just to make it not make 
   the CPU fans come on. So I do something like

	gitk --since=1.week.ago
	gitk v2.6.15..

   instead of plain gitk, just because it makes operations cheaper.

 - a full clone takes a long time. Git _could_ fairly easily have an 
   extension to add a date specifier to clone too:

	git clone --since=1.month.ago <source> <dst>

   and just leave any older stuff (you could always fetch it later), but 
   we've just never done it. Maybe we should. It _should_ be pretty simple 
   to do from a conceptual standpoint.

but "everyday" operations shouldn't slow down from having a long history. 
I can still apply 4-5 patches a second to the kernel archive, for example, 
as you can see from

	git log --pretty=fuller | grep CommitDate | less -S

and looking for one of the patch series I've applied from Andrew..

		Linus

* Re: [QUESTION] about .git/info/grafts file
  2006-01-19 16:58         ` Linus Torvalds
@ 2006-01-19 17:30           ` Petr Baudis
  2006-01-19 17:33           ` Franck
  2006-01-19 18:24           ` Junio C Hamano
  2 siblings, 0 replies; 21+ messages in thread
From: Petr Baudis @ 2006-01-19 17:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Franck, Junio C Hamano, Git Mailing List

Dear diary, on Thu, Jan 19, 2006 at 05:58:09PM CET, I got a letter
where Linus Torvalds <torvalds@osdl.org> said that...
> On Thu, 19 Jan 2006, Petr Baudis wrote:
> >
> > Dear diary, on Thu, Jan 19, 2006 at 11:51:22AM CET, I got a letter
> > where Franck <vagabon.xyz@gmail.com> said that...
> > > well, dealing with a repo that has more than 300,000 objects becomes a
> > > burden. A lots of git commands are slow, and cloning it take a while !
> > 
> > Were the objects packed? It would be interesting to have some data about
> > how GIT performs with that much objects...
> 
> The historical linux archive has a lot more than 300,000 objects. In fact, 
> even the _current_ kernel archive has almost 200,000 objects.

Eek. I was burnt by git-count-objects' misleading name. I guess

	git-rev-list --objects --all | wc -l

should give accurate results - 145941 for kernel repository back from
December. I will follow up later with a patch for git-count-objects.
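
The difference between the two counts can be sketched as follows,
assuming a throwaway repository with three small commits (paths and
identity are made up). git count-objects reports only loose, unpacked
objects, whereas the rev-list pipeline counts every object reachable
from a ref, packed or not. The helper names count_reachable and
count_loose are hypothetical.

```shell
#!/bin/sh
# Sketch: loose-object count versus reachable-object count.
# Repository location and identity are illustrative assumptions.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
for i in 1 2 3
do
	echo "$i" >path
	git add path
	git commit -q -m "Iteration #$i"
done

# Objects reachable from any ref: commits, trees and blobs.
count_reachable() {
	git rev-list --objects --all | wc -l | tr -d ' '
}

# Loose (unpacked) objects only, as reported by count-objects.
count_loose() {
	git count-objects -v | sed -n 's/^count: //p'
}

reachable_before=$(count_reachable)
loose_before=$(count_loose)

# Pack everything and drop the now-redundant loose objects.
git repack -a -d -q

reachable_after=$(count_reachable)
loose_after=$(count_loose)

echo "before repack: $loose_before loose, $reachable_before reachable"
echo "after repack:  $loose_after loose, $reachable_after reachable"
```

After the repack the loose count drops while the reachable count stays
the same, which is why the loose count alone says little about the size
of a packed repository.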

>  - a full clone takes a long time. Git _could_ fairly easily have an 
>    extension to add a date specifier to clone too:
> 
> 	git clone --since=1.month.ago <source> <dst>
> 
>    and just leave any older stuff (you could always fetch it later), but 
>    we've just never done it. Maybe we should. It _should_ be pretty simple 
>    to do from a conceptual standpoint.

Yes. I receive requests for this from time to time and it is buried
somewhere deep in my TODO list. I'm not sure how happy the GIT tools
will be about invalid parent references.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams

* Re: [QUESTION] about .git/info/grafts file
  2006-01-19 16:58         ` Linus Torvalds
  2006-01-19 17:30           ` Petr Baudis
@ 2006-01-19 17:33           ` Franck
  2006-01-19 17:49             ` Linus Torvalds
  2006-01-19 18:24           ` Junio C Hamano
  2 siblings, 1 reply; 21+ messages in thread
From: Franck @ 2006-01-19 17:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Junio C Hamano, Git Mailing List

2006/1/19, Linus Torvalds <torvalds@osdl.org>:
>  - a full clone takes a long time. Git _could_ fairly easily have an
>    extension to add a date specifier to clone too:
>
>         git clone --since=1.month.ago <source> <dst>
>
>    and just leave any older stuff (you could always fetch it later), but
>    we've just never done it. Maybe we should. It _should_ be pretty simple
>    to do from a conceptual standpoint.
>

That would be great! Something like:

        git clone --since=v2.6.15 <src> <dst>

would be very useful for me. How would it work? Would it automatically
set up a graft file for me?

> but "everyday" operations shouldn't slow down from having a long history.

But it's really a pain to run, for example, git-repack or git-prune.

Thanks
--
               Franck

* Re: [QUESTION] about .git/info/grafts file
  2006-01-19 17:33           ` Franck
@ 2006-01-19 17:49             ` Linus Torvalds
  0 siblings, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2006-01-19 17:49 UTC (permalink / raw)
  To: Franck; +Cc: Petr Baudis, Junio C Hamano, Git Mailing List

On Thu, 19 Jan 2006, Franck wrote:
> 
> that would be great ! something like:
> 
>         git clone --since=v2.6.15 <src> <dst>
> 
> would be very useful for me. How would it work ? Does it automatically
> set up a graft file for me ?

I think we'd have to set up the grafts file, yes. However, it's actually 
less of an advantage than you'd think: especially for long development 
histories, the incremental packing is very very efficient. In contrast, if 
you only get recent versions, there's nothing to be incremental against, 
so the size of the pack will not be that much smaller.

So getting just a tenth of the development history will _not_ cause the 
pack to be just a tenth in size. It's probably closer to half the size of 
the full history.

Anyway, it's _conceptually_ something that git wouldn't have any problems 
with, but that doesn't mean that it's totally trivial either. The easiest 
way to do it (by far) would be to expand the native git protocol with a 
"get all objects of this one version" or something like that, and then 
you'd just do a "pull and mark all unknown commits in the grafts file".

So in effect, instead of getting the whole history pack, you'd get a pack 
that contains _one_ version (no history at all), and then (if you want to) 
you can get a pack that gets all stuff that isn't reachable from that one 
(ie "newer").

That would have the advantage that it's quite possible that many users 
might want to do just

	git clone --only=v2.6.15 <source> <target>

which would do that "one single version" variant of the clone. Then, later 
on, you could just do

	git pull --graft-unknown <source> <target>

to update the history.

Anybody want to try that? It would be a new command to "git-daemon" 
(instead of "git-upload-pack", you'd do a new "git-upload-version" 
command internally: it would look a lot like upload-pack, and use the same 
unpacking protocol).
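
The --only and --graft-unknown options above are proposals, not
existing flags. Git releases made after this thread do provide shallow
clones, which cover much of the same idea, so here is a rough sketch
assuming a git new enough to have "clone --depth" and "fetch
--unshallow". The repository paths are throwaway, and the file:// URL
is used so that a local source goes through the normal transport.

```shell
#!/bin/sh
# Sketch: clone only the newest commit, then fetch the rest later.
# Paths and identity are illustrative assumptions.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/full"
(
	cd "$work/full" || exit 1
	for i in 1 2 3 4 5
	do
		echo "$i" >path
		git add path
		git commit -q -m "Iteration #$i"
	done
)

# Fetch a single version with no history behind it.
git clone -q --depth 1 "file://$work/full" "$work/shallow"
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)

# Later on, complete the history.
git -C "$work/shallow" fetch -q --unshallow
full_count=$(git -C "$work/shallow" rev-list --count HEAD)

echo "commits after shallow clone: $shallow_count"
echo "commits after --unshallow:   $full_count"
```

This only illustrates the "one version now, history later" workflow;
it says nothing about how the protocol extension sketched above would
have been implemented.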

> but it's really a pain to run for example git-repack or git-prune commands.

Well, you really don't need to do that very often.

		Linus

* Re: [QUESTION] about .git/info/grafts file
  2006-01-19 16:58         ` Linus Torvalds
  2006-01-19 17:30           ` Petr Baudis
  2006-01-19 17:33           ` Franck
@ 2006-01-19 18:24           ` Junio C Hamano
  2 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2006-01-19 18:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Franck, Git Mailing List

Linus Torvalds <torvalds@osdl.org> writes:

>  - a full clone takes a long time. Git _could_ fairly easily have an 
>    extension to add a date specifier to clone too:
>
> 	git clone --since=1.month.ago <source> <dst>
>
>    and just leave any older stuff (you could always fetch it later), but 
>    we've just never done it. Maybe we should. It _should_ be pretty simple 
>    to do from a conceptual standpoint.

True, except for some implementation details you did not mention
in your other message, where you talked about upload-version.
Both commit walkers and git native transfer fundamentally
operate by trusting that our current refs are complete, which
makes the "could always fetch it later" part a bit involved.

Fortunately, it would not be rocket science.  We would need to
have a mode "do not trust our current refs are complete" with an
explicit command line option, or automatically fall back to that
mode when seeing the $GIT_DIR/info/grafts file has changed, and
revalidate the commit ancestry chain we have in a repository
cloned that way.

* Re: [QUESTION] about .git/info/grafts file
  2006-01-19 10:51     ` Franck
  2006-01-19 13:09       ` Petr Baudis
@ 2006-01-19 18:24       ` Junio C Hamano
  2006-01-20 13:43         ` Franck
  1 sibling, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2006-01-19 18:24 UTC (permalink / raw)
  To: Franck; +Cc: Git Mailing List

Franck <vagabon.xyz@gmail.com> writes:

> I don't see why it is so bad to create a "grafted" repository ? I want
> it to be small but still want to merge by using git-resolve with XX
> repository.

Franck, and people on the list,

I have a bad habit of responding to a "call for help" request by
stating how things are currently done and why, sometimes with an
outline of how the limitation in the current way can be (or at
least I think it could be, without testing that solution myself)
worked around, but without making it explicit if the limitation
is something that should not be there or if it is something
fundamental.  This often makes it sound as if I am saying I
think the original request is unreasonable, and/or the current
state of affairs is perfect.  This is one such case.

I agree it would be nice to support "strictly speaking, the
repository is incomplete but has everything necessary as long as
you operate near the tip of the development" mode of operation.

It just has never been a high priority.

> Well in my graft file I did:
>
>                     $ cat > .git/info/grafts
>                     <shaid> <shaid>
>
>                     $

The trailing empty line at the end is discarded as a comment, I
think, so that should be fine.  "terminated by a newline" in the
documentation talks about each line being terminated by a LF,
not about terminating the file itself with an extra newline.

I think you spotted a bug in the documentation and another in the
code.  I presume these two <shaid> are the same in what you did;
you are saying "this commit has itself as its parent", but that
can never be the case and the graft parser should reject such a
line and complain, but I do not think the current code does so.

The documentation says "a commit and its fake parents ...
separated by a space and terminated by a newline".  We should at
least say "zero or more fake parents", or make it even clearer
by giving a couple of examples.
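
As an illustration of the format and of the check discussed above,
here is a small sketch. check_grafts is a hypothetical helper, not
part of git: it reads a grafts file in which each line is a commit
followed by zero or more fake parents, skips blank and "#" lines, and
complains about any commit that lists itself as a parent.

```shell
#!/bin/sh
# check_grafts FILE: hypothetical helper, not part of git.
# Prints a diagnostic for every graft line whose commit lists itself
# as a parent, and returns non-zero if any such line was found.
check_grafts() {
	status=0
	lineno=0
	# The "|| [ -n ... ]" part also handles a last line without LF.
	while read -r commit parents || [ -n "$commit" ]
	do
		lineno=$((lineno + 1))
		# Skip blank lines and comment lines.
		case "$commit" in
		''|'#'*) continue ;;
		esac
		for parent in $parents
		do
			if [ "$parent" = "$commit" ]
			then
				echo "line $lineno: $commit is grafted onto itself" >&2
				status=1
			fi
		done
	done <"$1"
	return $status
}
```

With this, a line holding a single <shaid> (a commit with no parents,
i.e. a new root) and a line holding "<shaid> <other shaid>" both pass,
while "<shaid> <shaid>" with the same id twice is reported.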

* Re: [QUESTION] about .git/info/grafts file
  2006-01-19 18:24       ` Junio C Hamano
@ 2006-01-20 13:43         ` Franck
  0 siblings, 0 replies; 21+ messages in thread
From: Franck @ 2006-01-20 13:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

2006/1/19, Junio C Hamano <junkio@cox.net>:
> Franck <vagabon.xyz@gmail.com> writes:
>
> > I don't see why it is so bad to create a "grafted" repository ? I want
> > it to be small but still want to merge by using git-resolve with XX
> > repository.
>
> Franck, and people on the list,
>
> I have a bad habit of responding to a "call for help" request by
> stating how things are currently done and why, sometimes with an

What? Hey, I would say that you, Linus and other people on the list
have a GREAT habit of spending time explaining to others how things
work. And the explanations are usually accurate, with detailed
examples.

Thanks !
--
               Franck

* Re: [QUESTION] about .git/info/grafts file
  2006-01-19  0:40   ` Junio C Hamano
  2006-01-19 10:51     ` Franck
@ 2006-01-19 11:10     ` Andreas Ericsson
  2006-01-19 13:05       ` Petr Baudis
  2006-01-19 13:31       ` Franck
  2006-01-20  1:14     ` Junio C Hamano
  2 siblings, 2 replies; 21+ messages in thread
From: Andreas Ericsson @ 2006-01-19 11:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Franck, Git Mailing List

Junio C Hamano wrote:
> Franck <vagabon.xyz@gmail.com> writes:
> 
> 
>>I'm wondering why the "grafts" files is not involved during
>>push/pull/clone operations ?
> 
> 
> Commit ancestry grafting is a local repository issue and even if
> you manage to lie to your local git that 300,000th commit is the
> epoch, the commit object you send out to the downloader would
> record its true parent (or parents, if it is a merge), so the
> downloader would want to go further back.  And no, rewriting
> that commit and feeding a parentless commit to the downloader is
> not an option, because such a commit object would have different
> object name and unpack-objects would be unhappy.
> 


I'm a bit curious about how this was done for the public kernel repo. 
I'd like to import glibc to git, but keeping history since 1972 seems a 
bloody waste, really.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

* Re: [QUESTION] about .git/info/grafts file
  2006-01-19 11:10     ` Andreas Ericsson
@ 2006-01-19 13:05       ` Petr Baudis
  2006-01-19 13:31       ` Franck
  1 sibling, 0 replies; 21+ messages in thread
From: Petr Baudis @ 2006-01-19 13:05 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Junio C Hamano, Franck, Git Mailing List

Dear diary, on Thu, Jan 19, 2006 at 12:10:23PM CET, I got a letter
where Andreas Ericsson <ae@op5.se> said that...
> Junio C Hamano wrote:
> >Franck <vagabon.xyz@gmail.com> writes:
> >
> >
> >>I'm wondering why the "grafts" files is not involved during
> >>push/pull/clone operations ?
> >
> >
> >Commit ancestry grafting is a local repository issue and even if
> >you manage to lie to your local git that 300,000th commit is the
> >epoch, the commit object you send out to the downloader would
> >record its true parent (or parents, if it is a merge), so the
> >downloader would want to go further back.  And no, rewriting
> >that commit and feeding a parentless commit to the downloader is
> >not an option, because such a commit object would have different
> >object name and unpack-objects would be unhappy.
> 
> I'm a bit curious about how this was done for the public kernel repo. 
> I'd like to import glibc to git, but keeping history since 1972 seems a 
> bloody waste, really.

FWIW, with the ELinks GIT repository we just started from scratch and
then converted the old CVS repository, and provided this script in
contrib/grafthistory.sh:


#!/bin/sh
#
# Graft the ELinks development history to the current tree.
#
# Note that this will download about 80M.

if [ -z "`which wget 2>/dev/null`" ]; then
  echo "Error: You need to have wget installed so that I can fetch the history." >&2
  exit 1
fi

[ "$GIT_DIR" ] || GIT_DIR=.git
if ! [ -d "$GIT_DIR" ]; then
  echo "Error: You must run this from the project root (or set GIT_DIR to your .git directory)." >&2
  exit 1
fi
cd "$GIT_DIR"

echo "[grafthistory] Downloading the history"
mkdir -p objects/pack
cd objects/pack
wget -c http://elinks.cz/elinks-history.git/objects/pack/pack-0d6c5c67aab3b9d5d9b245da5929c15d79124a48.idx
wget -c http://elinks.cz/elinks-history.git/objects/pack/pack-0d6c5c67aab3b9d5d9b245da5929c15d79124a48.pack

echo "[grafthistory] Setting up the grafts"
cd ../..
mkdir -p info
# master
echo 0f6d4310ad37550be3323fab80456e4953698bf0 06135dc2b8bb7ed2e441305bdaa82048396de633 >>info/grafts
# REL_0_10
echo 43a9a406737fd22a8558c47c74b4ad04d4c92a2b 730242dcf2cdeed13eae7e8b0c5f47bb03326792 >>info/grafts

echo "[grafthistory] Refreshing the dumb server info wrt. new packs"
cd ..
git-update-server-info


So you checkout the ELinks repository and if you want the full history
you just run this script and it does everything for you.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams

* Re: [QUESTION] about .git/info/grafts file
  2006-01-19 11:10     ` Andreas Ericsson
  2006-01-19 13:05       ` Petr Baudis
@ 2006-01-19 13:31       ` Franck
  2006-01-19 13:44         ` Andreas Ericsson
  1 sibling, 1 reply; 21+ messages in thread
From: Franck @ 2006-01-19 13:31 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Junio C Hamano, Git Mailing List

2006/1/19, Andreas Ericsson <ae@op5.se>:
> Junio C Hamano wrote:
> > Franck <vagabon.xyz@gmail.com> writes:
> >
> >
> >>I'm wondering why the "grafts" files is not involved during
> >>push/pull/clone operations ?
> >
> >
> > Commit ancestry grafting is a local repository issue and even if
> > you manage to lie to your local git that 300,000th commit is the
> > epoch, the commit object you send out to the downloader would
> > record its true parent (or parents, if it is a merge), so the
> > downloader would want to go further back.  And no, rewriting
> > that commit and feeding a parentless commit to the downloader is
> > not an option, because such a commit object would have different
> > object name and unpack-objects would be unhappy.
> >
>
>
> I'm a bit curious about how this was done for the public kernel repo.
> I'd like to import glibc to git, but keeping history since 1972 seems a
> bloody waste, really.
>

That's exactly my point. Furthermore, making your downloaders import
that useless history spreads the waste.

I guess the kernel repo will run into this problem in the short term.
It's getting bigger and bigger, and developers may get bored of dealing
with so many useless objects. But I'm not saying that it's a bad thing
to keep that history. It would just be nice to allow developers who
don't care about old history to get rid of it.

Thanks
--
               Franck

* Re: [QUESTION] about .git/info/grafts file
  2006-01-19 13:31       ` Franck
@ 2006-01-19 13:44         ` Andreas Ericsson
  2006-01-19 17:45           ` Petr Baudis
  2006-01-20 20:48           ` Ryan Anderson
  0 siblings, 2 replies; 21+ messages in thread
From: Andreas Ericsson @ 2006-01-19 13:44 UTC (permalink / raw)
  To: Git Mailing List

Franck wrote:
> 2006/1/19, Andreas Ericsson <ae@op5.se>:
>>
>>I'm a bit curious about how this was done for the public kernel repo.
>>I'd like to import glibc to git, but keeping history since 1972 seems a
>>bloody waste, really.
>>
> 
> 
> That's exactly my point. Futhermore make your downloaders import that
> useless history spread this waste.
> 
> I guess kernel repo will encounter this problem in short term. It's
> being bigger and bigger and developpers may be borred to deal with so
> many useless objects.

Ach, no. The current kernel repo only has history since April 17 (around 
155 MB of objects, with less than optimal packing), when it started 
using git for versioning. The kernel repo also sees a lot of very rapid 
development.

The full kernel tree, with history since 1991 or some such, is about 3.2 
GB. It was for this reason that the early history was dropped. I don't 
think another drop will be necessary any time soon, since incremental 
updates are fairly cheap over git and git+ssh. Only gitk suffers, but 
that's just for a short while.

> I'm not saying that keeping that history is a bad thing, though. It
> would just be nice to let developers who don't care about old history
> get rid of it.
> 

You could of course create a new repository with the files from the 
version you want, but then you'd have a hard time merging the two repos 
if you ever want to import the old history.

Linus, is this what you did with the public kernel repo?

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: [QUESTION] about .git/info/grafts file
  2006-01-19 13:44         ` Andreas Ericsson
@ 2006-01-19 17:45           ` Petr Baudis
  2006-01-20 20:48           ` Ryan Anderson
  1 sibling, 0 replies; 21+ messages in thread
From: Petr Baudis @ 2006-01-19 17:45 UTC (permalink / raw)
  To: Andreas Ericsson, torvalds; +Cc: Git Mailing List

Dear diary, on Thu, Jan 19, 2006 at 02:44:15PM CET, I got a letter
where Andreas Ericsson <ae@op5.se> said that...
> Ach, no. The current kernel repo only has history since April 17 (around 
> 155 MB of objects, with less than optimal packing), when it started 
> using git for versioning. The kernel repo also sees a lot of very rapid 
> development.
> 
> The full kernel tree, with history since 1991 or some such, is about 3.2 
> GB.

There is some "accurate" history only from the moment the kernel got
tracked in BK, and it is certainly far less.

The question is, what is the "official" kernel history repository?
There is at least

	http://www.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

with a 251M pack and

	http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/old-2.6-bkcvs.git

with a 165M pack - IIRC the latter is obsoleted by the former and
perhaps should be blasted to prevent confusion?

Getting a little offtopic here... Linus, would it be deemed useful to
have the script I've pasted in <20060119130519.GB28365@pasky.or.cz>
(earlier in this thread) in the kernel's scripts/ directory, pointing at
the canonical history repository?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams


* Re: [QUESTION] about .git/info/grafts file
  2006-01-19 13:44         ` Andreas Ericsson
  2006-01-19 17:45           ` Petr Baudis
@ 2006-01-20 20:48           ` Ryan Anderson
  1 sibling, 0 replies; 21+ messages in thread
From: Ryan Anderson @ 2006-01-20 20:48 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Git Mailing List

On Thu, Jan 19, 2006 at 02:44:15PM +0100, Andreas Ericsson wrote:
> 
> The full kernel tree, with history since 1991 or some such, is about 3.2 
> GB. It was for this reason that the early history was dropped. I don't 
> think another drop will be necessary any time soon, since incremental 
> updates are fairly cheap over git and git+ssh. Only gitk suffers, but 
> that's just for a short while.

Just to make sure this is corrected, the 3.2GB was for a fully unpacked
tree, which is still fairly bad in the current tree.

The historical tree, packed, runs about 266M in a single pack.
Admittedly, I still refuse to try to run gitk on it.

> >I'm not saying that keeping that history is a bad thing, though. It
> >would just be nice to let developers who don't care about old history
> >get rid of it.
> 
> You could of course create a new repository with the files from the 
> version you want, but then you'd have a hard time merging the two repos 
> if you ever want to import the old history.

It's always possible to use a "graft" to tie the history together, and
if you really need to merge changes across the boundary, my graft-ripple
(in the archives) tool can make it happen, though it does some ... nasty
things to the history tree in the process.  (It might be useful on a
throwaway tree to provide a way to merge, then, from which a set of
diffs could be taken and applied back on an un-messy tree.)

-- 

Ryan Anderson
  sometimes Pug Majere


* Re: [QUESTION] about .git/info/grafts file
  2006-01-19  0:40   ` Junio C Hamano
  2006-01-19 10:51     ` Franck
  2006-01-19 11:10     ` Andreas Ericsson
@ 2006-01-20  1:14     ` Junio C Hamano
  2006-01-20 10:07       ` Franck
  2 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2006-01-20  1:14 UTC (permalink / raw)
  To: Franck; +Cc: Git Mailing List

Junio C Hamano <junkio@cox.net> writes:

> It _might_ work if you tell your downloader to have a proper
> graft file in his repository to cauterize the commit ancestry
> chain _before_ he pulls from you, though.  I haven't tried it
> (and honestly I did not feel that is something important to
> support, so it might work by accident but that is not by
> design).

I just tried it and it actually works.

	$ git clone git.git junk
        $ cd junk ;# I am not brave enough to risk the real thing ;-)
	$ git rev-parse master~4 >.git/refs/info/grafts
        $ cd ..
        $ mkdir cloned
        $ cd cloned
        $ git init-db
        $ cp ../junk/.git/info/grafts .git/info/
	$ git clone-pack ../baz
	$ git fsck-objects --full
	$ git log --pretty=short | cat

This "only the tip of the git.git" repository has about 450
objects in it, fully packed because of clone-pack, with one 680K
packfile.  I think the true full history of git.git/ packed into
one is around a 5MB packfile.  I suspect a bigger repository
would not see that much size reduction, as Linus already
explained here.

You could emulate what I just did above to prepare the
equivalent of "baz" above, and make it available over git://
protocol, say at git://franck.example.com/franck.git/.

Then you tell your downloaders something like this:

	This repository has been cauterized, and cannot be
	cloned in a usual manner, but once you make a clone
	everything including further incremental updates should
	work.

        To clone this repository:

		$ mkdir franckproject ;# make a new repository
		$ cd franckproject && git init-db
		$ echo 'XXxxxxXXxxx' >.git/info/grafts
		$ git clone-pack git://franck.example.com/franck.git/


* Re: [QUESTION] about .git/info/grafts file
  2006-01-20  1:14     ` Junio C Hamano
@ 2006-01-20 10:07       ` Franck
  2006-01-20 17:59         ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Franck @ 2006-01-20 10:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

2006/1/20, Junio C Hamano <junkio@cox.net>:
> Junio C Hamano <junkio@cox.net> writes:
>
> > It _might_ work if you tell your downloader to have a proper
> > graft file in his repository to cauterize the commit ancestry
> > chain _before_ he pulls from you, though.  I haven't tried it
> > (and honestly I did not feel that is something important to
> > support, so it might work by accident but that is not by
> > design).
>
> I just tried it and it actually works.
>
>         $ git clone git.git junk
>         $ cd junk ;# I am not brave enough to risk the real thing ;-)
>         $ git rev-parse master~4 >.git/refs/info/grafts
>         $ cd ..
>         $ mkdir cloned
>         $ cd cloned
>         $ git init-db
>         $ cp ../junk/.git/info/grafts .git/info/
>         $ git clone-pack ../baz
>         $ git fsck-objects --full
>         $ git log --pretty=short | cat
>

Just to be sure: is what you call baz actually junk?

> This "only the tip of the git.git" repository has about 450
> objects in it, fully packed because of clone-pack, with one 680K
> packfile.

I tried that but I didn't get the same results. Did you delete all
branches except master before running clone-pack? In my case I cloned
the whole thing, so the junk and cloned repos are the same size.

> I think the true full history of git.git/ packed into
> one is around a 5MB packfile.  I suspect a bigger repository
> would not see that much size reduction, as Linus already
> explained here.

Sorry, but I didn't understand his explanation, surely because of my
very limited knowledge of git internals...

>
> You could emulate what I just did above to prepare the
> equivalent of "baz" above, and make it available over git://
> protocol, say at git://franck.example.com/franck.git/.
>

Is the git protocol really needed in your example? Or would rsync work
fine too? Since the "franck.git" repo is cauterized, none of its objects
should belong to the old history, so they should all be useful, no?

Thanks.
--
               Franck


* Re: [QUESTION] about .git/info/grafts file
  2006-01-20 10:07       ` Franck
@ 2006-01-20 17:59         ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2006-01-20 17:59 UTC (permalink / raw)
  To: Franck; +Cc: Git Mailing List

Franck <vagabon.xyz@gmail.com> writes:

> 2006/1/20, Junio C Hamano <junkio@cox.net>:
>
>>         $ git clone git.git junk
>>         $ cd junk ;# I am not brave enough to risk the real thing ;-)
>>         $ git rev-parse master~4 >.git/refs/info/grafts

Typo: 	's|.git/refs/info/grafts|.git/info/grafts|'

BTW the above exact sequence will not work with my "master"
today, since I merged up a bunch of things last night.  You have
to cauterize all the paths that lead to earlier history.  For
example, if I have this:

   ---o---o---x---o---o---o---o (master)
       \         /
        o---o---o

cauterizing at master~4 ('x') will still leak history via the
side branch, if you follow the history from the tip and go
backwards.  I have to also cauterize the merge commit after that
to remove the side branch, or cauterize the leftmost branch
point and live with a bit deeper history.  The choice depends on
how much real history I want to keep in the pruned history.

For example, to pretend the history was like this:

   ---o---o   x---o---o---o---o (master)
       \         
        o---o---o

	$ git rev-parse master~4 >.git/info/grafts ;# 'x'
	$ echo $(git rev-parse master~3 master~4) >>.git/info/grafts

The second line says master~3 (the one that comes after 'x') has
only a single parent, which is master~4, in order to throw the
side branch away [*1*].
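
For reference, each record in the grafts file is a single line: the
commit's object name followed by the names of the parents it should
pretend to have, separated by spaces.  A small sketch of a helper
that writes such a record; "graft_record" is a hypothetical name, and
it assumes 40-hex-digit SHA-1 object names.

```shell
#!/bin/sh
# Hypothetical helper: print one grafts-file record on a single line,
#   <commit> [<parent>...]
# Every argument must be a full 40-hex-digit object name (SHA-1 assumed).
graft_record() {
    [ "$#" -ge 1 ] || { echo "usage: graft_record <commit> [<parent>...]" >&2; return 1; }
    for id in "$@"; do
        case $id in
            *[!0-9a-f]*) echo "not a hex object name: $id" >&2; return 1 ;;
        esac
        [ "${#id}" -eq 40 ] || { echo "expected 40 hex digits: $id" >&2; return 1; }
    done
    echo "$*"    # joins the names with single spaces
}

# Possible use inside a repository (not run here):
#   graft_record "$(git rev-parse master~4)" >.git/info/grafts
#   graft_record "$(git rev-parse master~3)" "$(git rev-parse master~4)" \
#       >>.git/info/grafts
```

Passing full object names rather than "master~4" keeps the helper
from writing a record git cannot parse.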

Back to the original example...

>>         $ cd ..
>>         $ mkdir cloned
>>         $ cd cloned
>>         $ git init-db
>>         $ cp ../junk/.git/info/grafts .git/info/
>>         $ git clone-pack ../baz

There are a couple of typos here and that was the reason your
experiment did not work.  Sorry.  The "clone-pack" should have
been like this:

        $ git clone-pack ../junk master
        Packing 471 objects
        e7555785f4edcf4988c53305349e3f525216e2cb refs/heads/master
	$ git-rev-parse e7555785f >.git/refs/heads/master

This 'cloned' is the lightweight one.
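
One way to check that 'cloned' really is lighter than 'junk' is to
count the objects each repository physically stores.  A minimal
sketch, assuming a Git recent enough to have "git -C";
"stored_objects" is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: number of objects stored in a repository, loose plus
# packed, taken from the "count:" and "in-pack:" fields of count-objects -v.
stored_objects() {
    git -C "$1" count-objects -v |
        awk '$1 == "count:" || $1 == "in-pack:" { n += $2 } END { print n + 0 }'
}

# Possible use, run from the directory that holds both repositories:
#   echo "junk:   $(stored_objects junk)"
#   echo "cloned: $(stored_objects cloned)"
```

This counts what is on disk, not what is reachable, so it matches the
"same size" comparison above better than walking the history would.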

> Is the git protocol really needed in your example? Or would rsync work
> fine too? Since the "franck.git" repo is cauterized, none of its objects
> should belong to the old history, so they should all be useful, no?

rsync may work for the initial clone, but its use afterwards is
frowned upon for other reasons these days.

[Footnote]

*1* There is still an anomaly if you look at "git log" after
pruning the side branch this way; the master~3 commit is still shown
as a "merge".  I think you could call it a bug, but I am not sure it
is worth fixing.


end of thread, other threads:[~2006-01-20 20:49 UTC | newest]

Thread overview: 21+ messages
     [not found] <cda58cb80601170928r252a6e34y@mail.gmail.com>
2006-01-17 17:32 ` [QUESTION] about .git/info/grafts file Franck
2006-01-18 17:47   ` Franck
2006-01-19  0:40   ` Junio C Hamano
2006-01-19 10:51     ` Franck
2006-01-19 13:09       ` Petr Baudis
2006-01-19 16:58         ` Linus Torvalds
2006-01-19 17:30           ` Petr Baudis
2006-01-19 17:33           ` Franck
2006-01-19 17:49             ` Linus Torvalds
2006-01-19 18:24           ` Junio C Hamano
2006-01-19 18:24       ` Junio C Hamano
2006-01-20 13:43         ` Franck
2006-01-19 11:10     ` Andreas Ericsson
2006-01-19 13:05       ` Petr Baudis
2006-01-19 13:31       ` Franck
2006-01-19 13:44         ` Andreas Ericsson
2006-01-19 17:45           ` Petr Baudis
2006-01-20 20:48           ` Ryan Anderson
2006-01-20  1:14     ` Junio C Hamano
2006-01-20 10:07       ` Franck
2006-01-20 17:59         ` Junio C Hamano
