Git development
* Re: several quick questions
From: Junio C Hamano @ 2006-02-14 22:39 UTC
  To: Carl Worth; +Cc: git, Linus Torvalds
In-Reply-To: <87d5hpvc8p.wl%cworth@cworth.org>

Carl Worth <cworth@cworth.org> writes:

> You've pointed out that branches are free in terms of what git has to
> do. I'm saying that they're not free for the user who bears the cost
> of inventing a name. And in the case of any commit-while-seeking, it's
> at the time of the commit itself that the user has enough information
> to invent a useful name, not prior to seeking, (when the user is still
> trying to figure things out).

I think this is a very valid point and I am happy to accept a
workable proposal (it does not have to be a working patch, just
general semantics that cover most, if not all, of the corner
cases).


* Re: several quick questions
From: Andreas Ericsson @ 2006-02-14 23:00 UTC
  To: Junio C Hamano; +Cc: Josef Weidendorfer, git
In-Reply-To: <7v7j7xr54u.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
> Josef Weidendorfer <Josef.Weidendorfer@gmx.de> writes:
> 
> 
>>Why not allow something like
>>
>>	git-checkout master~5
>>
>>which implicitly does create a read-only branch "seek-point"?
> 
> 
> Now what does "git-checkout branch" mean?  Does it switch to the
> branch, or does it force tip of seek-point to be the tip of
> branch and switch to seek-point branch?  More interestingly,
> what does "git-checkout seek-point" mean? 
> 
> If we _were_ to do something like cg-seek where an implicit
> throw-away branch is used, you at least need a way to
> disambiguate these cases, and "git seek" originally suggested is
> far clearer than what you said above.
> 

Nah. What's the point of having another protected name. Just allow

	$ git checkout -b discard HEAD~15

and we're good to go.

> Having said that, I am not convinced in either way, though.
> 
> 
>>A branch could be marked readonly by above command with
>>
>>	chmod a-w .git/refs/heads/seek
> 
> 
> I do not think that would work.  Have you tried it?
> 

It wouldn't on cygwin, for one. I'm against having things work 
differently on different platforms. If nothing else it usually worsens 
the bitrot that always happens to documentation.

> 
>>And git-commit should refuse to commit on a readonly ref, telling
>>the user to create a writable branch before with "git-branch new".
> 
> 
> Now, read-only ref does not interest me, but "do not commit on
> top of this yourself, only fast-forward from somewhere else is
> allowed" may be useful, for the reason why you mentioned
> "origin".
> 

Take my suggestion and you wouldn't have to worry about read-only
branches, and although merging any changes from it might be more trouble
than it's worth, it might be possible to cherry-pick the commit rather
than reverting and re-applying it.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: cg-clean, cg-status improvements
From: Petr Baudis @ 2006-02-14 23:02 UTC
  To: Pavel Roskin; +Cc: git
In-Reply-To: <1139941032.26723.4.camel@dv>

Dear diary, on Tue, Feb 14, 2006 at 07:17:12PM CET, I got a letter
where Pavel Roskin <proski@gnu.org> said that...
> On Tue, 2006-02-14 at 16:53 +0100, Petr Baudis wrote:
> >   I didn't plan to require git 1.2.0 with 0.17, so it would be better if
> > you could do the workaround. But if the workaround means significant
> > hassle, it's no biggie if git 1.2.0 will be required.
> 
> It turns out a proper workaround can only be implemented in cg-Xlib, not
> in cg-clean.  It's a bit hairy for my taste (a bash guru could write it
> better, I believe), but it's a compact blob of code that can be easily
> removed at any time.

Wow, how you managed to simplify and shrink cg-clean sounds really
impressive! :)

Thanks, both applied and pushed out.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams


* Re: What's in git.git
From: Luck, Tony @ 2006-02-14 23:10 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v3bis59un.fsf@assigned-by-dhcp.cox.net>

> >   (1) First merge "master" into "topic":
> >
> >         $ git checkout topic
> >         $ git pull . master
> >
> >   (3) Pick commits only on "topic" branch but not on "master"
> >
> >         $ git rev-list --pretty --no-merges master..topic >P.log
> >
> >       This will pick up the three 'o' commits on the lower
> >       development track and show their commit log message.
> >
> >
> > This obviously would work equally well for single strand of
> > pearls case.  Maybe you can package the above up, and send in a
> > patch to add "git-squash" command?
> 
> I am stupid.  (4) can be done a lot more easily.  Do not do step
> (2) -- you do not need a diff at all.  But do do step (3) to get
> the logs.  Then:
> 
> 	$ git reset --soft master
>         $ git commit -F P.log -e

Yes, that all seems to work as advertised.  One extra wrinkle was to
preserve the author information by grepping out the last Author: line
from the log.  Here's the work-in-progress version of git-squash (I
don't have a "master" branch, so I stuck in the "mbranch" shell variable
so there is only one place for me to change ... to mbranch=linus).

Any style (or other) comments?  If not I'll package into patch format
with a manual page in a few days.

-Tony

#!/bin/sh

. git-sh-setup

branch="$1"
mbranch=master

if [ ! -f .git/refs/heads/"$branch" ]
then
	die "Can't see branch '$branch'"
fi

if [ -f .git/refs/heads/"$branch"-unsquashed ]
then
	die "Branch '$branch' has been squashed before"
fi

cp .git/refs/heads/"$branch" .git/refs/heads/"$branch"-unsquashed

git checkout "$branch" || die "Couldn't checkout '$branch'"

git pull . $mbranch || die "Can't merge $mbranch into $branch"

git-rev-list --pretty --no-merges $mbranch..$branch > /tmp/git-squash-$$

git reset --soft $mbranch

author=$(sed -n -e  '/^Author: /s///p' /tmp/git-squash-$$ | tail -1)

git commit --author "$author" -F /tmp/git-squash-$$ -e

rm -f /tmp/git-squash-$$
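
A note on the script above: two pieces of it can be tried in isolation.
The sketch below is not part of the posted script. It assumes a made-up
two-commit log in place of real git-rev-list output, and it uses mktemp
(an assumption about the tools available) instead of a predictable /tmp
name. It shows why the sed/tail pair ends up with the author of the
oldest commit: rev-list prints the newest commit first.

```shell
#!/bin/sh
# Sketch only: the log text below is invented sample data.

# mktemp picks an unpredictable name, unlike /tmp/git-squash-$$.
log=$(mktemp) || exit 1
trap 'rm -f "$log"' EXIT

cat > "$log" <<'EOF'
commit 3333333333333333333333333333333333333333
Author: Second Author <second@example.com>
Date:   Tue Feb 14 12:00:00 2006 -0800

    Later change

commit 1111111111111111111111111111111111111111
Author: First Author <first@example.com>
Date:   Mon Feb 13 12:00:00 2006 -0800

    Earlier change
EOF

# Print the text after each "Author: " prefix, then keep the last match.
# Commit message lines are indented, so only header lines can match.
author=$(sed -n -e 's/^Author: //p' "$log" | tail -n 1)
echo "$author"
```

With a real squash the same two lines would read the file written by
git-rev-list instead of the here-document.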


* Re: several quick questions
From: Johannes Schindelin @ 2006-02-14 23:23 UTC
  To: Andreas Ericsson; +Cc: Junio C Hamano, Josef Weidendorfer, git
In-Reply-To: <43F26129.4040804@op5.se>

Hi,

On Wed, 15 Feb 2006, Andreas Ericsson wrote:

> What's the point of having another protected name. Just allow
> 
> 	$ git checkout -b discard HEAD~15
> 
> and we're good to go.

Last time I checked (2 hours ago) it did exactly what you want it to.

Hth,
Dscho


* Re: several quick questions
From: Andreas Ericsson @ 2006-02-15  0:08 UTC
  To: Johannes Schindelin; +Cc: Junio C Hamano, Josef Weidendorfer, git
In-Reply-To: <Pine.LNX.4.63.0602150022470.24570@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin wrote:
> Hi,
> 
> On Wed, 15 Feb 2006, Andreas Ericsson wrote:
> 
> 
>>What's the point of having another protected name. Just allow
>>
>>	$ git checkout -b discard HEAD~15
>>
>>and we're good to go.
> 
> 
> Last time I checked (2 hours ago) it did exactly what you want it to.
> 

Heh. You're right. :) Didn't grok that from the man-page. I'm so used to 
seeing <commit-ish> everywhere that when it says "<branch> can be any 
object that refers to a commit" I get confused.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: [ANNOUNCE] pg - A patch porcelain for GIT
From: Sam Vilain @ 2006-02-15  0:22 UTC
  To: Petr Baudis; +Cc: Chuck Lever, Karl Hasselström, Catalin Marinas, git
In-Reply-To: <20060214222913.GK31278@pasky.or.cz>

Petr Baudis wrote:
>>my impression of git is that you don't change stuff that's already 
>>committed.  you revert changes by applying a new commit that backs out 
>>the original changes.  i'm speculating, but i suspect that's why there's 
>>a "stg pick --reverse" and not a "stg uncommit."
> It is ok as long as you know what are you doing - if you don't push out
> the commits you've just "undid" (or work on a public accessible
> repository in the first place, but I think that's kind of rare these
> days; quick survey - does anyone reading these lines do that?), there's
> nothing wrong on it, and it gives you nice flexibility.

Yes, and this is one problem I envision with publishing a git repository
with an stgit stack applied - somebody later doing a pull of it will not
find the head revision they had.  I'm not sure what the net effect of
this will be, though.

Sam.


* Re: several quick questions
From: Junio C Hamano @ 2006-02-15  0:34 UTC
  To: Andreas Ericsson; +Cc: Johannes Schindelin, Josef Weidendorfer, git
In-Reply-To: <43F270F5.7070804@op5.se>

Andreas Ericsson <ae@op5.se> writes:

> Heh. You're right. :) Didn't grok that from the man-page. I'm so used
> to seeing <commit-ish> everywhere that when it says "<branch> can be
> any object that refers to a commit" I get confused.

You are right; the documentation is wrong.  It says

'git-checkout' [-f] [-b <new_branch>] [-m] [<branch>] [<paths>...]

It should have said something like:

git-checkout [ -f | -m ] <branch>    		or
git-checkout [-b <new_branch>] <committish>	or
git-checkout [<committish> | -- ] <paths>...

The first form switches to a branch (with a flag to say what
to do when a conflict could lose local modifications); the second
form creates a new branch from a committish and switches to
it; and the third does not switch branches but just checks
out the named paths from the index or an arbitrary committish (I
think any ent should do but I have not verified it).
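
As a rough illustration of the three forms, here is a sketch that
assumes a current git, where the command is spelled "git checkout",
and runs in a throw-away repository. The file name, the branch name
"discard" and the commit identity are invented for the example.

```shell
#!/bin/sh
# Sketch in a temporary repository; nothing here touches an existing one.
repo=$(mktemp -d) || exit 1
trap 'rm -rf "$repo"' EXIT
cd "$repo" || exit 1
git init -q .

# Helper (made up for this sketch) that commits with a fixed identity.
commit() { git -c user.name=Example -c user.email=ex@example.com commit -q -m "$1"; }

echo one > file && git add file && commit first
echo two > file && git add file && commit second
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Second form: create a new branch from a committish and switch to it.
git checkout -q -b discard HEAD~1
on_discard=$(cat file)

# First form: switch to an existing branch.
git checkout -q "$main"
on_main=$(cat file)

# Third form: stay on the current branch, take one path from a committish.
git checkout discard -- file
picked=$(cat file)

echo "$on_discard $on_main $picked"
```

The third form leaves HEAD where it was; only the named path changes
in the index and the working tree.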


* Re: [ANNOUNCE] pg - A patch porcelain for GIT
From: Shawn Pearce @ 2006-02-15  0:35 UTC
  To: Sam Vilain
  Cc: Petr Baudis, Chuck Lever, Karl Hasselström, Catalin Marinas,
	git
In-Reply-To: <43F2745D.4010800@vilain.net>

Sam Vilain <sam@vilain.net> wrote:
> Petr Baudis wrote:
> >>my impression of git is that you don't change stuff that's already 
> >>committed.  you revert changes by applying a new commit that backs out 
> >>the original changes.  i'm speculating, but i suspect that's why there's 
> >>a "stg pick --reverse" and not a "stg uncommit."
> >It is ok as long as you know what are you doing - if you don't push out
> >the commits you've just "undid" (or work on a public accessible
> >repository in the first place, but I think that's kind of rare these
> >days; quick survey - does anyone reading these lines do that?), there's
> >nothing wrong on it, and it gives you nice flexibility.
> 
> Yes, and this is one problem I envision with publishing a git repository
> with an stgit stack applied - somebody later doing a pull of it will not
> find the head revision they had.  I'm not sure what the net effect of
> this will be, though.

It would cause some pain for anyone pulling from it with git-pull, as
git-pull won't happily go backwards from what I've seen. But I think
you can force it to do so even if it won't make sense during the
resulting merge, which then leaves the user in an interesting state.

This is actually why pg-rebase doesn't care what you move to
when you grab the remote's commit; it just jumps to that commit
and pushes your patch stack back down onto it.  So if the remote
rebuilds itself through a new commit lineage which you have never
seen before the next pg-rebase will still update to it.  But on
the other hand if you have a commit that isn't in your local patch
stack, it's gone into the bit bucket.

Publishing a repository with a stg (or pg) patch series isn't
a problem; the problem is that no clients currently know how to
follow along with the remote repository's patch series.  And I can't
think of a sensible behavior for doing so that isn't what git-core is
already doing today for non patch series type clients (as in don't go
backwards by popping but instead by pushing a negative delta).  :-)

-- 
Shawn.


* Re: Handling large files with GIT
From: Sam Vilain @ 2006-02-15  0:40 UTC
  To: Junio C Hamano, Martin Langhoff; +Cc: git, Linus Torvalds
In-Reply-To: <7vy80dpo9g.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
> 
>>If somebody is interested in making the "lots of filename changes" case go 
>>fast, I'd be more than happy to walk them through what they'd need to 
>>change. I'm just not horribly motivated to do it myself. Hint, hint.
> 
> In case anybody is wondering, I share the same feeling.  I
> cannot say I'd be "more than happy to" clean up potential
> breakages during the development of such changes, but if the
> change eventually would help certain use cases, I can be
> persuaded to help debugging such a mess ;-).

Excellent.  Any speculations on where they might fit?  Clearly, it needs
to be out of the "tree".

Dealing with the three cases I mentioned before in my Warnocked post:

   1. caching - I'll consider this an "under the hood" thing, it really
                doesn't matter, so long as the tools all know.

   2. forensic - extra stuff at the end of the commit object?

      eg
         Copied: /new/path from /old/path:commit:c0bb171d..
           (for SVN case where history matters)
         Copied: /new/path from blob:b10b1d..
           (for general pre-caching case)
         Merged: /new/path from /old/path:commit:C0bb171d..
           (for an SVK clone, so we know that subsequent merges on
            /new/path need only merge from /old/path starting at commit
            C0bb171d..)

   3. retrospective - as above, but allow to specify old versions.

      eg
         Copied: /new/path:C0bb171d1 from /old/path:commit:c0bb171d2...
           (for SVN case where history matters)

Martin, is that enough for your CVS case?

Sam.


* Cogito turbo-introduction
From: Petr Baudis @ 2006-02-15  1:12 UTC
  To: Keith Packard; +Cc: git
In-Reply-To: <1139963183.4341.117.camel@evo.keithp.com>

I suppose this might be interesting for others (and Google) as well, so
I'm bringing it back to the mailing list...

Dear diary, on Wed, Feb 15, 2006 at 01:26:23AM CET, I got a letter
where Keith Packard <keithp@keithp.com> said that...
> I will see about learning enough cogito to point appropriate people at
> it in place of full-on git exposure.

Actually, everything should be really trivial, since Cogito is very
similar to CVS or SVN in practice. A turbo introduction to Cogito, which
actually proved to be enough to get started fast with the regular work,
was:

	* cg-clone URL to get the stuff
	* cg-commit just like in CVS (but cooler)
	* cg-update just like in CVS (or rather SVN)
	* cg-status to get the status letters produced by cvs update
	  (just like in SVN)
	* cg-diff, cg-log, cg-add, cg-rm just like in CVS (but cooler)

	* After committing for a while, you need to run cg-push
	* Merge commits are perfectly ok, don't mind them; no really,
	  you will get used to them; in reality, they are cool
	* If you hit conflicts during merge, the software will tell you
	  how to proceed
	* If you need something different / more advanced, look it up
	  in the cg-help list (there are actually significantly fewer Cogito
	  commands to go through than in CVS, yet in most areas Cogito
	  is much more powerful)
	* If you are confused about the distributed concept or want to
	  learn about branching, try Cogito README

I've been trying to design Cogito's UI pretty carefully to really
absolutely minimize the learning curve from CVS/SVN, while also making
it consistent on its own so that people who learn it as their first VCS
will actually get something nice. Well, the users shall judge. ;-)
(At this stage I would probably design a few bits of the UI slightly
differently than how they have evolved, but the gripes are pretty
minor.)

In this sense, the good UI goal has indeed higher priority than the
powerfulness goal, but most of the time we hopefully manage to make it
go together well. The significant areas where Cogito is fundamentally
less powerful than GIT itself are:

	* No git-whatchanged -p - this is a huge deficiency, and I'm
	  entirely at fault here
	* Consequently, no pickaxe and rename detection - same
	* Recursive merge strategy - not much of a UI problem, it just
	  needs the time and work to get integrated
	* Remote branches handling - Cogito's handling is strictly 1:1
	  while GIT's remotes are much more powerful and allow you to
	  fetch/push many branches at once (and in fact do so by
	  default); I did not invent a good UI for something similarly
	  powerful yet, and it is no high priority for me so far;
	  I think you actually want 1:1 in by far the most common usage
	  pattern
	* No email interface - but you can trivially just fall back to
	  GIT in this area

Note that Cogito's goal is not to reproduce and wrap all GIT commands -
e.g. I have currently no plans to wrap up git-bisect.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams


* Re: [ANNOUNCE] pg - A patch porcelain for GIT
From: Petr Baudis @ 2006-02-15  1:14 UTC
  To: Sam Vilain, Chuck Lever, Karl Hasselström, Catalin Marinas,
	git
In-Reply-To: <20060215003510.GA25715@spearce.org>

Dear diary, on Wed, Feb 15, 2006 at 01:35:10AM CET, I got a letter
where Shawn Pearce <spearce@spearce.org> said that...
> Publishing a repository with a stg (or pg) patch series isn't
> a problem; the problem is that no clients currently know how to
> follow along with the remote repository's patch series.  And I can't
> think of a sensible behavior for doing so that isn't what git-core is
> already doing today for non patch series type clients (as in don't go
> backwards by popping but instead by pushing a negative delta).  :-)

New Cogito will automagically do the right thing if you are just
fast-forwarding and you are using cg-update - if the branch rebased, it
will happily follow (but cg-fetch + cg-merge will NOT and it will fall
back to the tree merge).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams


* Re: Cogito turbo-introduction
From: Petr Baudis @ 2006-02-15  1:32 UTC
  To: Keith Packard; +Cc: git
In-Reply-To: <20060215011210.GG30316@pasky.or.cz>

Dear diary, on Wed, Feb 15, 2006 at 02:12:11AM CET, I got a letter
where Petr Baudis <pasky@ucw.cz> said that...
> In this sense, the good UI goal has indeed higher priority than the
> powerfulness goal, but most of the time we hopefully manage to make it
> go together well. The significant areas where Cogito is fundamentally
> less powerful than GIT itself are:
> 
> 	* No git-whatchanged -p - this is huge deficiency, and I'm
> 	  entirely at fault here
> 	* Consequently, no pickaxe and renames detection - same
> 	* Recursive merge strategy - not much of a UI problem, it just
> 	  needs the time and work to get integrated
> 	* Remote branches handling - Cogito's handling is strictly 1:1
> 	  while GIT's remotes are much more powerful and allow you to
> 	  fetch/push many branches at once (and in fact do so by
> 	  default); I did not invent a good UI for something similarly
> 	  powerful yet, and it is no high priority for me so far;
> 	  I think you actually want 1:1 in by far the most common usage
> 	  pattern

(And you can pretty easily script/alias multi-branch fetches, it just
won't be as super-efficient as if you would fetch/push at once.)


That is not to say, though, that Cogito is a strict subset of GIT!
Cogito has many cool things GIT doesn't have. ;-) To pick some random
examples:

	* cg-clean
	* cg-commit - packed with convenience stuff, from multiple -m's
	  to --review
	* cg-init's initial commit
	* resumable cg-clone
	* cg-fetch's cute progressbars ;-)
	* things which are rather thin wrappers automating a sequence of
	  a few commands, yet providing much convenience
	  (cg-admin-setuprepo, cg-admin-uncommit, cg-seek, cg-export...)

(Perhaps GIT can already do some of this; I don't watch the core
porcelain that closely.)

> Note that Cogito's goal is not to reproduce and wrap all GIT commands -
> e.g. I have currently no plans to wrap up git-bisect.

To explain, that's not because I consider git-bisect to be bad, but the
very opposite - because it is so cool and I couldn't really add much
value by wrapping it in Cogito. Once Cogito has good tutorial
documentation (I think it already has _very_ good reference
documentation - please prove me wrong so that we can improve it
further), I will just happily refer people to those GIT core
commands.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams


* Re: Handling large files with GIT
From: Junio C Hamano @ 2006-02-15  1:39 UTC
  To: Sam Vilain; +Cc: git
In-Reply-To: <43F27878.50701@vilain.net>

Sam Vilain <sam@vilain.net> writes:

> ...  Clearly, it needs to be out of the "tree".

OK.

>   2. forensic - extra stuff at the end of the commit object?

(except "extra at the end of commit", which does not make it out
of the tree).

>      eg
>         Copied: /new/path from /old/path:commit:c0bb171d..
>           (for SVN case where history matters)
>         Copied: /new/path from blob:b10b1d..
>           (for general pre-caching case)
>         Merged: /new/path from /old/path:commit:C0bb171d..
>           (for an SVK clone, so we know that subsequent merges on
>            /new/path need only merge from /old/path starting at commit
>            C0bb171d..)

I am not sure if recording the bare SVN ``copied'' is very
useful.  You would need to infer things from what SVN did to
tell if the copy is a tree copy inside a project (e.g. cp -r
i386 x86_64), tagging (e.g. svn-cp rHEAD trunk tags/v1.2), or
branching, wouldn't you?  SVK merge ticket is a bit more useful
in that sense.

So far, git philosophy is to record things you _know_ about and
defer such guesswork to the future, so limiting what you record
to what you can actually see from the foreign SCM would be more
in line with it.  For the same reason, if you are talking about
maildir managed under git, you should not have to record anything
other than what git already records: "we used to have these
files, now we have these instead".

But I thought you were talking about caching what earlier
inference declared what happened, so that you do not have to do
the same inference every time.  If that is the case, SVN level
"Copied:" is probably not what you would want to record, I
suspect.  You would do some inference with the given information
("SVN says it copied this tree to that tree, what was it that it
really wanted to do?  Was it a copy, or was it to create a
branch which was implemented as a copy?"), and record that,
hoping that information would help your other operations this
time and later.

So I think the order of questions you should be asking is:

   - what operations are you trying to help?

   - what information would you need to achieve those operations
     better?

   - among the second one, what will be necessary to be set in
     stone (IOW, cannot be computed later), and what are
     computable but expensive to recompute every time?

An example from an ancient thread.

With criss-cross merge between renamed trees, it was conjectured
that recording renames detected earlier would help later merges.
I think you should arrive at the list of "what we should record"
by thinking things in this order:

 (1) currently criss-cross merge between renamed trees does not
     work well (realization of the status quo);

 (2) if we had this kind of information it would work better,
     here are the things we need to record when a new commit is
     made, and here is how to compute other information that can
     be inferred, and here is how to use that information to
     make the merge work better (solution without caching);

 (3) but it is expensive to recompute information we said
     computable in (2) if we were to do so every time.  Let's
     cache it.

I am getting an impression that you are doing only the first
half of (2) without other parts, which somewhat bothers me.


* Re: Handling large files with GIT
From: Linus Torvalds @ 2006-02-15  2:05 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vy80dpo9g.fsf@assigned-by-dhcp.cox.net>



On Tue, 14 Feb 2006, Junio C Hamano wrote:

> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > If somebody is interested in making the "lots of filename changes" case go 
> > fast, I'd be more than happy to walk them through what they'd need to 
> > change. I'm just not horribly motivated to do it myself. Hint, hint.
> 
> In case anybody is wondering, I share the same feeling.  I
> cannot say I'd be "more than happy to" clean up potential
> breakages during the development of such changes, but if the
> change eventually would help certain use cases, I can be
> persuaded to help debugging such a mess ;-).

Actually, I got interested in seeing how hard this is, and wrote a simple 
first cut at doing a tree-optimized merger.

Let me shout a bit first:

  THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION 
  RATHER THAN THE FINAL PRODUCT!

With that out of the way, let me describe what this does (and then describe 
the missing parts).

This is basically a three-way merge that works entirely on the "tree" 
level, rather than on the index. A lot of the _concepts_ are the same, 
though, and if you're familiar with the results of an index merge, some of 
the output will make more sense.

You give it three trees: the base tree (tree 0), and the two branches to 
be merged (tree 1 and tree 2 respectively). It will then walk these three 
trees, and resolve them as it goes along.

The interesting part is:
 - it can resolve whole sub-directories in one go, without actually even 
   looking recursively at them. A whole subdirectory will resolve the same 
   way as any individual files will (although that may need some 
   modification, see later).
 - if it has a "content conflict", for subdirectories that means "try to 
   do a recursive tree merge", while for non-subdirectories it's just a 
   content conflict and we'll output the stage 1/2/3 information.
 - a successful merge will output a single stage 0 ("merged") entry, 
   potentially for a whole subdirectory.
 - it outputs all the resolve information on stdout, so something like the 
   recursive resolver can pretty easily parse it all.
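
To make the per-entry decision concrete, here is a minimal sketch of
the textbook three-way rule. It is an illustration under assumptions,
not the logic in merge-tree.c: each entry is reduced to an opaque string
standing for mode plus SHA1, all three entries are assumed to be
present, and the function name "resolve" is made up. As the caveats in
this message point out, applying the "only one side changed" cases to a
whole subdirectory may not be safe.

```shell
#!/bin/sh
# resolve BASE BRANCH1 BRANCH2
# Prints "merged <entry>" for a trivial resolve, or "conflict ..." when
# both branches changed the entry in different ways.
resolve() {
	base=$1 ours=$2 theirs=$3
	if [ "$ours" = "$theirs" ]; then
		echo "merged $ours"          # both branches agree
	elif [ "$base" = "$ours" ]; then
		echo "merged $theirs"        # only branch 2 changed it
	elif [ "$base" = "$theirs" ]; then
		echo "merged $ours"          # only branch 1 changed it
	else
		echo "conflict $base $ours $theirs"
	fi
}

resolve A A B   # branch 1 untouched
resolve A B A   # branch 2 untouched
resolve A B B   # same change on both sides
resolve A B C   # real conflict
```

For a directory entry, the conflict branch is where a tree-level merger
would descend and repeat the same comparison one level down.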

Now, the caveats:
 - we probably need to be more careful about subdirectory resolves. The 
   trivial case (both branches have the exact same subdirectory) is a 
   trivial resolve, but the other cases ("branch1 matches base, branch2 is 
   different" probably can't be silently just resolved to the "branch2" 
   subdirectory state, since it might involve renames into - or out of - 
   that subdirectory)
 - we do not track the current index file at all, so this does not do the 
   "check that index matches branch1" logic that the three-way merge in 
   git-read-tree does. The theory is that we'd do a full three-way merge 
   (ignoring the index and working directory), and then to update the 
   working tree, we'd do a two-way "git-read-tree branch1->result"
 - I didn't actually make it do all the trivial resolve cases that 
   git-read-tree does. It's a technology demonstration.

Finally (a more serious caveat):
 - doing things through stdout may end up being so expensive that we'd 
   need to do something else. In particular, it's likely that I should 
   not actually output the "merge results", but instead output a "merge 
   results as they _differ_ from branch1"

However, I think this patch is already interesting enough that people who 
are interested in merging trees might want to look at it. Please keep in 
mind that tech _demo_ part, and in particular, keep in mind the final 
"serious caveat" part.

In many ways, the really _interesting_ part of a merge is not the result, 
but how it _changes_ the branch we're merging into. That's particularly 
important as it should hopefully also mean that the output size for any 
reasonable case is minimal (and tracks what we actually need to do to the 
current state to create the final result).

The code very much is organized so that doing the result as a "diff 
against branch1" should be quite easy/possible. I was actually going to do 
it, but I decided that it probably makes the output harder to read. I 
dunno.

Anyway, let's think about this kind of approach.. Note how the code itself 
is actually quite small and short, although it's probably pretty "dense".

As an interesting test-case, I'd suggest this merge in the kernel:

	git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc

which resolves beautifully (there are no actual file-level conflicts), and 
you can look at the output of that command to start thinking about what 
it does.

The interesting part (perhaps) is that timing that command for me shows 
that it takes all of 0.004 seconds.. (the git-merge-base thing takes 
considerably more ;)

The point is, we _can_ do the actual merge part really really quickly. 

		Linus

PS. Final note: when I say that it is "WORKING CODE", that is obviously by 
my standards. IOW, I tested it once and it gave reasonable results - so it 
must be perfect.

Whether it works for anybody else, or indeed for any other test-case, is 
not my problem ;)

---
diff-tree f0e6b454ff873429237322c846603d2e1fffc867 (from 6a9b87972f27edfe53da4ce016adf4c0cd42f5e6)
Author: Linus Torvalds <torvalds@osdl.org>
Date:   Tue Feb 14 17:39:15 2006 -0800

    Add "git-merge-tree" functionality
    
    This is basically a tree-optimized merge.  Or rather, it is the first
    stages _towards_ such a merge.
    
    Given a base tree and two branches to merge, it will do a trivial merge,
    optimizing away the case of identical subdirectories, and resolving
    trivial merges.  It outputs the list of file/directory resolves.
    
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff --git a/Makefile b/Makefile
index d40aa6a..4d04f49 100644
--- a/Makefile
+++ b/Makefile
@@ -151,7 +151,7 @@ PROGRAMS = \
 	git-upload-pack$X git-verify-pack$X git-write-tree$X \
 	git-update-ref$X git-symbolic-ref$X git-check-ref-format$X \
 	git-name-rev$X git-pack-redundant$X git-repo-config$X git-var$X \
-	git-describe$X
+	git-describe$X git-merge-tree$X
 
 # what 'all' will build and 'install' will install, in gitexecdir
 ALL_PROGRAMS = $(PROGRAMS) $(SIMPLE_PROGRAMS) $(SCRIPTS)
diff --git a/merge-tree.c b/merge-tree.c
new file mode 100644
index 0000000..0d6d434
--- /dev/null
+++ b/merge-tree.c
@@ -0,0 +1,238 @@
+#include "cache.h"
+#include "diff.h"
+
+static const char merge_tree_usage[] = "git-merge-tree <base-tree> <branch1> <branch2>";
+static int resolve_directories = 1;
+
+static void merge_trees(struct tree_desc t[3], const char *base);
+
+static void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1)
+{
+	unsigned long size = 0;
+	void *buf = NULL;
+
+	if (sha1) {
+		buf = read_object_with_reference(sha1, "tree", &size, NULL);
+		if (!buf)
+			die("unable to read tree %s", sha1_to_hex(sha1));
+	}
+	desc->size = size;
+	desc->buf = buf;
+	return buf;
+}
+
+struct name_entry {
+	const unsigned char *sha1;
+	const char *path;
+	unsigned int mode;
+	int pathlen;
+};
+
+static void entry_clear(struct name_entry *a)
+{
+	memset(a, 0, sizeof(*a));
+}
+
+static int entry_compare(struct name_entry *a, struct name_entry *b)
+{
+	return base_name_compare(
+			a->path, a->pathlen, a->mode,
+			b->path, b->pathlen, b->mode);
+}
+
+static void entry_extract(struct tree_desc *t, struct name_entry *a)
+{
+	a->sha1 = tree_entry_extract(t, &a->path, &a->mode);
+	a->pathlen = strlen(a->path);
+}
+
+/* An empty entry never compares same, not even to another empty entry */
+static int same_entry(struct name_entry *a, struct name_entry *b)
+{
+	return	a->sha1 &&
+		b->sha1 &&
+		!memcmp(a->sha1, b->sha1, 20) &&
+		a->mode == b->mode;
+}
+
+static void resolve(const char *base, struct name_entry *result)
+{
+	printf("0 %06o %s %s%s\n", result->mode, sha1_to_hex(result->sha1), base, result->path);
+}
+
+static int unresolved_directory(const char *base, struct name_entry n[3])
+{
+	int baselen;
+	char *newbase;
+	struct name_entry *p;
+	struct tree_desc t[3];
+	void *buf0, *buf1, *buf2;
+
+	if (!resolve_directories)
+		return 0;
+	p = n;
+	if (!p->mode) {
+		p++;
+		if (!p->mode)
+			p++;
+	}
+	if (!S_ISDIR(p->mode))
+		return 0;
+	baselen = strlen(base);
+	newbase = xmalloc(baselen + p->pathlen + 2);
+	memcpy(newbase, base, baselen);
+	memcpy(newbase + baselen, p->path, p->pathlen);
+	memcpy(newbase + baselen + p->pathlen, "/", 2);
+
+	buf0 = fill_tree_descriptor(t+0, n[0].sha1);
+	buf1 = fill_tree_descriptor(t+1, n[1].sha1);
+	buf2 = fill_tree_descriptor(t+2, n[2].sha1);
+	merge_trees(t, newbase);
+
+	free(buf0);
+	free(buf1);
+	free(buf2);
+	free(newbase);
+	return 1;
+}
+
+static void unresolved(const char *base, struct name_entry n[3])
+{
+	if (unresolved_directory(base, n))
+		return;
+	printf("1 %06o %s %s%s\n", n[0].mode, sha1_to_hex(n[0].sha1), base, n[0].path);
+	printf("2 %06o %s %s%s\n", n[1].mode, sha1_to_hex(n[1].sha1), base, n[1].path);
+	printf("3 %06o %s %s%s\n", n[2].mode, sha1_to_hex(n[2].sha1), base, n[2].path);
+}
+
+/*
+ * Merge two trees together (t[1] and t[2]), using a common base (t[0])
+ * as the origin.
+ *
+ * This walks the (sorted) trees in lock-step, checking every possible
+ * name. Note that directories automatically sort differently from other
+ * files (see "base_name_compare"), so you'll never see file/directory
+ * conflicts, because they won't ever compare the same.
+ *
+ * IOW, if a directory changes to a filename, it will automatically be
+ * seen as the directory going away, and the filename being created.
+ *
+ * Think of this as a three-way diff.
+ *
+ * The output will be either:
+ *  - successful merge
+ *	 "0 mode sha1 filename"
+ *    NOTE NOTE NOTE! FIXME! We really really need to walk the index
+ *    in parallel with this too!
+ * 
+ *  - conflict:
+ *	"1 mode sha1 filename"
+ *	"2 mode sha1 filename"
+ *	"3 mode sha1 filename"
+ *    where not all of the 1/2/3 lines may exist, of course.
+ *
+ * The successful merge rules are the same as for the three-way merge
+ * in git-read-tree.
+ */
+static void merge_trees(struct tree_desc t[3], const char *base)
+{
+	for (;;) {
+		struct name_entry entry[3];
+		unsigned int mask = 0;
+		int i, last;
+
+		last = -1;
+		for (i = 0; i < 3; i++) {
+			if (!t[i].size)
+				continue;
+			entry_extract(t+i, entry+i);
+			if (last >= 0) {
+				int cmp = entry_compare(entry+i, entry+last);
+
+				/*
+				 * Is the new name bigger than the old one?
+				 * Ignore it
+				 */
+				if (cmp > 0)
+					continue;
+				/*
+				 * Is the new name smaller than the old one?
+				 * Ignore all old ones
+				 */
+				if (cmp < 0)
+					mask = 0;
+			}
+			mask |= 1u << i;
+			last = i;
+		}
+		if (!mask)
+			break;
+
+		/*
+		 * Update the tree entries we've walked, and clear
+		 * all the unused name-entries.
+		 */
+		for (i = 0; i < 3; i++) {
+			if (mask & (1u << i)) {
+				update_tree_entry(t+i);
+				continue;
+			}
+			entry_clear(entry + i);
+		}
+
+		/* Same in both? */
+		if (same_entry(entry+1, entry+2)) {
+			if (entry[0].sha1) {
+				resolve(base, entry+1);
+				continue;
+			}
+		}
+
+		if (same_entry(entry+0, entry+1)) {
+			if (entry[2].sha1) {
+				resolve(base, entry+2);
+				continue;
+			}
+		}
+
+		if (same_entry(entry+0, entry+2)) {
+			if (entry[1].sha1) {
+				resolve(base, entry+1);
+				continue;
+			}
+		}
+
+		unresolved(base, entry);
+	}
+}
+
+static void *get_tree_descriptor(struct tree_desc *desc, const char *rev)
+{
+	unsigned char sha1[20];
+	void *buf;
+
+	if (get_sha1(rev, sha1) < 0)
+		die("unknown rev %s", rev);
+	buf = fill_tree_descriptor(desc, sha1);
+	if (!buf)
+		die("%s is not a tree", rev);
+	return buf;
+}
+
+int main(int argc, char **argv)
+{
+	struct tree_desc t[3];
+	void *buf1, *buf2, *buf3;
+
+	if (argc < 4)
+		usage(merge_tree_usage);
+
+	buf1 = get_tree_descriptor(t+0, argv[1]);
+	buf2 = get_tree_descriptor(t+1, argv[2]);
+	buf3 = get_tree_descriptor(t+2, argv[3]);
+	merge_trees(t, "");
+	free(buf1);
+	free(buf2);
+	free(buf3);
+	return 0;
+}

^ permalink raw reply related
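
The lock-step walk described in the merge_trees() comment can be tried in isolation. What follows is only a minimal sketch, not the patch's code: it assumes plain sorted string arrays in place of tree_desc buffers and strcmp() in place of base_name_compare(), and the names "struct cursor", "next_mask" and "advance" are made up for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a tree_desc: a sorted array of names. */
struct cursor {
	const char **names;
	size_t count;
	size_t pos;
};

/*
 * Return a bitmask of the cursors that currently point at the
 * smallest name, or 0 once all three are exhausted.  This mirrors
 * the "mask" computation in merge_trees().
 */
static unsigned int next_mask(struct cursor c[3])
{
	unsigned int mask = 0;
	int i, last = -1;

	for (i = 0; i < 3; i++) {
		if (c[i].pos >= c[i].count)
			continue;
		if (last >= 0) {
			int cmp = strcmp(c[i].names[c[i].pos],
					 c[last].names[c[last].pos]);
			if (cmp > 0)
				continue;	/* bigger name: not its turn yet */
			if (cmp < 0)
				mask = 0;	/* smaller name: drop the earlier ones */
		}
		mask |= 1u << i;
		last = i;
	}
	return mask;
}

/* Advance every cursor selected by the mask. */
static void advance(struct cursor c[3], unsigned int mask)
{
	int i;

	for (i = 0; i < 3; i++)
		if (mask & (1u << i))
			c[i].pos++;
}
```

Bit 0 of the mask stands for the base, bit 1 for branch1 and bit 2 for branch2, so a name that exists only in the two branches yields a mask of 6. A caller would loop on next_mask() until it returns 0, inspect the selected entries, and then call advance().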

* Re: Handling large files with GIT
From: Martin Langhoff @ 2006-02-15  2:07 UTC (permalink / raw)
  To: Sam Vilain; +Cc: Junio C Hamano, git, Linus Torvalds
In-Reply-To: <43F27878.50701@vilain.net>

On 2/15/06, Sam Vilain <sam@vilain.net> wrote:
> Excellent.  Any speculations on where they might fit?  Clearly, it needs
> to be out of the "tree".

I think Junio & Linus are talking about alternative mergers, something
that can be called instead of git-read-tree -m (which is the way
merges seem to kick off). Or perhaps an additional flag to
git-read-tree to be used in conjunction with -m, something like
--optimize-for-identity that lets git-read-tree know to do a first
pass keying things on file identity rather than file path.
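
To make "keying things on file identity rather than file path" concrete, here is a rough sketch. It is purely illustrative: --optimize-for-identity is only a suggestion in this thread, and "struct flat_entry" and "find_moved" are hypothetical names, not anything that exists in git. The linear scan is there for brevity; a real first pass over a very large tree would presumably use a hash table keyed on the object name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened tree entry: a path plus a 20-byte object name. */
struct flat_entry {
	const char *path;
	unsigned char sha1[20];
};

/*
 * Look in "list" for an entry with the same content as "e" but a
 * different path, i.e. a candidate for "e was moved there".
 * Returns its index, or -1 if there is no such entry.
 */
static int find_moved(const struct flat_entry *list, size_t n,
		      const struct flat_entry *e)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (memcmp(list[i].sha1, e->sha1, 20))
			continue;	/* different content */
		if (!strcmp(list[i].path, e->path))
			continue;	/* same path: not a move */
		return (int)i;
	}
	return -1;
}
```

Nothing here touches the object database; it only compares object names that the trees already record.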

So we are _not_ touching the object database, at all. Only optimising
merges for very large trees where mv is a popular operation. All the
cases you discuss can be tackled very efficiently without making *any*
change to the object database.

> Martin, is that enough for your CVS case?

Oh, I don't need it at all. It's just that there's been some lazy talk
of tracking mboxes and maildirs with git, and look where it's led.
Blame Roland Stigge who got me started down this track.

I'm sure it's because the other optimisations are a lot harder to
tackle ;-) though Linus mentions that it'd be trivial for
git-read-tree -m to detect unchanged directories and perhaps do things
a bit faster. Not as revolutionary as an --optimize-for-identity but
not as risky either.

In any case, don't count me in for any of this git-checkout hacking. I
know better than to start learning C by posting patches to *this* list.

cheers,


martin

^ permalink raw reply

* Re: Handling large files with GIT
From: Linus Torvalds @ 2006-02-15  2:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0602141741210.3691@g5.osdl.org>



On Tue, 14 Feb 2006, Linus Torvalds wrote:
> 
> Finally (a more serious caveat):
>  - doing things through stdout may end up being so expensive that we'd 
>    need to do something else. In particular, it's likely that I should 
>    not actually output the "merge results", but instead output a "merge 
>    results as they _differ_ from branch1"
> 
> In many ways, the really _interesting_ part of a merge is not the result, 
> but how it _changes_ the branch we're merging into. That's particularly 
> important as it should hopefully also mean that the output size for any 
> reasonable case is minimal (and tracks what we actually need to do to the 
> current state to create the final result).

Here, btw, is the trivial diff to turn my previous "tree-resolve" into a 
"resolve tree relative to the current branch".

In particular, it makes the example merge perhaps even more interesting, 
and makes the point that "merging directories and merging files should use 
different heuristics" more obvious. It's quite instructive, I think.

So if you want to test this, the merge I have been testing with is the 
last infiniband merge in the kernel:

	git-merge-tree 3c3b809 4cbf876 7d2babc

and you'll need to spend a few moments on thinking about what the 
"directory merge" thing there means: in particular, we should probably 
make the

	if (entry[2].sha1) {

test be

	if (entry[2].sha1 && !S_ISDIR(entry[2].mode)) {

(and same for "resolve to entry[1]" case for that matter) so that we never 
create a "resolve()" that picks a whole subdirectory from one of the 
branches.

The current logic is "logical", just probably not what we want.
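
The resolution rules with the suggested S_ISDIR() guard can be sketched as a standalone function. This is a simplified illustration under stated assumptions, not the code from the patch: "struct entry", "takeable" and "trivial_resolve" are made-up names, and the return value (1 or 2 for the branch whose entry wins, 0 for "unresolved") is chosen just for the example.

```c
#include <string.h>
#include <sys/stat.h>

/* Hypothetical simplified entry; sha1 == NULL means "not present". */
struct entry {
	const unsigned char *sha1;
	unsigned int mode;
};

/* An absent entry never compares the same, not even to another absent one. */
static int same(const struct entry *a, const struct entry *b)
{
	return a->sha1 && b->sha1 &&
	       !memcmp(a->sha1, b->sha1, 20) &&
	       a->mode == b->mode;
}

/* An entry may be picked wholesale only if it exists and is not a directory. */
static int takeable(const struct entry *e)
{
	return e->sha1 && !S_ISDIR(e->mode);
}

/*
 * e[0] is the base, e[1] is branch1, e[2] is branch2.
 * Returns 1 or 2 when that branch's entry is the merge result, or 0
 * when the entry is left unresolved (a conflict, or a directory that
 * has to be merged recursively).
 */
static int trivial_resolve(const struct entry e[3])
{
	if (same(&e[1], &e[2]) && e[0].sha1)
		return 1;	/* both branches agree */
	if (same(&e[0], &e[1]) && takeable(&e[2]))
		return 2;	/* only branch2 changed it */
	if (same(&e[0], &e[2]) && takeable(&e[1]))
		return 1;	/* only branch1 changed it */
	return 0;
}
```

With the guard in place, a subdirectory that changed in only one branch falls through to the unresolved path and gets walked entry by entry instead of being taken as a whole.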

		Linus

----
diff --git a/merge-tree.c b/merge-tree.c
index 0d6d434..0bf871c 100644
--- a/merge-tree.c
+++ b/merge-tree.c
@@ -55,9 +55,19 @@ static int same_entry(struct name_entry 
 		a->mode == b->mode;
 }
 
-static void resolve(const char *base, struct name_entry *result)
+static void resolve(const char *base, struct name_entry *branch1, struct name_entry *result)
 {
-	printf("0 %06o %s %s%s\n", result->mode, sha1_to_hex(result->sha1), base, result->path);
+	char branch1_sha1[50];
+
+	/* If it's already branch1, don't bother showing it */
+	if (!branch1)
+		return;
+	memcpy(branch1_sha1, sha1_to_hex(branch1->sha1), 41);
+
+	printf("0 %06o->%06o %s->%s %s%s\n",
+		branch1->mode, result->mode,
+		branch1_sha1, sha1_to_hex(result->sha1),
+		base, result->path);
 }
 
 static int unresolved_directory(const char *base, struct name_entry n[3])
@@ -183,21 +193,21 @@ static void merge_trees(struct tree_desc
 		/* Same in both? */
 		if (same_entry(entry+1, entry+2)) {
 			if (entry[0].sha1) {
-				resolve(base, entry+1);
+				resolve(base, NULL, entry+1);
 				continue;
 			}
 		}
 
 		if (same_entry(entry+0, entry+1)) {
 			if (entry[2].sha1) {
-				resolve(base, entry+2);
+				resolve(base, entry+1, entry+2);
 				continue;
 			}
 		}
 
 		if (same_entry(entry+0, entry+2)) {
 			if (entry[1].sha1) {
-				resolve(base, entry+1);
+				resolve(base, NULL, entry+1);
 				continue;
 			}
 		}

^ permalink raw reply related
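
A note on the branch1_sha1 copy in resolve() above: it presumably exists because sha1_to_hex() hands back a pointer into a static buffer, so two calls in one printf() argument list would print the same string twice. The thread does not say so explicitly, so treat that as an assumption. The sketch below shows the hazard and the workaround with a made-up helper, "to_hex_static", standing in for sha1_to_hex(); "format_change" is likewise hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for sha1_to_hex(): a single static buffer
 * that is overwritten on every call.
 */
static const char *to_hex_static(const unsigned char *sha1)
{
	static char buf[41];
	static const char hex[] = "0123456789abcdef";
	int i;

	for (i = 0; i < 20; i++) {
		buf[2 * i] = hex[sha1[i] >> 4];
		buf[2 * i + 1] = hex[sha1[i] & 0xf];
	}
	buf[40] = '\0';
	return buf;
}

/*
 * Format "old->new" into out (40 + 2 + 40 characters plus the NUL),
 * saving the first result before the second call clobbers it.
 */
static void format_change(char out[83], const unsigned char *from,
			  const unsigned char *to)
{
	char saved[41];

	memcpy(saved, to_hex_static(from), 41);
	snprintf(out, 83, "%s->%s", saved, to_hex_static(to));
}
```

Without the memcpy() both halves of the output would show whichever object name was converted last.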

* Re: Handling large files with GIT
From: Linus Torvalds @ 2006-02-15  2:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0602141811050.3691@g5.osdl.org>



On Tue, 14 Feb 2006, Linus Torvalds wrote:
> 
> Here, btw, is the trivial diff to turn my previous "tree-resolve" into a 
> "resolve tree relative to the current branch".

Gaah. It was trivial, and it happened to work fine for my test-case, but 
when I started looking at not doing that extremely aggressive subdirectory 
merging, that showed a few other issues...

So in case people want to try, here's a third patch. Oh, and it's against 
my _original_ patch, not incremental to the middle one (ie both patches two 
and three are against patch #1, it's not a nice series).

Now I'm really done, and won't be sending out any more patches today. 
Sorry for the noise.

		Linus
----
diff --git a/merge-tree.c b/merge-tree.c
index 0d6d434..6381118 100644
--- a/merge-tree.c
+++ b/merge-tree.c
@@ -55,9 +55,26 @@ static int same_entry(struct name_entry 
 		a->mode == b->mode;
 }
 
-static void resolve(const char *base, struct name_entry *result)
+static const char *sha1_to_hex_zero(const unsigned char *sha1)
 {
-	printf("0 %06o %s %s%s\n", result->mode, sha1_to_hex(result->sha1), base, result->path);
+	if (sha1)
+		return sha1_to_hex(sha1);
+	return "0000000000000000000000000000000000000000";
+}
+
+static void resolve(const char *base, struct name_entry *branch1, struct name_entry *result)
+{
+	char branch1_sha1[50];
+
+	/* If it's already branch1, don't bother showing it */
+	if (!branch1)
+		return;
+	memcpy(branch1_sha1, sha1_to_hex_zero(branch1->sha1), 41);
+
+	printf("0 %06o->%06o %s->%s %s%s\n",
+		branch1->mode, result->mode,
+		branch1_sha1, sha1_to_hex_zero(result->sha1),
+		base, result->path);
 }
 
 static int unresolved_directory(const char *base, struct name_entry n[3])
@@ -100,9 +117,12 @@ static void unresolved(const char *base,
 {
 	if (unresolved_directory(base, n))
 		return;
-	printf("1 %06o %s %s%s\n", n[0].mode, sha1_to_hex(n[0].sha1), base, n[0].path);
-	printf("2 %06o %s %s%s\n", n[1].mode, sha1_to_hex(n[1].sha1), base, n[1].path);
-	printf("3 %06o %s %s%s\n", n[2].mode, sha1_to_hex(n[2].sha1), base, n[2].path);
+	if (n[0].sha1)
+		printf("1 %06o %s %s%s\n", n[0].mode, sha1_to_hex(n[0].sha1), base, n[0].path);
+	if (n[1].sha1)
+		printf("2 %06o %s %s%s\n", n[1].mode, sha1_to_hex(n[1].sha1), base, n[1].path);
+	if (n[2].sha1)
+		printf("3 %06o %s %s%s\n", n[2].mode, sha1_to_hex(n[2].sha1), base, n[2].path);
 }
 
 /*
@@ -183,21 +203,21 @@ static void merge_trees(struct tree_desc
 		/* Same in both? */
 		if (same_entry(entry+1, entry+2)) {
 			if (entry[0].sha1) {
-				resolve(base, entry+1);
+				resolve(base, NULL, entry+1);
 				continue;
 			}
 		}
 
 		if (same_entry(entry+0, entry+1)) {
-			if (entry[2].sha1) {
-				resolve(base, entry+2);
+			if (entry[2].sha1 && !S_ISDIR(entry[2].mode)) {
+				resolve(base, entry+1, entry+2);
 				continue;
 			}
 		}
 
 		if (same_entry(entry+0, entry+2)) {
-			if (entry[1].sha1) {
-				resolve(base, entry+1);
+			if (entry[1].sha1 && !S_ISDIR(entry[1].mode)) {
+				resolve(base, NULL, entry+1);
 				continue;
 			}
 		}

^ permalink raw reply related

* Re: Handling large files with GIT
From: Linus Torvalds @ 2006-02-15  3:58 UTC (permalink / raw)
  To: Junio C Hamano, Fredrik Kuivinen; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0602141829080.3691@g5.osdl.org>



On Tue, 14 Feb 2006, Linus Torvalds wrote:
> 
> So in case people want to try, here's a third patch. Oh, and it's against 
> my _original_ patch, not incremental to the middle one (ie both patches two 
> and three are against patch #1, it's not a nice series).
> 
> Now I'm really done, and won't be sending out any more patches today. 

Still true. I've just been thinking about the last state.

As far as I can tell, the output from git-merge-tree with that fix to only 
simplify subdirectories that match exactly in all of base/branch1/branch2 
is precisely the output that git-merge-recursive actually wants.

Rather than doing a three-way merge with "git-read-tree", and then doing 
"git-ls-files --unmerged", I think this gives the same result much more 
efficiently.

That said, I can't follow the python code, so maybe I'm missing something. 
Fredrik cc'd, in case he can put me right.

		Linus

^ permalink raw reply

* Re: Handling large files with GIT
From: Sam Vilain @ 2006-02-15  4:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vslqlo0wo.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
> So I think the order of questions you should be asking is:
> 
>   1 - what operations are you trying to help?

Primarily, tracing history when dealing with history/changeset based
revision systems like SVN or darcs, and doing this in a manner that we
can make guarantees about behaving in the same way as those systems
would.

>   2 - what information you would need to achieve those operations
>       better?

Minimally, this tuple:

   ( merge|copy, source_path, source_tree|source_commit,
     destination_path, destination_commit )

It makes sense to record this with commits, as conceptually it is a part
of the intended commit history along with the change comment.
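
For discussion, the tuple could be prototyped roughly as follows. This is only a sketch of one possible representation: every name in it ("struct origin_note", "format_origin_note", the enums) is hypothetical, and the output merely imitates the "Copied: ... from ..." examples quoted further down in this message.

```c
#include <stdio.h>

enum origin_kind { ORIGIN_COPY, ORIGIN_MERGE };
enum source_kind { SOURCE_TREE, SOURCE_COMMIT };

/* Hypothetical record for the tuple; none of these names exist in git. */
struct origin_note {
	enum origin_kind kind;
	const char *source_path;
	enum source_kind source_type;
	const char *source_id;		/* hex name of the source tree or commit */
	const char *destination_path;
	const char *destination_commit;	/* may be NULL when implied by the
					   commit that carries the note */
};

/*
 * Render the note as one parsable line, for instance to append to a
 * commit description.  Returns what snprintf() returns.
 */
static int format_origin_note(char *out, size_t len,
			      const struct origin_note *n)
{
	return snprintf(out, len, "%s: %s from %s:%s:%s",
			n->kind == ORIGIN_MERGE ? "Merged" : "Copied",
			n->destination_path, n->source_path,
			n->source_type == SOURCE_TREE ? "tree" : "commit",
			n->source_id);
}
```

Keeping the note as plain text means it can ride along in the commit description as a surrogate container until a better home is agreed on.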

>   3 - among the second one, what will be necessary to be set in
>       stone (IOW, cannot be computed later), and what are
>       computable but expensive to recompute every time?

The only operation you cannot detect automatically and with certainty is a 
rename combined with a content change, unless a dummy commit is inserted 
between the name change and the content change.  But in a sense this is the same as
my suggestion - using the commit object history to record information
that normally doesn't matter when you are doing content-keyed
operations.

> I am getting an impression that you are doing only the first
> half of (2) without other parts, which somewhat bothers me.

Well, thank you for spending so much time to reply to me given that was
your assessment.  I think the best direction from here would be to start
molding some porcelain, then I can cross this bridge when I come to it
rather than simply speculating and hand-waving.

Besides, I can always prototype it for discussion using the commit
description as a surrogate container for the information.

Sam.

ps I also responded to the rest of your e-mail, but decided that the 
answers to the above questions were more important.

 >>  2. forensic - extra stuff at the end of the commit object?
 > (except "extra at the end of commit", which does not make it out
 > of the tree).

It is a part of the repository, but more a property of the commit itself
- like the commit description.  Like somebody writing "I renamed this
file to that file and changed its contents", but in a parsable form
that can _optionally_ be used to prevent the relevant git-core tools
from having to do content comparison, or perhaps something subtler like
increasing the score of the recorded history branch when scoring
alternatives looking for history.

 >>     eg
 >>        Copied: /new/path from /old/path:commit:c0bb171d..
 >>          (for SVN case where history matters)
 >>        Copied: /new/path from blob:b10b1d..
 >>          (for general pre-caching case)
 >>        Merged: /new/path from /old/path:commit:C0bb171d..
 >>          (for an SVK clone, so we know that subsequent merges on
 >>           /new/path need only merge from /old/path starting at commit
 >>           C0bb171d..)
 > I am not sure if recording the bare SVN ``copied'' is very
 > useful.  You would need to infer things from what SVN did to
 > tell if the copy is a tree copy inside a project (e.g. cp -r
 > i386 x86_64), tagging (e.g. svn-cp rHEAD trunk tags/v1.2), or
 > branching, wouldn't you?  SVK merge ticket is a bit more useful
 > in that sense.

In the SVN model there really is no difference between these cases.  Of
course the actual representation of these in the object does not matter;
the above is the what, not the how.  But in general, SVN only records
copying; it has no repository concept of merge, branch, tag, rename.
SVK adds merging to the picture.

Representing an SVN tree copy as a new sub-tree in a git repository
should still be a "cheap copy", it's just that all the tools will not
(and probably should not) see it as a branch but a copy.

 > So far, git philosophy is to record things you _know_ about and
 > defer such guesswork to the future, so limiting what you record
 > to what you can actually see from the foreign SCM would be more
 > in line with it.

Yes, and if I am mirroring an SVN repository, then I only know that in
that repository, the history /was recorded/ as such.  Not the history
/is/ as such, that's a different question, and is the guesswork worth
being deferred to the future.

 > For the same reason, if you are talking about
 > maildir managed under git, you should not have to record anything
 > other than what git already records: "we used to have these
 > files, now we have these instead".

Ok.  As Martin pointed out, the Maildir situation is actually a simple
case.  In a sense, I hijacked a vaguely related thread to resolve my
Warnock dilemma :)

 > But I thought you were talking about caching what earlier
 > inference declared what happened, so that you do not have to do
 > the same inference every time.  If that is the case, SVN level
 > "Copied:" is probably not what you would want to record, I
 > suspect.  You would do some inference with the given information
 > ("SVN says it copied this tree to that tree, what was it that it
 > really wanted to do?  Was it a copy, or was it to create a
 > branch which was implemented as a copy?"), and record that,
 > hoping that information would help your other operations this
 > time and later.

Well, this is already guesswork deferred to the future that the
Subversion authors inflict on the users of Subversion repositories.  If
you read the Subversion manual you will find recommendations to
studiously record this information and to use a standard repository
layout so that other people will understand what your copies were
intended to be.

^ permalink raw reply

* Re: several quick questions
From: Martin Langhoff @ 2006-02-15  4:11 UTC (permalink / raw)
  To: Keith Packard; +Cc: Linus Torvalds, Carl Worth, Nicolas Vilz 'niv', git
In-Reply-To: <1139945967.4341.71.camel@evo.keithp.com>

On 2/15/06, Keith Packard <keithp@keithp.com> wrote:
> I was validating the cvs import by comparing every tagged version. Trust
> me, the git tree-rewriting stage was somewhat faster than the CVS
> checkout of the same content. And, as an egg, one often prefers BFI to
> finesse.

Keith,

Did that lead to finding any problems with the import? Can I get my
hands on that script you've written to run the comparison?

cheers,


martin

^ permalink raw reply

* Re: [ANNOUNCE] pg - A patch porcelain for GIT
From: J. Bruce Fields @ 2006-02-15  4:11 UTC (permalink / raw)
  To: Sam Vilain, Petr Baudis, Chuck Lever, Karl Hasselström,
	Catalin Marinas, git
In-Reply-To: <20060215003510.GA25715@spearce.org>

On Tue, Feb 14, 2006 at 07:35:10PM -0500, Shawn Pearce wrote:
> Publishing a repository with a stg (or pg) patch series isn't
> a problem; the problem is that no clients currently know how to
> follow along with the remote repository's patch series.  And I can't
> think of a sensible behavior for doing so that isn't what git-core is
> already doing today for non patch series type clients (as in don't go
> backwards by popping but instead by pushing a negative delta).  :-)

If you represent each patch as a branch, with each modification to the
patch a commit on the corresponding branch, and each "push" operation a
merge from the branch corresponding to the previous patch to a branch
corresponding to the new patch (isn't that what pg's trying to do?),
then it should be possible just to track the branch corresponding to the
top patch.

In theory I guess it should also be possible to merge patch series that
have followed two lines of development, by merging each corresponding
branch.

The history would be really complicated.  You'd need to figure out how
to track the patch comments too, and you'd need scripts to convert to
just a simple series of commits for submitting upstream.  Probably not
worth the trouble, but I don't know.

If you really want revision control on patches the simplest thing might
be just to run quilt or Andrew Morton's scripts on top of a git
repository--the documentation with Andrew's scripts recommends doing
that with CVS.

--b,

^ permalink raw reply

* StGIT refreshes all added files - limitation of git-write-tree?
From: Pavel Roskin @ 2006-02-15  4:42 UTC (permalink / raw)
  To: Catalin Marinas, git

Hello!

Current StGIT would refresh all added files on "stg refresh", even if
they are not specified on the command line.  I believe the same is true
for removed files.

That's when it can hurt.  I apply a large patch that modifies and adds
several files.  I want to split the patch into several StGIT patches by
selecting which files belong to which patch.  All files that need to be
modified are already modified.  It's natural that I add new files using
"stg add" at this stage.  To me, adding files looks very similar to
modifying them.  However, when I create a patch by "stg new" and commit
_some_ files to it using "stg refresh" with those files on the command
line, I discover that _all_ added files have been refreshed.

The only workaround seems to be to add only those files that will be
included in the current patch.  The same should be true for renamed
files.

Debugging StGIT shows that it builds correct lists of the files to be
included in the current patch (update_cache in git.py), but they are
never used.  Instead, StGIT runs "git-write-tree" without arguments
(commit in git.py).  While StGIT is careful to only add user-specified
modified files to the directory cache, it does nothing to the added
files, which are in the cache already.

Purely StGIT way of handling this problem would be to remove added files
from the directory cache if they are not being committed.  The problem
is, "stg add" uses the cache, so this would undo "stg add" on files
unused in the current operation.  Either StGIT should restore files in
the cache after the refresh, or there should be a separate StGIT cache
that "stg add" would work on.  The former is potentially unreliable
(what if refresh is interrupted), the latter creates an extra layer,
another directory cache no less, thus reducing usability of the pure git
commands.

Another approach would be to get rid of "stg add" and "stg rm" and use
"stg refresh --add" and "stg refresh --rm" to add files to the current
patch or to remove them.  It does feel like removal of a useful feature,
since "stg diff" without arguments will no longer show added or removed
files.  Adding and removing files would be too immediate compared to
modifying them.

What I really hope to hear is that there is or there will be a git based
solution.  Either git-write-tree could be changed to only process files
specified on the command line, or there should be some other command
doing that.

Or maybe git-write-tree and other utilities could be changed to work on
a copy of the index file?  I would prefer not to move the
actual .git/index away, but to make a copy for the current "stg refresh"
operation.

-- 
Regards,
Pavel Roskin

^ permalink raw reply

* Re: several quick questions
From: Keith Packard @ 2006-02-15  5:25 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: keithp, Linus Torvalds, Carl Worth, Nicolas Vilz 'niv',
	git
In-Reply-To: <46a038f90602142011o36b975b7s1833953db3b6d376@mail.gmail.com>


On Wed, 2006-02-15 at 17:11 +1300, Martin Langhoff wrote:

> Did that lead to finding any problems with the import? Can I get my
> hands on that script you've written to run the comparison?

The only issues we had were with manual changes to the repository; other
than that, we now have a usable git repository for cairo (visible at
git://git.cairographics.org/cairo). The comparison tool that I wrote was
a cheesy shell script; I think Carl has updated it to do something less
severe than rm -rf *; git-reset --hard; if he can share that, I think
you'll like it a lot better than mine.

Our CVS import script has some magic ChangeLog-style mangling which
we've posted to the list before; that clearly needs to be encapsulated
in an optional log-reformatting bit for it to be generally useful. 
 
-- 
keith.packard@intel.com


^ permalink raw reply

* Re: StGIT refreshes all added files - limitation of git-write-tree?
From: Junio C Hamano @ 2006-02-15  6:20 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: git
In-Reply-To: <1139978528.28292.41.camel@dv>

Pavel Roskin <proski@gnu.org> writes:

> Or maybe git-write-tree and other utilities could be changed to work on
> a copy of the index file?  I would prefer not to move the
> actual .git/index away, but to make a copy for the current "stg refresh"
> operation.

There is no need to change the core side.  

	GIT_INDEX_FILE=temporary-index git-write-tree

would do the job.  See the current round of git-commit and how
it handles "git commit --only these files" case.
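
For a tool that wants to do the same thing programmatically, the pattern is to set GIT_INDEX_FILE only in the child process that runs the git command. The following is a minimal POSIX sketch under that assumption; "run_with_index" is a made-up helper, and preparing the temporary index (for example by copying .git/index and updating only the selected files) is left to the caller.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with GIT_INDEX_FILE pointing at index_path, leaving the
 * caller's own environment untouched.  Returns the child's exit
 * status, or -1 if it could not be started or did not exit normally.
 */
static int run_with_index(const char *index_path, char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: override the index location, then exec. */
		if (setenv("GIT_INDEX_FILE", index_path, 1) == 0)
			execvp(argv[0], argv);
		_exit(127);	/* setenv() or execvp() failed */
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Calling it with an argv of { "git-write-tree", NULL } and the path of the temporary index corresponds to the one-liner above; the real .git/index is never moved or modified by the call itself.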

^ permalink raw reply

