Git development


* Re: [PATCH] diff-raw format update take #2.
From: Thomas Glanzmann @ 2005-05-24  1:44 UTC
  To: GIT
In-Reply-To: <20050524013947.ADFEE528F53@taniwha.stupidest.org>

Hello,

* Chris Wedgwood <cw@f00f.org> [050524 03:40]:
> This is an automatically generated response.  You should only receive
> one such response (even if you send multiple messages).

> I'm currently fairly slow with email at times so please be patient.  If
> it's urgent, it's probably best you call my cellphone and leave
> voicemail (if you don't have the number, then chances are I won't
> consider your email urgent anyhow).

> I do check my email, and I do expect to reply to your message given a
> little bit of time.

> Thanks for your patience.

That is a bad joke, isn't it? I need a killfile for email or I'll start
hurting people. First talking bullshit and then autoresponding with just
shit to say *something*.

	Thomas


* Re: [PATCH] diff-raw format update take #2.
From: Linus Torvalds @ 2005-05-24  1:50 UTC
  To: Chris Wedgwood; +Cc: David Lang, Junio C Hamano, git
In-Reply-To: <211e617258d9d993810f3c88bace255e.IBX@taniwha.stupidest.org>

On Mon, 23 May 2005, Chris Wedgwood wrote:
> 
> diff doesn't do this, so i'm not sure it's useful:

Indeed.

A few weeks ago (for the previous round of "what should the diff format 
be" discussions) I looked at what "diffstat" does to figure out what the 
name should be, and it's quite disgusting.

The sad part is that if it wasn't for the date that GNU diff puts out, 
we'd be ok (since only GNU diff historically did unified diffs). Then we'd 
just know that the filename ends at the newline, and screw people who use 
newlines in filenames.

Now, we can fix this for git-diffs, and make sure that we don't make that 
mistake, but that just covers git users.
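
To make the parsing problem concrete, here is a minimal sketch, assuming the header separates the file name from the timestamp with a TAB (as GNU diff does) and that names never contain TAB or newline. `diff_header_name` is a hypothetical helper for illustration, not code from git or diffstat.

```c
#include <assert.h>
#include <string.h>

/*
 * Given a "--- " or "+++ " header line, return a pointer to the file
 * name and store its length in *len.  The name is taken to end at the
 * first TAB (start of the timestamp) or at the newline.
 * Returns NULL when the line is not a header line.
 */
const char *diff_header_name(const char *line, size_t *len)
{
    assert(line && len);
    if (strncmp(line, "--- ", 4) && strncmp(line, "+++ ", 4))
        return NULL;
    line += 4;
    *len = strcspn(line, "\t\n");
    return line;
}
```

Spaces inside the name survive here only because the TAB is assumed to be the separator; a tool that splits on any whitespace cannot tell a space in the name from the start of the date.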

		Linus


* [PATCH] adjust git-deltafy-script to the new diff-tree output format
From: Nicolas Pitre @ 2005-05-24  1:58 UTC
  To: Linus Torvalds; +Cc: git


Also prevent 'sort' from sorting on the sha1 which was screwing the 
history listing.

Signed-off-by: Nicolas Pitre <nico@cam.org>

diff --git a/git-deltafy-script b/git-deltafy-script
--- a/git-deltafy-script
+++ b/git-deltafy-script
@@ -23,8 +23,9 @@ curr_file=""
 
 git-rev-list HEAD |
 git-diff-tree -r --stdin |
-sed -n '/^\*/ s/^.*->\(.\{41\}\)\(.*\)$/\2 \1/p' | sort | uniq |
-while read file sha1; do
+awk '/^:/ { if ($5 == "M" || $5 == "N") print $4, $6 }' |
+LC_ALL=C sort -s -k 2 | uniq |
+while read sha1 file; do
 	if [ "$file" == "$curr_file" ]; then
 		list="$list $sha1"
 	else
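
For readers who want the awk filter spelled out, here is a rough C equivalent, assuming the raw line layout that the awk script implies (whitespace-separated fields, with fields 4, 5 and 6 being the new sha1, the status letter and the path). `parse_raw_line` is a hypothetical name; like awk's default field splitting, it stops the path at the first whitespace.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 1 and fills sha1/path when the line is a raw diff record
 * with status "M" or "N"; returns 0 otherwise.
 */
int parse_raw_line(const char *line, char sha1[41], char path[256])
{
    char status;

    assert(line && sha1 && path);
    if (line[0] != ':')
        return 0;
    /* skip old mode, new mode and old sha1; keep new sha1, status, path */
    if (sscanf(line + 1, "%*s %*s %*s %40s %c %255s",
               sha1, &status, path) != 3)
        return 0;
    return status == 'M' || status == 'N';
}
```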


* Re: cogito - how do I ???
From: Herbert Xu @ 2005-05-24  2:26 UTC
  To: Linus Torvalds; +Cc: sithglan, seanlkml, sam, git
In-Reply-To: <Pine.LNX.4.58.0505230731430.2307@ppc970.osdl.org>

Linus Torvalds <torvalds@osdl.org> wrote:
> 
> I actually suspect that whole time thing was a mistake, it seemed sensible 
> back when we didn't have any other way of ordering the changesets well, 
> but it's really a bad ordering anyway to do it by time (ie add a "sort 
> -rn" in there), and we can (and probably should) order rev-tree output 
> with some topological sort based on the commit tree.

Yes please.  Can we also have a rev-* command that outputs parent
relations instead of a simple list? That is,

<tree-1> <parent-1>
<tree-1> <parent-2>
<tree-2> <parent-3>
...

Then you could just run tsort for rev-tree, plus you could use this
for other things like finding merges.
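
A sketch of what a tsort-style pass over such pairs does, assuming commits are numbered 0..n-1 and each pair is (commit, parent). `topo_order` is a hypothetical helper using Kahn's algorithm; it emits every commit before any of its parents.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * edges[k][0] is a commit and edges[k][1] is one of its parents.
 * Writes the commits to out[] so that children come before parents
 * and returns how many were written (fewer than n means a cycle).
 */
int topo_order(int n, int nedges, const int edges[][2], int *out)
{
    int *pending;              /* children not yet emitted, per commit */
    int head = 0, tail = 0, i, k;

    assert(n > 0 && out);
    pending = calloc((size_t)n, sizeof(*pending));
    assert(pending);
    for (k = 0; k < nedges; k++)
        pending[edges[k][1]]++;
    for (i = 0; i < n; i++)
        if (!pending[i])
            out[tail++] = i;   /* tips: nothing points at them */
    while (head < tail) {      /* out[] doubles as the work queue */
        int c = out[head++];
        for (k = 0; k < nedges; k++)
            if (edges[k][0] == c && !--pending[edges[k][1]])
                out[tail++] = edges[k][1];
    }
    free(pending);
    return tail;
}
```

A commit that appears as the first element of more than one pair is a merge, which is the "finding merges" use mentioned above.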

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: cogito - how do I ???
From: Linus Torvalds @ 2005-05-24  3:05 UTC
  To: Herbert Xu; +Cc: sithglan, seanlkml, sam, git
In-Reply-To: <E1DaP7k-0007ar-00@gondolin.me.apana.org.au>

On Tue, 24 May 2005, Herbert Xu wrote:
> 
> Yes please.  Can we also have a rev-* command that outputs parent
> relations instead of a simple list? That is,
> 
> <tree-1> <parent-1>
> <tree-1> <parent-2>
> <tree-2> <parent-3>

That's not <tree-n>, it's <commit-n>.

I think that would be "git-rev-list --parents" or something - that 
wouldn't impact any existing users.

Patches welcome.

As to git-rev-tree, that's likely used by scripts in various places 
(cogito, gitk, gitweb etc), so changing that is nastier, but at least the 
output could be _sorted_ better.

Of course, I really think that the bigger problem with git-rev-tree
currently is that global reachability analysis, which is just not
acceptable performance-wise.

		Linus


* Re: gitweb wishlist
From: David Mansfield @ 2005-05-24  3:33 UTC
  To: Linus Torvalds
  Cc: H. Peter Anvin, Kay Sievers, Petr Baudis, Thomas Glanzmann,
	Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505201702170.2206@ppc970.osdl.org>

[-- Attachment #1: Type: text/plain, Size: 4239 bytes --]

Hi Linus,

Linus Torvalds wrote:
> [ Thomas added to cc, since he seems to have also worked on this ]
> 
> On Fri, 20 May 2005, H. Peter Anvin wrote:
> 
>>Here is my "main" OSS CVS repository; look at the syslinux module.  It 
>>has at least some minor branching.
> 
> 
> Ok, "cvsps" output scares me. I wonder what
> 
> 	WARNING: Invalid PatchSet 775, Tag syslinux-2_12-pre7:
> 	    memdisk/init32.asm:1.3=after, memdisk/Makefile:1.26=before. Treated as 'before'
> 	WARNING: Invalid PatchSet 775, Tag syslinux-2_12-pre7:
> 	    memdisk/init32.asm:1.3=after, memdisk/e820test.c:1.7=before. Treated as 'before'
> 	...
> 
> means..
> 

Ok.  I'll tell you.  It means that the committer uses bad practices in 
tagging ;-)  It generally means that force tag (cvs tag -F <file>) was 
used on a specific file.  Here's the scenario:

cvsps is trying to associate a tag to a specific commit.  But in the cvs 
world this is not always possible.  If, for example, a commit 
is made and all files are tagged.  Now some random file is modified and 
committed.  Then, a bug is found in a file from the previously tagged 
set, say the file 'memdisk/init32.asm'.  The bug is fixed, committed and 
the tag is MOVED for _just that file_ forward to the new version.  Now 
there is no commit that can be associated with the tag.  In this case, 
cvsps believes this to be a 'FUNKY' tag.  There is a more pathological 
case having to do with 'INVALID' tags...  It's enough to make a grown 
man cry.
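
The scenario can be modelled in a few lines. This is only a sketch under simplifying assumptions, not how cvsps works internally: revisions are plain integers, every file starts at revision 1, and each commit changes exactly one file. `struct change` and `tag_commit` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct change {
    int file;   /* index of the file touched by this commit */
    int rev;    /* revision the file has after the commit */
};

/*
 * tag[] holds the tagged revision of every file.  Returns the index of
 * the commit after which the whole repository matches the tag, or -1
 * if no such point exists in the history.
 */
int tag_commit(int nfiles, int nchanges, const struct change *log,
               const int *tag)
{
    int *state;
    int i, found = -1;

    assert(nfiles > 0 && log && tag);
    state = malloc((size_t)nfiles * sizeof(*state));
    assert(state);
    for (i = 0; i < nfiles; i++)
        state[i] = 1;                      /* assumed initial revision */
    for (i = 0; i < nchanges && found < 0; i++) {
        assert(log[i].file >= 0 && log[i].file < nfiles);
        state[log[i].file] = log[i].rev;
        if (!memcmp(state, tag, (size_t)nfiles * sizeof(*state)))
            found = i;
    }
    free(state);
    return found;
}
```

With file 0 standing for memdisk/init32.asm and file 1 for some other file, a tag that holds the fixed init32.asm together with the old revision of the other file never matches any point in the log, so the function returns -1.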

> Also, your syslinux repo is interesting and shows another thing: doing a
> 
> 	cvsps -g -p separate
> 
> ends badly with
> 
> 	Directing PatchSet 938 to file separate/938.patch
> 	cvs rdiff: failed to read diff file header /tmp/cvso8PswZ for mdiskchk.com,v: end of file
> 	system command returned non-zero exit status: 1: aborting
> 
> which doesn't look very promising and causes an empty diff for
> mdiskchk.com. Trying with --cvs-direct shows the reason:
> 
> 	Index: syslinux/sample/mdiskchk.com
> 	===================================================================
> 	RCS file: 
> 	/home/torvalds/src/osscvs/cvsroot/syslinux/sample/mdiskchk.com,v
> 	retrieving revision 1.1
> 	retrieving revision 1.2
> 	diff -u -r1.1 -r1.2
> 	Binary files /tmp/cvsU6MGU0 and /tmp/cvsiskFVR differ
> 
> which shows that anything that bases itself on diffs (ie uses "-g" with
> cvsps) is just doomed to failure, since there's no good way to handle
> binary data. Both Kay's and Thomas' scripts try to do the "-g" thing, 
> that's just not right.
> 

I accept patches ;-)  Honestly, handling binary data should be trivial; I 
just haven't had the interest, and surprisingly no one else on the 
internet ever has.  The only binary file in the kernel appears to be the 
logo.gif, according to Ingo.

[ discussion on working around broken handling of binary files in cvsps]
> 
> There seems to be two questions:
> 
>  - what to do about branch creation (ie a branch name we haven't seen
>    before): it looks like cvsps doesn't tell you what the _originating_
>    branch was for a new branch (that may be my confusion - maybe you can't
>    create branches off branches in CVS?)
> 
>    For syslinux, it looks like you can always base it on HEAD, or possibly 
>    just the previous patch (which looks like it is always HEAD). The above 
>    pseudo-script will actually do that automatically, simply by virtue of
>    the "git-read-tree -m" at the top of the loop failing when the
>    branchname doesn't exist yet.
> 

See attached patch to cvsps.c which displays 'Ancestor branch' when this 
differs from Branch.

>  - whether to bother to create merge entries for when somebody tried to 
>    merge a branch back or forth in CVS. 
> 
>    CVS fundamentally doesn't have the notion of such a thing, and cvsps 
>    can't either. But we could try to guess, based on the commit message, 
>    perhaps.
> 
>    NOTE! Such a "merge" would not have any real GIT merge functionality 
>    what-so-ever. It would just introduce a second parent into the commit, 
>    nothing more.
> 
> Bah. What crud.
> 

Hey, a polished turd is only so shiny...  cvsps is a 99% solution [to 
the problem of extracting metadata from cvs] only and cvs makes the 
other 1% impossible.

David

[-- Attachment #2: show-ancestor-branch.patch --]
[-- Type: text/x-patch, Size: 752 bytes --]

--- cvsps.c~	2003-04-11 10:06:01.000000000 -0400
+++ cvsps.c	2005-05-23 23:26:12.110231536 -0400
@@ -1402,6 +1402,16 @@
 	   tm->tm_hour, tm->tm_min, tm->tm_sec);
     printf("Author: %s\n", ps->author);
     printf("Branch: %s\n", ps->branch);
+    
+    /* check if ancestor was different branch */
+    if (!list_empty(&ps->members)) 
+    {
+	    PatchSetMember * psm = list_entry(ps->members.next, PatchSetMember, link);
+	    const char * abr = psm->pre_rev ? psm->pre_rev->branch : NULL;
+	    if (abr && strcmp(ps->branch, abr) != 0)
+		    printf("Ancestor branch: %s\n", abr);
+    }
+
     printf("Tag: %s %s\n", ps->tag ? ps->tag : "(none)", tag_flag_descr[ps->tag_flags]);
     printf("Log:\n%s\n", ps->descr);
     printf("Members: \n");


* Re: gitweb wishlist
From: H. Peter Anvin @ 2005-05-24  3:39 UTC
  To: David Mansfield
  Cc: Linus Torvalds, Kay Sievers, Petr Baudis, Thomas Glanzmann,
	Git Mailing List
In-Reply-To: <4292A08A.5050108@cobite.com>

David Mansfield wrote:
> 
> Ok.  I'll tell you.  It means that the committer uses bad practices in 
> tagging ;-)  It generally means that force tag (cvs tag -F <file>) was 
> used on a specific file.  Here's the scenario:
> 
> cvsps is trying to associate a tag to a specific commit.  But in the cvs 
> world this is not always possible.  If, for example, a commit 
> is made and all files are tagged.  Now some random file is modified and 
> committed.  Then, a bug is found in a file from the previously tagged 
> set, say the file 'memdisk/init32.asm'.  The bug is fixed, committed and 
> the tag is MOVED for _just that file_ forward to the new version.  Now 
> there is no commit that can be associated with the tag.  In this case, 
> cvsps believes this to be a 'FUNKY' tag.  There is a more pathological 
> case having to do with 'INVALID' tags...  It's enough to make a grown 
> man cry.
> 

This is only pathological if the tag now represents a state that never 
actually existed in the history of the repository.  I don't believe 
there are any such cases in the syslinux repository; I could be wrong, 
but I am *highly* sceptical.

> 
> I accept patches ;-)  Honestly, handling binary data should be trivial; I 
> just haven't had the interest, and surprisingly no one else on the 
> internet ever has.  The only binary file in the kernel appears to be the 
> logo.gif, according to Ingo.
> 
> [ discussion on working around broken handling of binary files in cvsps]
> 

Actually, as long as we can create the tree that exists between each 
changeset, we should be OK.

> 
> Hey, a polished turd is only so shiny...  cvsps is a 99% solution [to 
> the problem of extracting metadata from cvs] only and cvs makes the 
> other 1% impossible.
> 

No sh*t...

	-hpa


* Re: gitweb wishlist
From: Linus Torvalds @ 2005-05-24  3:52 UTC
  To: David Mansfield
  Cc: H. Peter Anvin, Kay Sievers, Petr Baudis, Thomas Glanzmann,
	Git Mailing List
In-Reply-To: <4292A08A.5050108@cobite.com>

On Mon, 23 May 2005, David Mansfield wrote:
> > 
> > Bah. What crud.
> > 
> 
> Hey, a polished turd is only so shiny...  cvsps is a 99% solution [to 
> the problem of extracting metadata from cvs] only and cvs makes the 
> other 1% impossible.

The "what crud" refers to cvs. cvsps seems to be a great way to make a
tool to migrate away from CVS (or if forced to use CVS, at least show it
in a sane manner). So don't take it the wrong way.

I've gotten side-tracked with purely git issues, and since I don't 
actually have any CVS archives, the cvs->git translation will be on the 
back-burner for a while, but your "Ancestor branch" patch seems to at 
least solve the problem that cvsps didn't show all the information that 
was there. So now I know how to do branches, even if I don't think I'd 
ever _really_ merge them back (which is as much info as CVS contains). 

They'd just be dangling references, ie you could get to them if you wanted 
to, for historical reasons, and they could be merged by hand, of 
course. Some day..

		Linus


* Re: gitweb wishlist
From: David Mansfield @ 2005-05-24  4:28 UTC
  To: H. Peter Anvin
  Cc: Linus Torvalds, Kay Sievers, Petr Baudis, Thomas Glanzmann,
	Git Mailing List
In-Reply-To: <4292A1F2.7020606@zytor.com>

H. Peter Anvin wrote:
> David Mansfield wrote:
> 
>>
>> Ok.  I'll tell you.  It means that the committer uses bad practices in 
>> tagging ;-)  It generally means that force tag (cvs tag -F <file>) was 
>> used on a specific file.  Here's the scenario:
>>
>> cvsps is trying to associate a tag to a specific commit.  But in the 
>> cvs world this is not always possible.  If, for example, a 
>> commit is made and all files are tagged.  Now some random file is 
>> modified and committed.  Then, a bug is found in a file from the 
>> previously tagged set, say the file 'memdisk/init32.asm'.  The bug is 
>> fixed, committed and the tag is MOVED for _just that file_ forward to 
>> the new version.  Now there is no commit that can be associated with 
>> the tag.  In this case, cvsps believes this to be a 'FUNKY' tag.  
>> There is a more pathological case having to do with 'INVALID' tags...  
>> It's enough to make a grown man cry.
>>
> 
> This is only pathological if the tag now represents a state that never 
> actually existed in the history of the repository.  I don't believe 
> there are any such cases in the syslinux repository; I could be wrong, 
> but I am *highly* sceptical.
> 

I didn't mean that YOUR repository had more pathological stuff in it, 
just that SOME do.  'FUNKY' tags are not really that bad, it's just that 
there is not a single commit to assign them to (i.e. at no point were 
all of the objects in the repository at that state simultaneously), 
which makes importing such a tag into a more commit-oriented 
system difficult.

Another way to reach 'funky'ness is to modify a file, commit and tag, 
without having done a 'cvs update' first (and a colleague has done a 
commit since your last 'cvs update')

David





* Re: gitweb wishlist
From: Thomas Glanzmann @ 2005-05-24  4:58 UTC
  To: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505201702170.2206@ppc970.osdl.org>

[-- Attachment #1: Type: text/plain, Size: 684 bytes --]

Hello,

> 	WARNING: Invalid PatchSet 775, Tag syslinux-2_12-pre7:
> 	    memdisk/init32.asm:1.3=after, memdisk/Makefile:1.26=before. Treated as 'before'
> 	WARNING: Invalid PatchSet 775, Tag syslinux-2_12-pre7:
> 	    memdisk/init32.asm:1.3=after, memdisk/e820test.c:1.7=before. Treated as 'before'
> 	...

Actually I think this is the broken upstream version. It can't parse
dates right. Just look at the exported patches and see if they are all
from 1970. However, the Debian package has a patch that solves it.

Maybe you should try with the attached patch or with the version that
comes with Debian sarge. I also reported this problem a while back to
the original author.

	Thomas

[-- Attachment #2: cvsps_2.0rc1-5.diff --]
[-- Type: text/plain, Size: 1785 bytes --]

--- cvsps-2.0rc1.orig/util.c
+++ cvsps-2.0rc1/util.c
@@ -13,6 +13,7 @@
 #include <time.h>
 #include <errno.h>
 #include <signal.h>
+#include <regex.h>
 #include <sys/stat.h>
 #include <sys/time.h>
 #include <sys/types.h>
@@ -140,24 +141,51 @@
     return *res;
 }
 
+static int get_int_substr(const char * str, const regmatch_t * p)
+{
+    char buff[256];
+    memcpy(buff, str + p->rm_so, p->rm_eo - p->rm_so);
+    buff[p->rm_eo - p->rm_so] = 0;
+    return atoi(buff);
+}
+
 void convert_date(time_t * t, const char * dte)
 {
-    /* HACK: this routine parses two formats,
-     * 1) 'cvslog' format YYYY/MM/DD HH:MM:SS
-     * 2) time_t formatted as %d
-     */
-       
-    if (strchr(dte, '/'))
+    static regex_t date_re;
+    static int init_re;
+
+#define MAX_MATCH 16
+    size_t nmatch = MAX_MATCH;
+    regmatch_t match[MAX_MATCH];
+
+    if (!init_re) 
+    {
+	if (regcomp(&date_re, "([0-9]{4})[-/]([0-9]{2})[-/]([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})", REG_EXTENDED)) 
+	{
+	    fprintf(stderr, "FATAL: date regex compilation error\n");
+	    exit(1);
+	}
+	init_re = 1;
+    }
+    
+    if (regexec(&date_re, dte, nmatch, match, 0) == 0)
     {
+	regmatch_t * pm = match;
 	struct tm tm;
+
+	/* first regmatch_t is match location of entire re */
+	pm++;
 	
-	memset(&tm, 0, sizeof(tm));
-	sscanf(dte, "%d/%d/%d %d:%d:%d", 
-	       &tm.tm_year, &tm.tm_mon, &tm.tm_mday, 
-	       &tm.tm_hour, &tm.tm_min, &tm.tm_sec);
-	
+	tm.tm_year = get_int_substr(dte, pm++);
+	tm.tm_mon  = get_int_substr(dte, pm++);
+	tm.tm_mday = get_int_substr(dte, pm++);
+	tm.tm_hour = get_int_substr(dte, pm++);
+	tm.tm_min  = get_int_substr(dte, pm++);
+	tm.tm_sec  = get_int_substr(dte, pm++);
+
 	tm.tm_year -= 1900;
 	tm.tm_mon--;
+	tm.tm_isdst = 0;
 	
 	*t = mktime(&tm);
     }
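
As a comparison point for the regex approach in the patch, here is a minimal sketch that accepts both "YYYY/MM/DD HH:MM:SS" and "YYYY-MM-DD HH:MM:SS" with sscanf. `parse_cvs_date` is a hypothetical helper, not part of cvsps; unlike the unanchored regex it only matches at the start of the string, and it leaves the mktime() call (and any timezone handling) to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Returns 0 and fills *tm on success, -1 if the string does not match. */
int parse_cvs_date(const char *s, struct tm *tm)
{
    int year, mon, day, hour, min, sec;
    char sep1, sep2;

    assert(s && tm);
    if (sscanf(s, "%4d%c%2d%c%2d %2d:%2d:%2d",
               &year, &sep1, &mon, &sep2, &day, &hour, &min, &sec) != 8)
        return -1;
    if ((sep1 != '/' && sep1 != '-') || (sep2 != '/' && sep2 != '-'))
        return -1;

    memset(tm, 0, sizeof(*tm));   /* no field is left uninitialized */
    tm->tm_year = year - 1900;    /* struct tm counts years from 1900 */
    tm->tm_mon = mon - 1;         /* and months from 0 */
    tm->tm_mday = day;
    tm->tm_hour = hour;
    tm->tm_min = min;
    tm->tm_sec = sec;
    return 0;
}
```

A bare time_t string such as the second format the old code accepted fails the match here and would have to be handled by a separate branch.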


* Re: gitweb wishlist
From: H. Peter Anvin @ 2005-05-24  5:04 UTC
  To: David Mansfield
  Cc: Linus Torvalds, Kay Sievers, Petr Baudis, Thomas Glanzmann,
	Git Mailing List
In-Reply-To: <4292AD5E.3000106@cobite.com>

David Mansfield wrote:
>>
>> This is only pathological if the tag now represents a state that never 
>> actually existed in the history of the repository.  I don't believe 
>> there are any such cases in the syslinux repository; I could be wrong, 
>> but I am *highly* sceptical.
> 
> I didn't mean that YOUR repository had more pathological stuff in it, 
> just that SOME do.  'FUNKY' tags are not really that bad, it's just that 
> there is not a single commit to assign them to (i.e. at no point were 
> all of the objects in the repository at that state simultaneously), 
> which makes importing such a tag into a more commit-oriented 
> system difficult.
> 
> Another way to reach 'funky'ness is to modify a file, commit and tag, 
> without having done a 'cvs update' first (and a colleague has done a 
> commit since your last 'cvs update')
> 

Not sure, sounds more likely.

Either which way, I guess there are two ways to deal with them in 'git'; 
either as standalone trees (tags pointing to tree objects), or probably 
more sensical, as impromptu branches if one can find a sane origin object.

	-hpa


* interim report on a big screwup with diff -M and -C.
From: Junio C Hamano @ 2005-05-24  5:24 UTC
  To: Linus Torvalds; +Cc: GIT
In-Reply-To: <7vll651nth.fsf@assigned-by-dhcp.cox.net>

Regrettably this is just an interim progress report not a full
solution, but there is a big screwup between the way rename/copy
detector records detection results and the way unmodified pair
pruner removes uninteresting filepairs, which results in double
free segfaults.  This makes -M and -C flags practically unusable
and I am redoing the rename/copy detector right now.  Hopefully
you will hear back from me by tomorrow morning.


* Re: [PATCH 3/3] Diff overhaul, adding the other half...
From: Junio C Hamano @ 2005-05-24  5:37 UTC
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0505211137250.2206@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Hmm.. It's not working well. Not only does it take a lot of CPU time (do 
LT> an fsck first to make sure you're not seeking the disk all over the 
LT> place), but it "finds" lots of things like this:

The "finds lots of funny things" problem should have been fixed
by now, and I am fixing a big screwup now as I reported. I have
two ideas on speeding up diff-tree -C I want to run by you.

I have not measured things yet, but I think the big CPU waste is
coming from either expanding all the blobs and/or running the
diff-delta on many file pairs.  If that is indeed the cause,
then helping the upfront check in the similarity estimator that
refuses to consider a file pair whose file size change is too
big may be a good way to resolve this problem.
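
A sketch of what such an upfront size check could look like, assuming the two blob sizes are already known without expanding the blobs. The name `size_change_too_big` and the MAX_SCORE scale of 10000 are assumptions made for the example, not taken from the patch.

```c
#include <assert.h>

#define MAX_SCORE 10000   /* hypothetical scale: 10000 means identical */

/*
 * Returns 1 when the size difference alone rules the pair out, i.e.
 * when the change in size is larger than the fraction of the smaller
 * file that the minimum score still allows to differ.
 */
int size_change_too_big(unsigned long src_size, unsigned long dst_size,
                        int minimum_score)
{
    unsigned long long delta, base;

    assert(minimum_score >= 0 && minimum_score <= MAX_SCORE);
    delta = src_size < dst_size ? dst_size - src_size : src_size - dst_size;
    base = src_size < dst_size ? src_size : dst_size;
    return base * (MAX_SCORE - minimum_score) < delta * MAX_SCORE;
}
```

With a minimum score of 5000, a 1000-byte file is still compared against a 1500-byte candidate but not against a 2000-byte one, so the expensive delta is never computed for the latter.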

One approach, which I think is an unacceptable change at this
stage (but I would seriously consider if this _were_ a week and
half old project), is to record the blob size as part of the
object ID.  We say object size is "unsigned long" everywhere, so
I am talking about changing the object ID from a 20-byte SHA1 to
24 bytes: the SHA1 plus a 4-byte integer in network byte order.
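
For illustration only, the 24-byte layout described here could be packed and unpacked as follows. `make_sized_id` and `sized_id_size` are hypothetical names, and the sketch assumes the size fits in 32 bits.

```c
#include <assert.h>
#include <string.h>

/* Build a 24-byte ID: 20 bytes of SHA1, then the size, big-endian. */
void make_sized_id(unsigned char out[24], const unsigned char sha1[20],
                   unsigned long size)
{
    assert(out && sha1);
    assert(size <= 0xffffffffUL);   /* must fit in the 4-byte field */
    memcpy(out, sha1, 20);
    out[20] = (unsigned char)((size >> 24) & 0xff);
    out[21] = (unsigned char)((size >> 16) & 0xff);
    out[22] = (unsigned char)((size >> 8) & 0xff);
    out[23] = (unsigned char)(size & 0xff);
}

/* Read the size back without touching the object itself. */
unsigned long sized_id_size(const unsigned char id[24])
{
    assert(id);
    return ((unsigned long)id[20] << 24) | ((unsigned long)id[21] << 16) |
           ((unsigned long)id[22] << 8) | (unsigned long)id[23];
}
```

The point of the design is that a size comparison between two candidate blobs would then need nothing but their IDs.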

Since I think the above is impractical, the second best approach
would be to piggy-back on the optimization used in uncached
diff-cache, which avoids blob expansion if cache says what we
have in the work tree already matches the object we are
interested in.

When -C is in effect, we would make diff-tree read the current
cache first, so that diff_populate_filespec() can borrow from
the current work tree when a path in the tree we are looking at
has not changed.  This would obviously be effective only when we
are talking about recent history.

Thoughts?


* Re: [PATCH 3/3] Diff overhaul, adding the other half...
From: Linus Torvalds @ 2005-05-24  6:19 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vbr71xjyt.fsf@assigned-by-dhcp.cox.net>

On Mon, 23 May 2005, Junio C Hamano wrote:
> 
> I have not measured things yet, but I think the big CPU waste is
> coming from either expanding all the blobs and/or running the
> diff-delta on many file pairs.  If that is indeed the cause,
> then helping the upfront check in the similarity estimator that
> refuses to consider a file pair whose file size change is too
> big may be a good way to resolve this problem.

Since pretty much all the blobs will be expanded in the working directory
anyway, it sounds like that would be the way to go. 

> One approach, which I think is an unacceptable change at this
> stage (but I would seriously consider if this _were_ a week and
> half old project), is to record the blob size as part of the
> object ID.  We say object size is "unsigned long" everywhere, so
> I am talking about changing the object ID from a 20-byte SHA1 to
> 24 bytes: the SHA1 plus a 4-byte integer in network byte order.

You can actually get the blob size fairly easily for non-delta objects, by 
just unpacking the beginning of it. But since we have the files..

That said, I don't think -C is that important. I personally don't see it
as a thing I'd run normally - it's more of a thing I might do between
releases rather than for something like git-whatchanged that looks at
every commit. It's an interesting thing to have _available_, but I don't
think it's a huge problem if it is a lot more expensive than the more
normal "-M".

		Linus


* [PATCH] Redo rename/copy detection logic.
From: Junio C Hamano @ 2005-05-24  8:10 UTC
  To: Linus Torvalds; +Cc: GIT
In-Reply-To: <7vk6lpyz5i.fsf_-_@assigned-by-dhcp.cox.net>

The earlier implementation had a major screw-up in the memory
management area.  Rename/copy logic sometimes borrowed a pointer
to a structure without any provision for downstream to determine
which pointer is shared and which is not.  This caused the
later clean-up code to sometimes double free such a structure,
resulting in a segfault.  This made -M and -C useless.

Another problem the earlier implementation had was that it
reordered the patches, and forced the logic that differentiates
renames from copies to depend on that particular order.  This
problem was fixed by teaching rename/copy detection logic not to
do any reordering, and rename-copy differentiator not to depend
on the order of the patches.  The diffs will leave rename/copy
detector in the same destination path order as the patch that
was fed into it.  Some test vectors have been reordered to
accommodate this change.
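
A simplified model of an order-independent rename/copy decision, written as a sketch rather than a copy of the patch: here a NULL path stands for a missing side, whereas the real code uses DIFF_FILE_VALID() on filespecs, and unmodified pairs are not distinguished from modified ones. `struct pair` and `classify` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct pair {
    const char *one;   /* source path, NULL if the file is new */
    const char *two;   /* destination path, NULL if deleted */
};

static int same_source(const struct pair *a, const struct pair *b)
{
    return a->one && b->one && !strcmp(a->one, b->one);
}

static int is_rename_or_copy(const struct pair *p)
{
    return p->one && p->two && strcmp(p->one, p->two);
}

/* Status of q[i]: 'N', 'D', 'M', 'R', 'C', or 0 for "omit this record". */
char classify(const struct pair *q, size_t nr, size_t i)
{
    const struct pair *p = &q[i];
    size_t j;

    assert(i < nr && (p->one || p->two));
    if (!p->one)
        return 'N';
    if (!p->two) {
        /* a deletion is dropped when some pair reuses its source */
        for (j = 0; j < nr; j++)
            if (same_source(&q[j], p) && is_rename_or_copy(&q[j]))
                return 0;
        return 'D';
    }
    if (!is_rename_or_copy(p))
        return 'M';
    for (j = 0; j < nr; j++) {
        if (j == i || !same_source(&q[j], p))
            continue;
        if (q[j].two && !strcmp(q[j].one, q[j].two))
            return 'C';   /* the source path stays around */
        if (is_rename_or_copy(&q[j]) && i < j)
            return 'C';   /* a later pair gets to be the rename */
    }
    return 'R';
}
```

Every loop scans the whole queue, so the status of a pair depends only on which other pairs exist and, for the final tie-break between several users of one source, on their relative positions.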

It also adds sanity-check logic to the human-readable diff-raw
output to detect paths with embedded TAB and LF characters,
which cannot be expressed with that format.  This idea came up
during a discussion with Chris Wedgwood.
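
The check boils down to something like the following sketch, which hard-codes TAB and LF as the two separators of the human-readable format. `path_needs_z` is a hypothetical name for illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 when the path contains a TAB or LF and therefore cannot be
 * written unambiguously in a format that uses those two characters as
 * field and record separators.
 */
int path_needs_z(const char *path)
{
    assert(path);
    return strpbrk(path, "\t\n") != NULL;
}
```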

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff.c                         |  117 ++++++++++--
diffcore-rename.c              |  393 +++++++++++++++++++----------------------
diffcore.h                     |   16 +
t/t4003-diff-rename-1.sh       |   16 -
t/t4004-diff-rename-symlink.sh |   14 -
t/t4005-diff-rename-2.sh       |   18 -
6 files changed, 318 insertions(+), 256 deletions(-)

diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -493,8 +493,6 @@ struct diff_filepair *diff_queue(struct 
 	dp->one = one;
 	dp->two = two;
 	dp->score = 0;
-	dp->orig_order = queue->nr;
-	dp->rename_rank = 0;
 	diff_q(queue, dp);
 	return dp;
 }
@@ -505,6 +503,17 @@ static void diff_flush_raw(struct diff_f
 {
 	int two_paths;
 	char status[10];
+
+	if (line_termination) {
+		const char *err = "path %s cannot be expressed without -z";
+		if (strchr(p->one->path, line_termination) ||
+		    strchr(p->one->path, inter_name_termination))
+			die(err, p->one->path);
+		if (strchr(p->two->path, line_termination) ||
+		    strchr(p->two->path, inter_name_termination))
+			die(err, p->two->path);
+	}
+
 	switch (p->status) {
 	case 'C': case 'R':
 		two_paths = 1;
@@ -628,41 +637,110 @@ int diff_needs_to_stay(struct diff_queue
 int diff_queue_is_empty(void)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
-	return q->nr == 0;
+	int i;
+	for (i = 0; i < q->nr; i++)
+		if (!diff_unmodified_pair(q->queue[i]))
+			return 0;
+	return 1;
 }
 
-static void diff_resolve_rename_copy(void)
+#if DIFF_DEBUG
+void diff_debug_filespec(struct diff_filespec *s, int x, const char *one)
+{
+	fprintf(stderr, "queue[%d] %s (%s) %s %06o %s\n",
+		x, one ? : "",
+		s->path,
+		DIFF_FILE_VALID(s) ? "valid" : "invalid",
+		s->mode,
+		s->sha1_valid ? sha1_to_hex(s->sha1) : "");
+	fprintf(stderr, "queue[%d] %s size %lu flags %d\n",
+		x, one ? : "",
+		s->size, s->xfrm_flags);
+}
+
+void diff_debug_filepair(const struct diff_filepair *p, int i)
+{
+	diff_debug_filespec(p->one, i, "one");
+	diff_debug_filespec(p->two, i, "two");
+	fprintf(stderr, "score %d, status %c\n",
+		p->score, p->status ? : '?');
+}
+
+void diff_debug_queue(const char *msg, struct diff_queue_struct *q)
 {
 	int i;
-	struct diff_queue_struct *q = &diff_queued_diff;
+	if (msg)
+		fprintf(stderr, "%s\n", msg);
+	fprintf(stderr, "q->nr = %d\n", q->nr);
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
+		diff_debug_filepair(p, i);
+	}
+}
+#endif
+
+static void diff_resolve_rename_copy(void)
+{
+	int i, j;
+	struct diff_filepair *p, *pp;
+	struct diff_queue_struct *q = &diff_queued_diff;
+
+	/* This should not depend on the ordering of things. */
+
+	diff_debug_queue("resolve-rename-copy", q);
+
+	for (i = 0; i < q->nr; i++) {
+		p = q->queue[i];
 		p->status = 0;
 		if (DIFF_PAIR_UNMERGED(p))
 			p->status = 'U';
 		else if (!DIFF_FILE_VALID((p)->one))
 			p->status = 'N';
 		else if (!DIFF_FILE_VALID((p)->two)) {
-			/* maybe earlier one said 'R', meaning
-			 * it will take it, in which case we do
-			 * not need to keep 'D'.
+			/* Deletion record should be omitted if there
+			 * is another entry that is a rename or a copy
+			 * and it uses this one as the source.  Then we
+			 * can say the other one is a rename.
 			 */
-			int j;
-			for (j = 0; j < i; j++) {
-				struct diff_filepair *pp = q->queue[j];
-				if (pp->status == 'R' &&
-				    !strcmp(pp->one->path, p->one->path))
+			for (j = 0; j < q->nr; j++) {
+				pp = q->queue[j];
+				if (!strcmp(pp->one->path, p->one->path) &&
+				    strcmp(pp->one->path, pp->two->path))
 					break;
 			}
-			if (j < i)
-				continue;
+			if (j < q->nr)
+				continue; /* has rename/copy */
 			p->status = 'D';
 		}
 		else if (strcmp(p->one->path, p->two->path)) {
-			/* This is rename or copy.  Which one is it? */
-			if (diff_needs_to_stay(q, i+1, p->one))
-				p->status = 'C';
-			else
+			/* See if there is somebody else anywhere that
+			 * will keep the path (either modified or
+			 * unmodified).  If so, we have to be a copy,
+			 * not a rename.  In addition, if there is
+			 * some other rename or copy that comes later
+			 * than us that uses the same source, we
+			 * cannot be a rename either.
+			 */
+			for (j = 0; j < q->nr; j++) {
+				pp = q->queue[j];
+				if (strcmp(pp->one->path, p->one->path))
+					continue;
+				if (!strcmp(pp->one->path, pp->two->path)) {
+					if (DIFF_FILE_VALID(pp->two)) {
+						/* non-delete */
+						p->status = 'C';
+						break;
+					}
+					continue;
+				}
+				/* pp is a rename/copy ... */
+				if (i < j) {
+					/* ... and comes later than us */
+					p->status = 'C';
+					break;
+				}
+			}
+			if (!p->status)
 				p->status = 'R';
 		}
 		else if (memcmp(p->one->sha1, p->two->sha1, 20))
@@ -672,6 +750,7 @@ static void diff_resolve_rename_copy(voi
 			p->status = 0;
 		}
 	}
+	diff_debug_queue("resolve-rename-copy done", q);
 }
 
 void diff_flush(int diff_output_style, int resolve_rename_copy)
diff --git a/diffcore-rename.c b/diffcore-rename.c
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -6,29 +6,92 @@
 #include "diffcore.h"
 #include "delta.h"
 
-struct diff_rename_pool {
-	struct diff_filespec **s;
-	int nr, alloc;
-};
-
-static void diff_rename_pool_clear(struct diff_rename_pool *pool)
-{
-	pool->s = NULL; pool->nr = pool->alloc = 0;
-}
+/* Table of rename/copy destinations */
 
-static void diff_rename_pool_add(struct diff_rename_pool *pool,
-				 struct diff_filespec *s)
-{
-	if (S_ISDIR(s->mode))
-		return;  /* no trees, please */
-
-	if (pool->alloc <= pool->nr) {
-		pool->alloc = alloc_nr(pool->alloc);
-		pool->s = xrealloc(pool->s,
-				   sizeof(*(pool->s)) * pool->alloc);
+static struct diff_rename_dst {
+	struct diff_filespec *two;
+	struct diff_filepair *pair;
+} *rename_dst;
+static int rename_dst_nr, rename_dst_alloc;
+
+static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
+						 int insert_ok)
+{
+	int first, last;
+
+	first = 0;
+	last = rename_dst_nr;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		struct diff_rename_dst *dst = &(rename_dst[next]);
+		int cmp = strcmp(two->path, dst->two->path);
+		if (!cmp)
+			return dst;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next+1;
 	}
-	pool->s[pool->nr] = s;
-	pool->nr++;
+	/* not found */
+	if (!insert_ok)
+		return NULL;
+	/* insert to make it at "first" */
+	if (rename_dst_alloc <= rename_dst_nr) {
+		rename_dst_alloc = alloc_nr(rename_dst_alloc);
+		rename_dst = xrealloc(rename_dst,
+				      rename_dst_alloc * sizeof(*rename_dst));
+	}
+	rename_dst_nr++;
+	if (first < rename_dst_nr)
+		memmove(rename_dst + first + 1, rename_dst + first,
+			(rename_dst_nr - first - 1) * sizeof(*rename_dst));
+	rename_dst[first].two = two;
+	rename_dst[first].pair = NULL;
+	return &(rename_dst[first]);
+}
+
+static struct diff_rename_src {
+	struct diff_filespec *one;
+	unsigned src_used : 1;
+} *rename_src;
+static int rename_src_nr, rename_src_alloc;
+
+static struct diff_rename_src *locate_rename_src(struct diff_filespec *one,
+						 int insert_ok)
+{
+	int first, last;
+
+	first = 0;
+	last = rename_src_nr;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		struct diff_rename_src *src = &(rename_src[next]);
+		int cmp = strcmp(one->path, src->one->path);
+		if (!cmp)
+			return src;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next+1;
+	}
+	/* not found */
+	if (!insert_ok)
+		return NULL;
+	/* insert to make it at "first" */
+	if (rename_src_alloc <= rename_src_nr) {
+		rename_src_alloc = alloc_nr(rename_src_alloc);
+		rename_src = xrealloc(rename_src,
+				      rename_src_alloc * sizeof(*rename_src));
+	}
+	rename_src_nr++;
+	if (first < rename_src_nr)
+		memmove(rename_src + first + 1, rename_src + first,
+			(rename_src_nr - first - 1) * sizeof(*rename_src));
+	rename_src[first].one = one;
+	rename_src[first].src_used = 0;
+	return &(rename_src[first]);
 }
 
 static int is_exact_match(struct diff_filespec *src, struct diff_filespec *dst)
@@ -46,8 +109,8 @@ static int is_exact_match(struct diff_fi
 }
 
 struct diff_score {
-	struct diff_filespec *src;
-	struct diff_filespec *dst;
+	int src; /* index in rename_src */
+	int dst; /* index in rename_dst */
 	int score;
 	int rank;
 };
@@ -113,92 +176,28 @@ static int estimate_similarity(struct di
 	return score;
 }
 
-static void record_rename_pair(struct diff_queue_struct *outq,
-			       struct diff_filespec *src,
-			       struct diff_filespec *dst,
-			       int rank,
-			       int score)
+static void record_rename_pair(struct diff_queue_struct *renq,
+			       int dst_index, int src_index, int score)
 {
-	/*
-	 * These ranks are used to sort the final output, because there
-	 * are certain dependencies:
-	 *
-	 *  1. rename/copy that depends on deleted ones.
-	 *  2. deletions in the original.
-	 *  3. rename/copy that depends on the pre-edit image of kept files.
-	 *  4. additions, modifications and no-modifications in the original.
-	 *  5. rename/copy that depends on the post-edit image of kept files
-	 *     (note that we currently do not detect such rename/copy).
-	 *
-	 * The downstream diffcore transformers are free to reorder
-	 * the entries as long as they keep file pairs that has the
-	 * same p->one->path in earlier rename_rank to appear before
-	 * later ones.
-	 *
-	 * To the final output routine, and in the diff-raw format
-	 * output, a rename/copy that is based on a path that has a
-	 * later entry that shares the same p->one->path and is not a
-	 * deletion is a copy.  Otherwise it is a rename.
-	 */
+	struct diff_filespec *one, *two, *src, *dst;
+	struct diff_filepair *dp;
 
-	struct diff_filepair *dp = diff_queue(outq, src, dst);
-	dp->rename_rank = rank * 2 + 1;
-	dp->score = score;
-	dst->xfrm_flags |= RENAME_DST_MATCHED;
-}
+	if (rename_dst[dst_index].pair)
+		die("internal error: dst already matched.");
 
-#if 0
-static void debug_filespec(struct diff_filespec *s, int x, const char *one)
-{
-	fprintf(stderr, "queue[%d] %s (%s) %s %06o %s\n",
-		x, one,
-		s->path,
-		DIFF_FILE_VALID(s) ? "valid" : "invalid",
-		s->mode,
-		s->sha1_valid ? sha1_to_hex(s->sha1) : "");
-	fprintf(stderr, "queue[%d] %s size %lu flags %d\n",
-		x, one,
-		s->size, s->xfrm_flags);
-}
+	src = rename_src[src_index].one;
+	one = alloc_filespec(src->path);
+	fill_filespec(one, src->sha1, src->mode);
 
-static void debug_filepair(const struct diff_filepair *p, int i)
-{
-	debug_filespec(p->one, i, "one");
-	debug_filespec(p->two, i, "two");
-	fprintf(stderr, "pair rank %d, orig order %d, score %d\n",
-		p->rename_rank, p->orig_order, p->score);
-}
+	dst = rename_dst[dst_index].two;
+	two = alloc_filespec(dst->path);
+	fill_filespec(two, dst->sha1, dst->mode);
 
-static void debug_queue(const char *msg, struct diff_queue_struct *q)
-{
-	int i;
-	if (msg)
-		fprintf(stderr, "%s\n", msg);
-	fprintf(stderr, "q->nr = %d\n", q->nr);
-	for (i = 0; i < q->nr; i++) {
-		struct diff_filepair *p = q->queue[i];
-		debug_filepair(p, i);
-	}
-}
-#else
-#define debug_queue(a,b) do { ; /*nothing*/ } while(0)
-#endif
+	dp = diff_queue(renq, one, two);
+	dp->score = score;
 
-/*
- * We sort the outstanding diff entries according to the rank (see
- * comment at the beginning of record_rename_pair) and tiebreak with
- * the order in the original input.
- */
-static int rank_compare(const void *a_, const void *b_)
-{
-	const struct diff_filepair *a = *(const struct diff_filepair **)a_;
-	const struct diff_filepair *b = *(const struct diff_filepair **)b_;
-	int a_rank = a->rename_rank;
-	int b_rank = b->rename_rank;
-
-	if (a_rank != b_rank)
-		return a_rank - b_rank;
-	return a->orig_order - b->orig_order;
+	rename_src[src_index].src_used = 1;
+	rename_dst[dst_index].pair = dp;
 }
 
 /*
@@ -232,24 +231,15 @@ int diff_scoreopt_parse(const char *opt)
 void diffcore_rename(int detect_rename, int minimum_score)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
-	struct diff_queue_struct outq;
-	struct diff_rename_pool created, deleted, stay;
-	struct diff_rename_pool *(srcs[2]);
+	struct diff_queue_struct renq, outq;
 	struct diff_score *mx;
-	int h, i, j;
-	int num_create, num_src, dst_cnt, src_cnt;
+	int i, j;
+	int num_create, num_src, dst_cnt;
 
 	if (!minimum_score)
 		minimum_score = DEFAULT_MINIMUM_SCORE;
-	outq.queue = NULL;
-	outq.nr = outq.alloc = 0;
-
-	diff_rename_pool_clear(&created);
-	diff_rename_pool_clear(&deleted);
-	diff_rename_pool_clear(&stay);
-
-	srcs[0] = &deleted;
-	srcs[1] = &stay;
+	renq.queue = NULL;
+	renq.nr = renq.alloc = 0;
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
@@ -257,76 +247,70 @@ void diffcore_rename(int detect_rename, 
 			if (!DIFF_FILE_VALID(p->two))
 				continue; /* unmerged */
 			else
-				diff_rename_pool_add(&created, p->two);
+				locate_rename_dst(p->two, 1);
 		else if (!DIFF_FILE_VALID(p->two))
-			diff_rename_pool_add(&deleted, p->one);
+			locate_rename_src(p->one, 1);
 		else if (1 < detect_rename) /* find copy, too */
-			diff_rename_pool_add(&stay, p->one);
+			locate_rename_src(p->one, 1);
 	}
-	if (created.nr == 0)
+	if (rename_dst_nr == 0)
 		goto cleanup; /* nothing to do */
 
 	/* We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
-	for (i = 0; i < created.nr; i++) {
-		for (h = 0; h < sizeof(srcs)/sizeof(srcs[0]); h++) {
-			struct diff_rename_pool *p = srcs[h];
-			for (j = 0; j < p->nr; j++) {
-				if (!is_exact_match(p->s[j], created.s[i]))
-					continue;
-				record_rename_pair(&outq,
-						   p->s[j], created.s[i], h,
-						   MAX_SCORE);
-				break; /* we are done with this entry */
-			}
+	for (i = 0; i < rename_dst_nr; i++) {
+		struct diff_filespec *two = rename_dst[i].two;
+		for (j = 0; j < rename_src_nr; j++) {
+			struct diff_filespec *one = rename_src[j].one;
+			if (!is_exact_match(one, two))
+				continue;
+			record_rename_pair(&renq, i, j, MAX_SCORE);
+			break; /* we are done with this entry */
 		}
 	}
-	debug_queue("done detecting exact", &outq);
+	diff_debug_queue("done detecting exact", &renq);
 
 	/* Have we run out the created file pool?  If so we can avoid
 	 * doing the delta matrix altogether.
 	 */
-	if (outq.nr == created.nr)
+	if (renq.nr == rename_dst_nr)
 		goto flush_rest;
 
-	num_create = (created.nr - outq.nr);
-	num_src = deleted.nr + stay.nr;
+	num_create = (rename_dst_nr - renq.nr);
+	num_src = rename_src_nr;
 	mx = xmalloc(sizeof(*mx) * num_create * num_src);
-	for (dst_cnt = i = 0; i < created.nr; i++) {
+	for (dst_cnt = i = 0; i < rename_dst_nr; i++) {
 		int base = dst_cnt * num_src;
-		if (created.s[i]->xfrm_flags & RENAME_DST_MATCHED)
+		struct diff_filespec *two = rename_dst[i].two;
+		if (rename_dst[i].pair)
 			continue; /* dealt with exact match already. */
-		for (src_cnt = h = 0; h < sizeof(srcs)/sizeof(srcs[0]); h++) {
-			struct diff_rename_pool *p = srcs[h];
-			for (j = 0; j < p->nr; j++, src_cnt++) {
-				struct diff_score *m = &mx[base + src_cnt];
-				m->src = p->s[j];
-				m->dst = created.s[i];
-				m->score = estimate_similarity(m->src, m->dst,
-							       minimum_score);
-				m->rank = h;
-			}
+		for (j = 0; j < rename_src_nr; j++) {
+			struct diff_filespec *one = rename_src[j].one;
+			struct diff_score *m = &mx[base+j];
+			m->src = j;
+			m->dst = i;
+			m->score = estimate_similarity(one, two,
+						       minimum_score);
 		}
 		dst_cnt++;
 	}
 	/* cost matrix sorted by most to least similar pair */
 	qsort(mx, num_create * num_src, sizeof(*mx), score_compare);
 	for (i = 0; i < num_create * num_src; i++) {
-		if (mx[i].dst->xfrm_flags & RENAME_DST_MATCHED)
-			continue; /* alreayd done, either exact or fuzzy. */
+		struct diff_rename_dst *dst = &rename_dst[mx[i].dst];
+		if (dst->pair)
+			continue; /* already done, either exact or fuzzy. */
 		if (mx[i].score < minimum_score)
 			break; /* there is not any more diffs applicable. */
-		record_rename_pair(&outq,
-				  mx[i].src, mx[i].dst, mx[i].rank,
-				  mx[i].score);
+		record_rename_pair(&renq, mx[i].dst, mx[i].src, mx[i].score);
 	}
 	free(mx);
-	debug_queue("done detecting fuzzy", &outq);
+	diff_debug_queue("done detecting fuzzy", &renq);
 
  flush_rest:
 	/* At this point, we have found some renames and copies and they
-	 * are kept in outq.  The original list is still in *q.
+	 * are kept in renq.  The original list is still in *q.
 	 *
 	 * Scan the original list and move them into the outq; we will sort
 	 * outq and swap it into the queue supplied to pass that to
@@ -335,68 +319,61 @@ void diffcore_rename(int detect_rename, 
 	 * See comments at the top of record_rename_pair for numbers used
 	 * to assign rename_rank.
 	 */
+	outq.queue = NULL;
+	outq.nr = outq.alloc = 0;
 	for (i = 0; i < q->nr; i++) {
-		struct diff_filepair *dp, *p = q->queue[i];
-		if (!DIFF_FILE_VALID(p->one)) {
-			/* creation or unmerged entries */
-			dp = diff_queue(&outq, p->one, p->two);
-			dp->rename_rank = 4;
-		}
-		else if (!DIFF_FILE_VALID(p->two)) {
-			/* deletion */
-			dp = diff_queue(&outq, p->one, p->two);
-			dp->rename_rank = 2;
+		struct diff_filepair *p = q->queue[i];
+		struct diff_rename_src *src = locate_rename_src(p->one, 0);
+		struct diff_rename_dst *dst = locate_rename_dst(p->two, 0);
+		struct diff_filepair *pair_to_free = NULL;
+
+		if (dst) {
+			/* creation */
+			if (dst->pair) {
+				/* renq has rename/copy already to produce
+				 * this file, so we do not emit the creation
+				 * record in the output.
+				 */
+				diff_q(&outq, dst->pair);
+				pair_to_free = p;
+			}
+			else
+				/* no matching rename/copy source, so record
+				 * this as a creation.
+				 */
+				diff_q(&outq, p);
 		}
+		else if (!diff_unmodified_pair(p))
+			/* all the other cases need to be recorded as is */
+			diff_q(&outq, p);
 		else {
-			/* modification, or stay as is */
-			dp = diff_queue(&outq, p->one, p->two);
-			dp->rename_rank = 4;
-		}
-		free(p);
-	}
-	debug_queue("done copying original", &outq);
-
-	/* Sort outq */
-	qsort(outq.queue, outq.nr, sizeof(outq.queue[0]), rank_compare);
-
-	debug_queue("done sorting", &outq);
-
-	free(q->queue);
-	q->nr = q->alloc = 0;
-	q->queue = NULL;
-
-	/* Copy it out to q, removing duplicates. */
-	for (i = 0; i < outq.nr; i++) {
-		struct diff_filepair *p = outq.queue[i];
-		if (!DIFF_FILE_VALID(p->one)) {
-			/* created or unmerged */
-			if (p->two->xfrm_flags & RENAME_DST_MATCHED)
-				; /* rename/copy created it already */
+			/* unmodified pair needs to be recorded only if
+			 * it is used as the source of rename/copy
+			 */
+			if (src && src->src_used)
+				diff_q(&outq, p);
 			else
-				diff_queue(q, p->one, p->two);
-		}
-		else if (!DIFF_FILE_VALID(p->two)) {
-			/* deleted */
-			diff_queue(q, p->one, p->two);
+				pair_to_free = p;
 		}
-		else if (strcmp(p->one->path, p->two->path)) {
-			/* rename or copy */
-			struct diff_filepair *dp =
-				diff_queue(q, p->one, p->two);
-			dp->score = p->score;
+		if (pair_to_free) {
+			diff_free_filespec_data(pair_to_free->one);
+			diff_free_filespec_data(pair_to_free->two);
+			free(pair_to_free);
 		}
-		else
-			/* otherwise it is a modified (or "stay") entry */
-			diff_queue(q, p->one, p->two);
-		free(p);
 	}
+	diff_debug_queue("done copying original", &outq);
 
-	free(outq.queue);
-	debug_queue("done collapsing", q);
+	free(renq.queue);
+	free(q->queue);
+	*q = outq;
+	diff_debug_queue("done collapsing", q);
 
  cleanup:
-	free(created.s);
-	free(deleted.s);
-	free(stay.s);
+	free(rename_dst);
+	rename_dst = NULL;
+	rename_dst_nr = rename_dst_alloc = 0;
+	free(rename_src);
+	rename_src = NULL;
+	rename_src_nr = rename_src_alloc = 0;
 	return;
 }
diff --git a/diffcore.h b/diffcore.h
--- a/diffcore.h
+++ b/diffcore.h
@@ -40,11 +40,6 @@ struct diff_filepair {
 	struct diff_filespec *one;
 	struct diff_filespec *two;
 	int score; /* only valid when one and two are different paths */
-	int orig_order; /* the original order of insertion into the queue */
-	int rename_rank; /* rename/copy dependency needs to enforce
-			  * certain ordering of patches that later
-			  * diffcore transformations should not break.
-			  */
 	int status; /* M C R N D U (see Documentation/diff-format.txt) */
 };
 #define DIFF_PAIR_UNMERGED(p) \
@@ -67,4 +62,15 @@ extern void diff_q(struct diff_queue_str
 extern int diff_needs_to_stay(struct diff_queue_struct *, int,
 			      struct diff_filespec *);
 
+#define DIFF_DEBUG 0
+#if DIFF_DEBUG
+void diff_debug_filespec(struct diff_filespec *, int, const char *);
+void diff_debug_filepair(const struct diff_filepair *, int);
+void diff_debug_queue(const char *, struct diff_queue_struct *);
+#else
+#define diff_debug_filespec(a,b,c) do {} while(0)
+#define diff_debug_filepair(a,b) do {} while(0)
+#define diff_debug_queue(a,b) do {} while(0)
+#endif
+
 #endif
diff --git a/t/t4003-diff-rename-1.sh b/t/t4003-diff-rename-1.sh
--- a/t/t4003-diff-rename-1.sh
+++ b/t/t4003-diff-rename-1.sh
@@ -78,14 +78,6 @@ test_expect_success \
 
 GIT_DIFF_OPTS=--unified=0 git-diff-cache -C -p $tree >current
 cat >expected <<\EOF
-diff --git a/COPYING b/COPYING.1
-copy from COPYING
-copy to COPYING.1
---- a/COPYING
-+++ b/COPYING.1
-@@ -6 +6 @@
-- HOWEVER, in order to allow a migration to GPLv3 if that seems like
-+ However, in order to allow a migration to GPLv3 if that seems like
 diff --git a/COPYING b/COPYING
 --- a/COPYING
 +++ b/COPYING
@@ -98,6 +90,14 @@ diff --git a/COPYING b/COPYING
 @@ -12 +12 @@
 -	This file is licensed under the GPL v2, or a later version
 +	This file is licensed under the G.P.L v2, or a later version
+diff --git a/COPYING b/COPYING.1
+copy from COPYING
+copy to COPYING.1
+--- a/COPYING
++++ b/COPYING.1
+@@ -6 +6 @@
+- HOWEVER, in order to allow a migration to GPLv3 if that seems like
++ However, in order to allow a migration to GPLv3 if that seems like
 EOF
 
 test_expect_success \
diff --git a/t/t4004-diff-rename-symlink.sh b/t/t4004-diff-rename-symlink.sh
--- a/t/t4004-diff-rename-symlink.sh
+++ b/t/t4004-diff-rename-symlink.sh
@@ -35,6 +35,13 @@ test_expect_success \
 
 GIT_DIFF_OPTS=--unified=0 git-diff-cache -M -p $tree >current
 cat >expected <<\EOF
+diff --git a/bozbar b/bozbar
+new file mode 120000
+--- /dev/null
++++ b/bozbar
+@@ -0,0 +1 @@
++xzzzy
+\ No newline at end of file
 diff --git a/frotz b/nitfol
 similarity index 100%
 copy from frotz
@@ -50,13 +57,6 @@ deleted file mode 100644
 @@ -1 +0,0 @@
 -xyzzy
 \ No newline at end of file
-diff --git a/bozbar b/bozbar
-new file mode 120000
---- /dev/null
-+++ b/bozbar
-@@ -0,0 +1 @@
-+xzzzy
-\ No newline at end of file
 EOF
 
 test_expect_success \
diff --git a/t/t4005-diff-rename-2.sh b/t/t4005-diff-rename-2.sh
--- a/t/t4005-diff-rename-2.sh
+++ b/t/t4005-diff-rename-2.sh
@@ -101,8 +101,8 @@ test_expect_success \
 
 git-diff-cache -C $tree >current
 cat >expected <<\EOF
-:100644 100644 6ff87c4664981e4397625791c8ea3bbb5f2279a3 0603b3238a076dc6c8022aedc6648fa523a17178 C1234	COPYING	COPYING.1
 :100644 100644 6ff87c4664981e4397625791c8ea3bbb5f2279a3 06c67961bbaed34a127f76d261f4c0bf73eda471 M	COPYING
+:100644 100644 6ff87c4664981e4397625791c8ea3bbb5f2279a3 0603b3238a076dc6c8022aedc6648fa523a17178 C1234	COPYING	COPYING.1
 EOF
 
 test_expect_success \
@@ -118,14 +118,6 @@ test_expect_success \
 mv expected diff-raw
 GIT_DIFF_OPTS=--unified=0 git-diff-helper <diff-raw >current
 cat >expected <<\EOF
-diff --git a/COPYING b/COPYING.1
-copy from COPYING
-copy to COPYING.1
---- a/COPYING
-+++ b/COPYING.1
-@@ -6 +6 @@
-- HOWEVER, in order to allow a migration to GPLv3 if that seems like
-+ However, in order to allow a migration to GPLv3 if that seems like
 diff --git a/COPYING b/COPYING
 --- a/COPYING
 +++ b/COPYING
@@ -138,6 +130,14 @@ diff --git a/COPYING b/COPYING
 @@ -12 +12 @@
 -	This file is licensed under the GPL v2, or a later version
 +	This file is licensed under the G.P.L v2, or a later version
+diff --git a/COPYING b/COPYING.1
+copy from COPYING
+copy to COPYING.1
+--- a/COPYING
++++ b/COPYING.1
+@@ -6 +6 @@
+- HOWEVER, in order to allow a migration to GPLv3 if that seems like
++ However, in order to allow a migration to GPLv3 if that seems like
 EOF
 
 test_expect_success \
------------------------------------------------
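
For readers following the patch above: locate_rename_dst() and
locate_rename_src() both keep a table sorted by path and use one
binary search either to find an entry or to find the slot where a new
one belongs.  Below is a minimal sketch of that idea, not the patch's
code.  It assumes a plain table of strings; the names path_table and
locate_path, and the growth formula, are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the rename_dst table: a sorted array of paths. */
struct path_table {
	const char **path;
	int nr, alloc;
};

/*
 * Return the index of "path" in the table.  When it is missing,
 * insert it at its sorted position if insert_ok is set; otherwise
 * return -1.
 */
static int locate_path(struct path_table *t, const char *path, int insert_ok)
{
	int first = 0, last = t->nr;

	while (last > first) {
		int next = first + (last - first) / 2;
		int cmp = strcmp(path, t->path[next]);
		if (!cmp)
			return next;
		if (cmp < 0)
			last = next;
		else
			first = next + 1;
	}
	/* not found; "first" is now the slot where path belongs */
	if (!insert_ok)
		return -1;
	assert(first >= 0 && first <= t->nr);
	if (t->alloc <= t->nr) {
		/* illustrative growth policy, not taken from the patch */
		int alloc = (t->alloc + 16) * 3 / 2;
		const char **grown = realloc(t->path, alloc * sizeof(*grown));
		if (!grown)
			abort();
		t->path = grown;
		t->alloc = alloc;
	}
	/* shift the tail up by one to open the slot */
	memmove(t->path + first + 1, t->path + first,
		(t->nr - first) * sizeof(*t->path));
	t->path[first] = path;
	t->nr++;
	return first;
}
```

Calling locate_path(&t, p, 1) while collecting entries and
locate_path(&t, p, 0) afterwards mirrors how the patch uses insert_ok:
1 in the first scan of the queue, 0 in the final pass.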

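The fuzzy-match loop in diffcore_rename() above sorts every
(source, destination) score from most to least similar and then walks
the list, giving each destination the first source it meets and
stopping once the score falls below the minimum.  A rough sketch of
that greedy pass follows, under the assumption that scores are plain
ints and results go into a caller-supplied array.  The names
struct score, by_score_desc and greedy_match are hypothetical.

```c
#include <stdlib.h>

/* One cell of the similarity matrix; src and dst are table indices. */
struct score {
	int src, dst, score;
};

/* Sort helper: highest score first. */
static int by_score_desc(const void *a_, const void *b_)
{
	const struct score *a = a_, *b = b_;
	return b->score - a->score;
}

/*
 * Greedily pick a source for each destination.  match_of_dst[d] is
 * set to the chosen source index, or -1 if nothing scored at least
 * min_score.  A source may serve several destinations, as happens
 * when one file is copied more than once.  Returns the number of
 * destinations that were matched.
 */
static int greedy_match(struct score *mx, int n, int min_score,
			int *match_of_dst, int num_dst)
{
	int i, matched = 0;

	for (i = 0; i < num_dst; i++)
		match_of_dst[i] = -1;
	qsort(mx, n, sizeof(*mx), by_score_desc);
	for (i = 0; i < n; i++) {
		if (mx[i].score < min_score)
			break; /* everything after this is worse */
		if (match_of_dst[mx[i].dst] >= 0)
			continue; /* destination already has a source */
		match_of_dst[mx[i].dst] = mx[i].src;
		matched++;
	}
	return matched;
}
```

Keeping indices rather than pointers in each cell is what lets the
loop ask "is this destination taken?" with a single table lookup,
which is the role rename_dst[].pair plays in the patch.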


* Re: [PATCH 3/3] Diff overhaul, adding the other half...
From: Junio C Hamano @ 2005-05-24  8:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0505232314510.2307@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Since pretty much all the blobs will be expanded in the working directory
LT> anyway, it sounds like that would be the way to go. 
LT> That said, I don't think -C is that important...

OK, so the short version is, diff-cache like optimization may be
interesting to try out, but practically it would not be much
useful anyway, so I should do it if I am really bored and have
nothing else interesting to do ;-).


* Re: gitweb wishlist
From: Linus Torvalds @ 2005-05-24  8:25 UTC (permalink / raw)
  To: David Mansfield
  Cc: H. Peter Anvin, Kay Sievers, Petr Baudis, Thomas Glanzmann,
	Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505232048190.2307@ppc970.osdl.org>

On Mon, 23 May 2005, Linus Torvalds wrote:
> 
> I've gotten side-tracked with purely git issues, and since I don't 
> actually have any CVS archives, the cvs->git translation will be on the 
> back-burner for a while, but your "Ancestor branch" patch seems to at 
> least solve the problem that cvsps didn't show all the information that 
> was there.

Naff.

I just checked in a "cvs2git.c" file in the "tools" project (which has my 
patch application stuff).

It's still buggy, and it's hacky as hell, but you can basically do 
something like this:

	cvsps | cvs2git > script

with the normal setup for "cvsps", and "cvs2git" needs one additional
stage, namely it wants to know the RCSDIR where to find the RCS files
(that should be basically "$CVSROOT/module").

That _script_ then creates a git archive. Very hacky. So after you've 
successfully created the conversion script, check it to see that it looks 
sane, and then do

	sh script

and the end result is a git'ified version of your CVS repo (and a 
corrupted working directory, btw, so look out. It _shouldn't_ corrupt 
your old CVS repo, though, so it should be ok).

It has the logic for branches, but it doesn't work, and I'm fed up enough
with CVS and RCS for the moment that I'm not going to work on it any more
tonight. I don't know what stupid bug I have (I've had about a million of
them on this silly program), but it's at a point where I think others
might find it interesting, and it's probably/hopefully some really
embarrassing typo or something and easily fixed.

It converted Peter's "syslinux" repository in a couple of minutes, 
resulting in 1038 commits (it _should_ have resulted in 1046 commits, 
that's the branch thing afaik) and most of it looks sane:

	diff-tree cfb715c827e19226a446d47c98a7460fd94633ff (from a809559323f1b370717e475dd252b24686f97727)
	Author: hpa <hpa>
	Date:   Thu May 19 22:30:50 2005 -0700

	    gcc4 compilation fix

	diff-tree a809559323f1b370717e475dd252b24686f97727 (from 4d65331b50a7b5ce858bb55a58f37b17ebc26c72)
	Author: hpa <hpa>
	Date:   Sun May 8 22:47:03 2005 -0700

	    New Multiboot module; increase command line limit to 1023

	diff-tree 4d65331b50a7b5ce858bb55a58f37b17ebc26c72 (from e88244753d528f695790adc96f0542d20dc33882)
	Author: hpa <hpa>
	Date:   Fri Apr 29 07:08:03 2005 -0700

	    Don't clobber live registers, it's not nice

	diff-tree e88244753d528f695790adc96f0542d20dc33882 (from a49e189e35d208648a0d0b52ff652a5f3f8a707e)
	Author: hpa <hpa>
	Date:   Fri Apr 29 07:05:52 2005 -0700

	    Use the correct register

	...
	...
	...

	diff-tree 350772d45425a85dae86ec721d6bd3fde5595d50 (from 47ee894e7821f50cb83ea14b08132337577b2a1e)
	Author: hpa <hpa>
	Date:   Sat Jan 31 13:24:35 1998 -0800

	    Slightly less ugly Id tag.

	diff-tree 47ee894e7821f50cb83ea14b08132337577b2a1e (from a8b52f1c31055049b276d14c67436d06dd7757aa)
	Author: hpa <hpa>
	Date:   Sat Jan 31 13:22:38 1998 -0800

	    Added Id tags.

	diff-tree b924672aadb2c3b7f3cac1aaf52fbb4a1ed86b8d (from root)
	Author: hpa <hpa>
	Date:   Sat Jan 31 13:16:05 1998 -0800

	    Initial revision

And btw, it's definitely cvsps that does all the heavy lifting here. 
"cvs2git" itself is 255 lines of horrid crud, and should have been 
written in perl, except I only do C..

		Linus


* Re: [PATCH 3/3] Diff overhaul, adding the other half...
From: Linus Torvalds @ 2005-05-24  8:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vll65vy10.fsf@assigned-by-dhcp.cox.net>

On Tue, 24 May 2005, Junio C Hamano wrote:
> 
> OK, so the short version is, diff-cache like optimization may be
> interesting to try out, but practically it would not be much
> useful anyway, so I should do it if I am really bored and have
> nothing else interesting to do ;-).

Yup. I think it's more important to get the rest calmed down again, and 
fix the things that got broken. Sadly, "git-whatchanged -s" was one such 
thing.

(I think that's just because the "silent" test used to depend on the 
magical behaviour of the "header" thing, and now that the header 
generation and suppression is sane, "silent" doesn't work any more)

		Linus


* [PATCH] Allow symlinks in the leading path in checkout-cache --prefix=
From: Junio C Hamano @ 2005-05-24  8:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Greaves, git
In-Reply-To: <Pine.LNX.4.58.0505231208460.2307@ppc970.osdl.org>

This is what Linus wrote, improving what David Greaves
originally submitted.

    Hmm.. Does this alternative work for you instead?
    [ Totally untested, please check for sanity first!! ]
    Btw, I'm not going to apply this, and expect that David or somebody else 
    can validate it and send it back to me as "tested".

I just added a test case and verified the patch works.

Author: David Greaves <david@dgreaves.com>
Author: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
---

checkout-cache.c                |   28 ++++++-----
t/t2003-checkout-cache-mkdir.sh |   95 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 110 insertions(+), 13 deletions(-)
new file (100755): t/t2003-checkout-cache-mkdir.sh

diff --git a/checkout-cache.c b/checkout-cache.c
--- a/checkout-cache.c
+++ b/checkout-cache.c
@@ -37,6 +37,8 @@
 #include "cache.h"
 
 static int force = 0, quiet = 0, not_new = 0, refresh_cache = 0;
+static const char *base_dir = "";
+static int base_dir_len = 0;
 
 static void create_directories(const char *path)
 {
@@ -51,10 +53,10 @@ static void create_directories(const cha
 		if (mkdir(buf, 0755)) {
 			if (errno == EEXIST) {
 				struct stat st;
-				if (!lstat(buf, &st) && S_ISDIR(st.st_mode))
-					continue; /* ok */
-				if (force && !unlink(buf) && !mkdir(buf, 0755))
+				if (len > base_dir_len && force && !unlink(buf) && !mkdir(buf, 0755))
 					continue;
+				if (!stat(buf, &st) && S_ISDIR(st.st_mode))
+					continue; /* ok */
 			}
 			die("cannot create directory at %s", buf);
 		}
@@ -163,11 +165,11 @@ static int write_entry(struct cache_entr
 	return 0;
 }
 
-static int checkout_entry(struct cache_entry *ce, const char *base_dir)
+static int checkout_entry(struct cache_entry *ce)
 {
 	struct stat st;
 	static char path[MAXPATHLEN+1];
-	int len = strlen(base_dir);
+	int len = base_dir_len;
 
 	memcpy(path, base_dir, len);
 	strcpy(path + len, ce->name);
@@ -194,7 +196,7 @@ static int checkout_entry(struct cache_e
 	return write_entry(ce, path);
 }
 
-static int checkout_file(const char *name, const char *base_dir)
+static int checkout_file(const char *name)
 {
 	int pos = cache_name_pos(name, strlen(name));
 	if (pos < 0) {
@@ -209,10 +211,10 @@ static int checkout_file(const char *nam
 		}
 		return -1;
 	}
-	return checkout_entry(active_cache[pos], base_dir);
+	return checkout_entry(active_cache[pos]);
 }
 
-static int checkout_all(const char *base_dir)
+static int checkout_all(void)
 {
 	int i;
 
@@ -220,7 +222,7 @@ static int checkout_all(const char *base
 		struct cache_entry *ce = active_cache[i];
 		if (ce_stage(ce))
 			continue;
-		if (checkout_entry(ce, base_dir) < 0)
+		if (checkout_entry(ce) < 0)
 			return -1;
 	}
 	return 0;
@@ -229,7 +231,6 @@ static int checkout_all(const char *base
 int main(int argc, char **argv)
 {
 	int i, force_filename = 0;
-	const char *base_dir = "";
 	struct cache_file cache_file;
 	int newfd = -1;
 
@@ -241,7 +242,7 @@ int main(int argc, char **argv)
 		const char *arg = argv[i];
 		if (!force_filename) {
 			if (!strcmp(arg, "-a")) {
-				checkout_all(base_dir);
+				checkout_all();
 				continue;
 			}
 			if (!strcmp(arg, "--")) {
@@ -272,10 +273,11 @@ int main(int argc, char **argv)
 			}
 			if (!memcmp(arg, "--prefix=", 9)) {
 				base_dir = arg+9;
+				base_dir_len = strlen(base_dir);
 				continue;
 			}
 		}
-		if (base_dir[0]) {
+		if (base_dir_len) {
 			/* when --prefix is specified we do not
 			 * want to update cache.
 			 */
@@ -285,7 +287,7 @@ int main(int argc, char **argv)
 			}
 			refresh_cache = 0;
 		}
-		checkout_file(arg, base_dir);
+		checkout_file(arg);
 	}
 
 	if (0 <= newfd &&
diff --git a/t/t2003-checkout-cache-mkdir.sh b/t/t2003-checkout-cache-mkdir.sh
new file mode 100755
--- /dev/null
+++ b/t/t2003-checkout-cache-mkdir.sh
@@ -0,0 +1,95 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+test_description='git-checkout-cache --prefix test.
+
+This test makes sure that --prefix option works as advertised, and
+also verifies that such leading path may contain symlinks, unlike
+the GIT controlled paths.
+'
+
+. ./test-lib.sh
+
+test_expect_success \
+    'setup' \
+    'mkdir path1 &&
+    echo frotz >path0 &&
+    echo rezrov >path1/file1 &&
+    git-update-cache --add path0 path1/file1'
+
+test_expect_success \
+    'have symlink in place where dir is expected.' \
+    'rm -fr path0 path1 &&
+     mkdir path2 &&
+     ln -s path2 path1 &&
+     git-checkout-cache -f -a &&
+     test ! -h path1 && test -d path1 &&
+     test -f path1/file1 && test ! -f path2/file1'
+
+test_expect_success \
+    'use --prefix=path2/' \
+    'rm -fr path0 path1 path2 &&
+     mkdir path2 &&
+     git-checkout-cache --prefix=path2/ -f -a &&
+     test -f path2/path0 &&
+     test -f path2/path1/file1 &&
+     test ! -f path0 &&
+     test ! -f path1/file1'
+
+test_expect_success \
+    'use --prefix=tmp-' \
+    'rm -fr path0 path1 path2 tmp* &&
+     git-checkout-cache --prefix=tmp- -f -a &&
+     test -f tmp-path0 &&
+     test -f tmp-path1/file1 &&
+     test ! -f path0 &&
+     test ! -f path1/file1'
+
+test_expect_success \
+    'use --prefix=tmp- but with a conflicting file and dir' \
+    'rm -fr path0 path1 path2 tmp* &&
+     echo nitfol >tmp-path1 &&
+     mkdir tmp-path0 &&
+     git-checkout-cache --prefix=tmp- -f -a &&
+     test -f tmp-path0 &&
+     test -f tmp-path1/file1 &&
+     test ! -f path0 &&
+     test ! -f path1/file1'
+
+# Linus fix #1
+test_expect_success \
+    'use --prefix=tmp/orary/ where tmp is a symlink' \
+    'rm -fr path0 path1 path2 tmp* &&
+     mkdir tmp1 tmp1/orary &&
+     ln -s tmp1 tmp &&
+     git-checkout-cache --prefix=tmp/orary/ -f -a &&
+     test -d tmp1/orary &&
+     test -f tmp1/orary/path0 &&
+     test -f tmp1/orary/path1/file1 &&
+     test -h tmp'
+
+# Linus fix #2
+test_expect_success \
+    'use --prefix=tmp/orary- where tmp is a symlink' \
+    'rm -fr path0 path1 path2 tmp* &&
+     mkdir tmp1 &&
+     ln -s tmp1 tmp &&
+     git-checkout-cache --prefix=tmp/orary- -f -a &&
+     test -f tmp1/orary-path0 &&
+     test -f tmp1/orary-path1/file1 &&
+     test -h tmp'
+
+# Linus fix #3
+test_expect_success \
+    'use --prefix=tmp- where tmp-path1 is a symlink' \
+    'rm -fr path0 path1 path2 tmp* &&
+     mkdir tmp1 &&
+     ln -s tmp1 tmp-path1 &&
+     git-checkout-cache --prefix=tmp- -f -a &&
+     test -f tmp-path0 &&
+     test ! -h tmp-path1 &&
+     test -d tmp-path1 &&
+     test -f tmp-path1/file1'
+
------------------------------------------------
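
The create_directories() hunk above turns on the difference between
lstat() and stat(): lstat() reports on a symlink itself, while stat()
follows it.  The sketch below illustrates that difference and the
order of checks the patch ends up with.  It is not the patch's code;
ensure_dir and its inside_prefix argument are hypothetical stand-ins
for the "len > base_dir_len" test.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/* Nonzero if path is itself a directory; a symlink to one does not count. */
static int is_dir_nofollow(const char *path)
{
	struct stat st;
	return !lstat(path, &st) && S_ISDIR(st.st_mode);
}

/* Nonzero if path is a directory or a symlink that resolves to one. */
static int is_dir_follow(const char *path)
{
	struct stat st;
	return !stat(path, &st) && S_ISDIR(st.st_mode);
}

/*
 * Make sure a directory exists at path.  Inside the --prefix part
 * anything that resolves to a directory is accepted, symlinks
 * included.  Past the prefix, with force set, whatever non-directory
 * is in the way is removed and replaced by a real directory.
 * Returns 0 on success, -1 on failure.
 */
static int ensure_dir(const char *path, int inside_prefix, int force)
{
	if (!mkdir(path, 0755))
		return 0;
	if (errno != EEXIST)
		return -1;
	if (!inside_prefix && force && !unlink(path) && !mkdir(path, 0755))
		return 0;
	return is_dir_follow(path) ? 0 : -1;
}
```

With "tmp" a symlink to a directory, ensure_dir("tmp", 1, 1) leaves
the link alone, which is the behaviour the "Linus fix #1" and "#2"
tests check; ensure_dir("tmp-path1", 0, 1) on a symlink replaces it
with a real directory, as in "Linus fix #3".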



* Re: [PATCH 3/3] Diff overhaul, adding the other half...
From: Junio C Hamano @ 2005-05-24  9:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0505240129420.2307@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> (I think that's just because the "silent" test used to depend on the 
LT> magical behaviour of the "header" thing, and now that the header 
LT> generation and suppression is sane, "silent" doesn't work any more)

I think you will be more efficient for this task but I'm willing
to volunteer if you let me know how "silent" should behave.  The
documentation says it is useful only with -v and suppresses the
diffs, so if that is the only thing it does, I think something
like this is sufficient?  Not tested enough but I am going to
crash for the day now.

------------
Use DIFF_FORMAT_NO_OUTPUT to implement diff-tree -s option.

Instead of checking silent flag all over the place, simply use
the NO_OUTPUT option diffcore provides to suppress the diff
output.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
cd /opt/packrat/playpen/public/in-place/git/git.junio/
jit-diff
# - HEAD: Allow symlinks in the leading path in checkout-cache --prefix=
# + (working tree)
diff --git a/diff-tree.c b/diff-tree.c
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -2,7 +2,6 @@
 #include "cache.h"
 #include "diff.h"
 
-static int silent = 0;
 static int show_root_diff = 0;
 static int verbose_header = 0;
 static int ignore_merges = 1;
@@ -67,9 +66,6 @@ static void show_file(const char *prefix
 	const char *path;
 	const unsigned char *sha1 = extract(tree, size, &path, &mode);
 
-	if (silent)
-		return;
-
 	if (recursive && S_ISDIR(mode)) {
 		char type[20];
 		unsigned long size;
@@ -132,9 +128,6 @@ static int compare_tree_entry(void *tree
 		return retval;
 	}
 
-	if (silent)
-		return 0;
-
 	diff_change(mode1, mode2, sha1, sha2, base, path1);
 	return 0;
 }
@@ -395,8 +388,7 @@ static char *generate_header(const char 
 		if (this_header[offset-1] != '\n')
 			this_header[offset++] = '\n';
 		/* Add _another_ EOLN if we are doing diff output */
-		if (!silent)
-			this_header[offset++] = '\n';
+		this_header[offset++] = '\n';
 		this_header[offset] = 0;
 	}
 
@@ -442,8 +434,6 @@ static int diff_tree_commit(const unsign
 			 * Don't print multiple merge entries if we
 			 * don't print the diffs.
 			 */
-			if (silent)
-				break;
 		}
 		offset += 48;
 	}
@@ -540,7 +530,7 @@ int main(int argc, const char **argv)
 			continue;
 		}
 		if (!strcmp(arg, "-s")) {
-			silent = 1;
+			diff_output_format = DIFF_FORMAT_NO_OUTPUT;
 			continue;
 		}
 		if (!strcmp(arg, "-v")) {

^ permalink raw reply

* Re: gitweb wishlist
From: Linus Torvalds @ 2005-05-24 16:00 UTC (permalink / raw)
  To: David Mansfield
  Cc: H. Peter Anvin, Kay Sievers, Petr Baudis, Thomas Glanzmann,
	Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505240110580.2307@ppc970.osdl.org>

On Tue, 24 May 2005, Linus Torvalds wrote:
> 
> It has the logic for branches, but it doesn't work, and I'm fed up enough
> with CVS and RCS for the moment that I'm not going to work on it any more
> tonight.

I'm back, and yes, it was a really stupid thing.

However, David, I need more help deciphering "cvsps" output..

Fixing the branch handling shows that cvsps does some really strange
things with the newly added "Ancestor graph". Here's one example:

	---------------------
	PatchSet 372 
	Date: 2002/02/03 21:37:50
	Author: hpa
	Branch: syslinux-1_6x-1
	Ancestor branch: HEAD
	Tag: syslinux-1_67 
	Log:
	New mailing list information

	Members: 
	        syslinux.doc:1.48->1.48.2.1 

	---------------------
	PatchSet 373 
	Date: 2002/02/11 23:08:47
	Author: hpa
	Branch: HEAD
	Tag: (none) 
	Log:
	tftpd32 needs version 2.11 or later.

	Members: 
	        pxelinux.doc:1.28->1.29 

	---------------------
	PatchSet 374 
	Date: 2002/02/18 23:43:43
	Author: hpa
	Branch: syslinux-1_6x-1
	Ancestor branch: HEAD
	Tag: syslinux-1_6x-merge-2 
	Log:
	Actually make the -o option work properly.

	Members: 
	        syslinux.c:1.13->1.13.2.1 

	---------------------

note how both 372 _and_ 374 claim to have HEAD as their ancestor, and are 
on the "syslinux-1_6x-1" branch. What's up with that? Right now this 
causes my git archive to first create 372 as a branch off HEAD, and then 
overwrite that with 374, resulting in a dangling branch for 372 that 
_exists_, but it's not reachable any more, because the branch name that it 
used has been overwritten by the _new_ branch off HEAD.

Side note: cvs2git is pretty robust since it doesn't rely on patches
anywhere, so the head of the branch likely ends up being correct, if that
"syslinux.doc" file has been modified anywhere else in the branch. So this
_usually_ just results in (a) git-fsck-cache complaining about unreachable
commits and (b) possibly history being hard to find.

Maybe this cvs2git behaviour is the right thing to do, and what really
happened was that the changes described by PatchSet 372 aren't really
available any more even in CVS, unless you go back by date or something 
like that.

However, I suspect it's a cvsps bug in the "ancestor branch" thing. I
could work around it by just saying "if I have already seen this branch,
I'll ignore the ancestor information".

So I'd like to know whether this is a cvsps issue or whether I actually
ended up doing the right thing and it really should be a dangling
branch-name that got re-used...

(And if it's a cvsps issue, I'd obviously prefer to get a cvsps patch 
instead of having a questionable workaround in cvs2git).

"Davi-Mansfieldobi, you're our only hope.."

		Linus

^ permalink raw reply

* Re: gitweb wishlist
From: Linus Torvalds @ 2005-05-24 16:16 UTC (permalink / raw)
  To: David Mansfield
  Cc: H. Peter Anvin, Kay Sievers, Petr Baudis, Thomas Glanzmann,
	Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505240849050.2307@ppc970.osdl.org>

On Tue, 24 May 2005, Linus Torvalds wrote:
> 
> Fixing the branch handling shows that cvsps does some really strange
> things with the newly added "Ancestor graph". Here's one example:

Ahh, looking at cvsps source, I think I see what's going on. 

It's deciding the "previous branch" by looking at what the previous branch 
for the first individual file in the PatchSet was, which fails because in 
this case, PatchSet 372 was changing "syslinux.doc", and Patchset 374 was 
changing "syslinux.c", and thus the previous version of the individual 
_files_ were both in the HEAD branch.

So it does look like I should just ignore the "Ancestor branch" 
information if the new branch already existed.
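
The rule above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the actual cvs2git code: the
fixed-size table and the names branch_seen() and parent_branch() are
hypothetical. The first PatchSet seen on a branch starts from its
"Ancestor branch"; any later PatchSet on the same branch continues from
the branch itself and ignores what cvsps reports as the ancestor.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table of branch names already seen (illustration only,
 * not the real cvs2git data structure). */
#define MAX_BRANCHES 64
#define MAX_NAME 128

static char seen[MAX_BRANCHES][MAX_NAME];
static int nr_seen;

/* Return 1 if "name" was recorded by an earlier PatchSet, else 0. */
int branch_seen(const char *name)
{
	int i;

	assert(name != NULL);
	for (i = 0; i < nr_seen; i++)
		if (!strcmp(seen[i], name))
			return 1;
	return 0;
}

/*
 * Pick the branch whose head becomes the parent of this PatchSet.
 * Known branch: keep committing on it and ignore "ancestor".
 * New branch: record it and fork from "ancestor" (or from itself
 * when cvsps printed no "Ancestor branch" line, ancestor == NULL).
 * Assumption: if the table is full the branch is simply not recorded.
 */
const char *parent_branch(const char *branch, const char *ancestor)
{
	if (branch_seen(branch))
		return branch;
	if (nr_seen < MAX_BRANCHES) {
		strncpy(seen[nr_seen], branch, MAX_NAME - 1);
		seen[nr_seen][MAX_NAME - 1] = 0;
		nr_seen++;
	}
	return ancestor ? ancestor : branch;
}
```

With the PatchSets quoted earlier, 372 would fork "syslinux-1_6x-1" off
HEAD, and 374 would then be committed on top of 372 instead of creating
a second branch off HEAD under the same name.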

Of course, some semantics will never be translatable when trying to treat 
CVS as a sane system (ie treating CVS as if it was changeset-based is 
always going to cause strange corner cases since it really is file-based), 
but that should most likely give the best approximation of what a 
conversion should do.

		Linus

^ permalink raw reply

* Re: gitweb wishlist
From: David Mansfield @ 2005-05-24 16:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Kay Sievers, Petr Baudis, Thomas Glanzmann,
	Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505240110580.2307@ppc970.osdl.org>

Linus Torvalds wrote:
> 
> On Mon, 23 May 2005, Linus Torvalds wrote:
> 
>>I've gotten side-tracked with purely git issues, and since I don't 
>>actually have any CVS archives, the cvs->git translation will be on the 
>>back-burner for a while, but your "Ancestor branch" patch seems to at 
>>least solve the problem that cvsps didn't show all the information that 
>>was there.
> 
> 
> Naff.
> 
> I just checked in a "cvs2git.c" file in the "tools" project (which has my 
> patch application stuff).
> 
> It's still buggy, and it's hacky as hell, but you can basically do 
> something like this:
> 
> 	cvsps | cvs2git > script
> 
> with the normal setup for "cvsps", and "cvs2git" needs one additional
> stage, namely it wants to know the RCSDIR where to find the RCS files
> (that should be basically "$CVSROOT/module").
> 
> That _script_ then creates a git archive. Very hacky. So after you've 
> successfully created the conversion script, check it to see that it looks 
> sane, and then do
> 
> 	sh script
> 
> and the end result is a git'ified version of your CVS repo (and a 
> corrupted working directory, btw, so look out. It _shouldn't_ corrupt 
> your old CVS repo, though, so it should be ok).

I'll take a look.  One problem is that many folks use non-local cvs... 
not sure if that will be an issue.  I'll look to cleaning this up if 
necessary.

> 
> It has the logic for branches, but it doesn't work, and I'm fed up enough
> with CVS and RCS for the moment that I'm not going to work on it any more
> tonight. I don't know what stupid bug I have (I've had about a million of
> them on this silly program), but it's at a point where I think others
> might find it interesting, and it's probably/hopefully some really
> embarrassing typo or something and easily fixed.
> 

I actually found an issue with the 30-second ancestor branch patch I 
sent and I'm doing that one properly now.  Once that's done I can look 
at the branch capture logic in cvs2git and see if anything pops out.


> It converted Peter's "syslinux" repository in a couple of minutes, 
> resulting in 1038 commits (it _should_ have resulted in 1046 commits, 
> that's the branch thing afaik) and most of it looks sane:

Really cool.  What's 8 commits between friends?

David

^ permalink raw reply

* Re: gitweb wishlist
From: Thomas Glanzmann @ 2005-05-24 16:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Mansfield, H. Peter Anvin, Kay Sievers, Petr Baudis,
	Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505240110580.2307@ppc970.osdl.org>

Hello Linus,
I tried cvs2git and have the following problem:

	---------------------
	PatchSet 26
	Date: 1998/06/20 03:53:44
	Author: roessler
	Branch: mutt-0-93
	Ancestor branch: HEAD
	Tag: (none)
	Log:
	documenting alias-path

	Members:
		doc/manual.sgml:1.2->1.2.2.1

	---------------------

And your script does that:

	export GIT_COMMITTER_NAME=roessler
	export GIT_COMMITTER_EMAIL=roessler
	export GIT_AUTHOR_NAME=roessler
	export GIT_AUTHOR_EMAIL=roessler
	export GIT_AUTHOR_DATE='1998/06/20 03:53:44'
	ln -sf refs/heads/'master' .git/HEAD
	git-read-tree -m HEAD
	git-checkout-cache -f -u -a
	mkdir -p doc
	co -p -r1.2.2.1 '/home/cip/adm/sithglan/work/mutt/cvsrepository/doc/Attic/manual.sgml,v' > 'doc/manual.sgml'
	git-update-cache --add -- 'doc/manual.sgml'
	tree=$(git-write-tree)
	cat > .cmitmsg <<EOFMSG
	documenting alias-path
	EOFMSG
	commit=$(cat .cmitmsg | git-commit-tree $tree -p HEAD)
	echo $commit > .git/HEAD

The problem might be that this is the first commit in the branch. But I thought
it should end up in refs/heads/mutt-0-93. The problem is that this ends
up as an empty file, and the next time the script works on that branch it
fails because the branch is empty:

	+ export GIT_COMMITTER_NAME=roessler
	+ GIT_COMMITTER_NAME=roessler
	+ export GIT_COMMITTER_EMAIL=roessler
	+ GIT_COMMITTER_EMAIL=roessler
	+ export GIT_AUTHOR_NAME=roessler
	+ GIT_AUTHOR_NAME=roessler
	+ export GIT_AUTHOR_EMAIL=roessler
	+ GIT_AUTHOR_EMAIL=roessler
	+ export 'GIT_AUTHOR_DATE=1998/06/20 07:12:32'
	+ GIT_AUTHOR_DATE=1998/06/20 07:12:32
	+ ln -sf refs/heads/mutt-0-93 .git/HEAD
	+ git-read-tree -m HEAD
	usage: git-read-tree (<sha> | -m <sha1> [<sha2> <sha3>])
	+ git-checkout-cache -f -u -a
	+ co -p -r1.1.1.1.2.2 /home/cip/adm/sithglan/work/mutt/cvsrepository/handler.c,v
	/home/cip/adm/sithglan/work/mutt/cvsrepository/handler.c,v  -->  standard output
	revision 1.1.1.1.2.2
	+ git-update-cache --add -- handler.c
	++ git-write-tree
	+ tree=9e4d085838e4e62a8c4236a6713a7dd8d7b07b4e
	+ cat
	++ cat .cmitmsg
	++ git-commit-tree 9e4d085838e4e62a8c4236a6713a7dd8d7b07b4e -p HEAD
	usage: git-commit-tree <sha1> [-p <sha1>]* < changelog
	+ commit=
	+ echo

	Thomas

^ permalink raw reply

* Re: gitweb wishlist
From: Linus Torvalds @ 2005-05-24 16:31 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: David Mansfield, H. Peter Anvin, Kay Sievers, Petr Baudis,
	Git Mailing List
In-Reply-To: <20050524161745.GA9537@cip.informatik.uni-erlangen.de>



On Tue, 24 May 2005, Thomas Glanzmann wrote:
> 
> And your script does that:
> 
> 	export GIT_COMMITTER_NAME=roessler
> 	export GIT_COMMITTER_EMAIL=roessler
> 	export GIT_AUTHOR_NAME=roessler
> 	export GIT_AUTHOR_EMAIL=roessler
> 	export GIT_AUTHOR_DATE='1998/06/20 03:53:44'
> 	ln -sf refs/heads/'master' .git/HEAD
> 	git-read-tree -m HEAD
> 	git-checkout-cache -f -u -a
> 	mkdir -p doc
> 	co -p -r1.2.2.1 '/home/cip/adm/sithglan/work/mutt/cvsrepository/doc/Attic/manual.sgml,v' > 'doc/manual.sgml'
> 	git-update-cache --add -- 'doc/manual.sgml'
> 	tree=$(git-write-tree)
> 	cat > .cmitmsg <<EOFMSG
> 	documenting alias-path
> 	EOFMSG
> 	commit=$(cat .cmitmsg | git-commit-tree $tree -p HEAD)
> 	echo $commit > .git/HEAD
> 
> The problem might be that this is the first commit in the branch. But I thought
> it should end up in refs/heads/mutt-0-93.

Yes, you're using the cvs2git from yesterday, which didn't write the new
commit to the right branch. This is part of the branch fixing I've done.

Wait another few minutes and I'll commit my fix for the problem with cvsps
branch handling (and I need to escape '$' in <<EOFMSG handling).
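
For the '$' part, a minimal sketch of what such escaping could look like
when the log message is written into an unquoted here-document. The
function name escape_heredoc() and its buffer interface are hypothetical,
not taken from cvs2git. It assumes the usual shell rule that inside an
unquoted <<EOFMSG body the characters '$', '`' and '\' are special, so
each gets a backslash in front. It does not handle a log line that
consists solely of the "EOFMSG" delimiter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy "msg" into "out", prefixing '$', '`' and '\\' with a backslash
 * so the shell leaves them alone inside an unquoted here-document.
 * Returns the number of bytes written (excluding the trailing NUL),
 * or -1 if the escaped text does not fit in "outsize" bytes.
 */
int escape_heredoc(const char *msg, char *out, size_t outsize)
{
	size_t n = 0;

	assert(msg != NULL && out != NULL);
	for (; *msg; msg++) {
		/* *msg is never NUL here, so strchr() only matches the three specials */
		int special = strchr("$`\\", *msg) != NULL;

		/* need room for this character, its escape, and the final NUL */
		if (n + (special ? 2 : 1) >= outsize)
			return -1;
		if (special)
			out[n++] = '\\';
		out[n++] = *msg;
	}
	if (n >= outsize)
		return -1;
	out[n] = 0;
	return (int)n;
}
```

Quoting the delimiter in the generated script (<<'EOFMSG') would be
another way to switch off expansion; the sketch only shows the escaping
approach mentioned above.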

		Linus

^ permalink raw reply
