Fix branch ancestry calculation

* Fix branch ancestry calculation
@ 2006-03-23  1:29 Linus Torvalds
  2006-03-23  1:50 ` [RFC] Make dot-counting ignore ".1" at the end Linus Torvalds
  2006-03-24 14:45 ` Fix branch ancestry calculation David Mansfield
  0 siblings, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2006-03-23  1:29 UTC (permalink / raw)
  To: David Mansfield; +Cc: Git Mailing List

Some branches don't get any ancestors at all, because their ancestor gets 
a "dotcount" value of 0, and is thus not considered any better than not 
having any ancestor. That's obviously wrong. Even a zero-dot-count 
ancestor is better than having none at all.

This fixes the issue by making not having an ancestor branch have a 
goodness value of -1, avoiding the problem (because even a zero dot-count 
will be considered better).

Alternatively, the special-case for the "1.1.1.1" revision should be 
removed (or made to imply a dot-count of 1).

Finally, I suspect that dot-counting in general should ignore any final 
".1" counts, ie "1.2.1.1" should count the same as "1.2.1", which should 
count the same as "1.2", which has a dot-count of 1.

That would automatically make any "1.1.1.1.1...." sequence always count as 
having a dot-count of 0.

I'll send that suggestion as a separate patch, but in the meantime, this 
is a separate issue, and obviously a bug-fix.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

diff --git a/cvsps.c b/cvsps.c
--- a/cvsps.c
+++ b/cvsps.c
@@ -2599,7 +2599,7 @@ static void determine_branch_ancestor(Pa
 	 * note: rev is the pre-commit revision, not the post-commit
 	 */
 	if (!head_ps->ancestor_branch)
-	    d1 = 0;
+	    d1 = -1;
 	else if (strcmp(ps->branch, rev->branch) == 0)
 	    continue;
 	else if (strcmp(head_ps->ancestor_branch, "HEAD") == 0)
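
A minimal sketch of why the sentinel value matters, kept separate from the
patch above. The names dot_count_of() and pick_ancestor() are hypothetical
and do not exist in cvsps; only the "1.1.1.1" special case and the
"d2 > d1" comparison are taken from cvsps.c. The real function also
recomputes d1 from an existing ancestor branch, which this sketch leaves out.

```c
#include <string.h>

/* Goodness of a candidate revision: its dot count, with the existing
 * cvsps special case that "1.1.1.1" scores 0. */
static int dot_count_of(const char *rev)
{
    int dots = 0;

    if (strcmp(rev, "1.1.1.1") == 0)
        return 0;
    while (*rev)
        if (*rev++ == '.')
            dots++;
    return dots;
}

/* Return the index of the best candidate, or -1 if no candidate scores
 * strictly higher than no_ancestor_goodness. */
static int pick_ancestor(const char *const revs[], int n,
                         int no_ancestor_goodness)
{
    int best = no_ancestor_goodness;
    int chosen = -1;
    int i;

    for (i = 0; i < n; i++) {
        int d = dot_count_of(revs[i]);
        if (d > best) {
            best = d;
            chosen = i;
        }
    }
    return chosen;
}
```

With "1.1.1.1" as the only candidate, pick_ancestor(revs, 1, 0) finds no
ancestor (the old behaviour), while pick_ancestor(revs, 1, -1) selects it.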

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC] Make dot-counting ignore ".1" at the end
  2006-03-23  1:29 Fix branch ancestry calculation Linus Torvalds
@ 2006-03-23  1:50 ` Linus Torvalds
  2006-03-23  6:26   ` Keith Packard
  2006-03-24 14:40   ` David Mansfield
  2006-03-24 14:45 ` Fix branch ancestry calculation David Mansfield
  1 sibling, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2006-03-23  1:50 UTC (permalink / raw)
  To: David Mansfield; +Cc: Git Mailing List


I'm not 100% sure this is appropriate, but in general, I think "<rev>" and 
"<rev>.1" should be considered the same thing, no? Which implies that 
"1.1" and "1.1.1.1" are all the same thing, and collapse to just "1", ie a 
zero dot-count. They are all the same version, after all, no?

This gets rid of the insane (?) special case of "1.1.1.1" that exists 
there now, since it's now no longer a special case.

I also wonder if trailing ".1" revisions should be ignored when comparing 
two revisions.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

Yeah, I don't know RCS file logic. This may be completely broken.

diff --git a/cvsps.c b/cvsps.c
index 2695a0f..2ad1595 100644
--- a/cvsps.c
+++ b/cvsps.c
@@ -2357,9 +2357,16 @@ static int revision_affects_branch(CvsFi
 static int count_dots(const char * p)
 {
     int dots = 0;
+    int len = strlen(p);
 
-    while (*p)
-	if (*p++ == '.')
+    while (len > 2) {
+	if (memcmp(p+len-2, ".1", 2))
+		break;
+	len -= 2;
+    }
+
+    while (len)
+	if (p[--len] == '.')
 	    dots++;
 
     return dots;
@@ -2613,7 +2620,7 @@ static void determine_branch_ancestor(Pa
 	/* HACK: we sometimes pretend to derive from the import branch.  
 	 * just don't do that.  this is the easiest way to prevent... 
 	 */
-	d2 = (strcmp(rev->rev, "1.1.1.1") == 0) ? 0 : count_dots(rev->rev);
+	d2 = count_dots(rev->rev);
 	
 	if (d2 > d1)
 	    head_ps->ancestor_branch = rev->branch;
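
For reference, here is the proposed counting rule as a standalone function,
so its effect on a few revision strings can be checked by hand. The name
count_dots_trimmed() is hypothetical; the logic is meant to mirror the
count_dots() hunk above, not to replace it.

```c
#include <string.h>

/* Count the dots in a revision string after dropping every trailing ".1"
 * component. At least one character of the string is always kept. */
static int count_dots_trimmed(const char *p)
{
    int dots = 0;
    size_t len = strlen(p);

    /* Strip trailing ".1" pairs: "1.2.1.1" -> "1.2.1" -> "1.2". */
    while (len > 2 && memcmp(p + len - 2, ".1", 2) == 0)
        len -= 2;

    /* Count the dots in what is left. */
    while (len)
        if (p[--len] == '.')
            dots++;

    return dots;
}
```

Under this rule "1.2.1.1" and "1.2" both count 1, and "1.1" and "1.1.1.1"
both count 0, so the explicit "1.1.1.1" test becomes redundant. A string
such as "1.11" is not trimmed, because its last two characters are "11".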

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC] Make dot-counting ignore ".1" at the end
  2006-03-23  1:50 ` [RFC] Make dot-counting ignore ".1" at the end Linus Torvalds
@ 2006-03-23  6:26   ` Keith Packard
  2006-03-23  6:34     ` Linus Torvalds
  2006-03-24 14:40   ` David Mansfield
  1 sibling, 1 reply; 11+ messages in thread
From: Keith Packard @ 2006-03-23  6:26 UTC (permalink / raw)
  To: Linus Torvalds, Git Mailing List; +Cc: keithp

On Wed, 2006-03-22 at 17:50 -0800, Linus Torvalds wrote:
> I'm not 100% sure this is appropriate, but in general, I think "<rev>" and 
> "<rev>.1" should be considered the same thing, no? Which implies that 
> "1.1" and "1.1.1.1" are all the same thing, and collapse to just "1", ie a 
> zero dot-count. They are all the same version, after all, no?

No. 1.1.1.1 is the first import on the first vendor branch; 1.1 is the
head of the tree.

vendor branches are total CVS magic and need very special treatment. The
initial import sets the 'branch' value in the ,v file to point at the
vendor branch. Subsequent imports leave the branch value alone; a commit
to the trunk will reset the branch to point at the trunk. This means
that use of the default version of the file just after an import gives
you the head of the import tree. It's insane, but that's how it works.

What I've been doing is to treat imports to a vendor branch which occur
sequentially as if they were on the trunk. Imports after an intervening
commit to the trunk are placed on a separate branch.

The best part is that you get the vendor branch named 1.1.1 *even if
you've made a million commits to the trunk*. Which means that you must
ignore the numeric relationship between the vendor branch and the trunk
and merge them together in date order.
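
The placement rule described above can be sketched roughly as follows. This
is an illustration only, not code from parsecvs or cvsps: the struct and
function names are hypothetical, the input is assumed to be already sorted
by date, and treating the initial 1.1 revision as something that does not
count as an intervening trunk commit is an assumption of the sketch.

```c
#include <stddef.h>
#include <time.h>

enum origin { ON_TRUNK, ON_VENDOR };            /* where CVS recorded it */
enum placement { PLACE_TRUNK, PLACE_VENDOR_BRANCH };

struct rev {
    time_t date;
    enum origin origin;
    int is_initial;            /* nonzero for the 1.1 made by the import */
    enum placement placement;  /* filled in by place_revisions() */
};

/* Walk the revisions in date order. Vendor imports are placed on the
 * trunk until a real trunk commit has been seen; later imports go on a
 * separate vendor branch. */
static void place_revisions(struct rev *revs, size_t n)
{
    int trunk_commit_seen = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (revs[i].origin == ON_TRUNK) {
            revs[i].placement = PLACE_TRUNK;
            if (!revs[i].is_initial)
                trunk_commit_seen = 1;
        } else {
            revs[i].placement = trunk_commit_seen ? PLACE_VENDOR_BRANCH
                                                  : PLACE_TRUNK;
        }
    }
}
```

Note that the rule looks only at dates and at which line each revision sits
on; the revision numbers themselves are never compared.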

> This gets rid of the insane (?) special case of "1.1.1.1" that exists 
> there now, since it's now no longer a special case.

Oh, it's a seriously special case, one which takes seriously special
handling, and a careful disregard for normal version number ordering.

> I also wonder if trailing ".1" revisions should be ignored when comparing 
> two revisions.

As 'real' CVS version numbers always have four digits, this doesn't much
matter.

btw -- I've got my parsecvs code doing a pretty good job of discovering
the structure of an arbitrary set of ,v files. The last remaining bit of
code to write is to correctly construct the tree of branches from the
partial trees in each ,v file. With simple trees, things are looking
good; with the xserver CVS tree, I get a couple of mis-hung branches as
the branch tree is wrong. Fixed tomorrow, I think, at which point it
should be able to produce more accurate commits than cvsps does.

-- 
keith.packard@intel.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] Make dot-counting ignore ".1" at the end
  2006-03-23  6:26   ` Keith Packard
@ 2006-03-23  6:34     ` Linus Torvalds
  2006-03-23  7:17       ` Keith Packard
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2006-03-23  6:34 UTC (permalink / raw)
  To: Keith Packard; +Cc: Git Mailing List

On Wed, 22 Mar 2006, Keith Packard wrote:
> 
> No. 1.1.1.1 is the first import on the first vendor branch; 1.1 is the
> head of the tree.

Ok. Discard the second patch. The first one is definitely needed for cvsps 
right now, though.

With that in place (the "make sure we have a proper ancestor branch" 
thing), a "git cvsimport" of the binutils tree seems to be working, at 
least to the point that it seems to have imported 1400+ commits without 
undue complaints. But hey, I'm looking forward to something less 
hacked-together.

		Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] Make dot-counting ignore ".1" at the end
  2006-03-23  6:34     ` Linus Torvalds
@ 2006-03-23  7:17       ` Keith Packard
  0 siblings, 0 replies; 11+ messages in thread
From: Keith Packard @ 2006-03-23  7:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: keithp, Git Mailing List

On Wed, 2006-03-22 at 22:34 -0800, Linus Torvalds wrote:

> With that in place (the "make sure we have a proper ancestor branch" 
> thing), a "git cvsimport" of the binutils tree seems to be working, at 
> least to the point that it seems to have imported 1400+ commits without 
> undue complaints. But hey, I'm looking forward to something less 
> hacked-together.

Yeah, me too. Attempts at importing some of the X.org trees have
resulted in 'less than ideal' repositories.

I stuck a couple of hacks in cvsps myself to get it to deal with 
X.org trees; the first was to increase a static buffer to 'large enough'
to hold X.org-style commit messages (which are enormous).

http://gitweb.freedesktop.org/?p=freedesktop-cvsps;a=summary

shows both minor patches. I should have let people know about these
earlier...

-- 
keith.packard@intel.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] Make dot-counting ignore ".1" at the end
  2006-03-23  1:50 ` [RFC] Make dot-counting ignore ".1" at the end Linus Torvalds
  2006-03-23  6:26   ` Keith Packard
@ 2006-03-24 14:40   ` David Mansfield
  1 sibling, 0 replies; 11+ messages in thread
From: David Mansfield @ 2006-03-24 14:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Mansfield, Git Mailing List

Linus Torvalds wrote:
> I'm not 100% sure this is appropriate, but in general, I think "<rev>" and 
> "<rev>.1" should be considered the same thing, no? Which implies that 
> "1.1" and "1.1.1.1" are all the same thing, and collapse to just "1", ie a 
> zero dot-count. They are all the same version, after all, no?

Hmmm.  I'm not sure about this. Given x.y.z.q... the 'odd' nodes 
(starting from x = position 1) represent branches, not revisions, and 
don't refer to actual concrete objects (just tags if you will) in the 
cvs world.

So if <rev> is something like x.y then x.y.z would refer to the 'z' branch.

Furthermore, 'z' better be an even value 2 4 6 etc. because those are 
the only branch id's cvs will create.  The odd values are for 'imported 
source' branches.

The reason 1.1.1.1 exists is some lame-ass crap that CVS delivers to any 
developer who imports his/her initial source code.

It creates 1.1 as a placeholder, and I think in this special case it has 
the same contents.  It also creates a .1 'import branch' then puts the
imported revision onto that 'import' branch.

In a normal situation, you have rev = x.y

You branch, it 'registers' a branch x.y.z where z in {2,4,6...} (and 
uses a special 'magic branch' syntax x.y.0.z in the symbolic tags 
section).

Only when you commit your first change does it create x.y.z.1.

So we have:

x.y != x.y.z.1 for sure, in the general case.

Also x.y.z will never be x.y.1 for a user created branch because z must 
be an even number (except for import branches); in any case x.y.z is never 
an actual file revision.  Now, it COULD be that there needs to be special 
handling for x.y.z where z == 1 because that is an import branch and 
something devilish is happening there.
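
To make the numbering rules above concrete, here is a rough classifier for
dotted revision strings. It is only a sketch of the rules as stated in this
message (odd component count = branch, even = revision, odd last component
on a branch = import branch, "x.y.0.z" = magic branch tag); the enum and
function names are hypothetical and nothing here comes from cvsps.

```c
#include <ctype.h>
#include <stdlib.h>

enum rev_kind {
    KIND_INVALID,
    KIND_REVISION,       /* even number of components, e.g. 1.2.2.1 */
    KIND_BRANCH,         /* odd number of components, e.g. 1.2.2 */
    KIND_VENDOR_BRANCH,  /* branch whose last component is odd, e.g. 1.1.1 */
    KIND_MAGIC_BRANCH    /* x.y.0.z as found in the symbolic tags */
};

static enum rev_kind classify_rev(const char *s)
{
    long comp[32];
    int n = 0;

    /* Split the string into its numeric components. */
    while (*s) {
        char *end;

        if (n == 32 || !isdigit((unsigned char)*s))
            return KIND_INVALID;
        comp[n++] = strtol(s, &end, 10);
        if (*end == '.') {
            s = end + 1;
            if (*s == '\0')
                return KIND_INVALID;    /* trailing dot */
        } else if (*end == '\0') {
            s = end;
        } else {
            return KIND_INVALID;
        }
    }
    if (n == 0)
        return KIND_INVALID;

    if (n % 2 == 1) {
        if (n >= 3 && comp[n - 1] % 2 == 1)
            return KIND_VENDOR_BRANCH;
        return KIND_BRANCH;
    }
    if (n >= 4 && comp[n - 2] == 0)
        return KIND_MAGIC_BRANCH;
    return KIND_REVISION;
}
```

So "1.2" and "1.2.2.1" are revisions, "1.2.2" is an ordinary branch,
"1.1.1" is an import branch, and "1.2.0.2" is the magic-branch spelling of
branch 1.2.2.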

I honestly don't know...

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fix branch ancestry calculation
  2006-03-23  1:29 Fix branch ancestry calculation Linus Torvalds
  2006-03-23  1:50 ` [RFC] Make dot-counting ignore ".1" at the end Linus Torvalds
@ 2006-03-24 14:45 ` David Mansfield
  2006-03-24 15:46   ` Linus Torvalds
  1 sibling, 1 reply; 11+ messages in thread
From: David Mansfield @ 2006-03-24 14:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Mansfield, Git Mailing List

Linus Torvalds wrote:
> Some branches don't get any ancestors at all, because their ancestor gets 
> a "dotcount" value of 0, and are thus not considered any better than not 
> having any ancestor. That's obviously wrong. Even a zero-dot-count 
> ancestor is better than having none at all.
> 
> This fixes the issue by making not having an ancestor branch have a 
> goodness value of -1, avoiding the problem (because even a zero dot-count 
> will be considered better).
> 
> Alternatively, the special-case for the "1.1.1.1" revision should be 
> removed (or made to imply a dot-count of 1).
> 


Thanks for this.  I'll look at bundling this and some miscellaneous 
other stuff this weekend (pray to gods for rain so I can stay in all 
weekend ;-).

Anyway, I'd like to nail down some of the other nagging ancestry/branch 
point problems if possible.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fix branch ancestry calculation
  2006-03-24 14:45 ` Fix branch ancestry calculation David Mansfield
@ 2006-03-24 15:46   ` Linus Torvalds
  2006-03-24 16:38     ` Keith Packard
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2006-03-24 15:46 UTC (permalink / raw)
  To: David Mansfield; +Cc: David Mansfield, Git Mailing List

On Fri, 24 Mar 2006, David Mansfield wrote:
> 
> Anyway, I'd like to nail down some of the other nagging ancestry/branch point
> problems if possible.

What I considered doing was to just ignore the branch ancestry that cvsps 
gives us, and instead use whatever branch that is closest (ie generates 
the minimal diff). That's really wrong too (the data just _has_ to be in 
CVS somehow), but I just don't know how CVS handles branches, and it's how 
we'd have to do merges if we were to ever support them (since afaik, the 
merge-back information simply doesn't exist in CVS).

I actually went back to read some of the original CVS papers, and realized 
that CVS _without_ branches actually makes perfect sense.

Suddenly it was a perfectly reasonable system: the fact that you can only 
merge once (between working tree and repo) is perfectly reasonable when 
there is only one branch and checking in requires you to have updated 
first. All the things I really hated about CVS just go away if you don't 
do any branches at all.

Of course, it's a much less powerful thing without branches, but what I'm 
getting at is that the whole branch support seems to have been a total 
crock added later on top of something that was never designed for it, and 
where the data-structures aren't even set up for it.

Live and learn. (Of course, maybe I'm wrong, and the thing doesn't make 
sense even without branches).

			Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fix branch ancestry calculation
  2006-03-24 15:46   ` Linus Torvalds
@ 2006-03-24 16:38     ` Keith Packard
  2006-03-25  1:45       ` Chris Shoemaker
  0 siblings, 1 reply; 11+ messages in thread
From: Keith Packard @ 2006-03-24 16:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: keithp, David Mansfield, David Mansfield, Git Mailing List

On Fri, 2006-03-24 at 07:46 -0800, Linus Torvalds wrote:
> 
> On Fri, 24 Mar 2006, David Mansfield wrote:
> > 
> > Anyway, I'd like to nail down some of the other nagging ancestry/branch point
> > problems if possible.
> 
> What I considered doing was to just ignore the branch ancestry that cvsps 
> gives us, and instead use whatever branch that is closest (ie generates 
> the minimal diff). That's really wrong too (the data just _has_ to be in 
> CVS somehow), but I just don't know how CVS handles branches, and it's how 
> we'd have to do merges if we were to ever support them (since afaik, the 
> merge-back information simply doesn't exist in CVS).

cvsps is more of a problem than cvs itself. Per-file branch information
is readily available in the ,v files; each version has a list of
branches from that version, and there are even tags marking the names of
them. One issue that I've discovered is when files have differing branch
structure in the same repository. That happens when a branch is created
while files are checked out on different branches.  I'm not quite sure
what to do in this case; I've been trying several approaches and none
seem optimal. One remaining plan is to just attach such branches by
date, but that assumes that the first commit along a branch occurs
shortly after the branch is created (which isn't required).

Of course, this branch information is only created when a change is made
to the file along said branch, so most of the repository will lack
precise branch information for each branch. When you create a child
branch, the files with no commits in the parent branch will never get
branch information, so the child branch will be numbered as if it were a
branch off of the grandparent. Globally, it is possible to reconstruct
the entire branch structure.

> Suddenly it was a perfectly reasonable system: the fact that you can only 
> merge once (between working tree and repo) is perfectly reasonable when 
> there is only one branch and checking in requires you to have updated 
> first. All the things I really hated about CVS just go away if you don't 
> do any branches at all.

If you look at how deltas are stored in the file you get an even
stronger argument -- CVS has always advertised that it stores deltas
'backwards' so that the current version is first in the file. That's
true for the trunk, but for every other branch, you have to seek back
from the tip of the trunk to the branch point and then walk forwards to
the desired version along the branch.
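
As a rough illustration of that cost, the sketch below counts the deltas
that have to be applied to reach a revision. It is deliberately simplified
and makes assumptions that real ,v files need not satisfy: the trunk is
1.1 .. 1.H with no gaps, the target is at most one branch level deep, and
branch revisions are numbered contiguously from .1. The names are
hypothetical.

```c
struct delta_count {
    int reverse;   /* deltas applied walking back from the trunk tip */
    int forward;   /* deltas applied walking out along the branch */
};

/* head_minor:  H, where the trunk tip is 1.H
 * point_minor: N, where the target is 1.N or sits on a branch off 1.N
 * branch_rev:  M for a target 1.N.B.M, or 0 for the trunk revision 1.N
 * Returns 0 on success, -1 if the arguments are inconsistent. */
static int count_deltas(int head_minor, int point_minor, int branch_rev,
                        struct delta_count *out)
{
    if (point_minor < 1 || point_minor > head_minor || branch_rev < 0)
        return -1;

    out->reverse = head_minor - point_minor;
    out->forward = branch_rev;
    return 0;
}
```

With the trunk tip at 1.5, revision 1.5 itself needs no deltas, 1.3 needs
two reverse deltas, and 1.3.2.2 needs those same two reverse deltas plus
two forward ones.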

-- 
keith.packard@intel.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fix branch ancestry calculation
  2006-03-24 16:38     ` Keith Packard
@ 2006-03-25  1:45       ` Chris Shoemaker
  2006-03-25  7:54         ` Keith Packard
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Shoemaker @ 2006-03-25  1:45 UTC (permalink / raw)
  To: Keith Packard
  Cc: Linus Torvalds, David Mansfield, David Mansfield,
	Git Mailing List

On Fri, Mar 24, 2006 at 08:38:58AM -0800, Keith Packard wrote:
> On Fri, 2006-03-24 at 07:46 -0800, Linus Torvalds wrote:
> > 
> > On Fri, 24 Mar 2006, David Mansfield wrote:
> > > 
> > > Anyway, I'd like to nail down some of the other nagging ancestry/branch point
> > > problems if possible.
> > 
> > What I considered doing was to just ignore the branch ancestry that cvsps 
> > gives us, and instead use whatever branch that is closest (ie generates 
> > the minimal diff). That's really wrong too (the data just _has_ to be in 
> > CVS somehow), but I just don't know how CVS handles branches, and it's how 
> > we'd have to do merges if we were to ever support them (since afaik, the 
> > merge-back information simply doesn't exist in CVS).
> 
> cvsps is more of a problem than cvs itself. Per-file branch information
> is readily available in the ,v files; each version has a list of
> branches from that version, and there are even tags marking the names of
> them. One issue that I've discovered is when files have differing branch
> structure in the same repository. That happens when a branch is created
> while files are checked out on different branches.  I'm not quite sure
> what to do in this case; I've been trying several approaches and none
> seem optimal. One remaining plan is to just attach such branches by
> date, but that assumes that the first commit along a branch occurs
> shortly after the branch is created (which isn't required).
> 
> Of course, this branch information is only created when a change is made
> to the file along said branch, so most of the repository will lack
> precise branch information for each branch. When you create a child
> branch, the files with no commits in the parent branch will never get
> branch information, so the child branch will be numbered as if it were a
> branch off of the grandparent. Globally, it is possible to reconstruct
> the entire branch structure.

If that last sentence was a typo then you already know this, but
otherwise you may be disappointed to learn that it's not _always_
possible to discern the correct ancestry tree.

The simplest counter-example is two branches where each adds one file
and no files in common are modified.  If A and B both branched off of
HEAD and each adds one file, then they should each only have one file.
But if B branched from A which branched from HEAD, then B should also
have the file that was added to A. (*)  However, the information to
distinguish these two cases isn't recorded in CVS.  

I seem to have described this example more fully in the notes I took
while writing the patch to cvsps that does the global inference
you're describing.  You _usually_ can make a very good guess, and the
more files that are modified, the better you can do.

BTW, those notes are still available here:
http://www.codesifter.com/cvsps-notes.txt 

If you end up comparing the ancestry tree discovered by your tool and
the tree output by a patched cvsps, I would be very interested in the
results.

-chris

(*) You can distinguish between A->B->head and B->A->head simply by
date.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fix branch ancestry calculation
  2006-03-25  1:45       ` Chris Shoemaker
@ 2006-03-25  7:54         ` Keith Packard
  0 siblings, 0 replies; 11+ messages in thread
From: Keith Packard @ 2006-03-25  7:54 UTC (permalink / raw)
  To: Chris Shoemaker
  Cc: keithp, Linus Torvalds, David Mansfield, David Mansfield,
	Git Mailing List

On Fri, 2006-03-24 at 20:45 -0500, Chris Shoemaker wrote:

> If that last sentence was a typo then you already know this, but
> otherwise you may be disappointed to learn that it's not _always_
> possible to discern the correct ancestry tree.

Sure, it's possible to generate trees which can't be figured out. So
far, I haven't found any which can't be pieced back together, except in
cases where the tree was accidentally damaged (child branches created on
two separate parent branches).

> If you end up comparing the ancestry tree discovered by your tool and
> the tree output by a patched cvsps, I would be very interested in the
> results.

So far, I've found several concrete trees where cvsps (in any form)
assigns branch points many versions too early compared to the 'true'
history. My tool is getting better answers, but still can't compute the
tree for the X.org X server tree yet. That one has a wide variety of
damage, including the direct copying of ,v files between repositories
which had diverged, and the accidental branching of files from different
parent branches. I keep poking at it...

> -chris
> 
> (*) You can distinguish between A->B->head and B->A->head simply by
> date.

I'm doing a lot more date-based identification than I'm really
comfortable with; the bad thing here is that branch points can occur
long before any commits to that branch. When doing date-based
operations, you have a range of possible matching branch points and it's
hard to disambiguate.

-- 
keith.packard@intel.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2006-03-25  7:54 UTC | newest]

Thread overview: 11+ messages
2006-03-23  1:29 Fix branch ancestry calculation Linus Torvalds
2006-03-23  1:50 ` [RFC] Make dot-counting ignore ".1" at the end Linus Torvalds
2006-03-23  6:26   ` Keith Packard
2006-03-23  6:34     ` Linus Torvalds
2006-03-23  7:17       ` Keith Packard
2006-03-24 14:40   ` David Mansfield
2006-03-24 14:45 ` Fix branch ancestry calculation David Mansfield
2006-03-24 15:46   ` Linus Torvalds
2006-03-24 16:38     ` Keith Packard
2006-03-25  1:45       ` Chris Shoemaker
2006-03-25  7:54         ` Keith Packard
