* Fix branch ancestry calculation
From: Linus Torvalds @ 2006-03-23  1:29 UTC
To: David Mansfield; +Cc: Git Mailing List

Some branches don't get any ancestors at all, because their ancestor gets
a "dotcount" value of 0, and are thus not considered any better than not
having any ancestor. That's obviously wrong. Even a zero-dot-count
ancestor is better than having none at all.

This fixes the issue by making not having an ancestor branch have a
goodness value of -1, avoiding the problem (because even a zero dot-count
will be considered better).

Alternatively, the special-case for the "1.1.1.1" revision should be
removed (or made to imply a dot-count of 1).

Finally, I suspect that dot-counting in general should ignore any final
".1" counts, ie "1.2.1.1" should count the same as "1.2.1", which should
count the same as "1.2", which has a dot-count of 1. That would
automatically make any "1.1.1.1.1...." sequence always count as having a
dot-count of 0.

I'll send that suggestion as a separate patch, but in the meantime, this
is a separate issue, and obviously a bug-fix.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
----
diff --git a/cvsps.c b/cvsps.c
--- a/cvsps.c
+++ b/cvsps.c
@@ -2599,7 +2599,7 @@ static void determine_branch_ancestor(Pa
 	     * note: rev is the pre-commit revision, not the post-commit
 	     */
 	    if (!head_ps->ancestor_branch)
-		d1 = 0;
+		d1 = -1;
 	    else if (strcmp(ps->branch, rev->branch) == 0)
 		continue;
 	    else if (strcmp(head_ps->ancestor_branch, "HEAD") == 0)

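The effect of the one-line change above can be seen in isolation with a small sketch. This is not the cvsps code: `struct candidate`, `pick_ancestor` and the `no_ancestor_score` parameter are hypothetical names made up for illustration, and the real `determine_branch_ancestor` recomputes `d1` from the current ancestor rather than carrying it along. The only point is that a strict "greater than" comparison against a starting score of 0 can never accept a candidate whose dot-count is 0, while a starting score of -1 can.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one possible ancestor branch and the
 * dot-count ("goodness") computed for it. */
struct candidate {
    const char *branch;
    int dotcount;
};

/* Return the branch with the highest dot-count, or NULL if no candidate
 * beats the score assigned to "no ancestor yet".
 * no_ancestor_score is 0 before the patch and -1 after it. */
static const char *pick_ancestor(const struct candidate *c, size_t n,
                                 int no_ancestor_score)
{
    const char *best = NULL;
    int d1 = no_ancestor_score;
    size_t i;

    assert(c != NULL || n == 0);

    for (i = 0; i < n; i++) {
        int d2 = c[i].dotcount;
        if (d2 > d1) {          /* strictly better than what we have */
            best = c[i].branch;
            d1 = d2;
        }
    }
    return best;
}
```

With a single candidate whose dot-count is 0, `pick_ancestor(c, 1, 0)` returns NULL (the branch ends up with no ancestor), while `pick_ancestor(c, 1, -1)` returns that candidate's branch.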
* [RFC] Make dot-counting ignore ".1" at the end
From: Linus Torvalds @ 2006-03-23  1:50 UTC
To: David Mansfield; +Cc: Git Mailing List

I'm not 100% sure this is appropriate, but in general, I think "<rev>" and
"<rev>.1" should be considered the same thing, no? Which implies that
"1.1" and "1.1.1.1" are all the same thing, and collapse to just "1", ie a
zero dot-count. They are all the same version, after all, no?

This gets rid of the insane (?) special case of "1.1.1.1" that exists
there now, since it's now no longer a special case.

I also wonder if trailing ".1" revisions should be ignored when comparing
two revisions.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

Yeah, I don't know RCS file logic. This may be completely broken.

diff --git a/cvsps.c b/cvsps.c
index 2695a0f..2ad1595 100644
--- a/cvsps.c
+++ b/cvsps.c
@@ -2357,9 +2357,16 @@ static int revision_affects_branch(CvsFi
 static int count_dots(const char * p)
 {
     int dots = 0;
+    int len = strlen(p);
 
-    while (*p)
-	if (*p++ == '.')
+    while (len > 2) {
+	if (memcmp(p+len-2, ".1", 2))
+	    break;
+	len -= 2;
+    }
+
+    while (len)
+	if (p[--len] == '.')
 	    dots++;
 
     return dots;
@@ -2613,7 +2620,7 @@ static void determine_branch_ancestor(Pa
 	    /* HACK: we sometimes pretend to derive from the import branch.
 	     * just don't do that.  this is the easiest way to prevent...
 	     */
-	    d2 = (strcmp(rev->rev, "1.1.1.1") == 0) ? 0 : count_dots(rev->rev);
+	    d2 = count_dots(rev->rev);
 	    if (d2 > d1)
 		head_ps->ancestor_branch = rev->branch;

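For anyone who wants to try the proposed counting rule outside of cvsps, here is the patched `count_dots` pulled out as a standalone function. The logic is taken from the RFC diff above; the `assert` on the argument and the comments are additions, and this is only a sketch of what the RFC proposes, not a statement about how CVS revision numbers ought to be compared.

```c
#include <assert.h>
#include <string.h>

/* Count the dots in a revision string after dropping every trailing
 * ".1" component, as the RFC patch proposes. */
static int count_dots(const char *p)
{
    int dots = 0;
    int len;

    assert(p != NULL);
    len = (int)strlen(p);

    /* Strip trailing ".1" pairs; stop once two or fewer characters
     * remain, so a bare "1" is left alone. */
    while (len > 2) {
        if (memcmp(p + len - 2, ".1", 2))
            break;
        len -= 2;
    }

    /* Count the dots in what is left. */
    while (len)
        if (p[--len] == '.')
            dots++;

    return dots;
}
```

Under this rule "1.1" and "1.1.1.1" both give 0, "1.2" and "1.2.1.1" both give 1, and "1.2.2.3" gives 3. Note that the stripping is purely textual: it treats a trailing ".1" as meaningless whether it is a revision on a branch or a branch number.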
* Re: [RFC] Make dot-counting ignore ".1" at the end
From: Keith Packard @ 2006-03-23  6:26 UTC
To: Linus Torvalds, Git Mailing List; +Cc: keithp

On Wed, 2006-03-22 at 17:50 -0800, Linus Torvalds wrote:
> I'm not 100% sure this is appropriate, but in general, I think "<rev>" and
> "<rev>.1" should be considered the same thing, no? Which implies that
> "1.1" and "1.1.1.1" are all the same thing, and collapse to just "1", ie a
> zero dot-count. They are all the same version, after all, no?

No. 1.1.1.1 is the first import on the first vendor branch; 1.1 is the
head of the tree. Vendor branches are total CVS magic and need very
special treatment.

The initial import sets the 'branch' value in the ,v file to point at the
vendor branch. Subsequent imports leave the branch value alone; a commit
to the trunk will reset the branch to point at the trunk. This means that
use of the default version of the file just after an import gives you the
head of the import tree. It's insane, but that's how it works.

What I've been doing is to treat imports to a vendor branch which occur
sequentially as if they were on the trunk. Imports after an intervening
commit to the trunk are placed on a separate branch.

The best part is that you get the vendor branch named 1.1.1 *even if
you've made a million commits to the trunk*. Which means that you must
ignore the numeric relationship between the vendor branch and the trunk
and merge them together in date order.

> This gets rid of the insane (?) special case of "1.1.1.1" that exists
> there now, since it's now no longer a special case.

Oh, it's a seriously special case, one which takes seriously special
handling, and a careful disregard for normal version number ordering.

> I also wonder if trailing ".1" revisions should be ignored when comparing
> two revisions.

As 'real' CVS version numbers always have four digits, this doesn't much
matter.

btw -- I've got my parsecvs code doing a pretty good job of discovering
the structure of an arbitrary set of ,v files. The last remaining bit of
code to write is to correctly construct the tree of branches from the
partial trees in each ,v file. With simple trees, things are looking good;
with the xserver CVS tree, I get a couple of mis-hung branches as the
branch tree is wrong. Fixed tomorrow, I think, at which point it should be
able to produce more accurate commits than cvsps does.

--
keith.packard@intel.com

* Re: [RFC] Make dot-counting ignore ".1" at the end
From: Linus Torvalds @ 2006-03-23  6:34 UTC
To: Keith Packard; +Cc: Git Mailing List

On Wed, 22 Mar 2006, Keith Packard wrote:
>
> No. 1.1.1.1 is the first import on the first vendor branch; 1.1 is the
> head of the tree.

Ok. Discard the second patch. The first one is definitely needed for cvsps
right now, though.

With that in place (the "make sure we have a proper ancestor branch"
thing), a "git cvsimport" of the binutils tree seems to be working, at
least to the point that it seems to have imported 1400+ commits without
undue complaints. But hey, I'm looking forward to something less
hacked-together.

		Linus

* Re: [RFC] Make dot-counting ignore ".1" at the end
From: Keith Packard @ 2006-03-23  7:17 UTC
To: Linus Torvalds; +Cc: keithp, Git Mailing List

On Wed, 2006-03-22 at 22:34 -0800, Linus Torvalds wrote:
> With that in place (the "make sure we have a proper ancestor branch"
> thing), a "git cvsimport" of the binutils tree seems to be working, at
> least to the point that it seems to have imported 1400+ commits without
> undue complaints. But hey, I'm looking forward to something less
> hacked-together.

Yeah, me too. Attempts at importing some of the X.org trees have resulted
in 'less than ideal' repositories.

I stuck a couple of hacks in cvsps myself to get it to deal with X.org
trees; the first was to increase a static buffer to 'large enough' to hold
X.org-style commit messages (which are enormous).

http://gitweb.freedesktop.org/?p=freedesktop-cvsps;a=summary

shows both minor patches. I should have let people know about these
earlier...

--
keith.packard@intel.com

* Re: [RFC] Make dot-counting ignore ".1" at the end
From: David Mansfield @ 2006-03-24 14:40 UTC
To: Linus Torvalds; +Cc: David Mansfield, Git Mailing List

Linus Torvalds wrote:
> I'm not 100% sure this is appropriate, but in general, I think "<rev>" and
> "<rev>.1" should be considered the same thing, no? Which implies that
> "1.1" and "1.1.1.1" are all the same thing, and collapse to just "1", ie a
> zero dot-count. They are all the same version, after all, no?

Hmmm. I'm not sure about this. Given x.y.z.q... the 'odd' nodes (starting
from x = position 1) represent branches, not revisions, and don't refer to
actual concrete objects (just tags if you will) in the cvs world.

So if <rev> is something like x.y then x.y.z would refer to the 'z'
branch. Furthermore, 'z' better be an even value 2 4 6 etc. because those
are the only branch IDs cvs will create. The odd values are for 'imported
source' branches.

The reason 1.1.1.1 exists is some lame-ass crap that CVS delivers to any
developer who imports his/her initial source code. It creates 1.1 as a
placeholder, and I think in this special case it has the same contents. It
also creates a .1 'import branch' then puts the imported revision onto
that 'import' branch.

In a normal situation, you have rev = x.y

You branch, it 'registers' a branch x.y.z where z in {2,4,6...} (and uses
a special 'magic branch' syntax x.y.0.z in the symbolic tags section).

Only when you commit your first change does it create x.y.z.1.

So we have: x.y != x.y.z.1 for sure, in the general case.

Also x.y.z will never be x.y.1 for a user created branch because z must be
an even number (except for import branches). In any case, x.y.z is never an
actual file revision.

Now, it COULD be the fact that there needs to be special handling for
x.y.z where z == 1 because that is an import branch and something devilish
is happening there. I honestly don't know...

David

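The numbering rules described in the message above can be written down as a small classifier. This is a sketch based only on that description (odd number of components means a branch, even means a revision, `x.y.0.z` is the 'magic branch' form stored in the symbolic tags, and an odd final branch id marks an import branch). The names `rev_kind`, `classify_rev` and `is_import_branch` are invented for illustration and do not come from cvsps or CVS.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum rev_kind { REV_REVISION, REV_BRANCH, REV_MAGIC_BRANCH };

/* Classify a dotted CVS number by its shape alone. */
static enum rev_kind classify_rev(const char *rev)
{
    int components = 1;
    const char *p, *last, *prev;

    assert(rev != NULL);

    for (p = rev; *p; p++)
        if (*p == '.')
            components++;

    /* Odd number of components (x.y.z): a branch, not a file revision. */
    if (components % 2 == 1)
        return REV_BRANCH;

    /* Even, four or more components: check for the magic branch form
     * x.y.0.z, i.e. a second-to-last component that is exactly "0". */
    if (components >= 4) {
        last = strrchr(rev, '.');
        prev = last;
        while (prev > rev && prev[-1] != '.')
            prev--;
        if (last - prev == 1 && *prev == '0')
            return REV_MAGIC_BRANCH;
    }

    return REV_REVISION;
}

/* For a branch number x.y.z: nonzero if z is odd, which the message
 * above associates with import branches (user branches get even ids). */
static int is_import_branch(const char *branch)
{
    const char *last;

    assert(branch != NULL);
    last = strrchr(branch, '.');
    return strtol(last ? last + 1 : branch, NULL, 10) % 2 == 1;
}
```

Here "1.2" and "1.2.2.1" classify as revisions, "1.2.2" as a branch and "1.2.0.2" as a magic branch, while `is_import_branch("1.1.1")` is true and `is_import_branch("1.2.2")` is false. That is also why stripping a trailing ".1" textually conflates different things: in "1.2.2.1" it is the first revision on a branch, in "1.1.1" it is a branch id.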
* Re: Fix branch ancestry calculation
From: David Mansfield @ 2006-03-24 14:45 UTC
To: Linus Torvalds; +Cc: David Mansfield, Git Mailing List

Linus Torvalds wrote:
> Some branches don't get any ancestors at all, because their ancestor gets
> a "dotcount" value of 0, and are thus not considered any better than not
> having any ancestor. That's obviously wrong. Even a zero-dot-count
> ancestor is better than having none at all.
>
> This fixes the issue by making not having an ancestor branch have a
> goodness value of -1, avoiding the problem (because even a zero dot-count
> will be considered better).
>
> Alternatively, the special-case for the "1.1.1.1" revision should be
> removed (or made to imply a dot-count of 1).
>

Thanks for this. I'll look at bundling this and some miscellaneous other
stuff this weekend (pray to gods for rain so I can stay in all weekend
;-).

Anyway, I'd like to nail down some of the other nagging ancestry/branch
point problems if possible.

David

* Re: Fix branch ancestry calculation
From: Linus Torvalds @ 2006-03-24 15:46 UTC
To: David Mansfield; +Cc: David Mansfield, Git Mailing List

On Fri, 24 Mar 2006, David Mansfield wrote:
>
> Anyway, I'd like to nail down some of the other nagging ancestry/branch point
> problems if possible.

What I considered doing was to just ignore the branch ancestry that cvsps
gives us, and instead use whatever branch that is closest (ie generates
the minimal diff). That's really wrong too (the data just _has_ to be in
CVS somehow), but I just don't know how CVS handles branches, and it's how
we'd have to do merges if we were to ever support them (since afaik, the
merge-back information simply doesn't exist in CVS).

I actually went back to read some of the original CVS papers, and realized
that CVS _without_ branches actually makes perfect sense.

Suddenly it was a perfectly reasonable system: the fact that you can only
merge once (between working tree and repo) is perfectly reasonable when
there is only one branch and checking in requires you to have updated
first. All the things I really hated about CVS just go away if you don't
do any branches at all.

Of course, it's a much less powerful thing without branches, but what I'm
getting at is that the whole branch support seems to have been a total
crock added later on top of something that was never designed for it, and
where the data-structures aren't even set up for it.

Live and learn. (Of course, maybe I'm wrong, and the thing doesn't make
sense even without branches).

		Linus

* Re: Fix branch ancestry calculation
From: Keith Packard @ 2006-03-24 16:38 UTC
To: Linus Torvalds; +Cc: keithp, David Mansfield, David Mansfield, Git Mailing List

On Fri, 2006-03-24 at 07:46 -0800, Linus Torvalds wrote:
>
> On Fri, 24 Mar 2006, David Mansfield wrote:
> >
> > Anyway, I'd like to nail down some of the other nagging ancestry/branch point
> > problems if possible.
>
> What I considered doing was to just ignore the branch ancestry that cvsps
> gives us, and instead use whatever branch that is closest (ie generates
> the minimal diff). That's really wrong too (the data just _has_ to be in
> CVS somehow), but I just don't know how CVS handles branches, and it's how
> we'd have to do merges if we were to ever support them (since afaik, the
> merge-back information simply doesn't exist in CVS).

cvsps is more of a problem than cvs itself. Per-file branch information
is readily available in the ,v files; each version has a list of
branches from that version, and there are even tags marking the names of
them. One issue that I've discovered is when files have differing branch
structure in the same repository. That happens when a branch is created
while files are checked out on different branches. I'm not quite sure
what to do in this case; I've been trying several approaches and none
seem optimal. One remaining plan is to just attach such branches by
date, but that assumes that the first commit along a branch occurs
shortly after the branch is created (which isn't required).

Of course, this branch information is only created when a change is made
to the file along said branch, so most of the repository will lack
precise branch information for each branch. When you create a child
branch, the files with no commits in the parent branch will never get
branch information, so the child branch will be numbered as if it were a
branch off of the grandparent. Globally, it is possible to reconstruct
the entire branch structure.

> Suddenly it was a perfectly reasonable system: the fact that you can only
> merge once (between working tree and repo) is perfectly reasonable when
> there is only one branch and checking in requires you to have updated
> first. All the things I really hated about CVS just go away if you don't
> do any branches at all.

If you look at how deltas are stored in the file you get an even
stronger argument -- CVS has always advertised that it stores deltas
'backwards' so that the current version is first in the file. That's
true for the trunk, but for every other branch, you have to seek back
from the tip of the trunk to the branch point and then walk forwards to
the desired version along the branch.

--
keith.packard@intel.com

* Re: Fix branch ancestry calculation
From: Chris Shoemaker @ 2006-03-25  1:45 UTC
To: Keith Packard
Cc: Linus Torvalds, David Mansfield, David Mansfield, Git Mailing List

On Fri, Mar 24, 2006 at 08:38:58AM -0800, Keith Packard wrote:
> On Fri, 2006-03-24 at 07:46 -0800, Linus Torvalds wrote:
> >
> > On Fri, 24 Mar 2006, David Mansfield wrote:
> > >
> > > Anyway, I'd like to nail down some of the other nagging ancestry/branch point
> > > problems if possible.
> >
> > What I considered doing was to just ignore the branch ancestry that cvsps
> > gives us, and instead use whatever branch that is closest (ie generates
> > the minimal diff). That's really wrong too (the data just _has_ to be in
> > CVS somehow), but I just don't know how CVS handles branches, and it's how
> > we'd have to do merges if we were to ever support them (since afaik, the
> > merge-back information simply doesn't exist in CVS).
>
> cvsps is more of a problem than cvs itself. Per-file branch information
> is readily available in the ,v files; each version has a list of
> branches from that version, and there are even tags marking the names of
> them. One issue that I've discovered is when files have differing branch
> structure in the same repository. That happens when a branch is created
> while files are checked out on different branches. I'm not quite sure
> what to do in this case; I've been trying several approaches and none
> seem optimal. One remaining plan is to just attach such branches by
> date, but that assumes that the first commit along a branch occurs
> shortly after the branch is created (which isn't required).
>
> Of course, this branch information is only created when a change is made
> to the file along said branch, so most of the repository will lack
> precise branch information for each branch. When you create a child
> branch, the files with no commits in the parent branch will never get
> branch information, so the child branch will be numbered as if it were a
> branch off of the grandparent. Globally, it is possible to reconstruct
> the entire branch structure.

If that last sentence was a typo then you already know this, but
otherwise you may be disappointed to learn that it's not _always_
possible to discern the correct ancestry tree.

The simplest counter-example is two branches where each adds one file
and no files in common are modified. If A and B both branched off of
HEAD and each adds one file, then they should each only have one file.
But if B branched from A which branched from HEAD, then B should also
have the file that was added to A. (*) However, the information to
distinguish these two cases isn't recorded in CVS.

I seem to have described this example more fully in the notes I took
while writing the patch to cvsps that does the global inference you're
describing. You _usually_ can make a very good guess, and the more
files that are modified, the better you can do.

BTW, those notes are still available here:
http://www.codesifter.com/cvsps-notes.txt

If you end up comparing the ancestry tree discovered by your tool and
the tree output by a patched cvsps, I would be very interested in the
results.

-chris

(*) You can distinguish between A->B->head and B->A->head simply by
date.

* Re: Fix branch ancestry calculation
From: Keith Packard @ 2006-03-25  7:54 UTC
To: Chris Shoemaker
Cc: keithp, Linus Torvalds, David Mansfield, David Mansfield, Git Mailing List

On Fri, 2006-03-24 at 20:45 -0500, Chris Shoemaker wrote:

> If that last sentence was a typo then you already know this, but
> otherwise you may be disappointed to learn that it's not _always_
> possible to discern the correct ancestry tree.

Sure, it's possible to generate trees which can't be figured out. So
far, I haven't found any which can't be pieced back together, except in
cases where the tree was accidentally damaged (child branches created on
two separate parent branches).

> If you end up comparing the ancestry tree discovered by your tool and
> the tree output by a patched cvsps, I would be very interested in the
> results.

So far, I've found several concrete trees where cvsps (in any form)
assigns branch points many versions too early compared to the 'true'
history. My tool is getting better answers, but still can't compute the
tree for the X.org X server tree yet. That one has a wide variety of
damage, including the direct copying of ,v files between repositories
which had diverged, and the accidental branching of files from different
parent branches. I keep poking at it...

> -chris
>
> (*) You can distinguish between A->B->head and B->A->head simply by
> date.

I'm doing a lot more date-based identification than I'm really
comfortable with; the bad thing here is that branch points can occur
long before any commits to that branch, so when doing date-based
operations, you have a range of possible matching branch points and it's
hard to disambiguate.

--
keith.packard@intel.com

Thread overview: 11+ messages
2006-03-23  1:29 Fix branch ancestry calculation Linus Torvalds
2006-03-23  1:50 ` [RFC] Make dot-counting ignore ".1" at the end Linus Torvalds
2006-03-23  6:26   ` Keith Packard
2006-03-23  6:34     ` Linus Torvalds
2006-03-23  7:17       ` Keith Packard
2006-03-24 14:40   ` David Mansfield
2006-03-24 14:45 ` Fix branch ancestry calculation David Mansfield
2006-03-24 15:46   ` Linus Torvalds
2006-03-24 16:38     ` Keith Packard
2006-03-25  1:45       ` Chris Shoemaker
2006-03-25  7:54         ` Keith Packard