* Fake linear history in a deterministic manner. @ 2006-02-13 1:46 Martin Langhoff 2006-02-13 5:24 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Martin Langhoff @ 2006-02-13 1:46 UTC (permalink / raw) To: Git Mailing List, martyn To emulate `cvs log somepath` I need to munge history to look linear. I am working on the theory that I will tell the cvs client about *one* linear history, and show merges from parallel histories as a merge commit, "flattened" so to speak, and with a commit message where I'll list the hash and first line of each commit that it involves. Now, we are keeping a sqlite db (bad dependency, I know) with a list of the lies we tell to the clients, so we can at least be consistent (and fast). The fact that true git clients will be able to commit to the repo means that actual parallel development will happen in the repo. Now, I can't think of an approach to drawing the linearized history that is deterministic. I can't chose any branch with any confidence, because the repo always has a very limited view. A client could come in any time and push onto the repo a series of commits based on an ancient commit, with ancient dates, and a merge to todays head. We've been talking about updating the sqlite db with a post update hook, which means that in that context we never have to choose, the commits that get to the repo first win because they now drive the linearized history. But when creating a new lies database, we have no "pushed-to-this-repo" timestamp in the commits so we'll have to pick. At the moment, I suspect I'll pick the one with the earliest "following" commit. I thought briefly about delaying the decision until I see the merge, and pick the leftmost, or rightmost, if there is some bias in git-merge or cg-merge on putting whatever origin has on a particular side. It'd mean running backwards through history and that the very last merge can flip the decision entirely. Hmmm... any strategy I can come up with means that each new merge throws the dice again entirely. Ideas? (As a result of this, the git-cvsserver we're drafting is of limited usefulness to projects that really do use git to do what it does best: DSCM. Projects with a mostly linearized history -- using git-rebase liberally to avoid uninteresting merges -- do get a reasonable cvs history. Or will get. Sometime. Soon.) martin ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fake linear history in a deterministic manner. 2006-02-13 1:46 Fake linear history in a deterministic manner Martin Langhoff @ 2006-02-13 5:24 ` Junio C Hamano 2006-02-13 5:58 ` Martin Langhoff 2006-02-16 22:29 ` Eric Wong 2006-02-16 23:54 ` Junio C Hamano 2 siblings, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2006-02-13 5:24 UTC (permalink / raw) To: Martin Langhoff; +Cc: git Martin Langhoff <martin.langhoff@gmail.com> writes: > I thought briefly about delaying the decision until I see the merge, > and pick the leftmost, or rightmost, if there is some bias in > git-merge or cg-merge on putting whatever origin has on a particular > side. It'd mean running backwards through history and that the very > last merge can flip the decision entirely. Hmmm... any strategy I can > come up with means that each new merge throws the dice again entirely. > > Ideas? When somebody pushes a ref to your existing commit ancestry graph, you can easily identify which commits are the new ones you see for the first time. A o---o / o---o---o---o---o B Suppose you started from two branch repo A and B. Your sqlite database knows about all of these commits, and say you earlier have decided to treat A as a side branch, B as trunk. Then somebody pushed a new B, making the ancestry graph like this: A o---o-------*---*---* B-new / / o---o---o---o---o B-old When the update-hook runs, as you read in receive-pack.c, your refs have not been updated yet, so you can identify the commits marked with * with: git rev-list B-new $(git rev-parse --not --all) I think you have to do this enumeration of new commits anyway, if you are keeping tabs on all commits in your database. And you check the chain * commits form, and work backwards. The topologically oldest ones determine what branch others belong to. With CVS you cannot express a merge very well, so you now face a choice. Which parent to drop from the leftmost * commit in the above picture? It has commits known to your sqlite database as its parents, and one of them, B (old), is on trunk; the other one is a side branch, so in this case the decision is clear. You leave the link from B-old and cut link from A. But this heuristics would not work very well if both are branches. Which one to prefer? One approach would be to see the world with eyes of the person who did such a merge. Both git and cogito place the current branch as the first parent, so you can tell which one the person who did the merge had before that merge that way. Also in practice, in a normal branch, you would build sequence of small commits and occasionally merge from elsewhere, so the side with more difference tends to be a side branch from the point of view of the person who performs the merge. However, this would not help much. There is no inherent distinction between trunk and branch in the distributed development. Although I treat my "next" branch as a side branch and "master" branch as the trunk, "next" sometimes needs to merge from "master", and by looking at only that merge, you cannot tell which one is the trunk. My merging "master" into "next" does not make my "next" the trunk. Since the ordering is completely arbitrary, I think you can give preference to the ones you have found out about earlier. That is, suppose you already had this, and for some reason track B is the trunk and track A is a branch (say commits on B have 1.XX, the ones on A have 1.XX.1.YY where XX depends on which commits on B the branch was forked from). At this point, if you see a push to C: A o---o / o---o---o---o---o B \ *---* C you give 1.XX.2.YY to commits on C. This may or may not match reality, and C may collect a lot of commits while B might stay dormant -- and you may regret that you picked B as the trunk. But the thing is, there is no inherent trunk or branch in the distributed world, so the cvs clients of your server needs to live with it. To you, you saw B first and people who track the project through your server also saw B first. Then you can have total ordering to side branches. When you see a merge, you arbitrarily but consistently pick which side to pick. The one that has smaller number at the third place (which I am using as "the branch number"). Just thinking out aloud again... ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fake linear history in a deterministic manner. 2006-02-13 5:24 ` Junio C Hamano @ 2006-02-13 5:58 ` Martin Langhoff 2006-02-13 16:34 ` Linus Torvalds 0 siblings, 1 reply; 7+ messages in thread From: Martin Langhoff @ 2006-02-13 5:58 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 2/13/06, Junio C Hamano <junkio@cox.net> wrote: Yep, I had roughly this exact scenario in my mind, thanks for the graphs -- help make sense of what's happening. > When somebody pushes a ref to your existing commit ancestry > graph, you can easily identify which commits are the new ones > you see for the first time. > > A > o---o > / > o---o---o---o---o B > > Suppose you started from two branch repo A and B. Your sqlite > database knows about all of these commits, and say you earlier > have decided to treat A as a side branch, B as trunk. Well, from the point of view of B being a head we know about, either A is another head, and we don't care about it, or A is someone's repo and we haven't seen it and thus cannot care about it either. A bit of background... git-cvsserver at the moment has the following semantics: what cvs considers a top-level "module" we use to identify the head. So cvs -d :ext:me@server:/var/frob.git checkout master will get you frob.git#master. I thought long and hard about this, and it is horribly hard to mimic CVS's idea of branches. So clients see a strictly linear history. Any user in this scenario wanting to do branching and merging is kindly invited to use real tools. There';s only so much magic Perl can do ;-) > Then somebody pushed a new B, making the ancestry graph like > this: > > A > o---o-------*---*---* B-new > / / > o---o---o---o---o B-old > > When the update-hook runs, as you read in receive-pack.c, > your refs have not been updated yet, so you can identify the > commits marked with * with: > > git rev-list B-new $(git rev-parse --not --all) That's a nifty trick, though I'm not sure I'll be able to use it. Right now I'm doing something stupid-er just using `git-log --parents`, skipping commits that don't move the ball forward on the linear head I know about, and then processing those "other tracks" when I see a merge commit. I'll probably find the merge base and see what commits were brought in from the other side. The problem is that the situation running from post update hook is very different from the scenario of running the very same script on a repo where you see all the history. In any case, I'm undecided whether to use --topo-order or --merge-order. Does it really matter? (...) > With CVS you cannot express a merge very well, so you now face a > choice. Which parent to drop from the leftmost * commit in the > above picture? Well, we've already lied about having B to clients that won't know what to do if we tell them about parallel histories, so our pick is B. (...) > One approach would be to see the world with eyes of the person > who did such a merge. Both git and cogito place the current > branch as the first parent, Yes, I thought about that, but that order is ambiguous in the two most interesting cases: - project maintainer pulls from mature feature branches from other developers - her side is first, show the "pulled" stuff as merges (flattened with a merge summary in the commit msg). Still, you can argue the feature development is more interesting. - team-shared-git-repo user does cg-update and merges updates from origin - her side is first, we don't know which side is the interesting one. At all. Hmmm. > But the thing is, there is no > inherent trunk or branch in the distributed world, so the cvs > clients of your server needs to live with it. That's the mantra so far, and we'll talk to cvs clients about perfectly linearized history. Anything else won't be useful as far as I can tell -- or in any case,until this is going well for basic usage. If someone's crazy enough to try I won't get in the way. cheers. martin ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fake linear history in a deterministic manner. 2006-02-13 5:58 ` Martin Langhoff @ 2006-02-13 16:34 ` Linus Torvalds 0 siblings, 0 replies; 7+ messages in thread From: Linus Torvalds @ 2006-02-13 16:34 UTC (permalink / raw) To: Martin Langhoff; +Cc: Junio C Hamano, git On Mon, 13 Feb 2006, Martin Langhoff wrote: > > In any case, I'm undecided whether to use --topo-order or > --merge-order. Does it really matter? Use topo-order. merge-order is much more expensive to calculate, and doesn't even exist if NO_OPENSSL is set. Anyway, while linearization in general is impossible, I'd suggest a few heuristics: - you obviously must remember the head of the previous linearization. Any tree choices you made in the past are not something you can change any more for cvs export. This may sound obvious, but the fact is, especially if you do the cvs export frequently, and the main maintainer updates the tree in a timely manner, it will limit your choices a LOT. In fact, most of the time you'll have no choice at all: you will have an unambiguous "TRUNK" that is defined by the fact that there is only one linear path from the previous export-head to the current HEAD. In short, most of the time you won't have any issues in a stable system. You'll see a true "fork in the road" choice only when the maintainer hasn't pushed his tree to the thing you export in a while, long enough that no CVS exporting has taken place over a whole parallel cycle. It's probably rare. So the main issue of "which child to choose" becomes one of the initial export, and then just occasionally thereafter. And in that case, there's really just one obvious algorithm: - simply enumerate the possible paths (not that hard - it's just enumerating a tree once you've generated the graph in memory with child and parent pointers), and just selecting the one with the "longest weighted path". I say "weighted path", because you might want to consider different commits to have different weights. For example, you might want to consider merge commits to be less interesting (so weight them as 0.5 commits) in order to get as many "real" commits as possible. And you might consider commits made by a certain person to be "more TRUNKish", and give them a weight of 2, so that you'll be more likely to follow the "maintainer view" over any other linearized history. No other algorithm seems very sane. Linus ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fake linear history in a deterministic manner. 2006-02-13 1:46 Fake linear history in a deterministic manner Martin Langhoff 2006-02-13 5:24 ` Junio C Hamano @ 2006-02-16 22:29 ` Eric Wong 2006-02-16 23:29 ` Martin Langhoff 2006-02-16 23:54 ` Junio C Hamano 2 siblings, 1 reply; 7+ messages in thread From: Eric Wong @ 2006-02-16 22:29 UTC (permalink / raw) To: Martin Langhoff; +Cc: Git Mailing List, Paul Mackerras Martin Langhoff <martin.langhoff@gmail.com> wrote: > To emulate `cvs log somepath` I need to munge history to look linear. > I am working on the theory that I will tell the cvs client about *one* > linear history, and show merges from parallel histories as a merge > commit, "flattened" so to speak, and with a commit message where I'll > list the hash and first line of each commit that it involves. I'd be interested in exporting from git to SVN with something like this. > I thought briefly about delaying the decision until I see the merge, > and pick the leftmost, or rightmost, if there is some bias in > git-merge or cg-merge on putting whatever origin has on a particular > side. It'd mean running backwards through history and that the very > last merge can flip the decision entirely. Hmmm... any strategy I can > come up with means that each new merge throws the dice again entirely. > > Ideas? I'd actually like to do this interactively in gitk. Just browse history visually and pick the path you want to choose each time there's a merge, and then having it output the revisions to stdout or saved to a file after you're done picking. Ideally you'd be able to use saved output interactively, as well. -- Eric Wong ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fake linear history in a deterministic manner. 2006-02-16 22:29 ` Eric Wong @ 2006-02-16 23:29 ` Martin Langhoff 0 siblings, 0 replies; 7+ messages in thread From: Martin Langhoff @ 2006-02-16 23:29 UTC (permalink / raw) To: Eric Wong; +Cc: Git Mailing List, Paul Mackerras On 2/17/06, Eric Wong <normalperson@yhbt.net> wrote: > Martin Langhoff <martin.langhoff@gmail.com> wrote: > > To emulate `cvs log somepath` I need to munge history to look linear. > > I am working on the theory that I will tell the cvs client about *one* > > linear history, and show merges from parallel histories as a merge > > commit, "flattened" so to speak, and with a commit message where I'll > > list the hash and first line of each commit that it involves. > > I'd be interested in exporting from git to SVN with something like this. We're hoping to release the code soon, but the truth is that it's really trivial. It was more agonizing over the fact that there's no "good" (aka "stable") algorithm for this. > > I thought briefly about delaying the decision until I see the merge, > > and pick the leftmost, or rightmost, if there is some bias in > > git-merge or cg-merge on putting whatever origin has on a particular > > side. It'd mean running backwards through history and that the very > > last merge can flip the decision entirely. Hmmm... any strategy I can > > come up with means that each new merge throws the dice again entirely. > > > > Ideas? > > I'd actually like to do this interactively in gitk. Just browse history > visually and pick the path you want to choose each time there's a merge, > and then having it output the revisions to stdout or saved to a file > after you're done picking. Ideally you'd be able to use saved output > interactively, as well. It's cool to be able to pick, but if it's for a git-svnserver implementation, you can't change your (fake) history you tell after clients have seen. So a merge that gets pushed to the repo later may contain more interesting paths, but you're bound to the lies you've told. cheers, martin ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fake linear history in a deterministic manner. 2006-02-13 1:46 Fake linear history in a deterministic manner Martin Langhoff 2006-02-13 5:24 ` Junio C Hamano 2006-02-16 22:29 ` Eric Wong @ 2006-02-16 23:54 ` Junio C Hamano 2 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2006-02-16 23:54 UTC (permalink / raw) To: Martin Langhoff; +Cc: git Martin, (and Martyn), I just cloned the gitcvs via git locally, exported it via gitcvs, and ran "cvs log" and "cvs annotate README" there. Good job. I really had a good laugh looking at the generated output. It just looks like .... CVS! ;-) I haven't read the code in detail (I tend to start nitpicking the details without looking at a bigger picture, once I start reading the code, so I try to only give superficial look until I know I'll have tons of time to spend on it). But it looks like a fun project. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2006-02-16 23:54 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-02-13 1:46 Fake linear history in a deterministic manner Martin Langhoff 2006-02-13 5:24 ` Junio C Hamano 2006-02-13 5:58 ` Martin Langhoff 2006-02-13 16:34 ` Linus Torvalds 2006-02-16 22:29 ` Eric Wong 2006-02-16 23:29 ` Martin Langhoff 2006-02-16 23:54 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).