* Consistency question @ 2014-01-15 10:37 David Kastrup 2014-01-15 11:13 ` Jeff King 0 siblings, 1 reply; 8+ messages in thread From: David Kastrup @ 2014-01-15 10:37 UTC (permalink / raw) To: git Hi, I am in the process of rewriting the core logic of git blame (the current speed of which is quite an impediment to some workflows). I currently have one question I don't see an answer to right away, and that question arises in doing a reasonably robust traversal of commits without determining topology first: The question is what guarantees I have with regard to the commit date of a commit in relation to that of its parent commits: a) none b) commitdate(child) >= commitdate(parent) c) commitdate(child) > commitdate(parent) Obviously, I can rely on c) being true "almost always": it's definitely good for a heuristic used for improving performance (meaning as an ordering criterion for a commit priority queue). The problem is how much I should cater for graceful behavior for the cases where it's not. Does git do any actual checks before pushing? -- David Kastrup ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Consistency question 2014-01-15 10:37 Consistency question David Kastrup @ 2014-01-15 11:13 ` Jeff King 2014-01-15 11:40 ` David Kastrup ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Jeff King @ 2014-01-15 11:13 UTC (permalink / raw) To: David Kastrup; +Cc: git On Wed, Jan 15, 2014 at 11:37:08AM +0100, David Kastrup wrote: > The question is what guarantees I have with regard to the commit date of > a commit in relation to that of its parent commits: > > a) none > b) commitdate(child) >= commitdate(parent) > c) commitdate(child) > commitdate(parent) a) none > Obviously, I can rely on c) being true "almost always": Actually, b) is quite often the case in automated processes (e.g., "git am" or "git rebase"). The author dates are different, but the committer dates may be in the same second. And of course a) is the result of clock skew and software bugs. > it's definitely > good for a heuristic used for improving performance (meaning as an > ordering criterion for a commit priority queue). The problem is how > much I should cater for graceful behavior for the cases where it's not. Yes, this is exactly how git uses it. We generally walk breadth-first through the graph, relying on commit times for performance but not correctness. There are some parts of the code that will behave badly with clock skew. For example, "--since" will stop traversing when we hit a certain point. It requires a fixed number of "too old" commits before quitting, though, in an attempt to bypass small runs of skewed clocks. The "git describe --contains" algorithm will also produce wrong results in the face of skew. I believe it uses a "slop" of 24 hours in its skew. The current "tag --contains" algorithm is currently correct in the face of skew, but it can go much faster if you accept that skew will cause it to go wrong. I suspect there are other algorithms that could be sped up, too, if we had trustworthy generation numbers (I implemented and timed the "--contains" algorithm, but haven't done so for other algorithms). I've played with calculating and caching the generation numbers at repack time, but there aren't any patches currently under consideration. > Does git do any actual checks before pushing? No. One problem with checking the commit relationships is that it may not be the new commit which is broken (by skewing backwards), but rather the commit it builds on (which has skewed forwards). If you are building on such a fast-forward commit, it is often to late to fix that commit. So you have to fast-forward yourself, propagating the bogus value. If the receiving machine checked the incoming commits against its own internal clock, that could work. You would still want to check the commit relationships to catch new commits that erroneously claim to be from the past, though (that is, before their parents). -Peff ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Consistency question 2014-01-15 11:13 ` Jeff King @ 2014-01-15 11:40 ` David Kastrup 2014-01-15 12:44 ` Andreas Krey 2014-01-15 11:55 ` David Kastrup 2014-01-18 1:22 ` Mike Hommey 2 siblings, 1 reply; 8+ messages in thread From: David Kastrup @ 2014-01-15 11:40 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > On Wed, Jan 15, 2014 at 11:37:08AM +0100, David Kastrup wrote: > >> The question is what guarantees I have with regard to the commit date of >> a commit in relation to that of its parent commits: >> >> a) none >> b) commitdate(child) >= commitdate(parent) >> c) commitdate(child) > commitdate(parent) > > a) none > >> Obviously, I can rely on c) being true "almost always": > > Actually, b) is quite often the case in automated processes (e.g., "git > am" or "git rebase"). The author dates are different, but the committer > dates may be in the same second. Ok, thanks. Assuming that rebases don't happen 1000/s, I should likely not worry too much about O(n^2) for this case (and frankly, clearly nobody worried about O(n^2) in the current blame.c anyway). It's also not really relevant for linear parts of the history like that of "git rebase" since in that case the parent enters my priority queue when its child is getting processed: nothing to be confused about here. This is more about sibling rivalries calling a parent to the queue before the sibling had a chance to leave. So it comes into play for my use case basically only when dealing with merge commits. > I suspect there are other algorithms that could be sped up, too, if we > had trustworthy generation numbers (I implemented and timed the > "--contains" algorithm, but haven't done so for other algorithms). With a single root, "depth" helps a lot. When looking for a common parent of a number of commits, you first shorten all ancestries to the same size and then you can look for the point of convergence in lockstep. But didn't git forego the "single root" requirement in its commit DAG at some point of time? Thanks for the speedy reply! I think I'm good with what I need to know to go ahead. The rest is just idle curiosity. -- David Kastrup ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Consistency question 2014-01-15 11:40 ` David Kastrup @ 2014-01-15 12:44 ` Andreas Krey 2014-01-15 13:00 ` David Kastrup 0 siblings, 1 reply; 8+ messages in thread From: Andreas Krey @ 2014-01-15 12:44 UTC (permalink / raw) To: David Kastrup; +Cc: Jeff King, git On Wed, 15 Jan 2014 12:40:29 +0000, David Kastrup wrote: ... > With a single root, "depth" helps a lot. When looking for a common > parent of a number of commits, you first shorten all ancestries to the > same size and then you can look for the point of convergence in > lockstep. Hmm, how about traversing from all the start commits downwards simultaneously, noting which start you say each commit from, and stopping when you have a commit carrying all start labels? I don't quite see how the same size plus lockstep works out (but the 'same size' part is possibly the same as my 'concurrent traversal'). > But didn't git forego the "single root" requirement in its commit DAG at > some point of time? About at the beginning, I guess. Nothing in the data model ever required it? > ... The rest is just idle curiosity. Me too, mostly. I may have to do some traversal for tree/dag painting. Andreas -- "Totally trivial. Famous last words." From: Linus Torvalds <torvalds@*.org> Date: Fri, 22 Jan 2010 07:29:21 -0800 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Consistency question 2014-01-15 12:44 ` Andreas Krey @ 2014-01-15 13:00 ` David Kastrup 2014-01-15 13:45 ` Andreas Krey 0 siblings, 1 reply; 8+ messages in thread From: David Kastrup @ 2014-01-15 13:00 UTC (permalink / raw) To: Andreas Krey; +Cc: Jeff King, git Andreas Krey <a.krey@gmx.de> writes: > On Wed, 15 Jan 2014 12:40:29 +0000, David Kastrup wrote: > ... >> With a single root, "depth" helps a lot. When looking for a common >> parent of a number of commits, you first shorten all ancestries to the >> same size and then you can look for the point of convergence in >> lockstep. > > Hmm, how about traversing from all the start commits downwards > simultaneously, noting which start you say each commit from, and stopping > when you have a commit carrying all start labels? It means that when the start commits are at considerably different depth, you'll traverse much more material than necessary. Also you need labels. > I don't quite see how the same size plus lockstep works out (but the > 'same size' part is possibly the same as my 'concurrent traversal'). It just equalizes the depth before starting, so you don't need labels: any common ancestor is reached at the same time by its descendants. Of course, I conveniently forgot merge commits. This scheme works out of the box only with single parenting. And it works fine without a common ancestor, too: you just run into a NULL pointer at the same time when there isn't one. So, uh, this solution does not really seem to match the problem... -- David Kastrup ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Consistency question 2014-01-15 13:00 ` David Kastrup @ 2014-01-15 13:45 ` Andreas Krey 0 siblings, 0 replies; 8+ messages in thread From: Andreas Krey @ 2014-01-15 13:45 UTC (permalink / raw) To: David Kastrup; +Cc: Jeff King, git On Wed, 15 Jan 2014 14:00:30 +0000, David Kastrup wrote: > Andreas Krey <a.krey@gmx.de> writes: ... > > Hmm, how about traversing from all the start commits downwards > > simultaneously, noting which start you say each commit from, and stopping > > when you have a commit carrying all start labels? > > It means that when the start commits are at considerably different > depth, you'll traverse much more material than necessary. But it has the advantage that you don't need to traverse the DAG to the root when the differences are small - the runtimes are roughly proportional to the difference between the commits. > Also you need labels. Yes, I assume that that (marking commits) is not a cost factor. > > I don't quite see how the same size plus lockstep works out (but the > > 'same size' part is possibly the same as my 'concurrent traversal'). > > It just equalizes the depth before starting, so you don't need labels: > any common ancestor is reached at the same time by its descendants. > > Of course, I conveniently forgot merge commits. I finally noticed but forgot to mention in the last post. Merges also mean that there is not necessarily a unique common ancestor between commits. Andreas -- "Totally trivial. Famous last words." From: Linus Torvalds <torvalds@*.org> Date: Fri, 22 Jan 2010 07:29:21 -0800 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Consistency question 2014-01-15 11:13 ` Jeff King 2014-01-15 11:40 ` David Kastrup @ 2014-01-15 11:55 ` David Kastrup 2014-01-18 1:22 ` Mike Hommey 2 siblings, 0 replies; 8+ messages in thread From: David Kastrup @ 2014-01-15 11:55 UTC (permalink / raw) To: git Jeff King <peff@peff.net> writes: > There are some parts of the code that will behave badly with clock skew. > For example, "--since" will stop traversing when we hit a certain point. > It requires a fixed number of "too old" commits before quitting, though, > in an attempt to bypass small runs of skewed clocks. That actually turns out to be a somewhat sore point for me: I use something like git shortlog -n --since 2013/12/01 --until 2014/01/01 master for generating statistics on LilyPond when I am doing my monthly report begging the community for money. It turns out that the numbers of commits attributed to me tend to go _down_ quite regularly in the time from starting the report to sending it out. Which might also cause me to overlook a particularly selfpraiseworthy item. Not sure how feasible it would be to arrive at a stable and complementary set of --since/--until. -- David Kastrup ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Consistency question 2014-01-15 11:13 ` Jeff King 2014-01-15 11:40 ` David Kastrup 2014-01-15 11:55 ` David Kastrup @ 2014-01-18 1:22 ` Mike Hommey 2 siblings, 0 replies; 8+ messages in thread From: Mike Hommey @ 2014-01-18 1:22 UTC (permalink / raw) To: Jeff King; +Cc: David Kastrup, git On Wed, Jan 15, 2014 at 06:13:30AM -0500, Jeff King wrote: > On Wed, Jan 15, 2014 at 11:37:08AM +0100, David Kastrup wrote: > > > The question is what guarantees I have with regard to the commit date of > > a commit in relation to that of its parent commits: > > > > a) none > > b) commitdate(child) >= commitdate(parent) > > c) commitdate(child) > commitdate(parent) > > a) none > > > Obviously, I can rely on c) being true "almost always": > > Actually, b) is quite often the case in automated processes (e.g., "git > am" or "git rebase"). The author dates are different, but the committer > dates may be in the same second. > > And of course a) is the result of clock skew and software bugs. ... or importing non-git repositories that don't have commit info separated from author info like git does. In such cases, it's usual to duplicate the author info as commit info so that clones of the same non-git repo end up with the same git sha1s. Mercurial easily allows author dates to be in a non topological order. Mike ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-01-18 1:22 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-01-15 10:37 Consistency question David Kastrup 2014-01-15 11:13 ` Jeff King 2014-01-15 11:40 ` David Kastrup 2014-01-15 12:44 ` Andreas Krey 2014-01-15 13:00 ` David Kastrup 2014-01-15 13:45 ` Andreas Krey 2014-01-15 11:55 ` David Kastrup 2014-01-18 1:22 ` Mike Hommey
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).