* Bizarre missing changes (git bug?) @ 2008-07-21 20:26 Tim Harper 2008-07-21 20:37 ` Linus Torvalds 2008-07-21 20:42 ` Alex Riesen 0 siblings, 2 replies; 58+ messages in thread From: Tim Harper @ 2008-07-21 20:26 UTC (permalink / raw) To: git I ran into a strange issue that has left me scratching my head. I have a commit in my history, that does indeed show up in my branch, named "sprint" The following commands yield as follows (I've modified the output slightly to conceal any potentially proprietary information): ######################### git log HEAD -p commit 8f9effffb0dcdacd514085608e8923fbbe00ff29 Author: Name Concealed <email@email.com> Date: Mon Jul 14 16:19:18 2008 -0600 commit message.... diff --git a/app/controllers/events_controller.rb b/app/controllers/ events_controller.rb index 6905ba4..a0b7dfc 100644 --- a/app/controllers/events_controller.rb +++ b/app/controllers/events_controller.rb @@ -238,36 +238,37 @@ class EventsController < ApplicationController }.freeze RUBY_STUFF = { - changes... - changes... - changes... + changes... + changes... + changes... diff --git a/spec/fixtures/factors.yml b/spec/fixtures/factors.yml index 186ed73..3c76e86 100755 --- a/spec/fixtures/factors.yml +++ b/spec/fixtures/factors.yml @@ -2483,4 +2483,54 @@ some_branch: file: contents: - reliable_on: \ No newline at end of file + data: +fixture_name: + id: 115 + file: Event + contents: behavior ######################### git log spec/fixtures/factors.yml ... commit 8f9effff is not listed anywhere ######################### git log app/controllers/events_controller.rb ... commit 8f9effff shows up ######################### git branch --contains 8f9effff some-task-branch * sprint The changes in 8f9effff for app/controllers/events_controller.rb show up in the working copy, however the changes for spec/fixtures/ factors.yml are nowhere to be seen. It's as if the history of that particular file diverged somehow, but I know that can't be true since git doesn't track files. Anyone run into this before? Any idea what might have caused it? We're a bit concerned about this because if we don't know how to avoid this, we no longer can feel certain that when something is committed, it will make it out in our release. Any help or clues VERY much apperciated. Thanks! Tim ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-21 20:26 Bizarre missing changes (git bug?) Tim Harper @ 2008-07-21 20:37 ` Linus Torvalds 2008-07-21 22:53 ` Tim Harper ` (2 more replies) 2008-07-21 20:42 ` Alex Riesen 1 sibling, 3 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-21 20:37 UTC (permalink / raw) To: Tim Harper; +Cc: git On Mon, 21 Jul 2008, Tim Harper wrote: > > Anyone run into this before? Any idea what might have caused it? We're a bit > concerned about this because if we don't know how to avoid this, we no longer > can feel certain that when something is committed, it will make it out in our > release. Read up on '--full-history'. By default, git simplifies the history for logs that have path simplification: only walking down the side of a merge that all the data came from (ie the unchanged side). So it only leaves merges around if there was changes from _all_ parents. So without --full-history, if any parent matches the state, it just removes the merge and picks that parent that contained all the state. Obviously, any changes to that file can be sufficiently explained by walking just that limited history, because they must have changed in _that_ history too! That default behaviour leads to a *much* simpler history, and is usually what you want - it avoids unnecessary duplication when something was changed trivially the same way in both branches - 'git log' will just pick the first branch. So, if you had two (or more) commits that both fixed the same bug in different branches, and thus both branches actually ended up with the same contents, it does mean that "git log <filename>" will only show _one_ of the fixes. In this case, it apparently showed another version than the one you were looking for. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-21 20:37 ` Linus Torvalds @ 2008-07-21 22:53 ` Tim Harper 2008-07-21 22:55 ` Tim Harper [not found] ` <8C23FB54-A28E-4294-ABEA-A5766200768B@gmail.com> 2008-07-26 3:12 ` Roman Zippel 2 siblings, 1 reply; 58+ messages in thread From: Tim Harper @ 2008-07-21 22:53 UTC (permalink / raw) To: git On Jul 21, 2008, at 2:37 PM, Linus Torvalds wrote: > > > On Mon, 21 Jul 2008, Tim Harper wrote: >> >> Anyone run into this before? Any idea what might have caused it? >> We're a bit >> concerned about this because if we don't know how to avoid this, we >> no longer >> can feel certain that when something is committed, it will make it >> out in our >> release. > > Read up on '--full-history'. > > By default, git simplifies the history for logs that have path > simplification: only walking down the side of a merge that all the > data > came from (ie the unchanged side). So it only leaves merges around if > there was changes from _all_ parents. > > So without --full-history, if any parent matches the state, it just > removes the merge and picks that parent that contained all the state. > Obviously, any changes to that file can be sufficiently explained by > walking just that limited history, because they must have changed in > _that_ history too! > > That default behaviour leads to a *much* simpler history, and is > usually > what you want - it avoids unnecessary duplication when something was > changed trivially the same way in both branches - 'git log' will > just pick > the first branch. > Agreed - this was an insightful decision. > So, if you had two (or more) commits that both fixed the same bug in > different branches, and thus both branches actually ended up with > the same > contents, it does mean that "git log <filename>" will only show > _one_ of > the fixes. > > In this case, it apparently showed another version than the one you > were > looking for. > > Linus Git has made me feel stupid on various occasions. This is no exception as the problem turned out being in the chair, not in git. After running through git bisect, and ran the command Alex Riesen suggested, it made it pretty crystal clear where things went wrong. It turned out to be a bad merge that was from a conflict related to white-space issues, and the wrong resolution was chosen (a resolution that also consequently turned out to be no change). Another false impression I had is a merge conflict resolution would always be displayed in a merge commit. However, after running over the merges again, if you pick the right or left, discarding the one or the other, nothing is shown in "git log -p" for the merge commit. Is there a way to see what was chosen for a conflict resolution? Seeing that in the merge commit would have made things a little more clear. Thank you for articulating git branch's behavior - all is clear as mud now :) Tim ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-21 22:53 ` Tim Harper @ 2008-07-21 22:55 ` Tim Harper 0 siblings, 0 replies; 58+ messages in thread From: Tim Harper @ 2008-07-21 22:55 UTC (permalink / raw) To: git On Jul 21, 2008, at 4:53 PM, Tim Harper wrote: > > On Jul 21, 2008, at 2:37 PM, Linus Torvalds wrote: > >> >> >> On Mon, 21 Jul 2008, Tim Harper wrote: >>> >>> Anyone run into this before? Any idea what might have caused it? >>> We're a bit >>> concerned about this because if we don't know how to avoid this, >>> we no longer >>> can feel certain that when something is committed, it will make it >>> out in our >>> release. >> >> Read up on '--full-history'. >> >> By default, git simplifies the history for logs that have path >> simplification: only walking down the side of a merge that all the >> data >> came from (ie the unchanged side). So it only leaves merges around if >> there was changes from _all_ parents. >> >> So without --full-history, if any parent matches the state, it just >> removes the merge and picks that parent that contained all the state. >> Obviously, any changes to that file can be sufficiently explained by >> walking just that limited history, because they must have changed in >> _that_ history too! >> >> That default behaviour leads to a *much* simpler history, and is >> usually >> what you want - it avoids unnecessary duplication when something was >> changed trivially the same way in both branches - 'git log' will >> just pick >> the first branch. >> > > Agreed - this was an insightful decision. > >> So, if you had two (or more) commits that both fixed the same bug in >> different branches, and thus both branches actually ended up with >> the same >> contents, it does mean that "git log <filename>" will only show >> _one_ of >> the fixes. >> >> In this case, it apparently showed another version than the one you >> were >> looking for. >> >> Linus > > Git has made me feel stupid on various occasions. This is no > exception as the problem turned out being in the chair, not in git. > > After running through git bisect, and ran the command Alex Riesen > suggested, it made it pretty crystal clear where things went wrong. > It turned out to be a bad merge that was from a conflict related to > white-space issues, and the wrong resolution was chosen (a > resolution that also consequently turned out to be no change). > > Another false impression I had is a merge conflict resolution would > always be displayed in a merge commit. However, after running over > the merges again, if you pick the right or left, discarding the one > or the other, nothing is shown in "git log -p" for the merge > commit. Is there a way to see what was chosen for a conflict > resolution? Seeing that in the merge commit would have made things > a little more clear. > Actually, I found it: http://www.kernel.org/pub/software/scm/git/docs/git-log.html "git log -p -c" gives me what I was looking for Tim > Thank you for articulating git branch's behavior - all is clear as > mud now :) > > Tim ^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <8C23FB54-A28E-4294-ABEA-A5766200768B@gmail.com>]
* Re: Bizarre missing changes (git bug?) [not found] ` <8C23FB54-A28E-4294-ABEA-A5766200768B@gmail.com> @ 2008-07-21 22:57 ` Linus Torvalds 0 siblings, 0 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-21 22:57 UTC (permalink / raw) To: Tim Harper; +Cc: Git Mailing List [ git channel added back to cc, because this is an interesting question in itself ] On Mon, 21 Jul 2008, Tim Harper wrote: > > Another false impression I had is a merge conflict resolution would always be > displayed in a merge commit. However, after running over the merges again, if > you pick the right or left, discarding the one or the other, nothing is shown > in "git log -p" for the merge commit. Is there a way to see what was chosen > for a conflict resolution? Seeing that in the merge commit would have made > things a little more clear. The default behavior for showing merges is "--cc", which is the condensed version that only shows _actual_ conflicts that got resolved differently from either of the sources. But note how this is an "after-the-fact" determination: it doesn't look whether the merge _did_ conflict (because doing that would require re-running the whole merge!), but it looks whether the end _result_ is different from either side. So you can easily have a merge that conflicts - but then you resolve that merge by picking just one side of the merge as the result. And in that case the "--cc" diff will not show anything at all - because the end result did not conflict with the sources of the merge! So "--cc" only shows output if: the merge itself actually changed something from _all_ parents. This can happen if: - there was a non-trivial conflict, and the end result really was a "mixture" of the two. The result wasn't just a select of either side, it was a combination of the two. This is obviously one "common" case for a merge resolution. But it's equally common that when you merge something you just say "Ok, that other side did it better, I'll just pick that version". And in that case, "--cc" won't show anything at all, because it's not really a conflict any more once you've done that choice. - There can also be an "evil merge" that actually adds/changes/deletes code that didn't exist AT ALL in either side. And --cc is very much designed to show that. This is actually not always a bad thing (despite the name "evil merge"), because it happens regularly that one branch had changed some infrastructure setup or other, and the other branch had added a totally new use of that infrastructure - and as a result the _merge_ needs to add that setup code that didn't exist in either of the branches (in one because the use wasn't there, in the other because the setup wasn't needed). Anyway, this all means that yes, "--cc" _often_ shows conflicts, but not always - exactly because it doesn't show the ones that the merge had committed as a non-conflict. If you actually want to see the _real_ conflicts that happened as the merge was done, you do have to re-do it. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-21 20:37 ` Linus Torvalds 2008-07-21 22:53 ` Tim Harper [not found] ` <8C23FB54-A28E-4294-ABEA-A5766200768B@gmail.com> @ 2008-07-26 3:12 ` Roman Zippel 2008-07-26 19:58 ` Linus Torvalds 2 siblings, 1 reply; 58+ messages in thread From: Roman Zippel @ 2008-07-26 3:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tim Harper, git Hi, On Monday 21. July 2008, Linus Torvalds wrote: > Read up on '--full-history'. > > By default, git simplifies the history for logs that have path > simplification: only walking down the side of a merge that all the data > came from (ie the unchanged side). So it only leaves merges around if > there was changes from _all_ parents. > > So without --full-history, if any parent matches the state, it just > removes the merge and picks that parent that contained all the state. > Obviously, any changes to that file can be sufficiently explained by > walking just that limited history, because they must have changed in > _that_ history too! Is that really a good default behaviour? Is there a way to change that default? I'm currently looking into converting the m68k CVS repository and I'd like to properly regenerate it as two separate lines of development. The problem is if I look at the file history, I often only see one side of the changes when things got merged because of this default. What makes this worse is that graphical front ends may inherit this behaviour. I tested this with qgit and it only shows half of the history. giggle retrieves the history like --full-history, but it lacks empty merges, so it makes it harder to see what got merged when. For example a history that actually looks this: linux -+-----import----+-----------import----+------... m68k \-commit-commit-\-merge-commit-commit-\-merge... Without the merges it looks like this: linux -+-----import----------------import--------------+... m68k \-commit-commit---------commit-commit \... And without --full-history these "loose ends" aren't visible as in qgit. When researching historical changes one wants to know when something was introduced and when it was merged, but this simplification makes it harder than it really has to be. IMO the default should be to show the complete history, so one doesn't miss something by accident that might be important or as the original example shows it might be confusing if one sees a change with "git log --stat id.." and the change disappears when one looks at "git log path". bye, Roman ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-26 3:12 ` Roman Zippel @ 2008-07-26 19:58 ` Linus Torvalds 2008-07-27 17:50 ` Roman Zippel 0 siblings, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2008-07-26 19:58 UTC (permalink / raw) To: Roman Zippel; +Cc: Tim Harper, git On Sat, 26 Jul 2008, Roman Zippel wrote: > > > > So without --full-history, if any parent matches the state, it just > > removes the merge and picks that parent that contained all the state. > > Obviously, any changes to that file can be sufficiently explained by > > walking just that limited history, because they must have changed in > > _that_ history too! > > Is that really a good default behaviour? Yes. It's the only sane default right now. See below. > Is there a way to change that default? Use an alias or something. To see why it's the default, do a few tests. In particular, try it with gitk on the kernel. Try it on some fairly simple file that doesn't get a lot of churn. Example: gitk lib/vsprintf.c vs gitk --full-history lib/vsprintf.c and if you don't _immediately_ see why --full-history isn't the default, there's likely something wrong with you. One is useful. The other is not. So we absolutely _have_ to simplify merges. There is no question about it. That said, we currently simplify merges the simple and stupid way, and I've hinted several times on this mailing list that there is a better way to do it (last time it was the discussion about "filter-branch". In fact, if you google for filter-branch full-history you'll find some of the discussion. In order to make --full-history useful as a default, we'd need to do an after-the-fact merge cleanup (ie remove lines of development that are later found to really be totally uninteresting), but that is *hard*. But if we did that, I'd agree to making --full-history the default (and it would be a good thing, no doubt about it - I just cannot see how to do ti simply and sanely enough) Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-26 19:58 ` Linus Torvalds @ 2008-07-27 17:50 ` Roman Zippel 2008-07-27 18:47 ` Linus Torvalds 0 siblings, 1 reply; 58+ messages in thread From: Roman Zippel @ 2008-07-27 17:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tim Harper, git Hi, On Sat, 26 Jul 2008, Linus Torvalds wrote: > > Is there a way to change that default? > > Use an alias or something. This doesn't help with the graphical front ends and they only use what git gives them. > To see why it's the default, do a few tests. In particular, try it with > gitk on the kernel. Try it on some fairly simple file that doesn't get a > lot of churn. Example: > > gitk lib/vsprintf.c > > vs > > gitk --full-history lib/vsprintf.c > > and if you don't _immediately_ see why --full-history isn't the default, > there's likely something wrong with you. One is useful. The other is not. > > So we absolutely _have_ to simplify merges. There is no question about it. Well, I don't want that much history. Let's take a different example. Look at kernel/sched_rt.c with git-log, --full-history shows an extra commit of a patch which was committed and merged twice, but there is no information how this other patch was merged. If you have giggle installed, you'll see that commit as a loose end. (I have git version 1.5.6.2 installed in case it matters.) > That said, we currently simplify merges the simple and stupid way, and > I've hinted several times on this mailing list that there is a better way > to do it (last time it was the discussion about "filter-branch". > > In fact, if you google for > > filter-branch full-history > > you'll find some of the discussion. In order to make --full-history useful > as a default, we'd need to do an after-the-fact merge cleanup (ie remove > lines of development that are later found to really be totally > uninteresting), but that is *hard*. I played a little with it in the ruby script below, which produces a complete connected graph of all content nodes and which have been merged into the head, e.g. for sched_rt.c it produces that extra commit merge. The script basically eliminates all empty merges. As input to the script I used "git log --parents --name-only --full-history kernel/sched_rt.c | grep -e ^commit -e ^kernel", which seems to produce the same amount of commits as "gitk --full-history ...". The main function is to check, whether one parent of a commit is an ancestor of another parent, so that this path can be eliminated. I tried it with other paths and too simple implementations quickly lead to exponential behaviour. :) It probably also shouldn't be recursive, I had to increase the stack limit, otherwise I got stack exceptions. Otherwise it seems to work fine, it wasn't that hard :-) The ruby syntax shouldn't be too hard too read, the nonobvious thing is maybe that '$' marks global variables. bye, Roman #! /usr/bin/ruby $parent = Hash.new $content = Hash.new $result = Hash.new commit = nil head = nil while l = $stdin.gets a = l.split(" ") if a[0] == "commit" commit = a[1] head = commit unless head $parent[commit] = a[2..-1] else $content[commit] = true end end $parent_check = Hash.new $parent_cache = Hash.new def commit_has_parent?(commit, commit2) if $parent_check[commit] print "parent loop for #{commit} (#{commit2})?\n" p $parent_check return false end return $parent_cache[commit] if $parent_cache.has_key?(commit) $parent_check[commit] = true res = false if $content[commit] > $content[commit2] $parent[commit].each do |parent| if parent == commit2 || commit_has_parent?(parent, commit2) res = true break; end end end $parent_cache[commit] = res $parent_check.delete(commit) $parent_cache.clear if $parent_check.empty? return res end def check_commit(commit) return $result[commit] if $result.has_key? commit a = Array.new $parent[commit].each do |parent| parent = check_commit(parent) if parent a.each_index do |i| if a[i] == parent || commit_has_parent?(a[i], parent) parent = nil break elsif commit_has_parent?(parent, a[i]) a[i] = parent parent = nil break end end end a.push(parent) if parent end $parent[commit] = a $content[commit] = true if a.size > 1 if $content[commit] $result[commit] = commit max = 1 a.each do |parent| max = $content[parent] + 1 if max <= $content[parent] end $content[commit] = max else $result[commit] = a[0] end return $result[commit] end check_commit(head) #p $result #p $parent p $content.keys.size $content.each_key do |commit| p [ commit, $parent[commit] ] commit = $parent[commit][0] end ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-27 17:50 ` Roman Zippel @ 2008-07-27 18:47 ` Linus Torvalds 2008-07-27 23:14 ` Roman Zippel 0 siblings, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2008-07-27 18:47 UTC (permalink / raw) To: Roman Zippel; +Cc: Tim Harper, git On Sun, 27 Jul 2008, Roman Zippel wrote: > Hi, > > On Sat, 26 Jul 2008, Linus Torvalds wrote: > > > > Is there a way to change that default? > > > > Use an alias or something. > > This doesn't help with the graphical front ends and they only use what git > gives them. And the graphical front-ends is exactly why --full-history cannot be the default. We _could_ make it the default for non-graphical ones, if we also say "--no-merges". But then: > > To see why it's the default, do a few tests. In particular, try it with > > gitk on the kernel. Try it on some fairly simple file that doesn't get a > > lot of churn. Example: > > > > gitk lib/vsprintf.c > > > > vs > > > > gitk --full-history lib/vsprintf.c > > > > and if you don't _immediately_ see why --full-history isn't the default, > > there's likely something wrong with you. One is useful. The other is not. > > > > So we absolutely _have_ to simplify merges. There is no question about it. > > Well, I don't want that much history. Right. Nobody does. > Let's take a different example. No, let's not. Unless you can solve that _one_ example efficiently, nothing else matters. The above example is all you ever need. Make that one work right (and efficiently), and everything is fine. And no, some random ruby code doesn't make any difference what-so-ever. There are efficiency constraints here that make any ruby solution be unrealistic to begin with. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-27 18:47 ` Linus Torvalds @ 2008-07-27 23:14 ` Roman Zippel 2008-07-27 23:18 ` Linus Torvalds 2008-07-27 23:25 ` Martin Langhoff 0 siblings, 2 replies; 58+ messages in thread From: Roman Zippel @ 2008-07-27 23:14 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tim Harper, git Hi, On Sun, 27 Jul 2008, Linus Torvalds wrote: > > > > Is there a way to change that default? > > > > > > Use an alias or something. > > > > This doesn't help with the graphical front ends and they only use what git > > gives them. > > And the graphical front-ends is exactly why --full-history cannot be the > default. If you mean current gitk style --full-history I agree, the problem is still that git is hiding too much history with the simplified history... > > Let's take a different example. > > No, let's not. > > Unless you can solve that _one_ example efficiently, nothing else matters. > > The above example is all you ever need. Make that one work right (and > efficiently), and everything is fine. > > And no, some random ruby code doesn't make any difference what-so-ever. > There are efficiency constraints here that make any ruby solution be > unrealistic to begin with. Why are you dismissing what I wrote without even giving it a second thought? I didn't bother with the initial example, because it's so simple, that it's no real challenge. Did I say anywhere it had to be done in ruby? All I tried was to demonstrate a possible algorithm to solve the problem. I did time the execution and compared to the time it took to extract the history it wasn't significant for such a simple script. What did I do wrong that you rebuff me based on this secondary problem (which I'm quite aware of, because it was me who mentioned in first place) and giving the primary problem (which is the missing history) no attention? If you had any questions, I'd be happy to answer them. If you think that the demonstrated algorithm doesn't work, I'd be glad to know why. If the algorithm might work, any hint what could be do done to try it for real, would be great. But I don't get any of this. Why? bye, Roman ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-27 23:14 ` Roman Zippel @ 2008-07-27 23:18 ` Linus Torvalds 2008-07-28 0:00 ` Roman Zippel 2008-07-27 23:25 ` Martin Langhoff 1 sibling, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2008-07-27 23:18 UTC (permalink / raw) To: Roman Zippel; +Cc: Tim Harper, git On Mon, 28 Jul 2008, Roman Zippel wrote: > > Why are you dismissing what I wrote without even giving it a second > thought? I didn't bother with the initial example, because it's so > simple, that it's no real challenge. Did you try it? It really shouldn't be any simpler than anything else. And I dismissed what you wrote because the example you _did_ state was about something else entirely (ie apparently some giggle bug that simplifies things incorrectly). > What did I do wrong that you rebuff me based on this secondary problem > (which I'm quite aware of, because it was me who mentioned in first place) > and giving the primary problem (which is the missing history) no > attention? It's not missing history. It's all there in --full-history. The default is to give a reasonable simplification, and I told you what the simplification was, and it's perfectly conceptually fine - AND IT IS MUCH MORE EFFICIENT than the alternatives. So I'm not seeing your point what-so-ever. My point is: - with full-history, you have it all, but it's useless in practice - without full-history, it's useful in practice You never gave any examples otherwise. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-27 23:18 ` Linus Torvalds @ 2008-07-28 0:00 ` Roman Zippel 2008-07-28 5:00 ` Linus Torvalds 0 siblings, 1 reply; 58+ messages in thread From: Roman Zippel @ 2008-07-28 0:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tim Harper, git Hi, On Sun, 27 Jul 2008, Linus Torvalds wrote: > > Why are you dismissing what I wrote without even giving it a second > > thought? I didn't bother with the initial example, because it's so > > simple, that it's no real challenge. > > Did you try it? It really shouldn't be any simpler than anything else. And Of course I did: $ git log --parents --name-only --full-history lib/vsprintf.c | grep -e ^commit | wc -l 5929 This is same amount of commits as for gitk. $ /usr/bin/time git log --parents --name-only --full-history lib/vsprintf.c | grep -e ^commit -e ^lib | /usr/bin/time ruby ../filter.rb 3.08user 0.14system 0:03.54elapsed 91%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+48230minor)pagefaults 0swaps 20 ["72fd4a35a824331d7a0f4168d7576502d95d34b3", ["0a6047eef1c465c38aacfbdab193161b3f0cd144"]] ["06b2a76d25d3cfbd14680021c1d356c91be6904e", ["96e3e18eed3b48f6d4377dee0326a106e8a04569"]] ["1da177e4c3f41524e886b7f1b8a0c1fc7321cac2", []] ["e905914f96e11862b130dd229f73045dad9a34e8", ["f796937a062c7aeb44cd0e75e1586c8543634a7d"]] ["4e57b6817880946a3a78d5d8cad1ace363f7e449", ["8032230694ec56c168a1404c67a54d281536cbed"]] ["4f9d5f4a353440f2265781bfa641587964901861", ["9b706aee7d92d6ac3002547aea12e3eaa0a750ae"]] ["0a6047eef1c465c38aacfbdab193161b3f0cd144", ["e905914f96e11862b130dd229f73045dad9a34e8"]] ["c6b40d16d1cfa1a01158049bb887a9bbe48ef7ba", ["11443ec7d9286dd25663516436a14edfb5f43857"]] ["ea6f3281a145d16ed53e88b0627f78d5cde6068f", ["72fd4a35a824331d7a0f4168d7576502d95d34b3"]] ["11443ec7d9286dd25663516436a14edfb5f43857", ["ea6f3281a145d16ed53e88b0627f78d5cde6068f"]] ["b39a734097d5095d63eb9c709a6aaf965633bb01", ["c6b40d16d1cfa1a01158049bb887a9bbe48ef7ba"]] ["78a8bf69b32980879975f7e31d30386c50bfe851", ["0f9bfa569d46f2346a53a940b2b9e49a38635732"]] ["4d8a743cdd2690c0bc8d1b8cbd02cffb1ead849f", ["78a8bf69b32980879975f7e31d30386c50bfe851"]] ["0fe1ef24f7bd0020f29ffe287dfdb9ead33ca0b2", ["4d8a743cdd2690c0bc8d1b8cbd02cffb1ead849f"]] ["9b706aee7d92d6ac3002547aea12e3eaa0a750ae", ["06b2a76d25d3cfbd14680021c1d356c91be6904e"]] ["0f9bfa569d46f2346a53a940b2b9e49a38635732", ["4f9d5f4a353440f2265781bfa641587964901861"]] ["4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2", ["b39a734097d5095d63eb9c709a6aaf965633bb01"]] ["8032230694ec56c168a1404c67a54d281536cbed", ["1da177e4c3f41524e886b7f1b8a0c1fc7321cac2"]] ["f796937a062c7aeb44cd0e75e1586c8543634a7d", ["4e57b6817880946a3a78d5d8cad1ace363f7e449"]] ["96e3e18eed3b48f6d4377dee0326a106e8a04569", ["4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2"]] 0.12user 0.02system 0:03.64elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+7252minor)pagefaults 0swaps These are the same 20 commits (with parents) from a simple git-log. > I dismissed what you wrote because the example you _did_ state was about > something else entirely (ie apparently some giggle bug that simplifies > things incorrectly). I'm trying to get back things to the topic and it's not just giggle, every tool presents a different version of the history. > > What did I do wrong that you rebuff me based on this secondary problem > > (which I'm quite aware of, because it was me who mentioned in first place) > > and giving the primary problem (which is the missing history) no > > attention? > > It's not missing history. It's all there in --full-history. The default is > to give a reasonable simplification, and I told you what the > simplification was, and it's perfectly conceptually fine - AND IT IS MUCH > MORE EFFICIENT than the alternatives. > > So I'm not seeing your point what-so-ever. That's why I gave you an alternative example, where history is missing. > - with full-history, you have it all, but it's useless in practice Could you please specify which full-history you mean, gitk --full-history or git-log --full-history? > - without full-history, it's useful in practice >From a VCS I would still expect nevertheless to present a correct history not some approximation. bye, Roman ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-28 0:00 ` Roman Zippel @ 2008-07-28 5:00 ` Linus Torvalds 2008-07-28 5:30 ` Linus Torvalds 2008-07-29 2:59 ` Roman Zippel 0 siblings, 2 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-28 5:00 UTC (permalink / raw) To: Roman Zippel; +Cc: Tim Harper, git On Mon, 28 Jul 2008, Roman Zippel wrote: > > Of course I did: > > $ git log --parents --name-only --full-history lib/vsprintf.c | grep -e ^commit | wc -l > 5929 > > This is same amount of commits as for gitk. OF COURSE it's the same numbr of commits. That's what gitk uses. That's what you *have* to use. You don't actually understand how git works, do you? > These are the same 20 commits (with parents) from a simple git-log. So what? The point I tried to make is that _any_ algorithm that gets the above case right by actually simplifying the commits "post facto" probably gets any case right. You tried to find some more interestign case, but you missed the whole point - even the "simple" case is quite hard enough. IOW, don't look for anythign more difficult, because if you do, you don't understand the problem to begin with! Do you not understand that the problem is that "post facto" isn't actually acceptable? Have you looked at all at the revision reading code? Hmm? The regular merge simplification does the simplification _before_ it has gathered the whole history of commits. And that is really really important. And I realize that you don't even seem to understand the difference. But to simplify it for you, let me give you a challenge. Try this: time sh -c "git log --parents --full-history lib/vsprintf.c | head" and if it takes noticeably more than a tenth of a second on my machine, you lose. Because that's roughly what it takes right now, and it's what means that effectively the normal log is instantaneous. It's why you can start looking at the log without waiting for three seconds, even though the _full_ log may take three seconds to compute (ok, on my machine it takes 2.3s, but whatever). And it's why gitk can start printing out the history _before_ three seconds has passed. And that's really really important. Try it. Really. Just do "gitk lib/vsprintf.c" and look at how it does things incrementally. It doesn't wait for a couple of seconds and then show things. Absolutely EVERYTHING in git would be totally trivial if you did it all based on generating first the whole history, and then simplifying it from there. But git would be unusably slow in real life, and it would scale _horribly_ badly with history size. So _all_ the normal code actually generates the history INCREMENTALLY, and it absolutely _has_ to work that way. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-28 5:00 ` Linus Torvalds @ 2008-07-28 5:30 ` Linus Torvalds 2008-07-29 2:59 ` Roman Zippel 1 sibling, 0 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-28 5:30 UTC (permalink / raw) To: Roman Zippel; +Cc: Tim Harper, git On Sun, 27 Jul 2008, Linus Torvalds wrote: > > And it's why gitk can start printing out the history _before_ three > seconds has passed. And that's really really important. Btw, the reason it's really really important is that "three seconds" can actually easily be "three minutes" - if the project is big, or if you simply don't have everything in the cache, so you actually need to do tons of IO to generate the whole history. So every normal operation absolutely _must_ be incremental and not rely on any calculation of the whole history in order to then simplify it. Of course, post-processing is fine for some things. For example, in the thread I pointed you to originally (see filter-branch + full-history in google, or look in some git archive) I suggested a post-processing of the merge history for filter-branch. I suspect it's very acceptable for _that_ kind of use to "batch" things up and not do them with partial knowledge. But this incremental thing is why I for example suggest people should use "git gui blame" instead of "git blame" when looking for problems - because the latter cannot be done incrementally, and as a result can cause really irritating delays (exactly because it basically needs to synchronously walk back to the beginning of history). The kernel repo, btw, is pretty small in this regard. The cases that caused much more pain were the insane KDE ones that were something like ten times the size. We've optimized things pretty aggressively, but... Btw, if I sound irritated, it's because we had all these discussions about three _years_ ago when git got started. This is not a new issue. It's hard. I've been pushing on people to do things incrementally very hard over the last few years because it's such a _huge_ usability issue. For example, I've pointed you to the incremental nature of "gitk" as an example of how things should work, but that's actually fairly recent: it wasn't that long ago that "gitk" used to pass in "--topo-order" or "--date-order" to the core git revision machinery, and that actually is another of those "global" operations that you need the whole history for. So gitk actually used to pause for three seconds (or ten. or thirty) before it would show the results. I'm really happy to report that Paul finally did the (trivial) topo-sort in gitk, meaning that he could re-sort it as necessary and keep things incremental. It was one of my biggest UI gripes for the longest time (and I wasted time adding a special "partial output mode" that gitk didn't even then end up using because Paul did things the right way). Btw, from a git log viewer standpoint, the "merge history simplification" is all the exact same problem as the "--topo-order" flag is: you could use the (incremental and very verbose) git log --full-history --parents output as the base-line, and then you could do the commit simplification of things interactively. But "git log" itself cannot do it by default, since that would mean that git log itself would have to wait for the whole history to be generated. That's because output to a pipe is fundamentally linear (ie it cannot "re-write" the things it has already shown as it finds a simplification: there is no incremental way to rewrite things "after the fact"). Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-28 5:00 ` Linus Torvalds 2008-07-28 5:30 ` Linus Torvalds @ 2008-07-29 2:59 ` Roman Zippel 2008-07-29 3:15 ` Martin Langhoff ` (2 more replies) 1 sibling, 3 replies; 58+ messages in thread From: Roman Zippel @ 2008-07-29 2:59 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tim Harper, git Hi, On Sun, 27 Jul 2008, Linus Torvalds wrote: > You don't actually understand how git works, do you? Probably not and to be honest I don't care, all I want is my history - the correct one. All I know it's in there somehow and all I care is how to get it out of there. Right now you're giving me the choice between a crappy incomplete history or a crappy history full of useless information. That's it? As long as your challenge involves being compared to crappy history, I'm not interested. If the solution should involve a switch "--correct-history" or I have to wait for the result, I don't care, because it's the correct history I want. As long as you're trying to sell me crappy history I'm not buying it. Can we please get past this and look at what is required to produce the correct history? Fact is based on the current git-log output it's simply impossible to produce the correct history without reading the whole history first, since the very last commit can still be relevant for an earlier merge to connect it properly into the graph. This means we need some extra information before even starting to scan through the commits. Luckily this information can be cached once it has been generated and it also can be updated incrementally. E.g. this information could be generated while generating the pack, so you only had to scan the commits which haven't been packed yet, but it's also possible to update it when merging/pulling new data. It's also not much information that is needed, all that is needed is list of commits per file (which are are usually only a few, mostly even none), which git-log can use, so it knows that these are important while scanning the tree. Technically I don't see a really hard problem to implement this, the problem for me is only that I have no idea where to start this within git and how to integrate it. The other problem (over which I have absolutely no control) is whether anyone actually wants to produce a correct history. I hope there is someone, otherwise there would be no need for "git-log --full-history" (which is also used by git-web), this e.g. produces a nice example in kernel/printk.c, where git-web produces two commits (search for tty_write_message), for which none of the front ends can tell me usefully how it fits into the history (they either don't show it at all or it's lost in "gitk --full-history"). bye, Roman ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 2:59 ` Roman Zippel @ 2008-07-29 3:15 ` Martin Langhoff 2008-07-30 0:16 ` Roman Zippel 2008-07-29 3:29 ` Linus Torvalds 2008-07-29 5:31 ` Jeff King 2 siblings, 1 reply; 58+ messages in thread From: Martin Langhoff @ 2008-07-29 3:15 UTC (permalink / raw) To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git On Tue, Jul 29, 2008 at 2:59 PM, Roman Zippel <zippel@linux-m68k.org> wrote: > Can we please get past this and look at what is required to produce the > correct history? Roman - correct is --full-history -- any simplification that makes it easy on your eyes *is* a simplification. And consumers that want to do nice user-friendly simplification like gitk does can hang off the data stream. > it's also possible to update it when merging/pulling new data. If that's what you want to do, you can prototype it with a hook on fetch and commit. That is definitely an area that hasn't been explored - what nicer (but expensive) views on the history we have can be afforded by pre-computing things on fetch and commit hooks. cheers, m -- martin.langhoff@gmail.com martin@laptop.org -- School Server Architect - ask interesting questions - don't get distracted with shiny stuff - working code first - http://wiki.laptop.org/go/User:Martinlanghoff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 3:15 ` Martin Langhoff @ 2008-07-30 0:16 ` Roman Zippel 2008-07-30 0:25 ` Martin Langhoff ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread From: Roman Zippel @ 2008-07-30 0:16 UTC (permalink / raw) To: Martin Langhoff; +Cc: Linus Torvalds, Tim Harper, git Hi, On Tue, 29 Jul 2008, Martin Langhoff wrote: > On Tue, Jul 29, 2008 at 2:59 PM, Roman Zippel <zippel@linux-m68k.org> wrote: > > Can we please get past this and look at what is required to produce the > > correct history? > > Roman - correct is --full-history -- any simplification that makes it > easy on your eyes *is* a simplification. And consumers that want to do > nice user-friendly simplification like gitk does can hang off the data > stream. I don't quite understand what you're trying to say. To avoid further confusion it maybe helps to specify a few of the terms: - full history graph: produced by "git-log --full-history --parents" - compact history graph: the full history graph without without any repeated merges, this is what my example script produces. - full simplified history: output of "git-log --full-history" - short simplified history: standard output of "git-log" The important part about the history graphs is that all commits are properly connected in it (i.e. all except the head commit have a child), This is needed to know if you don't just what want to know what happened, but also how it got merged, also any graphical interface needs it to produce a useful history graph. What the short simplified history is more pure laziness, it's fast and gets the most common cases right, but in order to do this it has to ignore part of the history. The full simplified history at least produces produces the full change history, but it lacks part of the merge history and it stills takes longer to generate. The point I'm trying to make is that the compact history graph has the potential to completely replace the simplified history. The only problem is that it needs a bit of cached extra information, then it can be as fast the short simplified history for the common case and it still can produce as much information as the full simplified history, thus you can still apply as much simplification as you want on top of it. Keep in mind that e.g. git-web is using the full simplified history, so what I'm offering also has the potential to improve git-web performance... > > it's also possible to update it when merging/pulling new data. > > If that's what you want to do, you can prototype it with a hook on > fetch and commit. That is definitely an area that hasn't been explored > - what nicer (but expensive) views on the history we have can be > afforded by pre-computing things on fetch and commit hooks. I already did the prototype, I know how to generate that information, the problem is to get that information to the various graphical interfaces. bye, Roman ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 0:16 ` Roman Zippel @ 2008-07-30 0:25 ` Martin Langhoff 2008-07-30 0:32 ` Linus Torvalds 2008-07-30 8:36 ` Bizarre missing changes (git bug?) Jakub Narebski 2 siblings, 0 replies; 58+ messages in thread From: Martin Langhoff @ 2008-07-30 0:25 UTC (permalink / raw) To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git On Wed, Jul 30, 2008 at 12:16 PM, Roman Zippel <zippel@linux-m68k.org> wrote: > I already did the prototype afaict people around here are only interested if it can be done without losing the early-output niceness of current git-log. That it can be worked out in a "put it all in memory and work it in there" model is _not_ interesting for git. cheers, m -- martin.langhoff@gmail.com martin@laptop.org -- School Server Architect - ask interesting questions - don't get distracted with shiny stuff - working code first - http://wiki.laptop.org/go/User:Martinlanghoff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 0:16 ` Roman Zippel 2008-07-30 0:25 ` Martin Langhoff @ 2008-07-30 0:32 ` Linus Torvalds 2008-07-30 0:48 ` Linus Torvalds 2008-07-30 8:36 ` Bizarre missing changes (git bug?) Jakub Narebski 2 siblings, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2008-07-30 0:32 UTC (permalink / raw) To: Roman Zippel; +Cc: Martin Langhoff, Tim Harper, git On Wed, 30 Jul 2008, Roman Zippel wrote: > > What the short simplified history is more pure laziness No. Roman, you're an idiot who doesn't even _understand_ what you are talking about. Sadly, you then _think_ you are so smart that you then refuse to even consider the fact that others disagree, so you don't even read what they write. Go to my previous email in this thread. Do the example. Look at the simplified version. Ponder. It's not "pure lazyness" when you get the simplified version. It's actually a MORE USEFUL RESULT! The simplified version shows the minimal explanation of how things ended up the way they are, and that is damn useful. What you want is extra _clutter_ most of the time. It's really sad how you cannot get over your own prejudices here. So Roman. Go back, read my previous email in this thread. It's message ID is <alpine.LFD.1.10.0807291006070.3334@nehalem.linux-foundation.org> in case it helps you find it. Read it twice, or three times. Read it with the notion that maybe you didn't know best after all. Read it with the possibility that maybe there are smarter people than you, and people who have actually worked with git for several years. And if you can't do that, at least stop cc'ing me with your idiocy. To get to the meat of your email: > The point I'm trying to make is that the compact history graph has the > potential to completely replace the simplified history. The only problem > is that it needs a bit of cached extra information, then it can be as fast > the short simplified history for the common case and it still can produce > as much information as the full simplified history, thus you can still > apply as much simplification as you want on top of it. You're simply full of sh*t. You make two huge mistakes, and I'll spend another few minutes of my life trying to educate you one final more time, even though from every single indication I have so far, you are unable to learn simply because you think you already know the answer. Your two mistakes are: - your "only" problem is fundamental. It's unsolvable. Git history simplification isn't per-file or even per-directory. It's per-any-random-set-of-pathnames. You can't "cache" the simplified information, and it's not "a bit" of cached extra info. It's fundamentally a metric truckload of info. With a cache, you can make the performance of a repeated query go fast, but that's totally uninteresting. - But the other huge mistake you make is EVEN MORE STUPID, because it's so ironic. That magical output you want, and claim is so perfect, and point out "thus you can still apply as much simplification as you want on top of it"? You know what? It already _exists_! It's exactly that --full-history case. Can you not see that? That's exactly that --full-history --parents cases. It gives you the full information. You can simplify it to what you want, exactly because it did _not_ simplify things for you. I've even told you so, multiple times, when I suggested you try to do that simplification in "gitk". In other words, git has the two cases you want: the "extreme simplified history" (that is nice to see what really _mattered_, with no extra unnecessary duplicate history that didn't actually affect the end result), and the "full" history (ooh, I know, we could make a command line called "--full-history" to get the latter, so that people who wanted to see it all and perhaps distill it to something else could do so). And I've told you over and over what you should look at, and I've told you over and over that the default is actually the _useful_ case, and why. But you seem to refuse to listen. You just close your ears and repeat your mantra, even though people smarter than you have told you why it's done the way it's done. Stop stuffing your ears. Listen to what people tell you. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 0:32 ` Linus Torvalds @ 2008-07-30 0:48 ` Linus Torvalds 2008-07-30 23:56 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2008-07-30 0:48 UTC (permalink / raw) To: Roman Zippel; +Cc: Martin Langhoff, Tim Harper, git On Tue, 29 Jul 2008, Linus Torvalds wrote: > > - But the other huge mistake you make is EVEN MORE STUPID, because it's > so ironic. That magical output you want, and claim is so perfect, and > point out "thus you can still apply as much simplification as you want > on top of it"? You know what? It already _exists_! It's exactly that > --full-history case. Put in other terms: what you ask for can be fairly trivially done as a filter on the _current_ git output (preferably merged into the tool that shows it graphically in the first place), with absolutely no downside. In contrast, if somebody was really so _stupid_ as to go with your output format, then yes, he could further simplify it down to the current default format, but with a huge performance/interactivity downside. See? Your preferred format is not actually the "best" format. Not at all. Quite the reverse. Your preferred format is much better off being a secondary post-processing format exactly because it can be generated from one of the primary formats easily enough. But the reverse isn't true: the current primary formats cannot be generated from your preferred format without losing something important (performance). But I'll make you a deal: if you actually write the filter in C form, I can pretty much guarantee that we can easily add it as a new flag. It really should be pretty easy to integrate it into the revision parsing machinery alongside --topo-order, since it's really the same kind of operation. In fact, it's possible that the current --topo-order sorting could possibly be made to just do the simplification (conditionally, of course, since it has the latency problem). See the function void sort_in_topological_order(struct commit_list ** list, int lifo) in commit.c - that's where it would hook in. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 0:48 ` Linus Torvalds @ 2008-07-30 23:56 ` Junio C Hamano 2008-07-31 0:15 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread From: Junio C Hamano @ 2008-07-30 23:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git Linus Torvalds <torvalds@linux-foundation.org> writes: > But the reverse isn't true: the current primary formats cannot be > generated from your preferred format without losing something important > (performance). > > But I'll make you a deal: if you actually write the filter in C form, I > can pretty much guarantee that we can easily add it as a new flag. It > really should be pretty easy to integrate it into the revision parsing > machinery alongside --topo-order, since it's really the same kind of > operation. I am not Roman, but so I do not know if I did what Roman wanted to, but here is a quick hack. "gitk --post-simplify -- kernel/printk.c" is slightly more readable than --full-history with this patch. -- >8 -- Subject: [PATCH] revision traversal: teach --post-simplify The --full-history traversal keeps all merges and non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with a simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. This replacement commit is defined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if its parents are replacement commits of its original parents. * If the commit is marked as TREESAME (i.e. it modifies the paths we are interested in), then the replacement commit is itself. IOW, the commit is not dropped from the final result. * Otherwise, we examine the parents of the commit. - If they replace to the same commit, because the commit we are looking at itself does not touch the interesting paths, we replace the commit we are looking at with the replacement commit of its parents. - If some of the parents replace to one commit, and some other parents replace to another different commit, the commit we are looking at needs to stay as a merge in the final result. The algorithm outlined above alone does not quite work; the reason is because "all parents are replaced by the same commit" rule is too strict. It needs to be relaxed to remove parents that are ancestor of some other parents, and that is why post_simplify_one() calls a rather expensive reduce_heads(). Signed-off-by: Junio C Hamano <gitster@pobox.com> --- revision.c | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++---------- revision.h | 1 + 2 files changed, 106 insertions(+), 20 deletions(-) diff --git a/revision.c b/revision.c index 3897fec..a843c42 100644 --- a/revision.c +++ b/revision.c @@ -1045,6 +1045,10 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg } else if (!strcmp(arg, "--topo-order")) { revs->lifo = 1; revs->topo_order = 1; + } else if (!strcmp(arg, "--post-simplify")) { + revs->post_simplify = 1; + revs->topo_order = 1; + revs->simplify_history = 0; } else if (!strcmp(arg, "--date-order")) { revs->lifo = 0; revs->topo_order = 1; @@ -1378,6 +1382,105 @@ static void add_child(struct rev_info *revs, struct commit *parent, struct commi l->next = add_decoration(&revs->children, &parent->object, l); } +static int remove_duplicate_parents(struct commit *commit) +{ + struct commit_list **pp, *p; + int surviving_parents; + + /* Examine existing parents while marking ones we have seen... */ + pp = &commit->parents; + while ((p = *pp) != NULL) { + struct commit *parent = p->item; + if (parent->object.flags & TMP_MARK) { + *pp = p->next; + continue; + } + parent->object.flags |= TMP_MARK; + pp = &p->next; + } + /* count them while clearing the temporary mark */ + surviving_parents = 0; + for (p = commit->parents; p; p = p->next) { + p->item->object.flags &= ~TMP_MARK; + surviving_parents++; + } + return surviving_parents; +} + +static int post_simplify_one(struct commit *commit) +{ + struct commit_list *p; + int num_parents; + + for (p = commit->parents; p; p = p->next) + if (!p->item->util) + return 0; + + /* All of our parents know what they should be rewritten to */ + for (p = commit->parents; p; p = p->next) + p->item = p->item->util; + num_parents = remove_duplicate_parents(commit); + + if (1 < num_parents) { + struct commit_list *h = reduce_heads(commit->parents); + num_parents = commit_list_count(h); + free_commit_list(commit->parents); + commit->parents = h; + } + + /* + * We stand for ourselves if we are root, if we change the tree, + * or if we are a merge and our parents simplify to different + * commits. Otherwise we can be replaced by the commit our + * sole parent is replaced by. + */ + if (!num_parents || + !(commit->object.flags & TREESAME) || + (1 < num_parents)) + commit->util = commit; + else + commit->util = commit->parents->item->util; + + return 1; +} + +static void post_simplify(struct rev_info *revs) +{ + struct commit_list *list; + struct commit_list *yet_to_do, **tail; + + /* feed the list reversed */ + yet_to_do = NULL; + for (list = revs->commits; list; list = list->next) + commit_list_insert(list->item, &yet_to_do); + while (yet_to_do) { + list = yet_to_do; + yet_to_do = NULL; + tail = &yet_to_do; + while (list) { + struct commit *commit = list->item; + struct commit_list *next = list->next; + free(list); + list = next; + if (!post_simplify_one(commit)) + tail = &commit_list_insert(commit, tail)->next; + } + } + + /* clean up the result, removing the simplified ones */ + list = revs->commits; + revs->commits = NULL; + tail = &revs->commits; + while (list) { + struct commit *commit = list->item; + struct commit_list *next = list->next; + free(list); + list = next; + if (commit->util == commit) + tail = &commit_list_insert(commit, tail)->next; + } +} + static void set_children(struct rev_info *revs) { struct commit_list *l; @@ -1418,6 +1521,8 @@ int prepare_revision_walk(struct rev_info *revs) return -1; if (revs->topo_order) sort_in_topological_order(&revs->commits, revs->lifo); + if (revs->post_simplify) + post_simplify(revs); if (revs->children.name) set_children(revs); return 0; @@ -1450,26 +1555,6 @@ static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp } } -static void remove_duplicate_parents(struct commit *commit) -{ - struct commit_list **pp, *p; - - /* Examine existing parents while marking ones we have seen... */ - pp = &commit->parents; - while ((p = *pp) != NULL) { - struct commit *parent = p->item; - if (parent->object.flags & TMP_MARK) { - *pp = p->next; - continue; - } - parent->object.flags |= TMP_MARK; - pp = &p->next; - } - /* ... and clear the temporary mark */ - for (p = commit->parents; p; p = p->next) - p->item->object.flags &= ~TMP_MARK; -} - static int rewrite_parents(struct rev_info *revs, struct commit *commit) { struct commit_list **pp = &commit->parents; diff --git a/revision.h b/revision.h index f64e8ce..953e69b 100644 --- a/revision.h +++ b/revision.h @@ -41,6 +41,7 @@ struct rev_info { simplify_history:1, lifo:1, topo_order:1, + post_simplify:1, tag_objects:1, tree_objects:1, blob_objects:1, -- 1.6.0.rc1.29.gc4aca ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 23:56 ` Junio C Hamano @ 2008-07-31 0:15 ` Junio C Hamano 2008-07-31 0:30 ` Linus Torvalds 2008-07-31 8:17 ` [PATCH v2] revision traversal: show full history with merge simplification Junio C Hamano 2 siblings, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2008-07-31 0:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git Junio C Hamano <gitster@pobox.com> writes: > The idea is to compute, for each commit in the "full history" result set, > the commit that should replace it in the simplified history. This > replacement commit is defined as follows: > > * In any case, we first figure out the replacement commits of parents of > the commit we are looking at. The commit we are looking at is > rewritten as if its parents are replacement commits of its original > parents. > > * If the commit is marked as TREESAME (i.e. it modifies the paths we are > interested in), then the replacement commit is itself. IOW, the commit > is not dropped from the final result. A typo here. This comment should have said !TREESAME (the code is correct). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 23:56 ` Junio C Hamano 2008-07-31 0:15 ` Junio C Hamano @ 2008-07-31 0:30 ` Linus Torvalds 2008-07-31 8:17 ` [PATCH v2] revision traversal: show full history with merge simplification Junio C Hamano 2 siblings, 0 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-31 0:30 UTC (permalink / raw) To: Junio C Hamano Cc: Roman Zippel, Martin Langhoff, Tim Harper, Git Mailing List, Johannes Sixt On Wed, 30 Jul 2008, Junio C Hamano wrote: > > I am not Roman, but so I do not know if I did what Roman wanted to, but > here is a quick hack. "gitk --post-simplify -- kernel/printk.c" is > slightly more readable than --full-history with this patch. .. and if by "slightly", you mean "a lot", then yes. Patch looks fine to me. I didn't look at the code logic very closely, but I suspect that it's actually hard to get the right answer with broken code, and the logic doesn't look broken. So Ack. The filter-branch thing should probably be taught about this at least as an option. I think it was Johannes Sixt that worried about that one. Added to cc. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v2] revision traversal: show full history with merge simplification 2008-07-30 23:56 ` Junio C Hamano 2008-07-31 0:15 ` Junio C Hamano 2008-07-31 0:30 ` Linus Torvalds @ 2008-07-31 8:17 ` Junio C Hamano 2008-07-31 8:18 ` Junio C Hamano 2008-07-31 22:09 ` [PATCH v3-wip] " Junio C Hamano 2 siblings, 2 replies; 58+ messages in thread From: Junio C Hamano @ 2008-07-31 8:17 UTC (permalink / raw) To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com> --- * Changes from the "quick hack" are: - When the history is bounded at the bottom, the v1 patch did not terminate, because it wanted to know the replacement for UNINTERESTING parents of commits on revs->commit list, but these parents were never processed. Oops. - The option implies rewrite_parents. I was tempted to make it imply "--parents" (which would make it always emit parent information as well), but didn't. - Toposort is still implied but it is done at the end. - The code is more heavily commented. - I do not think "--post-simplify" is particulary a good name, but I couldn't come up with a good one. To mark it clearly not ready for 'master', I changed the name to a meaningless word for now ;-) * Timings (best of 5 runs) $ git rev-list --parents --full-history --topo-order HEAD -- kernel/printk.c 3.75user 0.47system 0:04.22elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k $ git rev-list --parents --oyoyo HEAD -- kernel/printk.c 4.31user 0.06system 0:04.37elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k $ git rev-list --parents --full-history HEAD -- kernel/printk.c | head -n 200 0.16user 0.02system 0:00.18elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k revision.c | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++++------- revision.h | 1 + 2 files changed, 152 insertions(+), 20 deletions(-) diff --git a/revision.c b/revision.c index 3897fec..3b59e02 100644 --- a/revision.c +++ b/revision.c @@ -1045,6 +1045,11 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg } else if (!strcmp(arg, "--topo-order")) { revs->lifo = 1; revs->topo_order = 1; + } else if (!strcmp(arg, "--oyoyo")) { + revs->post_simplify = 1; + revs->rewrite_parents = 1; + revs->simplify_history = 0; + revs->limited = 1; } else if (!strcmp(arg, "--date-order")) { revs->lifo = 0; revs->topo_order = 1; @@ -1378,6 +1383,150 @@ static void add_child(struct rev_info *revs, struct commit *parent, struct commi l->next = add_decoration(&revs->children, &parent->object, l); } +static int remove_duplicate_parents(struct commit *commit) +{ + struct commit_list **pp, *p; + int surviving_parents; + + /* Examine existing parents while marking ones we have seen... */ + pp = &commit->parents; + while ((p = *pp) != NULL) { + struct commit *parent = p->item; + if (parent->object.flags & TMP_MARK) { + *pp = p->next; + continue; + } + parent->object.flags |= TMP_MARK; + pp = &p->next; + } + /* count them while clearing the temporary mark */ + surviving_parents = 0; + for (p = commit->parents; p; p = p->next) { + p->item->object.flags &= ~TMP_MARK; + surviving_parents++; + } + return surviving_parents; +} + +static struct commit_list **post_simplify_one(struct commit *commit, struct commit_list **tail) +{ + struct commit_list *p; + int cnt; + + /* + * We store which commit each one simplifies to in its util field. + * Have we handled this one? + */ + if (commit->util) + return tail; + + /* + * An UNINTERESTING commit simplifies to itself, so does a + * root commit. We do not rewrite parents of such commit + * anyway. + */ + if ((commit->object.flags & UNINTERESTING) || !commit->parents) { + commit->util = commit; + return tail; + } + + /* + * Do we know what commit all of our parents should be rewritten to? + * Otherwise we are not ready to rewrite this one yet. + */ + for (cnt = 0, p = commit->parents; p; p = p->next) { + if (!p->item->util) { + tail = &commit_list_insert(p->item, tail)->next; + cnt++; + } + } + if (cnt) + return tail; + + /* + * Rewrite our list of parents. + */ + for (p = commit->parents; p; p = p->next) + p->item = p->item->util; + cnt = remove_duplicate_parents(commit); + + /* + * It is possible that we are a merge and one side branch + * does not have any commit that touches the given paths; + * in such a case, the immediate parents will be rewritten + * to different commits. + * + * o----X X: the commit we are looking at; + * / / o: a commit that touches the paths; + * ---o----' + * + * Further reduce the parents by removing redundant parents. + */ + if (1 < cnt) { + struct commit_list *h = reduce_heads(commit->parents); + cnt = commit_list_count(h); + free_commit_list(commit->parents); + commit->parents = h; + } + + /* + * A commit simplifies to itself if it is a root, if it is + * UNINTERESTING, if it touches the given paths, or if it is a + * merge and its parents simplifies to more than one commits + * (the first two cases are already handled at the beginning of + * this function). + * + * Otherwise, it simplifies to what its sole parent simplifies to. + */ + if (!cnt || + (commit->object.flags & UNINTERESTING) || + !(commit->object.flags & TREESAME) || + (1 < cnt)) + commit->util = commit; + else + commit->util = commit->parents->item->util; + return tail; +} + +static void post_simplify(struct rev_info *revs) +{ + struct commit_list *list; + struct commit_list *yet_to_do, **tail; + + /* feed the list reversed */ + yet_to_do = NULL; + for (list = revs->commits; list; list = list->next) + commit_list_insert(list->item, &yet_to_do); + while (yet_to_do) { + list = yet_to_do; + yet_to_do = NULL; + tail = &yet_to_do; + while (list) { + struct commit *commit = list->item; + struct commit_list *next = list->next; + free(list); + list = next; + tail = post_simplify_one(commit, tail); + } + } + + /* clean up the result, removing the simplified ones */ + list = revs->commits; + revs->commits = NULL; + tail = &revs->commits; + while (list) { + struct commit *commit = list->item; + struct commit_list *next = list->next; + free(list); + list = next; + if (commit->util == commit) + tail = &commit_list_insert(commit, tail)->next; + } + + /* sort topologically at the end */ + sort_in_topological_order(&revs->commits, revs->lifo); +} + static void set_children(struct rev_info *revs) { struct commit_list *l; @@ -1418,6 +1567,8 @@ int prepare_revision_walk(struct rev_info *revs) return -1; if (revs->topo_order) sort_in_topological_order(&revs->commits, revs->lifo); + if (revs->post_simplify) + post_simplify(revs); if (revs->children.name) set_children(revs); return 0; @@ -1450,26 +1601,6 @@ static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp } } -static void remove_duplicate_parents(struct commit *commit) -{ - struct commit_list **pp, *p; - - /* Examine existing parents while marking ones we have seen... */ - pp = &commit->parents; - while ((p = *pp) != NULL) { - struct commit *parent = p->item; - if (parent->object.flags & TMP_MARK) { - *pp = p->next; - continue; - } - parent->object.flags |= TMP_MARK; - pp = &p->next; - } - /* ... and clear the temporary mark */ - for (p = commit->parents; p; p = p->next) - p->item->object.flags &= ~TMP_MARK; -} - static int rewrite_parents(struct rev_info *revs, struct commit *commit) { struct commit_list **pp = &commit->parents; diff --git a/revision.h b/revision.h index f64e8ce..953e69b 100644 --- a/revision.h +++ b/revision.h @@ -41,6 +41,7 @@ struct rev_info { simplify_history:1, lifo:1, topo_order:1, + post_simplify:1, tag_objects:1, tree_objects:1, blob_objects:1, -- 1.6.0.rc1.31.gf448e ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH v2] revision traversal: show full history with merge simplification 2008-07-31 8:17 ` [PATCH v2] revision traversal: show full history with merge simplification Junio C Hamano @ 2008-07-31 8:18 ` Junio C Hamano 2008-07-31 22:30 ` Linus Torvalds 2008-07-31 22:09 ` [PATCH v3-wip] " Junio C Hamano 1 sibling, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2008-07-31 8:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git If you look at the output from the "kernel/printk.c" with this patch, you would notice that there still are somewhat meaningless merges shown in the history (e.g. scroll down to 185a257f2f73bcd89050ad02da5bedbc28fc43fa). The mainline side keeps making steady changes to the path, but the side branch that made tty_write_message available to others with b346671 ([PATCH] Export tty_write_message() for GFS2 quota code, 2006-01-16) keeps many "Merge from master" until it is merged back to the mainline, even after the earlier change is reverted by 02630a1 ([GFS2] Remove dependance on tty_write_message(), 2006-07-03). I wonder if we can do something clever to reduce these pointless (from the point of view of explaining kernel/printk.c's evolution, at least) merges from the output. This might be another example of the reason why it is a good thing that you keep teaching people: "On your xyzzy topic, you are doing xyzzy development, not xyzzy development plus random changes --- don't merge my tree into yours!", and we could dismiss these extra merges we see in the output as artifacts from a bad practice, but as long as we are spending extra cycles, it would be better if we could reduce such clutter. I am still undecided about the option name. The existing --full-history is "show history fully without simplifying the merge at all". This is "show history fully with merge simplification". Perhaps --simplify-merges? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v2] revision traversal: show full history with merge simplification 2008-07-31 8:18 ` Junio C Hamano @ 2008-07-31 22:30 ` Linus Torvalds 0 siblings, 0 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-31 22:30 UTC (permalink / raw) To: Junio C Hamano; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git On Thu, 31 Jul 2008, Junio C Hamano wrote: > > If you look at the output from the "kernel/printk.c" with this patch, you > would notice that there still are somewhat meaningless merges shown in the > history (e.g. scroll down to 185a257f2f73bcd89050ad02da5bedbc28fc43fa). They're not really meaningless. Yes, they are pointless for the end result, but once you start showing that whole pointless branch they very much are needed for a complete view of the "shape" of history. The merges are real points on that branch where printk changed because it got updates from mainlines. So either you should have the full simplification (which only shows stuff that is really meaningful for the end result), or you need for those "pointless" merges to remain (because you show the changes that happened on side branches). I obviously believe that the full simplification is what you most often want, but the --post-simplify thing does make sense. (And yes, I agree that the name should be something else, and that --simplify-merges makes more sense. The "post-simplify" thing is an implementation issue, and doesn't describe what the effect is. And with your incremental one, even that isn't true). Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v3-wip] revision traversal: show full history with merge simplification 2008-07-31 8:17 ` [PATCH v2] revision traversal: show full history with merge simplification Junio C Hamano 2008-07-31 8:18 ` Junio C Hamano @ 2008-07-31 22:09 ` Junio C Hamano 2008-07-31 22:26 ` Linus Torvalds 1 sibling, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2008-07-31 22:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git This one makes it incremental. It is not relative to v2 but on top of 'master'. The idea and the logic to find what parents to rewrite is the same as the previous one, but this one works incrementally as much as possible. When you have this topology, where commits with capital letters are the ones that change the paths you are interested in: A---M---o---C---D---o / / ---o---B (1) we can tell that the rightmost one 'o' is not we want to show, without digging any further than D; (2) we can show D after inspecting C without digging any further. C is the sole parent of D, and C itself is an interesting one, so D's parent will stay to be C and not its ancestor. (3) before showing C, we need to know what the rewritten parent of it would be; we need to dig down to M and notice that it has two parents that simplify to a different commit (both A and B touch the path we are interested in), so M simplifies to itself and it becomes the parent of C. IOW, we need to dig no further than A and B in order to show C. $ time sh -c 'git log --pretty=oneline --abbrev-commit \ --simplify-merges --parents \ -- kernel/printk.c | head -n 1' 5dfb66b... 1d9b9f6... c9272c4... Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd real 0m0.344s user 0m0.324s sys 0m0.020s The same query with 's/| head -n 1/>/dev/null' is more expensive. In fact it is much more expensive than the non-incremental one (v2), and about three times more expensive than non-limiting --full-history for explaining the history of kernel/printk.c. There must be opportunity to further optimize this, but I'd stop here for now, as you keep saying this is hard, and if I continue thinking about this any longer my head would explode ;-) --- revision.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- revision.h | 1 + 2 files changed, 103 insertions(+), 4 deletions(-) diff --git a/revision.c b/revision.c index 3897fec..9554a70 100644 --- a/revision.c +++ b/revision.c @@ -1045,6 +1045,10 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg } else if (!strcmp(arg, "--topo-order")) { revs->lifo = 1; revs->topo_order = 1; + } else if (!strcmp(arg, "--simplify-merges")) { + revs->simplify_merges = 1; + revs->rewrite_parents = 1; + revs->simplify_history = 0; } else if (!strcmp(arg, "--date-order")) { revs->lifo = 0; revs->topo_order = 1; @@ -1450,9 +1454,10 @@ static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp } } -static void remove_duplicate_parents(struct commit *commit) +static int remove_duplicate_parents(struct commit *commit) { struct commit_list **pp, *p; + int surviving_parents; /* Examine existing parents while marking ones we have seen... */ pp = &commit->parents; @@ -1465,9 +1470,13 @@ static void remove_duplicate_parents(struct commit *commit) parent->object.flags |= TMP_MARK; pp = &p->next; } - /* ... and clear the temporary mark */ - for (p = commit->parents; p; p = p->next) + /* count them while clearing the temporary mark */ + surviving_parents = 0; + for (p = commit->parents; p; p = p->next) { p->item->object.flags &= ~TMP_MARK; + surviving_parents++; + } + return surviving_parents; } static int rewrite_parents(struct rev_info *revs, struct commit *commit) @@ -1536,6 +1545,89 @@ enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit) return commit_show; } +static void simplify_merges(struct rev_info *revs, struct commit *commit) +{ + struct commit_list *work = NULL; + + commit_list_insert(commit, &work); + while (!commit->util && work) { + struct commit *c; + struct commit_list *p; + int cnt; + + c = pop_commit(&work); + if (c->util) + continue; + if ((c->object.flags & UNINTERESTING) || !c->parents) { + c->util = c; + continue; + } + + /* + * Do we know what commit all of the parents of this + * should be rewritten to? Otherwise we are not ready + * to rewrite this one yet. + */ + for (cnt = 0, p = c->parents; p; p = p->next) { + if (!p->item->util) { + if (!cnt) + commit_list_insert(c, &work); + commit_list_insert(p->item, &work); + cnt++; + } + } + if (cnt) + continue; + + /* + * Rewrite the list of parents. + */ + for (p = c->parents; p; p = p->next) + p->item = p->item->util; + cnt = remove_duplicate_parents(c); + + /* + * It is possible that this is a merge and one side + * branch does not have any commit that touches the + * given paths; in such a case, the immediate parents + * will be rewritten to different commits if we do not + * reduce such a false merge of fast-forward parents. + * + * o----X X: the commit we are looking at; + * / / o: a commit that touches the paths; + * ---o----' + * + * Further reduce the parents by removing redundant + * parents. + */ + if (1 < cnt) { + struct commit_list *h = reduce_heads(c->parents); + cnt = commit_list_count(h); + free_commit_list(c->parents); + c->parents = h; + } + + /* + * A commit simplifies to itself if it is a root, if + * it is UNINTERESTING, if it touches the given paths, + * or if it is a merge and its parents simplifies to + * more than one commits (the first two cases are + * already handled at the beginning of this function). + * + * Otherwise, it simplifies to what its sole parent + * simplifies to. + */ + if (!cnt || + (c->object.flags & UNINTERESTING) || + !(c->object.flags & TREESAME) || + (1 < cnt)) + c->util = c; + else + c->util = c->parents->item->util; + } + free_commit_list(work); +} + static struct commit *get_revision_1(struct rev_info *revs) { if (!revs->commits) @@ -1570,8 +1662,14 @@ static struct commit *get_revision_1(struct rev_info *revs) case commit_error: return NULL; default: - return commit; + break; + } + if (revs->simplify_merges && !commit->util) { + simplify_merges(revs, commit); + if (commit->util != commit) + continue; } + return commit; } while (revs->commits); return NULL; } diff --git a/revision.h b/revision.h index f64e8ce..dfa06b5 100644 --- a/revision.h +++ b/revision.h @@ -41,6 +41,7 @@ struct rev_info { simplify_history:1, lifo:1, topo_order:1, + simplify_merges:1, tag_objects:1, tree_objects:1, blob_objects:1, ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH v3-wip] revision traversal: show full history with merge simplification 2008-07-31 22:09 ` [PATCH v3-wip] " Junio C Hamano @ 2008-07-31 22:26 ` Linus Torvalds 2008-07-31 22:36 ` Junio C Hamano 2008-08-01 3:00 ` Junio C Hamano 0 siblings, 2 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-31 22:26 UTC (permalink / raw) To: Junio C Hamano; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git On Thu, 31 Jul 2008, Junio C Hamano wrote: > > The same query with 's/| head -n 1/>/dev/null' is more expensive. In fact > it is much more expensive than the non-incremental one (v2), and about > three times more expensive than non-limiting --full-history for explaining > the history of kernel/printk.c. Hmm? Why is that, exactly? Does it walk over the same commit over and over and over again or something? Can you combine --simplify-merges and --topo-order to get a fast version again (since --topo-order will force a non-incrmental walk)? I have this suspicion (gut feel only, not anything else to back it up) that for any complex global history, you'll always end up having a lot of merges "live" and have a hard time getting a lot of early output. That may be why you get a fairly big delay before even the first commit: > $ time sh -c 'git log --pretty=oneline --abbrev-commit \ > --simplify-merges --parents \ > -- kernel/printk.c | head -n 1' > 5dfb66b... 1d9b9f6... c9272c4... Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd > > real 0m0.344s > user 0m0.324s > sys 0m0.020s >From your previous email: $ git rev-list --parents --full-history --topo-order HEAD -- kernel/printk.c 3.75user 0.47system 0:04.22elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k so that's less than 10% of the whole time, but it's still a _lot_ slower than the $ git rev-list --parents --full-history HEAD -- kernel/printk.c | head -n 200 0.16user 0.02system 0:00.18elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k and that was the first 200 commits, not just the first one. I bet you got the first one in about a tenth of that time - so I'm guessing 0.016s (also based on my own testing - it's below 0.01s here, but I'm willing to bet my machine is faster than yours is). So getting the first one with "--simplify-merges" was really a _lot_ slower. That said, I'm a huge beliver in the incremental approach - it just looks like this is potentially "just barely incremental" in practice. Of course, with a more linear history than the kernel, your approach probably works better. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3-wip] revision traversal: show full history with merge simplification 2008-07-31 22:26 ` Linus Torvalds @ 2008-07-31 22:36 ` Junio C Hamano 2008-08-01 3:00 ` Junio C Hamano 1 sibling, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2008-07-31 22:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Thu, 31 Jul 2008, Junio C Hamano wrote: >> >> The same query with 's/| head -n 1/>/dev/null' is more expensive. In fact >> it is much more expensive than the non-incremental one (v2), and about >> three times more expensive than non-limiting --full-history for explaining >> the history of kernel/printk.c. > > Hmm? Why is that, exactly? Does it walk over the same commit over and over > and over again or something? > > Can you combine --simplify-merges and --topo-order to get a fast version > again (since --topo-order will force a non-incrmental walk)? Heh, nice try to make my head explode ;-) Not today, no, really, no... ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3-wip] revision traversal: show full history with merge simplification 2008-07-31 22:26 ` Linus Torvalds 2008-07-31 22:36 ` Junio C Hamano @ 2008-08-01 3:00 ` Junio C Hamano 2008-08-01 3:48 ` Linus Torvalds 1 sibling, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2008-08-01 3:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Thu, 31 Jul 2008, Junio C Hamano wrote: >> >> The same query with 's/| head -n 1/>/dev/null' is more expensive. In fact >> it is much more expensive than the non-incremental one (v2), and about >> three times more expensive than non-limiting --full-history for explaining >> the history of kernel/printk.c. > > Hmm? Why is that, exactly? Does it walk over the same commit over and over > and over again or something? It was even worse than that. The output from v3 is incorrect, as the place the new call is hooked into knows only that the commit in question is not UNINTERESTING, but hasn't inspected its parents, but the simplification logic needs to dig into the parent chain deep enough, which it does not do correctly using the proper simplification logic (i.e. add_parents_to_list()). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3-wip] revision traversal: show full history with merge simplification 2008-08-01 3:00 ` Junio C Hamano @ 2008-08-01 3:48 ` Linus Torvalds 2008-08-01 7:50 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2008-08-01 3:48 UTC (permalink / raw) To: Junio C Hamano; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git On Thu, 31 Jul 2008, Junio C Hamano wrote: > > It was even worse than that. > > The output from v3 is incorrect Ok. I'm really not surprised. Incrementally is really hard. I'm reminded of all the problems we had with just the "trivial" issue of just knowing when to consider something uninteresting or not, that ended up depending on commit timestamps etc, and had problems with people having their clocks set incorrectly. Doing the ops once you have the full DAG is usually _trivial_ by comparison. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3-wip] revision traversal: show full history with merge simplification 2008-08-01 3:48 ` Linus Torvalds @ 2008-08-01 7:50 ` Junio C Hamano 0 siblings, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2008-08-01 7:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: Roman Zippel, Martin Langhoff, Tim Harper, git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Thu, 31 Jul 2008, Junio C Hamano wrote: >> >> It was even worse than that. >> >> The output from v3 is incorrect > > Ok. I'm really not surprised. Incrementally is really hard. I'm reminded > of all the problems we had with just the "trivial" issue of just knowing > when to consider something uninteresting or not, that ended up depending > on commit timestamps etc, and had problems with people having their clocks > set incorrectly. > > Doing the ops once you have the full DAG is usually _trivial_ by > comparison. Surely. I wasn't productive tonight anyway, and I'll give up for now and keep the post-processing version in 'pu', perhaps queued in 'next' during the 1.6.0-rc period. Perhaps somebody cleverer than me will feel itchy enough to make an incremental version someday ;-) ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 0:16 ` Roman Zippel 2008-07-30 0:25 ` Martin Langhoff 2008-07-30 0:32 ` Linus Torvalds @ 2008-07-30 8:36 ` Jakub Narebski 2 siblings, 0 replies; 58+ messages in thread From: Jakub Narebski @ 2008-07-30 8:36 UTC (permalink / raw) To: Roman Zippel; +Cc: Martin Langhoff, Linus Torvalds, Tim Harper, git Roman Zippel <zippel@linux-m68k.org> writes: > I don't quite understand what you're trying to say. > To avoid further confusion it maybe helps to specify a few of the terms: > > - full history graph: produced by "git-log --full-history --parents" > - compact history graph: the full history graph without without any > repeated merges, this is what my example script produces. > - full simplified history: output of "git-log --full-history" > - short simplified history: standard output of "git-log" [...] > Keep in mind that e.g. git-web is using the full simplified history, so > what I'm offering also has the potential to improve git-web performance... The fact that gitweb is using --full-history for a 'history' view is a historical reason, backwards compatibility with the view that was shown before gitweb used "git rev-list [flags] -- <path>", see commit cdd4037d gitweb: optimize per-file history generation The rev-list command that is recent enough can filter commits based on paths they touch, so use it instead of generating the full list and limiting it by passing it with diff-tree --stdin. [jc: The patch originally came from Luben Tuikov but the it was corrupt, but it was short enough to be applied by hand. I added the --full-history to make the output compatible with the original while doing so.] Signed-off-by: Junio C Hamano <junkio@cox.net> Removing '--parents' was put later, to remove unnecessary merges from a view (there was long discussion on git mailing list about --full-history with and without --parents), in 208b2dff gitweb: We do longer need the --parents flag in rev-list. We only want to know the direct parents of a given commit object, these parents are available in the --header output of rev-list. If --parents is supplied with --full-history the output includes merge commits that aren't relevant. Signed-off-by: Robert Fitzsimons <robfitz@273k.net> Signed-off-by: Junio C Hamano <junkio@cox.net> Besides gitweb currently does not generate graphical history view, so '--parents' are unnecessary. But if it was done from the scratch, gitweb should definitely use simplified history, instead of what you call "full simplified history", perhaps with an option to use '--full-history' (there is infractructure in gitweb for adding extra options). (Nitpick: it is 'gitweb', not 'git-web'.) -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 2:59 ` Roman Zippel 2008-07-29 3:15 ` Martin Langhoff @ 2008-07-29 3:29 ` Linus Torvalds 2008-07-29 3:33 ` Linus Torvalds 2008-07-29 11:39 ` Roman Zippel 2008-07-29 5:31 ` Jeff King 2 siblings, 2 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-29 3:29 UTC (permalink / raw) To: Roman Zippel; +Cc: Tim Harper, git On Tue, 29 Jul 2008, Roman Zippel wrote: > > Right now you're giving me the choice between a crappy incomplete history > or a crappy history full of useless information. That's it? As long as > your challenge involves being compared to crappy history, I'm not > interested. No, my challenges have not been about "crappy history" but about performance and about getting it right. The reason I pointed you to lib/vsprintf.c had nothing to do with crappiness, and had everythign to do with just picking a random example of something where you absolutely *HAVE* to simplify history. The fact that it simplifies to a linear one is totally immaterial. You continue to miss the point. Over and over AND OVER again! > Can we please get past this and look at what is required to produce the > correct history? I'm not even going to bother with this argument. You dismiss all my issues, and then you continue to talk about "correct", even though it isn't a correctness thing - it's a difference of opinion. Me, I *much* prefer the simplified history. That _is_ the correct one for me. And the sad part is, what you want is there. It's a command line switch away. You were told in the very first message what the switch was. If you don't want to use "--full-history", you can actually use "git whatchanged" instead of "git log", and it implies the switch without you even having to type it. So it's all there. Use it. Just don't bother adding me to the cc to your inane flames, because I'm fed up with the fact that you can't actually be bothered to read my emails, and just want to flame. And quite frankly, I've seen that behaviour from you before, when it comes to other things. So go away. Write the code. Come back with patches. In the meantime, we've told you what to do: use --full-history if you really want it. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 3:29 ` Linus Torvalds @ 2008-07-29 3:33 ` Linus Torvalds 2008-07-29 11:39 ` Roman Zippel 1 sibling, 0 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-29 3:33 UTC (permalink / raw) To: Roman Zippel; +Cc: Tim Harper, git On Mon, 28 Jul 2008, Linus Torvalds wrote: > > And quite frankly, I've seen that behaviour from you before, when it comes > to other things. So go away. Write the code. Come back with patches. In > the meantime, we've told you what to do: use --full-history if you really > want it. Btw, if you really do end up wanting to actually do something about it, I can already tell you that trying to do so in "git log" isn't going to be useful. Do it in "gitk" instead, and make gitk simplify the --full-history output. I tried to explain to you the why part earlier (go back and look for "incremental" and "topo-sort"), but it all seemed to fly right by you, and you started repeating your ranting instead. IOW, put up or shut up. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 3:29 ` Linus Torvalds 2008-07-29 3:33 ` Linus Torvalds @ 2008-07-29 11:39 ` Roman Zippel 2008-07-29 12:00 ` David Kastrup 2008-07-29 15:50 ` Linus Torvalds 1 sibling, 2 replies; 58+ messages in thread From: Roman Zippel @ 2008-07-29 11:39 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tim Harper, git Hi, On Mon, 28 Jul 2008, Linus Torvalds wrote: > You dismiss all my issues, and then you continue to talk about "correct", > even though it isn't a correctness thing - it's a difference of opinion. > Me, I *much* prefer the simplified history. That _is_ the correct one for > me. I'm not dismissing it, but your focus is on how to get this result. If the results were always the same, I wouldn't have a problem at all. That's why I'm trying to give you an example where the end result differs, how are we supposed to get to an agreement on _how_ to get the result, if we don't even agree on _what_ the result should be? > And quite frankly, I've seen that behaviour from you before, when it comes > to other things. What exact behaviour is that? That I dare to disagree with you? > So go away. Write the code. Come back with patches. If you knew me that well, you also knew that such threats don't work with me. In this case you know perfectly well, that I don't know the code as well as you, so without any help it would require a huge waste of time with the risk of rejection. bye, Roman ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 11:39 ` Roman Zippel @ 2008-07-29 12:00 ` David Kastrup 2008-07-29 15:50 ` Linus Torvalds 1 sibling, 0 replies; 58+ messages in thread From: David Kastrup @ 2008-07-29 12:00 UTC (permalink / raw) To: git Roman Zippel <zippel@linux-m68k.org> writes: > On Mon, 28 Jul 2008, Linus Torvalds wrote: > >> So go away. Write the code. Come back with patches. > > If you knew me that well, you also knew that such threats don't work > with me. I'd like to add to Linus' suggested course of action the task "look up threat in a dictionary". > In this case you know perfectly well, that I don't know the code as > well as you, so without any help it would require a huge waste of time > with the risk of rejection. If _you_ need the functionality, you can easily keep it around in _your_ copy. That's what I do with a few patches that were not accepted here. So the "risk of rejection" has no real cost for you (apart from an occasional rebase on origin, which is cheap). Just for others. And of no-one else argues your case, then they probably don't mind. -- David Kastrup ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 11:39 ` Roman Zippel 2008-07-29 12:00 ` David Kastrup @ 2008-07-29 15:50 ` Linus Torvalds 2008-07-30 1:14 ` Roman Zippel 1 sibling, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2008-07-29 15:50 UTC (permalink / raw) To: Roman Zippel; +Cc: Tim Harper, git On Tue, 29 Jul 2008, Roman Zippel wrote: > > I'm not dismissing it, but your focus is on how to get this result. No, you misunderstand. My focus is really on one single thing: - performance with a smaller focus on the fact that I simply don't see how it's _possible_ to do better than our current all-or-nothing approach of simplification (eg either extreme simplification or none at all: nothing or --full-history). So here's my challenge again, which you seem to have TOTALLY MISSED. Make this be fast: time sh -c "git log <filename> | head" nothing else matters. If you can make that one be fast, I'm happy. And that "| head" is really very fundamentally important. The important thing from a performance standpoint is not how long the _whole_ "log" takes. The important thing is how fast it _feels_, and that is directly tied to how fast it starts outputting the data. Put another way: I _know_ how to simplify things. Trust me, Roman. That's not the problem. But doing it incrementally is really really hard, to the point that I actually believe that it is impossible to do. And doing it after-the-fact is simply not interesting. We could trivially (well, _fairly_ trivially) do it when we do the topology sort. But I have long long tried to teach people _not_ to do the topo sort inside the core git machinery, exactly because it is a horrid thing from an interactivity standpoint. In fact, you can see what I'm talking about by trying --topo-order in the above timing test. Really. Just _try_ it. And if you still don't understand what I'm talking about, I don't know what to say. > > And quite frankly, I've seen that behaviour from you before, when it comes > > to other things. > > What exact behaviour is that? That I dare to disagree with you? No. The fact that you like arguing _pointlessly_, and just being abrasive, without actually helping or understanding the big picture. I'm thinking back on the whole scheduler thing. You weren't arguing with _me_, but you had the same modus operandi. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 15:50 ` Linus Torvalds @ 2008-07-30 1:14 ` Roman Zippel 2008-07-30 1:32 ` Kevin Ballard 2008-07-30 1:49 ` Linus Torvalds 0 siblings, 2 replies; 58+ messages in thread From: Roman Zippel @ 2008-07-30 1:14 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tim Harper, git Hi, On Tue, 29 Jul 2008, Linus Torvalds wrote: > On Tue, 29 Jul 2008, Roman Zippel wrote: > > > > I'm not dismissing it, but your focus is on how to get this result. > > No, you misunderstand. > > My focus is really on one single thing: > > - performance > > with a smaller focus on the fact that I simply don't see how it's > _possible_ to do better than our current all-or-nothing approach of > simplification (eg either extreme simplification or none at all: nothing > or --full-history). That's exactly what I'm not dismissing as you claim, but I've hit the problem where this approach simply produces crap, so I'm foremost interested in getting a useful result, only after that I'm interested in the performance (which I think is possible). > So here's my challenge again, which you seem to have TOTALLY MISSED. > > Make this be fast: > > time sh -c "git log <filename> | head" > > nothing else matters. If you can make that one be fast, I'm happy. I already explained it, but you simply dismissed it. It's possible, but it requires a bit of cached information (e.g. as part of the pack file, which is needed for decent performance anyway). > In fact, you can see what I'm talking about by trying --topo-order in the > above timing test. Please give me full example. gitk --topo-order kernel/printk.c shows no difference (e.g. it doesn't show 02630a12c7f72fa294981c8d86e38038781c25b7), several experiments with git-rev-list show no improvement either. > > > And quite frankly, I've seen that behaviour from you before, when it comes > > > to other things. > > > > What exact behaviour is that? That I dare to disagree with you? > > No. The fact that you like arguing _pointlessly_, and just being abrasive, > without actually helping or understanding the big picture. The problem is that your picture doesn't include my specific problem, I'm very interested in the big picture, but I'd like to be in it. > I'm thinking > back on the whole scheduler thing. You weren't arguing with _me_, but you > had the same modus operandi. Well, it seems I have talent for finding the special cases, e.g. last time I tested the scheduler it was performing twice as bad as the old scheduler on m68k. I've also seen cases where it sacrifices throughput for interactivity. Anyway, this is the wrong place for it anyway, the problem I'm hitting is these "good enough" solutions, which work in most situations, but fail in a few special situations, but nobody is interested to get these right unless your name is Linus. bye, Roman ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 1:14 ` Roman Zippel @ 2008-07-30 1:32 ` Kevin Ballard 2008-07-30 1:49 ` Linus Torvalds 1 sibling, 0 replies; 58+ messages in thread From: Kevin Ballard @ 2008-07-30 1:32 UTC (permalink / raw) To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git On Jul 29, 2008, at 6:14 PM, Roman Zippel wrote: >> So here's my challenge again, which you seem to have TOTALLY MISSED. >> >> Make this be fast: >> >> time sh -c "git log <filename> | head" >> >> nothing else matters. If you can make that one be fast, I'm happy. > > I already explained it, but you simply dismissed it. It's possible, > but it > requires a bit of cached information (e.g. as part of the pack file, > which > is needed for decent performance anyway). As an outside observer, this argument is basically akin to "it's easy to fly, you just need some faerie dust". Basically, you're dismissing the entire complexity of the problem by saying "oh, that's easy, just use some cached data" without any proof that this would work, or any sample code, or really any evidence at all. Given that the path simplification can be arbitrarily complex (I can pass any set of paths I want), I don't believe that you can just use "a bit of cached information" for this. If you did rely on cached information, said information would probably be orders of magnitude larger than the object graph itself (for repos with lots of files). >> In fact, you can see what I'm talking about by trying --topo-order >> in the >> above timing test. > > Please give me full example. > gitk --topo-order kernel/printk.c shows no difference (e.g. it doesn't > show 02630a12c7f72fa294981c8d86e38038781c25b7), several experiments > with > git-rev-list show no improvement either. He's not saying it changes what commits are shown, he's saying it has a performance impact - topo order has to post-process the graph. For a quick demonstration, run `time sh -c 'git log | head'` vs `time sh -c 'git log --topo-order | head'`. -Kevin Ballard -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 1:14 ` Roman Zippel 2008-07-30 1:32 ` Kevin Ballard @ 2008-07-30 1:49 ` Linus Torvalds 1 sibling, 0 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-30 1:49 UTC (permalink / raw) To: Roman Zippel; +Cc: Tim Harper, git On Wed, 30 Jul 2008, Roman Zippel wrote: > > > > time sh -c "git log <filename> | head" > > > > nothing else matters. If you can make that one be fast, I'm happy. > > I already explained it, but you simply dismissed it. It's possible, but it > requires a bit of cached information (e.g. as part of the pack file, which > is needed for decent performance anyway). Bzzt. Wrong. Try again. > > In fact, you can see what I'm talking about by trying --topo-order in the > > above timing test. > > Please give me full example. > gitk --topo-order kernel/printk.c shows no difference (e.g. it doesn't > show 02630a12c7f72fa294981c8d86e38038781c25b7), several experiments with > git-rev-list show no improvement either. Roman, what the f*ck is wrong with you? Let me repeat that thing one more time: you can see what I'm talking about by trying --topo-order in the above timing test. ^^^^^^^^^^^ The fact is, --topo-order is a post-processing thing, exactly the way your half-way simplification would be. It requires _all_ commits, and it requires them because we cannot guarantee that we output all children before the parents when there are multiple threads without a central clock (ie any distributed environment). So for --topo-order, we generate the whole history, and then we sort it. As a result, it has horrible interactivity behavior. Try it. Here's some random command lines, and the times: time git log --topo-order drivers/scsi/scsi_lib.c | head real 0m0.688s user 0m0.652s sys 0m0.036s and without: time git log drivers/scsi/scsi_lib.c | head real 0m0.033s user 0m0.024s sys 0m0.008s do you see the difference? They happen to output _exactly_ the same ten lines, but one of them takes the better part of a second (and that's on pretty much the fastest machine you can find right now - on a laptop with a slow disk and without things in cache, it would take many many seconds). The other one is instantaneous. Now, I realize that 0.033s vs 0.688s doesn't sound like a big deal, even though that's a 20x difference, but that 20x difference is a _really_ big deal when the machine is slower, or when "old history" isn't in the disk cache any more. For example, try doing the timings after flushing the disk caches to simulate cold-cache behavior. Do it with a slow disk. Or do it over NFS. Yes, even the "fast" case will actually be painfully slow (well, it is for me, people who are used to CVS probably think it's just "normal"). And yes, it will depend a lot on the file in question too. Obviously, if the first change is far back in history, it will be slow _regardless_, but I've at least personally found that in practice, you tend to look at logs of _recent_ things much much much more than you look at things that haven't changed lately. It will also depend a lot on whether you are packed or not. For example, if you are well packed, the pack-file IO locality is really really good, and the 20x slowdown is much less. I just tested with a laptop with a slow disk, and the --topo-order case was "only" 2.5x slower, almost certainly because the IO required to bring in the first part of the history ended up being a large portion of the total IO, and so the "whole history" case was not 20x slower, because there was not 20x more IO due to the good locality and the kernel doing readahead etc. But 2.5x slower is really bad, wouldn't you agree? We're not talking about a few percent here, we're talking about more than twice as long. It's very noticeable, especially when the end result was --topo-order: 29.8s, no topo-order 12.1s (Yeah, that wasn't a very realistic example, but on that same machine, once it's in the cache, it's 0.13s vs 1.6s: one is "instant", the other is very much a "wait for it" kind of thing.) THAT is the kind of performance difference you see. And trust me, it's a performance difference that you can really notice in real life. I'm not kidding you. Just try it: git log kernel/sched.c vs git log --topo-order kernel/sched.c and one is instant, the other one pauses before it starts showing something. One feels fast, the other feels slow. At the same time, if you actually time the _whole_ log, it's all exactly the same speed: [torvalds@nehalem linux]$ time git log --topo-order kernel/sched.c > /dev/null real 0m0.708s user 0m0.684s sys 0m0.020s [torvalds@nehalem linux]$ time git log kernel/sched.c > /dev/null real 0m0.703s user 0m0.672s sys 0m0.032s Notice? The cost of the topological sort itself is basically zero. But from an interactivity standpoint, it's _deadly_. And please note that here "--topo-sort" is just an example of a random "global history post-processing" thing. It's not that I want you to use the topological sort per se, it's just an example of the whole issue with _any_ post-factum operation. The topological sort is not expensive as a sort. What is expensive is that it needs to get the whole history to work. And also please notice that this is a huge scalability issue. "git log" should not become slower as a project gets more history. Sure, the full log will take longer to generate (because there's _more_ of it), but the top commits should always show up immediately. Again, if you have a filter (where "topological sort" is just an example of such a filter) that requires the full history to work, it simply _fundamentally_ cannot scale well. If very fundamentally will slow down with bigger history. > The problem is that your picture doesn't include my specific problem, I'm > very interested in the big picture, but I'd like to be in it. Roman, I've been trying to explain this "interactive" thing for _days_ now. That's the big picture. The whole "you have to be able to generate history incrementally" thing. First generating the whole global history, and then simplifying it, is simply not acceptable. It's too slow, and it doesn't scale. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 2:59 ` Roman Zippel 2008-07-29 3:15 ` Martin Langhoff 2008-07-29 3:29 ` Linus Torvalds @ 2008-07-29 5:31 ` Jeff King 2008-07-29 12:32 ` Roman Zippel 2 siblings, 1 reply; 58+ messages in thread From: Jeff King @ 2008-07-29 5:31 UTC (permalink / raw) To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git On Tue, Jul 29, 2008 at 04:59:01AM +0200, Roman Zippel wrote: > Right now you're giving me the choice between a crappy incomplete history > or a crappy history full of useless information. That's it? As long as > your challenge involves being compared to crappy history, I'm not > interested. If the solution should involve a switch "--correct-history" > or I have to wait for the result, I don't care, because it's the correct > history I want. As long as you're trying to sell me crappy history I'm not > buying it. > > Can we please get past this and look at what is required to produce the > correct history? You seem to be indicating here (and elsewhere in the thread) that there exists some history graph for which neither "git log" nor "git log --full-history" produces the output you want, but that there is some better output (even if it might take more time to compute). Perhaps I am just slow, but I haven't been able to figure out what that history is, or what the "correct" output should be. Can you try to state more clearly what it is you are looking for? -Peff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 5:31 ` Jeff King @ 2008-07-29 12:32 ` Roman Zippel 2008-07-29 12:48 ` Olivier Galibert 2008-07-29 12:52 ` Jeff King 0 siblings, 2 replies; 58+ messages in thread From: Roman Zippel @ 2008-07-29 12:32 UTC (permalink / raw) To: Jeff King; +Cc: Linus Torvalds, Tim Harper, git Hi, On Tue, 29 Jul 2008, Jeff King wrote: > > Can we please get past this and look at what is required to produce the > > correct history? > > You seem to be indicating here (and elsewhere in the thread) that there > exists some history graph for which neither "git log" nor "git log > --full-history" produces the output you want, but that there is some > better output (even if it might take more time to compute). > > Perhaps I am just slow, but I haven't been able to figure out what that > history is, or what the "correct" output should be. Can you try to state > more clearly what it is you are looking for? Most frequently this involves changes where the same change is merged twice. Another interesting example is kernel/printk.c where a change is added and later removed again before it's merged. With "git-log --full-history" you see these changes, but it lacks the necessary merges to produce a full graph. As consequence none of the graphical front ends produce a useful history graph. This problem now hits me now more seriously in a repository conversion, where it frequently happened, that changes were applied both locally and upstream, so that I have relatively many of these empty merges and the default git-log output is useless. --full-history is more of a workaround than a real solution and again the history graph in _all_ graphical front ends is useless. More generally this means in any kind of situation where you maintain your own repository and it takes a while until upstream accepts your changes, so that you frequently have duplicated changes (because upstream doesn't use git or doesn't pull directly), you have to be careful to get the right history out of git. The point is now that I think the problem is solvable even within Linus' constraints, so that git-log produces the right output by default and a workaround like --full-history isn't needed anymore. bye, Roman ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 12:32 ` Roman Zippel @ 2008-07-29 12:48 ` Olivier Galibert 2008-07-29 12:52 ` Jeff King 1 sibling, 0 replies; 58+ messages in thread From: Olivier Galibert @ 2008-07-29 12:48 UTC (permalink / raw) To: Roman Zippel; +Cc: Jeff King, Linus Torvalds, Tim Harper, git On Tue, Jul 29, 2008 at 02:32:14PM +0200, Roman Zippel wrote: > Most frequently this involves changes where the same change is merged > twice. [...] You're not answering the question. You cite cases when you consider both the default and the full-history output incorrect from a usefulness point of view. The question is, what would the correct output be, from your point of view? What should be shown, and what shouldn't? OG. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 12:32 ` Roman Zippel 2008-07-29 12:48 ` Olivier Galibert @ 2008-07-29 12:52 ` Jeff King 2008-07-29 17:25 ` Linus Torvalds 2008-07-30 2:48 ` Roman Zippel 1 sibling, 2 replies; 58+ messages in thread From: Jeff King @ 2008-07-29 12:52 UTC (permalink / raw) To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git On Tue, Jul 29, 2008 at 02:32:14PM +0200, Roman Zippel wrote: > > Perhaps I am just slow, but I haven't been able to figure out what that > > history is, or what the "correct" output should be. Can you try to state > > more clearly what it is you are looking for? > > Most frequently this involves changes where the same change is merged > twice. Another interesting example is kernel/printk.c where a change is > added and later removed again before it's merged. I glanced briefly over "gitk kernel/printk.c" and it looks pretty sane. I was really hoping for you to make your case as something like: 1. here is an ascii diagram of an actual history graph (or a recipe of git commands for making one) 2. here is what git-log (or gitk) produces for this history by default; and here is why it is not optimal (presumably some information it fails to convey) 3. here is what git-log (or gitk) with --full-history produces; and here is why it is not optimal (presumably because it is too messy) 4. here is what output I would like to see. Bonus points for "and here is an algorithm that accomplishes it." > The point is now that I think the problem is solvable even within Linus' > constraints, so that git-log produces the right output by default and a > workaround like --full-history isn't needed anymore. I think this is a separate issue. Even if you came up with some great new history simplification, it likely wouldn't become the _default_ right away anyway. So you need to: 1. produce a new simplification algorithm that is at least useful in _some_ contexts. Then this can be used when desired for those contexts. It almost doesn't matter how efficient it is, if it is providing results that are otherwise unavailable. A user can choose to take the performance hit to get those results. 2. If that algorithm doesn't provide worse output in any other contexts _and_ it has similar performance to the current default, then it can be considered for the default. But I haven't seen convincing evidence leading to step '1', so arguing about step '2' seems pointless. -Peff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 12:52 ` Jeff King @ 2008-07-29 17:25 ` Linus Torvalds 2008-07-30 1:50 ` Roman Zippel 2008-07-30 4:26 ` Jeff King 2008-07-30 2:48 ` Roman Zippel 1 sibling, 2 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-29 17:25 UTC (permalink / raw) To: Jeff King; +Cc: Roman Zippel, Tim Harper, git On Tue, 29 Jul 2008, Jeff King wrote: > > I glanced briefly over "gitk kernel/printk.c" and it looks pretty sane. Jeff, it _is_ sane. When Roman says it's "incorrect", he is just wrong. But it's true that "gitk kernel/printk.c" does simplify the history, and does so very aggressively. It does so very much by design, and has done it since pretty much day one. And it's a good thing - and it is "correct" - but it does mean that you may literally be missing things that are part of "history" but that didn't matter much. The most trivial way to show this is actually by making such a simple case that it's obvious what is going on. Do this: mkdir test-simplification cd test-simplification git init echo "Hi there" > file git add file git commit -m"Initial file" git branch other-branch echo "Hello" > file git add file git commit -m"Modified file" git checkout other-branch echo "Hello" > file git add file git commit -m"Another person modified the file identically" echo "This is a stupid example" > another-file git add another-file git commit -m"Add another file" git merge master Now, do these three things gitk gitk file gitk --full-history file and compare them. They all show _different_ histories. Which one is "correct"? They all are. It just depends on what you want to see. The "gitk file" history is the simplest one BY FAR, because it has very aggressively simplified history to the point where it tried to find the _simplest_ history that explains the current contents of 'file'[*] >From a practical standpoint, and from having used this a long time, I'd argue that the simple history is the one that you want 99.9% of all time. But not _always_. Sometimes, the things that got simplified away actually matter. It's rare, but it happens. For example, maybe you had a bug-fix that you _know_ you did, and it it doesn't show up in the simplified history. That really pisses you off, and it apparently really pisses Roman off that it can happen. But the fact is, that still doesn't mean that the simple history is "wrong" or even "incomplete". No, it's actually meaningful data in itself. If the bug-fix doesn't show in the simplified history, then that simply means that the bug-fix was not on a branch that could _possibly_ have mattered for the current contents. So once you are _aware_ of history simplification and are mentally able to accept it, the fact that history got simplified is actually just another tool. And that's why "-full-history" and "git whatchanged" exist. They are ways to start delving deeper - they shouldn't be the _default_ mode, but they are ways to show more information when the initial default simple mode turns out to show that something didn't even matter for the end result. And yes, there is a mid-way point between "aggressive simplification" (default) and "no simplification at all" (--full-history). It's more complex than either, and I do think it would be useful to have. It's what Roman wants, but as long as he thinks it's the _only_ correct answer, and refuses to face the performance issues, the discussion with Roman is kind of pointless. Linus [*] when I say "_simplest_ history", I do want to point out that the history simplification is always a "local optimization", and it doesn't try to check all possible paths: there can be other histories that are even simpler on a global scale. But in practice it is _one_ history of the file, and it's a history that is not "unnecessarily complicated" considering the simple heurstics for finding it. So think "local minima" instead of "global minima", and in practice the local one is pretty close to the global one, although there are obviously always extreme cases where the two can differ by a whole lot. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 17:25 ` Linus Torvalds @ 2008-07-30 1:50 ` Roman Zippel 2008-07-30 2:05 ` Linus Torvalds 2008-07-30 4:26 ` Jeff King 1 sibling, 1 reply; 58+ messages in thread From: Roman Zippel @ 2008-07-30 1:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jeff King, Tim Harper, git Hi, On Tue, 29 Jul 2008, Linus Torvalds wrote: > Now, do these three things > > gitk > gitk file > gitk --full-history file > > and compare them. They all show _different_ histories. > > Which one is "correct"? They all are. It just depends on what you want to > see. > > The "gitk file" history is the simplest one BY FAR, because it has very > aggressively simplified history to the point where it tried to find the > _simplest_ history that explains the current contents of 'file'[*] It's "aggressively simplified" by not even bothering to look for more. "simplified" implies there is something more complex beforehand, but all it does is simple scan through the history as fast possible without bothering looking left or right. "simplified" implies to me it's something intentional, but this is more of an accidental optimization which happens to work in most situations and in the special cases it just picks a random change and hopes for the best. "git-log --full-history file" at least produces the full change history, but it has an performance impact and it doesn't produce a complete graph usable for graphical front ends. > >From a practical standpoint, and from having used this a long time, I'd > argue that the simple history is the one that you want 99.9% of all time. > But not _always_. Sometimes, the things that got simplified away actually > matter. It's rare, but it happens. > > For example, maybe you had a bug-fix that you _know_ you did, and it it > doesn't show up in the simplified history. That really pisses you off, and > it apparently really pisses Roman off that it can happen. But the fact is, > that still doesn't mean that the simple history is "wrong" or even > "incomplete". I gave more general examples. Tracking upstream source can produce this problem frequently. Another example are stable/unstable branches where the stable branch is occasionally merged into the unstable branch can produce this problem. > No, it's actually meaningful data in itself. If the bug-fix doesn't show > in the simplified history, then that simply means that the bug-fix was not > on a branch that could _possibly_ have mattered for the current contents. > > So once you are _aware_ of history simplification and are mentally able to > accept it, the fact that history got simplified is actually just another > tool. This is your _subjective_ interpretion of this problem, because it's not a problem for you, nobody else can possibly have this problem (or they just crazy). Even if I know about this limitation it still doesn't solve the problem, that _none_ of the graphical interfaces can show me a useful history graph of these situations. bye, Roman ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 1:50 ` Roman Zippel @ 2008-07-30 2:05 ` Linus Torvalds 0 siblings, 0 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-30 2:05 UTC (permalink / raw) To: Roman Zippel; +Cc: Jeff King, Tim Harper, git On Wed, 30 Jul 2008, Roman Zippel wrote: > > > > The "gitk file" history is the simplest one BY FAR, because it has very > > aggressively simplified history to the point where it tried to find the > > _simplest_ history that explains the current contents of 'file'[*] > > It's "aggressively simplified" by not even bothering to look for more. Yes and no. It's aggressively simplified because that's the right output with the minimal unnecessary irrelevant information. It explains how the file came to a particular state, with the simplest possible self-consistent history. (Again, the caveat about "simplest possible" always beign a local minimization, not a global one). The fact that it also obviously involved less work (so git can do it faster, and with fewer disk and memory accesses) is a huge bonus, of course. Are you complaining about the fact that I'm smart, and I get the right result I want with less work and with a simpler algorithm? What's your point? > "simplified" implies there is something more complex beforehand, but all > it does is simple scan through the history as fast possible without > bothering looking left or right. You're just being stupid. It's not that it's not "bothering" looking left or right. It very much *does* bother to look left or right. But once it finds that one or the other explains the situation entirely, it then says "screw left, I already know that rigth gives me the information I want". In other words, it's doing the _smart_ thing. I don't understand why you complain about intelligence. It's *not* just looking at one single history. Look at gitk kernel/sched.c and notice that the simplified history is not linear. It tries to make it AS LINEAR AS POSSIBLE, BUT NO MORE. "Make everything as simple as possible, but not simpler." - Albert Einstein You seem to complain about the fact that it's doing that. That's stupid of you. > "simplified" implies to me it's something intentional, but this is more of > an accidental optimization which happens to work in most situations and in > the special cases it just picks a random change and hopes for the best. You're just crazy. There is nothing accidental there what-so-ever. > "git-log --full-history file" at least produces the full change history, > but it has an performance impact and it doesn't produce a complete graph > usable for graphical front ends. Umm. You have to add "--parents" if you want a full graph. Without that, you can never re-generate the graph anyway. And when you do that, it _does_ give all the commits needed to complete the picture. In other words, git (once again) is actually smarter than you, and does the right thing, and (once again) you complain about something that you just don't understand. > I gave more general examples. Tracking upstream source can produce this > problem frequently. Another example are stable/unstable branches where the > stable branch is occasionally merged into the unstable branch can produce > this problem. You call it a "problem", but you don't actually give any reason for calling it that. IT IS NOT A PROBLEM. It's very much by design, and it's because what you want. Use --full-history if you want the full history. > This is your _subjective_ interpretion of this problem, because it's not a > problem for you, nobody else can possibly have this problem (or they just > crazy). No, Roman. You're not crazy because you have some issue that I cannot understand. You're crazy because you make the same mistake over and over, and don't listen when people tell you what the mistake was. "Insanity is doing the same thing over and over again and expecting different results." - Various Please. People have told you where you go wrong. Many times. So why do you keep repeating it? Take the time to slow down, listen, and realize that you're on the wrong track, and that others really _have_ spent time and thought on this. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 17:25 ` Linus Torvalds 2008-07-30 1:50 ` Roman Zippel @ 2008-07-30 4:26 ` Jeff King 2008-07-30 4:52 ` Linus Torvalds 1 sibling, 1 reply; 58+ messages in thread From: Jeff King @ 2008-07-30 4:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: Roman Zippel, Tim Harper, git On Tue, Jul 29, 2008 at 10:25:35AM -0700, Linus Torvalds wrote: > On Tue, 29 Jul 2008, Jeff King wrote: > > > > I glanced briefly over "gitk kernel/printk.c" and it looks pretty sane. > > Jeff, it _is_ sane. When Roman says it's "incorrect", he is just wrong. I agree with you, btw. It is definitely correct and useful; however, I am curious if there is some "in between" level of simplification that might produce an alternate graph that has interesting features. And that is why I am trying to get Roman to lay out exactly what it is he wants. -Peff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 4:26 ` Jeff King @ 2008-07-30 4:52 ` Linus Torvalds 0 siblings, 0 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-30 4:52 UTC (permalink / raw) To: Jeff King; +Cc: Roman Zippel, Tim Harper, git On Wed, 30 Jul 2008, Jeff King wrote: > > I agree with you, btw. It is definitely correct and useful; however, I > am curious if there is some "in between" level of simplification that > might produce an alternate graph that has interesting features. And that > is why I am trying to get Roman to lay out exactly what it is he wants. Actually, I know what he wants, since I tried to describe it for the filter-branch discussion. It's really not that conceptually complex. Basically, the stupid model is to just do this: - start with --full-history - for each merge, look at both parents. If one parent leads directly to a commit that can be reached from the the other, just remove that parent as being redundant. And if that removal leads to a merge now becoming a non-merge, and it has no changes wrt its single remaining parent, remove the commit entirely (rewriting any parenthood to make the rest all stay together, of course) - repeat until you cannot do any more simplification (removing one commit can actually cause its children to now become targets for this simplification). and I suspect that (a) the stupid model is probably at least O(n^3) if done stupidly and O(n^2) with some modest amount of smarts (keeping a list of at least potential targets of simplification and expanding it only when actually simplifying), but that (b) you can concentrate on just the merges that the current optimizing algorithm would have removed, so 'n' is not the total number of commits, but at most the number of merges, and more likely actually just the number of trivial merges in that file, and finally (c) there is likely some smart and efficient graph minimization algorithm that is O(nlogn) or something. so I don't think it's likely to be hugely more expensive than the topo-sort is. All the real expense is in the same thing the topo-sort expense, namely in generating the list up-front. I bet googling for "minimal directed acyclic graph" will give pointers. And despite the fact that I've argued against Roman's world-view, I actually _do_ think it would be nice to have that third mode, the same way that we have --topo-order. It wouldn't be good for the _default_ view, but then neither is --full-history, so that's not a big argument. That said, I'd like to (again) repeat the caveat that it's probably best done in the tool that actally visualizes the mess - exactly for the same reason that I argued for the topological sort being done in gitk. It's very painful to have to wait for the first few commits to start appearing in the history window. Admittedly most of my work is actually done on machines that are pretty fast, but every once in a while I travel with a laptop. And more importantly, not everybody gets new hardware from Intel for testing even before the CPU has been released. So others will still appreciate incremental history updates, even if my machine might be fast enough (and my kernel tree always in the caches) that I myself could live with a synchronous version a-la --topo-order. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-29 12:52 ` Jeff King 2008-07-29 17:25 ` Linus Torvalds @ 2008-07-30 2:48 ` Roman Zippel 2008-07-30 3:20 ` Kevin Ballard ` (2 more replies) 1 sibling, 3 replies; 58+ messages in thread From: Roman Zippel @ 2008-07-30 2:48 UTC (permalink / raw) To: Jeff King; +Cc: Linus Torvalds, Tim Harper, git Hi, On Tue, 29 Jul 2008, Jeff King wrote: > > > Perhaps I am just slow, but I haven't been able to figure out what that > > > history is, or what the "correct" output should be. Can you try to state > > > more clearly what it is you are looking for? > > > > Most frequently this involves changes where the same change is merged > > twice. Another interesting example is kernel/printk.c where a change is > > added and later removed again before it's merged. > > I glanced briefly over "gitk kernel/printk.c" and it looks pretty sane. > I was really hoping for you to make your case as something like: > > 1. here is an ascii diagram of an actual history graph (or a recipe of > git commands for making one) > 2. here is what git-log (or gitk) produces for this history by > default; and here is why it is not optimal (presumably some > information it fails to convey) > 3. here is what git-log (or gitk) with --full-history produces; and > here is why it is not optimal (presumably because it is too messy) > 4. here is what output I would like to see. Bonus points for "and here > is an algorithm that accomplishes it." For printk.c look for commit 02630a12c7f72fa294981c8d86e38038781c25b7 and try to find it in the graphical outputs. Here is a bit better example than Linus gave: mkdir test cd test git init echo 1 > file1 echo a > file2 git add file1 file2 git commit -m "initial commit" git tag base git branch test1 base git checkout test1 echo 2 > file1 git commit -a -m "duplicate change 1" git branch test2 base git checkout test2 echo 2 > file1 git commit -a -m "duplicate change 2" git branch test3 base git checkout test3 echo b > file2 git commit -a -m "some other change" git checkout base git merge test1 git merge test2 git merge test3 Now compare the output of "git-log file1", "git-log --full-history file1" and "git-log --full-history --parents file1". You get either both merge commits or none, but only one of it is relevant to file1. The problem is that in practice "git-log --full-history --parents" produces way too much information to be useful right away. bye, Roman ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 2:48 ` Roman Zippel @ 2008-07-30 3:20 ` Kevin Ballard 2008-07-30 3:21 ` Linus Torvalds 2008-07-30 4:23 ` Jeff King 2 siblings, 0 replies; 58+ messages in thread From: Kevin Ballard @ 2008-07-30 3:20 UTC (permalink / raw) To: Roman Zippel; +Cc: Jeff King, Linus Torvalds, Tim Harper, git On Jul 29, 2008, at 7:48 PM, Roman Zippel wrote: > For printk.c look for commit > 02630a12c7f72fa294981c8d86e38038781c25b7 and > try to find it in the graphical outputs. > Here is a bit better example than Linus gave: > > [snip] > > Now compare the output of "git-log file1", "git-log --full-history > file1" > and "git-log --full-history --parents file1". You get either both > merge > commits or none, but only one of it is relevant to file1. > > The problem is that in practice "git-log --full-history --parents" > produces way too much information to be useful right away. Output looks correct to me. And of course --full-history --parents gives lots of output - that's what it's for. You seem to believe that the appropriate output is, what, to display the initial commit, both commits that modified file1, and the first merge, yes? Can you please clarify the logic that states that the first merge commit should be shown but the second should not? -Kevin Ballard -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 2:48 ` Roman Zippel 2008-07-30 3:20 ` Kevin Ballard @ 2008-07-30 3:21 ` Linus Torvalds 2008-07-30 3:35 ` Linus Torvalds 2008-07-30 4:23 ` Jeff King 2 siblings, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2008-07-30 3:21 UTC (permalink / raw) To: Roman Zippel; +Cc: Jeff King, Tim Harper, git On Wed, 30 Jul 2008, Roman Zippel wrote: > > For printk.c look for commit 02630a12c7f72fa294981c8d86e38038781c25b7 and > try to find it in the graphical outputs. Umm. Why would you? Yes, it's there, if you ask for --full-history. And no, I don't think --full-history is actually useful to humans - it's very much there as a "here's all the data" thing where you could have the tools post-process it, where often "post-processing" is actually just searching for it. And no, it's not there if you don't use --full-history. But now, instead of _complaining_ about this, I would suggest you think about why it's a _good_ thing, and why it's so useful? In other words, you're arriving at all your complaints from the wrong angle entirely, and because you have convinced yourself that things have to work a certain way, and then you're upset when they don't. But you should _unconvince_ yourself - and look at whether maybe all your initial preconceptions were perhaps totally wrong? Because they were. The reason that commit 02630a12c7f72fa294981c8d86e38038781c25b7 doesn't show up in the normal log when looking at kernel/printk.c is that it really doesn't exist as a _relevant_ part of history for the current state of that file. It exists only as a a side-branch for the GFS2 quota code that first adds a line +EXPORT_SYMBOL_GPL(tty_write_message); (in commit b346671fa196a), and then removes the line not long after (in that commit 02630a12c7f). And both of them go away (along with the whole side-branch), because they didn't end up mattering for the end result: they only ever existed in that side branch, and by the time it was merged back into the main branch, all changes had been undone. In other words, that change - in a VERY REAL WAY - never actually mattered for the current state of kernel/printk.c. And the history simplification sees that, and avoids showing the whole pointless branch. This is such an obviously _good_ thing that I really am surprised ay how you can continue to argue against it. Especially as the examples you give "for" your argument are so wonderful examples _against_ it. And yes, you can actually force gitk to show the state of that commit and thus force it to acknowledge that that state was relevant (although you won't necessarily force it to acknowledge that the relevance ties together with the final end result). You do that by just telling it that you're not just interested in HEAD, but in that commit too. So I would literally suggest that anybody interested in this subject really just do gitk kernel/printk.c & gitk HEAD 02630a12c7f72fa294981c8d86e38038781c25b7 kernel/printk.c & in the kernel, and now compare the two side-by-side. Notice where they differ (hint: look for the commit a0f1ccfd8d37457a6d8a9e01acebeefcdfcc306e - "[PATCH] lockdep: do not recurse in printk" - which is in both, and look below it). Now, which graph is the more relevant and understandable one from the standpoint of what the current state of kernel/printk.c is? Honestly now, Roman. Because if you were actually willing to see this as a _feature_ (which it very much is), you'd admit that it's a damn clever and useful one. But I suspect you have dug yourself so deep into a hole that you can't admit that even to yourself any more. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 3:21 ` Linus Torvalds @ 2008-07-30 3:35 ` Linus Torvalds 0 siblings, 0 replies; 58+ messages in thread From: Linus Torvalds @ 2008-07-30 3:35 UTC (permalink / raw) To: Roman Zippel; +Cc: Jeff King, Tim Harper, git On Tue, 29 Jul 2008, Linus Torvalds wrote: > > In other words, that change - in a VERY REAL WAY - never actually mattered > for the current state of kernel/printk.c. And the history simplification > sees that, and avoids showing the whole pointless branch. Btw, Roman, this is a really really important thing for you to realize. You need to realize that your "perfect" output really REALLY is totally inferior, if what you are actually interested in is "how did things get to be the way they are". It's a _feature_. It's not a bug. And it's a really good one. If side branches didn't matter for the contents of the file, those side branches simply don't matter, and showing them is just a distraction. Yes, you can ask for the history that doesn't matter for the end result. And yes, I acknowledge freely that it would be good to then have a separate cleanup phase to make that thing more readable. In fact, in the very first reply to you I pointed you to a thread where I said exactly that, long before this thread even started. But no, the current default isn't broken. No, it's not "lazy" either. No, it was not an "accident". And no, it's not "incorrect". And until you can see that (along with all the reasons I've outlined why your "fixed" approach is a total piece of sh*t from a performance angle), you're just being stupid. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-30 2:48 ` Roman Zippel 2008-07-30 3:20 ` Kevin Ballard 2008-07-30 3:21 ` Linus Torvalds @ 2008-07-30 4:23 ` Jeff King 2 siblings, 0 replies; 58+ messages in thread From: Jeff King @ 2008-07-30 4:23 UTC (permalink / raw) To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git On Wed, Jul 30, 2008 at 04:48:54AM +0200, Roman Zippel wrote: > Now compare the output of "git-log file1", "git-log --full-history file1" > and "git-log --full-history --parents file1". You get either both merge > commits or none, but only one of it is relevant to file1. Ah, I see. So if I understand you, you wanted to see something like: A--B \ \ C--D where A = initial commit B = duplicate change 1 C = duplicate change 2 D = merge branch 'test2' into HEAD where the simplification isn't as aggressive (you still see the duplicate commits and the merge), but we can get rid of the later merge between A and D because A is already an ancestor of D. So do you have a proposed set of simplification rules that will produce that output? -Peff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-27 23:14 ` Roman Zippel 2008-07-27 23:18 ` Linus Torvalds @ 2008-07-27 23:25 ` Martin Langhoff 2008-07-28 1:29 ` Roman Zippel 1 sibling, 1 reply; 58+ messages in thread From: Martin Langhoff @ 2008-07-27 23:25 UTC (permalink / raw) To: Roman Zippel; +Cc: Linus Torvalds, Tim Harper, git On Mon, Jul 28, 2008 at 11:14 AM, Roman Zippel <zippel@linux-m68k.org> wrote: > Why are you dismissing what I wrote without even giving it a second > thought? I didn't bother with the initial example, because it's so > simple, that it's no real challenge. I can't speak for anyone else, but you do have to keep in mind that a solution to this has to be rather fast - and I mean fast in git terms, not in scripting-language-fast terms. That you can do it Ruby - and I may be able to do it Perl - has little bearing on what can be done inside the git log machinery with a small performance penalty. cheers, m -- martin.langhoff@gmail.com martin@laptop.org -- School Server Architect - ask interesting questions - don't get distracted with shiny stuff - working code first - http://wiki.laptop.org/go/User:Martinlanghoff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-27 23:25 ` Martin Langhoff @ 2008-07-28 1:29 ` Roman Zippel 0 siblings, 0 replies; 58+ messages in thread From: Roman Zippel @ 2008-07-28 1:29 UTC (permalink / raw) To: Martin Langhoff; +Cc: Linus Torvalds, Tim Harper, git Hi, On Mon, 28 Jul 2008, Martin Langhoff wrote: > On Mon, Jul 28, 2008 at 11:14 AM, Roman Zippel <zippel@linux-m68k.org> wrote: > > Why are you dismissing what I wrote without even giving it a second > > thought? I didn't bother with the initial example, because it's so > > simple, that it's no real challenge. > > I can't speak for anyone else, but you do have to keep in mind that a > solution to this has to be rather fast - and I mean fast in git terms, > not in scripting-language-fast terms. You also have to keep in mind, that I haven't really hacked git before, so I'm just trying to do something with the data I can somehow extract from it. I seriously didn't thought that anyone wouldn't understand that the code example was just a proof of concept. > That you can do it Ruby - and I may be able to do it Perl - has little > bearing on what can be done inside the git log machinery with a small > performance penalty. It also has to do with correctness, is performance more important than correctness? Part of the problem is, what is the correct history, as which it should be displayed via the various interfaces by default. bye, Roman ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Bizarre missing changes (git bug?) 2008-07-21 20:26 Bizarre missing changes (git bug?) Tim Harper 2008-07-21 20:37 ` Linus Torvalds @ 2008-07-21 20:42 ` Alex Riesen 1 sibling, 0 replies; 58+ messages in thread From: Alex Riesen @ 2008-07-21 20:42 UTC (permalink / raw) To: Tim Harper; +Cc: git Tim Harper, Mon, Jul 21, 2008 22:26:06 +0200: > I ran into a strange issue that has left me scratching my head. > > I have a commit in my history, that does indeed show up in my branch, > named "sprint" > ... > > Any help or clues VERY much apperciated. Thanks! > Try looking at the history graph in gitk $ gitk --all -- app/controllers/events_controller.rb spec/fixtures/factors.yml It usually shows the history in a very understandable way and it can help to detect when the histories have diverged. ^ permalink raw reply [flat|nested] 58+ messages in thread
end of thread, other threads:[~2008-08-01 7:51 UTC | newest] Thread overview: 58+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-07-21 20:26 Bizarre missing changes (git bug?) Tim Harper 2008-07-21 20:37 ` Linus Torvalds 2008-07-21 22:53 ` Tim Harper 2008-07-21 22:55 ` Tim Harper [not found] ` <8C23FB54-A28E-4294-ABEA-A5766200768B@gmail.com> 2008-07-21 22:57 ` Linus Torvalds 2008-07-26 3:12 ` Roman Zippel 2008-07-26 19:58 ` Linus Torvalds 2008-07-27 17:50 ` Roman Zippel 2008-07-27 18:47 ` Linus Torvalds 2008-07-27 23:14 ` Roman Zippel 2008-07-27 23:18 ` Linus Torvalds 2008-07-28 0:00 ` Roman Zippel 2008-07-28 5:00 ` Linus Torvalds 2008-07-28 5:30 ` Linus Torvalds 2008-07-29 2:59 ` Roman Zippel 2008-07-29 3:15 ` Martin Langhoff 2008-07-30 0:16 ` Roman Zippel 2008-07-30 0:25 ` Martin Langhoff 2008-07-30 0:32 ` Linus Torvalds 2008-07-30 0:48 ` Linus Torvalds 2008-07-30 23:56 ` Junio C Hamano 2008-07-31 0:15 ` Junio C Hamano 2008-07-31 0:30 ` Linus Torvalds 2008-07-31 8:17 ` [PATCH v2] revision traversal: show full history with merge simplification Junio C Hamano 2008-07-31 8:18 ` Junio C Hamano 2008-07-31 22:30 ` Linus Torvalds 2008-07-31 22:09 ` [PATCH v3-wip] " Junio C Hamano 2008-07-31 22:26 ` Linus Torvalds 2008-07-31 22:36 ` Junio C Hamano 2008-08-01 3:00 ` Junio C Hamano 2008-08-01 3:48 ` Linus Torvalds 2008-08-01 7:50 ` Junio C Hamano 2008-07-30 8:36 ` Bizarre missing changes (git bug?) Jakub Narebski 2008-07-29 3:29 ` Linus Torvalds 2008-07-29 3:33 ` Linus Torvalds 2008-07-29 11:39 ` Roman Zippel 2008-07-29 12:00 ` David Kastrup 2008-07-29 15:50 ` Linus Torvalds 2008-07-30 1:14 ` Roman Zippel 2008-07-30 1:32 ` Kevin Ballard 2008-07-30 1:49 ` Linus Torvalds 2008-07-29 5:31 ` Jeff King 2008-07-29 12:32 ` Roman Zippel 2008-07-29 12:48 ` Olivier Galibert 2008-07-29 12:52 ` Jeff King 2008-07-29 17:25 ` Linus Torvalds 2008-07-30 1:50 ` Roman Zippel 2008-07-30 2:05 ` Linus Torvalds 2008-07-30 4:26 ` Jeff King 2008-07-30 4:52 ` Linus Torvalds 2008-07-30 2:48 ` Roman Zippel 2008-07-30 3:20 ` Kevin Ballard 2008-07-30 3:21 ` Linus Torvalds 2008-07-30 3:35 ` Linus Torvalds 2008-07-30 4:23 ` Jeff King 2008-07-27 23:25 ` Martin Langhoff 2008-07-28 1:29 ` Roman Zippel 2008-07-21 20:42 ` Alex Riesen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).