* [PATCH] Avoid errors from git-rev-parse in gitweb blame @ 2008-06-03 10:46 Rafael Garcia-Suarez 2008-06-03 11:42 ` Lea Wiemann 2008-06-03 11:43 ` Jakub Narebski 0 siblings, 2 replies; 40+ messages in thread From: Rafael Garcia-Suarez @ 2008-06-03 10:46 UTC (permalink / raw) To: git; +Cc: Rafael Garcia-Suarez git-rev-parse will abort with an error when passed a non-existent revision spec, such as "deadbeef^" where deadbeef has no parent. Using the --revs-only parameter makes this error go away, while retaining functionality, keeping the web server error log nice and clean. Signed-off-by: Rafael Garcia-Suarez <rgarciasuarez@gmail.com> --- gitweb/gitweb.perl | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 55fb100..f3b4b24 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -4226,9 +4226,9 @@ HTML esc_html($rev)); print "</td>\n"; } - open (my $dd, "-|", git_cmd(), "rev-parse", "$full_rev^") + open (my $dd, "-|", git_cmd(), "rev-parse", '--revs-only', "$full_rev^") or die_error(undef, "Open git-rev-parse failed"); - my $parent_commit = <$dd>; + my $parent_commit = <$dd> || ''; close $dd; chomp($parent_commit); my $blamed = href(action => 'blame', -- 1.5.6.rc1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 10:46 [PATCH] Avoid errors from git-rev-parse in gitweb blame Rafael Garcia-Suarez @ 2008-06-03 11:42 ` Lea Wiemann 2008-06-03 11:43 ` Jakub Narebski 1 sibling, 0 replies; 40+ messages in thread From: Lea Wiemann @ 2008-06-03 11:42 UTC (permalink / raw) To: Rafael Garcia-Suarez; +Cc: git Rafael Garcia-Suarez wrote: > git-rev-parse will abort with an error when passed a non-existent > revision spec, [...] > > - open (my $dd, "-|", git_cmd(), "rev-parse", "$full_rev^") > + open (my $dd, "-|", git_cmd(), "rev-parse", '--revs-only', "$full_rev^") This is no formal objection, but it would be nice if you could at the same time add a comment to the code that explains this -- like "do not fail [or 'barf on stderr'] if there is no parent revision". Makes it easier to change it later, since "--revs-only" is not particularly obvious. :) -- Lea ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 10:46 [PATCH] Avoid errors from git-rev-parse in gitweb blame Rafael Garcia-Suarez 2008-06-03 11:42 ` Lea Wiemann @ 2008-06-03 11:43 ` Jakub Narebski 2008-06-03 12:03 ` Rafael Garcia-Suarez 2008-06-03 20:18 ` Luben Tuikov 1 sibling, 2 replies; 40+ messages in thread From: Jakub Narebski @ 2008-06-03 11:43 UTC (permalink / raw) To: Rafael Garcia-Suarez; +Cc: git, Luben Tuikov Cc-ed Luben Tuikov, author of this part. Rafael Garcia-Suarez <rgarciasuarez@gmail.com> writes: > git-rev-parse will abort with an error when passed a non-existent > revision spec, such as "deadbeef^" where deadbeef has no parent. > Using the --revs-only parameter makes this error go away, while > retaining functionality, keeping the web server error log nice > and clean. Thanks. This error wasn't detected earlier probably because 'blame' view is rarely enabled; and repo.or.cz gitweb which has 'blame' enabled IIRC use gitweb which is modified there, allowing incremental blame using AJAX. > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 55fb100..f3b4b24 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -4226,9 +4226,9 @@ git_blame2 > esc_html($rev)); > print "</td>\n"; > } > - open (my $dd, "-|", git_cmd(), "rev-parse", "$full_rev^") > + open (my $dd, "-|", git_cmd(), "rev-parse", '--revs-only', "$full_rev^") > or die_error(undef, "Open git-rev-parse failed"); > - my $parent_commit = <$dd>; > + my $parent_commit = <$dd> || ''; > close $dd; > chomp($parent_commit); > my $blamed = href(action => 'blame', I'd rather remove this, correct it, or make it optional (this is very fork-heavy). But this patch is good as it is now... -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 11:43 ` Jakub Narebski @ 2008-06-03 12:03 ` Rafael Garcia-Suarez 2008-06-03 12:45 ` Jakub Narebski 2008-06-03 20:18 ` Luben Tuikov 1 sibling, 1 reply; 40+ messages in thread From: Rafael Garcia-Suarez @ 2008-06-03 12:03 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Luben Tuikov 2008/6/3 Jakub Narebski <jnareb@gmail.com>: >> - open (my $dd, "-|", git_cmd(), "rev-parse", "$full_rev^") >> + open (my $dd, "-|", git_cmd(), "rev-parse", '--revs-only', "$full_rev^") >> or die_error(undef, "Open git-rev-parse failed"); >> - my $parent_commit = <$dd>; >> + my $parent_commit = <$dd> || ''; >> close $dd; >> chomp($parent_commit); >> my $blamed = href(action => 'blame', > > I'd rather remove this, correct it, or make it optional (this is very > fork-heavy). Not sure how to do the same thing in pure perl. We could however cache the results of git-rev-parse, since the same rev is likely to appear many times in the list. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 12:03 ` Rafael Garcia-Suarez @ 2008-06-03 12:45 ` Jakub Narebski 2008-06-03 13:00 ` Rafael Garcia-Suarez 2008-06-03 14:24 ` Lea Wiemann 0 siblings, 2 replies; 40+ messages in thread From: Jakub Narebski @ 2008-06-03 12:45 UTC (permalink / raw) To: Rafael Garcia-Suarez; +Cc: git, Luben Tuikov On Tue, 3 June 2008, Rafael Garcia-Suarez wrote: > 2008/6/3 Jakub Narebski <jnareb@gmail.com>: >> Rafael Garcia-Suarez wrote: >>> >>> - open (my $dd, "-|", git_cmd(), "rev-parse", "$full_rev^") >>> + open (my $dd, "-|", git_cmd(), "rev-parse", '--revs-only', "$full_rev^") >>> or die_error(undef, "Open git-rev-parse failed"); >>> - my $parent_commit = <$dd>; >>> + my $parent_commit = <$dd> || ''; >>> close $dd; >>> chomp($parent_commit); >>> my $blamed = href(action => 'blame', >> >> I'd rather remove this, correct it, or make it optional (this is very >> fork-heavy). > > Not sure how to do the same thing in pure Perl. I was thinking about extending git-blame porcelain format (and also incremental format, of course) by 'parents' (and perhaps 'original-parents') header... > We could however cache the results of git-rev-parse, since the same > rev is likely to appear many times in the list. ...but starting with cache of git-rev-parse results, or optionally allowing extended sha-1 syntax (including <hash>^) in hash* CGI parameters in gitweb would be a good idea. But as I wrote, I'm fine with the patch as it is now. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 12:45 ` Jakub Narebski @ 2008-06-03 13:00 ` Rafael Garcia-Suarez 2008-06-03 13:12 ` Jakub Narebski 2008-06-03 20:35 ` Luben Tuikov 2008-06-03 14:24 ` Lea Wiemann 1 sibling, 2 replies; 40+ messages in thread From: Rafael Garcia-Suarez @ 2008-06-03 13:00 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Luben Tuikov 2008/6/3 Jakub Narebski <jnareb@gmail.com>: >>> I'd rather remove this, correct it, or make it optional (this is very >>> fork-heavy). >> >> Not sure how to do the same thing in pure Perl. > > I was thinking about extending git-blame porcelain format (and also > incremental format, of course) by 'parents' (and perhaps > 'original-parents') header... OK, I see. That would be nice. Also: currently taking "$full_rev^" directs the user to the parent commit, but it would be more user-friendly to point at the previous commit where the selected file was modified instead. >> We could however cache the results of git-rev-parse, since the same >> rev is likely to appear many times in the list. > > ...but starting with cache of git-rev-parse results, or optionally > allowing extended sha-1 syntax (including <hash>^) in hash* CGI > parameters in gitweb would be a good idea. > > But as I wrote, I'm fine with the patch as it is now. I've sent a new version (take 2) with caching. And comments, as Lea suggested :) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 13:00 ` Rafael Garcia-Suarez @ 2008-06-03 13:12 ` Jakub Narebski 2008-06-03 13:36 ` Rafael Garcia-Suarez 2008-06-03 20:35 ` Luben Tuikov 1 sibling, 1 reply; 40+ messages in thread From: Jakub Narebski @ 2008-06-03 13:12 UTC (permalink / raw) To: Rafael Garcia-Suarez; +Cc: git, Luben Tuikov Dnia wtorek 3. czerwca 2008 15:00, Rafael Garcia-Suarez napisał: > 2008/6/3 Jakub Narebski <jnareb@gmail.com>: >> On Tue, 3 June 2008, Rafael Garcia-Suarez wrote: >> I was thinking about extending git-blame porcelain format (and also >> incremental format, of course) by 'parents' (and perhaps >> 'original-parents') header... > > OK, I see. That would be nice. Also: currently taking "$full_rev^" > directs the user to the parent commit, but it would be more > user-friendly to point at the previous commit where the selected file > was modified instead. That's what I meant by distinguishing between 'parents' and 'original-parents' (or 'rewritten-parents' and 'parents'): first are rewritten parents in history limited to specified file (with the addition of code movements and copying across files/filenames), second are original parents of a commit. For gitweb we would use the first set (I wonder what to do in the case of merge commit, i.e. more than one parent). >>> We could however cache the results of git-rev-parse, since the same >>> rev is likely to appear many times in the list. >> >> ...but starting with cache of git-rev-parse results, or optionally >> allowing extended sha-1 syntax (including <hash>^) in hash* CGI >> parameters in gitweb would be a good idea. >> >> But as I wrote, I'm fine with the patch as it is now. > > I've sent a new version (take 2) with caching. And comments, as Lea > suggested :) Nice. Thanks a lot. Ack, FWIW. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 13:12 ` Jakub Narebski @ 2008-06-03 13:36 ` Rafael Garcia-Suarez 2008-06-03 14:14 ` Jakub Narebski 0 siblings, 1 reply; 40+ messages in thread From: Rafael Garcia-Suarez @ 2008-06-03 13:36 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Luben Tuikov 2008/6/3 Jakub Narebski <jnareb@gmail.com>: >> >> OK, I see. That would be nice. Also: currently taking "$full_rev^" >> directs the user to the parent commit, but it would be more >> user-friendly to point at the previous commit where the selected file >> was modified instead. > > That's what I meant by distinguishing between 'parents' and > 'original-parents' (or 'rewritten-parents' and 'parents'): first are > rewritten parents in history limited to specified file (with the > addition of code movements and copying across files/filenames), > second are original parents of a commit. > > For gitweb we would use the first set (I wonder what to do in the case > of merge commit, i.e. more than one parent). Currently that takes the left parent. Or something. Shameless plug : the sources for perl 5 are currently being kept in a perforce repository. There is a rough web interface to it at http://public.activestate.com/cgi-bin/perlbrowse with excellent blame log navigation features (including navigation against p4 integrations). Since we're going to move the official perl 5 vcs to git (many many thanks to Sam Vilain for that, BTW), I'm more or less trying to duplicate this blame log navigation in gitweb. So it might result in a few patches here :) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 13:36 ` Rafael Garcia-Suarez @ 2008-06-03 14:14 ` Jakub Narebski 2008-06-03 14:40 ` Rafael Garcia-Suarez 2008-06-03 21:03 ` Luben Tuikov 0 siblings, 2 replies; 40+ messages in thread From: Jakub Narebski @ 2008-06-03 14:14 UTC (permalink / raw) To: Rafael Garcia-Suarez; +Cc: git, Luben Tuikov, Sam Vilain On Tue, 3 June 2008, Rafael Garcia-Suarez wrote: > 2008/6/3 Jakub Narebski <jnareb@gmail.com>: >>> >>> OK, I see. That would be nice. Also: currently taking "$full_rev^" >>> directs the user to the parent commit, but it would be more >>> user-friendly to point at the previous commit where the selected file >>> was modified instead. >> >> That's what I meant by distinguishing between 'parents' and >> 'original-parents' (or 'rewritten-parents' and 'parents'): first are >> rewritten parents in history limited to specified file (with the >> addition of code movements and copying across files/filenames), >> second are original parents of a commit. >> >> For gitweb we would use the first set (I wonder what to do in the case >> of merge commit, i.e. more than one parent). > > Currently that takes the left parent. Or something. > > Shameless plug : the sources for perl 5 are currently being kept in a > perforce repository. There is a rough web interface to it at > http://public.activestate.com/cgi-bin/perlbrowse with excellent blame > log navigation features (including navigation against p4 > integrations). By the way, what is the difference between '<<' links and 'br' link in the above mentioned annotate/blame interface? I'd like to say that I prefer gitweb's marking blame by blocks, not by lines, and extra info on mouseover. But having blame navigation capability of perforce web interface would be really nice (I think "git gui blame" has something like this; I don't know about other tools like qgit, giggle, or ugit). > Since we're going to move the official perl 5 vcs to git (many many > thanks to Sam Vilain for that, BTW), BTW. how in your opinion Git compares to Perforce, both as a tool itself, and also about quality of companion tools such like gitweb or git-gui? > I'm more or less trying to > duplicate this blame log navigation in gitweb. So it might result in a > few patches here :) I think it would be really nice. Will you want to use git-diff-tree to mark differences from the version we came from (marked by 'hp', 'hpb' and 'fp' URI parameters), or would you rather extend git-blame? -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 14:14 ` Jakub Narebski @ 2008-06-03 14:40 ` Rafael Garcia-Suarez 2008-06-03 14:56 ` Jakub Narebski 2008-06-03 21:03 ` Luben Tuikov 1 sibling, 1 reply; 40+ messages in thread From: Rafael Garcia-Suarez @ 2008-06-03 14:40 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Luben Tuikov, Sam Vilain 2008/6/3 Jakub Narebski <jnareb@gmail.com>: >> Shameless plug : the sources for perl 5 are currently being kept in a >> perforce repository. There is a rough web interface to it at >> http://public.activestate.com/cgi-bin/perlbrowse with excellent blame >> log navigation features (including navigation against p4 >> integrations). > > By the way, what is the difference between '<<' links and 'br' link > in the above mentioned annotate/blame interface? "br" navigates to another branch from which this file has been integrated (in p4 speak.) > I'd like to say that I prefer gitweb's marking blame by blocks, not by > lines, and extra info on mouseover. But having blame navigation > capability of perforce web interface would be really nice (I think > "git gui blame" has something like this; I don't know about other > tools like qgit, giggle, or ugit). > >> Since we're going to move the official perl 5 vcs to git (many many >> thanks to Sam Vilain for that, BTW), > > BTW. how in your opinion Git compares to Perforce, both as a tool > itself, and also about quality of companion tools such like gitweb > or git-gui? I'm not using companion tools much, but I'm really impatient to switch to git. (I'm often working offline and applying patches from mailboxes. That already makes two good reasons for switching:) > I think it would be really nice. Will you want to use git-diff-tree > to mark differences from the version we came from (marked by 'hp', > 'hpb' and 'fp' URI parameters), or would you rather extend git-blame? I don't know. I'll look at git-diff-tree. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 14:40 ` Rafael Garcia-Suarez @ 2008-06-03 14:56 ` Jakub Narebski 2008-06-03 15:07 ` Rafael Garcia-Suarez 2008-06-03 21:09 ` Luben Tuikov 0 siblings, 2 replies; 40+ messages in thread From: Jakub Narebski @ 2008-06-03 14:56 UTC (permalink / raw) To: Rafael Garcia-Suarez; +Cc: git, Luben Tuikov, Sam Vilain Rafael Garcia-Suarez wrote: > 2008/6/3 Jakub Narebski <jnareb@gmail.com>: >>> Shameless plug : the sources for perl 5 are currently being kept in a >>> perforce repository. There is a rough web interface to it at >>> http://public.activestate.com/cgi-bin/perlbrowse with excellent blame >>> log navigation features (including navigation against p4 >>> integrations). >> >> By the way, what is the difference between '<<' links and 'br' link >> in the above mentioned annotate/blame interface? > > "br" navigates to another branch from which this file has been > integrated (in p4 speak.) Does it mark merge commits then? Or perhaps branch points? What does "branch from which this file has been integrated" mean in git speak (in the terms of DAG of commits)? If the history of a file looks like this ....*---*---A---M---C... / ....*---B-/ and the line comes from "evil merge" M git-blame would return M as blamed commit. If the line comes from one or the other branch, from commit A or B, it makes I think no difference to git-blame; git tries to be "branch agnostic" (no special meaning to first parent; well, besides rev~n notation and --first-parent walk option). I guess it is not the case in Perforce? [...] >> [...]. Will you want to use git-diff-tree >> to mark differences from the version we came from (marked by 'hp', >> 'hpb' and 'fp' URI parameters), or would you rather extend git-blame? > > I don't know. I'll look at git-diff-tree. What I meant here, would you plan on extending git-blame, or would you use patchset (textual) diff between revision we are at, and revision we came from. git-diff-tree just compares two trees (and have to have patch output explicitely enabled). Sorry for the confusion. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 14:56 ` Jakub Narebski @ 2008-06-03 15:07 ` Rafael Garcia-Suarez 2008-06-03 17:50 ` Jakub Narebski 2008-06-03 21:09 ` Luben Tuikov 1 sibling, 1 reply; 40+ messages in thread From: Rafael Garcia-Suarez @ 2008-06-03 15:07 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Luben Tuikov, Sam Vilain 2008/6/3 Jakub Narebski <jnareb@gmail.com>: >>> By the way, what is the difference between '<<' links and 'br' link >>> in the above mentioned annotate/blame interface? >> >> "br" navigates to another branch from which this file has been >> integrated (in p4 speak.) > > Does it mark merge commits then? Or perhaps branch points? What > does "branch from which this file has been integrated" mean in git > speak (in the terms of DAG of commits)? > > > If the history of a file looks like this > > ....*---*---A---M---C... > / > ....*---B-/ > > and the line comes from "evil merge" M git-blame would return M as > blamed commit. If the line comes from one or the other branch, from > commit A or B, it makes I think no difference to git-blame; git tries > to be "branch agnostic" (no special meaning to first parent; well, > besides rev~n notation and --first-parent walk option). I guess it > is not the case in Perforce? No, in perforce the branch you integrate changes from is always explicit. > [...] >>> [...]. Will you want to use git-diff-tree >>> to mark differences from the version we came from (marked by 'hp', >>> 'hpb' and 'fp' URI parameters), or would you rather extend git-blame? >> >> I don't know. I'll look at git-diff-tree. > > What I meant here, would you plan on extending git-blame, or would you > use patchset (textual) diff between revision we are at, and revision we > came from. git-diff-tree just compares two trees (and have to have > patch output explicitely enabled). Sorry for the confusion. I'm under the impression that extending git-blame is a more flexible solution. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 15:07 ` Rafael Garcia-Suarez @ 2008-06-03 17:50 ` Jakub Narebski 0 siblings, 0 replies; 40+ messages in thread From: Jakub Narebski @ 2008-06-03 17:50 UTC (permalink / raw) To: Rafael Garcia-Suarez; +Cc: git, Luben Tuikov, Sam Vilain On Tue, 3 Jun 2008, Rafael Garcia-Suarez wrote: > 2008/6/3 Jakub Narebski <jnareb@gmail.com>: By the way, could you please try to not remove all but last quote attributions? It should be, I think, as simple as replying then removing unnecessary parts, instead of selecting parts you want reply to and then hitting reply. TIA >>>> By the way, what is the difference between '<<' links and 'br' link >>>> in the above mentioned annotate/blame interface? >>> >>> "br" navigates to another branch from which this file has been >>> integrated (in p4 speak.) >> >> Does it mark merge commits then? Or perhaps branch points? What >> does "branch from which this file has been integrated" mean in git >> speak (in the terms of DAG of commits)? >> >> >> If the history of a file looks like this >> >> ....*---*---A---M---C... >> / >> ....*---B-/ >> >> and the line comes from "evil merge" M git-blame would return M as >> blamed commit. If the line comes from one or the other branch, from >> commit A or B, it makes I think no difference to git-blame; git tries >> to be "branch agnostic" (no special meaning to first parent; well, >> besides rev~n notation and --first-parent walk option). I guess it >> is not the case in Perforce? > > No, in perforce the branch you integrate changes from is always > explicit. So, in git speak, it means that 'br' means that blamed commit (commit which brought current version of given line) is not in first-parent line, and '<<' means that commit is in --first-parent history of a file (taking into account code copying and movement... err, at least in git case...), doesn't it? >> [...] >>>> [...]. Will you want to use git-diff-tree >>>> to mark differences from the version we came from (marked by 'hp', >>>> 'hpb' and 'fp' URI parameters), or would you rather extend >>>> git-blame? >>> >>> I don't know. I'll look at git-diff-tree. >> >> What I meant here, would you plan on extending git-blame, or would >> you use patchset (textual) diff between revision we are at, and >> revision we came from. git-diff-tree just compares two trees (and >> have to have patch output explicitely enabled). Sorry for the >> confusion. > > I'm under the impression that extending git-blame is a more flexible > solution. I don't think that it is correct solution in this case. I'm not sure if it can even be done. What you have (what "annotated file view" in Perforce web interface has) is difference annotations (one sided side-bys side diff ;-)), something like Eclipse QuickDiff, or like word-diff (or "git diff --color-words") put _ON TOP_ of blame info. Generating such single pane in-file diff is orthogonal to generating blame info. I think it would be best solved using patchset (textual) diff output; if git-diff would support "context" and not only "unified" patch output it could be used there. What was I thinking when mentioning extending git-blame was "reblame", i.e. blaming only those lines which are different from some child version. But while this would be very useful for tools such like "git gui blame" or blameview, it just won't work well I guess for web application (unless caching everything, and generating blame diff from cached blame). As to extending git-blame --porcelain to output "parents <hash>..." header, it is better solution than "git rev-list --no-walk" used as a kind of git-rev-parse sequencer not only because it is one fork less (and blame has has this parent info anyway), but mainly that it better fits with the streaming flow of gitweb's git_blame2(). (I'll write about it more in separate letter). -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 14:56 ` Jakub Narebski 2008-06-03 15:07 ` Rafael Garcia-Suarez @ 2008-06-03 21:09 ` Luben Tuikov 1 sibling, 0 replies; 40+ messages in thread From: Luben Tuikov @ 2008-06-03 21:09 UTC (permalink / raw) To: Rafael Garcia-Suarez, Jakub Narebski; +Cc: git, Sam Vilain --- On Tue, 6/3/08, Jakub Narebski <jnareb@gmail.com> wrote: > From: Jakub Narebski <jnareb@gmail.com> > Subject: Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame > To: "Rafael Garcia-Suarez" <rgarciasuarez@gmail.com> > Cc: git@vger.kernel.org, "Luben Tuikov" <ltuikov@yahoo.com>, "Sam Vilain" <sam@vilain.net> > Date: Tuesday, June 3, 2008, 7:56 AM > Rafael Garcia-Suarez wrote: > > 2008/6/3 Jakub Narebski <jnareb@gmail.com>: > >>> Shameless plug : the sources for perl 5 are > currently being kept in a > >>> perforce repository. There is a rough web > interface to it at > >>> > http://public.activestate.com/cgi-bin/perlbrowse with > excellent blame > >>> log navigation features (including navigation > against p4 > >>> integrations). > >> > >> By the way, what is the difference between > '<<' links and 'br' link > >> in the above mentioned annotate/blame interface? > > > > "br" navigates to another branch from which > this file has been > > integrated (in p4 speak.) > > Does it mark merge commits then? Or perhaps branch points? > What > does "branch from which this file has been > integrated" mean in git > speak (in the terms of DAG of commits)? > > > If the history of a file looks like this > > ....*---*---A---M---C... > / > ....*---B-/ > > and the line comes from "evil merge" M git-blame > would return M as > blamed commit. If the line comes from one or the other > branch, from > commit A or B, it makes I think no difference to git-blame; > git tries > to be "branch agnostic" (no special meaning to > first parent; well, > besides rev~n notation and --first-parent walk option). I > guess it > is not the case in Perforce? The whole point of git_blame2() was to show which "previous" commit changed that line/segment and what the commit message was. This is important to me when data-mining code evolution, since often enough I'd like to recollect mine/other's intentions at the time of changing/adding the code/commit in question. > > [...] > >> [...]. Will you want to use git-diff-tree > >> to mark differences from the version we came from > (marked by 'hp', > >> 'hpb' and 'fp' URI parameters), or > would you rather extend git-blame? > > > > I don't know. I'll look at git-diff-tree. > > What I meant here, would you plan on extending git-blame, > or would you > use patchset (textual) diff between revision we are at, and > revision we > came from. git-diff-tree just compares two trees (and have > to have > patch output explicitely enabled). Sorry for the > confusion. I'd rather stray away from "git-diff-xyz", since there is no context. Context is found in the commit message which changed the line/segment. (By "context" I mean the human intention, most likely encoded in the commit message.) Thanks everyone! Luben ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 14:14 ` Jakub Narebski 2008-06-03 14:40 ` Rafael Garcia-Suarez @ 2008-06-03 21:03 ` Luben Tuikov 1 sibling, 0 replies; 40+ messages in thread From: Luben Tuikov @ 2008-06-03 21:03 UTC (permalink / raw) To: Rafael Garcia-Suarez, Jakub Narebski; +Cc: git, Sam Vilain --- On Tue, 6/3/08, Jakub Narebski <jnareb@gmail.com> wrote: > From: Jakub Narebski <jnareb@gmail.com> > Subject: Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame > To: "Rafael Garcia-Suarez" <rgarciasuarez@gmail.com> > Cc: git@vger.kernel.org, "Luben Tuikov" <ltuikov@yahoo.com>, "Sam Vilain" <sam@vilain.net> > Date: Tuesday, June 3, 2008, 7:14 AM > On Tue, 3 June 2008, Rafael Garcia-Suarez wrote: > > 2008/6/3 Jakub Narebski <jnareb@gmail.com>: > >>> > >>> OK, I see. That would be nice. Also: currently > taking "$full_rev^" > >>> directs the user to the parent commit, but it > would be more > >>> user-friendly to point at the previous commit > where the selected file > >>> was modified instead. > >> > >> That's what I meant by distinguishing between > 'parents' and > >> 'original-parents' (or > 'rewritten-parents' and 'parents'): first > are > >> rewritten parents in history limited to specified > file (with the > >> addition of code movements and copying across > files/filenames), > >> second are original parents of a commit. > >> > >> For gitweb we would use the first set (I wonder > what to do in the case > >> of merge commit, i.e. more than one parent). > > > > Currently that takes the left parent. Or something. > > > > Shameless plug : the sources for perl 5 are currently > being kept in a > > perforce repository. There is a rough web interface to > it at > > http://public.activestate.com/cgi-bin/perlbrowse with > excellent blame > > log navigation features (including navigation against > p4 > > integrations). > > By the way, what is the difference between > '<<' links and 'br' link > in the above mentioned annotate/blame interface? > > I'd like to say that I prefer gitweb's marking > blame by blocks, not by > lines, and extra info on mouseover. Completely agree. > But having blame > navigation > capability of perforce web interface would be really nice > (I think > "git gui blame" has something like this; I > don't know about other > tools like qgit, giggle, or ugit). That was the intention of git_blame2()... myself just previously coming from perforce... So yes, long time ago, in a galaxy far, far away, I was using perforce for data mining of the evolution of the code, in order to deduct intention. And I had wanted the same capability in git, thus git_blame2 and commit 244a70e608204a515c214a11c43f3ecf7642533a. > BTW. how in your opinion Git compares to Perforce, both as > a tool > itself, and also about quality of companion tools such like > gitweb > or git-gui? What I did was prompted by what I had used with perforce, and on top of it, improving on it. So yes, I like git and gitweb. At the moment they give me what I need, and of course an edge over other SCMs is the distributed nature of git. Thanks! Luben > > > I'm more or > less trying to > > duplicate this blame log navigation in gitweb. So it > might result in a > > few patches here :) > > I think it would be really nice. Will you want to use > git-diff-tree > to mark differences from the version we came from (marked > by 'hp', > 'hpb' and 'fp' URI parameters), or would > you rather extend git-blame? > > -- > Jakub Narebski > Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 13:00 ` Rafael Garcia-Suarez 2008-06-03 13:12 ` Jakub Narebski @ 2008-06-03 20:35 ` Luben Tuikov 2008-06-03 21:31 ` Jakub Narebski 1 sibling, 1 reply; 40+ messages in thread From: Luben Tuikov @ 2008-06-03 20:35 UTC (permalink / raw) To: Jakub Narebski, Rafael Garcia-Suarez; +Cc: git --- On Tue, 6/3/08, Rafael Garcia-Suarez <rgarciasuarez@gmail.com> wrote: > From: Rafael Garcia-Suarez <rgarciasuarez@gmail.com> > Subject: Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame > To: "Jakub Narebski" <jnareb@gmail.com> > Cc: git@vger.kernel.org, "Luben Tuikov" <ltuikov@yahoo.com> > Date: Tuesday, June 3, 2008, 6:00 AM > 2008/6/3 Jakub Narebski <jnareb@gmail.com>: > >>> I'd rather remove this, correct it, or > make it optional (this is very > >>> fork-heavy). > >> > >> Not sure how to do the same thing in pure Perl. > > > > I was thinking about extending git-blame porcelain > format (and also > > incremental format, of course) by 'parents' > (and perhaps > > 'original-parents') header... > > OK, I see. That would be nice. Also: currently taking > "$full_rev^" > directs the user to the parent commit, but it would be more > user-friendly to point at the previous commit where the > selected file > was modified instead. The intention was that it shouldn't necessarily be the (strict) parent of the change (changed segment), since it may or may not have changed in the strict parent commit. The intention was that it "starts"/"opens" the parent commit so that "git" would start from there and find the actual change/commit where that line/segment has changed. And it has worked pretty fine for me when data-mining (something I do quite often) code evolution. My commit 244a70e608204a515c214a11c43f3ecf7642533a was really derived from a command line, which I had started to use quite often and had been "looking for" for quite some time. > >> We could however cache the results of > git-rev-parse, since the same > >> rev is likely to appear many times in the list. > > > > ...but starting with cache of git-rev-parse results, > or optionally > > allowing extended sha-1 syntax (including > <hash>^) in hash* CGI > > parameters in gitweb would be a good idea. > > > > But as I wrote, I'm fine with the patch as it is > now. > > I've sent a new version (take 2) with caching. And > comments, as Lea suggested :) Yes, hashing is good if it speeds up lookups without altering intended functionality. Thanks everyone! Luben ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 20:35 ` Luben Tuikov @ 2008-06-03 21:31 ` Jakub Narebski 2008-06-04 5:58 ` Junio C Hamano 0 siblings, 1 reply; 40+ messages in thread From: Jakub Narebski @ 2008-06-03 21:31 UTC (permalink / raw) To: Luben Tuikov; +Cc: Rafael Garcia-Suarez, git, Lea Wiemann On Tue, 3 Jan 2008, Luben Tuikov wrote: > On Tue, 6/3/08, Rafael Garcia-Suarez <rgarciasuarez@gmail.com> wrote: >> 2008/6/3 Jakub Narebski <jnareb@gmail.com> wrote: >>> >>> I was thinking about extending git-blame porcelain format (and also >>> incremental format, of course) by 'parents' (and perhaps >>> 'original-parents') header... >> >> OK, I see. That would be nice. Also: currently taking "$full_rev^" >> directs the user to the parent commit, but it would be more >> user-friendly to point at the previous commit where the selected file >> was modified instead. > > The intention was that it shouldn't necessarily be the (strict) parent > of the change (changed segment), since it may or may not have changed > in the strict parent commit. The intention was that it > "starts"/"opens" the parent commit so that "git" would start from > there and find the actual change/commit where that line/segment has > changed. And it has worked pretty fine for me when data-mining > (something I do quite often) code evolution. > > My commit 244a70e608204a515c214a11c43f3ecf7642533a was really derived > from a command line, which I had started to use quite often and had > been "looking for" for quite some time. Let us assume for a bit that history is linear, and looks like this: ...*---A---*---*---b---.---D^---D---*---x where 'x' is starting point (revision we start running git-blame from), 'D' is revision given line is blamed on, 'D^' is parent of revision 'D', 'b' is previous commit in a given file history, and 'A' is previous commit which modifies given line of a given commit. It means that the history looks like below $ git rev-list x [...] D D^ . b [...] while history of a given file looks like this $ git rev-list x -- file [...] D b [...] Now for all commits in the b..D^ range (between D^ and b, including endpoints), given file has the same contents, and therefore 'blame' view would also look the same. That is why it works. The only problem I can see when blamed commit is merge commit; D^ means D^1, ehich mesna first parent. Now, I think that merge commit might be blamed _only_ if it was "evil merge" (change/line didn't came from any of parents). But this is quite rare situation; additionally the bug is not very visible; when clicking on link you would go to not correct view, but this not-correctness isn't obvious on the first glance. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 21:31 ` Jakub Narebski @ 2008-06-04 5:58 ` Junio C Hamano 2008-06-04 14:03 ` Jakub Narebski 2008-06-04 22:24 ` Luben Tuikov 0 siblings, 2 replies; 40+ messages in thread From: Junio C Hamano @ 2008-06-04 5:58 UTC (permalink / raw) To: Jakub Narebski; +Cc: Luben Tuikov, Rafael Garcia-Suarez, git, Lea Wiemann Jakub Narebski <jnareb@gmail.com> writes: >> The intention was that it shouldn't necessarily be the (strict) parent >> of the change (changed segment), since it may or may not have changed >> in the strict parent commit. The intention was that it >> "starts"/"opens" the parent commit so that "git" would start from >> there and find the actual change/commit where that line/segment has >> changed. And it has worked pretty fine for me when data-mining >> (something I do quite often) code evolution. Yes, but the current scheme breaks down in another way. When $full_rev added many lines to the file, and you are adding the link to for a line near the end of the file and such a line may not exist. This cannot be cheaply done even inside blame itself. Another breakage is even though $full_rev^ _may_ exist (iow, $full_rev might not be the root commit), the file being blamed may not exist there (iow $full_rev might have introduced the file). Instead of running "rev-parse $full_rev^", you would at least need to ask "rev-list -1 $full_rev^ -- $path" or something from the Porcelain layer, but unfortunately this is rather expensive. Because blame already almost knows if the commit the final blame lies on has a parent, it would be reasonably cheap to add that "parent or nothing" information to its --porcelain (and its --incremental) format if we wanted to. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-04 5:58 ` Junio C Hamano @ 2008-06-04 14:03 ` Jakub Narebski 2008-06-05 6:07 ` Junio C Hamano 2008-06-04 22:24 ` Luben Tuikov 1 sibling, 1 reply; 40+ messages in thread From: Jakub Narebski @ 2008-06-04 14:03 UTC (permalink / raw) To: Junio C Hamano; +Cc: Luben Tuikov, Rafael Garcia-Suarez, git, Lea Wiemann On Wed, 4 Jun 2008, Junio C Hamano wrote: > Jakub Narebski <jnareb@gmail.com> writes: >> On Tue, 3 Jan 2008, Luben Tuikov wrote: >>> >>> The intention was that it shouldn't necessarily be the (strict) parent >>> of the change (changed segment), since it may or may not have changed >>> in the strict parent commit. The intention was that it >>> "starts"/"opens" the parent commit so that "git" would start from >>> there and find the actual change/commit where that line/segment has >>> changed. And it has worked pretty fine for me when data-mining >>> (something I do quite often) code evolution. > > Yes, but the current scheme breaks down in another way. When $full_rev > added many lines to the file, and you are adding the link to for a line > near the end of the file and such a line may not exist. This cannot be > cheaply done even inside blame itself. I think the scheme could be fixed by proposed belo git-blame porcelain format output extension. Can it be done cheaply? I don't know, generating extended info as described below should be cheap if we have equivalent of textual (patch) diff between commit blamed for given line, and its parent; actually what we need is more of 'context' diff than of default 'unified' diff. > Another breakage is even though $full_rev^ _may_ exist (iow, $full_rev > might not be the root commit), the file being blamed may not exist there > (iow $full_rev might have introduced the file). Instead of running > "rev-parse $full_rev^", you would at least need to ask "rev-list -1 > $full_rev^ -- $path" or something from the Porcelain layer, but > unfortunately this is rather expensive. Doesn't blame know revision graph for history of a given file already? But even without it (i.e. ony 'parents' header showing true, not rewritten parents) what we need is some info about pre-image for blamed line. We would need line number of the line in pre-image (or NUL if the page was added in blamed commit), and pre-image filename. I don't know if it could be done cheaply, and if it could be done simply; currently git-diff doesn't have "context diff" format output, and what I though about by pre-image line number requires finding if a line was added in a commit, or was modified in a commit. (If it was removed, it wouldn't be in final image and hence wouldn't be blamed; if it was moved, it wouldn't be blamed, as blame follows code movement). > Because blame already almost knows if the commit the final blame lies on > has a parent, it would be reasonably cheap to add that "parent or nothing" > information to its --porcelain (and its --incremental) format if we wanted > to. It would be easy to add 'parents' header, perhaps empty if we blame root commit, or a boundary commit (do we say 'boundary' then?) when doing revision limited blaming. >From what you write it wouldn't be easy to add "history of a given line begins here", or even "history of a given file begins there"... -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-04 14:03 ` Jakub Narebski @ 2008-06-05 6:07 ` Junio C Hamano 2008-06-05 6:09 ` [PATCH 1/2] git-blame: refactor code to emit "porcelain format" output Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 40+ messages in thread From: Junio C Hamano @ 2008-06-05 6:07 UTC (permalink / raw) To: Jakub Narebski; +Cc: Luben Tuikov, Rafael Garcia-Suarez, git, Lea Wiemann Jakub Narebski <jnareb@gmail.com> writes: > On Wed, 4 Jun 2008, Junio C Hamano wrote: >> Yes, but the current scheme breaks down in another way. When $full_rev >> added many lines to the file, and you are adding the link to for a line >> near the end of the file and such a line may not exist. This cannot be >> cheaply done even inside blame itself. > > I think the scheme could be fixed by proposed belo git-blame porcelain > format output extension. For the line number information, I do not think so. Luben's "continue from the same line number in the parent commit" is a cute hack, but that strategy needs a qualifying comment "because hoping that the same line number in the parent commit might have something relevant would be better than stopping and giving up sometimes." It cannot reliably work (and it is not Luben's fault). But the #l<lno> fragment is just a hint to scroll to that point after restarting the blame from previous commit and jumping to the result, so it may not be too big a deal. Such a line may not exist in the resulting blame page, but that's Ok. >> Another breakage is even though $full_rev^ _may_ exist (iow, $full_rev >> might not be the root commit), the file being blamed may not exist there >> (iow $full_rev might have introduced the file). Instead of running >> "rev-parse $full_rev^", you would at least need to ask "rev-list -1 >> $full_rev^ -- $path" or something from the Porcelain layer, but >> unfortunately this is rather expensive. > > Doesn't blame know revision graph for history of a given file already? Not in the sense of "rev-list -2 $full_rev -- $path | sed -e 1d". It builds the graph as it digs deeper, and when it stops, it stopped digging, so all it knows at that point without further computation is $full_rev^@, and not "the previous commit that touched the path". But as Luben explained (and you drew a simple strand of pearls history to illustrate), immediate parent is just for the purpose of restarting the computation. >> Because blame already almost knows if the commit the final blame lies on >> has a parent, it would be reasonably cheap to add that "parent or nothing" >> information to its --porcelain (and its --incremental) format if we wanted >> to. > > It would be easy to add 'parents' header, perhaps empty if we blame > root commit, or a boundary commit (do we say 'boundary' then?) when > doing revision limited blaming. It shouldn't be too hard to say "parents of the blamed commit that has the corresponding preimage of the file is this", and the history does not have to be limited. You need to also handle "the commit that introduced the path" case just like "root" and "boundary" that we cannot dig further than that point. I'll follow this message up with two weatherballoon patches. ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 1/2] git-blame: refactor code to emit "porcelain format" output 2008-06-05 6:07 ` Junio C Hamano @ 2008-06-05 6:09 ` Junio C Hamano 2008-06-06 9:22 ` Jakub Narebski 2008-06-05 6:09 ` [PATCH 2/2] blame: show "previous" information in --porcelain/--incremental format Junio C Hamano 2008-06-06 0:26 ` [PATCH] Avoid errors from git-rev-parse in gitweb blame Jakub Narebski 2 siblings, 1 reply; 40+ messages in thread From: Junio C Hamano @ 2008-06-05 6:09 UTC (permalink / raw) To: Jakub Narebski; +Cc: Luben Tuikov, Rafael Garcia-Suarez, git, Lea Wiemann Both the --porcelain and --incremental format shared the same output format but implemented with two identical codepaths. This merges them into one shared function. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- * This is just a preparatory clean-up patch (on jc/blame topic) builtin-blame.c | 65 ++++++++++++++++++++++++++---------------------------- 1 files changed, 31 insertions(+), 34 deletions(-) diff --git a/builtin-blame.c b/builtin-blame.c index 5c7546d..4b9c601 100644 --- a/builtin-blame.c +++ b/builtin-blame.c @@ -1479,6 +1479,34 @@ static void write_filename_info(const char *path) } /* + * Porcelain/Incremental format wants to show a lot of details per + * commit. Instead of repeating this every line, emit it only once, + * the first time each commit appears in the output. + */ +static int emit_one_suspect_detail(struct origin *suspect) +{ + struct commit_info ci; + + if (suspect->commit->object.flags & METAINFO_SHOWN) + return 0; + + suspect->commit->object.flags |= METAINFO_SHOWN; + get_commit_info(suspect->commit, &ci, 1); + printf("author %s\n", ci.author); + printf("author-mail %s\n", ci.author_mail); + printf("author-time %lu\n", ci.author_time); + printf("author-tz %s\n", ci.author_tz); + printf("committer %s\n", ci.committer); + printf("committer-mail %s\n", ci.committer_mail); + printf("committer-time %lu\n", ci.committer_time); + printf("committer-tz %s\n", ci.committer_tz); + printf("summary %s\n", ci.summary); + if (suspect->commit->object.flags & UNINTERESTING) + printf("boundary\n"); + return 1; +} + +/* * The blame_entry is found to be guilty for the range. Mark it * as such, and show it in incremental output. */ @@ -1493,22 +1521,7 @@ static void found_guilty_entry(struct blame_entry *ent) printf("%s %d %d %d\n", sha1_to_hex(suspect->commit->object.sha1), ent->s_lno + 1, ent->lno + 1, ent->num_lines); - if (!(suspect->commit->object.flags & METAINFO_SHOWN)) { - struct commit_info ci; - suspect->commit->object.flags |= METAINFO_SHOWN; - get_commit_info(suspect->commit, &ci, 1); - printf("author %s\n", ci.author); - printf("author-mail %s\n", ci.author_mail); - printf("author-time %lu\n", ci.author_time); - printf("author-tz %s\n", ci.author_tz); - printf("committer %s\n", ci.committer); - printf("committer-mail %s\n", ci.committer_mail); - printf("committer-time %lu\n", ci.committer_time); - printf("committer-tz %s\n", ci.committer_tz); - printf("summary %s\n", ci.summary); - if (suspect->commit->object.flags & UNINTERESTING) - printf("boundary\n"); - } + emit_one_suspect_detail(suspect); write_filename_info(suspect->path); maybe_flush_or_die(stdout, "stdout"); } @@ -1615,24 +1628,8 @@ static void emit_porcelain(struct scoreboard *sb, struct blame_entry *ent) ent->s_lno + 1, ent->lno + 1, ent->num_lines); - if (!(suspect->commit->object.flags & METAINFO_SHOWN)) { - struct commit_info ci; - suspect->commit->object.flags |= METAINFO_SHOWN; - get_commit_info(suspect->commit, &ci, 1); - printf("author %s\n", ci.author); - printf("author-mail %s\n", ci.author_mail); - printf("author-time %lu\n", ci.author_time); - printf("author-tz %s\n", ci.author_tz); - printf("committer %s\n", ci.committer); - printf("committer-mail %s\n", ci.committer_mail); - printf("committer-time %lu\n", ci.committer_time); - printf("committer-tz %s\n", ci.committer_tz); - write_filename_info(suspect->path); - printf("summary %s\n", ci.summary); - if (suspect->commit->object.flags & UNINTERESTING) - printf("boundary\n"); - } - else if (suspect->commit->object.flags & MORE_THAN_ONE_PATH) + if (emit_one_suspect_detail(suspect) || + (suspect->commit->object.flags & MORE_THAN_ONE_PATH)) write_filename_info(suspect->path); cp = nth_line(sb, ent->lno); -- 1.5.6.rc1.12.g7f718 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] git-blame: refactor code to emit "porcelain format" output 2008-06-05 6:09 ` [PATCH 1/2] git-blame: refactor code to emit "porcelain format" output Junio C Hamano @ 2008-06-06 9:22 ` Jakub Narebski 0 siblings, 0 replies; 40+ messages in thread From: Jakub Narebski @ 2008-06-06 9:22 UTC (permalink / raw) To: Junio C Hamano; +Cc: Luben Tuikov, Rafael Garcia-Suarez, git, Lea Wiemann On Tue, 5 Jun 2008, Junio C Hamano wrote: > Both the --porcelain and --incremental format shared the same output > format but implemented with two identical codepaths. This merges them > into one shared function. They have _almost_ the same, but not _exactly_ the same output: INCREMENTAL OUTPUT ------------------ [...] The output format is similar to the Porcelain format, but it does not contain the actual lines from the file that is being annotated. [...] . Unlike Porcelain format, the filename information is always given and terminates the entry: "filename" <whitespace-quoted-filename-goes-here> and thus it's really quite easy to parse for some line- and word-oriented parser (which should be quite natural for most scripting languages). But I guess this got addresssed in the patch, doesn't it? -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 2/2] blame: show "previous" information in --porcelain/--incremental format 2008-06-05 6:07 ` Junio C Hamano 2008-06-05 6:09 ` [PATCH 1/2] git-blame: refactor code to emit "porcelain format" output Junio C Hamano @ 2008-06-05 6:09 ` Junio C Hamano 2008-06-06 9:27 ` Jakub Narebski 2008-06-06 0:26 ` [PATCH] Avoid errors from git-rev-parse in gitweb blame Jakub Narebski 2 siblings, 1 reply; 40+ messages in thread From: Junio C Hamano @ 2008-06-05 6:09 UTC (permalink / raw) To: Jakub Narebski; +Cc: Luben Tuikov, Rafael Garcia-Suarez, git, Lea Wiemann When the final blame is laid for a line to a <commit, path> pair, it also gives a "previous" information to --porcelain and --incremental output format. It gives the parent commit of the blamed commit, _and_ a path in that parent commit that corresponds to the blamed path --- in short, it is the origin that would have been blamed (or passed blame through) for the line _if_ the blamed commit did not change that line. This unfortunately makes sanity checking of refcount quite complex, so I ripped it out for now. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- builtin-blame.c | 42 ++++++++++++------------------------------ 1 files changed, 12 insertions(+), 30 deletions(-) diff --git a/builtin-blame.c b/builtin-blame.c index 4b9c601..a46e402 100644 --- a/builtin-blame.c +++ b/builtin-blame.c @@ -82,6 +82,7 @@ static unsigned blame_copy_score; */ struct origin { int refcnt; + struct origin *previous; struct commit *commit; mmfile_t file; unsigned char blob_sha1[20]; @@ -123,6 +124,8 @@ static inline struct origin *origin_incref(struct origin *o) static void origin_decref(struct origin *o) { if (o && --o->refcnt <= 0) { + if (o->previous) + origin_decref(o->previous); free(o->file.ptr); free(o); } @@ -1280,6 +1283,10 @@ static void pass_blame(struct scoreboard *sb, struct origin *origin, int opt) struct origin *porigin = sg_origin[i]; if (!porigin) continue; + if (!origin->previous) { + origin_incref(porigin); + origin->previous = porigin; + } if (pass_blame_to_parent(sb, origin, porigin)) goto finish; } @@ -1503,6 +1510,11 @@ static int emit_one_suspect_detail(struct origin *suspect) printf("summary %s\n", ci.summary); if (suspect->commit->object.flags & UNINTERESTING) printf("boundary\n"); + if (suspect->previous) { + struct origin *prev = suspect->previous; + printf("previous %s ", sha1_to_hex(prev->commit->object.sha1)); + write_name_quoted(prev->path, stdout, '\n'); + } return 1; } @@ -1866,36 +1878,6 @@ static void sanity_check_refcnt(struct scoreboard *sb) baa = 1; } } - for (ent = sb->ent; ent; ent = ent->next) { - /* Mark the ones that haven't been checked */ - if (0 < ent->suspect->refcnt) - ent->suspect->refcnt = -ent->suspect->refcnt; - } - for (ent = sb->ent; ent; ent = ent->next) { - /* - * ... then pick each and see if they have the the - * correct refcnt. - */ - int found; - struct blame_entry *e; - struct origin *suspect = ent->suspect; - - if (0 < suspect->refcnt) - continue; - suspect->refcnt = -suspect->refcnt; /* Unmark */ - for (found = 0, e = sb->ent; e; e = e->next) { - if (e->suspect != suspect) - continue; - found++; - } - if (suspect->refcnt != found) { - fprintf(stderr, "%s in %s has refcnt %d, not %d\n", - ent->suspect->path, - sha1_to_hex(ent->suspect->commit->object.sha1), - ent->suspect->refcnt, found); - baa = 2; - } - } if (baa) { int opt = 0160; find_alignment(sb, &opt); -- 1.5.6.rc1.12.g7f718 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 2/2] blame: show "previous" information in --porcelain/--incremental format 2008-06-05 6:09 ` [PATCH 2/2] blame: show "previous" information in --porcelain/--incremental format Junio C Hamano @ 2008-06-06 9:27 ` Jakub Narebski 2008-06-06 15:17 ` Junio C Hamano 0 siblings, 1 reply; 40+ messages in thread From: Jakub Narebski @ 2008-06-06 9:27 UTC (permalink / raw) To: Junio C Hamano; +Cc: Luben Tuikov, Rafael Garcia-Suarez, git, Lea Wiemann On Tue, 5 Jan 2008, Junio C Hamano wrote: > When the final blame is laid for a line to a <commit, path> pair, it also > gives a "previous" information to --porcelain and --incremental output > format. It gives the parent commit of the blamed commit, _and_ a path in > that parent commit that corresponds to the blamed path --- in short, it is > the origin that would have been blamed (or passed blame through) for the > line _if_ the blamed commit did not change that line. [...] > + if (suspect->previous) { > + struct origin *prev = suspect->previous; > + printf("previous %s ", sha1_to_hex(prev->commit->object.sha1)); > + write_name_quoted(prev->path, stdout, '\n'); > + } What happens if attributed (blamed) commit is "evil merge"? Would git-blame emit multiple "previous <sha-1 of commit> <filename>" headers? -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 2/2] blame: show "previous" information in --porcelain/--incremental format 2008-06-06 9:27 ` Jakub Narebski @ 2008-06-06 15:17 ` Junio C Hamano 2008-06-06 15:44 ` Jakub Narebski 0 siblings, 1 reply; 40+ messages in thread From: Junio C Hamano @ 2008-06-06 15:17 UTC (permalink / raw) To: Jakub Narebski; +Cc: Luben Tuikov, Rafael Garcia-Suarez, git, Lea Wiemann Jakub Narebski <jnareb@gmail.com> writes: > What happens if attributed (blamed) commit is "evil merge"? > Would git-blame emit multiple "previous <sha-1 of commit> <filename>" > headers? Read the code. There is only one previous pointer in each origin. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 2/2] blame: show "previous" information in --porcelain/--incremental format 2008-06-06 15:17 ` Junio C Hamano @ 2008-06-06 15:44 ` Jakub Narebski 0 siblings, 0 replies; 40+ messages in thread From: Jakub Narebski @ 2008-06-06 15:44 UTC (permalink / raw) To: Junio C Hamano; +Cc: Luben Tuikov, Rafael Garcia-Suarez, git, Lea Wiemann Junio C Hamano wrote: > Jakub Narebski <jnareb@gmail.com> writes: > >> What happens if attributed (blamed) commit is "evil merge"? >> Would git-blame emit multiple "previous <sha-1 of commit> <filename>" >> headers? > > Read the code. There is only one previous pointer in each origin. Ah. True. So the question now is: how git-blame choses one parent if commit which is blamed is an evil merge commit? -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-05 6:07 ` Junio C Hamano 2008-06-05 6:09 ` [PATCH 1/2] git-blame: refactor code to emit "porcelain format" output Junio C Hamano 2008-06-05 6:09 ` [PATCH 2/2] blame: show "previous" information in --porcelain/--incremental format Junio C Hamano @ 2008-06-06 0:26 ` Jakub Narebski 2 siblings, 0 replies; 40+ messages in thread From: Jakub Narebski @ 2008-06-06 0:26 UTC (permalink / raw) To: Junio C Hamano; +Cc: Luben Tuikov, Rafael Garcia-Suarez, git, Lea Wiemann On Tue, 5 Jan 2008, Junio C Hamano <gitster@pobox.com> wrote: > Jakub Narebski <jnareb@gmail.com> writes: >> On Wed, 4 Jun 2008, Junio C Hamano wrote: >>> >>> Yes, but the current scheme breaks down in another way. When $full_rev >>> added many lines to the file, and you are adding the link to for a line >>> near the end of the file and such a line may not exist. This cannot be >>> cheaply done even inside blame itself. >> >> I think the scheme could be fixed by proposed below git-blame porcelain >> format output extension. > > For the line number information, I do not think so. I think I have not made myself clear enough. In the proposed additional blame format extension "previous" header (or "pre-image" header, or sth like that) would contain pre-image (before change) line number information, or information that line was added in blamed commit. > Luben's "continue from the same line number in the parent commit" is a > cute hack, but that strategy needs a qualifying comment "because hoping > that the same line number in the parent commit might have something > relevant would be better than stopping and giving up sometimes." It > cannot reliably work (and it is not Luben's fault). Therefore my proposal. I'm not sure though if it can be done cheaply. And Luben's idea is good enough in most cases; as a hint is definitely good enough. > But the #l<lno> fragment is just a hint to scroll to that point after > restarting the blame from previous commit and jumping to the result, so it > may not be too big a deal. Such a line may not exist in the resulting > blame page, but that's Ok. Let me give an example on how I visualized proposed "pre-image" header extension would work. Assume that we started from commit 'S' and commit 'B' is to be blamed for the line in question. Let's us assume that commit 'B' has only one parent, and that "context" diff between B^1 and B is available to blame. For an example, let's use [modified] example from GNU diff documentation (info): diff --git-context a/lao-tzu b/lao-tzu index 55fb100..198772c 100644 *** a/lao-tzu --- b/lao-tzu *************** *** 1,7 **** - The Way that can be told of is not the eternal Way; - The name that can be named is not the eternal name. The Nameless is the origin of Heaven and Earth; ! The Named is the mother of all things. Therefore let there always be non-being, so we may see their subtlety, And let there always be being, --- 1,6 ---- The Nameless is the origin of Heaven and Earth; ! The named is the mother of all things. ! Therefore let there always be non-being, so we may see their subtlety, And let there always be being, *************** *** 9,11 **** --- 8,13 ---- The two are the same, But after they are produced, But after they are produced, they have different names. + They both may be called deep and profound. + Deeper and more profound, + The door of all subtleties! First example: lets assume that we want find blame (annotation) for the following line: "The named is the mother of all things." Assuming that block of commonly blamed lines begin with given line, current blame output would look like the following: <sha-1 of commit 'B'> <current-lineno> 2 <block-size> author A U Thor author-mail <author@example.com> author-time 1150613103 author-tz -0700 committer C O Mitter committer-mail <committer@example.com> committer-time 1150690754 committer-tz -0700 filename lao-tzu summary Be even more cryptic The named is the mother of all things. What I wanted to add was the following header parents <sha-1 of commit 'B^1'> pre-image 2 4 lao-tzu where 2 is the line number in the commit given line is attributed to, where 4 is the line number of _corresponding_ line in pre-image (before change that gave examined line current form), and 'lao-tzu' is the name of file of pre-image for this line. Second example: lets assume that we want find blame (annotation) for the following line: "The door of all subtleties!" This time the line was added in the commit it is attributed to (commit blamed for this line), so there is no corresponding pre-image line. Extra headers would now look like the following: parents <sha-1 of commit 'B^1'> pre-image 13 - lao-tzu Of course 'parents' and 'pre-image' headers can be joined together in the 'previous' header you proposed. >>> Another breakage is even though $full_rev^ _may_ exist (iow, $full_rev >>> might not be the root commit), the file being blamed may not exist there >>> (iow $full_rev might have introduced the file). Instead of running >>> "rev-parse $full_rev^", you would at least need to ask "rev-list -1 >>> $full_rev^ -- $path" or something from the Porcelain layer, but >>> unfortunately this is rather expensive. >> >> Doesn't blame know revision graph for history of a given file already? > > Not in the sense of "rev-list -2 $full_rev -- $path | sed -e 1d". It > builds the graph as it digs deeper, and when it stops, it stopped digging, > so all it knows at that point without further computation is $full_rev^@, > and not "the previous commit that touched the path". > > But as Luben explained (and you drew a simple strand of pearls history to > illustrate), immediate parent is just for the purpose of restarting the > computation. What I worry about is what happens in the (rare I think) case when _merge_ commit is blamed, and firs-parent leg is simplified in the "per-file" history. For example if git-blame output for given line looks like below: 64625efeb1f216c3811230845bb519123ea0ddc5 2 2 1 author Jakub Narebski author-mail <jnareb@gmail.com> author-time 1212711688 author-tz +0200 committer Jakub Narebski committer-mail <jnareb@gmail.com> committer-time 1212711688 committer-tz +0200 filename foo summary Merge branch 'b' (Fullstop _and_ capitalization) Second line. and "git show 64625efeb1f216c3811230845bb519123ea0ddc5" is: commit 64625efeb1f216c3811230845bb519123ea0ddc5 Merge: 288f63a... ec70d8b... Author: Jakub Narebski <jnareb@gmail.com> Date: Fri Jun 6 02:21:28 2008 +0200 Merge branch 'b' (Fullstop _and_ capitalization) diff --cc foo index 11c1e59,2636666..888af51 --- a/foo +++ b/foo @@@ -1,3 -1,3 +1,3 @@@ First line - second line. -Second line ++Second line. Third line. What should be "previous" line then? Or perhaps there should be _two_ "previous" lines. >>> Because blame already almost knows if the commit the final blame lies on >>> has a parent, it would be reasonably cheap to add that "parent or nothing" >>> information to its --porcelain (and its --incremental) format if we wanted >>> to. >> >> It would be easy to add 'parents' header, perhaps empty if we blame >> root commit, or a boundary commit (do we say 'boundary' then?) when >> doing revision limited blaming. > > It shouldn't be too hard to say "parents of the blamed commit that has the > corresponding preimage of the file is this", and the history does not have > to be limited. You need to also handle "the commit that introduced the > path" case just like "root" and "boundary" that we cannot dig further than > that point. Errr... do your code deal with that case (no path before blamed commit)? > I'll follow this message up with two weatherballoon patches. Thanks a lot. It looks like step in good direction (implementing first proposal), with the caveat of attributed commit being evil merge. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-04 5:58 ` Junio C Hamano 2008-06-04 14:03 ` Jakub Narebski @ 2008-06-04 22:24 ` Luben Tuikov 1 sibling, 0 replies; 40+ messages in thread From: Luben Tuikov @ 2008-06-04 22:24 UTC (permalink / raw) To: Jakub Narebski, Junio C Hamano; +Cc: Rafael Garcia-Suarez, git, Lea Wiemann --- On Tue, 6/3/08, Junio C Hamano <gitster@pobox.com> wrote: > Another breakage is even though $full_rev^ _may_ exist > (iow, $full_rev > might not be the root commit), the file being blamed may > not exist there > (iow $full_rev might have introduced the file). Instead of > running > "rev-parse $full_rev^", you would at least need > to ask "rev-list -1 > $full_rev^ -- $path" or something from the Porcelain > layer, but > unfortunately this is rather expensive. Yes, I've seen this too, but saw no advantage to bring it up at the time. > Because blame already almost knows if the commit the final > blame lies on > has a parent, it would be reasonably cheap to add that > "parent or nothing" > information to its --porcelain (and its --incremental) > format if we wanted > to. Yes, I agree. At the moment those "checks" are left to be deduced by the person data-mining with blame. (Which isn't /that/ bad.) Luben ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 12:45 ` Jakub Narebski 2008-06-03 13:00 ` Rafael Garcia-Suarez @ 2008-06-03 14:24 ` Lea Wiemann 2008-06-03 20:24 ` Jakub Narebski 1 sibling, 1 reply; 40+ messages in thread From: Lea Wiemann @ 2008-06-03 14:24 UTC (permalink / raw) To: Jakub Narebski; +Cc: Rafael Garcia-Suarez, git, Luben Tuikov Jakub Narebski wrote: > I was thinking about extending git-blame porcelain format (and also > incremental format, of course) by 'parents' (and perhaps > 'original-parents') header... Regarding prettiness, I don't find parents in the porcelain output particularly useful, but if other people think they need this, I won't object. :) Regarding performance, it would be good to show that the solution I'm suggesting in my separate is slower than extending git-blame before implementing anything. (I doubt it matters performance-wise.) -- Lea ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 14:24 ` Lea Wiemann @ 2008-06-03 20:24 ` Jakub Narebski 2008-06-03 23:11 ` Lea Wiemann 0 siblings, 1 reply; 40+ messages in thread From: Jakub Narebski @ 2008-06-03 20:24 UTC (permalink / raw) To: Lea Wiemann, Rafael Garcia-Suarez; +Cc: git, Luben Tuikov I have joined there two separate threads... probably attaching them in a wrong place. On Tue, 3 June 2008, Lea Wiemann wrote: > Rafael Garcia-Suarez wrote: > > > > Finally, to avoid forking git-rev-parse too many times, cache its > > results in a new hash %parent_commits. > > I'm not too happy with this: > > 1) Minor point: I'm working on caching for the backend right now > (IOW, basically what you're doing, just centralized in a separate > module), so you're essentially duplicating work, and you're making it > (a little) harder for me to refactor gitweb since I have to rip out > your cache code. I don't think %parent_commits hash is suitable for caching; it is only intermediate step, reducing number of git command calls (and forks) from number of blocks of changes in a blame, to number of distinct commits blamed. From this you would put info into appropriate Perl structure. ATTENTION! This example shows where caching [parsed] data have problems compared to front-end caching (caching output). Caching data is (usually) the best solution for pages which are generated from some parsed data _as a whole_, or can be generated from parsed data as a whole, i.e. heads, tags, summary, projects list, shortlog, history, view etc. Problems occur when we try to cache page with _streaming_ output, such as blob view, blame view, diff part of commitdiff etc. Here better solution might be either front-end cache (caching HTML output), or back-end caching (caching output of git commands). > Those few lines won't hurt, but in general I suggest that nobody > make any larger efforts to cache stuff in gitweb for the next few > weeks. Understandable, we want to avoid conflicts. By the way, if we agree that version %parent_commits is too intrusive dusring GSoC 2008, I think it would be good to accept into maint the patch with --revs-only, which fixes real bug, even if it is annoyance level only... > 2) Major point: You're still forking a lot. The Right Thing is to > condense everything into a single call -- I believe "git-rev-list > --parents --no-walk hash hash hash..." is correct and easily > parsable. Its output seems to be lines of > hash parent_1 parent_2 ... parent_n > with n >= 0. Can you implement that? It would be very useful and > also reusable for me! This is not a good solution for 'blame' view, which is generated "on the fly", by streaming git-blame output via filter. Above solution goes counter to code flow flow: gitweb would have to somehow get list of all blamed commits. (See also note above about caching "stream-generated" pages). Modifying git-blame --porcelain (and --incremental) output has the advantage of simple code on gitweb side, retaining "streamed" page generation. It would be one fork less, but I guess that is negligible. It would be useful for other blame viewers such as "git gui blame" to do similar data mining fast. Other consumers of git-blame output should be (if written correctly) not affected by additional header which they don't understand. The disadvantage would be for gitweb to require version of git binary which has this feature... Of course implementing get_parents($hash[, $hash...]) in either gitweb, or Git.pm, using "git-rev-list --parents --no-walk <args>" could still be useful. ---------------------------------------------------------------------- On Tue, 3 June 2008, Lea Wiemann wrote: > Jakub Narebski wrote: > > I was thinking about extending git-blame porcelain format (and also > > incremental format, of course) by 'parents' (and perhaps > > 'original-parents') header... > > Regarding prettiness, I don't find parents in the porcelain output > particularly useful, but if other people think they need this, I won't > object. :) The change in gitweb was introduced by commit 244a70e (Blame "linenr" link jumps to previous state at "orig_lineno"), by Luben Tuikov; please read commit message for explanation how it could be used for data mining / browsing annotated history of a file. As I wrote above, this solution allows to have very simple, streaming code in gitweb dealing with "line before change" links. The porcelain (and incremental) format of git-blame was created in such way to allow easy extending it; true and rewritten parents would help in navigating annotated file view in history. It should not, I think, affect your efforts; it is not you that proposed to write such extension. > Regarding performance, it would be good to show that the solution I'm > suggesting in my separate is slower than extending git-blame before > implementing anything. (I doubt it matters performance-wise.) It is one fork more. And as I wrote above, you need list of all blamed commits upfront, which goes counter to currently used "streaming output" code flow; it would complicate code, and thus reduce performance. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 20:24 ` Jakub Narebski @ 2008-06-03 23:11 ` Lea Wiemann 2008-06-04 0:11 ` Jakub Narebski 2008-06-08 18:19 ` Lea Wiemann 0 siblings, 2 replies; 40+ messages in thread From: Lea Wiemann @ 2008-06-03 23:11 UTC (permalink / raw) To: Jakub Narebski; +Cc: Rafael Garcia-Suarez, git, Luben Tuikov Jakub Narebski wrote: > I don't think %parent_commits hash is suitable for caching; it is only > intermediate step, reducing number of git command calls (and forks) [...] > > ATTENTION! This example shows where caching [parsed] data have problems > compared to front-end caching (caching output). ATTENTION! Could we please stop having this discussion?! Your argument is completely bogus. If the parent commit hashes are in cache, it's an almost zero-time cache lookup. The only difference it might make compared front-end caching is the CPU time it takes to generate the page, and *I want to see benchmarks before I even start thinking about CPU*. Okay? Good, thanks. Sorry I'm a little indignant, but you seem to be somehow trying to tell me what to implement, and that gets annoying after a while. I don't mind your input, but at some point the discussion just doesn't go any further. > Problems occur when we try to cache page with _streaming_ output, such > as blob view, blame view, diff part of commitdiff etc. We can still stream backend-cache-backed data, though it's a little harder. It's mostly a memory, not a performance issue though -- the only point where I think it actually would be performance-relevant is blame, and blame doesn't stream anyway (see below). > By the way, if we agree that version %parent_commits is too intrusive > dusring GSoC 2008, Oh, I don't mind, FTR. It's not enough lines to matter. >> 2) Major point: You're still forking a lot. The Right Thing is to >> condense everything into a single call > > This is not a good solution for 'blame' view, which is generated "on the > fly", by streaming git-blame output via filter. No, whether you have your "while <$fd>" loop or not doesn't make a difference. Blame first calculates the whole blame and then dumps it out in zero-time, unless you use --incremental. So there's no performance difference in getting all blame output and then dumping it out vs. reading and outputting it line-by-line. And regarding memory, if your blame output doesn't fit into your RAM, you have different kinds of issues. JFTR, I don't have any opinion about extending the porcelain output of git-blame (apart from the fact that happens to not be useful for gitweb for the reason I outlined in the previous paragraph). -- Lea ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 23:11 ` Lea Wiemann @ 2008-06-04 0:11 ` Jakub Narebski 2008-06-04 0:39 ` Lea Wiemann 2008-06-08 18:19 ` Lea Wiemann 1 sibling, 1 reply; 40+ messages in thread From: Jakub Narebski @ 2008-06-04 0:11 UTC (permalink / raw) To: Lea Wiemann; +Cc: Rafael Garcia-Suarez, git, Luben Tuikov Lea Wiemann wrote: > Jakub Narebski wrote: >> >> I don't think %parent_commits hash is suitable for caching; it is only >> intermediate step, reducing number of git command calls (and forks) [...] >> >> ATTENTION! This example shows where caching [parsed] data have problems >> compared to front-end caching (caching output). > > ATTENTION! Could we please stop having this discussion?! Yeah, yeah, I know. "Talk is cheap, show me the code" (or at least pseudocode). > Your argument > is completely bogus. If the parent commit hashes are in cache, it's an > almost zero-time cache lookup. You have cut a bit too much (quoted a bit too little) for me to decide if I made myself clear wrt. saving %parent_commits hash into cache. What I wanted to say that in caching intermediate data for 'blame' view you have to save to cache something like @blocks (or @lines) array. This array can contain parents of blamed commits, so there is no need for saving %parent_commits separately: it would be duplication of information. This hash is needed to reduce number of calls to git-rev-parse, and is used to generate parsed info, which info in turn (I think) can be cached. > The only difference it might make > compared front-end caching is the CPU time it takes to generate the > page, and *I want to see benchmarks before I even start thinking about > CPU*. Okay? Good, thanks. The only place where I think front-end caching could be better is 'blob' view with syntax highlighting (using some external filter, like GNU Source Highlight)... which is not implemented yet. I thought that snapshots (if enabled) would fall in this category, but this is the case where data cache is almost identical to output cache (the same happens for [almost] all "raw" / *_plain views). > Sorry I'm a little indignant, but you seem to be somehow trying to tell > me what to implement, and that gets annoying after a while. I don't > mind your input, but at some point the discussion just doesn't go any > further. > >> Problems occur when we try to cache page with _streaming_ output, such >> as blob view, blame view, diff part of commitdiff etc. > > We can still stream backend-cache-backed data, though it's a little > harder. It's mostly a memory, not a performance issue though -- the > only point where I think it actually would be performance-relevant is > blame, and blame doesn't stream anyway (see below). And snapshots. We certainly want to stream snapshots, as they can be quite large. Also blob_plain view might be difficult, if there are extremely large binary files in the repository (it should not happen often, but it can happen). [...] >>> 2) Major point: You're still forking a lot. The Right Thing is to >>> condense everything into a single call >> >> This is not a good solution for 'blame' view, which is generated "on the >> fly", by streaming git-blame output via filter. > > No, whether you have your "while <$fd>" loop or not doesn't make a > difference. It perhaps makes no difference performance wise (solution with "git rev-list --parents --no-walk" has one fork more), but it might make code unnecessarily more complicated. In the rev-list solution you have to browse git-blame output to gather all blamed commits one want to find parents of; in the case of extending git-blame you can just process block after block of code. > Blame first calculates the whole blame and then dumps it > out in zero-time, unless you use --incremental. There is some code in the mailing list archive (and perhaps used by repo's gitweb, but I might be mistaken), which adds git_blame_incremental and use AJAX together with "git blame --incremental" to reduce latency. It was done by having JavaScript check if browser is AJAX-capable, and if it was rewriting 'blame' links to 'blame_incremental'. But if there exist cached blame, I think it would be as fast (in terms of latency) to generate 'blame' from cache as to generate 'blame_incremental'. > So there's no > performance difference in getting all blame output and then dumping it > out vs. reading and outputting it line-by-line. Performance wise, perhaps not. Memory wise, perhaps yes; better not to use more memory than needed, especially if memcached is to share machine. > And regarding memory, > if your blame output doesn't fit into your RAM, you have different kinds > of issues. True. > JFTR, I don't have any opinion about extending the porcelain output of > git-blame (apart from the fact that happens to not be useful for gitweb > for the reason I outlined in the previous paragraph). It would be/might be (I haven't examined corner cases yet) important in the case of file history which both contains evil merges, and it's simplified history is different than full history. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-04 0:11 ` Jakub Narebski @ 2008-06-04 0:39 ` Lea Wiemann 2008-06-04 12:31 ` Jakub Narebski 0 siblings, 1 reply; 40+ messages in thread From: Lea Wiemann @ 2008-06-04 0:39 UTC (permalink / raw) To: Jakub Narebski; +Cc: Rafael Garcia-Suarez, git, Luben Tuikov Jakub Narebski wrote: > And snapshots [and blob_plain]. We certainly want to stream snapshots, as > they can be quite large. Yup. I suppose that those need to be cached on disk rather than in memory, so they need a separate cache. > [Parents in blame output:] > It perhaps makes no difference performance wise (solution with > "git rev-list --parents --no-walk" has one fork more), but it might > make code unnecessarily more complicated. A few lines. *shrugs* Probably actually easier than adding stuff to git-blame's output, but I won't argue against the latter if you want it. > use AJAX together with "git blame --incremental" to reduce latency. > It was done by having JavaScript check if browser is AJAX-capable, Unfortunately there is no such check (and I doubt it's doable without cookie or redirect trickery) -- you'll find that the blames on repo.or.cz don't work without JavaScript. -- Lea ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-04 0:39 ` Lea Wiemann @ 2008-06-04 12:31 ` Jakub Narebski 0 siblings, 0 replies; 40+ messages in thread From: Jakub Narebski @ 2008-06-04 12:31 UTC (permalink / raw) To: Lea Wiemann; +Cc: Rafael Garcia-Suarez, git, Luben Tuikov Lea Wiemann wrote: > Jakub Narebski wrote: > > > > And snapshots [and blob_plain]. We certainly want to stream snapshots, as > > they can be quite large. > > Yup. I suppose that those need to be cached on disk rather than in > memory, so they need a separate cache. Or at least (in the first implementation) to avoid caching them in memory-based cache (and serve them uncached). Although I wonder how memory-based caches such as memcached or swifty, and perhaps also mmap based cache (BerkeleyDB based cache is supposedly fast because it fits into memory/caches in memory) deals with overly large cache entries... > > [Parents in blame output:] > > It perhaps makes no difference performance wise (solution with > > "git rev-list --parents --no-walk" has one fork more), but it might > > make code unnecessarily more complicated. > > A few lines. *shrugs* Probably actually easier than adding stuff to > git-blame's output, but I won't argue against the latter if you want it. With modified (enhanced) git-blame output code would look like this (rough pseudocode): while (<$fd>) { ... <parse 'parent' header> ... } while using no-walk rev-list requires list of blamed parents upfront, so the code would have to look like this @blame_data = <$fd>; @commitlist = map { <get sha1> } grep { <header line> } @blame_list; %commit_parents = get_parents(\@commitlist); # calls git-rev-list foreach (@commitlist) { ... ... } Note that you read whole data into gitweb, inclreasing memory usage... which we want to avoid, especially when using memcached or similar caching backend (git-blame itself has to keep data in memory, but no need to duplicate the amount). Besides git-blame output needs to be extended/enhanced anyway for the data mining / annotated file history navigation Luben wanted to be really robust. See my response to Linus email in this thread (to be written). > > use AJAX together with "git blame --incremental" to reduce latency. > > It was done by having JavaScript check if browser is AJAX-capable, > > Unfortunately there is no such check (and I doubt it's doable without > cookie or redirect trickery) -- you'll find that the blames on > repo.or.cz don't work without JavaScript. I have in my git repository original version (well, one of original versions) adding incremental blame output Message-ID: <20070825222404.16967.9402.stgit@rover> http://permalink.gmane.org/gmane.comp.version-control.git/56657 by Petr Baudis, tweaked version of Fredrik Kuivinen patch, and in the commit message there is the floowing info: Compared to the original patch, this one works with pathinfo-ish URLs as well, and should play well with non-javascript browsers as well (the HTML points to the blame action, while javascript code rewrites the links to use the blame_incremental action; it is somewhat hackish but I couldn't think of a better solution). Instead of rewriting links gitweb's JavaScript could use JavaScript redirect trickery, using JavaScript (by setting location.href for example) to redirect to blame_incremental action from blame action. As to checking if browser is AJAX capable: you can at least check if all methods needed are available. P.S. You would probably want to remove old git-annotate based git_blame (dead code, currently not used by any action), and rename git_blame2 to git_blame. A bit less code to check for caching problems etc,... -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 23:11 ` Lea Wiemann 2008-06-04 0:11 ` Jakub Narebski @ 2008-06-08 18:19 ` Lea Wiemann 2008-06-08 20:28 ` Jakub Narebski 1 sibling, 1 reply; 40+ messages in thread From: Lea Wiemann @ 2008-06-08 18:19 UTC (permalink / raw) To: Lea Wiemann; +Cc: Jakub Narebski, Rafael Garcia-Suarez, git, Luben Tuikov Lea Wiemann wrote: > Blame first calculates the whole blame and then dumps it out in > zero-time [so] there's no performance difference in getting all blame > output and then dumping it out vs. reading and outputting it line-by-line. I haven't been following the recent discussion in detail, but here's another thought: If you want to look up the parents, it's usually faster (at least when caching is enabled) to get them all in a single call. IOW, don't look up the parent for each hash as it appears, but collect all hashes and then get a list of all parents with a single call. -- Lea ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-08 18:19 ` Lea Wiemann @ 2008-06-08 20:28 ` Jakub Narebski 0 siblings, 0 replies; 40+ messages in thread From: Jakub Narebski @ 2008-06-08 20:28 UTC (permalink / raw) To: Lea Wiemann; +Cc: Rafael Garcia-Suarez, git, Luben Tuikov On Sun, 8 Jun 2008, Lea Wiemann wrote: > On Wed, 04 Jun 2008, Lea Wiemann wrote: >> >> Blame first calculates the whole blame and then dumps it out in >> zero-time [so] there's no performance difference in getting all blame >> output and then dumping it out vs. reading and outputting it line-by-line. > > I haven't been following the recent discussion in detail, but here's > another thought: If you want to look up the parents, it's usually faster > (at least when caching is enabled) to get them all in a single call. > IOW, don't look up the parent for each hash as it appears, but collect > all hashes and then get a list of all parents with a single call. If caching is enabled, then parent info can be retrieved from cache. If caching is disabled, or cache expired (cache miss) you would have to get whole blame output to get all revisions to get parents for. This means for a short while twice amount of memory (whole blame in git-blame, because thats how non-incremental blame works, and whole blame in gitweb, till reading last byte of blame when git-blame ends); and that is not good when memory-based cache (be it memcache, mmap, or other solution) is on the same machine (sometimes you just don't have a farm of servers...). Junio's patches adding "previous" header to git blame result in no worse output (result) than current code. I have proposed improvements, but I'm not sure they can be implemented cheaply (fairly sure that they cannot, and I'm not sure if improvements are worth the cost). I'd like to know what happens in Junio code when evil merge is blamed; I don't know code enough (and I am a bit lazy here) to get this from code itself. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 11:43 ` Jakub Narebski 2008-06-03 12:03 ` Rafael Garcia-Suarez @ 2008-06-03 20:18 ` Luben Tuikov 2008-06-03 20:29 ` Jakub Narebski 1 sibling, 1 reply; 40+ messages in thread From: Luben Tuikov @ 2008-06-03 20:18 UTC (permalink / raw) To: Rafael Garcia-Suarez, Jakub Narebski; +Cc: git --- On Tue, 6/3/08, Jakub Narebski <jnareb@gmail.com> wrote: > From: Jakub Narebski <jnareb@gmail.com> > Subject: Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame > To: "Rafael Garcia-Suarez" <rgarciasuarez@gmail.com> > Cc: git@vger.kernel.org, "Luben Tuikov" <ltuikov@yahoo.com> > Date: Tuesday, June 3, 2008, 4:43 AM > Cc-ed Luben Tuikov, author of this part. Thanks guys! :-) > Rafael Garcia-Suarez <rgarciasuarez@gmail.com> > writes: > > > git-rev-parse will abort with an error when passed a > non-existent > > revision spec, such as "deadbeef^" where > deadbeef has no parent. Yes, I've known about this ever since I coded this. The reasoning was that the value of parsing up the "tree" of parent changes was a lot more (valuable) than the value of detecting that "deadbeef^" had no parent -- which would've been logically apparent to the coder/reviewer/user of "blame2". > > Using the --revs-only parameter makes this error go > away, while > > retaining functionality, keeping the web server error > log nice > > and clean. Ok, that's fine, as long as indeed the functionality is preserved. I leave it up to Jakub and Junio to make sure that indeed the functionality is preserved. (I.e. saving us yet another patch from anyone of us.) > Thanks. This error wasn't detected earlier probably > because > 'blame' view is rarely enabled; and repo.or.cz > gitweb which has > 'blame' enabled IIRC use gitweb which is modified > there, allowing > incremental blame using AJAX. I'm a heavy user of "blame2", often exploring the course of "evolution" of the code and/or code lines and segments. I value this in gitweb (being easier/more visual to use as opposed to command line). > > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > > index 55fb100..f3b4b24 100755 > > --- a/gitweb/gitweb.perl > > +++ b/gitweb/gitweb.perl > > @@ -4226,9 +4226,9 @@ git_blame2 > > esc_html($rev)); > > print "</td>\n"; > > } > > - open (my $dd, "-|", git_cmd(), > "rev-parse", "$full_rev^") > > + open (my $dd, "-|", git_cmd(), > "rev-parse", '--revs-only', > "$full_rev^") > > or die_error(undef, "Open git-rev-parse > failed"); > > - my $parent_commit = <$dd>; > > + my $parent_commit = <$dd> || ''; > > close $dd; > > chomp($parent_commit); > > my $blamed = href(action => 'blame', > > I'd rather remove this, correct it, or make it optional > (this is very > fork-heavy). > > But this patch is good as it is now... Yes, I agree it is good. Just you guys make sure that it doesn't change the value of the functionality of the code. Thanks everyone! Luben ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 20:18 ` Luben Tuikov @ 2008-06-03 20:29 ` Jakub Narebski 2008-06-03 21:27 ` Luben Tuikov 0 siblings, 1 reply; 40+ messages in thread From: Jakub Narebski @ 2008-06-03 20:29 UTC (permalink / raw) To: Rafael Garcia-Suarez; +Cc: Luben Tuikov, git Luben Tuikov wrote: > Jakub Narebski wrote: > > Rafael Garcia-Suarez wrote > > > + my $parent_commit = <$dd> || ''; By the way, here you would probably want + my $parent_commit = <$dd> || '--root'; (if it works). -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 20:29 ` Jakub Narebski @ 2008-06-03 21:27 ` Luben Tuikov 2008-06-03 21:34 ` Jakub Narebski 0 siblings, 1 reply; 40+ messages in thread From: Luben Tuikov @ 2008-06-03 21:27 UTC (permalink / raw) To: Rafael Garcia-Suarez, Jakub Narebski; +Cc: git --- On Tue, 6/3/08, Jakub Narebski <jnareb@gmail.com> wrote: > > > > + my $parent_commit = > <$dd> || ''; > > By the way, here you would probably want > > + my $parent_commit = <$dd> || > '--root'; > > (if it works). AFAIR, I did experiment with the "--root" parameter. Not sure why, but my command line ended up without it, and thus git_blame2() ended up without it. If you feel that it is, at this time having had changes commited since my original commit, more correct to include it, then please go ahead. Thanks! Luben ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame 2008-06-03 21:27 ` Luben Tuikov @ 2008-06-03 21:34 ` Jakub Narebski 0 siblings, 0 replies; 40+ messages in thread From: Jakub Narebski @ 2008-06-03 21:34 UTC (permalink / raw) To: ltuikov; +Cc: Rafael Garcia-Suarez, git Dnia wtorek 3. czerwca 2008 23:27, Luben Tuikov napisał: > --- On Tue, 6/3/08, Jakub Narebski <jnareb@gmail.com> wrote: > > > > > + my $parent_commit = > > <$dd> || ''; > > > > By the way, here you would probably want > > > > + my $parent_commit = <$dd> || > > '--root'; > > > > (if it works). > > AFAIR, I did experiment with the "--root" parameter. Not sure why, > but my command line ended up without it, and thus git_blame2() > ended up without it. > > If you feel that it is, at this time having had changes commited since my > original commit, more correct to include it, then please go ahead. I'm sorry, I havent thought this through. This _cannot_ work. You can use '--root' as 'hp' parameter to force creation commitdiff, but not as 'hp' parameter to blame. Sorry for the confusion. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2008-06-08 20:30 UTC | newest] Thread overview: 40+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-06-03 10:46 [PATCH] Avoid errors from git-rev-parse in gitweb blame Rafael Garcia-Suarez 2008-06-03 11:42 ` Lea Wiemann 2008-06-03 11:43 ` Jakub Narebski 2008-06-03 12:03 ` Rafael Garcia-Suarez 2008-06-03 12:45 ` Jakub Narebski 2008-06-03 13:00 ` Rafael Garcia-Suarez 2008-06-03 13:12 ` Jakub Narebski 2008-06-03 13:36 ` Rafael Garcia-Suarez 2008-06-03 14:14 ` Jakub Narebski 2008-06-03 14:40 ` Rafael Garcia-Suarez 2008-06-03 14:56 ` Jakub Narebski 2008-06-03 15:07 ` Rafael Garcia-Suarez 2008-06-03 17:50 ` Jakub Narebski 2008-06-03 21:09 ` Luben Tuikov 2008-06-03 21:03 ` Luben Tuikov 2008-06-03 20:35 ` Luben Tuikov 2008-06-03 21:31 ` Jakub Narebski 2008-06-04 5:58 ` Junio C Hamano 2008-06-04 14:03 ` Jakub Narebski 2008-06-05 6:07 ` Junio C Hamano 2008-06-05 6:09 ` [PATCH 1/2] git-blame: refactor code to emit "porcelain format" output Junio C Hamano 2008-06-06 9:22 ` Jakub Narebski 2008-06-05 6:09 ` [PATCH 2/2] blame: show "previous" information in --porcelain/--incremental format Junio C Hamano 2008-06-06 9:27 ` Jakub Narebski 2008-06-06 15:17 ` Junio C Hamano 2008-06-06 15:44 ` Jakub Narebski 2008-06-06 0:26 ` [PATCH] Avoid errors from git-rev-parse in gitweb blame Jakub Narebski 2008-06-04 22:24 ` Luben Tuikov 2008-06-03 14:24 ` Lea Wiemann 2008-06-03 20:24 ` Jakub Narebski 2008-06-03 23:11 ` Lea Wiemann 2008-06-04 0:11 ` Jakub Narebski 2008-06-04 0:39 ` Lea Wiemann 2008-06-04 12:31 ` Jakub Narebski 2008-06-08 18:19 ` Lea Wiemann 2008-06-08 20:28 ` Jakub Narebski 2008-06-03 20:18 ` Luben Tuikov 2008-06-03 20:29 ` Jakub Narebski 2008-06-03 21:27 ` Luben Tuikov 2008-06-03 21:34 ` Jakub Narebski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).