git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] gitweb: Fix and simplify "split patch" detection
@ 2007-09-02 20:18 Jakub Narebski
  2007-09-03  0:15 ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Jakub Narebski @ 2007-09-02 20:18 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Petr Baudis, Junio C Hamano

There are some cases when one line from "raw" git-diff output (raw format)
corresponds to more than one patch in the patchset git-diff output; we
call this situation "split patch". Current code misdetected subsequent
patches (for different files) with the same pre-image and post-image
as fragments of "split patch", leading to mislabeled from-file/to-file
diff header etc.


We could parse from_file and to_file (if to_file exists) from the "git
diff header" in the patch, i.e. from the "^diff" line in patchset, and
consider patch "split" or "continued" if not only pre-image and
post-image (from_id and to_id) matches, but only when also from_file
and to_file (if it exists) matches.  This has the advantage of being
generic, not depending on details of git diff output formatting, and
how patch output relates to raw diff output, but it adds yet another
complication.

Note: both from_name and to_name can be quoted and contain spaces;
their separation is nontrivial.


Alternate solution, which we did chose, is to check when git splits
patches, and do not check if parsed info from current patch corresponds
to current or next raw diff format output line.  Git splits patches
only for 'T' (typechange) status filepair, and there always two patches
corresponding to one raw diff line.

And we can get rid of buffering extended diff header and parsing it,
This simplifies git_patchset_body, making it easier to understand and
maintain.  At the other hand it fixes gitweb to git diff output details.


While at it we added 'status_str' to diffinfo output, which stores
status (also for merge commit) as a string.  This allows for easy
checking if there is given status among all for merge commit, e.g.
  $diffinfo->{'status_str'} =~ /T/

There are no users of from_ids_eq subroutine.

Noticed-by: Yann Dirson <ydirson@altern.org>
Diagnosed-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is post 1.5.3 resend of the patch
  Message-Id: <200708290208.08191.jnareb@gmail.com>
posted in the "[BUG] gitweb on repo.or.cz shows buggy commitdiff"
thread.

I'm a tiny little bit ambivalent about this patch: on one hand it
fixes the bug and simplifies git_patchset_body code making it easier
to maintain, on the other hand it ties patch handling code in gitweb
with the details of git-diff patch vs. raw output format.

 gitweb/gitweb.perl |   63 +++++++++++++++++++--------------------------------
 1 files changed, 24 insertions(+), 39 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index b2bae1b..8c1e02c 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1963,7 +1963,7 @@ sub parse_difftree_raw_line {
 		$res{'to_mode'} = $2;
 		$res{'from_id'} = $3;
 		$res{'to_id'} = $4;
-		$res{'status'} = $5;
+		$res{'status'} = $res{'status_str'} = $5;
 		$res{'similarity'} = $6;
 		if ($res{'status'} eq 'R' || $res{'status'} eq 'C') { # renamed or copied
 			($res{'from_file'}, $res{'to_file'}) = map { unquote($_) } split("\t", $7);
@@ -1979,6 +1979,7 @@ sub parse_difftree_raw_line {
 		$res{'to_mode'} = pop @{$res{'from_mode'}};
 		$res{'from_id'} = [ split(' ', $3) ];
 		$res{'to_id'} = pop @{$res{'from_id'}};
+		$res{'status_str'} = $4;
 		$res{'status'} = [ split('', $4) ];
 		$res{'to_file'} = unquote($5);
 	}
@@ -3112,6 +3113,7 @@ sub git_patchset_body {
 	my $patch_line;
 	my $diffinfo;
 	my (%from, %to);
+	my $patches_per_difftree_line = 1;
 
 	print "<div class=\"patchset\">\n";
 
@@ -3124,42 +3126,13 @@ sub git_patchset_body {
 
  PATCH:
 	while ($patch_line) {
-		my @diff_header;
-		my ($from_id, $to_id);
-
-		# git diff header
-		#assert($patch_line =~ m/^diff /) if DEBUG;
-		#assert($patch_line !~ m!$/$!) if DEBUG; # is chomp-ed
-		$patch_number++;
-		push @diff_header, $patch_line;
-
-		# extended diff header
-	EXTENDED_HEADER:
-		while ($patch_line = <$fd>) {
-			chomp $patch_line;
-
-			last EXTENDED_HEADER if ($patch_line =~ m/^--- |^diff /);
-
-			if ($patch_line =~ m/^index ([0-9a-fA-F]{40})..([0-9a-fA-F]{40})/) {
-				$from_id = $1;
-				$to_id   = $2;
-			} elsif ($patch_line =~ m/^index ((?:[0-9a-fA-F]{40},)+[0-9a-fA-F]{40})..([0-9a-fA-F]{40})/) {
-				$from_id = [ split(',', $1) ];
-				$to_id   = $2;
-			}
-
-			push @diff_header, $patch_line;
-		}
-		my $last_patch_line = $patch_line;
 
 		# check if current patch belong to current raw line
 		# and parse raw git-diff line if needed
-		if (defined $diffinfo &&
-		    defined $from_id && defined $to_id &&
-		    from_ids_eq($diffinfo->{'from_id'}, $from_id) &&
-		    $diffinfo->{'to_id'} eq $to_id) {
+		if ($patches_per_difftree_line > 1) {
 			# this is continuation of a split patch
 			print "<div class=\"patch cont\">\n";
+			$patches_per_difftree_line--;
 		} else {
 			# advance raw git-diff output if needed
 			$patch_idx++ if defined $diffinfo;
@@ -3167,7 +3140,7 @@ sub git_patchset_body {
 			# compact combined diff output can have some patches skipped
 			# find which patch (using pathname of result) we are at now
 			my $to_name;
-			if ($diff_header[0] =~ m!^diff --cc "?(.*)"?$!) {
+			if ($patch_line =~ m!^diff --cc "?(.*)"?$!) {
 				$to_name = $1;
 			}
 
@@ -3191,6 +3164,7 @@ sub git_patchset_body {
 				}
 			} until (!defined $to_name || $to_name eq $diffinfo->{'to_file'} ||
 			         $patch_idx > $#$difftree);
+
 			# modifies %from, %to hashes
 			parse_from_to_diffinfo($diffinfo, \%from, \%to, @hash_parents);
 			if ($diffinfo->{'nparents'}) {
@@ -3232,32 +3206,43 @@ sub git_patchset_body {
 			# this is first patch for raw difftree line with $patch_idx index
 			# we index @$difftree array from 0, but number patches from 1
 			print "<div class=\"patch\" id=\"patch". ($patch_idx+1) ."\">\n";
+
+			# typechange diff with 'T' status has two patches per one raw line
+			if ($diffinfo->{'status_str'} =~ /T/) {
+				$patches_per_difftree_line = 2;
+			}
 		}
 
+		# git diff header
+		#assert($patch_line =~ m/^diff /) if DEBUG;
+		#assert($patch_line !~ m!$/$!) if DEBUG; # is chomp-ed
+		$patch_number++;
 		# print "git diff" header
-		$patch_line = shift @diff_header;
 		print format_git_diff_header_line($patch_line, $diffinfo,
 		                                  \%from, \%to);
 
 		# print extended diff header
-		print "<div class=\"diff extended_header\">\n" if (@diff_header > 0);
+		print "<div class=\"diff extended_header\">\n";
 	EXTENDED_HEADER:
-		foreach $patch_line (@diff_header) {
+		while ($patch_line = <$fd>) {
+			chomp $patch_line;
+
+			last EXTENDED_HEADER if ($patch_line =~ m/^--- |^diff /);
+
 			print format_extended_diff_header_line($patch_line, $diffinfo,
 			                                       \%from, \%to);
 		}
-		print "</div>\n"  if (@diff_header > 0); # class="diff extended_header"
+		print "</div>\n"; # class="diff extended_header"
 
 		# from-file/to-file diff header
-		$patch_line = $last_patch_line;
 		if (! $patch_line) {
 			print "</div>\n"; # class="patch"
 			last PATCH;
 		}
 		next PATCH if ($patch_line =~ m/^diff /);
 		#assert($patch_line =~ m/^---/) if DEBUG;
-		#assert($patch_line eq $last_patch_line) if DEBUG;
 
+		my $last_patch_line = $patch_line;
 		$patch_line = <$fd>;
 		chomp $patch_line;
 		#assert($patch_line =~ m/^\+\+\+/) if DEBUG;
-- 
1.5.2.5

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] gitweb: Fix and simplify "split patch" detection
  2007-09-02 20:18 [PATCH] gitweb: Fix and simplify "split patch" detection Jakub Narebski
@ 2007-09-03  0:15 ` Junio C Hamano
  2007-09-03  9:47   ` Jakub Narebski
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2007-09-03  0:15 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Yann Dirson, Petr Baudis

Jakub Narebski <jnareb@gmail.com> writes:

> Alternate solution, which we did chose, is to check when git splits
> patches, and do not check if parsed info from current patch corresponds
> to current or next raw diff format output line.  Git splits patches
> only for 'T' (typechange) status filepair, and there always two patches
> corresponding to one raw diff line.

Not that I can think of a better way offhand than what you
already mentioned, but I have to say that I am not entirely
happy with this implementation.  A really old git showed two
patches (one creation and one deletion) for "complete rewrite",
which was corrected long time ago.  I do not think we will
change the typechange output in a similar way in the future, but
relying on that level of detail feels somewhat ugly.

As you are reading from --patch-with-raw, you already know the
order of patches that will be given to you when you finished
reading the "raw" part.  The patches will come in the same
order.  So it might be possible to keep track of patches to what
path you are expecting and decide if it is part of what you are
processing at the point you process "diff --git" line.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] gitweb: Fix and simplify "split patch" detection
  2007-09-03  0:15 ` Junio C Hamano
@ 2007-09-03  9:47   ` Jakub Narebski
  2007-09-03 11:00     ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Jakub Narebski @ 2007-09-03  9:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Yann Dirson, Petr Baudis

Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
> > Alternate solution, which we did chose, is to check when git splits
> > patches, and do not check if parsed info from current patch corresponds
> > to current or next raw diff format output line.  Git splits patches
> > only for 'T' (typechange) status filepair, and there always two patches
> > corresponding to one raw diff line.
> 
> Not that I can think of a better way offhand than what you
> already mentioned, but I have to say that I am not entirely
> happy with this implementation.

I'm also bit unhappy with tying git_patchset_body code to the
minute details of git-diff output.

>                                    A really old git showed two 
> patches (one creation and one deletion) for "complete rewrite",
> which was corrected long time ago.  I do not think we will
> change the typechange output in a similar way in the future, but
> relying on that level of detail feels somewhat ugly.

Gitweb requires git with --git-dir at least, so I don't think it
(it meaning "complete rewrite" patch being "split patch") is
a problem here.

> As you are reading from --patch-with-raw, you already know the
> order of patches that will be given to you when you finished
> reading the "raw" part.  The patches will come in the same
> order.  So it might be possible to keep track of patches to what
> path you are expecting and decide if it is part of what you are
> processing at the point you process "diff --git" line.

I'll try to come with the second solution.

I wonder if the post-image name is unique in raw format of diff
output, and can be used alone to check if there are two patches
per one raw output format line...

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] gitweb: Fix and simplify "split patch" detection
  2007-09-03  9:47   ` Jakub Narebski
@ 2007-09-03 11:00     ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2007-09-03 11:00 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Yann Dirson, Petr Baudis

Jakub Narebski <jnareb@gmail.com> writes:

> I wonder if the post-image name is unique in raw format of diff
> output, and can be used alone to check if there are two patches
> per one raw output format line...

I think that is a very sane assumption.

You may have a situation that a single preimage is used to
produce more than one postimage, but it does not make sense to
say this postimage is produced by doing this to that preimage
and there is another way to produce the same postimage using
something else.

At least, I do not think the diffcore chain is _capable_ of
coming up with such alternatives even if it wanted to ;-).

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2007-09-03 11:01 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-09-02 20:18 [PATCH] gitweb: Fix and simplify "split patch" detection Jakub Narebski
2007-09-03  0:15 ` Junio C Hamano
2007-09-03  9:47   ` Jakub Narebski
2007-09-03 11:00     ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).