git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fix segfault with git log -c --follow
@ 2013-05-27 22:49 Clemens Buchacher
  2013-05-28 17:22 ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Clemens Buchacher @ 2013-05-27 22:49 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

In diff_tree_combined we make a copy of diffopts. In
try_to_follow_renames, called via diff_tree_sha1, we free and
re-initialize diffopts->pathspec->items. Since we did not make a deep
copy of diffopts in diff_tree_combined, the original diffopts does not
get the update. By the time we return from diff_tree_combined,
rev->diffopt->pathspec->items points to an invalid memory address. We
get a segfault next time we try to access that pathspec.

Instead, along with the copy of diffopts, make a copy pathspec->items as
well.

We would also have to make a copy of pathspec->raw to keep it consistent
with pathspec->items, but nobody seems to rely on that.

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
---

I wonder why I get a segfault from this so reliably, since it's not
actually a null-pointer dereference. Maybe this is gcc 4.8 doing
something different from previous versions?

Also, I have absolutely no confidence in my understanding of this code.
This is the first solution that came to mind, and could be totally
wrong. I just figured a patch is better than no patch.

Clemens

 combine-diff.c |  3 +++
 t/t4202-log.sh | 14 ++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/combine-diff.c b/combine-diff.c
index 77d7872..8825604 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1302,6 +1302,7 @@ void diff_tree_combined(const unsigned char *sha1,
 	int i, num_paths, needsep, show_log_first, num_parent = parents->nr;
 
 	diffopts = *opt;
+	diff_tree_setup_paths(diffopts.pathspec.raw, &diffopts);
 	diffopts.output_format = DIFF_FORMAT_NO_OUTPUT;
 	DIFF_OPT_SET(&diffopts, RECURSIVE);
 	DIFF_OPT_CLR(&diffopts, ALLOW_EXTERNAL);
@@ -1372,6 +1373,8 @@ void diff_tree_combined(const unsigned char *sha1,
 		paths = paths->next;
 		free(tmp);
 	}
+
+	diff_tree_release_paths(&diffopts);
 }
 
 void diff_tree_combined_merge(const struct commit *commit, int dense,
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 9243a97..cb03d28 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -530,6 +530,20 @@ test_expect_success 'show added path under "--follow -M"' '
 	)
 '
 
+test_expect_success 'git log -c --follow' '
+	test_create_repo follow-c &&
+	(
+		cd follow-c &&
+		test_commit initial file original &&
+		git rm file &&
+		test_commit rename file2 original &&
+		git reset --hard initial &&
+		test_commit modify file foo &&
+		git merge -m merge rename &&
+		git log -c --follow file2
+	)
+'
+
 cat >expect <<\EOF
 *   commit COMMIT_OBJECT_NAME
 |\  Merge: MERGE_PARENTS
-- 
1.8.2.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] fix segfault with git log -c --follow
  2013-05-27 22:49 [PATCH] fix segfault with git log -c --follow Clemens Buchacher
@ 2013-05-28 17:22 ` Junio C Hamano
  2013-05-28 22:54   ` Clemens Buchacher
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2013-05-28 17:22 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: git

Clemens Buchacher <drizzd@aon.at> writes:

> In diff_tree_combined we make a copy of diffopts. In
> try_to_follow_renames, called via diff_tree_sha1, we free and
> re-initialize diffopts->pathspec->items. Since we did not make a deep
> copy of diffopts in diff_tree_combined, the original diffopts does not
> get the update. By the time we return from diff_tree_combined,
> rev->diffopt->pathspec->items points to an invalid memory address. We
> get a segfault next time we try to access that pathspec.

I am not quite sure if I follow.  Do you mean

	diff_tree_combined()
        - makes a shallow copy of rev->diffopt
        - calls diff_tree_sha1()
          diff_tree_sha1()
          - tries to follow rename and clobbers diffopt
        - tries to use the shallow copy of original rev->diffopt
          that no longer is valid, which is a problem

I wonder, just like we force recursive and disable external on the
copy before we use it to call diff_tree_sha1(), if we should disable
follow-renames on it.  "--follow" is an option that is given to the
history traversal part and it should not play any role in getting
the pairwise diff with all parents diff_tree_combined() does.

Besides,

 - "--follow" hack lets us keep track of only one path; and

 - "-c" and "--cc" make sense only when dealing with a merge commit
   and the path in the child may have come from different path in
   parents,

so I am not sure if allowing combination of "--follow -c/--cc" makes
much sense in the first place.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] fix segfault with git log -c --follow
  2013-05-28 17:22 ` Junio C Hamano
@ 2013-05-28 22:54   ` Clemens Buchacher
  2013-05-28 23:24     ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Clemens Buchacher @ 2013-05-28 22:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, May 28, 2013 at 10:22:17AM -0700, Junio C Hamano wrote:
> Clemens Buchacher <drizzd@aon.at> writes:
> 
> > In diff_tree_combined we make a copy of diffopts. In
> > try_to_follow_renames, called via diff_tree_sha1, we free and
> > re-initialize diffopts->pathspec->items. Since we did not make a deep
> > copy of diffopts in diff_tree_combined, the original diffopts does not
> > get the update. By the time we return from diff_tree_combined,
> > rev->diffopt->pathspec->items points to an invalid memory address. We
> > get a segfault next time we try to access that pathspec.
> 
> I am not quite sure if I follow.  Do you mean
> 
> 	diff_tree_combined()
>         - makes a shallow copy of rev->diffopt
>         - calls diff_tree_sha1()
>           diff_tree_sha1()
>           - tries to follow rename and clobbers diffopt

Right.

>         - tries to use the shallow copy of original rev->diffopt
>           that no longer is valid, which is a problem

diff_tree_combined does not try to use it right away. It does return,
but rev->diffopt is now invalid and the next time we do any kind of diff
with it, we have a problem.

> I wonder, just like we force recursive and disable external on the
> copy before we use it to call diff_tree_sha1(), if we should disable
> follow-renames on it.  "--follow" is an option that is given to the
> history traversal part and it should not play any role in getting the
> pairwise diff with all parents diff_tree_combined() does.

Can't parse that last sentence.

In any case, I don't think disabling diff_tree_sha1 is a solution. The
bug is in diff_tree_sha1 and its subfunctions, because they manipulate a
data structures such that it becomes corrupt. And they do so in an
obfuscated and clearly unintentional manner. So we should not blame the
user for calling diff_tree_sha1 in such a way that it causes corruption.

> Besides,
> 
>  - "--follow" hack lets us keep track of only one path; and

Ok. Good to know it is considered a hack. The code is quite strange
indeed.

>  - "-c" and "--cc" make sense only when dealing with a merge commit
>    and the path in the child may have come from different path in
>    parents,

Sorry, I don't get it.

> so I am not sure if allowing combination of "--follow -c/--cc" makes
> much sense in the first place.

My use-case is came up with this history:

1. Code gets added to file A.
2. File A gets renamed to B in a different branch.
3. The branches get merged, and code from (1) is removed in the merge.

Later I wonder why code from (1) is gone from B even though I felt
certain it had been added before. I also remember that B was renamed at
some point. So I do git log -p --follow B, and it nicely shows that diff
where the code was added, but no diff where the code is removed.

The reason is of course, that the code was removed in the merge and that
diff is not shown. And -c is usually what I do to enable showing diffs
in merge commits.

And if the pairwise diff can also deal with file renames, I think it
absolutely does make sense to show also a three-way diff.

I can't tell far away the code is from supporting anything like that.

Cheers,
Clemens

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] fix segfault with git log -c --follow
  2013-05-28 22:54   ` Clemens Buchacher
@ 2013-05-28 23:24     ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2013-05-28 23:24 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: git

Clemens Buchacher <drizzd@aon.at> writes:

>> I wonder, just like we force recursive and disable external on the
>> copy before we use it to call diff_tree_sha1(), if we should disable
>> follow-renames on it.  "--follow" is an option that is given to the
>> history traversal part and it should not play any role in getting the
>> pairwise diff with all parents diff_tree_combined() does.
>
> Can't parse that last sentence.
>
> In any case, I don't think disabling diff_tree_sha1 is a solution. The
> bug is in diff_tree_sha1 and its subfunctions, because they manipulate a
> data structures such that it becomes corrupt. And they do so in an
> obfuscated and clearly unintentional manner. So we should not blame the
> user for calling diff_tree_sha1 in such a way that it causes corruption.
>
>> Besides,
>> 
>>  - "--follow" hack lets us keep track of only one path; and
>
> Ok. Good to know it is considered a hack. The code is quite strange
> indeed.

The problem with --follow is that it only tracks one path globally.
In a history like this, suppose that a path X long time ago was
renamed to Y at commit B:

    ---o---A---B---C---o HEAD

and you start digging with "log --follow -c HEAD -- Y".  When
looking at C, because it and its parent B both have path Y, the
try-to-follow hack does not kick in, and when trying to show C, we
will show the change in Y (because that is the pathspec).

Then we look at B.  Because B has path we are following, i.e. Y, and
its parent A does not, try-to-follow hack kicks in, and it mangles
the pathspec that is used globally for history traversal to X while
showing the difference between A's X and B's Y.  Then we dig further
to find A; at this point the global pathspec is swapped and now it
is X.

That makes --follow a working hack for a simplest single strand of
pearls.  But if you have a mergy history, e.g.

    ---o---A---------------B---C---o HEAD
            \                 /
             D---E---F---G---H

it can break in interesting ways.  We are likely to have looked at H
before looking at B and used pathspec Y while inspecting H, but
after looking at B, the global pathspec is swapped to X, and then we
try to look at G, F, E and D, none of which may have renamed the
original X, so you would likely miss the change to the path Y you
wanted to follow.

To fix this, we would need to keep "what path are we following" not
in the global revs->pathspec, but per the traversal paths that are
currently active (e.g. when we look at C and H, it is Y, when we
look at B, it is X, when we look at G, that is inherited from H and
still Y, not affected by the rename at B.  And then when we look at
A (we need topo-order traversal to do this), it needs to notice that
one child (i.e. B) has been following X while the other (i.e. D) Y,
and merge the "I've been following this path" information in a
sensible way (e.g. look at its own tree and see what is available,
in this case X).

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-05-28 23:25 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-27 22:49 [PATCH] fix segfault with git log -c --follow Clemens Buchacher
2013-05-28 17:22 ` Junio C Hamano
2013-05-28 22:54   ` Clemens Buchacher
2013-05-28 23:24     ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).