* [RFC] blame: new option to better handle merged cherry-picks
@ 2014-01-02 17:55 Bernhard R. Link
2014-01-02 20:29 ` Junio C Hamano
0 siblings, 1 reply; 4+ messages in thread
From: Bernhard R. Link @ 2014-01-02 17:55 UTC (permalink / raw)
To: git
Allows to disable the git blame optimization of assuming that if there is a
parent of a merge commit that has the exactly same file content, then
only this parent is to be looked at.
This optimization, while being faster in the usual case, means that in
the case of cherry-picks the blamed commit depends on which other commits
touched a file.
If for example one commit A modified both files b and c. And there are
commits B and C, B only modifies file b and C only modifies file c
(so that no conflicts happen), and assume A is cherry-picked as A'
and the two branches then merged:
--o-----B---A
\ \
---C---A'--M---
Then without this new option git blame blames the A|A' changes of
file b to A while blaming the changes of c to A'.
With the new option --no-parent-shortcut it blames both changes to the
same commit.
Signed-off-by: Bernhard R. Link <brlink@debian.org>
---
Documentation/blame-options.txt | 6 ++++++
builtin/blame.c | 5 ++++-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/Documentation/blame-options.txt b/Documentation/blame-options.txt
index 0cebc4f..55dd12b 100644
--- a/Documentation/blame-options.txt
+++ b/Documentation/blame-options.txt
@@ -48,6 +48,12 @@ include::line-range-format.txt[]
Show the result incrementally in a format designed for
machine consumption.
+--no-parent-shortcut::
+ Always look at all parents of a merge and do not shortcut
+ to the first parent with no changes to the file looked at.
+ This takes more time but produces more reliable results
+ if branches with cherry-picked commits were merged.
+
--encoding=<encoding>::
Specifies the encoding used to output author names
and commit summaries. Setting it to `none` makes blame
diff --git a/builtin/blame.c b/builtin/blame.c
index 4916eb2..dab2c36 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -45,6 +45,7 @@ static int incremental;
static int xdl_opts;
static int abbrev = -1;
static int no_whole_file_rename;
+static int no_parent_shortcut;
static enum date_mode blame_date_mode = DATE_ISO8601;
static size_t blame_date_width;
@@ -1248,7 +1249,8 @@ static void pass_blame(struct scoreboard *sb, struct origin *origin, int opt)
porigin = find(sb, p, origin);
if (!porigin)
continue;
- if (!hashcmp(porigin->blob_sha1, origin->blob_sha1)) {
+ if (!no_parent_shortcut &&
+ !hashcmp(porigin->blob_sha1, origin->blob_sha1)) {
pass_whole_blame(sb, origin, porigin);
origin_decref(porigin);
goto finish;
@@ -2247,6 +2249,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
static const char *contents_from = NULL;
static const struct option options[] = {
OPT_BOOL(0, "incremental", &incremental, N_("Show blame entries as we find them, incrementally")),
+ OPT_BOOL(0, "no-parent-shortcut", &no_parent_shortcut, N_("Don't take shortcuts in some merges but handle cherry-picks better")),
OPT_BOOL('b', NULL, &blank_boundary, N_("Show blank SHA-1 for boundary commits (Default: off)")),
OPT_BOOL(0, "root", &show_root, N_("Do not treat root commits as boundaries (Default: off)")),
OPT_BOOL(0, "show-stats", &show_stats, N_("Show work cost statistics")),
--
1.8.5.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RFC] blame: new option to better handle merged cherry-picks
2014-01-02 17:55 [RFC] blame: new option to better handle merged cherry-picks Bernhard R. Link
@ 2014-01-02 20:29 ` Junio C Hamano
2014-01-02 21:15 ` Bernhard R. Link
0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2014-01-02 20:29 UTC (permalink / raw)
To: Bernhard R. Link; +Cc: git
"Bernhard R. Link" <brl+git@mail.brlink.eu> writes:
> Allows to disable the git blame optimization of assuming that if there is a
> parent of a merge commit that has the exactly same file content, then
> only this parent is to be looked at.
I think this is what we usually call --full-history in "git log"
family, but more importantly, I do not think this is solving a valid
problem.
> This optimization, while being faster in the usual case, means that in
> the case of cherry-picks the blamed commit depends on which other commits
> touched a file.
>
> If for example one commit A modified both files b and c. And there are
> commits B and C, B only modifies file b and C only modifies file c
> (so that no conflicts happen), and assume A is cherry-picked as A'
> and the two branches then merged:
>
> --o-----B---A
> \ \
> ---C---A'--M---
So the contents of b at M is as the same as in A, so following 'b'
will see A and B changed that path, which is correct.
The contents of c at M is? It is different from A because at A c
lacks the change made to it at C. The merged result at M would
match C in A', no? So following 'c' will see A' and C changed that
path, no?
So what is wrong about it? If the original history were like this
instead, and A' were a cherry-pick of A, then what should happen?
> --o-----B---A'
> \ \
> ---C---A---M---
Don't we want to see c blamed the same way?
Also, when handling a merge, we have to handle parents sequencially,
checking the difference between M with its first parent first, and
then passing blame for the remaining common lines to the remaining
parents. If you flip the order of parents of M when you merge A and
A' in your original history, and with your patch, what would you
see when you blame c? Wouldn't it notice that M:c is identical to c
in its first parent (now A') and pass the whole blame to A' anyway
with or without your change?
> Then without this new option git blame blames the A|A' changes of
> file b to A while blaming the changes of c to A'.
> With the new option --no-parent-shortcut it blames both changes to the
> same commit.
>
> Signed-off-by: Bernhard R. Link <brlink@debian.org>
> ---
> Documentation/blame-options.txt | 6 ++++++
> builtin/blame.c | 5 ++++-
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/blame-options.txt b/Documentation/blame-options.txt
> index 0cebc4f..55dd12b 100644
> --- a/Documentation/blame-options.txt
> +++ b/Documentation/blame-options.txt
> @@ -48,6 +48,12 @@ include::line-range-format.txt[]
> Show the result incrementally in a format designed for
> machine consumption.
>
> +--no-parent-shortcut::
> + Always look at all parents of a merge and do not shortcut
> + to the first parent with no changes to the file looked at.
> + This takes more time but produces more reliable results
> + if branches with cherry-picked commits were merged.
> +
> --encoding=<encoding>::
> Specifies the encoding used to output author names
> and commit summaries. Setting it to `none` makes blame
> diff --git a/builtin/blame.c b/builtin/blame.c
> index 4916eb2..dab2c36 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -45,6 +45,7 @@ static int incremental;
> static int xdl_opts;
> static int abbrev = -1;
> static int no_whole_file_rename;
> +static int no_parent_shortcut;
>
> static enum date_mode blame_date_mode = DATE_ISO8601;
> static size_t blame_date_width;
> @@ -1248,7 +1249,8 @@ static void pass_blame(struct scoreboard *sb, struct origin *origin, int opt)
> porigin = find(sb, p, origin);
> if (!porigin)
> continue;
> - if (!hashcmp(porigin->blob_sha1, origin->blob_sha1)) {
> + if (!no_parent_shortcut &&
> + !hashcmp(porigin->blob_sha1, origin->blob_sha1)) {
> pass_whole_blame(sb, origin, porigin);
> origin_decref(porigin);
> goto finish;
> @@ -2247,6 +2249,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
> static const char *contents_from = NULL;
> static const struct option options[] = {
> OPT_BOOL(0, "incremental", &incremental, N_("Show blame entries as we find them, incrementally")),
> + OPT_BOOL(0, "no-parent-shortcut", &no_parent_shortcut, N_("Don't take shortcuts in some merges but handle cherry-picks better")),
> OPT_BOOL('b', NULL, &blank_boundary, N_("Show blank SHA-1 for boundary commits (Default: off)")),
> OPT_BOOL(0, "root", &show_root, N_("Do not treat root commits as boundaries (Default: off)")),
> OPT_BOOL(0, "show-stats", &show_stats, N_("Show work cost statistics")),
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC] blame: new option to better handle merged cherry-picks
2014-01-02 20:29 ` Junio C Hamano
@ 2014-01-02 21:15 ` Bernhard R. Link
2014-01-02 21:48 ` Junio C Hamano
0 siblings, 1 reply; 4+ messages in thread
From: Bernhard R. Link @ 2014-01-02 21:15 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
* Junio C Hamano <gitster@pobox.com> [140102 21:29]:
> > This optimization, while being faster in the usual case, means that in
> > the case of cherry-picks the blamed commit depends on which other commits
> > touched a file.
> >
> > If for example one commit A modified both files b and c. And there are
> > commits B and C, B only modifies file b and C only modifies file c
> > (so that no conflicts happen), and assume A is cherry-picked as A'
> > and the two branches then merged:
> >
> > --o-----B---A
> > \ \
> > ---C---A'--M---
>
> So the contents of b at M is as the same as in A, so following 'b'
> will see A and B changed that path, which is correct.
>
> The contents of c at M is? It is different from A because at A c
> lacks the change made to it at C. The merged result at M would
> match C in A', no? So following 'c' will see A' and C changed that
> path, no?
>
> So what is wrong about it?
It's not wrong (that's why I do not suggest to change the default
behaviour), but it's inconsistent and can be a bit confusing to
have either the one or the other commit blamed depending on whether
some file was touched or not.
The history I'm a bit more concerned is something like (with ...
being unrelated commits not touching B or C):
--o-----...---A--...---B---...--
\ \
---...---A'--...---C---...---M---
Here having B or C touching b or c determines which of A or A' is
blamed for which part of the patch.
It's even enough to have:
--...---A'--...---B---...--
/ \
---o---...---A--................---M---
To have the A/A' changes of c to be attributed to A while the b changes
are attributed to A'. I.e. you have a master branch that has commit A,
which is also cherry-picked to some previously forked side-branch.
Once that side-branch is merged back, parts of the change are attributed
to A' if they are in a file that is not touched otherwise in the main
branch.
> Also, when handling a merge, we have to handle parents sequencially,
> checking the difference between M with its first parent first, and
> then passing blame for the remaining common lines to the remaining
> parents. If you flip the order of parents of M when you merge A and
> A' in your original history, and with your patch, what would you
> see when you blame c? Wouldn't it notice that M:c is identical to c
> in its first parent (now A') and pass the whole blame to A' anyway
> with or without your change?
When giving git-blame the new option introduced with my patch, only
the order of parents determines which commit is blamed. Without
the option (i.e. the currently only possible behaviour) which commit
is blamed depends what else touches other parts of the file.
If both branches make modifications to the file (or if there is
any merge conflict resolution in the merge) then the bahaviour with
or without the option are the same.
But in the example with one commit B touching also b and one commit C
touching also c, there is (without the new option) always one part
of the cherry-picked commit is blamed on the original and one on the
cherry-picked, no matter how you order the parents.
(While by having your mainline always the most leftward parent, with
the the new option you always get those commit blamed that is the
"first one this was introduced to mainline".)
Bernhard R. Link
--
F8AC 04D5 0B9B 064B 3383 C3DA AFFC 96D1 151D FFDC
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC] blame: new option to better handle merged cherry-picks
2014-01-02 21:15 ` Bernhard R. Link
@ 2014-01-02 21:48 ` Junio C Hamano
0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2014-01-02 21:48 UTC (permalink / raw)
To: Bernhard R. Link; +Cc: git
"Bernhard R. Link" <brl+git@mail.brlink.eu> writes:
> When giving git-blame the new option introduced with my patch, only
> the order of parents determines which commit is blamed. Without
> the option (i.e. the currently only possible behaviour) which commit
> is blamed depends what else touches other parts of the file.
I am trying to figure out why that difference matters, in other
words, when using the new option is actually useful. You give the
command a scenario that can be solved in two equally valid ways
(blaming to either A or A' is equally valid), and sometimes the
command gives the identical result with or without the new option,
and some other times the user gets a different but an equally valid
result (but after traversing more history spending more cycles). I
am not sure what problem the new option solves. I am trying to come
up with an easy-to-understand explanation to the end users: "If you
want to see blame's result with the property X, use this option---it
may have to spend extra cycles, but the property X is so desirable
that it may be worth it". And I am having a hard time understanding
what that X is.
> But in the example with one commit B touching also b and one commit C
> touching also c, there is (without the new option) always one part
> of the cherry-picked commit is blamed on the original and one on the
> cherry-picked, no matter how you order the parents.
Yeah, the cherry-picked one will introduce the same change as the
one that was cherry-picked, so if you look at the end result and ask
"where did _this_ line come from?", there are two equally plausible
candidates, as "blame" output can give only one answer to each line.
I still do not see why the one that is picked with the new option is
better. At best, it looks to me that it is saying "running with
this option may (or may not) give a different answer, so run the
command with and without it and see which one you like", which does
not sound too useful to the end users. That is where my confusion
comes from.
> (While by having your mainline always the most leftward parent, with
> the the new option you always get those commit blamed that is the
> "first one this was introduced to mainline".)
Yes, I vaguely recall we talked about adding --first-parent option
to the command in the past. I do not remember what came out of that
discussion.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-01-02 21:49 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-02 17:55 [RFC] blame: new option to better handle merged cherry-picks Bernhard R. Link
2014-01-02 20:29 ` Junio C Hamano
2014-01-02 21:15 ` Bernhard R. Link
2014-01-02 21:48 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).