* Finally implement "git log --follow"
@ 2007-06-19 21:22 Linus Torvalds
2007-06-19 21:35 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Linus Torvalds @ 2007-06-19 21:22 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
----
NOTE! For obvious enough reasons, I limited the rename following to just
*one* pathname. If somebody wants to extend it to any more, be my guest.
Also, I really might have some silly bug somewhere.
"It Works For Me(tm)", and I tested it with both the git archive and the
kernel (block/ll_rw_block.c) and it worked beautifully, but I'll also
admit that the patch is a bit _too_ clever for my taste normally. The
patch actually looks more straightforward than it really is: that "hook
directly into diff_tree_sha1()" thing is just too damn clever for words.
People who want to improve on it should get rid of the memory leak I
introduced - I decided to not bother cleaning up the whole rename diff
queue, I just reset it. I'm lazy, and I'm a *man*. I do the rough manly
stuff, others can clean up after me.
*Burp*. That hit the spot. *Scratch*
Linus
---
builtin-log.c | 5 ++++
diff.c | 2 +
diff.h | 1 +
revision.c | 4 ++-
tree-diff.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 70 insertions(+), 1 deletions(-)
diff --git a/builtin-log.c b/builtin-log.c
index 0aede76..a26a010 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -58,6 +58,11 @@ static void cmd_log_init(int argc, const char **argv, const char *prefix,
argc = setup_revisions(argc, argv, rev, "HEAD");
if (rev->diffopt.pickaxe || rev->diffopt.filter)
rev->always_show_header = 0;
+ if (rev->diffopt.follow_renames) {
+ rev->always_show_header = 0;
+ if (rev->diffopt.nr_paths != 1)
+ usage("git logs can only follow renames on one pathname at a time");
+ }
for (i = 1; i < argc; i++) {
const char *arg = argv[i];
if (!strcmp(arg, "--decorate")) {
diff --git a/diff.c b/diff.c
index 4aa9bbc..9938969 100644
--- a/diff.c
+++ b/diff.c
@@ -2210,6 +2210,8 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
}
else if (!strcmp(arg, "--find-copies-harder"))
options->find_copies_harder = 1;
+ else if (!strcmp(arg, "--follow"))
+ options->follow_renames = 1;
else if (!strcmp(arg, "--abbrev"))
options->abbrev = DEFAULT_ABBREV;
else if (!prefixcmp(arg, "--abbrev=")) {
diff --git a/diff.h b/diff.h
index a7ee6d8..9fd6d44 100644
--- a/diff.h
+++ b/diff.h
@@ -55,6 +55,7 @@ struct diff_options {
full_index:1,
silent_on_remove:1,
find_copies_harder:1,
+ follow_renames:1,
color_diff:1,
color_diff_words:1,
has_changes:1,
diff --git a/revision.c b/revision.c
index 1f4590b..7834bb1 100644
--- a/revision.c
+++ b/revision.c
@@ -1230,7 +1230,9 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
if (revs->prune_data) {
diff_tree_setup_paths(revs->prune_data, &revs->pruning);
- revs->prune_fn = try_to_simplify_commit;
+ /* Can't prune commits with rename following: the paths change.. */
+ if (!revs->diffopt.follow_renames)
+ revs->prune_fn = try_to_simplify_commit;
if (!revs->full_diff)
diff_tree_setup_paths(revs->prune_data, &revs->diffopt);
}
diff --git a/tree-diff.c b/tree-diff.c
index 852498e..42924e9 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -3,6 +3,7 @@
*/
#include "cache.h"
#include "diff.h"
+#include "diffcore.h"
#include "tree.h"
static char *malloc_base(const char *base, int baselen, const char *path, int pathlen)
@@ -290,6 +291,59 @@ int diff_tree(struct tree_desc *t1, struct tree_desc *t2, const char *base, stru
return 0;
}
+/*
+ * Does it look like the resulting diff might be due to a rename?
+ * - single entry
+ * - not a valid previous file
+ */
+static inline int diff_might_be_rename(void)
+{
+ return diff_queued_diff.nr == 1 &&
+ !DIFF_FILE_VALID(diff_queued_diff.queue[0]->one);
+}
+
+static void try_to_follow_renames(struct tree_desc *t1, struct tree_desc *t2, const char *base, struct diff_options *opt)
+{
+ struct diff_options diff_opts;
+ const char *paths[2];
+ int i;
+
+ diff_setup(&diff_opts);
+ diff_opts.recursive = 1;
+ diff_opts.detect_rename = DIFF_DETECT_RENAME;
+ diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
+ diff_opts.single_follow = opt->paths[0];
+ paths[0] = NULL;
+ diff_tree_setup_paths(paths, &diff_opts);
+ if (diff_setup_done(&diff_opts) < 0)
+ die("unable to set up diff options to follow renames");
+ diff_tree(t1, t2, base, &diff_opts);
+ diffcore_std(&diff_opts);
+
+ /* NOTE! Ignore the first diff! That was the old one! */
+ for (i = 1; i < diff_queued_diff.nr; i++) {
+ struct diff_filepair *p = diff_queued_diff.queue[i];
+
+ /*
+ * Found a source? Not only do we use that for the new
+ * diff_queued_diff, we also use that as the path in
+ * the future!
+ */
+ if ((p->status == 'R' || p->status == 'C') && !strcmp(p->two->path, opt->paths[0])) {
+ diff_queued_diff.queue[0] = p;
+ opt->paths[0] = xstrdup(p->one->path);
+ diff_tree_setup_paths(opt->paths, opt);
+ break;
+ }
+ }
+
+ /*
+ * Then, ignore any but the first entry! It might be the old one,
+ * or it might be the rename/copy we found
+ */
+ diff_queued_diff.nr = 1;
+}
+
int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base, struct diff_options *opt)
{
void *tree1, *tree2;
@@ -306,6 +360,11 @@ int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const cha
init_tree_desc(&t1, tree1, size1);
init_tree_desc(&t2, tree2, size2);
retval = diff_tree(&t1, &t2, base, opt);
+ if (opt->follow_renames && diff_might_be_rename()) {
+ init_tree_desc(&t1, tree1, size1);
+ init_tree_desc(&t2, tree2, size2);
+ try_to_follow_renames(&t1, &t2, base, opt);
+ }
free(tree1);
free(tree2);
return retval;
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: Finally implement "git log --follow"
2007-06-19 21:22 Finally implement "git log --follow" Linus Torvalds
@ 2007-06-19 21:35 ` Linus Torvalds
2007-07-04 13:21 ` Marco Costalba
2007-06-20 6:27 ` Marco Costalba
2007-06-21 1:15 ` Linus Torvalds
2 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2007-06-19 21:35 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
On Tue, 19 Jun 2007, Linus Torvalds wrote:
>
> People who want to improve on it should get rid of the memory leak I
> introduced - I decided to not bother cleaning up the whole rename diff
> queue, I just reset it. I'm lazy, and I'm a *man*. I do the rough manly
> stuff, others can clean up after me.
>
> *Burp*. That hit the spot. *Scratch*
On that note, if people want to decide that "git log" should *default* to
follow when one filename is given, that's ok by me. A few warnings:
- You should only do it for the "single file" case, and that's very
*different* from the "single pattern" case. I often do "git log" on a
single directory, and that had better not start doing any "filename
following" by default.
So before turning on "follow" automatically, it should be checked that
that exact file exists right now (might be as simplistic as just doing
a "lstat()" and verifying that it's a file or symlink in the current
working tree)
- the commit simplification isn't done. So you cannot do "--follow"
together with somethign that wants a contiguous history (like gitk) and
uses "--parents".
This is pretty fundamental. The commit simplification is a separate
phase over the (non-linear!) commit space, and the "git log --follow"
logic is really a *different* logic over the _linear_ space (namely the
streaming "log output" space).
Two totally different things, in other words. And not compatible.
So if we want to turn on "follow" by default, then I'm certainly ok with
it, but it needs to be done with care. so I'd almost suggest just adding
an alias instead, aka
git config alias.follow 'log --follow'
and then doing
git follow filename
rather than changing the behaviour of "git log" itself.
Linus
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Finally implement "git log --follow"
2007-06-19 21:35 ` Linus Torvalds
@ 2007-07-04 13:21 ` Marco Costalba
0 siblings, 0 replies; 11+ messages in thread
From: Marco Costalba @ 2007-07-04 13:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
On 6/19/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> - the commit simplification isn't done. So you cannot do "--follow"
> together with somethign that wants a contiguous history (like gitk) and
> uses "--parents".
>
> This is pretty fundamental. The commit simplification is a separate
> phase over the (non-linear!) commit space, and the "git log --follow"
> logic is really a *different* logic over the _linear_ space (namely the
> streaming "log output" space).
>
> Two totally different things, in other words. And not compatible.
>
Of course I need exactly that!
Really don't want to thrash away the nice qgit graph, also in file
histories, and at the same time I really like that '--follow' feature,
so....
I've just started to hit the wall against the scary revision.c and
friends, my goal is to find a solution (trick/hack) to get the renames
also with --parents.
First of all I have commented out:
/* Can't prune commits with rename following: the paths change.. */
// if (!revs->diffopt.follow_renames)
revs->prune_fn = try_to_simplify_commit;
So I've see that with --parents set try_to_simplify_commit() does his
job and when the processing arrives in diff_tree_sha1() it is already
pruned, so also if the check
if (opt->follow_renames && diff_might_be_rename())
is true and the following try_to_follow_renames() is called, no
alternative tree is found.
My idea, after have understood a little bit better how it works, is to
'resurrect' the pruned info before calling try_to_follow_renames().
To do this I was thinking of saving a pointer to some pruned tree
information in try_to_simplify_commit() and in case use that pointer
after in diff_tree_sha1() when needed.
Nightmares??? Perhaps, I'm just starting with this difficult piece of
code and probably (90%) I will give up before the end and very
probably (99%) all said before is just brain dead, but...anyway I have
to start from somewhere.
Marco
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Finally implement "git log --follow"
2007-06-19 21:22 Finally implement "git log --follow" Linus Torvalds
2007-06-19 21:35 ` Linus Torvalds
@ 2007-06-20 6:27 ` Marco Costalba
2007-06-20 16:47 ` Linus Torvalds
2007-06-21 1:15 ` Linus Torvalds
2 siblings, 1 reply; 11+ messages in thread
From: Marco Costalba @ 2007-06-20 6:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
On 6/19/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Ok, I've really held off doing this too damn long, because I'm lazy, and I
> was always hoping that somebody else would do it.
>
> But no, people keep asking for it, but nobody actually did anything, so I
> decided I might as well bite the bullet, and instead of telling people
> they could add a "--follow" flag to "git log" to do what they want to do,
> I decided that it looks like I just have to do it for them..
>
Thanks,
now I have one more reason to switch from using git-rev-list to
git-log in qgit, file history annotation, that is currently based on
git-rev-list, will automatically gain a powerful feature.
Marco
P.S: The only thing that is still missing in git-log against
git-rev-list is a --stdin option. This is needed because with 40 chars
SHA the command line could became very long in case of repos with many
tags/branches or when loading StGIT unapplied patches (one sha per
patch) and one git-rev-list call to loading them all.
Without --stdin we break easily on _some_ OS and in the worst cases (a
lot of unapplied patches) *even* under Linux.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Finally implement "git log --follow"
2007-06-20 6:27 ` Marco Costalba
@ 2007-06-20 16:47 ` Linus Torvalds
2007-06-21 6:21 ` Marco Costalba
0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2007-06-20 16:47 UTC (permalink / raw)
To: Marco Costalba; +Cc: Junio C Hamano, Git Mailing List
On Wed, 20 Jun 2007, Marco Costalba wrote:
>
> now I have one more reason to switch from using git-rev-list to
> git-log in qgit, file history annotation, that is currently based on
> git-rev-list, will automatically gain a powerful feature.
Well, note my other email about how "--follow" does *not* play together
with "--parents", because the two do totally different kinds of filtering
of the commits. So yes, you can use it for file history annotation, but
not for the "graphical history" part that depends on parenthood etc.
That said, you really would be better off using
git blame -M --incremental
for the history annotation. It does the rename following *and* so much
more.
Linus
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Finally implement "git log --follow"
2007-06-20 16:47 ` Linus Torvalds
@ 2007-06-21 6:21 ` Marco Costalba
2007-06-21 15:44 ` Linus Torvalds
0 siblings, 1 reply; 11+ messages in thread
From: Marco Costalba @ 2007-06-21 6:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
On 6/20/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> That said, you really would be better off using
>
> git blame -M --incremental
>
That's how it is currently inplemented in qgit:
- Main view revision list + graph: git-rev-list
- File history and annotation: git-rev-list
The possible options are:
ALL GIT LOG
-------------------
- Main view revision list + graph: git-log
- File history and annotation: git-log
Good: easy implementation and a lot of common code to share among
modules, all the current features (see "range fltering" below) are
preserved
Bad: Currently git-log does not support --stdin option, required IMHO
when git-log is runned by a tool, not a user, due to the possibility
of a very long command line.
MIXED
---------
- Main view revision list + graph: git-rev-list
- File history and annotation: git-blame
Good: Gain additional features of git-blame against git-log
Bad: FWIK "git blame -M --incremental" does not support annotating all
the files in history in one go, as it is possible both with
git-rev-list and git-log. You have to re-annotate the new file when
browsing file history with the mouse.
Impementation cannot reuse big chunks of code, so a lot of new code is
needed and probably old one will not disappear.
FWIK "git blame -M --incremental" does not support "code lines range
filtering", when with the mouse you select some lines of code and
after filtering you see the subset of file's history that modified
that range of code. This feature is currently supported by qgit
annotating code.
Also jump to the same currently selected code line when switching to a
different version in file history it is currently supported by
annotate code.
INTERESTING
--------------------
- Main view revision list + graph: git-log
- File history and annotation: git-blame
A curious mix.
Some advice?
Thanks
Marco
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Finally implement "git log --follow"
2007-06-21 6:21 ` Marco Costalba
@ 2007-06-21 15:44 ` Linus Torvalds
2007-06-21 15:55 ` Johannes Schindelin
0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2007-06-21 15:44 UTC (permalink / raw)
To: Marco Costalba; +Cc: Junio C Hamano, Git Mailing List
On Thu, 21 Jun 2007, Marco Costalba wrote:
>
> Bad: Currently git-log does not support --stdin option, required IMHO
> when git-log is runned by a tool, not a user, due to the possibility
> of a very long command line.
I do think we should just fix this. The patch to do so can't be *that*
bad.
I'll look at it.
> FWIK "git blame -M --incremental" does not support "code lines range
> filtering", when with the mouse you select some lines of code and
> after filtering you see the subset of file's history that modified
> that range of code. This feature is currently supported by qgit
> annotating code.
Yeah, that sounds like it would be a good thing to fix.
Linus
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Finally implement "git log --follow"
2007-06-21 15:44 ` Linus Torvalds
@ 2007-06-21 15:55 ` Johannes Schindelin
2007-06-21 16:09 ` Linus Torvalds
0 siblings, 1 reply; 11+ messages in thread
From: Johannes Schindelin @ 2007-06-21 15:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Marco Costalba, Junio C Hamano, Git Mailing List
Hi,
On Thu, 21 Jun 2007, Linus Torvalds wrote:
> On Thu, 21 Jun 2007, Marco Costalba wrote:
> >
> > Bad: Currently git-log does not support --stdin option, required IMHO
> > when git-log is runned by a tool, not a user, due to the possibility
> > of a very long command line.
>
> I do think we should just fix this. The patch to do so can't be *that*
> bad.
The only quirk here is that "--stdin" makes no sense with a pager.
Therefore, you'd have to move the automatic pager invocation to _after_
option parsing.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Finally implement "git log --follow"
2007-06-21 15:55 ` Johannes Schindelin
@ 2007-06-21 16:09 ` Linus Torvalds
0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2007-06-21 16:09 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Marco Costalba, Junio C Hamano, Git Mailing List
On Thu, 21 Jun 2007, Johannes Schindelin wrote:
>
> The only quirk here is that "--stdin" makes no sense with a pager.
> Therefore, you'd have to move the automatic pager invocation to _after_
> option parsing.
I don't know.. The pager invocation already checks whether the output is a
tty. Quite frankly, I think that is sufficient for all cases we're
discussing right now (ie anybody who uses "--stdin" would *also* redirect
the output!), and we could certainly make it check that both the input
*and* the output is a tty.
Linus
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Finally implement "git log --follow"
2007-06-19 21:22 Finally implement "git log --follow" Linus Torvalds
2007-06-19 21:35 ` Linus Torvalds
2007-06-20 6:27 ` Marco Costalba
@ 2007-06-21 1:15 ` Linus Torvalds
2007-06-21 1:37 ` Junio C Hamano
2 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2007-06-21 1:15 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
Junio, I note that you merged this into 'next', and also:
On Tue, 19 Jun 2007, Linus Torvalds wrote:
>
> [ long and boring explanation ]
>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ----
Note the *four* dashes instead of three, so:
> NOTE! For obvious enough reasons, I limited the rename following to just
> *one* pathname. If somebody wants to extend it to any more, be my guest.
>
> Also, I really might have some silly bug somewhere.
>
> "It Works For Me(tm)", and I tested it with both the git archive and the
> kernel (block/ll_rw_block.c) and it worked beautifully, but I'll also
> admit that the patch is a bit _too_ clever for my taste normally. The
> patch actually looks more straightforward than it really is: that "hook
> directly into diff_tree_sha1()" thing is just too damn clever for words.
>
> People who want to improve on it should get rid of the memory leak I
> introduced - I decided to not bother cleaning up the whole rename diff
> queue, I just reset it. I'm lazy, and I'm a *man*. I do the rough manly
> stuff, others can clean up after me.
>
> *Burp*. That hit the spot. *Scratch*
This also hit the git archives.
My burping and scratching wasn't really *meant* to be part of the official
record, but hey, there you have it.
No problem, and maybe you noticed and thought it amusing, but in case it
went unnoticed, you may want to clean that up when/if moving it to
"master".
Or not. Your choice.
Linus
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2007-07-04 13:21 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-19 21:22 Finally implement "git log --follow" Linus Torvalds
2007-06-19 21:35 ` Linus Torvalds
2007-07-04 13:21 ` Marco Costalba
2007-06-20 6:27 ` Marco Costalba
2007-06-20 16:47 ` Linus Torvalds
2007-06-21 6:21 ` Marco Costalba
2007-06-21 15:44 ` Linus Torvalds
2007-06-21 15:55 ` Johannes Schindelin
2007-06-21 16:09 ` Linus Torvalds
2007-06-21 1:15 ` Linus Torvalds
2007-06-21 1:37 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).