* Can git log be made to "follow" in the same way as git blame? Why / in what way is "--follow" broken or limited?
@ 2024-08-26 19:00 Tao Klerks
2024-08-26 19:43 ` Junio C Hamano
0 siblings, 1 reply; 3+ messages in thread
From: Tao Klerks @ 2024-08-26 19:00 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee
Hi folks,
I've been working on a "git blame optimizer for partial clone repos",
following up on a thread with Derrick Stolee from 2021 (
https://lore.kernel.org/git/0b57cba9-3ab3-dfdf-5589-a0016eaea634@gmail.com/
), with the intention to pre-fetch all locally-missing blobs for a
given file in the history of a branch/commit, and it ended up being
much more complex to do than I expected, basically because "git log
--follow -- SOMEFILE" doesn't return all the commits containing
versions of "SOMEFILE" that "git blame" will end up visiting.
I thought this strange/interesting, because (as far as I can tell), as
long as there are no renames in the history of the file, "git log --
SOMEFILE", without the "--follow", *does* seem to return all the
commits containing unique versions of the file.
The only reference to this weirdness that I could find in the doc,
after ripping my hair out for a few hours, was in a note on the
"log.follow" config, under the "git log" doc, e.g. at
https://git-scm.com/docs/git-log : "This has the same limitations as
--follow, i.e. it cannot be used to follow multiple files and does not
work well on non-linear history."
What seems weird and interesting to me, is that whatever is going
"wrong" in "git log --follow" doesn't happen in "git blame". I
couldn't find an easy way to demo/prove it, but I found experimentally
that the set of blobs examined by "git blame" (and fetched
just-in-time if-needed in a partial clone) is larger than the set of
blobs in commits output by "git log --follow -- FILENAME", but not
larger than the set of blobs in commits output by "git log --
FILENAME" (for a file that has not been renamed).
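One way to make that comparison concrete is to list the unique blob IDs
that a given "git log" invocation reaches for a path. The following is a
minimal sketch, not the optimizer itself: the function name
list_file_blobs is hypothetical, and it assumes the file keeps the same
path in every listed commit (so with "--follow" on a file that was
renamed, blobs under the old name would be missed).

```shell
# list_file_blobs PATH [extra git-log options...]
# Print the unique blob IDs of PATH in every commit that
# "git log [options] -- PATH" lists. (Hypothetical helper name.)
list_file_blobs() {
    path=$1
    shift
    git log --format=%H "$@" -- "$path" |
    while read -r commit; do
        # A commit that deleted the file has no blob at this path;
        # rev-parse then prints nothing and we simply move on.
        git rev-parse --verify --quiet "$commit:$path" || true
    done | sort -u
}
```

For instance, the output of "list_file_blobs git.c" and
"list_file_blobs git.c --follow" could be saved to two files and
compared.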
You can see the strange effect that "--follow" has by comparing a run
with and without on "git.c" in the git project for example - a file
that was never renamed:
git log --pretty='%H' -- git.c | sort | uniq > ~/test1 # 717 commits
git log --pretty='%H' --follow -- git.c | sort | uniq > ~/test2 # 537 commits
You'll find that with "--follow", you get a couple extra commits, and
a whole bunch missing. You can try to fill them in with
"--full-history" etc, but while such options are *accepted*, they are
also completely and utterly *ignored*.
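To see exactly which commits differ between the two runs, the sorted
lists can be compared with "comm". This is only a sketch: the function
name compare_commit_lists is made up for illustration, and it assumes
both files are already sorted (as ~/test1 and ~/test2 are, thanks to
"sort | uniq").

```shell
# compare_commit_lists PLAIN FOLLOW
# PLAIN:  sorted commit list from "git log -- FILE"
# FOLLOW: sorted commit list from "git log --follow -- FILE"
# (Hypothetical helper name; both inputs must be sorted.)
compare_commit_lists() {
    plain=$1
    follow=$2
    # comm -23 keeps lines found only in the first file,
    # comm -13 keeps lines found only in the second file.
    printf 'missing with --follow: %s\n' \
        "$(comm -23 "$plain" "$follow" | wc -l | tr -d ' ')"
    printf 'extra with --follow: %s\n' \
        "$(comm -13 "$plain" "$follow" | wc -l | tr -d ' ')"
}
```

It would be invoked as "compare_commit_lists ~/test1 ~/test2"; dropping
the "wc -l" part shows the commit IDs themselves instead of the counts.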
Insofar as this is a known issue... is there an intelligible reason
for it? Should it be something we aspire to fix, or should the doc be
improved to make it more obvious what this option does and doesn't do?
Thanks,
Tao
* Re: Can git log be made to "follow" in the same way as git blame? Why / in what way is "--follow" broken or limited?
From: Junio C Hamano @ 2024-08-26 19:43 UTC (permalink / raw)
To: Tao Klerks; +Cc: git, Derrick Stolee
Tao Klerks <tao@klerks.biz> writes:
> What seems weird and interesting to me, is that whatever is going
> "wrong" in "git log --follow" doesn't happen in "git blame".
Yes, because log.follow was done as a checkbox item but blame was
done as a real feature ;-)
In tree-diff.c:try_to_follow_renames(), you'll notice that it only
has a space to remember a single path in .single_follow member in
the diff_opts. That member is the hack.
Imagine that the original commit had paths A and B, and over time,
the history diverged and in one fork A got renamed to C while
in another fork B got renamed to C. Eventually these two forks merge.
Now you want to "follow" C, so .single_follow member will have C.
----1----3----5(rename A to C)----7---9---10---11
     \                               /
      2----4----6(rename B to C)----8
You follow the history of one fork and notice that C came from A at
commit #5. Great. Your .pathspec member will be _switched_ to A
and you keep following the history of A.
Imagine further that your history traversal didn't follow one fork
fully before following the other fork, but dug commits from newer to
older, so your traversal jumps around between two forks. What
happens when your "git log --follow HEAD -- C" that has internally
switched to follow A already jumps back to follow the other fork at
this point? It does see that A exists (maybe unchanged), and you
see A's history, but that is not a relevant history---what ended up
in the final C from that fork was in B, not A.
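The shape of that history can be recreated in a scratch repository to
experiment with. This is only a sketch under assumptions: the function
name make_follow_demo, the branch names fork-a and fork-b, and the file
contents are all made up for illustration, and the merge is recorded
with "-s ours" purely to sidestep conflict resolution. What
"git log --follow" prints for it depends on the traversal order, so no
expected output is shown.

```shell
# make_follow_demo DIR
# Build a throwaway repository in DIR where A and B exist in the root
# commit, one fork renames A to C, the other renames B to C, and the
# two forks are then merged. (All names here are hypothetical.)
make_follow_demo() {
    dir=$1
    git init -q "$dir" || return 1
    (
        cd "$dir" || exit 1
        git config user.name "Demo User"
        git config user.email "demo@example.com"

        printf 'content of A\n' > A
        printf 'content of B\n' > B
        git add A B
        git commit -q -m "add A and B" || exit 1
        base=$(git rev-parse HEAD)

        # First fork: A becomes C.
        git checkout -q -b fork-a
        git mv A C
        git commit -q -m "rename A to C" || exit 1

        # Second fork, started from the root commit: B becomes C.
        git checkout -q -b fork-b "$base"
        git mv B C
        git commit -q -m "rename B to C" || exit 1

        # Merge the forks. "-s ours" keeps fork-a's tree, so the result
        # is then adjusted by hand: drop B and combine both contents in C.
        git checkout -q fork-a
        git merge -q -s ours --no-commit fork-b || exit 1
        git rm -q B
        printf 'content of A\ncontent of B\n' > C
        git add C
        git commit -q -m "merge fork-b into fork-a"
    )
}
```

After "make_follow_demo /tmp/follow-demo", running
"git -C /tmp/follow-demo log --follow --oneline -- C" next to the same
command without "--follow" shows which of the two renames gets reported.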
Unlike the above checkbox hack, "git blame" uses a real data
structure to keep track of what came from where. Instead of a
global "this single path is what interests us now", it knows "in
this commit, this is the path we are looking at", and when it looks
at the parents of that commit, it checks where the path the child
was interested in came from in each different parent, and records a
similar "in this commit (which is parent of the commit we were
looking at), this path is what we are interested in".
To equip "git log --follow" with similar "correctness" as "git
blame", you'd need to somehow stop using that single .pathspec thing
for the purpose of keeping track of "which path are we following
now?" and instead use "this is the path we are following" that is
per history traversal path.
* Re: Can git log be made to "follow" in the same way as git blame? Why / in what way is "--follow" broken or limited?
From: Junio C Hamano @ 2024-08-26 22:52 UTC (permalink / raw)
To: Tao Klerks; +Cc: git, Derrick Stolee
Junio C Hamano <gitster@pobox.com> writes:
> Unlike the above checkbox hack, "git blame" uses a real data
> structure to keep track of what came from where. Instead of a
> global "this single path is what interests us now", it knows "in
> this commit, this is the path we are looking at", and when it looks
> at the parents of that commit, it checks where that path the child
> was interested in came from each different parent, and records a
> similar "in this commit (which is parent of the commit we were
> looking at), this path is what we are interested in".
FWIW, the above is greatly simplified. For "git blame" to correctly
handle a case like "This commit created file F by taking pieces from
files A, B, C, D, and E", and annotating the lines in file F, we
need to keep track of the set of "lines n..m of path A", "lines l..k
of path B", etc., at commit X as the targets of interest, and as we
dig down the history, figure out where in the parent commits of X
each of these ranges of lines comes from. So what "blame" uses is
much richer than just a single path per commit being traversed (once
the traversal passes through from a commit to all of its parents,
this list of "line ranges per path" can be released, so that is not
a huge memory burden even for a deep history).
Now "git log --follow" does not have to keep track of ranges of
lines, but if you start following from file F that was created by
concatenating pieces of multiple existing files A, B, ..., and E,
you either want to pick one of these 5 and follow it, or you replace
F with all five of these files and follow them from that point. In
any case, you need a richer data structure than the current (ab)use
of the .pathspec member during the traversal.