* gsoc - Better git log --follow support
@ 2011-03-19 19:24 Michał Łowicki
2011-03-19 21:57 ` GSoC - Better "git log --follow" support Jakub Narebski
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Michał Łowicki @ 2011-03-19 19:24 UTC
To: git
Hi!
I'm looking at the idea about better git log --follow support from
https://git.wiki.kernel.org/index.php/SoC2011Ideas. There is something
like this: "[...] it does not interact well with git's usual history
simplification [...]". Can someone elaborate on this? I've found History
Simplification in the git rev-list man page but don't know yet about the
issues with --follow.
--
BR,
Michał Łowicki
* Re: GSoC - Better "git log --follow" support
2011-03-19 19:24 gsoc - Better git log --follow support Michał Łowicki
@ 2011-03-19 21:57 ` Jakub Narebski
2011-03-21 12:24 ` gsoc - Better git log --follow support Jeff King
2011-04-13 21:04 ` Michał Łowicki
2 siblings, 0 replies; 12+ messages in thread
From: Jakub Narebski @ 2011-03-19 21:57 UTC
To: Michał Łowicki; +Cc: git
Michał Łowicki <mlowicki@gmail.com> writes:
> I'm looking at the idea about better git log --follow support from
> https://git.wiki.kernel.org/index.php/SoC2011Ideas. There is something
> like this: "[...] it does not interact well with git's usual history
> simplification [...]". Can someone elaborate on this? I've found History
> Simplification in the git rev-list man page but don't know yet about the
> issues with --follow.
Well, the '--follow' option to git-log is a bit of a bolted-on hack.
It works only for a single file (it doesn't work, e.g., for a
directory), and it does not always work correctly. For example
$ git log --follow gitweb/gitweb.perl
correctly follows gitweb history across the gitweb.cgi => gitweb.perl
rename in 5d043a3 (gitweb: fill in gitweb configuration by Makefile,
2006-08-01), but it doesn't follow through the subtree merge of the gitweb
repository in 0a8f4f0 (Merge git://git.kernel.org/pub/scm/git/gitweb,
2006-06-10).
--
Jakub Narebski
Poland
ShadeHawk on #git
* Re: gsoc - Better git log --follow support
2011-03-19 19:24 gsoc - Better git log --follow support Michał Łowicki
2011-03-19 21:57 ` GSoC - Better "git log --follow" support Jakub Narebski
@ 2011-03-21 12:24 ` Jeff King
2011-03-22 23:23 ` Michał Łowicki
2011-04-13 21:04 ` Michał Łowicki
2 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2011-03-21 12:24 UTC
To: Michał Łowicki; +Cc: git
On Sat, Mar 19, 2011 at 08:24:20PM +0100, Michał Łowicki wrote:
> I'm looking at the idea about better git log --follow support from
> https://git.wiki.kernel.org/index.php/SoC2011Ideas. There is something
> like this: "[...] it does not interact well with git's usual history
> simplification [...]". Can someone elaborate on this? I've found History
> Simplification in the git rev-list man page but don't know yet about the
> issues with --follow.
In short, history simplification is a way of looking at a subset of the
commit history graph, but in a way that makes it look like a complete
graph. Imagine I have a linear history like this:
A--B--C
where "A" modifies "file1", "B" modifies "file2", and "C" modifies
"file1" again. If I ask for the history of "file1" with "git log file1",
then git will pretend that the graph looks like:
A--C
including rewriting the parent of "C" to point to "A" (because the
parent pointer is basically an edge in the graph).
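A minimal Python sketch of this parent rewriting, assuming a toy model in which each commit is only a name, a list of parents, and the set of paths it touches. The function and variable names are hypothetical, and git's real simplification has further rules (TREESAME merges, --full-history, and so on) that this ignores:

```python
from typing import Dict, List, Set


def simplify(parents: Dict[str, List[str]],
             touched: Dict[str, Set[str]],
             path: str) -> Dict[str, List[str]]:
    """Keep only commits touching `path`, rewriting each kept commit's
    parents to its nearest kept ancestors (toy model, not git's code)."""

    def nearest_interesting(commit: str) -> List[str]:
        found: List[str] = []
        stack = list(parents[commit])
        seen: Set[str] = set()
        while stack:
            c = stack.pop()
            if c in seen:
                continue
            seen.add(c)
            if path in touched[c]:
                found.append(c)           # stop here: c becomes a new parent
            else:
                stack.extend(parents[c])  # "squish out" c and keep looking
        return found

    return {c: nearest_interesting(c) for c in parents if path in touched[c]}


# The A--B--C history: A and C touch file1, B touches file2.
parents = {"A": [], "B": ["A"], "C": ["B"]}
touched = {"A": {"file1"}, "B": {"file2"}, "C": {"file1"}}
print(simplify(parents, touched, "file1"))  # {'A': [], 'C': ['A']}
```

C's parent is rewritten from B to A, which is exactly the A--C graph above.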
If you are just doing a straight "git log", the actual parentage is not
that interesting. We either show commits or we don't, and we don't show
links between them. But try "git log --graph" or "gitk", which do care
about the edges. They want to show you a whole connected graph.
Now consider --follow. It doesn't happen during the commit limiting
phase, but instead it happens while we're showing commits. And if it
decides a commit isn't interesting, we don't show it. That works OK for
"git log", but it makes the graph for other things disjointed.
You can see it in this example:
# make the A-B-C repo we mentioned above
git init repo && cd repo
echo content >file1 && git add file1 && git commit -m one
echo content >file2 && git add file2 && git commit -m two
echo content >>file1 && git add file1 && git commit -m three
# Now look at it in gitk; we see a nice linear graph.
gitk
# Now let's try it with path limiting. We see a nice subgraph that
# pretends to be linear, because we "squished" out the uninteresting
# nodes.
gitk file1
# Now let's make some more commits with a rename.
echo content >>file2 && git commit -a -m four
git mv file1 newfile && git commit -m five
echo content >>newfile && git commit -a -m six
# If we use path limiting, we'll only see the two most recent commits.
# We get stopped at the rename because path limiting is just about the
# pathname.
gitk newfile
# So we can use --follow to follow the rename. First let's try simple
# output. You should see commits 1, 3, 5, and 6, which touched either
# newfile or its rename source, file1.
git log --oneline --follow newfile
# But now look at it in gitk. Commit 4 is included as a boundary
# commit, but we fail to notice that it connects to three. And we
# don't see commit 3 connecting to anything, and commit 1 is missing
# entirely.
gitk --follow newfile
Obviously this is a pretty simplistic example. But you can imagine in a
history with a lot of branching how useful this simplification is to
understanding what happened to a subset of the tree.
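The display-time filtering described above can be modeled in a few lines of Python. This is a toy sketch, not git's implementation: it assumes a purely linear history and a single, pre-recorded rename (real git detects renames by diffing each commit), and all names are hypothetical. It encodes the six-commit repository built in the shell session above; shown commits keep their real, unrewritten parents, so some edges point at commits that are never displayed:

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy model: name -> (parent, paths touched, rename (old, new) or None).
History = Dict[str, Tuple[Optional[str], Set[str], Optional[Tuple[str, str]]]]

HISTORY: History = {
    "one":   (None,    {"file1"}, None),
    "two":   ("one",   {"file2"}, None),
    "three": ("two",   {"file1"}, None),
    "four":  ("three", {"file2"}, None),
    "five":  ("four",  {"file1", "newfile"}, ("file1", "newfile")),
    "six":   ("five",  {"newfile"}, None),
}


def follow_at_display_time(tip: str, path: str) -> List[str]:
    """Walk from tip, deciding per commit whether to show it; parents are
    never rewritten, mimicking filtering at output time."""
    shown: List[str] = []
    commit: Optional[str] = tip
    while commit is not None:
        parent, paths, rename = HISTORY[commit]
        if path in paths:
            shown.append(commit)
            if rename is not None and rename[1] == path:
                path = rename[0]  # keep following the rename source
        commit = parent
    return shown


def dangling_edges(shown: List[str]) -> List[Tuple[str, str]]:
    """Edges from a shown commit to a real parent that is not shown."""
    edges = []
    for c in shown:
        parent = HISTORY[c][0]
        if parent is not None and parent not in shown:
            edges.append((c, parent))
    return edges


shown = follow_at_display_time("six", "newfile")
print(shown)                  # ['six', 'five', 'three', 'one']
print(dangling_edges(shown))  # [('five', 'four'), ('three', 'two')]
```

A simplified, path-limited walk would instead connect five to three and three to one.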
Jakub mentioned another example with gitweb's subtree merge not being
found by --follow. I haven't looked into that case, but it may be
related (or it may simply be a defect in follow finding the right
source).
-Peff
* Re: gsoc - Better git log --follow support
2011-03-21 12:24 ` gsoc - Better git log --follow support Jeff King
@ 2011-03-22 23:23 ` Michał Łowicki
2011-03-23 16:20 ` Jeff King
0 siblings, 1 reply; 12+ messages in thread
From: Michał Łowicki @ 2011-03-22 23:23 UTC
To: Jeff King; +Cc: git
On 21 March 2011 13:24, Jeff King <peff@peff.net> wrote:
> On Sat, Mar 19, 2011 at 08:24:20PM +0100, Michał Łowicki wrote:
>
>> I'm looking at the idea about better git log --follow support from
>> https://git.wiki.kernel.org/index.php/SoC2011Ideas. There is something
>> like this: "[...] it does not interact well with git's usual history
>> simplification [...]". Can someone elaborate on this? I've found History
>> Simplification in the git rev-list man page but don't know yet about the
>> issues with --follow.
>
> In short, history simplification is a way of looking at a subset of the
> commit history graph, but in a way that makes it look like a complete
> graph. Imagine I have a linear history like this:
>
> A--B--C
>
> where "A" modifies "file1", "B" modifies "file2", and "C" modifies
> "file1" again. If I ask for the history of "file1" with "git log file1",
> then git will pretend that the graph looks like:
>
> A--C
>
> including rewriting the parent of "C" to point to "A" (because the
> parent pointer is basically an edge in the graph).
>
> If you are just doing a straight "git log", the actual parentage is not
> that interesting. We either show commits or we don't, and we don't show
> links between them. But try "git log --graph" or "gitk", which do care
> about the edges. They want to show you a whole connected graph.
>
> Now consider --follow. It doesn't happen during the commit limiting
> phase, but instead it happens while we're showing commits. And if it
> decides a commit isn't interesting, we don't show it. That works OK for
> "git log", but it makes the graph for other things disjointed.
>
> You can see it in this example:
>
> # make the A-B-C repo we mentioned above
> git init repo && cd repo
> echo content >file1 && git add file1 && git commit -m one
> echo content >file2 && git add file2 && git commit -m two
> echo content >>file1 && git add file1 && git commit -m three
>
> # Now look at it in gitk; we see a nice linear graph.
> gitk
>
> # Now let's try it with path limiting. We see a nice subgraph that
> # pretends to be linear, because we "squished" out the uninteresting
> # nodes.
> gitk file1
>
> # Now let's make some more commits with a rename.
> echo content >>file2 && git commit -a -m four
> git mv file1 newfile && git commit -m five
> echo content >>newfile && git commit -a -m six
>
> # If we use path limiting, we'll only see the two most recent commits.
> # We get stopped at the rename because path limiting is just about the
> # pathname.
> gitk newfile
>
> # So we can use --follow to follow the rename. First let's try simple
> # output. You should see commits 1, 3, 5, and 6, which touched either
> # newfile or its rename source, file1.
> git log --oneline --follow newfile
>
> # But now look at it in gitk. Commit 4 is included as a boundary
> # commit, but we fail to notice that it connects to three. And we
> # don't see commit 3 connecting to anything, and commit 1 is missing
> # entirely.
> gitk --follow newfile
Why is commit 4 displayed here (it changes only file2)?
# git log --graph works OK here. It displays six -- five .. --
# three .. -- one. In this case, shouldn't the results be similar to gitk's?
git log --graph --follow newfile
>
> Obviously this is a pretty simplistic example. But you can imagine in a
> history with a lot of branching how useful this simplification is to
> understanding what happened to a subset of the tree.
>
> Jakub mentioned another example with gitweb's subtree merge not being
> found by --follow. I haven't looked into that case, but it may be
> related (or it may simply be a defect in follow finding the right
> source).
>
> -Peff
>
--
Regards,
Michał Łowicki
* Re: gsoc - Better git log --follow support
2011-03-22 23:23 ` Michał Łowicki
@ 2011-03-23 16:20 ` Jeff King
2011-03-23 16:58 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2011-03-23 16:20 UTC
To: Michał Łowicki; +Cc: git
On Wed, Mar 23, 2011 at 12:23:45AM +0100, Michał Łowicki wrote:
> > # But now look at it in gitk. Commit 4 is included as a boundary
> > # commit, but we fail to notice that it connects to three. And we
> > # don't see commit 3 connecting to anything, and commit 1 is missing
> > # entirely.
> > gitk --follow newfile
>
> Why is commit 4 displayed here (it changes only file2)?
It's part of how gitk shows the graph. It shows all of the commits you
asked for with blue nodes, and then it shows the "boundary" commits with
a special white node. This lets you distinguish between actual root
commits (i.e., ones with no parents) and ones whose parents are simply
uninteresting to the current query.
> # git log --graph works OK here. It displays six -- five .. --
> # three .. -- one. In this case, shouldn't the results be similar to gitk's?
> git log --graph --follow newfile
Sort of. Notice the "..." in the output (it is easier to see with "git
log --graph --oneline --follow newfile"). It is not showing the
simplified history, but instead indicates that there were some commits
omitted in between the two points. It doesn't make the output terrible
in such a simple linear case. But consider a case with branching:
# Our A-B-C repo
git init repo && cd repo
echo content >file1 && git add file1 && git commit -m one
echo content >file2 && git add file2 && git commit -m two
echo content >>file1 && git add file1 && git commit -m three
# Now make a side branch that also touches file1
git checkout -b side HEAD^
echo content >>file1 && git commit -a -m four
# And merge them back together
git merge master
# And then do our other commits with rename on top
echo content >>file2 && git commit -a -m five
git mv file1 newfile && git commit -m six
echo content >>newfile && git commit -a -m seven
Showing "git log --graph --oneline --follow newfile" becomes a bit more
confusing. A simplified history would show "six" as the merge between
the two branches, but here it happens at some indeterminate point in the
history that is not shown.
And again, this is a simple example. For something more complex, try
this in git.git:
# We know builtin-add.c got renamed to builtin/add.c, so
# let's cheat and tell git which paths we're interested in.
# The resulting graph is pretty readable, and is more or less what we
# would want from --follow.
git log --oneline --graph -- builtin-add.c builtin/add.c
# Now try it with --follow. Not so pretty.
git log --oneline --graph --follow builtin/add.c
-Peff
* Re: gsoc - Better git log --follow support
2011-03-23 16:20 ` Jeff King
@ 2011-03-23 16:58 ` Junio C Hamano
2011-03-23 17:06 ` Jeff King
0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2011-03-23 16:58 UTC
To: Jeff King; +Cc: Michał Łowicki, git
Jeff King <peff@peff.net> writes:
> # Now try it with --follow. Not so pretty.
> git log --oneline --graph --follow builtin/add.c
Is that an artifact of history simplification?
I've always thought that it was because the --follow hack used a single global
pathspec that flipped at a rename boundary, regardless of which part of the
history (i.e. the branch that was before the rename or after the rename)
it is following. So if you have two branches merged together:
     o---o---o---o---o---x---x---x
    /                   /
...o---o---o---x---x---x
where commits marked with 'x' have it under the new path while commits
marked with 'o' have it under the old path, and you start to dig the history
from the rightmost commit, the hack notices the rename at the transition
between the "o---x" on the upper branch and from then on keeps digging the
history using the old path as the pathspec. The commit history traversal
goes reverse-chronologically, so when inspecting the next commit, which is
the rightmost commit on the lower branch, the hack fails because it uses the
wrong pathspec (at that point it should still be using the new path as the
pathspec, but it has already switched to the old path).
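A minimal Python sketch of this failure mode, assuming a hypothetical six-commit miniature of the diagram: 'old' and 'new' stand for the two path names, renames are recorded only against the first parent, and the commit dates are made up. It models the behavior described here, not git's actual revision-walking code. One pathspec is shared by the whole newest-first traversal, so once the upper branch's rename flips it, the lower branch's newest 'x' commit (L2) is tested against the old path and skipped:

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical miniature of the diagram: name -> (date, parents, paths touched,
# rename (old, new) relative to the first parent, or None).
Commit = Tuple[int, List[str], Set[str], Optional[Tuple[str, str]]]
COMMITS: Dict[str, Commit] = {
    "B":  (1, [],           {"old"},        None),            # common base, 'o'
    "L1": (2, ["B"],        {"old", "new"}, ("old", "new")),  # lower branch renames
    "U1": (3, ["B"],        {"old"},        None),            # upper branch, 'o'
    "L2": (4, ["L1"],       {"new"},        None),            # lower branch, 'x'
    "M":  (5, ["U1", "L2"], {"old", "new"}, ("old", "new")),  # merge: o---x on upper
    "U2": (6, ["M"],        {"new"},        None),            # tip, 'x'
}


def follow_global(tip: str, path: str) -> List[str]:
    """Newest-first walk with ONE pathspec shared by every branch."""
    shown: List[str] = []
    queue = [(-COMMITS[tip][0], tip)]  # negated dates: heapq pops newest first
    seen = {tip}
    while queue:
        _, name = heapq.heappop(queue)
        _, parents, touched, rename = COMMITS[name]
        if path in touched:
            shown.append(name)
            if rename is not None and rename[1] == path:
                path = rename[0]  # flips for all remaining commits, on any branch
        for p in parents:
            if p not in seen:
                seen.add(p)
                heapq.heappush(queue, (-COMMITS[p][0], p))
    return shown


print(follow_global("U2", "new"))  # ['U2', 'M', 'U1', 'L1', 'B']; L2 is missed
```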
* Re: gsoc - Better git log --follow support
2011-03-23 16:58 ` Junio C Hamano
@ 2011-03-23 17:06 ` Jeff King
2011-03-23 18:12 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2011-03-23 17:06 UTC
To: Junio C Hamano; +Cc: Michał Łowicki, git
On Wed, Mar 23, 2011 at 09:58:11AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > # Now try it with --follow. Not so pretty.
> > git log --oneline --graph --follow builtin/add.c
>
> Is that an artifact of history simplification?
I think it's a combination of factors. The lack of history
simplification is why the graph is all choppy. The insanely wide
results, though, are probably due to the problem you mention below.
> I've always thought that it was because the --follow hack used a single global
> pathspec that flipped at a rename boundary, regardless of which part of the
> history (i.e. the branch that was before the rename or after the rename)
> it is following. So if you have two branches merged together:
>
>      o---o---o---o---o---x---x---x
>     /                   /
> ...o---o---o---x---x---x
>
> where commits marked with 'x' have it under the new path while commits
> marked with 'o' have it under the old path, and you start to dig the history
> from the rightmost commit, the hack notices the rename at the transition
> between the "o---x" on the upper branch and from then on keeps digging the
> history using the old path as the pathspec. The commit history traversal
> goes reverse-chronologically, so when inspecting the next commit, which is
> the rightmost commit on the lower branch, the hack fails because it uses the
> wrong pathspec (at that point it should still be using the new path as the
> pathspec, but it has already switched to the old path).
When I prototyped the multi-file --follow last summer, I added newly
found source paths to the pathspec list instead of replacing the existing ones.
Strictly speaking, this can add unwanted commits when the names are
re-used for unrelated files (either the source name is used on a
parallel side branch, or the destination name is used in an earlier
file). But in practice it generates pretty good results, because those
corner cases don't tend to happen much.
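A minimal Python sketch of that "add, don't replace" approach, assuming a hypothetical miniature history like the two-branch diagram quoted above ('old'/'new' stand for the path names, renames are recorded against the first parent, dates are made up) plus one extra commit N that creates an unrelated file re-using the old name on the lower branch. It is an illustration of the idea, not the actual prototype:

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

# name -> (date, parents, paths touched, rename (old, new) or None)
Commit = Tuple[int, List[str], Set[str], Optional[Tuple[str, str]]]
COMMITS: Dict[str, Commit] = {
    "B":  (1, [],           {"old"},        None),
    "L1": (2, ["B"],        {"old", "new"}, ("old", "new")),  # lower branch renames
    "N":  (3, ["L1"],       {"old"},        None),  # unrelated file re-uses "old"
    "U1": (4, ["B"],        {"old"},        None),
    "L2": (5, ["N"],        {"new"},        None),
    "M":  (6, ["U1", "L2"], {"old", "new"}, ("old", "new")),  # merge, rename vs U1
    "U2": (7, ["M"],        {"new"},        None),
}


def follow_accumulating(tip: str, path: str) -> List[str]:
    """Newest-first walk where rename sources are ADDED to the pathspec set
    instead of replacing the current path."""
    paths = {path}
    shown: List[str] = []
    queue = [(-COMMITS[tip][0], tip)]  # negated dates: heapq pops newest first
    seen = {tip}
    while queue:
        _, name = heapq.heappop(queue)
        _, parents, touched, rename = COMMITS[name]
        if touched & paths:
            shown.append(name)
            if rename is not None and rename[1] in paths:
                paths.add(rename[0])  # keep the new name, also match the old one
        for p in parents:
            if p not in seen:
                seen.add(p)
                heapq.heappush(queue, (-COMMITS[p][0], p))
    return shown


print(follow_accumulating("U2", "new"))
# ['U2', 'M', 'L2', 'U1', 'N', 'L1', 'B']
```

L2 is now found on the lower branch, but N, which only touches an unrelated file that happens to re-use the old name, is reported as well: the corner case described above.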
Obviously a solution that always provides an exact right answer is
preferable to "pretty good results", but we'd have to keep in mind the
performance difference.
-Peff
* Re: gsoc - Better git log --follow support
2011-03-23 17:06 ` Jeff King
@ 2011-03-23 18:12 ` Junio C Hamano
2011-03-23 18:22 ` Jeff King
0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2011-03-23 18:12 UTC
To: Jeff King; +Cc: Michał Łowicki, git
Jeff King <peff@peff.net> writes:
> Obviously a solution that always provides an exact right answer is
> preferable to "pretty good results", but we'd have to keep in mind the
> performance difference.
And that is why the current --follow hack was declared to be good enough
to give "pretty good results" by its inventor, no?
I still agree with it personally, and if we _were_ to improve it out of
"hack" status, we should aim to do the right thing (provided that a
"right thing" exists).
* Re: gsoc - Better git log --follow support
2011-03-23 18:12 ` Junio C Hamano
@ 2011-03-23 18:22 ` Jeff King
0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2011-03-23 18:22 UTC
To: Junio C Hamano; +Cc: Michał Łowicki, git
On Wed, Mar 23, 2011 at 11:12:37AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > Obviously a solution that always provides an exact right answer is
> > preferable to "pretty good results", but we'd have to keep in mind the
> > performance difference.
>
> And that is why the current --follow hack was declared to be good enough
> to give "pretty good results" by its inventor, no?
Absolutely. I just think we can make "pretty good" slightly better with
just a little more effort.
> I still agree with it personally, and if we _were_ to improve it out of
> "hack" status, we should aim to do the right thing (provided that a
> "right thing" exists).
Right. The problem is that I'm not sure we want to pay the performance
penalty to take it out of "hack" status. But that doesn't mean we can't
make it as good a hack as possible. :)
Actually, I think the non-hack version of it is not really --follow at
all, but more like Bo's line-level browser. But I think that still
leaves room for a solution like --follow that is perhaps a bit faster
and provides a pretty good answer.
-Peff
* Re: gsoc - Better git log --follow support
2011-03-19 19:24 gsoc - Better git log --follow support Michał Łowicki
2011-03-19 21:57 ` GSoC - Better "git log --follow" support Jakub Narebski
2011-03-21 12:24 ` gsoc - Better git log --follow support Jeff King
@ 2011-04-13 21:04 ` Michał Łowicki
2011-04-15 4:06 ` Jonathan Nieder
2 siblings, 1 reply; 12+ messages in thread
From: Michał Łowicki @ 2011-04-13 21:04 UTC
To: git; +Cc: Jeff King, jrnieder
Hi!
This is the first plan for this GSoC project. Jonathan Nieder
described to me how --follow works and pointed me to a lot of sources for
studying. He also suggested where I should start and what it should look
like (thanks for that).
25.04 - 15.06
1) study the revision walking code
* understand its stages,
* improve Documentation/technical/api-revision-walking.txt (it
doesn't explain the stages of the revision walking code, so I could save
others some time in the future)
2) study the pathspec matching + limiting and rename detection API
* possibility to update/improve documentation here as well
3) figure out what state --follow will need to maintain, where it will
fit into the revision walking process and design new architecture for
it
16.06 - 26.08
4) implementation
I plan to spend about 2 months on the first 3 points. It's all about
poking the right developers and sending questions to the mailing list.
I'll try to send some updates soon when I get through some basic
reading and the most important code.
Any suggestions/ideas are, as always, welcome. Be prepared for many
questions from my side :)
Greetings,
Michał Łowicki
* Re: gsoc - Better git log --follow support
2011-04-13 21:04 ` Michał Łowicki
@ 2011-04-15 4:06 ` Jonathan Nieder
2011-04-15 19:41 ` Michał Łowicki
0 siblings, 1 reply; 12+ messages in thread
From: Jonathan Nieder @ 2011-04-15 4:06 UTC
To: Michał Łowicki; +Cc: git, Jeff King
Hi,
Michał Łowicki wrote:
> 25.04 - 15.06
> 1) study the revision walking code
[...]
> 2) study the pathspec matching + limiting and rename detection API
[...]
> 3) figure out what state --follow will need to maintain, where it will
> fit into the revision walking process and design new architecture for
> it
Ideally this should happen in the next couple of days, rather than the
next couple of months. Otherwise the project would be an unknown and
it would be hard in good conscience to accept funding for it.
That said, I am personally willing to help out in the next few days
(to help put a solid proposal together) and throughout the summer (to
fix git log --follow) regardless. I will be very happy when --follow
works reliably.
> 16.06 - 26.08
> 4) implementation
>
> I plan to spend about 2 months on the first 3 points. It's all about
> poking the right developers and sending questions to the mailing list.
It's hard to say how the process of studying code works. Certainly
asking a question can be a good way to start, and reading code can
lead to more questions. Another strategy that can work well is to
take the plunge and see what effect changes to the code have.
> I'll try to send some updates soon when I get through some basic
> reading and the most important code.
Ok. Remember it's okay to ask for help (though of course not so great
to demand it) if you get stuck or have no idea where to start on
something.
> Any suggestions/ideas are, as always, welcome. Be prepared for many
> questions from my side :)
Looking forward to it. If we end up with better technical
documentation as a side effect, all the better.
Regards,
Jonathan
* Re: gsoc - Better git log --follow support
2011-04-15 4:06 ` Jonathan Nieder
@ 2011-04-15 19:41 ` Michał Łowicki
0 siblings, 0 replies; 12+ messages in thread
From: Michał Łowicki @ 2011-04-15 19:41 UTC
To: Jonathan Nieder; +Cc: git, Jeff King
On 15 April 2011 06:06, Jonathan Nieder
<jrnieder@gmail.com> wrote:
> Hi,
>
> Michał Łowicki wrote:
>
>> 25.04 - 15.06
>> 1) study the revision walking code
> [...]
>> 2) study the pathspec matching + limiting and rename detection API
> [...]
>> 3) figure out what state --follow will need to maintain, where it will
>> fit into the revision walking process and design new architecture for
>> it
>
> Ideally this should happen in the next couple of days, rather than the
> next couple of months. Otherwise the project would be an unknown and
> it would be hard in good conscience to accept funding for it.
Right, I could try to do this sooner, but is it doable without a deep
understanding of 1 and 2?
>
> That said, I am personally willing to help out in the next few days
> (to help put a solid proposal together) and throughout the summer (to
> fix git log --follow) regardless. I will be very happy when --follow
> works reliably.
Great!
>
>> 16.06 - 26.08
>> 4) implementation
>>
>> I plan to spend about 2 months on the first 3 points. It's all about
>> poking the right developers and sending questions to the mailing list.
>
> It's hard to say how the process of studying code works. Certainly
> asking a question can be a good way to start, and reading code can
> lead to more questions. Another strategy that can work well is to
> take the plunge and see what effect changes to the code have.
>
>> I'll try to send some updates soon when I get through some basic
>> reading and the most important code.
>
> Ok. Remember it's okay to ask for help (though of course not so great
> to demand it) if you get stuck or have no idea where to start on
> something.
>
>> Any suggestions/ideas are, as always, welcome. Be prepared for many
>> questions from my side :)
>
> Looking forward to it. If we end up with better technical
> documentation as a side effect, all the better.
>
> Regards,
> Jonathan
>
--
Regards,
Michał Łowicki
Thread overview: 12+ messages
2011-03-19 19:24 gsoc - Better git log --follow support Michał Łowicki
2011-03-19 21:57 ` GSoC - Better "git log --follow" support Jakub Narebski
2011-03-21 12:24 ` gsoc - Better git log --follow support Jeff King
2011-03-22 23:23 ` Michał Łowicki
2011-03-23 16:20 ` Jeff King
2011-03-23 16:58 ` Junio C Hamano
2011-03-23 17:06 ` Jeff King
2011-03-23 18:12 ` Junio C Hamano
2011-03-23 18:22 ` Jeff King
2011-04-13 21:04 ` Michał Łowicki
2011-04-15 4:06 ` Jonathan Nieder
2011-04-15 19:41 ` Michał Łowicki