git.vger.kernel.org archive mirror
* Git log follow question
@ 2010-05-14  0:57 Albert Krawczyk
  2010-05-14  4:16 ` Bo Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Albert Krawczyk @ 2010-05-14  0:57 UTC (permalink / raw)
  To: git

Hi Everybody,

I'm having an issue understanding the way git log --follow works with git
log --parents.

When I run 
git log --parents --pretty=format:Commit:%H%nParent:%P%n%n alloc.c
I get: 
Commit:4b25d091ba53c758fae0096b8c0662371857b9d9
Parent:100c5f3b0b27ec6617de1a785c4ff481e92636c1

Commit:100c5f3b0b27ec6617de1a785c4ff481e92636c1
Parent:2c1cbec1e2f0bd7b15fe5e921d287babfd91c7d3

Commit:2c1cbec1e2f0bd7b15fe5e921d287babfd91c7d3
Parent:579d1fbfaf25550254014fa472faac95f88eb779

Commit:579d1fbfaf25550254014fa472faac95f88eb779
Parent:855419f764a65e92f1d5dd1b3d50ee987db1d9de

Commit:855419f764a65e92f1d5dd1b3d50ee987db1d9de
Parent:

When I try to run git log --parents --follow I get this:
git log --parents --follow --pretty=format:Commit:%H%nParent:%P%n%n alloc.c

Commit:4b25d091ba53c758fae0096b8c0662371857b9d9
Parent:75b44066f3ed7cde238cdea1f0bf9e2f1744c820

Commit:100c5f3b0b27ec6617de1a785c4ff481e92636c1
Parent:2c1cbec1e2f0bd7b15fe5e921d287babfd91c7d3

Commit:2c1cbec1e2f0bd7b15fe5e921d287babfd91c7d3
Parent:f948792990f82a35bf0c98510e7511ef8acb9cd3

Commit:579d1fbfaf25550254014fa472faac95f88eb779
Parent:446c6faec69f7ac521b8b9fc2b1874731729032f

Commit:855419f764a65e92f1d5dd1b3d50ee987db1d9de
Parent:64e86c57867593ba0ee77a7b0ff0eb8e9d4d8ed5

As you can see git log --parents and git log --follow --parents produce very
different results, and as far as I can tell they should produce identical
outputs. 

Could somebody tell me if I'm doing something wrong with the syntax? Or have
I stumbled onto a quirk I fail to understand?

Thanks,
Albert 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Git log follow question
  2010-05-14  0:57 Git log follow question Albert Krawczyk
@ 2010-05-14  4:16 ` Bo Yang
  2010-05-14  4:37 ` Jeff King
       [not found] ` <21464_1273811837_4BECD37D_21464_745_1_20100514043704.GC6075@coredump.intra.peff.net>
  2 siblings, 0 replies; 12+ messages in thread
From: Bo Yang @ 2010-05-14  4:16 UTC (permalink / raw)
  To: Albert Krawczyk; +Cc: git

Hi Albert,
On Fri, May 14, 2010 at 8:57 AM, Albert Krawczyk
<albert@burgmann.anu.edu.au> wrote:
> As you can see git log --parents and git log --follow --parents produce very
> different results, and as far as I can tell they should produce identical
> outputs.
>
> Could somebody tell me if I'm doing something wrong with the syntax? Or have
> I stumbled onto a quirk I fail to understand?

The problem you encountered is about parent rewriting.
When git does a revision walk, it may *modify* the actual parents
of a commit depending on the command line options.
When you invoke 'git log --parents' with a path, the parent rewriting
mechanism is on. Suppose you have five commits:
commit1 <- commit2 <- commit3 <- commit4 <- commit5
and only commit5 and commit2 change the file alloc.c. When you run
'git log --parents alloc.c', the parent of commit5 will be rewritten
to commit2. When you run git without --parents, commit5's parent will
still be commit4.
And when '--follow' is given, the parent rewriting mechanism is turned
off, so you get two different outputs.
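The five-commit example above can be reproduced in a throwaway repository. This is an illustrative POSIX shell sketch, not part of the original message: it assumes git is installed, and the summarize helper, the commit subjects commit1 to commit5 and the file other.c are made up for the demo. Each printed line is a displayed commit followed by the parents git log reports for it.

```sh
#!/bin/sh
# Rebuild the five-commit history from the example in a throwaway
# repository and compare the parents git reports with and without --follow.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
for i in 1 2 3 4 5; do
  # Only commit2 and commit5 touch alloc.c; the others touch other.c.
  if [ "$i" = 2 ] || [ "$i" = 5 ]; then
    echo "alloc change $i" >>alloc.c
  else
    echo "other change $i" >>other.c
  fi
  git add .
  git commit -q -m "commit$i"
done

# summarize [OPTION...] (hypothetical helper): for each commit shown by
# `git log --parents [OPTION...] -- alloc.c`, print
# "<subject>: <subjects of the parents git reports>".
summarize() {
  git log --parents --format='%H %P' "$@" -- alloc.c |
  while read -r commit parents; do
    line="$(git log -1 --format=%s "$commit"):"
    for p in $parents; do
      line="$line $(git log -1 --format=%s "$p")"
    done
    echo "$line"
  done
}

echo "path-limited (parents rewritten):"
summarize
echo "with --follow (real parents):"
summarize --follow
```

With the path-limited walk, commit5 should be reported with commit2 as its parent, while the --follow walk reports the real parent, commit4, which is the same kind of difference as in the original question.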

On another note, I don't understand why we turn off parent rewriting
when '--follow' is given. It confuses users and also makes --graph
impossible to use with '--follow'...

Regards!
Bo
-- 
My blog: http://blog.morebits.org

* Re: Git log follow question
  2010-05-14  0:57 Git log follow question Albert Krawczyk
  2010-05-14  4:16 ` Bo Yang
@ 2010-05-14  4:37 ` Jeff King
  2010-05-14 14:50   ` Linus Torvalds
       [not found] ` <21464_1273811837_4BECD37D_21464_745_1_20100514043704.GC6075@coredump.intra.peff.net>
  2 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2010-05-14  4:37 UTC (permalink / raw)
  To: Albert Krawczyk; +Cc: Linus Torvalds, git

On Fri, May 14, 2010 at 10:57:34AM +1000, Albert Krawczyk wrote:

> I'm having an issue understanding the way git log --follow works with git
> log --parents
>
> When I run
> git log --parents --pretty=format:Commit:%H%nParent:%P%n%n alloc.c
> I get:
> Commit:4b25d091ba53c758fae0096b8c0662371857b9d9
> Parent:100c5f3b0b27ec6617de1a785c4ff481e92636c1
>
> [...]
>
> When I try to run git log --parents --follow I get this:
> git log --parents --follow --pretty=format:Commit:%H%nParent:%P%n%n alloc.c
>
> Commit:4b25d091ba53c758fae0096b8c0662371857b9d9
> Parent:75b44066f3ed7cde238cdea1f0bf9e2f1744c820

Hmm. The actual parent is 75b44066. You get 100c5f in the first case
because basic revision path-limiting simplifies the history graph to
remove uninteresting commits (and rewrites the parents).

So the answer isn't _wrong_ exactly, but it is less useful. Seeing the
simplified graph is generally what we want. This is a limitation of the
way --follow is implemented. It turns off history pruning because our
list of what to prune will be changing over time.

Probably we would have to special-case the FOLLOW_RENAMES code to
rewrite the parent list before display.
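One way to experiment with that idea without touching git's C code is to post-process the --follow output. The sketch below is hypothetical (follow_with_parents is an invented name, not a git command) and assumes a linear history: each displayed commit is given the next older displayed commit as its parent, which on linear history is what path-limited parent rewriting reports. Merges are not handled.

```sh
#!/bin/sh
# follow_with_parents PATH (hypothetical helper, linear history only):
# print "<commit> <rewritten parent>" for every commit `git log --follow`
# displays, using the next older displayed commit as the rewritten parent.
follow_with_parents() {
  # --follow lists commits newest first, so each commit's stand-in parent is
  # the line that comes after it; printing is delayed by one line.
  git log --follow --format=%H -- "$1" | {
    prev=""
    while read -r commit; do
      if [ -n "$prev" ]; then echo "$prev $commit"; fi
      prev=$commit
    done
    # The oldest displayed commit has no displayed ancestor, so its parent
    # list is empty (like the "Parent:" line of a file-creating commit).
    if [ -n "$prev" ]; then echo "$prev "; fi
  }
}
```

On a linear history without renames, its output should match `git log --parents --format='%H %P' -- PATH`; across a rename it keeps the displayed commits connected instead of pointing at commits that --follow hid.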

I'm cc'ing Linus, who has more of a clue in both of those areas than I
do.

-Peff

* RE: Git log follow question
       [not found] ` <21464_1273811837_4BECD37D_21464_745_1_20100514043704.GC6075@coredump.intra.peff.net>
@ 2010-05-14  4:43   ` Albert Krawczyk
  0 siblings, 0 replies; 12+ messages in thread
From: Albert Krawczyk @ 2010-05-14  4:43 UTC (permalink / raw)
  To: 'Jeff King', struggleyb.nku; +Cc: git

Bo and Peff,

Thank you very much for your explanation. 

Linus, thanks in advance for looking at this.

Regards,
Albert 

* Re: Git log follow question
  2010-05-14  4:37 ` Jeff King
@ 2010-05-14 14:50   ` Linus Torvalds
  2010-05-14 15:19     ` Martin Langhoff
  2010-05-25  9:31     ` Jeff King
  0 siblings, 2 replies; 12+ messages in thread
From: Linus Torvalds @ 2010-05-14 14:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Albert Krawczyk, git



On Fri, 14 May 2010, Jeff King wrote:
> 
> I'm cc'ing Linus, who has more of a clue in both of those areas than I
> do.

I'm pretty sure I mentioned this exact issue when I posted the
original follow patches, and it basically boils down to: "--follow" is a
total hack, and does _not_ use the regular commit filtering function, and
as a result, fancy things like "--parents" don't really work well with it.

IOW, I'm not at all certain that it is fixable. "--follow" is a very
fundamentally non-gitty thing to do, and really is a complete hack. It's a
fairly _small_ hack - if you didn't know better and looked at the source
code, you might think that it fits very naturally into git. But no.

Now, it's possible that we could hack up --parents to work with --follow
too, but quite frankly, I don't know how. Because the --follow hack really
basically boils down to:

 - do _not_ prune commits at all (this is the thing that normally
   simplifies the parenthood and removes uninteresting commits)

 - for the whole list of normal commits in "git log", do the patch 
   generation with a magic special hack that looks for renames.

 - if it was a rename, change the path that we magically track, so that 
   next commit that we look at, we'll follow the new (older) path.

 - if the patch is empty, we force-hide the commit (internally, this is 
   the "rev->always_show_header = 0;" thing)

and the key here is that we do all the magic at the _end_ of the queue,
long after we've done the pruning of commits that normally does the
parenthood rewriting.
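The four steps above can be imitated with plumbing commands to see why pruning and --follow do not mix: the path being matched changes during the walk, so it cannot be decided up front which commits to prune. This is a rough, hypothetical shell sketch (follow_sketch is an invented name), not git's implementation; it assumes linear history, plain file names, and rename detection via `git diff-tree -M` only.

```sh
#!/bin/sh
# follow_sketch PATH (hypothetical): imitate the --follow steps described
# above. Prints "<commit> <status>" for each commit touching the followed
# file. Assumes linear history and names without tabs or newlines.
follow_sketch() {
  path=$1
  tab=$(printf '\t')
  # Step 1: visit every commit reachable from HEAD, newest first, without
  # any path pruning.
  for commit in $(git rev-list HEAD); do
    hit=""
    # Step 2: diff the commit against its parent (the empty tree for the
    # root) with rename detection. Lines are "<status> PATH", or
    # "R<score> OLD NEW" for renames, tab-separated. The here-document
    # keeps the loop in this shell, so changes to $path persist.
    while IFS=$tab read -r status src dst; do
      case $status in
        R*)
          if [ "$dst" = "$path" ]; then
            # Step 3: renamed into the followed name; older commits are
            # matched against the old name from now on.
            hit=$status
            path=$src
            continue
          fi
          ;;
      esac
      if [ "$src" = "$path" ]; then hit=$status; fi
    done <<EOF
$(git diff-tree --root --no-commit-id -r -M --name-status "$commit")
EOF
    # Step 4: commits that never touched the followed name stay hidden.
    if [ -n "$hit" ]; then echo "$commit $hit"; fi
  done
}
```

Because the followed name only changes while the loop runs, nothing in it could have been used to prune commits before the walk, which is the point made above.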

If we want --follow and --parents to work together, you'd need to move the
special rename hack to be in the early phases. I'm sure it's possible. It
might even be reasonably simple. But it's very fundamentally not what we
do now.

And no, I'm unlikely to look at it. Sorry. I have used --follow 
occasionally, but it's a hack to see "ok, there it got renamed". It would 
be nice if "gitk --follow <pathname>" worked properly, but it's just not 
something I care very much about.

			Linus

* Re: Git log follow question
  2010-05-14 14:50   ` Linus Torvalds
@ 2010-05-14 15:19     ` Martin Langhoff
  2010-05-14 15:29       ` Linus Torvalds
       [not found]       ` <22729_1273851106_4BED6CE2_22729_6897_1_alpine.LFD.2.00.1005140827250.3711@i5.linux-foundation.org>
  2010-05-25  9:31     ` Jeff King
  1 sibling, 2 replies; 12+ messages in thread
From: Martin Langhoff @ 2010-05-14 15:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Albert Krawczyk, git

On Fri, May 14, 2010 at 10:50 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> If we want --follow and --parent to work together, you'd need to move the
> special rename hack to be in the early phases. I'm sure it's possible. It
> might even be reasonably simple. But it's very fundamentally not what we
> do now.
...
> It would
> be nice if "gitk --follow <pathname>" worked properly, but it's just not
> something I care very much about.

Putting the internal machinery aside, it would be enormously useful
for the end user.

The Linux kernel is unusual in that there are relatively few renames /
reorgs in the mainline -- maintainers push back and force those things
to happen before a patchset is merged. And you (as the lead
maintainer) probably know all the renames in your own project.

The use case for this is: "Where the hell does this WTF-worthy
function come from, in this WTF-esque old codebase I just inherited?"

cheers,


m
-- 
 martin.langhoff@gmail.com
 martin@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

* Re: Git log follow question
  2010-05-14 15:19     ` Martin Langhoff
@ 2010-05-14 15:29       ` Linus Torvalds
       [not found]       ` <22729_1273851106_4BED6CE2_22729_6897_1_alpine.LFD.2.00.1005140827250.3711@i5.linux-foundation.org>
  1 sibling, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2010-05-14 15:29 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Jeff King, Albert Krawczyk, git



On Fri, 14 May 2010, Martin Langhoff wrote:
> 
> The use case for this is: "Where the hell does this WTF-worthy
> function come from, in this WTF-esque old codebase I just inherited?"

Umm. And git does that better than anything else. 

"git log --follow" works fine. As does "git blame -C".

It's just that gitk does not, because it wants to show the graph.

Anyway, if you feel strongly about it, and really want "gitk --follow", 
you really need to do it yourself. I gave you some pointers. I personally 
don't think it's worth it.

		Linus

* RE: Git log follow question
       [not found]       ` <22729_1273851106_4BED6CE2_22729_6897_1_alpine.LFD.2.00.1005140827250.3711@i5.linux-foundation.org>
@ 2010-05-14 22:39         ` Albert Krawczyk
  0 siblings, 0 replies; 12+ messages in thread
From: Albert Krawczyk @ 2010-05-14 22:39 UTC (permalink / raw)
  To: git

Hi,

Thank you everybody for your replies.

I believe that this functionality would be useful; however, sadly my C
skills are non-existent so I don't even know how to start looking at this
problem.

I don't suppose there are other developers on here that would be interested
in having a look at this functionality?

Thanks again,
Albert 

* Re: Git log follow question
  2010-05-14 14:50   ` Linus Torvalds
  2010-05-14 15:19     ` Martin Langhoff
@ 2010-05-25  9:31     ` Jeff King
  2010-05-25 18:49       ` Linus Torvalds
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2010-05-25  9:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Albert Krawczyk, git

On Fri, May 14, 2010 at 07:50:26AM -0700, Linus Torvalds wrote:

> I'm pretty sure I mentioned about this exact issue when I posted the 
> original follow patches, and it basically boils down to: "--follow" is a 
> total hack, and does _not_ use the regular commit filtering function, and 
> as a result, fancy things like "--parent" don't really work well with it.
>
> [...]
>
> And no, I'm unlikely to look at it. Sorry. I have used --follow 
> occasionally, but it's a hack to see "ok, there it got renamed". It would 
> be nice if "gitk --follow <pathname>" worked properly, but it's just not 
> something I care very much about.

Thanks for the input. I took a look at it myself and it is a bit more
complex than just turning on pruning. I have a prototype --follow that
handles arbitrary pathspecs instead of single files; instead of
replacing the single-file pathspec, it just widens the pathspec as it
traverses history. That eliminates some of the issues, but I am still
getting some odd results from --parents.

So I am giving up for now, as it is not something I care that much
about, either (though multiple-file --follow is). However, Bo Yang, one
of the GSoC students, is planning on working on it as part of his
line-level history browsing project. So we'll see what comes of that.
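The widening approach can be illustrated by changing one line of a naive follow loop: on a rename into a tracked name, add the old name to the set instead of replacing it. The following is a hypothetical shell sketch (widening_follow and in_specs are invented names), not Peff's prototype, which is not shown in the thread; it assumes linear history, names without whitespace, and treats each spec as a file name or directory prefix.

```sh
#!/bin/sh
# in_specs NAME (hypothetical helper): succeed if NAME equals one of the
# tracked specs in $specs or lies under one of them as a directory.
in_specs() {
  for spec in $specs; do
    case $1 in
      "$spec"|"$spec"/*) return 0 ;;
    esac
  done
  return 1
}

# widening_follow SPEC... (hypothetical): print every commit that touches a
# tracked spec, widening the set with the old name at each rename.
widening_follow() {
  local specs
  specs=$*
  tab=$(printf '\t')
  for commit in $(git rev-list HEAD); do
    hit=""
    while IFS=$tab read -r status src dst; do
      case $status in
        R*)
          if in_specs "$dst"; then
            hit=1
            # Widen, never narrow: keep the new name, add the old one.
            in_specs "$src" || specs="$specs $src"
            continue
          fi
          ;;
      esac
      if [ -n "$src" ] && in_specs "$src"; then hit=1; fi
    done <<EOF
$(git diff-tree --root --no-commit-id -r -M --name-status "$commit")
EOF
    if [ -n "$hit" ]; then echo "$commit"; fi
  done
}
```

If an unrelated file once used the same name (say an old b.c that was deleted before a.c was renamed to b.c), the widened set picks up that older history too, which plain --follow would not report.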

-Peff

* Re: Git log follow question
  2010-05-25  9:31     ` Jeff King
@ 2010-05-25 18:49       ` Linus Torvalds
  2010-05-26  5:58         ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2010-05-25 18:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Albert Krawczyk, git



On Tue, 25 May 2010, Jeff King wrote:
> 
> Thanks for the input. I took a look at it myself and it is a bit more
> complex than just turning on pruning. I have a prototype --follow that
> handles arbitrary pathspecs instead of single files; instead of
> replacing the single-file pathspec, it just widens the pathspec as it
> traverses history.

Doing it "right" is actually a _lot_ more complex than that.

Think especially about the case of the file having been renamed in one 
branch, and in another branch it was created from scratch, and then a 
merge that sorts it all out (think two people aiming for the same thing, 
just doing it differently - but with similar approaches).

Now, imagine reaching the common commit by walking _one_ of the chains 
before having walked the other one fully. So now you're looking at a 
commit using one set of pathnames, and then later on you'll hit the _same_ 
commit (through the other branch), but with another set of pathnames. But 
by then you've already handled that commit.

The above isn't an issue with the regular pathname pruning, because the
pruning rules never change - so the order of handling commits never
matters, and you can do the pruning before/independently of having done any
history following.

And it's not an issue with the current total hack, because the current 
total hack doesn't even _try_ to handle it, and doesn't even really try to 
do anything proper. The current hack was very much by design a "hey,
this is about as good as CVS/SVN could ever do", rather than anything that
has any good design.

		Linus

* Re: Git log follow question
  2010-05-25 18:49       ` Linus Torvalds
@ 2010-05-26  5:58         ` Jeff King
  2010-05-26 14:40           ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2010-05-26  5:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Albert Krawczyk, git

On Tue, May 25, 2010 at 11:49:25AM -0700, Linus Torvalds wrote:

> On Tue, 25 May 2010, Jeff King wrote:
> > 
> > Thanks for the input. I took a look at it myself and it is a bit more
> > complex than just turning on pruning. I have a prototype --follow that
> > handles arbitrary pathspecs instead of single files; instead of
> > replacing the single-file pathspec, it just widens the pathspec as it
> > traverses history.
> 
> Doing it "right" is actually a _lot_ more complex than that.

Did you mean doing history rewriting right is more complex than that, or
did you mean that handling multiple follow pathspecs is more complex
than pathspec-widening (where "handling multiple pathspecs" means making
"--follow subdir" work about as well as "--follow file", but not
actually doing real history rewriting)?

If the former, I agree.

If the latter, I am not sure it is any worse than the single-file follow
case.

For example, consider this history:

  echo content >file && git add . && git commit -m base
  git mv file new && git commit -m moved
  sleep 1 ;# to ensure timestamp difference
  git checkout -b other HEAD^
  echo changes >>file && git commit -a -m changes
  git merge master

We'll traverse in this order:

  merge
  changes (to file)
  moved (from file to new)
  base (create file)

If I do "git log --follow new" with the current master, I will see only
"moved" and "base". I don't see "changes" because it operates on "file",
not "new". But if we reverse the order in which the two branches'
commits were made, then we will parse "moved" first, and we _will_ see
"changes", because we've updated our pathspec. So it matters when we
traverse the rename.
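The ordering effect can be checked with a self-contained variant of the recipe above. In this sketch (an assumption, not part of the original message) the follow_demo and at helpers are invented, and fixed committer dates replace `sleep 1` so the date-ordered walk is deterministic; the first argument is the time of "moved", the second the time of "changes".

```sh
#!/bin/sh
# Self-contained variant of the recipe above. Fixed dates replace `sleep 1`
# so that git's date-ordered walk visits the commits in a known order.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# at SECONDS CMD... (hypothetical helper): run CMD with author and committer
# dates pinned to SECONDS since the epoch ("<timestamp> <offset>" format).
at() {
  when=$1
  shift
  GIT_AUTHOR_DATE="$when +0000" GIT_COMMITTER_DATE="$when +0000" "$@"
}

# follow_demo MOVED_TIME CHANGES_TIME (hypothetical): build the history in a
# new temporary repository and print the subjects `git log --follow new`
# shows. Runs in a subshell so the cd does not leak.
follow_demo() (
  set -e
  cd "$(mktemp -d)"
  git init -q .
  echo content >file && git add . && at 1000000000 git commit -q -m base
  main=$(git symbolic-ref --short HEAD)   # "master" or a configured default
  git mv file new && at "$1" git commit -q -m moved
  git checkout -q -b other HEAD^
  echo changes >>file && at "$2" git commit -q -a -m changes
  at 1000000100 git merge -q --no-edit "$main" >/dev/null
  git log --follow --format=%s -- new
)

echo '"changes" committed after "moved":'
follow_demo 1000000001 1000000002
echo '"changes" committed before "moved":'
follow_demo 1000000002 1000000001
```

If the walk behaves as described, "changes" should be missing from the first listing and present in the second.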

For "base" and everything prior to it, in general we will already have
traversed the rename because we try to do things in date order. But in
the face of clock skew, it is possible to follow the ancestry down
before hitting a rename on another branch.

So my point is that even with the current --follow, there are already
corner cases where traversal order matters. Which is maybe the point you
were trying to make, too, but it was unclear to me from your example whether you
meant that the problem was _worse_ with simple expansion of pathspecs
(i.e., not actually turning on revs->prune) than the current --follow.

-Peff

* Re: Git log follow question
  2010-05-26  5:58         ` Jeff King
@ 2010-05-26 14:40           ` Linus Torvalds
  0 siblings, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2010-05-26 14:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Albert Krawczyk, git



On Wed, 26 May 2010, Jeff King wrote:
> > 
> > Doing it "right" is actually a _lot_ more complex than that.
> 
> Did you mean doing history rewriting right is more complex than that,

History rewriting with changing pathspecs.

> or did you mean that handling multiple follow pathspecs is more complex 
> than pathspec-widening

No, the "expand pathspec to cover the newly found rename" part is pretty 
simple. But the fact that the pathspec changes over the history inevitably 
leads to the problem of finding commits in the right order.

The thing is, if the pathspec is history-dependent, then that means that 
in order to get it right, you should walk the history in topological order 
in order to get a proper pathspec. But you don't know what the topological 
order _is_ until you've walked the history - which in turn means that if 
you want to get "perfect" results, you need to walk the history first, and 
then have a separate phase to do the pathspec.

That's actually what the current --follow kind of does, but because the 
current follow isn't even trying to get a proper pathspec in the bigger 
picture (it only tracks a single global filename rather than widening the 
net), it also skips the topological part, since even if it did things in 
topological order it would _still_ get things wrong.

Doing it really right also actually would require making the pathspec be a 
per-commit thing rather than a single global one. Otherwise you get other 
odd effects, if that filename has ever been something different. But since 
you only do a simple widening, I guess you don't much care (you already 
get odd effects if there was a criss-cross rename, and will end up picking 
up the history for _both_ files, rather than just the original one).
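A per-commit pathspec walked in topological order might look like the following. It is a hypothetical shell sketch (per_commit_follow is an invented name), not a proposal from the thread: each commit records the names handed down by its children, renames found by `git diff-tree -M` are translated per parent, and a commit is printed when the tracked names differ from every parent, loosely mirroring git's TREESAME rule. It detects renames only, keeps its state in a temporary directory, and assumes names without whitespace.

```sh
#!/bin/sh
# per_commit_follow REV PATH (hypothetical): print
# "<commit> <names tracked at that commit>" for commits that change one of
# those names relative to every parent (root commits are always printed).
per_commit_follow() {
  state=$(mktemp -d)   # one file per commit, listing the names tracked there
  tab=$(printf '\t')
  echo "$2" >"$state/$(git rev-parse "$1")"
  # --topo-order shows no parent before all of its children, so a commit's
  # list of names is complete by the time the walk reaches it.
  git rev-list --topo-order --parents "$1" >"$state/order"
  while read -r commit parents; do
    [ -f "$state/$commit" ] || continue
    names=$(cat "$state/$commit")
    if [ -z "$parents" ]; then echo "$commit" $names; continue; fi
    changed_all=1
    for parent in $parents; do
      changed=""
      for path in $names; do
        old=$path                  # unchanged: same name in this parent
        while IFS=$tab read -r status src dst; do
          case $status in
            R*) if [ "$dst" = "$path" ]; then old=$src; changed=1; fi ;;
            *)  if [ "$src" = "$path" ]; then
                  changed=1
                  if [ "$status" = A ]; then old=""; fi   # absent in parent
                fi ;;
          esac
        done <<EOF
$(git diff-tree --no-commit-id -r -M --name-status "$parent" "$commit")
EOF
        # Hand the (possibly renamed) name down to this parent, once.
        if [ -n "$old" ] && ! grep -qxF "$old" "$state/$parent" 2>/dev/null; then
          echo "$old" >>"$state/$parent"
        fi
      done
      [ -n "$changed" ] || changed_all=""
    done
    if [ -n "$changed_all" ]; then echo "$commit" $names; fi
  done <"$state/order"
  rm -rf "$state"
}
```

On Peff's branch-and-merge history, this reports "changes" under the name file and "moved" under new regardless of the commit dates, because --topo-order guarantees that a commit is only visited after all of its children.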

				Linus
