git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Petr Baudis <pasky@suse.cz>, Git Mailing List <git@vger.kernel.org>
Subject: Re: git-rev-list: add "--dense" flag
Date: Sat, 22 Oct 2005 13:37:42 -0700 (PDT)	[thread overview]
Message-ID: <Pine.LNX.4.64.0510221251330.10477@g5.osdl.org> (raw)
In-Reply-To: <7voe5iqk82.fsf@assigned-by-dhcp.cox.net>



On Fri, 21 Oct 2005, Junio C Hamano wrote:
> 
> If rev-list can optionally be told to detect renames internally
> (it has necessary bits after all), it could adjust the set of
> paths to follow when it sees something got renamed, either by
> replacing the original path given from the command line with its
> previous name, or adding its previous name to the set of path
> limitters (to cover the copy case as well).

The problem with renames isn't the straight-line case. 

The problem with renames is the merge case. And quite frankly, I don't 
know how to handle that sanely.

If everything was straight-line (CVS), renames would be trivial. But 
git-rev-list very fundamentally works on something that isn't. So let's 
look at the _fundamental_ problem:

 - git-rev-list traverses one commit at a time, and it doesn't even _know_ 
   whether it has seem all the parents of that commit during the first 
   phase.

   The first phase is the "limit_list()" thing, which is when we decide 
   which commits are reachable, uninteresting, and which merges to follow.

   Now, think about this: since we don't even know that we've seen all 
   parents, that pretty much by definition means that we can't know what 
   has been renamed if we were to track it.

 - in the second phase (which is where I do the densification), we do 
   actually have the full tree and that makes a lot of things a lot easier 
   to do. When it comes to dense, for example, it means that I know all 
   the UNINTERESTING logic has already been done, and that all merges have 
   been simplified.

   But in the second phase, we couldn't do rename detection either, since 
   by then, we've already fixed the list of names as far as merges are 
   concerned.

And note that this fundamental issue is true _whether_ we have some 
explicit rename information in a commit or not. 

Git-rev-list has a few additional issues that make it even more 
interesting:

 - git-rev-list fundamentally is designed for multiple heads. There is no 
   one special "origin" head. We might not have _one_ end-point, we might 
   have twenty-five different heads we're looking at at the same time. Try

	gitk --all --dense -d -- git-fetch.c

   on the current git archive, and be impressed by how well it works (and 
   yes, you'll see a funny artifact: "--dense" never removes a line of 
   development entirely, so you'll see a single dot for the two 
   "unrelated" lines: "gitk" and "to-do" do not share a root with the rest 
   of them. You'll get a single dot for that root, even if it 
   obviously doesn't actually change "git-fetch.c").

   You'll also very clearly see the "Big tool rename" which created that 
   name: it will be the first commit after the root that is visible that 
   way.

 - path limiters are fundamentally designed to take a list of paths and 
   directories. Personally, I find that to be a lot more important than a 
   single file. That's very efficient the way we do it, but if you were to 
   _change_ the list of paths, you'd basically have to track those changes 
   over history.

   (This actually happens even with a single path - imagine a merge, and a 
   rename down one line of the merge. You'll have to keep track of 
   different files for the different lines of development, and then when 
   they join together, and the rename only happened in one of them, you'll 
   have to make an executive decision, or just continue to track _both_)

So what do I propose? I propose that you realize that "git-rev-list" is 
just the _building_ block. It does one thing, and one thing only: it 
creates revision list.

A user basically _never_ uses git-rev-list directly: it just doesn't make 
sense. It's not what git-rev-list is there for. git-rev-list is meant to 
be used as a base for doing the real work. And that's how you can use it 
for renaming too.

If you think of "git-rev-list --dense -- name" as a fast way to get a set 
of commits that affect "name", suddenly it all makes sense. You suddenly 
realize that that's a nice building block for figuring out renames. It's 
not _all_ of it, but it's a big portion.

To go back to "gitk", let's see what the path limitation shows us. Right 
now, doing a

	gitk --all --dense -d -- git-fetch.sh

only shows that particular name, and that's by design. Maybe that's what 
the user wants? You have to realize that especially if you remember an 
_old_ name that may not even exist any more, that's REALLY what you want. 
Something that works more like "annotate" is useless, because something 
that works like annotate would just say "I don't have that file, I can't 
follow renames" and exit.

So the first lesson to learn is that following just pure path-names is 
actually meaningful ON ITS OWN! Sometimes you do NOT want to follow 
renames.

For example, let's say that you used to work on git 4 months ago, but gave 
up, since it was too rough for you. But you played around with it, and now 
you're back, and you have an old patch that you want to re-apply, but it 
doesn't talk about "git-fetch.sh", it talks about "git-fetch-script". So 
you do

	gitk --dense -- git-fetch-script

and voila, it does the right thing, and top of the thing is "Big tool 
rename", which tells you _exactly_ what happened to that PATHNAME. 

See? Static pathnames are important! 

Now, this does show that when you _do_ care about renames, "gitk" right 
now doesn't help you very much (no offense to gitk - how could it know 
that git-rev-list would give it pathname detection one day?). Let's 
go back to the original example, and see what we could do to make gitk 
more useful..

	gitk --all --dense -d -- git-fetch.sh

Go and select that "Big tool rename" thing, and start looking for the 
rename..

You won't find it. Why? You'll see all-new files, no renames. It turns out 
that "gitk" follows the "parent" thing a bit _too_ slavishly, which is 
correct on one level, but in this case with "--dense" it turns out that 
what you want to do is see only what _that_ commit does, not what that 
commit did relative to its parent (which was the initial revision).

So while "--dense" made gitk work even without any changes, it's clear 
that the new capability means that gitk might want to have a new button: 
"show diff against 'fake parent'" and "show diff against 'real parent'".

If you want the global view, the default gitk behaviour is correct (it 
will show a "valid" diff - you'll see everything that changed between the 
points it shows). But in a rename-centric world, you want the _local_ 
change to that commit, and right now gitk can't show you that.

So for trackign renames, we probably want that as a helper to gitk.

Also, gitk has a "re-read references" button, but if you track renames, 
you probably want to do more than re-read them: you want to re-DO them 
with the "Big rename" as the new head (forget the old references 
entirely), and with the name list changed. New functionality (possibly 
you'd like to havea "New wiew" button, which actually starts a new gitk 
so that you can see both of them). Right now you'd have to do it by hand:

	gitk --dense 215a7ad1ef790467a4cd3f0dcffbd6e5f04c38f7 -- git-fetch-script 

(where 215a.. is the thing you get when you select the "Big tool rename")

You'd also probably like to have some way to limit the names shown for the 
git-diff-tree in gitk. 

In short, the new "gitk --dense -- filename" doesn't help you nearly as 
much as it _could_. But when you squint a bit, I think you'll see how it's 
quite possible to do...

		Linus

  reply	other threads:[~2005-10-22 20:38 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-10-21 23:40 git-rev-list: add "--dense" flag Linus Torvalds
2005-10-21 23:47 ` Linus Torvalds
2005-10-21 23:55 ` Linus Torvalds
2005-10-22  0:37 ` Petr Baudis
2005-10-22  0:47   ` Handling renames Petr Baudis
2005-10-22  1:28     ` Linus Torvalds
2005-10-22  1:51       ` Petr Baudis
2005-10-22  2:10         ` Junio C Hamano
2005-10-22  2:49           ` Petr Baudis
2005-10-22  3:23         ` Linus Torvalds
2005-10-22  1:26   ` git-rev-list: add "--dense" flag Linus Torvalds
2005-10-22  2:56     ` Petr Baudis
2005-10-22  3:30       ` Linus Torvalds
2005-10-22  3:55     ` Junio C Hamano
2005-10-22 20:37       ` Linus Torvalds [this message]
2005-10-25 18:07 ` Jonas Fonseca
2005-10-25 18:23   ` Linus Torvalds
2005-10-25 18:40     ` Jonas Fonseca
2005-10-25 18:52       ` Linus Torvalds
2005-10-25 18:50     ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2005-10-23 19:30 Marco Costalba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0510221251330.10477@g5.osdl.org \
    --to=torvalds@osdl.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    --cc=pasky@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).