From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Petr Baudis <pasky@suse.cz>, Git Mailing List <git@vger.kernel.org>
Subject: Re: git-rev-list: add "--dense" flag
Date: Sat, 22 Oct 2005 13:37:42 -0700 (PDT) [thread overview]
Message-ID: <Pine.LNX.4.64.0510221251330.10477@g5.osdl.org> (raw)
In-Reply-To: <7voe5iqk82.fsf@assigned-by-dhcp.cox.net>
On Fri, 21 Oct 2005, Junio C Hamano wrote:
>
> If rev-list can optionally be told to detect renames internally
> (it has necessary bits after all), it could adjust the set of
> paths to follow when it sees something got renamed, either by
> replacing the original path given from the command line with its
> previous name, or adding its previous name to the set of path
> limitters (to cover the copy case as well).
The problem with renames isn't the straight-line case.
The problem with renames is the merge case. And quite frankly, I don't
know how to handle that sanely.
If everything was straight-line (CVS), renames would be trivial. But
git-rev-list very fundamentally works on something that isn't. So let's
look at the _fundamental_ problem:
- git-rev-list traverses one commit at a time, and it doesn't even _know_
whether it has seem all the parents of that commit during the first
phase.
The first phase is the "limit_list()" thing, which is when we decide
which commits are reachable, uninteresting, and which merges to follow.
Now, think about this: since we don't even know that we've seen all
parents, that pretty much by definition means that we can't know what
has been renamed if we were to track it.
- in the second phase (which is where I do the densification), we do
actually have the full tree and that makes a lot of things a lot easier
to do. When it comes to dense, for example, it means that I know all
the UNINTERESTING logic has already been done, and that all merges have
been simplified.
But in the second phase, we couldn't do rename detection either, since
by then, we've already fixed the list of names as far as merges are
concerned.
And note that this fundamental issue is true _whether_ we have some
explicit rename information in a commit or not.
Git-rev-list has a few additional issues that make it even more
interesting:
- git-rev-list fundamentally is designed for multiple heads. There is no
one special "origin" head. We might not have _one_ end-point, we might
have twenty-five different heads we're looking at at the same time. Try
gitk --all --dense -d -- git-fetch.c
on the current git archive, and be impressed by how well it works (and
yes, you'll see a funny artifact: "--dense" never removes a line of
development entirely, so you'll see a single dot for the two
"unrelated" lines: "gitk" and "to-do" do not share a root with the rest
of them. You'll get a single dot for that root, even if it
obviously doesn't actually change "git-fetch.c").
You'll also very clearly see the "Big tool rename" which created that
name: it will be the first commit after the root that is visible that
way.
- path limiters are fundamentally designed to take a list of paths and
directories. Personally, I find that to be a lot more important than a
single file. That's very efficient the way we do it, but if you were to
_change_ the list of paths, you'd basically have to track those changes
over history.
(This actually happens even with a single path - imagine a merge, and a
rename down one line of the merge. You'll have to keep track of
different files for the different lines of development, and then when
they join together, and the rename only happened in one of them, you'll
have to make an executive decision, or just continue to track _both_)
So what do I propose? I propose that you realize that "git-rev-list" is
just the _building_ block. It does one thing, and one thing only: it
creates revision list.
A user basically _never_ uses git-rev-list directly: it just doesn't make
sense. It's not what git-rev-list is there for. git-rev-list is meant to
be used as a base for doing the real work. And that's how you can use it
for renaming too.
If you think of "git-rev-list --dense -- name" as a fast way to get a set
of commits that affect "name", suddenly it all makes sense. You suddenly
realize that that's a nice building block for figuring out renames. It's
not _all_ of it, but it's a big portion.
To go back to "gitk", let's see what the path limitation shows us. Right
now, doing a
gitk --all --dense -d -- git-fetch.sh
only shows that particular name, and that's by design. Maybe that's what
the user wants? You have to realize that especially if you remember an
_old_ name that may not even exist any more, that's REALLY what you want.
Something that works more like "annotate" is useless, because something
that works like annotate would just say "I don't have that file, I can't
follow renames" and exit.
So the first lesson to learn is that following just pure path-names is
actually meaningful ON ITS OWN! Sometimes you do NOT want to follow
renames.
For example, let's say that you used to work on git 4 months ago, but gave
up, since it was too rough for you. But you played around with it, and now
you're back, and you have an old patch that you want to re-apply, but it
doesn't talk about "git-fetch.sh", it talks about "git-fetch-script". So
you do
gitk --dense -- git-fetch-script
and voila, it does the right thing, and top of the thing is "Big tool
rename", which tells you _exactly_ what happened to that PATHNAME.
See? Static pathnames are important!
Now, this does show that when you _do_ care about renames, "gitk" right
now doesn't help you very much (no offense to gitk - how could it know
that git-rev-list would give it pathname detection one day?). Let's
go back to the original example, and see what we could do to make gitk
more useful..
gitk --all --dense -d -- git-fetch.sh
Go and select that "Big tool rename" thing, and start looking for the
rename..
You won't find it. Why? You'll see all-new files, no renames. It turns out
that "gitk" follows the "parent" thing a bit _too_ slavishly, which is
correct on one level, but in this case with "--dense" it turns out that
what you want to do is see only what _that_ commit does, not what that
commit did relative to its parent (which was the initial revision).
So while "--dense" made gitk work even without any changes, it's clear
that the new capability means that gitk might want to have a new button:
"show diff against 'fake parent'" and "show diff against 'real parent'".
If you want the global view, the default gitk behaviour is correct (it
will show a "valid" diff - you'll see everything that changed between the
points it shows). But in a rename-centric world, you want the _local_
change to that commit, and right now gitk can't show you that.
So for trackign renames, we probably want that as a helper to gitk.
Also, gitk has a "re-read references" button, but if you track renames,
you probably want to do more than re-read them: you want to re-DO them
with the "Big rename" as the new head (forget the old references
entirely), and with the name list changed. New functionality (possibly
you'd like to havea "New wiew" button, which actually starts a new gitk
so that you can see both of them). Right now you'd have to do it by hand:
gitk --dense 215a7ad1ef790467a4cd3f0dcffbd6e5f04c38f7 -- git-fetch-script
(where 215a.. is the thing you get when you select the "Big tool rename")
You'd also probably like to have some way to limit the names shown for the
git-diff-tree in gitk.
In short, the new "gitk --dense -- filename" doesn't help you nearly as
much as it _could_. But when you squint a bit, I think you'll see how it's
quite possible to do...
Linus
next prev parent reply other threads:[~2005-10-22 20:38 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-10-21 23:40 git-rev-list: add "--dense" flag Linus Torvalds
2005-10-21 23:47 ` Linus Torvalds
2005-10-21 23:55 ` Linus Torvalds
2005-10-22 0:37 ` Petr Baudis
2005-10-22 0:47 ` Handling renames Petr Baudis
2005-10-22 1:28 ` Linus Torvalds
2005-10-22 1:51 ` Petr Baudis
2005-10-22 2:10 ` Junio C Hamano
2005-10-22 2:49 ` Petr Baudis
2005-10-22 3:23 ` Linus Torvalds
2005-10-22 1:26 ` git-rev-list: add "--dense" flag Linus Torvalds
2005-10-22 2:56 ` Petr Baudis
2005-10-22 3:30 ` Linus Torvalds
2005-10-22 3:55 ` Junio C Hamano
2005-10-22 20:37 ` Linus Torvalds [this message]
2005-10-25 18:07 ` Jonas Fonseca
2005-10-25 18:23 ` Linus Torvalds
2005-10-25 18:40 ` Jonas Fonseca
2005-10-25 18:52 ` Linus Torvalds
2005-10-25 18:50 ` Linus Torvalds
-- strict thread matches above, loose matches on Subject: below --
2005-10-23 19:30 Marco Costalba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0510221251330.10477@g5.osdl.org \
--to=torvalds@osdl.org \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
--cc=pasky@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).