* Following renames
@ 2006-03-26  1:49 Petr Baudis
  2006-03-26  2:49 ` Junio C Hamano
  2006-03-26  3:19 ` Linus Torvalds
  0 siblings, 2 replies; 41+ messages in thread
From: Petr Baudis @ 2006-03-26  1:49 UTC
To: git

Hi,

so, now that I've put up with the fuzzy rename autodetection (for now), I'd like to make cg-log auto-follow renames, and I'm wondering about the best implementation (it seems that I won't manage without core Git cooperation). I think it should be possible to implement it in a way that has minimal performance impact, so that I can have it turned on by default.

Now I'm using the notorious

	git-rev-list listoffiles | git-diff-tree --stdin

pipeline in cg-log, and I'm wondering about the best way to add rename detection there.

In [1], Linus suggests a non-core solution. Unfortunately, it doesn't fly - it requires at least two git-ls-tree calls per revision, which would bog things down awfully (to roughly half of the original speed). But even if git-rev-list reported disappearing files, Cogito would have to do a lot of complicated bookkeeping in order to properly track renames in parallel branches - for each 'head' commit at any point of the history traversal, you need to record a separate set of interesting files. It would also have to restart git-rev-list at any moment when a rename happens on any of the head commits. Scales well not.

An obvious solution would be to have git-diff-tree --follow which updates its interesting path set based on seen renames, and now that I've written about non-linear history, it's obvious that it's incorrect. The other obvious way to go is then to add rename detection support to git-rev-list, and it's less obvious that this is a dead end too - I didn't inspect the code myself yet, but for now I trust Linus in [2] (I didn't quite understand the argument, I guess I need to sleep on it).

So, any thoughts about how to approach this? Either git-diff-tree would have to be taught about the heads bookkeeping, or the git-rev-list hurdles would have to be overcome, or we might have a git-rev-rename-filter or something, but that feels quite redundant and might meet with the same problems as git-rev-list.

== References ==

[1] Oct 21 <Pine.LNX.4.64.0510211814050.10477@g5.osdl.org>
[2] Oct 22 <Pine.LNX.4.64.0510221251330.10477@g5.osdl.org>

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.
I think I have forgotten this before.
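The per-head bookkeeping described above can be sketched as a toy model in C. This is a hypothetical illustration, not Cogito or git code: commits live in an array ordered so that parents come after children, and each child-to-parent edge carries the renames seen when diffing the child against *that* parent. Every commit hands its own set of interesting paths to each parent, translated only through that edge's renames, so a rename on one branch never leaks into the other side of a merge.

```c
#include <assert.h>
#include <string.h>

#define MAX_PARENTS 2
#define MAX_PATHS   4

/* "new_path in the child was old_path in this particular parent" */
struct rename { const char *old_path, *new_path; };

/* One child->parent edge. A merge has one edge per parent, so each side
 * of the merge carries its own (hypothetical) rename list. */
struct edge {
	int parent;                     /* index into the commit array */
	const struct rename *renames;
	int nr_renames;
};

struct commit {
	const char *name;
	struct edge parents[MAX_PARENTS];
	int nr_parents;
	const char *paths[MAX_PATHS];   /* interesting paths in this commit */
	int nr_paths;
};

static void add_path(struct commit *c, const char *path)
{
	for (int i = 0; i < c->nr_paths; i++)
		if (strcmp(c->paths[i], path) == 0)
			return;                 /* already interesting here */
	assert(c->nr_paths < MAX_PATHS);
	c->paths[c->nr_paths++] = path;
}

/* Name of `path` on the parent side of the edge (unchanged if not renamed). */
static const char *map_through(const struct edge *e, const char *path)
{
	for (int i = 0; i < e->nr_renames; i++)
		if (strcmp(e->renames[i].new_path, path) == 0)
			return e->renames[i].old_path;
	return path;
}

/* Index 0 is the tip; every parent index must be larger than its child's.
 * Each commit passes its own path set to each parent separately. */
static void follow(struct commit *commits, int nr, const char *path)
{
	add_path(&commits[0], path);
	for (int i = 0; i < nr; i++) {
		struct commit *c = &commits[i];
		for (int p = 0; p < c->nr_parents; p++) {
			const struct edge *e = &c->parents[p];
			assert(e->parent > i && e->parent < nr);
			for (int k = 0; k < c->nr_paths; k++)
				add_path(&commits[e->parent],
					 map_through(e, c->paths[k]));
		}
	}
}
```

For a merge M whose first parent A renamed old.c to new.c while the second parent B still has old.c, following new.c from M leaves A following new.c and B following old.c, and both lineages converge on old.c at their common root. The strict topological ordering is a simplifying assumption; a real traversal would have to handle commits reached before all their children have been visited.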
* Re: Following renames
From: Junio C Hamano @ 2006-03-26  2:49 UTC
To: Petr Baudis; +Cc: git

Petr Baudis <pasky@ucw.cz> writes:

> An obvious solution would be to have git-diff-tree --follow which
> updates its interesting path set based on seen renames, and now that
> I've written about non-linear history, it's obvious that it's incorrect.
> The other obvious way to go is then to add rename detection support to
> git-rev-list, and it's less obvious that this is a dead end too - I
> didn't inspect the code myself yet, but for now I trust Linus in [2]
> (I didn't quite understand the argument, I guess I need to sleep on it).

I'd have to sleep on how the core side can help Porcelains, but I think it is a good thing that you, one of the most vocal advocates on the list for doing rename recording, are thinking about this issue and probably would look into rev-list.c soon.

Looking at the evolution of rev-list.c file itself was a good exercise to realize that rename tracking (more specifically, having whatchanged to follow renames) is not such a useful thing (at least for me).

If I am interested in rev-list.c's evolution from "the set of command line flags it supported" point of view, then whatchanged to show the history of rev-list.c file itself would be a very good way to show that to me. rev-list_usage[] = "..." stayed there almost from the beginning.

However, if I am interested in how the way it traverses the commits has changed over time, I would need to start from revision.c and switch to rev-list.c when that part of the code was split out from it, because the current rev-list.c does not have the main part of the traversal logic at all.

Another example. Today's tar-tree updates have one interesting function I think should belong to strbuf.c, and before merging it to the mainline, I may move that function from tar-tree.c to strbuf.c. After that happens, if I run "whatchanged strbuf.c" to see where that function came from, I would want it to notice it came from tar-tree.c, although it is not a rename at all. Just one function moved from a file to another.

What this suggests is that switching the set of paths to follow while traversing ancestry chain needs to depend on which part of the original file you are interested in. Marking "this commit renames (or copies) file A to file B" is not that useful -- for that matter, detecting at runtime like we currently do is not better either. If a file A and file B were cleaned up and merged into a single file C, which is in the tip of the tree, which one you would want whatchanged to switch following depends on which part of the C you were interested in.

Unless you are interested in the _entire_ contents of the file, that is. Then tracking or even recording renames becomes useful, but that is a special case.

That is the reason I am not so enthused about recording renames. I think the time is better spent on enhancing what pickaxe tries to do (currently it does very little), which I hinted in a separate message late last night.

But that does not have to stop you, and does not have to stop me from thinking about ways to help you either.
* Re: Following renames
From: Jakub Narebski @ 2006-03-26  3:52 UTC
To: git

Junio C Hamano wrote:

> Petr Baudis <pasky@ucw.cz> writes:
>
>> An obvious solution would be to have git-diff-tree --follow which
>> updates its interesting path set based on seen renames, and now that
>> I've written about non-linear history, it's obvious that it's incorrect.
>> The other obvious way to go is then to add rename detection support to
>> git-rev-list, and it's less obvious that this is a dead end too - I
>> didn't inspect the code myself yet, but for now I trust Linus in [2]
>> (I didn't quite understand the argument, I guess I need to sleep on it).
>
> I'd have to sleep on how the core side can help Porcelains, but
> I think it is a good thing that you, one of the most vocal
> advocates on the list for doing rename recording, are thinking
> about this issue and probably would look into rev-list.c soon.
>
> Looking at the evolution of rev-list.c file itself was a good
> exercise to realize that rename tracking (more specifically,
> having whatchanged to follow renames) is not such a useful
> thing (at least for me).
[...]
> What this suggests is that switching the set of paths to follow
> while traversing ancestry chain needs to depend on which part of
> the original file you are interested in.  Marking "this commit
> renames (or copies) file A to file B" is not that useful -- for
> that matter, detecting at runtime like we currently do is not
> better either.  If a file A and file B were cleaned up and
> merged into a single file C, which is in the tip of the tree,
> which one you would want whatchanged to switch following depends
> on which part of the C you were interested in.
>
> Unless you are interested in the _entire_ contents of the file,
> that is.  Then tracking or even recording renames becomes
> useful, but that is a special case.
>
> That is the reason I am not so enthused about recording renames.
> I think the time is better spent on enhancing what pickaxe tries
> to do (currently it does very little), which I hinted in a
> separate message late last night.

I think one of the better ideas/suggestions about *recording* filenames was
in the "impure renames / history tracking" thread
http://marc.theaimsgroup.com/?l=git&m=114122175216489&w=2
<Pine.LNX.4.64.0603011343170.13612@sheen.jakma.org>
about adding *auxiliary* (helper) information about renames in commits.

I'm not sure about recording parts of the file that were moved or copied. That might be left to runtime detection in the likes of pickaxe.

As it would be helper-only information, it would ensure backwards compatibility (older versions would ignore the additional information) and forward compatibility (newer versions would fall back to the current runtime rename tracking/detection).

To be generic, I think that the command to record a rename/copy or copy'n'paste/cut'n'paste would take a set of source files (one or more; zero or more if we want an option to mark a file as new, suppressing any superficial similarities) and a set of destination files (one or more; zero or more if we also want to record deletions). A source file repeated among the destinations would mean it was a copy; a source file not repeated would mean a rename or cut'n'paste.

Such information can, I guess, be easily entered by the user... if one remembers to record the rename/cut'n'paste/etc., that is. Perhaps there could be a way to easily add such information later, for example by confirming detected renames/relationships during a merge... It would be much more difficult for the user to enter the ranges unassisted.
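The classification rule in the proposal above (source repeated among destinations means copy, not repeated means rename or cut'n'paste, empty sets for new files and deletions) can be written down as a small C function. This is only a sketch of the rule as stated; the enum and function names are hypothetical, and no such command existed in git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum hint_kind { HINT_NEW_FILE, HINT_DELETION, HINT_COPY, HINT_RENAME };

/* Classify what a recorded hint says about one source path.
 * src == NULL stands for "no source": the destinations are new files.
 * An empty destination set records a deletion. Otherwise a source that
 * is repeated among the destinations survived (copy / copy'n'paste),
 * and one that is not repeated went away (rename / cut'n'paste). */
static enum hint_kind classify_source(const char *src,
				      const char *const *dst, size_t nr_dst)
{
	assert(dst || nr_dst == 0);
	if (!src)
		return HINT_NEW_FILE;
	if (nr_dst == 0)
		return HINT_DELETION;
	for (size_t i = 0; i < nr_dst; i++)
		if (strcmp(dst[i], src) == 0)
			return HINT_COPY;
	return HINT_RENAME;
}
```

Classifying per source path, rather than per hint, lets one hint mix both cases, for example a file split into itself plus a new module (copy) alongside another file that disappeared into it (rename).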
What worries me is that such information, recorded in "own fields to the GIT revision messages" (in commits), can be used only if you track the ancestry; it doesn't help if you only have two or more revisions and do not build a relationship graph between them. But maybe I worry unnecessarily...

BTW, following renames is important not only for examining file [contents] history, in the likes of diff, whatchanged, annotate/blame and pickaxe, but also for merges.

References:
===========
* http://marc.theaimsgroup.com/?l=linux-kernel&m=111314792424707
* http://article.gmane.org/gmane.comp.version-control.git/217
* http://marc.theaimsgroup.com/?l=git&m=114123702826251
* http://marc.theaimsgroup.com/?l=git&m=114315795227271
-- 
Jakub Narebski
Warsaw, Poland
* Re: Following renames
From: Paul Jakma @ 2006-03-27  6:00 UTC
To: Jakub Narebski; +Cc: git

On Sun, 26 Mar 2006, Jakub Narebski wrote:

> I think one of the better ideas/suggestions about *recording* filenames was
> in the "impure renames / history tracking" thread
> http://marc.theaimsgroup.com/?l=git&m=114122175216489&w=2
> <Pine.LNX.4.64.0603011343170.13612@sheen.jakma.org>

For the record, the responses I received were educational ;). Sufficiently so that I no longer think renames should be recorded. At least, definitely not as renames. I now grok the reasoning for doing it by 'similarity' - it is indeed a *much* more useful concept. (E.g. the 'pickaxe' idea people keep alluding to sounds amazingly useful.)

So the question really is what, if any, weaknesses the current similarity estimation has, and how to solve them. I can think of two weaknesses:

1. The similarity algorithms can potentially be expensive, and they essentially get run a lot with the same inputs, to produce the same results - over and over as one works with a git repo. (There was a thread a while ago on this, I think.)

2. Some 'similarities' are just not deducible by the current software state of the art. E.g. where some code is rewritten in another language:

	foo.X -> foo.Y

   The high-level algorithms may remain exactly the same, but the code may be unrecognisable as similar except to a human. However, tracking history back across this rewrite probably would still be valuable to the human.

So I think what /might/ be interesting is to have a 'similarity cache', which would help 1, and to allow for manual injection of such hints (into a separate and stronger cache, most likely), which would help 2. Something to record the following information:

	(tree1,tree2)[1]:
		Id1 <-> Id1'
		.
		.
		.
		Idn <-> Idn'

That would allow:

1. Performance repercussions of similarity estimation to be one-time, cached thereafter. (Throw-away information: if a better similarity estimation heuristic comes along, you can rebuild this cache.)

2. The user to inject their own 'hints' into similarity estimation, particularly for cases that just aren't obvious and probably never will be to software estimators (e.g. the rewrite cases), but where the user sees value in being able to follow back the history.

Avoids:

- encoding anything permanently into the repository (which was something I was thinking of, and others before me apparently, but which I now accept would be an awful idea ;) ).

[1] I'm not sure if it should be indexed by (commit ID) or by (tree1,tree2) tuple. ??

regards,
-- 
Paul Jakma  paul@clubi.ie  paul@jakma.org  Key ID: 64A2FF6A
Fortune:
Men take only their needs into consideration -- never their abilities.
		-- Napoleon Bonaparte
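The 'similarity cache' proposed above can be sketched in C as a small hash table keyed by the (tree1, tree2) pair and holding the Id <-> Id' pairs. Everything here is a hypothetical illustration: the fixed capacity, the FNV-1a hash, the names, and the choice of a tree-pair key (the message leaves open whether a commit ID would be better). A second table of the same type holds user-injected hints and is consulted first, matching the "separate and stronger cache" idea.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 64   /* hypothetical fixed capacity */
#define MAX_PAIRS    8

struct sim_pair { char from[41], to[41]; };   /* ids: up to 40 chars + NUL */

struct cache_entry {
	int used;
	char tree1[41], tree2[41];
	struct sim_pair pairs[MAX_PAIRS];
	int nr_pairs;
};

struct sim_cache { struct cache_entry slot[CACHE_SLOTS]; };

static uint32_t fnv1a(const char *s, uint32_t h)
{
	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

/* Open addressing with linear probing on the (tree1, tree2) key. */
static struct cache_entry *find_slot(struct sim_cache *c, const char *t1,
				     const char *t2, int create)
{
	uint32_t h = fnv1a(t2, fnv1a(t1, 2166136261u));

	assert(strlen(t1) <= 40 && strlen(t2) <= 40);
	for (uint32_t n = 0; n < CACHE_SLOTS; n++) {
		struct cache_entry *e = &c->slot[(h + n) % CACHE_SLOTS];
		if (!e->used) {
			if (!create)
				return NULL;
			e->used = 1;
			snprintf(e->tree1, sizeof(e->tree1), "%s", t1);
			snprintf(e->tree2, sizeof(e->tree2), "%s", t2);
			return e;
		}
		if (strcmp(e->tree1, t1) == 0 && strcmp(e->tree2, t2) == 0)
			return e;
	}
	return NULL;   /* table full */
}

/* Record "from <-> to" for the tree pair; returns -1 if out of room. */
static int cache_add(struct sim_cache *c, const char *t1, const char *t2,
		     const char *from, const char *to)
{
	struct cache_entry *e = find_slot(c, t1, t2, 1);

	if (!e || e->nr_pairs == MAX_PAIRS)
		return -1;
	snprintf(e->pairs[e->nr_pairs].from, sizeof(e->pairs[0].from), "%s", from);
	snprintf(e->pairs[e->nr_pairs].to, sizeof(e->pairs[0].to), "%s", to);
	e->nr_pairs++;
	return 0;
}

/* User-injected hints win over computed (throw-away) results. */
static const struct cache_entry *cache_lookup(struct sim_cache *hints,
					      struct sim_cache *computed,
					      const char *t1, const char *t2)
{
	const struct cache_entry *e = find_slot(hints, t1, t2, 0);
	return e ? e : find_slot(computed, t1, t2, 0);
}
```

Because the computed table is derived data, it can be discarded and rebuilt whenever the similarity heuristic improves, while the hints table would have to be preserved.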
* Re: Following renames
From: Petr Baudis @ 2006-03-26 10:52 UTC
To: Junio C Hamano; +Cc: git

(Note that I do *not* want to raise the explicit vs. implicit rename tracking argument, in case anyone would misunderstand. I've accepted implicit rename tracking as a fact of Git life for now. I just want to make use of it now. ;-)

Dear diary, on Sun, Mar 26, 2006 at 04:49:48AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> Looking at the evolution of rev-list.c file itself was a good
> exercise to realize that rename tracking (more specifically,
> having whatchanged to follow renames) is not such a useful
> thing (at least for me).

Well, no one argues that rename tracking cures all the woes of hackerkind and anything more precise than that is useless. I'm rather saying that rename tracking indeed _is_ a special case of something more general and truly very interesting, but a special case so frequent that it's worth doing even if we can't do the general case yet. Or at least people *think* it's very frequent, and it gives them the warm fuzzy feeling of knowing that the tool can handle it (at least somehow) - and the warm fuzzy feeling is important, especially if you're trusting your sources to the tool.

So, obviously, you'll find plenty of counter-examples where rename detection won't help. I don't argue that. I merely say that there will still be enough cases where following renames will help to warrant doing it.

Now, Git history has enough examples of where rename following would be useful. When I'm digging into the history, I'm hitting the big tools rename barrier all the time, and just yesterday, when wondering about jdl's <snap> removal from git.txt, I hit 2cf565c53 - coming along any file to that commit should make me follow Documentation/core-git.txt out of the commit (well, that's rather copy than rename detection).

> Another example.  Today's tar-tree updates have one interesting
> function I think should belong to strbuf.c, and before merging
> it to the mainline, I may move that function from tar-tree.c to
> strbuf.c.  After that happens, if I run "whatchanged strbuf.c"
> to see where that function came from, I would want it to notice
> it came from tar-tree.c, although it is not a rename at all.
> Just one function moved from a file to another.

A wild pickaxe - when the string disappears from file X, scan all the changes in the commit and start following files where it reappears. This should help, right? But when you want to implement this, you hit the exact same problems as when you try to follow renames; only a different part of diffcore detects it. So, what I'm trying to solve is actually not just following renames but a more general problem.

> If a file A and file B were cleaned up and merged into a single file
> C, which is in the tip of the tree, which one you would want
> whatchanged to switch following depends on which part of the C you
> were interested in.

If in doubt (and the user does not use pickaxe to clarify it), you can just follow both. The user will get some extra stuff (or maybe not even that, if he wants to know about pieces from both), but we are at least trying to be useful and DTRT instead of doing nothing in case we would by any chance not do the very best.

> Unless you are interested in the _entire_ contents of the file,
> that is.  Then tracking or even recording renames becomes
> useful, but that is a special case.

A frequent (and wanted) special case.

> That is the reason I am not so enthused about recording renames.
> I think the time is better spent on enhancing what pickaxe tries
> to do (currently it does very little), which I hinted in a
> separate message late last night.

Sure, pickaxe is cool, but as I said above, if you try to teach _it_ to follow files around, you'll hit the exact same problems as me. We're just trying to build something using lego blocks with different stuff inside but otherwise looking pretty much the same.

The thing with pickaxe is that frequently it would simply be more laborious to dig for and construct the proper pickaxe string than to just fire up cg-log -s filename with greedy rename following and quickly scan through the results.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.
I think I have forgotten this before.
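The "wild pickaxe" idea above can be sketched in C for a single commit, seen while walking history backwards. This is a hypothetical illustration using a plain substring search (strstr), not git's pickaxe (-S) implementation: if the followed file's new version contains the string and its old version does not, the string "disappears" as we step to the parent, so we look for another changed file whose old version already contained it and continue following that one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One file touched by a commit: contents in the parent (old) and in
 * the commit itself (new). NULL means the file did not exist there. */
struct file_change {
	const char *path;
	const char *old_text;
	const char *new_text;
};

static int contains(const char *text, const char *needle)
{
	return text && strstr(text, needle) != NULL;
}

/* Return the first other changed file whose old version contained the
 * needle, if the needle vanishes from `followed` at this commit; else NULL. */
static const char *wild_pickaxe(const struct file_change *changes, size_t nr,
				const char *followed, const char *needle)
{
	const struct file_change *self = NULL;

	assert(changes || nr == 0);
	for (size_t i = 0; i < nr; i++)
		if (strcmp(changes[i].path, followed) == 0)
			self = &changes[i];
	if (!self || !contains(self->new_text, needle) ||
	    contains(self->old_text, needle))
		return NULL;   /* the string did not appear in the followed file here */
	for (size_t i = 0; i < nr; i++)
		if (&changes[i] != self && contains(changes[i].old_text, needle))
			return changes[i].path;
	return NULL;
}
```

Returning only the first candidate keeps the sketch short; following "files" in the plural, as described above, would mean collecting every matching path and tracking each one separately.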
* Re: Following renames
From: Petr Baudis @ 2006-03-26 10:55 UTC
To: Junio C Hamano; +Cc: git

Dear diary, on Sun, Mar 26, 2006 at 12:52:48PM CEST, I got a letter
where Petr Baudis <pasky@suse.cz> said that...
> Well, no one argues that rename tracking cures all the woes of hackerkind
>                                                                ^^^^^^^^^^

Or is it hackerdom?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.
I think I have forgotten this before.
* Re: Following renames
From: Timo Hirvonen @ 2006-03-26 16:08 UTC
To: Junio C Hamano; +Cc: pasky, git

On Sat, 25 Mar 2006 18:49:48 -0800
Junio C Hamano <junkio@cox.net> wrote:

> Looking at the evolution of rev-list.c file itself was a good
> exercise to realize that rename tracking (more specifically,
> having whatchanged to follow renames) is not such a useful
> thing (at least for me).

It would be useful for me. I had all files organized in subdirectories, but then noticed it was not a good idea because make does not play nicely with subdirs, so I moved all files to the top-level directory. Now

    git-whatchanged -p file.c

stops at the big rename. To continue I have to do

    git-whatchanged -p <some-commit> -- <old-filename>

> Another example.  Today's tar-tree updates have one interesting
> function I think should belong to strbuf.c, and before merging
> it to the mainline, I may move that function from tar-tree.c to
> strbuf.c.  After that happens, if I run "whatchanged strbuf.c"
> to see where that function came from, I would want it to notice
> it came from tar-tree.c, although it is not a rename at all.
> Just one function moved from a file to another.

Yes, in this case you can do

$ git-whatchanged strbuf.c
$ git-whatchanged tar-tree.c

but after a rename...

$ git-whatchanged old-file.c
fatal: 'old-file.c': No such file or directory

$ touch old-file.c
$ git-whatchanged old-file.c

Hah, it worked!

Hmm... this works too without the touch-hack:

$ git-whatchanged file.c old-file.c

I wish I had known this before.

> What this suggests is that switching the set of paths to follow
> while traversing ancestry chain needs to depend on which part of
> the original file you are interested in.  Marking "this commit
> renames (or copies) file A to file B" is not that useful -- for
> that matter, detecting at runtime like we currently do is not
> better either.  If a file A and file B were cleaned up and
> merged into a single file C, which is in the tip of the tree,
> which one you would want whatchanged to switch following depends
> on which part of the C you were interested in.

OK, maybe following renames is not such a good idea. But for GUIs (gitk, qgit), following renames or even file merges (select a file to follow by clicking it) would be a big plus.

-- 
http://onion.dynserv.net/~timo/
* Re: Following renames
From: Linus Torvalds @ 2006-03-26 16:43 UTC
To: Timo Hirvonen; +Cc: Junio C Hamano, pasky, git

On Sun, 26 Mar 2006, Timo Hirvonen wrote:
>
> $ git-whatchanged old-file.c
> fatal: 'old-file.c': No such file or directory
>
> $ touch old-file.c
> $ git-whatchanged old-file.c
>
> Hah, it worked!

It worked even before:

	git-whatchanged -- old-file.c

always works.

If you think of the "filename spec" as _always_ having to have a "--" to separate the filenames from the other arguments, you're thinking the right way.

Then, there's a _shorthand_ for existing files, where we allow users to be lazy (because _I_ am very lazy indeed), which allows dropping the "--", but then the code requires that the filenames are real filenames as of now.

> Hmm... this works too without the touch-hack:
>
> $ git-whatchanged file.c old-file.c
>
> I wish I had known this before.

Actually, it -shouldn't- work. It's just that "git-rev-parse" isn't as anal as it should be.

Here's a fix.

		Linus

----
diff --git a/rev-parse.c b/rev-parse.c
index f90e999..104b1e2 100644
--- a/rev-parse.c
+++ b/rev-parse.c
@@ -172,7 +172,9 @@ int main(int argc, char **argv)
 		char *dotdot;
 
 		if (as_is) {
-			show_file(arg);
+			if (show_file(arg) && as_is < 2)
+				if (lstat(arg, &st) < 0)
+					die("'%s': %s", arg, strerror(errno));
 			continue;
 		}
 		if (!strcmp(arg,"-n")) {
@@ -192,7 +194,7 @@ int main(int argc, char **argv)
 
 		if (*arg == '-') {
 			if (!strcmp(arg, "--")) {
-				as_is = 1;
+				as_is = 2;
 				/* Pass on the "--" if we show anything but files.. */
 				if (filter & (DO_FLAGS | DO_REVS))
 					show_file(arg);
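The rule the patch enforces (after "--" anything is a path; in the lazy form without "--", every path must exist on disk right now) can be sketched as a standalone C function. This is a simplified, hypothetical model, not the real rev-parse.c: revision lookup is faked by accepting only the literal "HEAD", and the function reports the first offending argument instead of calling die().

```c
#define _POSIX_C_SOURCE 200112L   /* for lstat() under strict C modes */
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical stand-in: real rev-parse resolves refs and object
 * names; here only the literal "HEAD" counts as a revision. */
static int looks_like_revision(const char *arg)
{
	return strcmp(arg, "HEAD") == 0;
}

/* Returns 0 if the arguments are acceptable, otherwise the 1-based
 * position of the first path that does not exist on disk. */
static int check_args(int argc, const char **argv)
{
	int in_paths = 0;

	assert(argc >= 0 && (argv || argc == 0));
	for (int i = 0; i < argc; i++) {
		struct stat st;

		if (strcmp(argv[i], "--") == 0)
			return 0;      /* everything after "--" is a path, unchecked */
		if (!in_paths && looks_like_revision(argv[i]))
			continue;
		in_paths = 1;      /* lazy form: no "--", so the file must exist now */
		if (lstat(argv[i], &st) < 0)
			return i + 1;
	}
	return 0;
}
```

With this rule, "HEAD -- old-file.c" is accepted even though old-file.c is gone, while "HEAD file.c old-file.c" is rejected at old-file.c, which is the behaviour change described for "git-whatchanged file.c old-file.c".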
* Re: Following renames
From: Jakub Narebski @ 2006-03-26 16:31 UTC
To: git

I wonder what is the most common case in Linux kernel or git.

1.) renaming the file in the same directory, old-file.c to new-file.c?
2.) moving file to other directory (project reorganization),
    old-dir/file.c to new-dir/file.c?
3.) splitting file into modules, huge-file.c to file1.c, file2.c?
4.) copying fragment of one file to other?
5.) moving fragment of code from one file to other?

-- 
Jakub Narebski
Warsaw, Poland
* Re: Following renames
From: Linus Torvalds @ 2006-03-26 16:46 UTC
To: Jakub Narebski; +Cc: git

On Sun, 26 Mar 2006, Jakub Narebski wrote:
>
> I wonder what is the most common case in Linux kernel or git.
>
> 1.) renaming the file in the same directory, old-file.c to new-file.c?

The kernel uses subdirectories extensively, and a lot of renames (most of them, I'd say) are because of that subdirectory structure.

So the same-directory case is the unusual one, I'd say.

> 3.) splitting file into modules, huge-file.c to file1.c, file2.c?
> 4.) copying fragment of one file to other?
> 5.) moving fragment of code from one file to other?

I'd say that (5) is very common. And (4) happens a lot under certain circumstances (new driver, new architecture, new filesystem..).

Doing (3) happens, but probably less often than it should ;/

		Linus
* Re: Following renames
From: Jakub Narebski @ 2006-03-26 17:10 UTC
To: git

Linus Torvalds wrote:

> On Sun, 26 Mar 2006, Jakub Narebski wrote:
>>
>> I wonder what is the most common case in Linux kernel or git.
>>
>> 1.) renaming the file in the same directory, old-file.c to new-file.c?
>> 2.) moving file to other directory (project reorganization),
>>     old-dir/file.c to new-dir/file.c?

> The kernel uses subdirectories extensively, and a lot of renames (most of
> them, I'd say) are because of that subdirectory structure.
>
> So the same-directory case is the unusual one, I'd say.

If (2) is common enough then discussed improvements to rename detection,
namely comparing basenames as a base for candidate selection is a good idea.

I wonder how common is (2) compared to (1)+(2) i.e. move to other dir
and rename, old-dir/old-file.c to new-dir/new-subdir/new-file.c

>> 3.) splitting file into modules, huge-file.c to file1.c, file2.c?
>> 4.) copying fragment of one file to other?
>> 5.) moving fragment of code from one file to other?
>
> I'd say that (5) is very common. And (4) happens a lot under certain
> circumstances (new driver, new architecture, new filesystem..).
>
> Doing (3) happens, but probably less often than it should ;/

Detecting (4) and (5) fast (i.e. for merges) without auxiliary (helper)
information would probably be hard. For interrogation/porcelain-ish commands
(like pickaxe) would probably be easier.

-- 
Jakub Narebski
Warsaw, Poland
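The basename heuristic for rename candidate selection mentioned above can be sketched in a few lines of C. The function names are hypothetical; this is not git's diffcore code. For each deleted path, it proposes the first added path with the same final component. It also shows the limitation of the heuristic: old-dir/file.c to new-dir/file.c pairs up, but a move that also renames (old-dir/old-file.c to new-dir/new-subdir/new-file.c) yields no candidate at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Final path component ("dir/sub/file.c" -> "file.c"). Named to avoid
 * clashing with POSIX basename(), which may modify its argument. */
static const char *basename_of(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/* Index of the first added path sharing the deleted path's basename,
 * or -1 when the heuristic offers no candidate. */
static int basename_candidate(const char *deleted,
			      const char *const *added, size_t nr_added)
{
	const char *base;

	assert(deleted && (added || nr_added == 0));
	base = basename_of(deleted);
	for (size_t i = 0; i < nr_added; i++)
		if (strcmp(basename_of(added[i]), base) == 0)
			return (int)i;
	return -1;
}
```

In practice such a heuristic would only order or pre-select candidates; a content comparison would still decide whether a pair really is a rename.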
* Re: Following renames
From: Linus Torvalds @ 2006-03-26 18:10 UTC
To: Jakub Narebski; +Cc: git

On Sun, 26 Mar 2006, Jakub Narebski wrote:
>
> If (2) is common enough then discussed improvements to rename detection,
> namely comparing basenames as a base for candidate selection is a good idea.

BK had this "renametool" which got started automatically when you applied a patch that removed one or more files and added one or more files, so that you could then pair up the files manually.

It left the rename pairing 100% to the user, but it helped a bit by guessing what the pairing might be, and yes, it used the basenames to set up that initial guess.

It worked in many cases, but it also failed in many cases. I do think it was a useful heuristic within the BK model (since the _real_ choice was left to the user), but I don't think it's very useful for git.

The thing is, the fast rename detection that is in the "next" branch really does a lot better, and it's fast enough. (If you wanted to make it even faster, but less precise, you could limit it to the first few kilobytes of file contents - still a _lot_ better heuristic than the actual filename, and it would make the worst-case behaviour much better.)

> I wonder how common is (2) compared to (1)+(2) i.e. move to other dir
> and rename, old-dir/old-file.c to new-dir/new-subdir/new-file.c

I don't have any numbers, but from using renametool for a few years, my gut feel/recollection is that about half of renames in the kernel were moving to a new directory, and about half changed names (often in _addition_ to moving). But I didn't much think about it, so that's just a very rough guess based on using a tool that helped you do these things manually.

For example, one common case was a directory structure like

	..
	type-file1.c
	type-file2.c
	otherfiles.c
	yet-more.c
	..

being split up into a subdirectory

	..
	type/file1.c
	type/file2.c
	otherfiles.c
	yet-more.c
	..

(eg drivers/scsi/aic7xx-* being given a subdirectory of its own, as drivers/scsi/aic7xx/*). So the basename wouldn't stay the same, because it contained some piece of data that became redundant with the move.

> >> 3.) splitting file into modules, huge-file.c to file1.c, file2.c?
> >> 4.) copying fragment of one file to other?
> >> 5.) moving fragment of code from one file to other?
> >
> > I'd say that (5) is very common. And (4) happens a lot under certain
> > circumstances (new driver, new architecture, new filesystem..).
> >
> > Doing (3) happens, but probably less often than it should ;/
>
> Detecting (4) and (5) fast (i.e. for merges) without auxiliary (helper)
> information would probably be hard. For interrogation/porcelain-ish commands
> (like pickaxe) would probably be easier.

Yes. I don't think we necessarily want to merge automatically across things like that, even if it sounds like something you'd want in a perfect world. Stupid and obvious (and fails) is often better than smart and complex (and succeeds), because at least you _understand_ what happens.

But _following_ a particular change back is important, and should be both efficient and simple to do.

Ie the example tool I talked about in

	http://article.gmane.org/gmane.comp.version-control.git/217

is still relevant and important, I think. I literally think that people wouldn't even _want_ a "git annotate", if they instead had more of a visual tool that showed the current state of the file, and you could click on a line/set of lines to follow it back to the previous change to that area.

I'd argue that almost always when you want "annotate", you already have the particular place that you want to look at in mind (you're really not interested in the whole file).
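The suggestion above to compare only the first few kilobytes of file contents can be illustrated with a toy similarity score in C. This is a swapped-in technique, not the algorithm in git's "next" branch (diffcore-rename uses its own delta-based estimate): it hashes each line in the first `limit` bytes of both texts with FNV-1a and reports the percentage of source lines that also occur in the destination. The `limit` parameter is the knob that bounds the worst-case cost, at the price of precision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LINES 256

/* FNV-1a hash of each line within the first `limit` bytes of text. */
static size_t line_hashes(const char *text, size_t limit,
			  uint32_t *out, size_t max)
{
	size_t len = strlen(text), n = 0;
	uint32_t h = 2166136261u;
	int pending = 0;

	if (len > limit)
		len = limit;
	for (size_t i = 0; i < len && n < max; i++) {
		if (text[i] == '\n') {
			out[n++] = h;
			h = 2166136261u;
			pending = 0;
		} else {
			h = (h ^ (unsigned char)text[i]) * 16777619u;
			pending = 1;
		}
	}
	if (pending && n < max)
		out[n++] = h;          /* last line had no newline (or was cut) */
	return n;
}

/* Percentage (0..100) of src's lines that also occur in dst, looking
 * only at the first `limit` bytes of each; every dst line can be
 * matched at most once. Hash collisions are ignored in this sketch. */
static int similarity(const char *src, const char *dst, size_t limit)
{
	uint32_t a[MAX_LINES], b[MAX_LINES];
	int used[MAX_LINES] = { 0 };
	size_t na, nb, common = 0;

	assert(src && dst);
	na = line_hashes(src, limit, a, MAX_LINES);
	nb = line_hashes(dst, limit, b, MAX_LINES);
	if (na == 0)
		return nb == 0 ? 100 : 0;
	for (size_t i = 0; i < na; i++)
		for (size_t j = 0; j < nb; j++)
			if (!used[j] && a[i] == b[j]) {
				used[j] = 1;
				common++;
				break;
			}
	return (int)(common * 100 / na);
}
```

With "a\nb\nc\nd\n" against "a\nb\nc\nX\n", the full comparison gives 75, but a 4-byte limit sees only the identical first two lines and reports 100, which shows the precision that the shortcut gives up.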
So wouldn't it be _much_ nicer to have a "graphical git-whatchanged", where you just delve deeper (and you don't even look at the whole file like git-whatchanged does, but you ask for a very particular region).

Ie, what I imagine would be something gitk/qgit like, where you see the file content, select a line or two (or a whole function), and it goes back in history and shows you the last diff that changed that line/two/function. We can do that EFFICIENTLY. Much more efficiently than git-annotate, in fact. And then when you see the diff, you might say "I'm not interested in this one, that was just a re-indent" and then continue back.

THAT is the kind of graphical tool I'd want. And dammit, it should even be _easy_. I'm just a total klutz myself when it comes to doing things like QT or nice tcl/tk text-panes, and this really does have to be visual, since the whole point is that "select text" and interactive part.

So if somebody wants to be a hero, and feels comfortable with those kinds of things, this really should be a fairly straightforward thing to do (it would be useful even without rename detection or data movement detection, but it's also something where you really _could_ do efficient data movement detection by just looking at the "whole diff" when something changed in that small area).

		Linus
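The core of the "select a few lines, find the last diff that touched them" tool described above can be sketched in C, assuming each commit's change is available as unified-diff hunks against its parent. This is a hypothetical illustration, not gitk, qgit or git code. Walking newest-first, a change either touches the selected range (and is reported) or lies entirely above or below it; hunks above the range shift its line numbers by `old_count - new_count`, so the same lines can be located in the parent without reading any file contents. Pure deletions inside the range count as touching it.

```c
#include <assert.h>
#include <stddef.h>

/* Unified-diff hunk "@@ -old_start,old_count +new_start,new_count @@".
 * With a zero count, the start names the line after which the change sits. */
struct hunk { int old_start, old_count, new_start, new_count; };

struct change {                 /* one commit's diff against its parent */
	const struct hunk *hunks;   /* sorted, non-overlapping */
	size_t nr_hunks;
};

/* Does the hunk change anything inside lines lo..hi of the new side? */
static int touches(const struct hunk *h, int lo, int hi)
{
	if (h->new_count == 0)      /* pure deletion between two lines */
		return h->new_start >= lo && h->new_start < hi;
	return h->new_start <= hi && h->new_start + h->new_count - 1 >= lo;
}

/* Is the hunk entirely above line lo? */
static int lies_above(const struct hunk *h, int lo)
{
	if (h->new_count == 0)
		return h->new_start < lo;
	return h->new_start + h->new_count - 1 < lo;
}

/* Walk changes newest-first. Return the index of the first change that
 * touches lines *lo..*hi (1-based, inclusive), leaving *lo/*hi in that
 * commit's numbering; return -1 if no change touches the range. */
static int last_change(const struct change *changes, size_t nr,
		       int *lo, int *hi)
{
	assert(lo && hi && *lo <= *hi);
	for (size_t c = 0; c < nr; c++) {
		int shift = 0;

		for (size_t k = 0; k < changes[c].nr_hunks; k++) {
			const struct hunk *h = &changes[c].hunks[k];

			if (touches(h, *lo, *hi))
				return (int)c;
			if (lies_above(h, *lo))
				shift += h->old_count - h->new_count;
		}
		*lo += shift;           /* same lines, numbered as in the parent */
		*hi += shift;
	}
	return -1;
}
```

Skipping an uninteresting hit (the "that was just a re-indent" case) would require mapping the range through the touching hunk itself before continuing, which this sketch leaves out.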
* Re: Following renames 2006-03-26 18:10 ` Linus Torvalds @ 2006-03-26 19:22 ` Marco Costalba 2006-03-26 22:23 ` Linus Torvalds 2006-03-27 6:55 ` Jakub Narebski 1 sibling, 1 reply; 41+ messages in thread From: Marco Costalba @ 2006-03-26 19:22 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jakub Narebski, git On 3/26/06, Linus Torvalds <torvalds@osdl.org> wrote: > > > So wouldn't it be _much_ nicer to have a "graphical git-whatchanged", > where you just delve deeper (and you don't even look at the whole file > like git-whatchanged does, but you ask for a very particular region). > > Ie, what I imagine would be something gitk/qgit like, where you see the > file content, select a line or two (or a whole function), and it goes back > in history and shows you the last diff that changed that > line/two/function. We can do that EFFICIENTLY. Much more efficiently than > git-annotate, in fact. And then when you see the diff, you might say "I'm > not interested in this one, that was just a re-indent" and then continue > back. > > THAT is the kind of graphical tool I'd want. And dammit, it should even be > _easy_. I'm just a total clutz myself when it comes to doing things like > QT or nice tcl/tk text-panes, and this really does have to be visual, > since the whole point is that "select text" and interactive part. > > So if somebody wants to be a hero, and feels comfortable with those kinds > of things, this really should be a fairly straightforward thing to do (it > would be useful even without rename detection or data movement detection, > but it's also something where you really _could_ do efficient data > movement detection by just looking at the "whole diff" when something > changed in that small area). > I am a thousand miles away from being an hero (and glad of it), but.... 
I really need a bit of feedback or comment about this, because IMHO qgit annotate is *almost* what you are asking for, so I need to understand the difference well:

FIRST WAY

After annotating a file history (double click on a file name in bottom-right window or directly from tree view), you see the whole file annotated. If you have the diff window open you see also the corresponding patch (scrolled to selected file name).

Now, double clicking on the chosen code line in the file content currently does two things:

- The diff window is updated to show the corresponding revision patch, i.e. the last patch that modified that line of code.

- The file content, as well as the file annotation, changes to show the content of the file just after the patch was applied. From there it is normally possible to go back in the history of that code region in the same way, i.e. by double clicking on interesting lines.

The biggest limitation of 'annotation browsing' is that patches that only remove code are not annotated, and you need to check them directly in the diff window.

SECOND WAY

Without opening the file viewer it is possible to select a file (or more than one, or a directory) from the tree view and press the magic wand button. This causes the main view to be updated with the git-rev-list -- <selected paths> content, i.e. a filtered view. With the diff viewer window open you can browse across the patch history related to the chosen file.

The biggest limitation is that all the revisions that touch the file are shown, not only the ones limited to a selected region.

IF I HAVE UNDERSTOOD...

If I have understood correctly, what you would like to see is something like the following:

- From the diff/file viewer window, select a code region.

- Press the magic wand button and feed git-rev-list with <selected path> _and_ <selected content>.

- Show the git-rev-list output in the main window as usual, but now the revisions are filtered not only by path but also by the region of code they touch.

Am I guessing correctly?

Marco

* Re: Following renames
From: Linus Torvalds @ 2006-03-26 22:23 UTC
  To: Marco Costalba; +Cc: Jakub Narebski, git

On Sun, 26 Mar 2006, Marco Costalba wrote:
>
> FIRST WAY
>
> After annotating a file history (double click on a file name in
> bottom-right window or directly from tree view), you see the whole
> file annotated. If you have the diff window open you see also the
> corresponding patch (scrolled to selected file name).

The problem is that this step is already _way_ too expensive.

I don't want to work with any tool that makes "Step 1" take a minute or two for a project that has a few years of history. Try it on the linux historic project with some file that gets lots of modifications.

In other words, starting off with "annotate" is MUCH too expensive. You should start off basically with "git-whatchanged".

Linus

* Re: Following renames
From: Marco Costalba @ 2006-03-27 5:47 UTC
  To: Linus Torvalds; +Cc: Jakub Narebski, git

On 3/27/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Sun, 26 Mar 2006, Marco Costalba wrote:
> >
> > FIRST WAY
> >
> > After annotating a file history (double click on a file name in
> > bottom-right window or directly from tree view), you see the whole
> > file annotated. If you have the diff window open you see also the
> > corresponding patch (scrolled to selected file name).
>
> The problem is that this step is already _way_ too expensive.
>
> I don't want to work with any tool that makes "Step 1" take a minute or
> two for a project that has a few years of history. Try it on the linux
> historic project with some file that gets lots of modifications.
>

Historic Linux test (63428 revisions)

File: drivers/net/tg3.c
Revisions that modify tg3.c : 292

With qgit
15s to retrieve file history (git-rev-list)
19.5s to annotate (git-diff-tree -p, current GNU algorithm, not new faster one)

and...

$ time git-whatchanged HEAD drivers/net/tg3.c > /dev/null
98.01user 2.44system 1:46.19elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (797major+43033minor)pagefaults 0swaps

NOTE: It seems that git-whatchanged asks for the file to be checked out to work. It didn't work with no repository checked out.

Marco

* Re: Following renames
From: Junio C Hamano @ 2006-03-27 6:46 UTC
  To: Marco Costalba; +Cc: git

"Marco Costalba" <mcostalba@gmail.com> writes:

> NOTE: It seems that git-whatchanged asks for the file to be checked out to
> work. It didn't work with no repository checked out.

Perhaps,

    $ git-whatchanged HEAD -- drivers/net/tg3.c

as Linus explained in a separate message today...

* Re: Following renames
From: Linus Torvalds @ 2006-03-27 8:07 UTC
  To: Marco Costalba; +Cc: Jakub Narebski, git

On Mon, 27 Mar 2006, Marco Costalba wrote:
>
> Historic Linux test (63428 revisions)
>
> File: drivers/net/tg3.c
> Revisions that modify tg3.c : 292
>
> With qgit
> 15s to retrieve file history (git-rev-list)
> 19.5s to annotate (git-diff-tree -p, current GNU algorithm, not new faster one)

.. and it does absolutely _nothing_ while it's doing that, does it?

> $ time git-whatchanged HEAD drivers/net/tg3.c > /dev/null
> 98.01user 2.44system 1:46.19elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (797major+43033minor)pagefaults 0swaps

In contrast, git-whatchanged will start outputting the recent changes immediately.

And that's the point. Almost always, we're interested in the _recent_ stuff. The fact that it takes longer to get the old history is not very important. You generally don't ask "what changed in this file" for a file that hasn't changed in five years.

Linus
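The contrast here is between a batch computation and a lazy stream. The following is a minimal sketch of the streaming shape. It assumes the history is any iterable of `(commit_id, changed_paths)` pairs, newest first, as a hypothetical stand-in for what a `git-rev-list | git-diff-tree --stdin` pipeline produces. The generator hands out each match as soon as it is found, without first walking the older history.

```python
def whatchanged(history, path, inspected=None):
    """Lazily yield ids of commits (newest first) whose changed paths include `path`.

    history: iterable of (commit_id, changed_paths) pairs, newest first.
    inspected: optional list; every commit looked at is appended to it, which
    makes it visible how little of the history the first result costs.
    """
    for commit_id, changed_paths in history:
        if inspected is not None:
            inspected.append(commit_id)
        if path in changed_paths:
            yield commit_id
```

Calling `next()` on the generator returns the most recent matching commit after inspecting only the commits newer than it. A caller that only wants the recent changes never pays for the rest.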

* Re: Following renames
From: Marco Costalba @ 2006-03-27 11:19 UTC
  To: Linus Torvalds; +Cc: Jakub Narebski, git

On 3/27/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Mon, 27 Mar 2006, Marco Costalba wrote:
> >
> > Historic Linux test (63428 revisions)
> >
> > File: drivers/net/tg3.c
> > Revisions that modify tg3.c : 292
> >
> > With qgit
> > 15s to retrieve file history (git-rev-list)
> > 19.5s to annotate (git-diff-tree -p, current GNU algorithm, not new faster one)
>
> .. and it does absolutely _nothing_ while it's doing that, does it?
>

Yes, it's true.

> > $ time git-whatchanged HEAD drivers/net/tg3.c > /dev/null
> > 98.01user 2.44system 1:46.19elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (797major+43033minor)pagefaults 0swaps
>
> In contrast, git-whatchanged will start outputting the recent changes
> immediately.
>
> And that's the point. Almost always, we're interested in the _recent_
> stuff. The fact that it takes longer to get the old history is not very
> important. You generally don't ask "what changed in this file" for a file
> that hasn't changed in five years.
>

We could run git-rev-list with a time range specifier (changes in the last year, for example) by default so as to have fast results, and run the all-time history _only_ on request.

This could perhaps solve the problem of fast output for recent revs, if that is the problem.

I still think the problem with annotation is that you don't see patches that _remove_ lines of code, you need the whole diff for this.

Marco

* Re: Following renames
From: Johannes Schindelin @ 2006-03-27 11:30 UTC
  To: Marco Costalba; +Cc: git

Hi,

On Mon, 27 Mar 2006, Marco Costalba wrote:

> I still think the problem with annotation is that you don't see
> patches that _remove_ lines of code, you need the whole diff for this.

Interesting. You'd need a "git-emalb" (blame, but reverse), and you'd need to tell it a range "rev1..rev2" which is *not* to be interpreted as "^rev1 rev2" but as a direct path from rev1 to rev2.

Ciao,
Dscho
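The "git-emalb" idea can be modelled in a few lines. This is a hypothetical sketch, and no such git command is implied. It assumes the file's versions along one direct path from rev1 to rev2 are given oldest first. For every line of the rev1 version, it reports the step at which that line was deleted or replaced. `difflib` stands in for git's diff.

```python
from difflib import SequenceMatcher


def reverse_blame(versions):
    """versions: file contents (lists of lines) along one direct path, oldest first.

    Returns, for every line of versions[0], the index of the version in which
    that line was deleted or replaced, or None if it survives to the end.
    """
    removed_at = [None] * len(versions[0])
    # alive maps a line's position in the current version to its index in versions[0].
    alive = {k: k for k in range(len(versions[0]))}
    for step in range(1, len(versions)):
        old, new = versions[step - 1], versions[step]
        next_alive = {}
        ops = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()
        for tag, i1, i2, j1, j2 in ops:
            for pos in range(i1, i2):
                if pos not in alive:
                    continue  # line added after versions[0]: not tracked
                if tag == "equal":
                    next_alive[j1 + (pos - i1)] = alive[pos]
                else:
                    removed_at[alive[pos]] = step  # deleted or replaced here
        alive = next_alive
    return removed_at
```

Lines added along the way are ignored, because the question is where the *original* lines went. A line that moves is reported as removed at the step where the diff stops matching it.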

* Re: Following renames
From: Linus Torvalds @ 2006-03-27 16:52 UTC
  To: Marco Costalba; +Cc: Jakub Narebski, git

On Mon, 27 Mar 2006, Marco Costalba wrote:
>
> >
> > And that's the point. Almost always, we're interested in the _recent_
> > stuff. The fact that it takes longer to get the old history is not very
> > important. You generally don't ask "what changed in this file" for a file
> > that hasn't changed in five years.
>
> We could run git-rev-list with a time range specifier (changes in the
> last year, for example) by default so as to have fast results, and run
> the all-time history _only_ on request.

Yes.

However, what I've been meaning to do (but just haven't had the time and energy for so far) is to fix "git-rev-list" with a path limiter. Right now that always causes things to be totally serialized, and the revision walking will first look up _all_ the history (well, it will prune out the merges) before starting to output stuff.

So right now in order for "git-whatchanged" to be fast and incremental, it doesn't do any path limiting with git-rev-list at ALL, and does it all in git-diff-tree. Which is horrid.

> I still think the problem with annotation is that you don't see
> patches that _remove_ lines of code, you need the whole diff for this.

Well, that's just another reason "annotate" sucks. If you select a range of lines, my suggested tool _would_ show you lines that got removed there, and git-whatchanged does it quite well.

I really think "annotate" is _fundamentally_ a broken operation. It's not what any sane developer actually wants, and it has serious limitations (ie it depends on whole history, and it cannot show removals well).

Linus

* Re: Following renames
From: Marco Costalba @ 2006-03-27 11:55 UTC
  To: Linus Torvalds; +Cc: Jakub Narebski, git

On 3/27/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
> In contrast, git-whatchanged will start outputting the recent changes
> immediately.
>

To integrate git-whatchanged-like functionality with a filter on a specific code region (Linus's original request), I am wondering about something like this:

A new git-diff-tree option --range=a..b to delimit a region, identified by code line boundaries.

For example,

    git-diff-tree --range=10..15 HEAD -- <path>

could give these answers, added to the standard git-diff-tree output:

    * 10..25 --> modified region; the new region boundaries are lines 10 to 25
      15..20 --> region _NOT_ modified, but the new region boundaries are lines 15 to 20
                 (perhaps the patch added 5 lines _before_ the region)
      10..15 --> region _NOT_ modified, and lines, if any, were added/removed _after_ the region
    * 10..15 --> modified region with the same boundaries (for example, removing trailing whitespace)

With this new option of git-diff-tree it becomes very simple to retrieve a file history limited to a region, because the region boundaries output for one rev are fed as input for the parent rev.

Comments?

Marco
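The per-revision step of that proposal, mapping a region's boundaries from a revision into its parent, is mostly bookkeeping over diff hunks. The sketch below makes these assumptions: hunks carry no context lines and are given as 0-based, half-open ranges (`old[o1:o2]` became `new[n1:n2]`), and pure insertions or deletions have an empty side. `Hunk` and `range_in_parent` are hypothetical names. `--range` was only a proposal in this thread, not an existing git-diff-tree option.

```python
from typing import List, NamedTuple, Tuple


class Hunk(NamedTuple):
    """One change: old[o1:o2] became new[n1:n2] (0-based, half-open)."""
    o1: int
    o2: int
    n1: int
    n2: int


def _to_parent(p: int, hunks: List[Hunk], is_end: bool) -> int:
    """Translate a region boundary p (a position between lines) into the parent."""
    delta = 0
    for h in hunks:
        if h.n1 < p < h.n2:
            # Boundary falls inside a changed block: widen to that block's edge.
            return h.o2 if is_end else h.o1
        deletion_at_p = h.n1 == h.n2 == p
        # Hunks ending at or before p shift it; a deletion sitting exactly at the
        # region's end lies outside the region and must not move that boundary.
        if h.n2 < p or (h.n2 == p and not (is_end and deletion_at_p)):
            delta += (h.o2 - h.o1) - (h.n2 - h.n1)
    return p + delta


def range_in_parent(lo: int, hi: int, hunks: List[Hunk]) -> Tuple[bool, int, int]:
    """Map the region new[lo:hi] to (modified?, parent_lo, parent_hi)."""
    modified = any(
        (h.n1 < h.n2 and h.n1 < hi and lo < h.n2)   # changed lines overlap the region
        or (h.n1 == h.n2 and lo < h.n1 < hi)        # lines deleted inside the region
        for h in hunks
    )
    return modified, _to_parent(lo, hunks, False), _to_parent(hi, hunks, True)
```

A 1-based "10..15" corresponds to `lo=9, hi=15` here. With five lines inserted at the top (`Hunk(0, 0, 0, 5)`), the call returns `(False, 4, 10)`, meaning the unmodified region was lines 5..10 in the parent. Feeding that result into the next, older diff is the iteration described above.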

* Re: Following renames
From: Andreas Ericsson @ 2006-03-27 12:27 UTC
  To: Marco Costalba; +Cc: Linus Torvalds, Jakub Narebski, git

Marco Costalba wrote:
> On 3/27/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>> In contrast, git-whatchanged will start outputting the recent changes
>> immediately.
>
> To integrate git-whatchanged-like functionality with a filter on a
> specific code region (Linus's original request), I am wondering about
> something like this:
>
> A new git-diff-tree option --range=a..b to delimit a region,
> identified by code line boundaries.

Make it --line-range if you implement this. At first glance I thought you meant --commit-range.

--
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

* Re: Following renames
From: Jakub Narebski @ 2006-03-27 6:55 UTC
  To: git

Linus Torvalds wrote:

> On Sun, 26 Mar 2006, Jakub Narebski wrote:
>>
>> If (2) is common enough then discussed improvements to rename detection,
>> namely comparing basenames as a base for candidate selection is a good
>> idea.
>
> BK had this "renametool" which got started automatically when you applied
> a patch that removed one or more files and added one or more files, so
> that you could then pair up the files manually.
[...]
> The thing is, the fast rename detection that is in the "next" branch
> really does a lot better, and it's fast enough.

I was thinking about the fast rename detection algorithm in the "next" branch.

The question is whether recording additional (helper) information about contents copying and moving, like the mentioned "renametool" did, is worth the effort, both in coding it and from the user's point of view, or whether better detection of contents copying and moving ("rename detection") for whatchanged and similar tools would suffice.

I am of the opinion that voluntary information about contents moving and copying in the commits would help.

Purposes:

1.) Record contents moving and similarity information which cannot be calculated, or cannot be calculated easily; see Paul Jakma's response in this thread, Message-ID: <Pine.LNX.4.64.0603270642090.5276@sheen.jakma.org>. Examples: copying a fragment of code, or a small fragment of the whole file; creating a documentation or header file from code, or a code skeleton from a template; or rewriting code in a different language (e.g. shell script to Perl, or script to compiled code, e.g. Perl or Python to C).

2.) Caching the results of the similarity algorithm/rename detection tool (also Paul Jakma's post), including remembering false positives and undetected renames, for efficiency.
Automatically calculated parts might be thrown away.

Sources of information:

1.) Manually entered information *at commit*, including *-rm, *-mv, *-cp-like commands (which nobody likes), and a systematized (pseudolanguage?) way of describing contents copying and moving in the log messages.

2.) Semi-manual tools like the mentioned "renametool" of BK.

3.) Support from the editor (remembering where a copied-and-pasted or cut-and-pasted fragment came from), providing a prefilled command to record contents moving ("renames") or a prefilled commit log containing this information. Hard to get, probably most useful.

4.) Information from resolved merges and results of diagnosis (pickaxe-like) tools, especially recording "renames" which were not detected, and removing "renames" which were falsely detected.

Is this the place where I should provide code (a patch) for testing the idea :) ?

>> I wonder how common is (2) compared to (1)+(2) i.e. move to other dir
>> and rename, old-dir/old-file.c to new-dir/new-subdir/new-file.c
>
> For example, one common case was a directory structure like
>
>       ..
>       type-file1.c
>       type-file2.c
>       otherfiles.c
>       yet-more.c
>       ..
>
> being split up into a subdirectory
>
>       ..
>       type/file1.c
>       type/file2.c
>       otherfiles.c
>       yet-more.c
>       ..
>
> (eg drivers/scsi/aic7xx-* being given a subdirectory of its own, as
> drivers/scsi/aic7xx/*). So the basename wouldn't stay the same, because it
> contained some piece of data that became redundant with the move.

Perhaps the fast rename detection algorithm needs some smart similarity estimate for names, which would put more weight on the parts closer to the basename, and would detect */type-file1.c and */type/file1.c as similar.

--
Jakub Narebski
Warsaw, Poland
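One way to read that suggestion is to split paths into tokens, weight each token by how close it is to the end of the path, and compare the two weightings with a weighted Jaccard score. This is a hypothetical scoring function for illustration only, not what git's rename detection does. The separator set and the `decay` factor are arbitrary assumptions.

```python
import re


def _weights(path, decay):
    """Token -> weight; the last token (nearest the basename) weighs 1,
    the one before it `decay`, then decay**2, and so on."""
    tokens = [t for t in re.split(r"[/\-_.]", path) if t]
    weights = {}
    for distance, token in enumerate(reversed(tokens)):
        weights[token] = max(weights.get(token, 0.0), decay ** distance)
    return weights


def name_similarity(a, b, decay=0.5):
    """Weighted Jaccard similarity of two paths' name tokens, in [0, 1]."""
    wa, wb = _weights(a, decay), _weights(b, decay)
    shared = sum(min(wa[t], wb[t]) for t in wa.keys() & wb.keys())
    union = sum(wa.values()) + sum(wb.values()) - shared
    return shared / union if union else 1.0
```

Because '-' and '/' are both separators, `drivers/scsi/type-file1.c` and `drivers/scsi/type/file1.c` yield the same tokens and score 1.0. `type/file2.c` scores lower, and an unrelated `drivers/net/tg3.c` lower still. A real detector would presumably use such a score only to rank candidate pairs before comparing contents.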

* Re: Following renames
From: David Lang @ 2006-03-27 7:40 UTC
  To: Jakub Narebski; +Cc: git

On Mon, 27 Mar 2006, Jakub Narebski wrote:

> 2.) Caching the results of the similarity algorithm/rename detection tool (also
> Paul Jakma's post), including remembering false positives and undetected
> renames, for efficiency. Automatically calculated parts might be
> thrown away.

This sounds like it could easily devolve into an O(n!) situation where you are caching how everything is related (or not related) to everything else. Paul was making the point that the purpose was to cache the data to eliminate the time needed to calculate it, but if you don't store all the results then you don't know if the result is not relevant, or unknown, so you need to calculate it again.

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

* Re: Following renames
From: Jakub Narebski @ 2006-03-27 7:53 UTC
  To: git

David Lang wrote:

> On Mon, 27 Mar 2006, Jakub Narebski wrote:
>
>> 2.) Caching the results of the similarity algorithm/rename detection tool
>> (also Paul Jakma's post), including remembering false positives and
>> undetected renames, for efficiency. Automatically calculated parts might
>> be thrown away.
>
> This sounds like it could easily devolve into an O(n!) situation where you
> are caching how everything is related (or not related) to everything
> else. Paul was making the point that the purpose was to cache the data to
> eliminate the time needed to calculate it, but if you don't store all the
> results then you don't know if the result is not relevant, or unknown, so
> you need to calculate it again.

First of all, you only remember non-trivial relations (i.e. file.c is always related to file.c).

If the cache were only for commits, it would be O(c*p*n), where c is the number of commits, p is the percentage of contents moving ("renames") times the percentage of files changed in the commit, and n is the number of files; in practice probably O(c). Even if we remember all (tree1,tree2) pairs it would be O(c^2).

Additionally, the cache can be limited in size (pruning the oldest cache entries).

--
Jakub Narebski
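The bounded cache, together with the point that an uncached pair means "unknown" rather than "unrelated", can be sketched as a small least-recently-used map keyed by a pair of tree ids. `RenameCache` is a hypothetical name, and the default size is an arbitrary assumption.

```python
from collections import OrderedDict


class RenameCache:
    """Bounded cache of rename-detection results keyed by (old_tree, new_tree)."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, old_tree, new_tree):
        """Cached (src, dst) pairs, or None if unknown (caller must recompute)."""
        key = (old_tree, new_tree)
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as recently used
        return self._data[key]

    def put(self, old_tree, new_tree, renames):
        """Store only non-trivial pairs; file.c -> file.c is implied."""
        key = (old_tree, new_tree)
        self._data[key] = [(src, dst) for src, dst in renames if src != dst]
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)   # prune the least recently used entry
```

`get` returns `None` for "unknown, recompute" and an empty list for "known: no renames", so an evicted entry only costs a recomputation, never a wrong answer.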

* Re: Following renames
From: Linus Torvalds @ 2006-03-26 3:19 UTC
  To: Petr Baudis; +Cc: git

On Sun, 26 Mar 2006, Petr Baudis wrote:
>
> In [1], Linus suggests a non-core solution. Unfortunately, it doesn't
> fly - it requires at least two git-ls-tree calls per revision which
> would bog things down awfully (to roughly half of the original speed).

No it doesn't. It requires one git-ls-tree WHEN SOMETHING IS RENAMED.

In other words, basically never.

Linus

* Re: Following renames
From: Ryan Anderson @ 2006-03-26 7:35 UTC
  To: Linus Torvalds; +Cc: Petr Baudis, git

Linus Torvalds wrote:
> On Sun, 26 Mar 2006, Petr Baudis wrote:
>
>> In [1], Linus suggests a non-core solution. Unfortunately, it doesn't
>> fly - it requires at least two git-ls-tree calls per revision which
>> would bog things down awfully (to roughly half of the original speed).
>>
>
> No it doesn't. It requires one git-ls-tree WHEN SOMETHING IS RENAMED.
>
> In other words, basically never.
>

A simple example is the first loop in git-annotate.perl. (Which was basically written by Linus, I just translated it from a shell/pseudo-code example into Perl)

--
Ryan Anderson
  sometimes Pug Majere

* Re: Following renames
From: Petr Baudis @ 2006-03-26 21:09 UTC
  To: Ryan Anderson; +Cc: Linus Torvalds, git

Dear diary, on Sun, Mar 26, 2006 at 09:35:02AM CEST, I got a letter
where Ryan Anderson <ryan@michonline.com> said that...
> Linus Torvalds wrote:
> > On Sun, 26 Mar 2006, Petr Baudis wrote:
> >
> >> In [1], Linus suggests a non-core solution. Unfortunately, it doesn't
> >> fly - it requires at least two git-ls-tree calls per revision which
> >> would bog things down awfully (to roughly half of the original speed).
> >>
> >
> > No it doesn't. It requires one git-ls-tree WHEN SOMETHING IS RENAMED.
> >
> > In other words, basically never.
> >
>
> A simple example is the first loop in git-annotate.perl. (Which was
> basically written by Linus, I just translated it from a
> shell/pseudo-code example into Perl)

One case it does not handle:

    2 -- b -- 1 / \ 6 a d \ 3 5 / c --- d

git-rev-list at 6 will (understandably) show 6 5 5 and you will never detect the d -> b rename leading to 2.

This is one reason why I'm actually not using --parents, and instead pipe stuff directly to git-diff-tree --stdin -M and then read its output. This also lets me merge parallel lines of development based on date, and I don't have to fork for each file deletion.

With any luck I'll have the first draft of my (also perlish) script done yet this evening.

(BTW, it has the same output format as git-rev-list | git-diff-tree --pretty=raw -M, so with some tweaking it could also serve as a git-whatchanged backend. Actually, it would be nice to have it in core Git in the long term so that it gets all the portability tweaks and such.)

--
Petr "Pasky" Baudis
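Reading the output of such a pipeline mostly comes down to parsing git's raw diff format. The sketch below rests on several assumptions. It expects no `--pretty` option, so each commit's diff is preceded by a single header line whose first token is the commit id. It expects raw lines of the form `:<old mode> <new mode> <old sha1> <new sha1> <status>` followed by tab-separated paths, where a rename or copy status carries a similarity score. It does not handle `-z` output or quoted paths. `parse_raw_diff_tree` is a hypothetical helper, not part of Cogito.

```python
def parse_raw_diff_tree(lines):
    """Group raw `git-diff-tree -M --stdin` output by commit.

    Returns a list of (commit_id, changes); each change is a tuple
    (status_letter, score_or_None, src_path, dst_path_or_None).
    """
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if not line.startswith(":"):
            # Header line: assume its first token is the commit id.
            result.append((line.split()[0], []))
            continue
        if not result:
            raise ValueError("diff line before any commit header")
        meta, *paths = line.split("\t")
        status = meta.split()[4]                # e.g. "M", "D", "R086"
        letter, digits = status[0], status[1:]
        score = int(digits) if digits else None
        dst = paths[1] if len(paths) > 1 else None
        result[-1][1].append((letter, score, paths[0], dst))
    return result
```

A rename shows up as status `R` with both paths, which is the event a rename-following log has to react to.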

* Re: Following renames
From: Petr Baudis @ 2006-03-26 10:07 UTC
  To: Linus Torvalds, Ryan Anderson; +Cc: git

Dear diary, on Sun, Mar 26, 2006 at 05:19:50AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> said that...
> On Sun, 26 Mar 2006, Petr Baudis wrote:
> >
> > In [1], Linus suggests a non-core solution. Unfortunately, it doesn't
> > fly - it requires at least two git-ls-tree calls per revision which
> > would bog things down awfully (to roughly half of the original speed).
>
> No it doesn't. It requires one git-ls-tree WHEN SOMETHING IS RENAMED.
>
> In other words, basically never.

Huh? I don't see that now (and caps don't help me see it better). That's certainly not what is in [1], and I don't see _how_ to detect the renames in this case, and what would I be actually doing git-ls-tree for when I've already detected the rename. Based on [1], I'd be doing git-ls-tree merely to detect that a file _disappeared_ in the first place, I have to do other stuff to detect the renames themselves.

Dear diary, on Sun, Mar 26, 2006 at 09:35:02AM CEST, I got a letter
where Ryan Anderson <ryan@michonline.com> said that...
> A simple example is the first loop in git-annotate.perl. (Which was
> basically written by Linus, I just translated it from a
> shell/pseudo-code example into Perl)

Thanks for the hint. Unfortunately, this is precisely the thing I want to avoid, that is essentially reimplementing part of git-rev-list - to do something good, I would have to do my own toposort and merge by date between parallel lines. OTOH, I might just construct a large revlist commandline specifying all the segments I'm interested in and see what happens when I run that.

Besides, doing it in shell would be a pretty ugly job (forcing me to finally rewrite it in perl is not a bad thing, but that'd be a somewhat larger project since I share various common routines with other cg tools, etc).

--
Petr "Pasky" Baudis

* Re: Following renames
From: Fredrik Kuivinen @ 2006-03-26 10:34 UTC
  To: Petr Baudis; +Cc: Linus Torvalds, Ryan Anderson, git

On Sun, Mar 26, 2006 at 12:07:17PM +0200, Petr Baudis wrote:
> Dear diary, on Sun, Mar 26, 2006 at 05:19:50AM CEST, I got a letter
> where Linus Torvalds <torvalds@osdl.org> said that...
> > On Sun, 26 Mar 2006, Petr Baudis wrote:
> > >
> > > In [1], Linus suggests a non-core solution. Unfortunately, it doesn't
> > > fly - it requires at least two git-ls-tree calls per revision which
> > > would bog things down awfully (to roughly half of the original speed).
> >
> > No it doesn't. It requires one git-ls-tree WHEN SOMETHING IS RENAMED.
> >
> > In other words, basically never.
>
> Huh? I don't see that now (and caps don't help me see it better). That's
> certainly not what is in [1], and I don't see _how_ to detect the
> renames in this case, and what would I be actually doing git-ls-tree for
> when I've already detected the rename. Based on [1], I'd be doing
> git-ls-tree merely to detect that a file _disappeared_ in the first
> place, I have to do other stuff to detect the renames themselves.
>
> Dear diary, on Sun, Mar 26, 2006 at 09:35:02AM CEST, I got a letter
> where Ryan Anderson <ryan@michonline.com> said that...
> > A simple example is the first loop in git-annotate.perl. (Which was
> > basically written by Linus, I just translated it from a
> > shell/pseudo-code example into Perl)
>
> Thanks for the hint. Unfortunately, this is precisely the thing I want
> to avoid, that is essentially reimplementing part of git-rev-list - to
> do something good, I would have to do my own toposort and merge by date
> between parallel lines. OTOH, I might just construct a large revlist
> commandline specifying all the segments I'm interested in and see what
> happens when I run that.
>
> Besides, doing it in shell would be a pretty ugly job (forcing me to
> finally rewrite it in perl is not a bad thing, but that'd be a somewhat
> larger project since I share various common routines with other cg
> tools, etc).
>

If you decide to modify rev-list to do rename tracking you might want to have a look at how this is done in blame.c. git-blame only tracks one file (since that is what it needs) but I think it should be possible to track multiple files with a similar approach.

- Fredrik

* Re: Following renames
From: Linus Torvalds @ 2006-03-26 16:33 UTC
  To: Petr Baudis; +Cc: Ryan Anderson, git

On Sun, 26 Mar 2006, Petr Baudis wrote:
>
> Huh? I don't see that now (and caps don't help me see it better). That's
> certainly not what is in [1], and I don't see _how_ to detect the
> renames in this case, and what would I be actually doing git-ls-tree for
> when I've already detected the rename. Based on [1], I'd be doing
> git-ls-tree merely to detect that a file _disappeared_ in the first
> place, I have to do other stuff to detect the renames themselves.

No, the point is that "git-rev-list" already does all of [1] in the core.

If you do

    git-rev-list --parents --remove-empty $REV -- $filename

then you'll get the whole history for that filename. When it ends, you know the file went away, and then you do basically _one_ "where the hell did it go" thing.

And yes, it's not git-ls-tree (unless you only want to follow pure renames), it's actually one "git-diff-tree -M $lastrev". Then you just continue with the new filename (and do another "git-rev-list" until you hit the next rename).

Linus
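That loop can be modelled on a linear history to show why the rename search runs only once per rename. In this hypothetical sketch, the history is a newest-first list of `(commit_id, tree)` pairs, where a tree maps paths to contents. A commit "changes" the path when its content differs from the parent's. The "where did it go" step runs only at the commit where the path first appears. There, an exact content match stands in for `git-diff-tree -M`, which also finds similar, not just identical, files.

```python
def follow(history, path):
    """Return [(commit_id, name)] for commits that changed the file, newest
    first, switching to the old name whenever a rename is found.

    history: list of (commit_id, tree) pairs, newest first; tree maps path -> content.
    """
    log = []
    for idx, (commit, tree) in enumerate(history):
        parent = history[idx + 1][1] if idx + 1 < len(history) else {}
        if path not in tree:
            break                      # lost track of the file
        if parent.get(path) == tree[path]:
            continue                   # unchanged here: nothing to show
        log.append((commit, path))
        if path not in parent:
            # The path appears in this commit: the one "where did it come
            # from" question. An exact content match stands in for -M.
            sources = [p for p, content in parent.items()
                       if content == tree[path] and p not in tree]
            if not sources:
                break                  # a genuinely new file: history ends
            path = sources[0]
    return log
```

A rename combined with an edit in the same commit would be missed by the exact match. That is precisely the case a similarity-based `-M` handles.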

* Re: Following renames
From: Petr Baudis @ 2006-03-26 19:14 UTC
  To: Linus Torvalds; +Cc: Ryan Anderson, git

Dear diary, on Sun, Mar 26, 2006 at 06:33:13PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> said that...
> If you do
>
>       git-rev-list --parents --remove-empty $REV -- $filename
>
> then you'll get the whole history for that filename. When it ends, you
> know the file went away, and then you do basically _one_ "where the hell
> did it go" thing.
>
> And yes, it's not git-ls-tree (unless you only want to follow pure
> renames), it's actually one "git-diff-tree -M $lastrev". Then you just
> continue with the new filename (and do another "git-rev-list" until you
> hit the next rename).

I wrote a long rant but then it all suddenly fit together and I now have an idea of how to implement it reasonably elegantly. So only a bug report remains:

My current target is to support this tree (letters are filenames, numbers are commit ids; I'll translate any git output to those digits):

             2    4
             b -- d
         1  /      \  6
         a           d
            \ 3    5 /
             c -- d

With the commits created in numerical order (so log shows 1,2,3,4,5,6, and my target is cg-log d showing the same output).

If anyone wants the sample history, it's at http://pasky.or.cz/~xpasky/renametree1.git/

Curiously, git-rev-list does something totally strange when trying to list per-file history at this point:

    $ git-rev-list HEAD -- d
    4

Huh? (It should list 6, 5, 4 instead.)

I worked around it by recording a change in d in the merge 6:

    http://pasky.or.cz/~xpasky/renametree2.git/

    $ git-rev-list --parents --remove-empty HEAD -- d
    6 4 5
    5
    4

Which is the expected output.

--
Petr "Pasky" Baudis
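Why does `git-rev-list HEAD -- d` print only 4 for the first sample history? A toy model makes the behaviour easier to follow. This hypothetical sketch is greatly simplified and is not git's revision-walking code. It assumes that when a commit's version of the path is identical to a parent's, the commit is hidden and only the first such parent is followed. Otherwise the commit is shown and all parents are walked, stopping at parents that lack the path when `remove_empty` is set. Results come out in walk order, not in git's date order.

```python
def path_history(commits, head, path, remove_empty=False):
    """Toy model of a path-limited history walk.

    commits: commit id -> (parent ids, tree), where tree maps path -> content.
    Returns the ids of the commits that are shown, in walk order.
    """
    shown, seen, todo = [], set(), [head]
    while todo:
        cid = todo.pop()
        if cid in seen:
            continue
        seen.add(cid)
        parents, tree = commits[cid]
        blob = tree.get(path)
        same = [p for p in parents if commits[p][1].get(path) == blob]
        if same:
            # Path identical in some parent: hide this commit and follow only
            # the first such parent; the other parents are not followed from here.
            todo.append(same[0])
            continue
        if parents or blob is not None:
            shown.append(cid)          # the path differs from every parent
        for p in reversed(parents):
            if remove_empty and path not in commits[p][1]:
                continue               # stop where the path disappears
            todo.append(p)
    return shown
```

If a, b, c and every d hold identical content (an assumption of the model), the first sample history yields `['4']`, matching the output above. Changing `d` in the merge 6, as in the second sample, makes 6, 4 and 5 show up.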
* Re: Following renames
From: Petr Baudis @ 2006-03-26 20:31 UTC
To: Linus Torvalds; +Cc: Ryan Anderson, git

Dear diary, on Sun, Mar 26, 2006 at 09:14:45PM CEST, I got a letter
where Petr Baudis <pasky@suse.cz> said that...
> Curiously, git-rev-list does something totally strange when trying to
> list per-file history at this point:
>
> 	$ git-rev-list HEAD -- d
> 	4
>
> Huh? (It should list 6, 5, 4 instead.)

Obviously not 6, since the file was not changed in that revision, but I'd
still expect it to list 5.
* Re: Following renames
From: Linus Torvalds @ 2006-03-26 22:22 UTC
To: Petr Baudis; +Cc: Ryan Anderson, git

On Sun, 26 Mar 2006, Petr Baudis wrote:
>
> My current target is to support this tree (letters are filenames,
> numbers are commit ids; I'll translate any git output to those digits):
>
>       2    4
>       b -- d
>  1  /        \  6
>  a              d
>     \ 3    5  /
>       c -- d

Yeah, the problem with this is that you need to track separate names
across separate points.

However:

> Curiously, git-rev-list does something totally strange when trying to
> list per-file history at this point:
>
> 	$ git-rev-list HEAD -- d
> 	4
>
> Huh? (It should list 6, 5, 4 instead.)

What it does is list the points where file "d" _changed_.

"d" did not change in 6 - it had a parent commit (4) where "d" had the
same contents (in fact, it likely had _two_ parents where it had the same
contents, but git will pick the first one).

So commit "6" is uninteresting, and commit "5" will never even be looked
at, since we decided that the history of "d" comes from the first parent
with the same contents.

So then it lists "4", because file "d" really did change in that commit
(it went away).

Now you need to look at "4" and find the rename (which gives you 2) and
then from there you do rename detection and get (1), and as a result your
change history should end up being

	(1)a -> (2)b -> (4)d (-> 6(d) which was your start point)

which is correct (now, there are other histories _too_ that get us to the
same point, but the one you found this way was _a_ history).

> I worked it around by recording a change in d in the merge 6:
>
> 	http://pasky.or.cz/~xpasky/renametree2.git/
>
> 	$ git-rev-list --parents --remove-empty HEAD -- d
> 	6 4 5
> 	5
> 	4
>
> Which is the expected output.

No, it's the expected output just because you expected merges to always
show up. Merges get ignored if any of the parents have the same content
already.

		Linus
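The rule Linus describes can be modeled in a few lines: a commit whose path content matches a parent is dropped, and only the first such parent is followed. Later git documentation calls such a commit TREESAME. This is a sketch under stated assumptions, not git's rev-list code. Blobs are plain strings, commit ids double as commit times, and --remove-empty is approximated as "do not walk into parents that lack the path". With identical contents of d in 4, 5 and 6, it reproduces the single "4" Petr saw. Changing d in the merge, as in his renametree2 workaround, yields 6, 5, 4.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    tree: dict                                   # path -> blob id
    parents: list = field(default_factory=list)  # first parent first


def simplified_history(commits, start, path):
    """Commits that changed `path`, using first-matching-parent simplification
    and --remove-empty style pruning (toy model)."""
    shown, seen, todo = [], set(), [start]
    while todo:
        cid = todo.pop()
        if cid in seen:
            continue
        seen.add(cid)
        commit = commits[cid]
        blob = commit.tree.get(path)
        if not commit.parents:
            # A root commit is compared against the empty tree.
            if blob is not None:
                shown.append(cid)
            continue
        same = [p for p in commit.parents if commits[p].tree.get(path) == blob]
        if same:
            # Some parent already has identical content: this commit is
            # uninteresting, and only the first such parent is followed, so
            # the other side of a merge is never looked at.
            todo.append(same[0])
            continue
        shown.append(cid)
        # --remove-empty: do not walk into parents where the path is absent.
        todo.extend(p for p in commit.parents if path in commits[p].tree)
    # Commit ids double as commit times in this example; newest first.
    return sorted(shown, reverse=True)


def sample_history(d_in_merge):
    """Petr's sample tree; every blob is "X" except, optionally, d in merge 6."""
    return {
        1: Commit({"a": "X"}),
        2: Commit({"b": "X"}, [1]),
        3: Commit({"c": "X"}, [1]),
        4: Commit({"d": "X"}, [2]),
        5: Commit({"d": "X"}, [3]),
        6: Commit({"d": d_in_merge}, [4, 5]),
    }


print(simplified_history(sample_history("X"), 6, "d"))  # [4]
print(simplified_history(sample_history("Y"), 6, "d"))  # [6, 5, 4]
```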
* Re: Following renames
From: Petr Baudis @ 2006-03-26 22:31 UTC
To: Linus Torvalds; +Cc: Ryan Anderson, git

Dear diary, on Mon, Mar 27, 2006 at 12:22:04AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> said that...
> So commit "6" is uninteresting, and commit "5" will never even be
> looked at, since we decided that the history of "d" comes from the
> first parent with the same contents.

And this is the thing I have a problem with - this does not make much
sense to me, why can't we just follow all parents instead of arbitrarily
choosing one of them?

> which is correct (now, there are other histories _too_ that get us to the
> same point, but the one you found this way was _a_ history).

Ok, in that case I want the _full_ history. :-)

> No, it's the expected output just because you expected merges to always
> show up. Merges get ignored if any of the parents have the same content
> already.

Eek. Can I avoid that? What was the reason for choosing this behavior?
* Re: Following renames
From: Junio C Hamano @ 2006-03-26 22:43 UTC
To: Petr Baudis; +Cc: git, Linus Torvalds

Petr Baudis <pasky@suse.cz> writes:

>> No, it's the expected output just because you expected merges to always
>> show up. Merges get ignored if any of the parents have the same content
>> already.
>
> Eek. Can I avoid that? What was the reason for choosing this behavior?

Perhaps rev-list --sparse?
* Re: Following renames
From: Linus Torvalds @ 2006-03-26 23:10 UTC
To: Junio C Hamano; +Cc: Petr Baudis, git

On Sun, 26 Mar 2006, Junio C Hamano wrote:

> Petr Baudis <pasky@suse.cz> writes:
>
> >> No, it's the expected output just because you expected merges to always
> >> show up. Merges get ignored if any of the parents have the same content
> >> already.
> >
> > Eek. Can I avoid that? What was the reason for choosing this behavior?
>
> Perhaps rev-list --sparse?

No. "--sparse" still removes the uninteresting parents of merges. It just
doesn't then make the linear history any denser.

		Linus
* Re: Following renames
From: Junio C Hamano @ 2006-03-27 7:30 UTC
To: Linus Torvalds; +Cc: Petr Baudis, git

Linus Torvalds <torvalds@osdl.org> writes:

> No. "--sparse" still removes the uninteresting parents of merges. It just
> doesn't then make the linear history any denser.

Hmph, you are right. add_parents_to_list() calls prune_fn
unconditionally while running limit_list(). Disabling that with yet
another flag might be a possibility, but I suspect it would then not be
much different from running rev-list without a path limiter and having
the caller process the result.
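Junio's alternative is to run rev-list without a path limiter and let the caller decide. It might look like the following sketch, which rests on several assumptions. The dict stands in for `git-rev-list --parents` output plus per-commit tree lookups (for example via git-ls-tree). The policy "show a commit only if the path differs from every parent" is one choice among several, roughly in the spirit of what later git versions offer as --full-history. With this policy, 5 is listed, as Petr expected in his follow-up, while merge 6 stays hidden. The price is that every reachable commit's tree gets inspected, which is the efficiency concern raised in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    tree: dict                                   # path -> blob id
    parents: list = field(default_factory=list)  # parent commit ids


def full_history(commits, start, path):
    """Caller-side filtering of an unlimited walk: visit every reachable
    commit and keep those whose `path` differs from all of their parents."""
    shown, seen, todo = [], set(), [start]
    while todo:
        cid = todo.pop()
        if cid in seen:
            continue
        seen.add(cid)
        commit = commits[cid]
        todo.extend(commit.parents)  # no pruning: every parent is walked
        blob = commit.tree.get(path)
        # A root commit is compared against the empty tree (no blob at all).
        parent_blobs = [commits[p].tree.get(path) for p in commit.parents] or [None]
        if all(b != blob for b in parent_blobs):
            shown.append(cid)
    # Commit ids double as commit times here; newest first.
    return sorted(shown, reverse=True)


def sample_history(d_in_merge):
    """Petr's sample tree; every blob is "X" except, optionally, d in merge 6."""
    return {
        1: Commit({"a": "X"}),
        2: Commit({"b": "X"}, [1]),
        3: Commit({"c": "X"}, [1]),
        4: Commit({"d": "X"}, [2]),
        5: Commit({"d": "X"}, [3]),
        6: Commit({"d": d_in_merge}, [4, 5]),
    }


print(full_history(sample_history("X"), 6, "d"))  # [5, 4]
print(full_history(sample_history("Y"), 6, "d"))  # [6, 5, 4]
```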
* Re: Following renames
From: Linus Torvalds @ 2006-03-26 23:09 UTC
To: Petr Baudis; +Cc: Ryan Anderson, git

On Mon, 27 Mar 2006, Petr Baudis wrote:

> Dear diary, on Mon, Mar 27, 2006 at 12:22:04AM CEST, I got a letter
> where Linus Torvalds <torvalds@osdl.org> said that...
> > So commit "6" is uninteresting, and commit "5" will never even be
> > looked at, since we decided that the history of "d" comes from the
> > first parent with the same contents.
>
> And this is the thing I have a problem with - this does not make much
> sense to me, why can't we just follow all parents instead of arbitrarily
> choosing one of them?

Sure, you can. It's _usually_ a huge waste of time, though. Why would you
want to do more work than you need, since clearly the other parent was
_not_ interesting from the standpoint of the question "where did this
content come from"?

> > No, it's the expected output just because you expected merges to always
> > show up. Merges get ignored if any of the parents have the same content
> > already.
>
> Eek. Can I avoid that? What was the reason for choosing this behavior?

Huge efficiency gains.

Lookie here. Do

	gitk -- rev-list.c

on the git archive with the current git-rev-list, and with your hacked-up
version. And tell me my version isn't a hell of a lot better. Because, I
guarantee you, it is.

We're just not _interested_ in all those merges that didn't actually
make any difference.

Read up on what modern neuro-science thinks about the human brain, and
what a lot of it is about. It's about ignoring irrelevant information. The
ability to throw stuff out that isn't interesting is the _real_ basis of
true intelligence.

I'd rather have git do the _intelligent_ history, than show history that
isn't relevant and working harder doing so.

		Linus
* Re: Following renames
From: Petr Baudis @ 2006-03-26 23:26 UTC
To: Linus Torvalds; +Cc: Ryan Anderson, git

Dear diary, on Sun, Mar 26, 2006 at 09:14:45PM CEST, I got a letter
where Petr Baudis <pasky@suse.cz> said that...
> Dear diary, on Sun, Mar 26, 2006 at 06:33:13PM CEST, I got a letter
> where Linus Torvalds <torvalds@osdl.org> said that...
> > If you do
> >
> > 	git-rev-list --parents --remove-empty $REV -- $filename
> >
> > then you'll get the whole history for that filename. When it ends, you
> > know the file went away, and then you do basically _one_ "where the hell
> > did it go" thing.
> >
> > And yes, it's not git-ls-tree (unless you only want to follow pure
> > renames), it's actually one "git-diff-tree -M $lastrev". Then you just
> > continue with the new filename (and do another "git-rev-list" until you
> > hit the next rename).
>
> I wrote a long rant but then it all suddenly fit together and I have now
> an idea how to implement it reasonably elegantly.

So, this is what I have. Testing (I've given it very little of that) and
thoughts welcome. It is probably pretty efficient; at least in terms of
fork()s, it does only 2*N of them, where N is the number of commits
containing interesting renames. Actually, it should even be possible to
reduce this to N+1 by doing a single git-diff-tree call and multiplexing
the different git-rev-lists into it, but I'm too tired to do the trickery
now.

It has 'cg' in the name but depends on no Cogito stuff; it should in fact
be possible to trivially put it into git-whatchanged in place of the final
pipeline (not that I'd suggest doing this universally, but perhaps
git-whatchanged -f ...?). There are three downsides in this regard:

(i) No -c support.
I need the separate deltas coming out of git-diff-tree, but I think I can
join them together pretty easily on my own, except that I have problems
with -c (see <20060326102100.GF18185@pasky.or.cz>), so I'm not sure how
exactly it is supposed to behave.

(ii) Only --pretty=raw output. It shouldn't be hard to add the
reformatting code, but I'm personally not going to use it and am kind of
lazy, so I'll let someone else do that, I guess. :-)

(iii) Raw deltas required. -p parsing support would certainly be useful
and possible, but see (ii).

To quickly see what it does, you can try it e.g. on the git-log.sh file
in the Git repository.

Thoughts? Opinions? Bugs? Patches?

Signed-off-by: Petr Baudis <pasky@suse.cz>

diff --git a/cg-Xfollowrenames b/cg-Xfollowrenames
new file mode 100755
index 0000000..fa5c552
--- /dev/null
+++ b/cg-Xfollowrenames
@@ -0,0 +1,246 @@
+#!/usr/bin/env perl
+#
+# git-rev-list | git-diff-tree --stdin following renames
+# Copyright (c) Petr Baudis, 2006
+# Uses bits of git-annotate.perl by Ryan Anderson.
+#
+# This script will efficiently show output as of the
+#
+#	git-rev-list --remove-empty ARGS -- FILE... |
+#	git-diff-tree -M -r -m --stdin --pretty=raw ARGS
+#
+# pipeline, except that it follows renames of individual files listed
+# in the FILE... set.
+#
+# Usage:
+#
+#	cg-Xfollowrenames revlistargs -- difftreeargs -- revs -- files
+
+# TODO: Does not work on multiple files properly yet - most probably
+# (I didn't test it!). We want git-rev-list to stop traversing the history
+# when _any_ file disappears while now it probably stops traversing when
+# _all_ files disappear.
+ +use warnings; +use strict; + +$| = 1; + +our (@revlist_args, @difftree_args, @revs, @files); + +{ # Load arguments + my @argp = (\@revlist_args, \@difftree_args, \@revs, \@files); + my $argi = 0; + for my $arg (@ARGV) { + if ($arg eq '--' and $argi < $#argp) { + $argi++; + next; + } + push(@{$argp[$argi]}, $arg); + } +} + + +# The heads we watch (sorted by commit time) +our @heads; +# Each head is: { +# # Persistent for the whole line of development: +# pipe => $pipe, +# files => \@files, # to watch for +# +# id => $sha1, # useful actually only for debugging +# time => $timestamp, +# str => $prettyoutput, +# parents => \@sha1s, +# +# # When the commit is processed, spawn these extra heads: +# recurse => {$sha1id => \@files, ...}, +# } + +# To avoid printing duplicate commits +# FIXME: Currently, we will not handle merge commits properly since +# we hit them multiple times. +our %commits; + + +sub open_pipe($@) { + my ($stdin, @execlist) = @_; + + my $pid = open my $kid, "-|"; + defined $pid or die "Cannot fork: $!"; + + unless ($pid) { + if (defined $stdin) { + open STDIN, "<&", $stdin or die "Cannot dup(): $!"; + } + exec @execlist; + die "Cannot exec @execlist: $!"; + } + + return $kid; +} + +sub revlist($@) { + my ($rev, @files) = @_; + open_pipe(undef, "git-rev-list", "--remove-empty", + @revlist_args, $rev, "--", @files) + or die "Failed to exec git-rev-list: $!"; +} + +sub difftree($) { + my ($revlist) = @_; + open_pipe($revlist, "git-diff-tree", "-r", "-m", "--stdin", "-M", + "--pretty=raw", @difftree_args) + or die "Failed to exec git-diff-tree: $!"; +} + +sub revdiffpipe($@) { + my ($rev, @files) = @_; + my $pipe = difftree(revlist($rev, @files)); +} + + +sub read_commit($$) { + my ($head, $tolerant) = @_; + my $pipe = $head->{'pipe'}; + my $against; + my @oldset = @{$head->{'files'}}; + my @newset; + my $rename; + + # Load header + while (my $line = <$pipe>) { + $head->{'str'} .= $line; + chomp $line; + $line eq '' and goto header_loaded; + + if ($line 
=~ /^diff-tree (\S+) \(from (root|\S+)\)/) { + $head->{'id'} = $1; + if (not $tolerant and $commits{$1}++) { + close $pipe; + return undef; + } + # The 'root' case is harmless since there'll be no renames. + $against = $2; + } elsif ($line =~ /^parent (\S+)/) { + push (@{$head->{'parents'}}, $1); + } elsif ($line =~ /^committer .*?> (\d+)/) { + $head->{'time'} = $1; + } + } + return undef; +header_loaded: + + # Load message + while (my $line = <$pipe>) { + $head->{'str'} .= $line; + chomp $line; + $line eq '' and goto message_loaded; + } + return undef; +message_loaded: + + # Load delta + while (my $line = <$pipe>) { + $head->{'str'} .= $line; + chomp $line; + $line eq '' and goto delta_loaded; + + $line =~ /^:/ or return undef; + my ($info, $newfile, $oldfile) = split("\t", $line); + if ($info =~ /[RC]\d*$/) { + # Behold, a rename! + # (Or a copy, it's all the same for us.) + my $i; + for ($i = 0; $i <= $#oldset; $i++) { + $oldfile eq $oldset[$i] or next; + $rename = 1; + splice(@oldset, $i, 1); + push(@newset, $newfile); + last; + } + # In case of multiple candidates, follow + # all of them: + # (TODO: This might be a policy decision + # best left on the user.) + if ($i > $#oldset and grep { $oldfile eq $_ } @newset) { + $rename = 1; + push(@newset, $newfile); + } + } elsif ($info =~ /D$/) { + # Not weeding out deleted files might cause bizarre + # results when following multiple files since + # git-rev-list weeds them out too (probably?). 
+			@oldset = grep { $newfile ne $_ } @oldset;
+			@{$head->{'files'}} = grep { $newfile ne $_ } @{$head->{'files'}};
+		}
+	}
+	$head->{'str'} .= "\n";
+delta_loaded:
+
+	if ($rename) {
+		$head->{'recurse'}->{$against} = [@newset, @oldset];
+	}
+	return 1;
+}
+
+sub load_commit($) {
+	my ($head) = @_;
+	$head->{'time'} = undef;
+	$head->{'str'} = '';
+	$head->{'parents'} = ();
+
+	read_commit($head, 0) or return undef;
+
+	# In case there was a merge, the commit will be multiple times
+	# here, each time with a different delta section. Read them all.
+	for (1 .. $#{$head->{'parents'}}) { # stupid vim syntax highlighting
+		read_commit($head, 1) or return undef;
+	}
+
+	return 1;
+}
+
+
+# Add head at the proper position
+sub add_head($) {
+	my ($head) = @_;
+	my $i;
+	for ($i = 0; $i <= $#heads; $i++) {
+		last if ($head->{'time'} > $heads[$i]->{'time'})
+	}
+	splice(@heads, $i, 0, $head);
+}
+
+# Create new head
+sub init_head($@) {
+	my ($rev, @files) = @_;
+	my $head = { files => \@files, 'pipe' => revdiffpipe($rev, @files) };
+	load_commit($head) or return;
+	add_head($head);
+}
+
+
+
+{ # Seed the heads list
+	for my $rev (@revs) {
+		init_head($rev, @files);
+	}
+}
+
+# Process the heads
+{
+	while (@heads) {
+		my $head = splice(@heads, 0, 1);
+
+		print $head->{'str'};
+
+		foreach my $parent (keys %{$head->{'recurse'}}) {
+			init_head($parent, @{$head->{'recurse'}->{$parent}});
+		}
+		$head->{'recurse'} = undef;
+
+		load_commit($head) or next;
+		add_head($head);
+	}
+}
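The core of the script's scheduling can be sketched with a priority queue. The script keeps one head per line of development and always prints the head with the newest commit. It starts new heads when a rename is seen and drops a head once it reaches an already printed commit. This sketch swaps add_head()'s linear sorted insertion for Python's heapq. The streams are in-memory lists standing in for the git-rev-list | git-diff-tree pipes. `interleave_heads`, `main` and the `from_*` streams are hypothetical names. The example encodes cg-log d on Petr's sample as if d had changed in merge 6 (his renametree2).

```python
import heapq


def interleave_heads(streams):
    """Merge newest-first commit streams into one newest-first log.

    Each stream yields (commit_time, commit_id, spawned), where `spawned`
    lists new streams to start once that commit is printed (the script's
    'recurse' set after a rename)."""
    heap = []
    seen = set()
    order = 0  # unique tie-breaker so heapq never compares the iterators

    def advance(stream):
        nonlocal order
        item = next(stream, None)
        if item is not None:
            when, cid, spawned = item
            heapq.heappush(heap, (-when, order, cid, spawned, stream))
            order += 1

    for stream in streams:
        advance(iter(stream))

    log = []
    while heap:
        _, _, cid, spawned, stream = heapq.heappop(heap)
        if cid in seen:
            # Another head already printed this commit and everything below
            # it: drop this head, like the script's %commits check.
            continue
        seen.add(cid)
        log.append(cid)
        for child in spawned:
            advance(iter(child))
        advance(stream)
    return log


# Hypothetical head streams for "cg-log d" on Petr's sample (commit time == id).
from_1 = [(1, 1, [])]             # file a, starting at commit 1
from_3 = [(3, 3, [from_1])]       # file c, renamed from a in commit 3
from_2 = [(2, 2, [from_1])]       # file b, renamed from a in commit 2
main = [(6, 6, []), (5, 5, [from_3]), (4, 4, [from_2])]

print(interleave_heads([main]))  # [6, 5, 4, 3, 2, 1]
```

A heap makes each insertion logarithmic instead of linear in the number of live heads. With the handful of heads a typical rename history produces, the sorted list in the script is just as practical.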
* Re: Following renames
From: Petr Baudis @ 2006-03-27 21:59 UTC
To: Linus Torvalds; +Cc: Ryan Anderson, git

Dear diary, on Mon, Mar 27, 2006 at 01:26:49AM CEST, I got a letter
where Petr Baudis <pasky@suse.cz> said that...
> To quickly see what it does, you can try it e.g. on the git-log.sh file
> in the Git repository.

By the way, the cg-log in master now uses it to automagically follow file
renames (when you call it per-file, like cg-log FILENAME). If you hate
it, you can prevent it with cg-log --no-renames (cg-log -R).

Looks pretty slick.