From: Carl Worth
Subject: Two crazy proposals for changing git's diff commands
Date: Wed, 08 Feb 2006 16:29:44 -0800
Message-ID: <87slqtcr2f.wl%cworth@cworth.org>
To: git@vger.kernel.org

So, here I am as a newly converted index-embracer; no more index
denying from me. However, I'm still trying to wrap my brain around the
various diff commands that git provides and how they would fit into my
workflow. Junio and I have touched on this already in a previous
thread, but I'm starting here with a fresh and more complete analysis
of the UI around diff.

The motivation for the *long* (my apologies) message below is largely
the fact that I realized my workflow would use the following command
very regularly:

    git diff-index --cached HEAD

I would be using this before every "git commit" to get a preview, and
it seems like a painfully long name for such a common operation. Even
its shortcut form:

    git diff --cached HEAD

is among the longest of the "git diff" shortcuts.

My other motivation is that in spite of having what I think is a
fairly good grasp of the structure of git (the object database, the
index, etc.), I still have one heck of a time trying to remember which
diff commands are which.

So here we go...

Background
==========
At a conceptual level, there are 4 diff operations, each of which acts
on an ordered pair (from->to) of trees. One operation takes two
explicit tree objects, while the other three act on 0 or 1 explicit
trees and 2 or 1 implicit trees (based on either the index or files in
the working directory). Specifically, these are tree->index,
index->files, and the composite of the two, tree->files.

To get at these 4 different operations, git provides 4 commands named
diff-tree, diff-index --cached, diff-files, and diff-index. On top of
these, git also provides some syntactic sugar in the form of "diff"
shortcuts, for a total of 8 different diff commands. This is all
summarized in the following table:

Operation (from -> to)   Core command                 Shortcut command
-----------------------  ---------------------------  --------------------
diff <tree> -> <tree>    diff-tree <tree> <tree>      diff <tree> <tree>
diff <tree> -> index     diff-index --cached <tree>   diff --cached <tree>
diff index -> files      diff-files                   diff
diff <tree> -> files     diff-index <tree>            diff <tree>

I think this background is fairly complete, at least as far as the
functionality exposed by git-diff goes. I am ignoring git-diff-stages
for now, and throughout the remainder of this message.

Use cases
=========
After understanding things that far, I asked myself what each of the
four operations is useful for.
The tree->tree case is easiest, as it simply shows the difference
between two trees that exist in the object store. The remaining three
cases are more interesting because they provide mechanisms for
querying trees that don't yet (or may never) exist in the object
store.

Here are the questions I have been able to come up with so far that
the operations can help in answering:

<tree> -> <tree> (diff-tree <tree> <tree>, or diff <tree> <tree>)

    What changed between two trees?

<tree> -> index (diff-index --cached <tree>, diff --cached <tree>)

    When <tree> is HEAD: What will "git commit" do?

index -> files (diff-files, diff)

    What work have I done that I haven't updated into the index yet?
    Or, if not manually updating the index: What will "git commit -a" do?

<tree> -> files (diff-index <tree>, diff <tree>)

    When <tree> is HEAD: What will "git commit -a" do?

Can anyone else think of common use cases for these various
operations that I've missed?

Subjective comments/proposals
=============================
Everything above should be a pretty much objective description of how
things exist. From here on out, I'll start in with my opinions and
hopefully some useful proposals (ordered by increasing likelihood of
being controversial).

It's interesting to me that both variants of diff-index (with and
without --cached) require a tree argument, while at the same time,
both variants seem most useful when used with HEAD. So it looks like
there's a reasonable default value for that argument that is missing.
This should be pretty painless to fix (no user retraining required):

Proposal 1: Make diff-index use HEAD if no <tree> is specified

With this proposal, my most-common command now shrinks to:

    git diff-index --cached

(And I was quite surprised to just learn that the shortcut version,
"git diff --cached", already does default to HEAD rather than calling
git-diff-files and erroring out on the unknown --cached argument.
That's handy, even if a bit unexpected from the documentation of
"git diff".)

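To make the three working-tree operations concrete, here is a rough sketch that can be run in a throwaway repository. The repository location and the file names a.txt and b.txt are made up for the demonstration; the git invocations are the shortcut forms from the use cases above, with --name-only added so the output is just a list of paths.

```shell
#!/bin/sh
set -e

# Throwaway repository; the path and file names are invented for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "committed" > a.txt
echo "committed" > b.txt
git add a.txt b.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

echo "staged change" >> a.txt
git add a.txt                     # a.txt: change updated into the index
echo "unstaged change" >> b.txt   # b.txt: change left in the working files

# HEAD -> index: what "git commit" would record.
staged=$(git diff --cached --name-only HEAD)
# index -> files: work not yet updated into the index.
unstaged=$(git diff --name-only)
# HEAD -> files: what "git commit -a" would record.
both=$(git diff --name-only HEAD)
# Shortcut form with no tree given; it defaults to HEAD.
implicit=$(git diff --cached --name-only)

echo "HEAD->index:"; echo "$staged"
echo "index->files:"; echo "$unstaged"
echo "HEAD->files:"; echo "$both"
```

With one staged and one unstaged change, the three operations report three different sets of paths, which is the situation where picking the wrong diff command gives a misleading commit preview.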
So, my common command is a bit shorter, but I'd like to shrink it
more, and I still haven't addressed my which-diff-command-do-I-want
confusion.

First, I see a potential problem in the use cases listed above. The
"git diff" command is taught in the tutorial as a way to preview what
will be committed by "commit -a". But I think this lays a trap for git
newcomers. If "git diff" is learned as a commit preview (during a
larval index-unaware stage), then this behavior will have to be
unlearned when the user starts using the index. At that point
"git diff" becomes a way to examine what will *not* be committed by
"git commit" rather than what *will* be committed by "git commit -a".
This seems an unkind thing to do to new users.

Instead, users of "commit -a" should be provided with a HEAD->files
diff operation for previewing commits. That's what they really want to
see (and not the index->files diff that only happens to match in the
case where they haven't manually dirtied the index). So the 'correct'
preview command for "commit -a" is currently one of "diff HEAD" or
"diff-index HEAD", and under Proposal 1 it would be "diff-index".

So, without using the shortcut version, the tutorial's preview command
is down to a single name, "diff-index", but it is rather awkward to
have a "-index" command in the index-avoidance stage of the tutorials.
And I think this goes toward my which-diff confusion. Consider the
three diff operations that allow for investigating un-committed trees:

    <tree> -> index     diff-index --cached <tree>
    index -> files      diff-files
    <tree> -> files     diff-index <tree>

Here, "diff-index" is the only one of the three commands that does not
operate on the index. One can notice a similar thing in the ASCII
diagram from the core tutorial, where diff-index is the only one of
these three operations that completely bypasses the index (!).

All that just to say that there's some inconsistent naming in the core
commands.
This is papered over somewhat by the "git diff" shortcuts, but
at the same time that also adds even _more_ diff commands for people
like me to have to learn.

So, here, finally, is a proposal to change the names of some diff
commands. I expect this to be more controversial than Proposal 1, as it
is sure to run up against ingrained muscle memory in some cases. But
I've tried to minimize that as much as possible, and I hope it will be
workable.

Proposal 2: Provide the following 4 diff commands:

Operation (from -> to)   Proposal
-----------------------  ------------------------------------
diff <tree> -> <tree>    diff <tree> <tree>
diff <tree> -> index     diff-index <tree> (default to HEAD)
diff index -> files      diff
diff <tree> -> files     diff-files <tree> (default to HEAD)

The goal here is that using diff, diff-index, and diff-files without
any tree argument and without any options (such as --cached) should
cover the most common cases. So there's less typing in general.

These 3 diff commands can be presented as fundamental, and usable
without needing a layer of sugar above. So there is already less to
learn.

Also, diff-index and diff-files have parallel structure and naming,
each performing a diff from HEAD or the given tree to either the
current index or files, respectively. So the names should also be
easier to learn.

Under this proposal, my "git commit" preview becomes:

    git diff-index

and the tutorial's "git commit -a" preview becomes:

    git diff-files

which looks pretty nice to me [*].

Let's examine the impact this proposal would have on the existing core
and shortcut diff commands. Here's an explanation of the Notes column
entries below:

"no change":       The existing command is provided in an identical
                   form under the proposal.

"compatible":      The existing command will continue to function
                   identically for backwards compatibility with muscle
                   memory. But some things (such as --cached options)
                   will simply be ignored and "unadvertised" under the
                   new proposal.

"incompatible[*]": The proposal is not compatible with existing usage,
                   so some amount of retraining will be needed.
                   In each case, I've made notes on ways that might
                   make this bearable.

Current core command         After proposal       Notes
---------------------------  -------------------  ---------------
diff-tree <tree> <tree>      diff <tree> <tree>   compatible
diff-index --cached <tree>   diff-index <tree>    compatible
diff-files                   diff                 incompatible[1]
diff-index <tree>            diff-files <tree>    incompatible[2]

[1] Existing "diff-files" is index->files while the proposed
    "diff-files" would be HEAD->files. Fortunately the retraining here
    is to a simpler command ("diff") which already exists. Hopefully,
    current users already prefer the simpler command anyway.

[2] Existing "diff-index <tree>" is <tree>->files while the proposed
    "diff-index <tree>" would be <tree>->index. Fortunately the
    existing "diff-index <tree>" has an existing shortcut as
    "diff <tree>" which can be maintained compatibly. So hopefully,
    current users already prefer the simpler command anyway.
    Otherwise, retraining for this command will involve an
    index->files substitution, but hopefully the consistent naming
    under the new proposal will help here.

Current shortcut command   After proposal       Notes
-------------------------  -------------------  --------------
diff <tree> <tree>         diff <tree> <tree>   no change
diff --cached <tree>       diff-index <tree>    compatible [3]
diff                       diff                 no change
diff <tree>                diff-files <tree>    compatible [3]

[3] The proposal doesn't recommend any version of "git diff" with a
    single <tree> argument. Fortunately we can continue to provide
    compatible support for both such existing uses, since they differ
    based on the presence or absence of the --cached option.

Anyway, that's a proposal for some diff commands if we had the
opportunity to do it from scratch. I'm not a trained user of git that
would be impacted by this change, so I can't make any fair comment on
whether the change would be worth making or not. But I would
definitely be interested in hearing what existing users of git think
of the idea.

And of course, I am glad to fix up the implementation and all the
documentation as necessary to implement this proposal if people think
it's a good idea.

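The naming in Proposal 2 can be tried out today with a couple of shell wrappers. This is only a sketch: proposed_diff_index and proposed_diff_files are hypothetical function names invented here, not git commands, and each simply forwards to an existing "git diff" form with the tree argument defaulting to HEAD. The scratch repository and its file names are likewise made up.

```shell
#!/bin/sh
set -e

# Hypothetical wrappers: these function names are invented for this sketch
# and are not git commands. Each forwards to an existing "git diff" form.

# <tree> -> index, with <tree> defaulting to HEAD (the proposed "diff-index").
proposed_diff_index() {
    tree=${1:-HEAD}
    if [ $# -gt 0 ]; then shift; fi
    git diff --cached "$tree" "$@"
}

# <tree> -> files, with <tree> defaulting to HEAD (the proposed "diff-files").
proposed_diff_files() {
    tree=${1:-HEAD}
    if [ $# -gt 0 ]; then shift; fi
    git diff "$tree" "$@"
}

# Throwaway repository with one staged and one unstaged change.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "v1" > a.txt
echo "v1" > b.txt
git add a.txt b.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"
echo "v2" >> a.txt
git add a.txt
echo "v2" >> b.txt

# Preview for "git commit": only the staged file.
commit_preview=$(proposed_diff_index HEAD --name-only)
# Preview for "git commit -a": both modified files.
commit_a_preview=$(proposed_diff_files HEAD --name-only)

echo "would be committed by 'git commit':"
echo "$commit_preview"
echo "would be committed by 'git commit -a':"
echo "$commit_a_preview"
```

Called with no arguments, either wrapper prints the full patch against HEAD, which is the zero-argument usage the proposal is aiming for.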
-Carl

[*] It's not the original topic of this post, but now that I've
finished this, I realize that if the diff proposal were implemented
then "commit-files" would make a dandy replacement for "commit -a".
That could lead to finally providing the parallel preview commands I
originally wanted:

    git diff-index    # as preview for
    git commit-index

and:

    git diff-files    # as preview for
    git commit-files

Then "git commit" would just be a shortcut for git commit-index. (Oh,
and that would also lead to a natural "git ci" abbreviation too, if
desired. This would parallel the "ci == checkin" abbreviation that
some other systems provide.)

I think the separate notions of commit-index and commit-files would do
a good job of allowing for simple tutorials (eliminating the "what the
heck is -a all about?" questions) that also don't contribute to the
general index-unawareness that leads to later index-confusion, as the
current "git diff; git commit -a" does.

This might even lead to a natural distinction between
"git status-index" and "git status-files" too.