[RFC] Two conceptually distinct commit commands


* [RFC] Two conceptually distinct commit commands
@ 2006-12-04 19:08 Carl Worth
  2006-12-04 20:10 ` Carl Worth
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Carl Worth @ 2006-12-04 19:08 UTC (permalink / raw)
  To: git


[
  I think the proposal below is original, and more correctly captures
  the essence of the "commit interface wart" than any previous
  proposal I've made. This proposal is also based entirely on what is
  useful for all git users, and what I perceive git's conceptual
  models to be. That is, this proposal concerns what _I_, (as a fairly
  experienced git user), actually want, without any bias for any
  assumptions about what an imagined "new user" might want. Notably,
  it does not try to satisfy naive (and likely incorrect) assumptions
  about git's model.

  Finally, this proposal intentionally uses ludicrously long command
  names. This is because a discussion of realistically short names
  triggers the two loaded issues of "muscle memory" and which concepts
  get blessed as "defaults". In previous threads, those issues have
  muddied the conceptual issues I'd like to focus on here. Let's talk
  about the concepts first, and save discussions of naming for later
  if necessary.
]

Proposal
-------
Here are the two commit commands I would like to see in git:

  commit-index-content [paths...]

    Commits the content of the index for the given paths, (or all
    paths in the index). The index content can be manipulated with
    "git add", "git rm", "git mv", and "git update-index".

  commit-working-tree-content [paths...]

    Commits the content of the working tree for the given paths, (or
    all tracked paths). Untracked files can be committed for the first
    time by specifying their names on the command-line or by using
    "git add" to add them just prior to the commit. Any rename or
    removal of a tracked file will be detected and committed
    automatically.

Rationale summary
-----------------
These two commands capture a distinct conceptual split that is useful
for what users want to do with git. The split is necessary and
sufficient to provide access to four different useful pieces of commit
machinery. This is more functionality than in current git, and is
provided with more clarity.

The semantics of the two commands above are distinct enough that any
given tutorial introduction to git could outline a complete work-flow
by using only one or the other of the two commands, (or by presenting
one first and then expanding to the other).

The conceptual split here is necessary. In general, neither of the two
commands can be defined in terms of the other. This is independent of
the fact that commit-index-content is more core and provides shared
machinery for commit-working-tree-content. It is also independent of
the fact that commit-working-tree-content _can_ be defined in terms of
commit-index-content in the special case of the "all tracked paths"
form.
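
The special case mentioned above can be checked with a small sketch. It
assumes a current git in which "git add -u" refreshes all tracked paths;
the file names and the scratch index location are made up for
illustration. The tree produced by the two-stage form should be the same
tree that "git commit -a" records:

```shell
#!/bin/sh
# Sketch: "commit -a" versus "update the index for all tracked paths,
# then commit that index". All names here are illustrative.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

echo one > a.txt
echo two > b.txt
git add a.txt b.txt
git commit -q -m "initial"

# Modify one tracked file and delete another; nothing is staged yet.
echo "one, edited" > a.txt
rm b.txt

# Two-stage form, run against a scratch copy of the index so that the
# real index is left untouched: refresh tracked paths, then write a tree.
scratch_index="$repo/.git/scratch-index"
cp .git/index "$scratch_index"
two_stage_tree=$(GIT_INDEX_FILE="$scratch_index" sh -c 'git add -u && git write-tree')

# Single-step form.
git commit -q -a -m "edit a.txt, remove b.txt"
commit_a_tree=$(git rev-parse 'HEAD^{tree}')

echo "two-stage: $two_stage_tree"
echo "commit -a: $commit_a_tree"
```

This only covers the "all tracked paths" form; it says nothing about the
path-limited forms, which is where the two commands diverge.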

The two-way split here is also sufficient. It provides access to four
different, and useful, pieces of commit machinery. Of the four, only
three of these pieces currently exist in git. The new behavior is that
of "commit-index-content paths..."  and is actually quite useful as
described in the detailed rationale below.

Finally, the two-way split here is simpler and more clear than the
three different commit commands currently provided by git, ("commit",
"commit paths...", and "commit -a"). The improved clarity comes from
taking advantage of the following standard command-line convention:

	If optional arguments are omitted from a command, the command
	is semantically equivalent to some default argument being
	provided.

This convention is standard across many unix commands and is prevalent
in git itself, (such as commands like git-log defaulting to HEAD when
no revision specifier is provided). Note that this convention is not
followed by the current git-commit. The behavior of "git commit" and
"git commit paths..." involve distinct semantics. It is not the case
that "git commit" is equivalent to "git commit paths..." with some
default argument supplied. Violating this command-line convention is
unkind in general, but it also steals "space" from the command-line
for implementing the semantics of "git commit" with the application of
a <paths...> limit. This is discussed in more detail below.
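
As a small illustration of the convention, using the git-log example
already mentioned (the repository and file names below are invented for
the sketch), omitting the revision gives the same result as naming the
default explicitly:

```shell
#!/bin/sh
# Sketch: an omitted optional argument behaves like its default.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

echo first > file.txt
git add file.txt
git commit -q -m "first"
echo second > file.txt
git commit -q -a -m "second"

# "git log" with no revision versus "git log HEAD".
implicit=$(git log --format=%H)
explicit=$(git log --format=%H HEAD)

if [ "$implicit" = "$explicit" ]; then
    echo "git log == git log HEAD"
fi
```

No default path argument makes "git commit" behave like "git commit
paths...", which is the asymmetry described above.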

So, by cleanly separating the two different useful git-commit
behaviors, and applying a standard command-line convention, we end up
with more functionality and less to teach. What's not to love? All
that would be missing is to come up with names for the two
commands. As I promised above, I'm going to avoid proposing any
binding of the concepts to realistic names here, but I will point out
that one of the "names" might very well be a command-line option
alteration of the other command.

Rationale details
-----------------
Although the conceptual split is only two commands, the actual
implementation of this functionality breaks down into four separate
internal behaviors, (based on whether doing "given paths" or "all
tracked paths"). Three of the four exist in git already, while the
fourth is new, (and also useful). Let's review each of the four along
with the names that git currently provides for them:

1. commit-index-content		# all paths in the index

    This functionality currently exists as "git commit" and is the
    oldest and definitely the "most core" git commit command. Until
    fairly recently, all other git commit commands could easily be
    described as a variation of this functionality.

2. commit-index-content paths...

    This functionality does not currently exist in any git commit
    command, as far as I know. The behavior is to commit only a
    (path-based) subset of the content that has been staged into the
    index.

    I was originally just going to say that this functionality "might
    be useful in some cases", but coincidentally Alan Chandler
    happened to request it just yesterday on the list:

	I have been editing a set of files to make a commit, and after editing each
	one had done a git update-index.

	At this point I am just about to commit when I realise that one of the files
	has changes in it that really ought to be a separate commit.

	So effectively, I want to do one of three things

	a) git-commit <that-file>

    It's interesting to note that either of the two solutions
    suggested in response to Alan might not work in general. For
    example, "git reset", would not be a satisfactory solution if the
    user had dirty content in any of the affected files compared to
    what was staged in the index. Similarly, just removing the
    safety-valve on the existing "git commit <that-file>" would commit
    the wrong content if the working-tree contents of <that-file> were
    dirty with respect to the index.

    Now, it might still sound far-fetched to imagine wanting to commit
    a subset of something staged in the index while also having dirty
    content, but it occurs to me that I would actually _love_ to have
    this capability. The case I would use it for is fairly common,
    (and something that I think will speak to Junio who often brings
    up a similar scenario).

    Here's where I would like this functionality:

	I receive a patch while I'm in the middle of doing other work,
	(but with a clean index compared to HEAD, which is what I
	usually have). The patch looks good, so I want to commit it right
	away, but I do want to separate it into two or more pieces,
	(commonly this is because I want to separate the "add a test
	case demonstrating a bug" part from the "fix the bug"
	part). So, if I could do:

	git apply --index
	git commit-index-content <files that add the test case>
	git commit-index-content

	Then this would do exactly what I want. I wouldn't even have
	to think about whether my local modifications are to any of
	the same paths as touched by the patch.

    Today, in this scenario, what I have to do is to create a
    temporary branch with a clean working tree, and then use the index
    to stage the commit there. That process involves a few annoyances,
    (stashing my dirty work, inventing a free name for the temporary
    branch (which usually involves "git branch -D tmp"), switching back
    when I'm done, and trying to remember to clean up the branch). The
    new capability would let me skip _all_ of that overhead and
    instead I could just delight in the beauty and power of the
    index. Woo-hoo!

3. commit-working-tree-content		# all tracked files

    This functionality currently exists as "git commit -a" and, while
    not _really_ old in git's history, its invention predates my
    initial exposure to git. It has almost always been described in
    terms of its implementation, ("first update the index for all
    paths in the index, then commit that index").

    One benefit of this description is that it forces the user to
    learn about the index up front, (and gain a better understanding
    of git's model). One cost is that the user is forced to learn a
    two-stage implementation for a single-step process, (commit my
    changes). I won't try to weigh the costs/benefits here, but
    compare this to the description in (4) below.

4. commit-working-tree-content paths...

    This functionality currently exists as "git commit paths..." and
    is the newest variant of any git-commit command described here.

    I think the evolution of the semantics of the "git commit
    paths..." command line is very instructive. There was a
    time when this command could be described in terms of a two-stage
    manipulation of "the" index just like "commit -a" is described
    today. That is:

	Old: first update the index for all specified paths, then
	     commit the index.

    But then the semantics were changed and the new description does
    not involve the index at all:

	New: Commit only the files specified on the command line.

    The old behavior is still available with the --include option, but
    nobody has ever come out in favor of that being a useful command,
    (I agree it is not useful at all). Meanwhile, the new (default)
    behavior has been strongly identified by Linus as extremely
    useful. Junio has recently noticed that the old --include behavior
    is more conceptually consistent with the classic, commit-the-index
    definition of the core "git commit", but that's not sufficient
    justification for promoting functionality that would never be
    useful.

    So the evolution of the current "commit paths..." shows utility of
    functionality being a primary concern in defining the semantics of
    git commands. And that's wonderful.
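
The difference between the "Old" and "New" descriptions in (4) can be
seen in a short sketch. It assumes a current git, where "git commit
paths..." commits only the named paths and --include gives the older
behavior; the file names are made up:

```shell
#!/bin/sh
# Sketch: "git commit <path>" (only) versus "git commit --include <path>".
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

echo a1 > a.txt
echo b1 > b.txt
git add a.txt b.txt
git commit -q -m "initial"

# Stage a change to a.txt; change b.txt in the working tree only.
echo a2 > a.txt
git add a.txt
echo b2 > b.txt

# New (default) semantics: commit just the named path from the working
# tree; the staged change to a.txt stays staged.
git commit -q -m "only b.txt" b.txt
only_files=$(git diff --name-only HEAD~1 HEAD)
still_staged=$(git diff --cached --name-only)

# Old semantics via --include: add the named path to whatever is
# already staged, then commit the whole index.
echo b3 > b.txt
git commit -q -m "staged a.txt plus b.txt" --include b.txt
include_files=$(git diff --name-only HEAD~1 HEAD)

echo "only:    $only_files (still staged: $still_staged)"
echo "include:" $include_files
```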

In my opinion, what has happened with the evolution of "commit paths"
and "commit -a" is that a new conceptual commit behavior has been
invented, (what I've termed commit-working-tree-content), but it
hasn't been recognized yet as separate from the core
commit-index-content nature of "git commit". And there's some muddling
in that simply adding a <paths...> argument to "git commit" completely
changes its semantics, (which violates the command-line convention I
described above).
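
For what it's worth, the missing "commit-index-content paths..."
behavior can be approximated with plumbing. The following is only a
sketch under stated assumptions: the helper name
commit_index_content_paths, the temporary index location, and the file
names are all hypothetical; it bypasses hooks and the usual commit
message handling; and it does not handle paths staged for removal. The
idea is to build a temporary index from HEAD, copy in the staged entries
for the chosen paths, and commit that tree:

```shell
#!/bin/sh
# Sketch: commit only the staged content of selected paths.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

echo "test v1" > test.c
echo "fix v1" > fix.c
git add test.c fix.c
git commit -q -m "initial"

# Stage changes to both files, then dirty the working tree further.
echo "test v2 (staged)" > test.c
echo "fix v2 (staged)" > fix.c
git add test.c fix.c
echo "test v3 (unstaged)" > test.c

# Hypothetical helper, not an existing git command.
commit_index_content_paths() {
    msg=$1; shift
    tmp_index="$(git rev-parse --git-dir)/tmp-index.$$"
    # Start the temporary index from HEAD ...
    GIT_INDEX_FILE="$tmp_index" git read-tree HEAD
    # ... then copy the staged entries for the chosen paths into it.
    git ls-files -s -- "$@" |
        GIT_INDEX_FILE="$tmp_index" git update-index --index-info
    tree=$(GIT_INDEX_FILE="$tmp_index" git write-tree)
    commit=$(git commit-tree "$tree" -p HEAD -m "$msg")
    git update-ref HEAD "$commit"
    rm -f "$tmp_index"
}

commit_index_content_paths "add test case" test.c

head_test=$(git show HEAD:test.c)
head_fix=$(git show HEAD:fix.c)
still_staged=$(git diff --cached --name-only)
worktree_test=$(cat test.c)
echo "HEAD:test.c = $head_test; still staged: $still_staged"
```

The real index and the working tree are never touched, so the staged
change to fix.c and the unstaged edit to test.c both survive the commit.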

-Carl


* Re: [RFC] Two conceptually distinct commit commands
  2006-12-04 19:08 [RFC] Two conceptually distinct commit commands Carl Worth
@ 2006-12-04 20:10 ` Carl Worth
  2006-12-04 21:19   ` Jakub Narebski
  2006-12-05  0:52 ` Horst H. von Brand
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Carl Worth @ 2006-12-04 20:10 UTC (permalink / raw)
  To: git


> 	So, if I could do:
>
> 	git apply --index
> 	git commit-index-content <files that add the test case>
> 	git commit-index-content
>
> 	Then this would do exactly what I want. I wouldn't even have
> 	to think about whether my local modifications are to any of
> 	the same paths as touched by the patch.

BTW, the current "apply --index" doesn't allow what I imagined in the
scenario above. It notices that the affected file is different in the
working tree compared to the index and just refuses to do anything.

Given that safety-valve in git-apply, the current behavior of "git
commit paths" would allow for splitting a submitted patch into two
commits.

The difference is that it only works if the local modifications do not
affect any of the same paths as the patch. The user is freed from
worrying about this somewhat, since if it's not the case, then
git-apply just complains and doesn't do anything.

But what might be very interesting is a modified "git-apply --index"
that would not fail in this case, but would instead do the following:

	1. Apply the patch to the working tree

	2. Apply the patch to the index

And of course, if either fails then the entire apply operation fails,
leaving no changes to working tree or to the index.

With that new git-apply behavior, then the scenario I outlined above
would work, and would work in spite of any changes to the same file in
both the working tree and the index. It would also require the
separate commands for commit-index-content vs.
commit-working-tree-content as described in my original message above.
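
The two-step behavior can be roughly sketched with options that a
current git-apply documents: --cached (index only), a plain apply
(working tree only), and --check (verify without applying). This is an
approximation, not a patch to git-apply; the file names and the patch
are invented. Checking both targets up front stands in for the
all-or-nothing requirement:

```shell
#!/bin/sh
# Sketch: apply a patch to the index and the working tree separately
# when the working tree has unrelated local modifications.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

for i in 1 2 3 4 5 6 7 8 9 10; do echo "line $i"; done > file.txt
git add file.txt
git commit -q -m "initial"

# Build a patch that rewrites line 1, then restore the file.
patch_file=$(mktemp)
sed '1s/.*/line 1 patched/' file.txt > file.tmp && mv file.tmp file.txt
git diff > "$patch_file"
git checkout -q -- file.txt

# Unrelated local modification (line 10), deliberately left unstaged.
sed '10s/.*/line 10 local/' file.txt > file.tmp && mv file.tmp file.txt

# "git apply --index" refuses: the file does not match the index.
if git apply --index "$patch_file" 2>/dev/null; then
    index_mode=applied
else
    index_mode=refused
fi

# Verify both targets first so nothing changes unless both can succeed.
git apply --check --cached "$patch_file"
git apply --check "$patch_file"
git apply --cached "$patch_file"   # 2. apply the patch to the index
git apply "$patch_file"            # 1. apply the patch to the working tree

echo "--index: $index_mode"
echo "index line 1:    $(git show :file.txt | head -n 1)"
echo "worktree line 10: $(tail -n 1 file.txt)"
```

This works here because the local edit and the patch touch different
parts of the file; overlapping changes would make one of the checks fail.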

-Carl


* Re: [RFC] Two conceptually distinct commit commands
  2006-12-04 20:10 ` Carl Worth
@ 2006-12-04 21:19   ` Jakub Narebski
  2006-12-05  2:36     ` Carl Worth
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Narebski @ 2006-12-04 21:19 UTC (permalink / raw)
  To: git

Carl Worth wrote:

> But what might be very interesting is a modified "git-apply --index"
> that would not fail in this case, but would instead do the following:
> 
>         1. Apply the patch to the working tree
> 
>         2. Apply the patch to the index
> 
> And of course, if either fails then the entire apply operation fails,
> leaving no changes to working tree or to the index.

Or even a new option to git-apply, namely --index-only, which would apply
the patch to the index only.

BTW, with git-apply there are three possible versions to apply a patch
to: HEAD, the index, and the working tree; and three possible places to
store the result: the working tree only, both the working tree and the
index, and the index only.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: [RFC] Two conceptually distinct commit commands
  2006-12-04 19:08 [RFC] Two conceptually distinct commit commands Carl Worth
  2006-12-04 20:10 ` Carl Worth
@ 2006-12-05  0:52 ` Horst H. von Brand
  2006-12-05  1:18   ` Carl Worth
  2006-12-05  1:19   ` Jakub Narebski
  2006-12-05  3:51 ` Theodore Tso
  2006-12-06  1:13 ` Junio C Hamano
  3 siblings, 2 replies; 18+ messages in thread
From: Horst H. von Brand @ 2006-12-05  0:52 UTC (permalink / raw)
  To: Carl Worth; +Cc: git

Carl Worth <cworth@cworth.org> wrote:

[...]

> Proposal
> -------
> Here are the two commit commands I would like to see in git:
> 
>   commit-index-content [paths...]
> 
>     Commits the content of the index for the given paths, (or all
>     paths in the index). The index content can be manipulated with
>     "git add", "git rm", "git mv", and "git update-index".
> 
>   commit-working-tree-content [paths...]
> 
>     Commits the content of the working tree for the given paths, (or
>     all tracked paths). Untracked files can be committed for the first
>     time by specifying their names on the command-line or by using
>     "git add" to add them just prior to the commit. Any rename or
>     removal of a tracked file will be detected and committed
>     automatically.

Edit somefile with, e.g, emacs: Get backup called somefile~
Realize that somefile is nonsense, delete it(s edited version)
commit-working-tree-contents: Now you have the undesirable somefile~ saved

Edit somefile, utterly changing it: Get backup called somefile~
mv somefile newfile
commit-working-tree-contents: somefile~ saved, newfile lost

Edit somefile a bit, move it to newfile. Make sure no backups left over.
commit-working-tree-contents: somefile deleted, newfile lost

This is /not/ easy to get right, as it depends on what the user wants, and
the random programs run in between git commands.

You need to tell git somehow what files you want saved, and which ones are
junk. I.e., just the first command (unfortunately).
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                    Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria             +56 32 2654239


* Re: [RFC] Two conceptually distinct commit commands
  2006-12-05  0:52 ` Horst H. von Brand
@ 2006-12-05  1:18   ` Carl Worth
  2006-12-05  2:14     ` Horst H. von Brand
  2006-12-05  1:19   ` Jakub Narebski
  1 sibling, 1 reply; 18+ messages in thread
From: Carl Worth @ 2006-12-05  1:18 UTC (permalink / raw)
  To: Horst H. von Brand; +Cc: git


On Mon, 04 Dec 2006 21:52:38 -0300, "Horst H. von Brand" wrote:
> >     Commits the content of the working tree for the given paths, (or
> >     all tracked paths). Untracked files can be committed for the first
> >     time by specifying their names on the command-line or by using
> >     "git add" to add them just prior to the commit. Any rename or
> >     removal of a tracked file will be detected and committed
> >     automatically.
>
> Edit somefile with, e.g, emacs: Get backup called somefile~
> Realize that somefile is nonsense, delete it(s edited version)
> commit-working-tree-contents: Now you have the undesirable somefile~ saved

The semantics I intended to describe for commit-working-tree-content
would not add this file. That's a "new file" so would have to be
mentioned either explicitly on the command-line or in a git-add
command before it would be committed.

> Edit somefile, utterly changing it: Get backup called somefile~
> mv somefile newfile
> commit-working-tree-contents: somefile~ saved, newfile lost

OK, you've found a bug in my description above, (though not in the
intended semantics). By "rename...detected automatically" I meant only
that the fact that a file has "disappeared" as part of a rename need
not be mentioned to git. The fact that the contents are made available
as a new file name would still need to be told to git with "git add",
(or it would be worthwhile to mention "git mv", I suppose).

> This is /not/ easy to get right, as it depends on what the user wants, and
> the random programs run in between git commands.
>
> You need to tell git somehow what files you want saved, and which ones are
> junk. I.e., just the first command (unfortunately).

Perhaps I was too oblique in calling this thing
commit-working-tree-contents. This isn't some fabricated-from-scratch
command. The intent of my message was that readers would recognize the
description as matching what the current "commit -a" and "commit
files..." commands do. So I wasn't trying to invent anything
substantially different from those. So almost any problems of unexpected
behavior you can find almost surely apply to "commit -a" already.
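
A quick sketch of the first scenario against "commit -a" in a current
git (file names are illustrative): the untracked backup is left alone,
while the removal of the tracked file is recorded.

```shell
#!/bin/sh
# Sketch: "commit -a" records removals of tracked files but never adds
# untracked ones such as editor backups.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

echo keep > keep.txt
echo nonsense > somefile
git add keep.txt somefile
git commit -q -m "initial"

# An editor leaves a backup behind; the user then deletes the original.
cp somefile somefile~
rm somefile

git commit -q -a -m "drop somefile"

tracked=$(git ls-tree --name-only HEAD)
untracked=$(git ls-files --others)
echo "tracked in HEAD: $tracked"
echo "untracked:       $untracked"
```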

I did throw one new thing into the description, (that does not exist
in current git). That's the mention that new files could be added by
mentioning them explicitly on the command-line. This was intended as a
way to allow a tutorial to sidestep the details of how "git add"
interacts with the index. If this one feature is a bad idea, it could
be dropped with no impact on the rest of the proposal nor my
discussion of it.

Similarly, I worded the mention of "git add" to suggest it be done
"just prior to the commit". Again, I did this just to avoid having to
mention anything about the need to "git add" again if the file was
edited between the time of add and the time of commit. That language
is already proposed for the git-add documentation, so there's no need
to repeat it all here.

-Carl


* Re: [RFC] Two conceptually distinct commit commands
  2006-12-05  0:52 ` Horst H. von Brand
  2006-12-05  1:18   ` Carl Worth
@ 2006-12-05  1:19   ` Jakub Narebski
  1 sibling, 0 replies; 18+ messages in thread
From: Jakub Narebski @ 2006-12-05  1:19 UTC (permalink / raw)
  To: git

Horst H. von Brand wrote:

> Carl Worth <cworth@cworth.org> wrote:
> 
> [...]
> 
>> Proposal
>> -------
>> Here are the two commit commands I would like to see in git:
>> 
>>   commit-index-content [paths...]
>> 
>>     Commits the content of the index for the given paths, (or all
>>     paths in the index). The index content can be manipulated with
>>     "git add", "git rm", "git mv", and "git update-index".
>> 
>>   commit-working-tree-content [paths...]
>> 
>>     Commits the content of the working tree for the given paths, (or
>>     all tracked paths). Untracked files can be committed for the first
>>     time by specifying their names on the command-line or by using
>>     "git add" to add them just prior to the commit. Any rename or
>>     removal of a tracked file will be detected and committed
>>     automatically.
> 
> Edit somefile with, e.g, emacs: Get backup called somefile~
> Realize that somefile is nonsense, delete it(s edited version)
> commit-working-tree-contents: Now you have the undesirable somefile~ saved

No, you don't, assuming that you have *~ in .gitignore or .git/info/exclude

> Edit somefile, utterly changing it: Get backup called somefile~
> mv somefile newfile
> commit-working-tree-contents: somefile~ saved, newfile lost

No, assuming that you use git-mv as you should.

> Edit somefile a bit, move it to newfile. Make sure no backups left over.
> commit-working-tree-contents: somefile deleted, newfile lost

No, as above.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: [RFC] Two conceptually distinct commit commands
  2006-12-05  1:18   ` Carl Worth
@ 2006-12-05  2:14     ` Horst H. von Brand
  2006-12-05  2:32       ` Carl Worth
  0 siblings, 1 reply; 18+ messages in thread
From: Horst H. von Brand @ 2006-12-05  2:14 UTC (permalink / raw)
  To: Carl Worth; +Cc: Horst H. von Brand, git

Carl Worth <cworth@cworth.org> wrote:
> On Mon, 04 Dec 2006 21:52:38 -0300, "Horst H. von Brand" wrote:
> > >     Commits the content of the working tree for the given paths, (or
> > >     all tracked paths). Untracked files can be committed for the first
> > >     time by specifying their names on the command-line or by using
> > >     "git add" to add them just prior to the commit. Any rename or
> > >     removal of a tracked file will be detected and committed
> > >     automatically.

> > Edit somefile with, e.g, emacs: Get backup called somefile~
> > Realize that somefile is nonsense, delete it(s edited version)
> > commit-working-tree-contents: Now you have the undesirable somefile~ saved

> The semantics I intended to describe for commit-working-tree-content
> would not add this file. That's a "new file" so would have to be
> mentioned either explicitly on the command-line or in a git-add
> command before it would be committed.

How do you distinguish a "new file, same contents as old file" from "old
file, renamed"? What is the difference between:

  mv somefile newfile

and

  cp somefile newfile
  rm somefile

?

How should

  cp somefile newfile
  vi somefile

be handled? How about

  cp somefile oldfile
  vi somefile

or just

  mv somefile oldfile

? Or

  cp somefile somefile.my-own-bakup
  vi somefile

?

The whole problem is your description based on "file renaming" and
such. AFAIU git has a list of file names it is tracking, and for those
names it keeps track of what the contents for each are at each commit. That
the name somefile had some contents that later show up as newfile (both
names tracked) is recorded just as that. You could /interpret/ this as a
"rename" if somefile is then gone, but it could well be something else.
Besides, you'd have to search for the old somefile contents among /all/
newfiles just to find out it was renamed. Better not to mix facts with
interpretation (== guesses on what operations came in between the snapshots
git takes). Note that it should never matter what strange ideas a random
user gets for naming her temporary backup files, or their git configuration.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                    Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria             +56 32 2654239


* Re: [RFC] Two conceptually distinct commit commands
  2006-12-05  2:14     ` Horst H. von Brand
@ 2006-12-05  2:32       ` Carl Worth
  0 siblings, 0 replies; 18+ messages in thread
From: Carl Worth @ 2006-12-05  2:32 UTC (permalink / raw)
  To: Horst H. von Brand; +Cc: git


On Mon, 04 Dec 2006 23:14:07 -0300, "Horst H. von Brand" wrote:
> What is the difference between:
>   mv somefile newfile
> and
>   cp somefile newfile
>   rm somefile

There is no difference. This is git, a content tracker.

Same for the rest.
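
This can be checked directly. In the sketch below (branch and file names
are made up), the two sequences end up committing identical trees, and
the "rename" only appears when a diff is asked to look for one:

```shell
#!/bin/sh
# Sketch: "mv" and "cp + rm" are indistinguishable to a content tracker.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

echo "some content" > somefile
git add somefile
git commit -q -m "base"
base=$(git rev-parse HEAD)

git checkout -q -b via-mv
mv somefile newfile
git add -A
git commit -q -m "mv"
mv_tree=$(git rev-parse 'HEAD^{tree}')

git checkout -q -b via-cp "$base"
cp somefile newfile
rm somefile
git add -A
git commit -q -m "cp + rm"
cp_tree=$(git rev-parse 'HEAD^{tree}')

# The rename is an interpretation made at diff time (-M), not a
# recorded fact.
status=$(git diff -M --name-status "$base" via-mv)

echo "mv tree: $mv_tree"
echo "cp tree: $cp_tree"
echo "diff -M: $status"
```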

> The whole problem is your description based on "file renaming" and
> such.

OK. Strike the words "or rename" from the description, leaving just:

	Any removal of a tracked file will be detected and committed
	automatically.

The rest of my analysis still stands, I believe. And I'd be glad to
accept further suggestions on documenting these. The goal is simply to
have a user-oriented description of the semantics that are consistent
between the current "git commit -a" and "git commit files..."
commands.

-Carl


* Re: [RFC] Two conceptually distinct commit commands
  2006-12-04 21:19   ` Jakub Narebski
@ 2006-12-05  2:36     ` Carl Worth
  0 siblings, 0 replies; 18+ messages in thread
From: Carl Worth @ 2006-12-05  2:36 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git


On Mon, 04 Dec 2006 22:19:55 +0100, Jakub Narebski wrote:
> Or even a new option to git-apply, namely --index-only, which would apply
> the patch to the index only.

Wouldn't that leave me in a state where my working-tree would be all
set for reverting the patch? Can you imagine a scenario where that
would actually be useful?

-Carl


* Re: [RFC] Two conceptually distinct commit commands
  2006-12-04 19:08 [RFC] Two conceptually distinct commit commands Carl Worth
  2006-12-04 20:10 ` Carl Worth
  2006-12-05  0:52 ` Horst H. von Brand
@ 2006-12-05  3:51 ` Theodore Tso
  2006-12-05  6:33   ` Junio C Hamano
  2006-12-05  6:38   ` Carl Worth
  2006-12-06  1:13 ` Junio C Hamano
  3 siblings, 2 replies; 18+ messages in thread
From: Theodore Tso @ 2006-12-05  3:51 UTC (permalink / raw)
  To: Carl Worth; +Cc: git

On Mon, Dec 04, 2006 at 11:08:22AM -0800, Carl Worth wrote:
>
> Here are the two commit commands I would like to see in git:
> 
>   commit-index-content [paths...]
> 
>     Commits the content of the index for the given paths, (or all
>     paths in the index). The index content can be manipulated with
>     "git add", "git rm", "git mv", and "git update-index".
> 
>   commit-working-tree-content [paths...]
> 
>     Commits the content of the working tree for the given paths, (or
>     all tracked paths). Untracked files can be committed for the first
>     time by specifying their names on the command-line or by using
>     "git add" to add them just prior to the commit. Any rename or
>     removal of a tracked file will be detected and committed
>     automatically.

I think this is a very interesting proposal, although I think I
disagree with the last part:

      Any [rename or] removal of a tracked file will be detected and
      committed automatically.

If adds aren't going to be done automatically (because otherwise you have
problems with foo.c~ accidentally getting checked in), then it's
non-symmetric to expect that deletes will also happen automatically.
It's relatively rare that files are removed or renamed, and sometimes
files accidentally disappear.  

So in the case where there are no pathnames given to "git
commit-working-tree-content", I would argue that it does not do any
implicit "git add" on new files NOR any implicit "git rm" on missing
files unless the user actually specifies an --implicit-add or
--implicit-delete option, respectively.  If users want to make
--implicit-add and/or --implicit-delete the default, that could be a
configuration option, but I don't think it should be a default.

A second issue which you left unspecified is what should
commit-working-tree-content do if the index != HEAD.  In particular,
in this case:

edit foo.c
git update-index
edit foo.c
git commit-working-tree-content foo.c

What should happen to foo.c in the index?  Should it stay the same?
Should the contents be replaced with the version of foo.c that has just
been committed?  The latter seems to make sense, but runs the risk of
losing the data (what was in the index).  The former has the downside
that the index might have a version of foo.c which is older than what
has just been committed, which could be confusing.  Or should git
commit-working-tree-content abort with an error message if index != HEAD?


* Re: [RFC] Two conceptually distinct commit commands
  2006-12-05  3:51 ` Theodore Tso
@ 2006-12-05  6:33   ` Junio C Hamano
  2006-12-05  6:38   ` Carl Worth
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2006-12-05  6:33 UTC (permalink / raw)
  To: Theodore Tso; +Cc: git, Carl Worth

Theodore Tso <tytso@mit.edu> writes:

> A second issue which you left unspecified is what should
> commit-working-tree-content do if the index != HEAD.  In particular,
> in this case:
>
> edit foo.c
> git update-index
> edit foo.c
> git commit-working-tree-content foo.c
>
> What should happen to foo.c in the index?  Should it stay the same?
> Should the contents be replaced with the version of foo.c that has just
> been committed?  The latter seems to make sense, but runs the risk of
> losing the data (what was in the index).  The former has the downside
> that the index might have a version of foo.c which is older than what
> has just been committed, which could be confusing.  Or should git
> commit-working-tree-content abort with an error message if index != HEAD?

That is exactly the "'commit --only' jumps the index" issue.

Updating the index with what is committed makes sense because
the commit after this --only commit happens builds on top of it,
and not doing so would mean the change to foo.c would be
reverted.  As you mentioned above, updating the index with the
committed version of foo.c means information loss of what was
staged earlier, and the traditional behaviour has been to
"abort with an error if index != HEAD" at that path, which was a
safety valve.

However, in the recent discussion, everybody (Linus, Nico, and I
included) seems to think this information loss is acceptable and
in fact is even useful.  I've sent a patch to remove the
obsolete safety valve for comments today, but haven't applied it
to any of my public branches yet, but most likely I will, and it
will happen sooner with encouragement from the list.
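
A minimal sketch of Ted's scenario, assuming a git in which the safety
valve has already been removed (the git under discussion in this thread
still aborts here). The temporary repository, the file contents and the
commit message are made up for illustration:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # local, illustrative identity
git config user.email "user@example.com"

echo "original" > foo.c
git add foo.c
git commit -q -m "initial"

echo "first edit" > foo.c
git update-index foo.c                     # stage the first edit
echo "second edit, unstaged" > foo.c       # edit again without staging

# Path-limited commit: takes foo.c from the working tree, not the index.
git commit -q -m "commit working tree foo.c" --only foo.c

head_content=$(git show HEAD:foo.c)        # what was committed
index_content=$(git show :foo.c)           # what the index holds now
echo "HEAD:  $head_content"
echo "index: $index_content"
```

After the commit, HEAD and the index both hold the working tree version
of foo.c; the intermediate staged content ("first edit") is gone.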

* Re: [RFC] Two conceptually distinct commit commands
  2006-12-05  3:51 ` Theodore Tso
  2006-12-05  6:33   ` Junio C Hamano
@ 2006-12-05  6:38   ` Carl Worth
  1 sibling, 0 replies; 18+ messages in thread
From: Carl Worth @ 2006-12-05  6:38 UTC (permalink / raw)
  To: Theodore Tso; +Cc: git

On Mon, 4 Dec 2006 22:51:23 -0500, Theodore Tso wrote:
>
> If adds aren't going to be done automatically (because otherwise you
> have problems with foo.c~ accidentally getting checked in), then it's
> non-symmetric to expect that deletes will also happen automatically.
> It's relatively rare that files are removed or renamed, and sometimes
> files accidentally disappear.

It's non-symmetric, yes, but it's what I would personally like. It's
not an essential aspect of the proposal, so it could go either way as
the git crowd decides.

To explain my personal preference, I like the notion of all files
being "untracked" until I inform the system about their
existence. After that, I'd like the system to take care of them and
notice when they get modified, or when they get deleted.

> So in the case where there are no pathnames given to "git
> commit-working-tree-content", I would argue that it does not do any
> implicit "git add" on new files NOR any implicit "git rm" on missing
> files unless the user actually specifies an --implicit-add or
> --implicit-delete option, respectively.  If users want to make
> --implicit-add and/or --implicit-delete the default, that could be a
> configuration option, but I don't think it should be a default.

The ability to configure --implicit-delete and --implicit-add to
git-commit seems good. They're long enough arguments that

> What should happen to foo.c in the index?  Should it stay the same?
> Should the contents be replaced with the version of foo.c that has just
> been committed?  The latter seems to make sense, but runs the risk of
> losing the data (what was in the index).  The former has the downside
> that the index might have a version of foo.c which is older than what
> has just been committed, which could be confusing.  Or should git
> commit-working-tree abort with an error message if index != HEAD?

This case is already under debate in a separate thread. There "git
commit files", (which really is commit-working-tree-content already),
currently errors out in this case, but the proposal is to allow it to
proceed with the commit, (thereby "losing" the intermediate staged
content).

-Carl

* Re: [RFC] Two conceptually distinct commit commands
  2006-12-04 19:08 [RFC] Two conceptually distinct commit commands Carl Worth
                   ` (2 preceding siblings ...)
  2006-12-05  3:51 ` Theodore Tso
@ 2006-12-06  1:13 ` Junio C Hamano
  2006-12-06  4:53   ` Carl Worth
  3 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2006-12-06  1:13 UTC (permalink / raw)
  To: Carl Worth; +Cc: git

Carl Worth <cworth@cworth.org> writes:

>     ... The case I would use it for is fairly common,
>     (and something that I think will speak to Junio who often brings
>     up a similar scenario).
>
>     Here's where I would like this functionality:
>
> 	I receive a patch while I'm in the middle of doing other work,
> 	(but with a clean index compared to HEAD, which is what I usually
> 	have). The patch looks good, so I want to commit it right
> 	away, but I do want to separate it into two or more pieces,
> 	(commonly this is because I want to separate the "add a test
> 	case demonstrating a bug" part from the "fix the bug"
> 	part).

(This is offtopic)

I often faced situations like that during git.git history.  One
patch to expose the bug in the existing code, and another to fix
it.  And there are three ways to make that commit.

 (1) one commit exposes, then another fixes.
 (2) one commit fixes, then another verifies the bug is no more.
 (3) one commit to include both.

In my experience, (1) is only useful during the time I am coming
up with the fix (if I am fixing it myself) or during the time I
am reviewing and committing the fix (if I am applying somebody
else's patch).  Committing in that order lets me validate the
brokenness after making the first commit, and then lets me feel
good by not seeing that problem after the second commit.  But
this means I deliberately record a state that is known not to
pass the test, which means it is a problem for somebody else in
the future when the history needs to be bisected to hunt for an
unrelated bug.  If the "test" is just an optional test in the
test suite, then it is easy to work around (the person who is
bisecting can ignore that bug by not running that particular
test), but if it is an assert somewhere deep inside the code,
ignoring it is not very easy, especially if the person who is
bisecting is not familiar with that part of the code.

What I recommend people do these days is either (2) or (3),
but do so _after_ verifying the fix in the reverse order.  The
criterion for choosing between (2) and (3) is fairly simple: if the
"test" is easily separable (e.g. changes to a test script file
that does not overlap with the "fix" patch), roll both in one
commit.  Then it would not later cause problems for bisection.

Enough of offtopic.

The sequence to split a patch in place would be (I'll speak in
the present tense and pretend Nico's "git add" does not exist
yet):

	git apply
	git update-index <files for the first batch>
	git commit
	git commit -a ;# the remainder

so you do not necessarily need a new "concept".  It is
inconvenient that you need to deal with new files, but that is a
minor detail compared to a bigger problem I'll mention in the
next paragraph.
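
The sequence above can be tried out with a sketch like the following.
The "received" patch is fabricated on the spot from two hypothetical
files (lib.c and test.sh), and it splits cleanly at the file boundary,
which is the easy case; the names and messages are illustrative only:

```shell
set -e
repo=$(mktemp -d)
patch=$(mktemp)
cd "$repo"
git init -q
git config user.name "Example User"        # local, illustrative identity
git config user.email "user@example.com"

echo "code v1" > lib.c
echo "tests v1" > test.sh
git add lib.c test.sh
git commit -q -m "initial"

# Fabricate a stand-in for the received patch, then restore the tree.
echo "code v2 with the fix" > lib.c
echo "tests v2 with a new test case" > test.sh
git diff > "$patch"
git checkout -q -- lib.c test.sh

git apply "$patch"                         # patch lands in the working tree
git update-index test.sh                   # first batch: the test only
git commit -q -m "add test case demonstrating the bug"
git commit -q -a -m "fix the bug"          # the remainder

first_files=$(git diff --name-only HEAD~2 HEAD~1)
second_files=$(git diff --name-only HEAD~1 HEAD)
echo "first commit:  $first_files"
echo "second commit: $second_files"
```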

I think the problem with your thinking is that you are still
talking from a "file boundary matters" point of view.  The above
sequence is only useful if the patch to be split is separable
cleanly at the file boundary, in which case, I would (and I've
done so often) split the patch file in my editor and run two
independent "git apply + git commit" sequences.  That way, I
could test each in isolation (perhaps with some dirty working
tree state if I do not stash them away and do reset --hard).
Anything more realistic and practically useful would require
splitting of the patch in semantic ways regardless of file
boundaries.

As I have already said (and you seemed to share the same
discipline), I do not like people committing anything
non-trivial that is not tested.  The patch you received might
not have been tested by the submitter, but there is a chance
that it might have been ;-).  But with the way you said you want
to make the commits in the message I am responding to, the first
commit would never have been tested by anybody in isolation, not
by the original submitter even if he tested the patch before
giving it to you, nor you -- your working tree had either none
of his patch or all of it, and never was in the state with only
the first batch.

So while at the theoretical level I understand what you would
want to achieve with the "single patch that should have been
sent as two patch series" example, from the practical point of
view I do not see much value in it (because "file boundary
matters" is a minority case that is not very interesting), and
from the discipline point of view I would rather not have such
a too-convenient way to commit things that were
different in nontrivial ways from what you had in your working
tree (if we use something like Darcs record to update
hunk-by-hunk).

>     The old behavior is still available with the --include option, but
>     nobody has ever come out in favor of that being a useful command,

That is a slight overstatement.  When committing with paths
argument to conclude a merge, --include semantics is the only
variant that makes sense (--only semantics is so wrong that
there is a safety valve that catches it).  Most of the time,
however, if I need to resolve and record a complicated merge, I
would do "update-index" to clear the deck of the paths
that I have already dealt with, so that by the time I type "git
commit", I have an index that has exactly what I want in the
merge commit.  That makes --include a less often used form.  If
a merge is small and easy to resolve at only a few paths, it
still is handy to say "git commit -i resolved-path.c".  It does
not add anything to the semantics -- it is only a typesaver.
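
A sketch of that typesaver, assuming a small conflicting merge in a
throwaway repository. The branch name "side", the file contents and
the commit messages are made up for illustration:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # local, illustrative identity
git config user.email "user@example.com"

echo "base" > resolved-path.c
git add resolved-path.c
git commit -q -m "base"

git checkout -q -b side
echo "change made on side" > resolved-path.c
git commit -q -a -m "side change"

git checkout -q -                          # back to the original branch
echo "change made on the original branch" > resolved-path.c
git commit -q -a -m "our change"

# The merge stops with a conflict; that is expected here.
git merge side >/dev/null 2>&1 || true

echo "hand-resolved content" > resolved-path.c
# --include: stage the resolved path, then commit the whole index.
git commit -q -i resolved-path.c -m "merge side"

# "commit parent1 parent2" is three words for a two-parent merge.
words=$(git rev-list --parents -n 1 HEAD | wc -w)
merged=$(git show HEAD:resolved-path.c)
echo "parents + 1 = $words, content = $merged"
```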

* Re: [RFC] Two conceptually distinct commit commands
  2006-12-06  1:13 ` Junio C Hamano
@ 2006-12-06  4:53   ` Carl Worth
  2006-12-06  9:54     ` Commit order in git.git, was " Johannes Schindelin
                       ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Carl Worth @ 2006-12-06  4:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, 05 Dec 2006 17:13:20 -0800, Junio C Hamano wrote:
> (This is offtopic)

It's an interesting topic nonetheless, so I'll comment anyway.

>  (1) one commit exposes, then another fixes.
>  (2) one commit fixes, then another verifies the bug is no more.
>  (3) one commit to include both.

I feel very strongly that I want (1) in the history.

> unrelated bug.  If the "test" is just an optional test in the
> test suite, then it is easy to work around (the person who is
> bisecting can ignore that bug by not running that particular
> test), but if it is an assert somewhere deep inside the code,
> ignoring it is not very easy, especially if the person who is
> bisecting is not familiar with that part of the code.

Granted, something like an assert that breaks the library would not be
a useful thing to have in the history. I'm certainly not in favor of
something like that.

I'm talking about tests that demonstrate pre-existing broken-ness in
the code. In the case of cairo, our test suite is entirely optional,
and each test is out-of-process, so even if a test totally crashes the
suite continues and it's easy to ignore that.

But it's not just the correctness test suite. We also have a
performance test suite, and I encourage the same "add test case, then
performance fix" pattern there. This shows an even more obvious
example of why it's useful to have separate commits in the history,
(since someone may want to verify the performance impact on a separate
system at any point in the future, and the two commits makes it easy
to get "before" and "after" for the performance fix results from the
new test case).

> The sequence to split a patch in place would be (I'll speak in
> the present tense and pretend Nico's "git add" does not exist
> yet):
>
> 	git apply
>         git update-index <files for the first batch>
>         git commit
>         git commit -a ;# the remainder

Yes, I can use this today, and I do, (as I mentioned in my mail). The
only requirement is that I start with a non-dirty working tree. I can
arrange that, but it would be just a bit less inconvenient if I didn't
have to.

> so you do not necessarily need a new "concept".

No, I don't need it. And this "commit-index-content [paths...]" was
the least significant part of my proposal. As I said, originally I was
just going to say this "might be useful in some cases", but then
someone just happened to request this feature on the list at the same
time I was considering the proposal.

Anyway, in spite of this being an accidental feature of my proposal, it
seems to be the only part you commented on. Even if this functionality
weren't made available at all, I'd still be interested in your
comments on the main thrust of my proposal. I think that consists of:

	1. Unifying the two current commands that provide
	   commit-working-tree-content semantics into a single,
	   use-oriented description.

	2. Avoiding a change of semantics triggered by merely applying
	   pathname arguments without any command-line option or
	   alternate command name.

> As I have already said (and you seemed to share the same
> discipline), I do not like people committing anything
> non-trivial that is not tested.

Indeed. I like that discipline very much. And in fact, that's an
important reason that I split a patch that I receive like this, (or
just bounce it, which I often do, and would definitely do if it's not
easy to split).

>                                 But with the way you said you want
> to make the commits in the message I am responding to, the first
> commit would never have been tested by anybody in isolation, not
> by the original submitter even if he tested the patch before
> giving it to you, nor you -- your working tree had either none
> of his patch or all of it, and never was in the state with only
> the first batch.

Who said I wouldn't test it? I do split commits like this precisely so
that I _can_ test it this way---and git helps a lot here. I do the
split commit, then easily back up to the revision that adds the test
case, verify the test fails before the bug fix, (which is something
the maintainer doesn't get a chance to do with your (2) approach),
then move forward and verify that the test passes after the fix.

So, sure, I haven't ever had that working tree before the commit. But
git makes it easy to get that working tree after I commit and test
everything before I push anything out.

[Incidentally, that's yet _another_ thing people can mention to
friends who come from cvs and think that the separation of commit and
push is annoying. And that's in addition to all of the performance
advantages, the ability to work entirely offline, etc. etc.]

> >     The old behavior is still available with the --include option, but
> >     nobody has ever come out in favor of that being a useful command,
>
> That is a slight overstatement.

OK. I should have worded that as "I wasn't aware of any arguments...".

>                That makes --include a less often used form.  If
> a merge is small and easy to resolve at only a few paths, it
> still is handy to say "git commit -i resolved-path.c".  It does
> not add anything to the semantics -- it is only a typesaver.

Oh, OK. I see it now. That's for combining the update-index, (or
"resolve/resolved"---is there any consensus on that being a useful
synonym for update-index here?) and the commit into a single
command. I guess that's just a shortcut I've never used.

So, yes, that shortcut would not fit cleanly into either of my
proposed commit-working-tree-content nor commit-index-content. That
would still require a two-stage:

	git update-index resolved-path.c
	git commit-index-content

-Carl

* Commit order in git.git, was Re: [RFC] Two conceptually distinct commit commands
  2006-12-06  4:53   ` Carl Worth
@ 2006-12-06  9:54     ` Johannes Schindelin
  2006-12-06 16:14     ` Carl Worth
  2006-12-06 18:31     ` Junio C Hamano
  2 siblings, 0 replies; 18+ messages in thread
From: Johannes Schindelin @ 2006-12-06  9:54 UTC (permalink / raw)
  To: Carl Worth; +Cc: Junio C Hamano, git

Hi,

On Tue, 5 Dec 2006, Carl Worth wrote:

> On Tue, 05 Dec 2006 17:13:20 -0800, Junio C Hamano wrote:
> > (This is offtopic)
> 
> It's an interesting topic nonetheless, so I'll comment anyway.
> 
> >  (1) one commit exposes, then another fixes.
> >  (2) one commit fixes, then another verifies the bug is no more.
> >  (3) one commit to include both.
> 
> I feel very strongly that I want (1) in the history.

Note that (1) might reflect history better, but (2) and (3) are way
nicer for bisecting.

I feel very strongly that I want (3) in the history.

(Though I am guilty of many instances of (2)...)

Ciao,
Dscho

* Re: [RFC] Two conceptually distinct commit commands
  2006-12-06  4:53   ` Carl Worth
  2006-12-06  9:54     ` Commit order in git.git, was " Johannes Schindelin
@ 2006-12-06 16:14     ` Carl Worth
  2006-12-06 18:31     ` Junio C Hamano
  2 siblings, 0 replies; 18+ messages in thread
From: Carl Worth @ 2006-12-06 16:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, 05 Dec 2006 20:53:30 -0800, Carl Worth wrote:
> On Tue, 05 Dec 2006 17:13:20 -0800, Junio C Hamano wrote:
> 	2. Avoiding a change of semantics triggered by merely applying
> 	   pathname arguments without any command-line option or
> 	   alternate command name.

By the way, the original command-line convention I used in the
proposal was that the omission of an optional argument should be
equivalent to supplying some default argument. Here's another
convention that is also useful to examine:

	Adding path-name arguments limits the behavior of the command,
	(and does not otherwise change the semantics).

I don't know that this is as universal a convention outside of git,
but it's quite strong within git. The path name limiting exists in
deep parts of the machinery and allows for things like:

	git log -- paths...	# path-limited version of "git log"
	git diff -- paths...	# path-limited version of "git diff"
	etc.

It's interesting to look at how the various commit commands fit (or
do not fit) this convention:

  git commit paths...
  git commit --only paths...

	This command cannot be explained in terms of the semantics of
	"git commit" (without command-line options). This command
	_can_ be explained as a path-limited version of "git commit -a".

  git commit --include paths...

	This command does something _extra_ to the given paths before
	executing the equivalent of "git commit". I think this is a
	fairly unique violation of the path-limiting convention.

The proposal I made with commit-working-tree-content and
commit-index-content consistently follows the path-limiting convention.
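
A small sketch of the path-limiting convention as it already works for
"git log" and "git diff", assuming a throwaway repository with two
hypothetical files named a and b:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # local, illustrative identity
git config user.email "user@example.com"

echo "a, first version" > a
echo "b, first version" > b
git add a b
git commit -q -m "initial"

echo "a, second version (committed)" > a
git commit -q -m "change a" -- a           # path-limited commit
echo "b, second version (left uncommitted)" > b

log_all=$(git log --oneline | wc -l)       # every commit
log_a=$(git log --oneline -- a | wc -l)    # only commits touching a
log_b=$(git log --oneline -- b | wc -l)    # only commits touching b
diff_a=$(git diff --name-only -- a)        # nothing pending for a
diff_b=$(git diff --name-only -- b)        # b is still modified
echo "log: all=$log_all a=$log_a b=$log_b; diff: a='$diff_a' b='$diff_b'"
```

In each case the paths only narrow what the command looks at; they do
not change what the command means.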

I think consistency of command-line conventions like this is
important for making the tool usable. And there have been notable
improvements to consistency of convention in git recently, (for
example, using <since>..[<until>] in git-format-patch rather than
<his> <mine>).

-Carl

* Re: [RFC] Two conceptually distinct commit commands
  2006-12-06  4:53   ` Carl Worth
  2006-12-06  9:54     ` Commit order in git.git, was " Johannes Schindelin
  2006-12-06 16:14     ` Carl Worth
@ 2006-12-06 18:31     ` Junio C Hamano
  2006-12-06 23:29       ` Carl Worth
  2 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2006-12-06 18:31 UTC (permalink / raw)
  To: Carl Worth; +Cc: git, Johannes Schindelin

Carl Worth <cworth@cworth.org> writes:

> ... Even if this functionality
> weren't made available at all, I'd still be interested in your
> comments on the main thrust of my proposal. I think that consists of:
>
> 	1. Unifying the two current commands that provide
> 	   commit-working-tree-content semantics into a single,
> 	   use-oriented description.
>
> 	2. Avoiding a change of semantics triggered by merely applying
> 	   pathname arguments without any command-line option or
> 	   alternate command name.

I am not sure what needs to be commented on at this point, since
it is not yet clear to me where you want your proposal to lead
us.

I do not agree with your "three commands" or "two semantics"
characterization of the current way "git commit" works.  "git
commit" without any optional argument already acts as if
sensible default arguments were given, that is "no funny business
with additional paths, commit just what the user has staged
already."

"git commit" is primarily about committing what has been staged
in the index, and "--all" is just a type-saver short-hand (just
like "--include" is) to perform update-index the last minute and
nothing more.  In other words, "--all" is a variant of the
pathname-less form "git commit".  It is not a variant of "git
commit --only paths..." form, as you characterized.

The pathname form (the "--only" variant) on the surface seems to
work differently, but when you think about it, it is not all
that different from the normal commit.  We explain that it
ignores index, but in the bigger picture, it does not really.

In this sequence:

	edit a b
	git update-index a
	git commit --only b
	git commit --all

the first commit does "jump" the changes already made to the
index, but after it makes the commit, the index has the same
contents as if you did "git update-index a b" where you ran that
"git commit".  In other words, it is just a handy short-hand to
pretend as if you did the above sequence in this order instead:

	edit a b
	git update-index b
	git commit
	git update-index a
	git commit
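
The state of the index after the "--only" commit can be checked with a
sketch like this one, assuming a git without the old safety valve and
a throwaway repository; the file contents and messages are made up:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # local, illustrative identity
git config user.email "user@example.com"

echo "a, first version" > a
echo "b, first version" > b
git add a b
git commit -q -m "initial"

echo "a, edited and staged" > a
echo "b, edited but never staged" > b
git update-index a
git commit -q -m "b only" --only b         # "jumps" the staged change to a

committed=$(git diff --name-only HEAD~1 HEAD)      # what the commit touched
still_staged=$(git diff --cached --name-only)      # index vs. HEAD
unstaged=$(git diff --name-only)                   # working tree vs. index

git commit -q --all -m "the remainder"
remainder=$(git diff --name-only HEAD~1 HEAD)
echo "first: $committed, staged after: $still_staged, second: $remainder"
```

The first commit contains only b, yet afterwards the index matches the
working tree for both files, as if "git update-index a b" had been run.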

So I actually think it is a mistake to stress the fact that "git
commit --only paths..." seems to act differently from the normal
"git commit" too much.  It just helps to split the changes in
your working tree if the changes happen to be cleanly separable
at file boundaries (aka "CVS mentality").  When the changes are
not cleanly separable at file boundaries, the "more painfully
index aware" variant also allows you to split the changes in
your working tree in the time dimension:

	edit a
	git update-index a
	edit a
	git commit ;# without paths
	git update-index a
	git commit
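
A runnable sketch of that split "in the time dimension", with "edit a"
replaced by two successive writes to a hypothetical file; contents and
messages are illustrative only:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # local, illustrative identity
git config user.email "user@example.com"

echo "starting point" > a
git add a
git commit -q -m "initial"

echo "first round of edits" > a
git update-index a                         # snapshot the first round
echo "first and second round of edits" > a
git commit -q -m "first round"             # without paths: commits the index
git update-index a                         # now stage the second round
git commit -q -m "second round"

first=$(git show HEAD~1:a)
second=$(git show HEAD:a)
echo "HEAD~1: $first"
echo "HEAD:   $second"
```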

In short, while I understand that your "proposal" shows your own
way to summarize the semantics of "git commit", I am not seeing
what it buys us, and I do not see the need to come up with a
pair of new commands for making commits (if that is what the
proposal is about, that is, but it is not clear to me if that is
what you are driving at).  I think it would only confuse users.

> 	I receive a patch while I'm in the middle of doing other work,
> 	(but with a clean index compared to HEAD, which is what I usually
> 	have). The patch looks good, so I want to commit it right
> 	away, but I do want to separate it into two or more pieces,
> 	(commonly this is because I want to separate the "add a test
> 	case demonstrating a bug" part from the "fix the bug"
> 	part).
> ...
> Who said I wouldn't test it? I do split commits like this precisely so
> that I _can_ test it this way---and git helps a lot here. I do the
> split commit, then easily back up to the revision that adds the test
> case, verify the test fails before the bug fix, (which is something
> the maintainer doesn't get a chance to do with your (2) approach),
> then move forward and verify that the test passes after the fix.
>
> So, sure, I haven't ever had that working tree before the commit. But
> git makes it easy to get that working tree after I commit and test
> everything before I push anything out.

You saw a good patch in the middle of something that you did not
want to lose your working tree changes for.  That good patch was
not really good enough to be applied straight into your tree but
needed tweaking and splitting.  Nevertheless you went ahead and
made two commits out of that patch, even though you were in the
middle of something.  You could not test them right away after
committing because your tree was in no shape to test them in
isolation.  But that is excusable because you would not push
these commits out right away, before you have a chance to test
them by rewinding your working tree when you are done with what
you were originally doing.

Is it just me who finds the above a very much made-up example?

It means the patch (which is good and not good at the same time)
was not all that urgent after all, and it could well have waited
until you are done with what you were originally doing.

In any case, I should clarify my aversion to partial commits a
bit.  What is more important is to notice that, while you cannot
compile-and-run test what is in the index in isolation (without
a fuse that exports the index contents as a virtual filesystem
-- anybody interested?), you _can_ preview and verify the text
that is going to be committed by comparing the index and the
HEAD.  And for that, your "staging" action (i.e. Nico's "git
add") needs to be a separate step from your "committing" action.

In other words, I would even love Johannes's "per hunk commit"
idea, at least if it had an option to preview the whole thing
just one more time before committing, and I would love it better
if it had an option for not committing but just updating.  You
could:

	$ edit foo bar
	: the whole mess in working tree is in no shape to be committed.
	$ git add foo	;# stage the state of the entire file
	$ git hunk-add bar ;# go interactive and update index selectively
	$ git status -v	;# that is "git commit --dry-run --diff"

to review what would be committed.  So while the commit that
would be made may not be compile-and-run tested, I would not
mind partial commits that much (and after all, not all the
projects that track their contents with git are "compiled"
or "need testing" projects -- they could be tracking plain text
documentation, and the last-minute eyeballing may be a good
enough test for such contents).

* Re: [RFC] Two conceptually distinct commit commands
  2006-12-06 18:31     ` Junio C Hamano
@ 2006-12-06 23:29       ` Carl Worth
  0 siblings, 0 replies; 18+ messages in thread
From: Carl Worth @ 2006-12-06 23:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin

On Wed, 06 Dec 2006 10:31:21 -0800, Junio C Hamano wrote:
> I am not sure what needs to be commented on at this point, since
> it is not yet clear to me where you want your proposal to lead
> us.

Thanks for the comments you made here---that's the kind of thing I was
looking for.

As for where I'm trying to lead us, what I really want to do is to
help improve the learnability of git. A big part of that is about
improving the set of "use-oriented" documentation, (which describes
how to achieve tasks, as opposed to what might be termed "technically
oriented" documentation which describes how individual tools work). I
think too much of the existing documentation falls into the second
class.

A parallel thread is already talking about some of the important
organizational aspects of use-oriented documentation. And I agree with
that thread that the short "attention span" is a primary
consideration for this kind of documentation. The user has a task to
accomplish, and any text or concepts that aren't contributing to
the solution of that task should be eliminated.

Note that when I talk about eliminating unnecessary concepts, I do not
mean lying to the user about the underlying model or any concepts. We
can't have a sugar-coated tutorial that says one thing, and then
expect users to "unlearn" that if they go deeper into the reference
manual. That's a recipe for disaster.

Also, when I say "use-oriented" I'm not suggesting that the
documentation be shallow. It can go as deep as any workflow we care to
document and introduce whatever concepts of git are necessary to
support that workflow. (There is, though, a level at which "technically
oriented" documentation is all that's needed, or even desired, and
that's when the documentation is targeting authors of interfaces that
build on top of git---not users trying to use git to get work done at
the command line).

OK, so if my concern is all about documentation, then what am I doing
proposing new commands or new ways of thinking about existing commands
rather than just sending documentation patches? The problem is that
the current semantics of the following variations of "git commit":

	git commit
	git commit -a
	git commit paths...

defeat the goal of writing good, clean use-oriented documentation. So
there's some adjustment that should be made first. And I don't even
care what the adjustment is, (for example, it doesn't have to be
"commit -a by default"), but please recognize the problem and help me
come up with an acceptable way to fix it.

To demonstrate, let's take the simplest of use cases and try to
document it in as clear a way as possible. Let's imagine we're in a
tutorial where we've just guided the user to making modifications to
several existing, tracked files, (starting from an initial clone, not
an init-db), and the next task is to teach the user to commit for the
first time. We would like to document both "commit a single modified
file" and "commit all modified files". Here are two approaches that I can
come up with:

1. Any commit involves first "add"ing together new content, and then
   committing the result. For example to commit a single file:

	git add file		# add new content from file
	git commit		# commit the result

   As a shortcut, "commit -a", (or --all) can be used to automatically
   "add" the content of all tracked files before the commit. So the
   common case of committing all tracked files is as easy as:

	git commit -a		# commit content of all tracked files

2. The new content of modified files can be committed by naming the
   files on the "git commit" command line. For example:

	git commit file		# commit new content of file

   As a shortcut, "commit -a", (or --all) can be used to commit the
   content of all tracked files:

	git commit -a		# commit content of all tracked files

Neither of the above is totally satisfactory.

In (1) the user is not presented with a framework that will make sense
of "git commit files...". The expansion of "-a" as "--all" could
easily give the user the impression that "git commit files..." is a
shortcut for "git add files...; git commit", but that's wrong and
could lead to unexpected results and confusion.
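
To make the difference concrete, here is a sketch in a throwaway
repository where another path is already staged. The names "file" and
"other", the contents and the messages are made up for illustration:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # local, illustrative identity
git config user.email "user@example.com"

echo "file, version one" > file
echo "other, version one" > other
git add file other
git commit -q -m "initial"

echo "file, second version" > file
echo "other, second version (staged)" > other
git add other                              # something else is already staged

# "git commit files..." commits only the named path.
git commit -q -m "commit file" file
only_form=$(git diff --name-only HEAD~1 HEAD)

# "git add files...; git commit" commits everything that is staged.
echo "file, third version for the add form" > file
git add file
git commit -q -m "add, then commit"
add_form_count=$(git diff --name-only HEAD~1 HEAD | wc -l)

echo "commit file:        $only_form"
echo "add file && commit: $add_form_count paths"
```

The first form leaves the staged change to "other" out of the commit;
the second form sweeps it in along with "file".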

In (2) the user is not presented with a framework that will make sense
of "git commit" with no arguments. The user is left to wonder about
why the --all is needed and what it means exactly, (particularly since
"git commit" also commits the content of all tracked files).

Various fixes have been proposed for these potential confusions. For
example, making "git commit files..." default to the behavior of
--include instead of --only would eliminate the confusion I described
for (1). And making -a the default for "git commit" would eliminate
the confusion I described for (2).

However, actually implementing either of those fixes would then break
the initial "commit one file" example from the other approach. Because
of that, the conversation has often fallen into debate over whether
(1) or (2) is the "one true way" to describe git, and which one leads
the user to have an incorrect mental model.

But I think that debate is misguided since both descriptions are
worthwhile and valid. (1) is based around an explanation of what "git
commit" does, and (2) is based around an explanation of what "git
commit files..." does. And both of these commands are very useful
exactly how they are.

It's almost coincidental that "commit -a" fits in logically with
either description.

So what I was trying to get across in this latest thread is that git's
command-line interface already has two slightly different models for
what's going on in a commit. You don't agree with me on that point
yet, (more on that below in my reply).

I really don't care what the final fix is, but I would love to see
documentation with no more complexity than the above that accurately
captures the useful functionality.

And I don't actually have a concrete proposal for a fix yet---I was
just offering the commit-index-content and commit-working-tree-content
ideas as ways to think about the issue. Maybe the two documentation
blurbs above capture it in a better way.

Do you feel like you have a better understanding of what I'm trying to
do now?

> I do not agree with your "three commands" or "two semantics"
> characterization of the current way "git commit" works.  "git
> commit" without any optional argument already acts as if a
> sensible default arguments are given, that is "no funny business
> with additional paths, commit just what the user has staged
> already."

I agree that "git commit" does nothing funny by default. What I was
pointing out is that "git commit" and "git commit paths..." do not
have the same semantics. There's really nothing to debate about
there. There is no argument you can substitute for <paths...> to give
you identical behavior as "git commit". That's a fact.
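
To make the difference concrete, here is a minimal sketch. It assumes a
git in which "git commit paths..." defaults to --only, as described
above; the scratch repository and the file names "a" and "b" are
made up for illustration.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo "one" > a
echo "one" > b
git add a b
git commit -q -m "initial"

echo "changed a" > a
git add a                         # change to a is staged in the index
echo "changed b" > b              # change to b exists only in the working tree

# Paths form: takes b from the working tree and leaves the staged a alone.
git commit -q -m "only b" b

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
still_staged=$(git diff --cached --name-only)
echo "committed: $committed"
echo "still staged: $still_staged"
```

A plain "git commit" at the same point would have committed a and left
b untouched, which is the opposite result.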

> "git commit" is primarily about committing what has been staged
> in the index, and "--all" is just a type-saver short-hand (just
> like "--include" is) to perform update-index the last minute and
> nothing more.  In other words, "--all" is a variant of the
> pathname-less form "git commit".  It is not a variant of "git
> commit --only paths..." form, as you characterized.

I hope the documentation blurbs (1) and (2) above show how "commit -a"
can be seen as a variant of either "commit" or "commit files...",
(which themselves are both useful semantics, but demonstrably
distinct).
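
As a rough illustration of "commit -a" under the same assumptions (a
scratch repository, invented file names), it commits the working-tree
content of every tracked file and leaves untracked files alone, so it
reads equally well as "update the index for all tracked files, then
commit" or as "commit files..." with every tracked file named:

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo "one" > a
echo "one" > b
git add a b
git commit -q -m "initial"

echo "changed a" > a              # tracked, modified, not staged
echo "changed b" > b              # tracked, modified, not staged
echo "brand new" > c              # untracked

git commit -q -a -m "all tracked files"

committed=$(git diff-tree --no-commit-id --name-only -r HEAD | paste -sd ' ' -)
untracked=$(git ls-files --others)
echo "committed: $committed"
echo "untracked: $untracked"
```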

> The pathname form (the "--only" variant) on the surface seem to
> work differently, but when you think about it, it is not all
> that different from the normal commit.  We explain that it
> ignores index, but in the bigger picture, it does not really.

No, it really is different.

> the first commit does "jump" the changes already made to the
> index, but after it makes the commit, the index has the same
> contents as if you did "git update-index a b" where you ran that
> "git commit".  In other words, it is just a handy short-hand to
> pretend as if you did the above sequence in this order instead:

How could you document "git commit files..." as a shorthand? A
shorthand for what exactly? A shorthand for pretending you didn't just
type the commands you did type that got the index into its current
state, but had instead typed different commands before the commit and
other commands afterwards?

That's crazy. That's not a shorthand. That's just plain different
semantics. The current "git commit files..." command never does commit
the contents of "the" index as a concept presented by "git
commit". (This is independent of the fact that the implementation of
"git commit files..." certainly does use an index file somewhere and
uses it to create a commit object in the same way that "git commit"
uses "the" index).
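
A small sketch of that point, again assuming the paths form defaults to
--only and using an invented file "f": content staged for f is never
committed if the working tree has moved on by the time "git commit f"
runs.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo "first" > f
git add f
git commit -q -m "initial"

echo "second, staged" > f
git add f                         # the index now holds "second, staged"
echo "third, working tree only" > f

git commit -q -m "paths form" f   # commits the working-tree content of f

in_commit=$(git show HEAD:f)      # content recorded in the new commit
in_index=$(git show :f)           # content in the index after the commit
echo "commit has: $in_commit"
echo "index has:  $in_index"
```

The staged "second, staged" content appears in no commit; the index
ends up matching what was committed from the working tree.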

> So I actually think it is a mistake to stress the fact that "git
> commit --only paths..." seems to act differently from the normal
> "git commit" too much.

I think that would be lying to the users and setting them up to get
confused later. I discussed this above as the confusion that can
result with the explanation of (1). If you teach "git commit" as
committing "the" index, and de-emphasize that "git commit files..."
is semantically distinct, then how is a user ever supposed to learn
what it is that "git commit files..." is actually doing?

> In short, while I understand that your "proposal" shows your own
> way to summarize the semantics of "git commit", I am not seeing
> what it buys us, and I do not see the need to come up with a
> pair of new two commands for making commits (if that is what the
> proposal is about, that is, but it is not clear to me if that is
> what you are driving at).  I think it would only confuse users.

Forgive me again for being obtuse. I don't think we should necessarily
add two new commands. I was trying to illustrate a problem in the
existing command set, and propose a new way of thinking about the
tasks that the current commands help a user to perform, (committing
content from the working tree or committing content from the index). I
don't actually have a concrete proposal for how to take that way of
thinking and map it to a command set, (and one that would disrupt
current git users as little as possible). I'd love to have some help
with that part.

> Is it just me who finds the above a very much made-up example?

Fine. We can ignore that example.

> In any case, I should clarify my aversion to partial commits a
> bit.  What is more important is to notice that, while you cannot
> compile-and-run test what is in the index in isolation (without
> a fuse that exports the index contents as a virtual filesystem
> -- anybody interested?), you _can_ preview and verify the text
> that is going to be committed by comparing the index and the
> HEAD.  And for that, your "staging" action (i.e. Nico's "git
> add") needs to be a separate step from your "committing" action.

Yes, I often use the index as a place to preview things. And it is
true that I find myself using update-index when I could have used
"commit paths..." precisely because I can preview it once more. But I
do use the "commit paths..." form at times as well. If I have just
reviewed things in "git diff" and there are _really_ obviously
separable pieces I will commit them alone without staging into the index
and reviewing again.

It's probably the case that I skip the explicit staging and extra
preview when I can use a single pathname as the argument to "git
commit".
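
For comparison, the stage-then-preview workflow looks roughly like
this. It is a sketch only: it uses "git add" and "git diff --cached"
from a current git, and the file names are invented.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo "one" > a
echo "one" > b
git add a b
git commit -q -m "initial"

echo "changed a" > a
echo "changed b" > b

git add a                                    # stage only the piece to commit
preview=$(git diff --cached --name-only)     # index vs HEAD: what will be committed
git commit -q -m "staged a"                  # no paths: commits the index

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
left_over=$(git diff --name-only)            # working tree vs index: still pending
echo "previewed: $preview"
echo "committed: $committed"
echo "left over: $left_over"
```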

> In other words, I would even love Johannes's "per hunk commit"
> idea, at least if it had an option to preview the whole thing
> just one more time before committing, and I would love it better
> if it had an option for not committing but just updating.

Yes! I've wanted tools to help with per-hunk separation before, but
since I'm so likely to make mistakes while doing that I would only
want that to go into the index so that I could review it before
committing. I guess I might need a per-hunk way to fix up my mistakes
too if I put a hunk into the index that I didn't want to be there.

-Carl


Thread overview: 18+ messages
2006-12-04 19:08 [RFC] Two conceptually distinct commit commands Carl Worth
2006-12-04 20:10 ` Carl Worth
2006-12-04 21:19   ` Jakub Narebski
2006-12-05  2:36     ` Carl Worth
2006-12-05  0:52 ` Horst H. von Brand
2006-12-05  1:18   ` Carl Worth
2006-12-05  2:14     ` Horst H. von Brand
2006-12-05  2:32       ` Carl Worth
2006-12-05  1:19   ` Jakub Narebski
2006-12-05  3:51 ` Theodore Tso
2006-12-05  6:33   ` Junio C Hamano
2006-12-05  6:38   ` Carl Worth
2006-12-06  1:13 ` Junio C Hamano
2006-12-06  4:53   ` Carl Worth
2006-12-06  9:54     ` Commit order in git.git, was " Johannes Schindelin
2006-12-06 16:14     ` Carl Worth
2006-12-06 18:31     ` Junio C Hamano
2006-12-06 23:29       ` Carl Worth
