Git development
* Re: git and bzr
From: Jakub Narebski @ 2006-11-28 18:06 UTC
  To: git; +Cc: bazaar-ng
In-Reply-To: <456C7592.6020700@ableton.com>

Nicholas Allen wrote:

>> The reason this is a good example is simply the fact that it should 
>> totally silence anybody who still thinks that tracking file identities is 
>> a good thing. It explains well why tracking file identities is just 
>> _stupid_.
>
> I'm unfamiliar with git so I could be totally wrong here!
> 
> I know that bzr supports file renames/moves very effectively and I 

This means: _usually_ works, doesn't it? Emphasis on "usually"?

> understood that git doesn't support this to the same extent (correct me 
> if I am wrong as I have not used git at all!).

Git supports renames/moves in a different way. Instead of recording renames
in the repository (which has troubles of its own, for example a rename made
by applying a patch), it _detects_ renames when needed.
 
> If that is the case, could that be because bzr gives each file its own 
> id and can detect this easily but git's content based approach can't? If 
> so then claiming file identifiers is *stupid* seems a bit extreme. So I 
> would have thought *both* file identifiers and line/content identifiers 
> are needed for tracking changes made to the files and to their contents 
> respectively. When a file is copied then the contents are copied and it 
> is given a new file identifier. When a file is moved it keeps the same 
> identifier. So don't you need file identifiers as well as line/content 
> identifiers?

There are troubles with file-ids. The most common example is a file that
was created independently in two branches (two repositories) that were then
merged. Most (all?) file-id based rename detection has trouble with repeated
merging of those branches, even if there are no true conflicts.

Read Linus's post about file-id based rename detection:
  Message-ID: <Pine.LNX.4.64.0610201049250.3962@g5.osdl.org>
  http://permalink.gmane.org/gmane.comp.version-control.bazaar-ng.general/18458

Not that content-based rename detection doesn't have its own pitfalls:
  Message-ID: <7virha4cnm.fsf@assigned-by-dhcp.cox.net>
  http://permalink.gmane.org/gmane.comp.version-control.git/31899
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: [RFC] Submodules in GIT
From: Sven Verdoolaege @ 2006-11-28 18:08 UTC
  To: Daniel Barkalow
  Cc: Andreas Ericsson, Linus Torvalds, Yann Dirson, Steven Grimm, git
In-Reply-To: <Pine.LNX.4.64.0611281218290.20138@iabervon.org>

On Tue, Nov 28, 2006 at 12:28:47PM -0500, Daniel Barkalow wrote:
> It would be wrong to do "commit -a" in submodules if the supermodule 
> weren't being committed with -a, of course.

What if you say "git commit submodule" ?
I sure hope you wouldn't want to do a "commit -a" in the submodule.
One of the nice features of git is that you can still perform most
operations if you have a dirty state and I would very much want to
be able to commit only some changes in the submodule and then only
commit that change in submodule commits in the supermodule without
having my other changes in the submodule committed as well.

If you agree with the above, then why should "git commit -a"
do any different from "git commit submodule" if submodule was
the only thing that got changed ?


* Re: [PATCH 0/2] Making "git commit" to mean "git commit -a".
From: Carl Worth @ 2006-11-28 18:18 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vd5786opj.fsf@assigned-by-dhcp.cox.net>

On Mon, 27 Nov 2006 22:59:52 -0800, Junio C Hamano wrote:
> I've been playing with a "private edition" git to see how it
> feels like to use "git commit" that defaults to the "-a"
> behaviour, using myself as a guinea pig, for the rest of the
> evening.

Thanks already for the documentation improvements and the patches. I
will immediately start using these and use myself as a guinea pig as
well.

> Confession time.  I've had a "purist me" deep inside, who always
> thought that people who play contributor role (that is to say
> "99.9% of people") should make no commits other than the "-a"
> kind [*1*].
...
> *1* The reason to favor "-a" commit is not about hiding the
> index but about discipline.

I agree with your comments on discipline, (and honestly, I don't
really see why they wouldn't apply to anybody). It just plain makes
sense to commit code as it existed and as it has been tested.

And I think this is really the same motivation for the users whose
complaints I've been representing in this thread. I know people who
have read all of the "hide the index" debates on the git list and
still find the "staged commit" features of the index useless, (because
they already have the discipline of never committing a state that
didn't actually exist in their working tree).

> Judging from my experience so far, although I really wanted to
> like this, I am still hesitant to recommend this for inclusion.

I'm glad you were willing to try yourself out as a guinea pig on
this. That's definitely worthwhile. But I don't think your negative
experience here is good evidence against changing the default.

My proposal was not that old-time, index-loving git users should adapt
to a new default. I think that it should be made very straight-forward
for experienced users to drop in an alias or a configuration option
such that all the old defaults are preserved. With that, all of the
complaints you ran into, (which are all of the form "things act
differently than I'm used to"), go away.

> The problem I have with the new behaviour is that it goes
> against the mental model when I start doing anything nontrivial
> (I would not use words as strong as "totally breaks the mental
> model", but it comes close).  I am not sure how well I can
> express this, but the short of it is that "grokking index" is
> not about understanding how the index works, but about trusting
> that git does the right thing to the index and you do not have
> to worry about it all the time.

Frankly, I do not currently trust git to always do the right thing
with the index. Part of that is that some commands are inconsistent
with respect to updating the index or not. For example, the following
two operations:

	git cherry-pick -n <something>
	git am < something

are conceptually very similar, (apply some change without creating a
new commit), but the first updates the index and the second does
not. (This is something you already pointed out in your message and
said that perhaps "apply --index" should be the default. I'll come to
a different conclusion below.)

So things like "git diff" and others work very differently in the
above two situations, and the user has to stay well-aware of what's
happening in the index or not. So I do find myself having to "worry
about it all the time".

Another example is how to "undo" a modification of a file such that it
is restored to its state as in the last commit. I'd like to be able to
teach users a single, reliable command for operations like this. It
would be tempting to just say:

	git checkout some/file

which will often work, but not in the case of an updated index,
(whether manual or due to something like "cherry-pick -n" or an
in-progress merge). In those cases various suggestions might be
offered such as:

	git reset
	git checkout some/file

or:

	git cat-file -p HEAD:some/file > some/file

at which point we can send users screaming again. (I think there's yet
another option that was discussed on the list recently, but if I
recall correctly, it involved an even more obscure option to some git
command than any of the above).

So a simple operation like this "undo" requires the user to understand
the index and adapt the workflow based on its state. But there's no
advantage being offered to the user at all in a case like this. (And
whether the change being undone came through something like
"cherry-pick -n" or "git-am" is totally irrelevant to the work the
user is attempting to get (un)done).
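
One candidate for such a single command is to name the commit explicitly: `git checkout HEAD -- <path>` restores both the index entry and the working-tree file. A minimal sketch, assuming a current Git; the repository and file name are made up for the illustration:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo original > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com commit -q -m "initial"

# Modify the file and stage the change, so that both the index and the
# working tree differ from the last commit.
echo changed > file.txt
git add file.txt

# Restore the index entry and the working-tree file from HEAD in one step.
git checkout HEAD -- file.txt

restored=$(cat file.txt)
pending=$(git status --porcelain)
echo "content: $restored"
echo "pending changes: ${pending:-none}"
```

This form does not depend on whether the change had already reached the index, which is the distinction the paragraphs above object to.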

All of the above is just to point out that there are times when the
notion of the index does get in the way. The user has to mentally
track what's happening in the index even when there's no advantage. My
goal is to reduce the set of operations where the user is forced to do
that.

If users want to take advantage of the index, then by all means, it's
there and can be taken advantage of. And when the index does its job
of taking care of things so the user doesn't have to think about it,
that's definitely a good thing.

> The same thing can be said about "git merge" (or "git pull .").
> The index is updated for cleanly merged paths so I do not have
> to worry about the details -- the only thing I have to know is
> that index keeps track of the state and cleanly merged paths are
> taken care of for me automatically, so I do not have to worry
> about them.  "git diff" and "git ls-files -u" will give me
> conflicting paths and I can only concentrate on them.

Sure. The behavior of "git diff" during a conflicted merge is actually
quite intuitive. And that's even intuitive to someone who has no idea
what the index is. So the index is doing a fine job here of taking
care of things so the user doesn't have to think about them. We should
have more of that.

The "git diff" behavior would really only be surprising to someone who
doesn't totally grok the index if the index got updated other than
during a commit or merge. So I think it would be great if that only
happened when the user passed the word "index" on the command line as
in "update-index" or "apply --index".

In fact that rule of thumb would argue for leaving "git apply" alone
and instead solving the inconsistency I pointed out above by making
"cherry-pick -n" not update the index, (unless passed a new "--index"
option).

> Once I am done, I can ask "git diff" and expect it to show my
> local changes I have no intention of committing for now
> (e.g. GIT-VERSION-GEN in the working tree has v1.4.5-rc1.GIT
> long before I plan to start the rc1 cycle to constantly remind
> me what the next version will be, which is a trick I picked up
> from Linus), and "git diff --cached" would show exactly what I
> will commit.

I understand the trick, and I'm not proposing anything that would
preclude it. But I really don't find it a compelling argument for the
default behavior of git-commit. I don't see why the correct next value
for the version is easier to compute at one time vs. another. Linus
argued that it helped him not forget to update the version, but I
would think this kind of thing would train users to leave uncommitted
stuff around which could lead to mistakes, (and the user _still_ has
to remember "Oh, this is that special commit where I _don't_ leave
that uncommitted stuff around anymore, but I actually commit it."). So
I don't personally see any gain to the trick.

> Probably new people who are not used to the index do not have
> this problem, but I suspect I am not alone among old time
> gitters.

Sure, so put an alias or config option in place so you don't have to
change your ways at all.

> I lost about half an hour after saying "git commit --amend",
> without thinking, because I wanted to amend only the commit
> message, and much later I noticed that it swallowed unrelated
> changes I had in the working tree because it now implied the
> "-a" behaviour, and I should have said "git commit -i --amend".

I definitely commiserate on that one. I myself often use "commit
--amend" to change just a commit message.

But at the same time, I also very often use "commit --amend" to fix up
the tree itself in the most recent commit. And I've also lost the same
half hour by forgetting to do "commit -a" or "update-index" when doing
that more than once in the past.

I think the real fix for this particular issue is to add a little more
"stack" functionality to git itself rather than just the one-step-back
functionality of "--amend". For example, one simple thing that might
help would be a command to edit the commit message of any commit. That
would at least be easy to implement as it wouldn't introduce any
user-interface concerns about dealing with conflicts while replaying
history.

-Carl

* Re: [PATCH] Trim hint printed when gecos is empty.
From: Junio C Hamano @ 2006-11-28 18:29 UTC
  To: Andy Parkins; +Cc: git, Johannes Schindelin
In-Reply-To: <200611281506.53518.andyparkins@gmail.com>

Andy Parkins <andyparkins@gmail.com> writes:

> On Tuesday 2006 November 28 14:40, Johannes Schindelin wrote:
>
>> You are probably different than me. What with my track record, I _trust_
>> my patches to be not perfect at all...
>
> ...  I had understood it was a 
> legal tool to trace the provenance of a patch - not to sign off on it being 
> bug free (which surely is impossible).

Johannes, Andy's interpretation is in line with the policy in
SubmittingPatches.  S-o-b is about warranty of provenance, and
not about correctness or cheering (Acked-by).

And I think it makes sense to add "-s" automatically to commits
made in a private working repository in which the developer who
configured "-s" to be added automatically is the only person who
makes commits.  As already mentioned in the thread, one of the
hooks should be usable for that.  And it certainly is a
possibility to add a config to turn "-s" on.

But I suspect that it would be cleaner and more useful to teach
"git commit" to use a commit message template per repository and
put the S-o-b in there -- that mechanism would be usable for
things other than just S-o-b lines as projects see fit.


* Re: git and bzr
From: Aaron Bentley @ 2006-11-28 18:31 UTC
  To: Jakub Narebski; +Cc: bazaar-ng, git
In-Reply-To: <ekhrhi$g6t$1@sea.gmane.org>

Jakub Narebski wrote:
>>I notice that blame has an option to limit the annotation to recent
>>history.  I can only assume that is for performance reasons.  bzr
>>annotate doesn't need a feature like that, because annotations are
>>explicit in bzr's storage format. 
> 
> 
> But you don't have content movement tracking.
> 
> 
>>                                  I expect that even if we were to 
>>extend annotate to track content across files, it would still be so fast
>>that we wouldn't need it.
> 
> 
> I think not.

There's no question that determining content movement could involve
opening a lot of revisions, but you wouldn't need to examine:

1. revisions that didn't alter any lines being examined
2. revisions that altered only the file in question
3. revisions with multiple parents, because any lines attributed to that
merge will be the outcome of conflict resolution.  (Other lines will be
attributed to one of the parents)

I'll admit though, that when I was thinking of this, I was thinking of
annotation-based merging, a scenario in which the number of lines being
examined is typically extremely low.

Aaron

* Re: [RFC] Submodules in GIT
From: Daniel Barkalow @ 2006-11-28 18:37 UTC
  To: skimo; +Cc: Andreas Ericsson, Linus Torvalds, Yann Dirson, Steven Grimm, git
In-Reply-To: <20061128180817.GA12463MdfPADPa@greensroom.kotnet.org>

On Tue, 28 Nov 2006, Sven Verdoolaege wrote:

> On Tue, Nov 28, 2006 at 12:28:47PM -0500, Daniel Barkalow wrote:
> > It would be wrong to do "commit -a" in submodules if the supermodule 
> > weren't being committed with -a, of course.
> 
> What if you say "git commit submodule" ?

Obviously no -a, as I said.

> If you agree with the above, then why should "git commit -a"
> do any different from "git commit submodule" if submodule was
> the only thing that got changed ?

If submodule was the only thing that got changed, it's not dirty; if it 
were dirty, some of its contents would also have gotten changed. Surely:

"git commit submodule/foo bar"

should do "git commit foo" in submodule, and then commit the supermodule 
with the new commit for the submodule and the change to bar. And so
"submodule/foo" is something you could commit changes to, so it should get 
picked up by -a.

Of course, if submodule *is* the *only* thing that changed (e.g., you did 
a fast-forward merge in it, or you've previously committed it completely), 
there won't be a "commit -a" in it, because that would just generate a 
gratuitous commit.

	-Daniel

* Re: git and bzr
From: Jakub Narebski @ 2006-11-28 18:43 UTC
  To: Aaron Bentley; +Cc: bazaar-ng, git
In-Reply-To: <456C809C.3050503@utoronto.ca>

On Tuesday, 28 November 2006 19:31, Aaron Bentley wrote:
> Jakub Narebski wrote:
>>>I notice that blame has an option to limit the annotation to recent
>>>history.  I can only assume that is for performance reasons.  bzr
>>>annotate doesn't need a feature like that, because annotations are
>>>explicit in bzr's storage format.
>>
>> But you don't have content movement tracking.
>>
>>>                                  I expect that even if we were to
>>>extend annotate to track content across files, it would still be so fast
>>>that we wouldn't need it.
>>
>>
>> I think not.
> 
> There's no question that determining content movement could involve
> opening a lot of revisions, but you wouldn't need to examine:
> 
> 1. revisions that didn't alter any lines being examined
> 2. revisions that altered only the file in question
> 3. revisions with multiple parents, because any lines attributed to that
> merge will be the outcome of conflict resolution.  (Other lines will be
> attributed to one of the parents)
> 
> I'll admit though, that when I was thinking of this, I was thinking of
> annotation-based merging, a scenario in which the number of lines being
> examined is typically extremely low.

Well, I guess that with "annotate friendly" (weave or knit) storage
annotate/blame would be faster. But fast annotate was not one of the
design goals of git.

How fast is "bzr annotate"?
-- 
Jakub Narebski

* Re: git and bzr
From: Nicholas Allen @ 2006-11-28 18:58 UTC
  To: Jakub Narebski; +Cc: bazaar-ng, git
In-Reply-To: <ekhtnt$rkk$1@sea.gmane.org>


> There are troubles with file-ids. The most common example is a file that
> was created independently in two branches (two repositories) that were then
> merged. Most (all?) file-id based rename detection has trouble with repeated
> merging of those branches, even if there are no true conflicts.

Do you mean if the 2 files should be merged into 1 file? If they should 
be 2 files with different names there is no problem using file 
identifiers but if they should be merged into one file then I can see 
that this would cause problems. You would have to delete one of the 
files and copy its changes into the other which would create conflicts 
when that file is modified in the other branch. This is a problem if you 
*only* have file identifiers.

But if you tracked both file identifiers *and* content identifiers (as I 
was trying to say in my first post) this wouldn't be a problem would it? 
When content is changed you use the content identifiers but when files 
are changed by renaming or deleting you use file identifiers. To me at 
least it doesn't seem like it's a choice of one or the other or that one 
is stupid and the other isn't but that you need them both. bzr uses file 
ids and git uses content ids. It would be nice if there were an RCS 
that  used both - then you get the best of both worlds don't you?

So I don't think you want to use file identifiers to track changes to 
content (as bzr would do in this case) and you don't want to use content 
identifiers to track changes to files (as git does, to my understanding, 
when a file is renamed).

Nick

* Re: [RFC] Submodules in GIT
From: Sven Verdoolaege @ 2006-11-28 19:06 UTC
  To: Daniel Barkalow
  Cc: Andreas Ericsson, Linus Torvalds, Yann Dirson, Steven Grimm, git
In-Reply-To: <Pine.LNX.4.64.0611281315020.20138@iabervon.org>

On Tue, Nov 28, 2006 at 01:37:54PM -0500, Daniel Barkalow wrote:
> If submodule was the only thing that got changed, it's not dirty; if it 
> were dirty, some of its contents would also have gotten changed.

For me, the commit is the only "content" of the subproject that the
superproject should care about, so the submodule being dirty or not
is completely irrelevant (for committing), but it seems you see the
subproject more as a (working) tree than as a commit. Of course, as
Linus already mentioned, a "git commit" could still warn you if the
subproject was dirty.

> Surely:
> 
> "git commit submodule/foo bar"

I wouldn't dream of doing such an operation, because it doesn't make
sense to me.  (So as far as I'm concerned, you can make it do whatever
you'd like it to do.)  You can only commit the subproject as a whole.

> should do "git commit foo" in submodule, and then commit the supermodule 
> with the new commit for the submodule and the change to bar. And so
> "submodule/foo" is something you could commit changes to, so it should get 
> picked up by -a.


* Re: git and bzr
From: Nicholas Allen @ 2006-11-28 19:11 UTC
  To: Jakub Narebski; +Cc: bazaar-ng, git
In-Reply-To: <ekhtnt$rkk$1@sea.gmane.org>

Jakub Narebski wrote:
> Nicholas Allen wrote:
>
>   
>>> The reason this is a good example is simply the fact that it should 
>>> totally silence anybody who still thinks that tracking file identities is 
>>> a good thing. It explains well why tracking file identities is just 
>>> _stupid_.
>>>       
>> I'm unfamiliar with git so I could be totally wrong here!
>>
>> I know that bzr supports file renames/moves very effectively and I 
>>     
>
> This means: _usually_ works, doesn't it? Emphasis on "usually"?
>
>   
>> understood that git doesn't support this to the same extent (correct me 
>> if I am wrong as I have not used git at all!).
>>     
>
> Git supports renames/moves in a different way. Instead of recording renames
> in the repository (which has troubles of its own, for example a rename made
> by applying a patch), it _detects_ renames when needed.
>   
This can't be fail safe though. I would prefer to also have the option 
to be able to *explicitly* tell the RCS that a file was renamed and not 
have it try to detect from the content  which is bound to have corner 
cases that fail. When I know I renamed a file why can't I explicitly 
tell the RCS and it records the change with the *file identifier*. If I 
change the content then the change is not recorded with the file 
identifier but with the line/content identifier.


* Re: What's in git.git
From: Carl Worth @ 2006-11-28 19:23 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vzmaf3kdl.fsf@assigned-by-dhcp.cox.net>

On Sat, 25 Nov 2006 02:12:38 -0800, Junio C Hamano wrote:
>  * The new "--depth $n" parameter to git clone/fetch tries to
>    limit the commit ancestry depth to $n.

I'm very excited to see the shallow clone stuff coming online. Thanks
to everybody that is working on that!

Has thought been given to making the depth selection consistent with
other limiting options for rev-parse and rev-list? For example, I'd
like to be able to use --since to get a shallow clone, (so should
--depth instead be --max-count?, and can we re-use some existing
machinery here?).
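
A minimal sketch of the depth-limited clone under discussion, assuming a current Git; the repository layout and the `commit` helper are made up for the illustration. Later Git releases also gained a date-based variant, `--shallow-since=<date>`, for clone and fetch.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: commit with a throwaway identity.
commit() { git -c user.name=Example -c user.email=example@example.com commit -q -m "$1"; }

# Build a small source repository with three commits.
git init -q src
cd src
for n in 1 2 3; do
	echo "$n" > file
	git add file
	commit "commit $n"
done
cd ..

# A file:// URL forces the normal transport; with a plain local path
# git warns that --depth is ignored.
git clone -q --depth 1 "file://$work/src" shallow

depth=$(git -C shallow rev-list --count HEAD)
echo "commits visible in the shallow clone: $depth"
```

Only the most recent commit should be reachable in the clone, although the source repository holds three.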

>    Petr Baudis (1):
>       Make git-clone --use-separate-remote the default
...
>    Junio C Hamano (19):
>       git-merge: make it usable as the first class UI

Also very exciting. Please do keep up the user-interface improvements,
everybody.

-Carl

* Re: git and bzr
From: Andy Parkins @ 2006-11-28 19:40 UTC
  To: git
In-Reply-To: <456C89E7.8080404@ableton.com>

On Tuesday 2006, November 28 19:11, Nicholas Allen wrote:

> This can't be fail safe though. I would prefer to also have the option
> to be able to *explicitly* tell the RCS that a file was renamed and not
> have it try to detect from the content  which is bound to have corner
> cases that fail. When I know I renamed a file why can't I explicitly

You want to tell git about a rename that will never fail to be detected?  No 
problem.

$ git mv oldname newname
$ git commit

The corner cases you speak about are when you rename and edit.

For me, I prefer that to be detected as at least the detection algorithm can 
be tuned - there is no fixing it if the VCS was forced to consider it a 
rename.

When I started using git I was worried about the lack of a rename, but now I 
realise that it's not needed - it's pointless.  The VCS is snapshotting 
moments in time, that's it.  Then by making cleverer and cleverer 
interpreters of those snapshots you have the potential to do stuff that is 
far more useful than "just" rename recording.
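
The rename-and-edit corner case, and the tunable threshold, can be sketched as follows. This is a minimal illustration assuming a current Git; the repository, file contents, and the `commit` helper are made up. `-M` takes an optional similarity threshold, and `-M100%` restricts detection to exact renames.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical helper: commit with a throwaway identity.
commit() { git -c user.name=Example -c user.email=example@example.com commit -q -m "$1"; }

# Create a 20-line file and commit it.
i=1
while [ "$i" -le 20 ]; do
	echo "line $i"
	i=$((i + 1))
done > oldname
git add oldname
commit "add oldname"

# Rename and edit in the same commit.
git mv oldname newname
echo "one extra line" >> newname
git add newname
commit "rename and edit"

# Default similarity threshold: the change is reported as a rename.
loose=$(git diff -M --name-status HEAD~1 HEAD)
# Exact renames only: the same change is reported as a delete plus an add.
strict=$(git diff -M100% --name-status HEAD~1 HEAD)

printf 'default threshold:\n%s\n' "$loose"
printf 'exact renames only:\n%s\n' "$strict"
```

Nothing about the rename is stored in the commit itself; the two reports differ only in how the same pair of snapshots is interpreted.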


Andy
-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE

* Re: git and bzr
From: Jakub Narebski @ 2006-11-28 19:59 UTC
  To: git
In-Reply-To: <200611281940.40139.andyparkins@gmail.com>

Andy Parkins wrote:

> On Tuesday 2006, November 28 19:11, Nicholas Allen wrote:
> 
>> This can't be fail safe though. I would prefer to also have the option
>> to be able to *explicitly* tell the RCS that a file was renamed and not
>> have it try to detect from the content  which is bound to have corner
>> cases that fail. When I know I renamed a file why can't I explicitly
> 
> You want to tell git about a rename that will never fail to be detected?  No 
> problem.
> 
> $ git mv oldname newname
> $ git commit
> 
> The corner cases you speak about are when you rename and edit.
> 
> For me, I prefer that to be detected as at least the detection algorithm can 
> be tuned - there is no fixing it if the VCS was forced to consider it a 
> rename.
> 
> When I started using git I was worried about the lack of a rename, but now I 
> realise that it's not needed - it's pointless.  The VCS is snapshotting 
> moments in time, that's it.  Then by making cleverer and cleverer 
> interpreters of those snapshots you have the potential to do stuff that is 
> far more useful than "just" rename recording.

Well, there are two cases where this might not be enough.

The first is following file renames for history tracking. git-blame does
that, but git-log and friends do not; the <path> is just a revision limiter.
There is an idea of a --follow option to git-log (and friends), still to be
implemented.
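
Later Git releases did add `git log --follow`. A minimal sketch of the difference it makes, assuming a current Git; the repository, file names, and the `commit` helper are made up for the illustration:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical helper: commit with a throwaway identity.
commit() { git -c user.name=Example -c user.email=example@example.com commit -q -m "$1"; }

echo "some content" > oldname
git add oldname
commit "add oldname"

git mv oldname newname
commit "rename to newname"

# The path acts as a plain revision limiter: only commits touching
# "newname" are listed.
plain=$(git log --oneline -- newname | wc -l | tr -d ' ')
# With --follow the history continues across the rename.
followed=$(git log --oneline --follow -- newname | wc -l | tr -d ' ')

echo "without --follow: $plain commit(s)"
echo "with --follow:    $followed commit(s)"
```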

The second is rename detection for 3-way merges: only the ancestor and final
states are considered, so the above would not help. And rename detection
might fail if the ancestor is not similar enough to the end states; well, the
merge has a low chance of being conflict-free then.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: [RFC] Submodules in GIT
From: Steven Grimm @ 2006-11-28 19:58 UTC
  To: Andy Parkins; +Cc: git
In-Reply-To: <200611281335.38728.andyparkins@gmail.com>

Andy Parkins wrote:
> Unfortunately, during development, you've switched libsubmodule1 to 
> branch "development", but supermodule isn't tracking libsubmodule1/HEAD it's 
> tracking libsubmodule1/master.  Your supermodule commit doesn't capture a 
> snapshot of the tree you're using.
>   

How about if the supermodule commit errors out by default if you commit 
a different submodule branch than the one you committed the previous 
time? Require the user to explicitly acknowledge that yes, they want to 
check in the contents of "development" now, even though the supermodule 
was tracking "master" before.

Otherwise I think you could easily end up with just the opposite 
situation, where you forget you've checked out "development" for a 
moment to look at something, and end up inadvertently committing a bunch 
of stuff that's not ready for prime time yet. In a standalone git 
setting, that's no big deal since the commit only updates the current 
branch and doesn't touch the master branch, but (as I understand the 
proposal) in a supermodule setting you'd actually end up essentially 
doing a merge between your development branch and the previously 
committed master. Or maybe not a merge, but worse, you'd *replace* the 
previously committed master with what's in your dev branch.

I think wanting to commit a submodule on a different branch than last 
time is probably not a typical day-to-day use case, so we should make 
sure the user really wants to do it (but allow it if so.)

On a related note, it would be great from a usability point of view if 
there were a way to say "I always want to be on the same branch in all 
submodules and the supermodule." I think a common scenario will be that 
you are doing development that touches a couple of different 
applications and your development effort is really a single set of 
changes even though it happens to cross submodule boundaries. If this 
branches-in-sync option is turned on, I'd want "git checkout 
development" to check out the development branch in the entire set of 
repositories.

More generally, while I 100% agree that it's very useful to be able to 
operate independently on each submodule, I think it's also going to be 
common to use submodules to selectively clone different pieces of a 
larger project. Say your current development effort needs server A, 
library B, and documentation C, and you want to have *just* those pieces 
in your environment. You don't particularly care about the details of 
how the system has assembled the pieces you want; you want to be able to 
make your changes and push them when you're done. They are really just 
pieces of a larger code base, not independent entities that happen to be 
pulled together into a composite workspace temporarily.

For that use case, I don't want the system to act differently depending 
on whether server A and library B are in the same submodule or separate 
ones; I want to treat the supermodule as the repository, and the system 
should take care of the details of managing the submodules. When I do 
"git commit -a" I want it to give me one editor to write one commit 
comment that covers all of the changes I've made, and when I do "git 
checkout -b" I want a new branch to apply across all the files I'm 
working with.

It is entirely possible that the above is a matter best left to the 
porcelain layer, and that's fine with me. But I think the Perforce-style 
"compose a single workspace out of different bits of a larger project" 
model is hugely useful and whatever submodule system Git ends up with, 
it should be able to emulate as much of that feature as possible.


* Re: [PATCH 1.2/2 (fixed)] git-svn: fix output reporting from the delta fetcher
From: Eric Wong @ 2006-11-28 20:16 UTC
  To: Seth Falcon; +Cc: Pazu, git
In-Reply-To: <m2bqmr1rnw.fsf@ziti.fhcrc.org>

Seth Falcon <sethfalcon@gmail.com> wrote:
> Pazu <pazu@pazu.com.br> writes:
> > Notice that there's no "CamelEar" directory. For some reason, it
> > wasn't fetched in the initial revision. Now, just to make sure this
> > isn't svn fault:
> >
> > mini:~/devel/camel-git pazu$ svn ls -r11143
> > https://tech.bga.bunge.com/BungeHomeExt/GLS/trunk/java/bg-cam
> > .cvsignore
> > BungeIntegrationEar/
> > BungeIntegrationService/
> > BungeIntegrationServiceClient/
> > CamelEar/
> 
> Is CamelEar an empty directory (or was it an empty directory in the
> first fetch) by any chance?
> 
> I think that presently git-svn does not create empty dirs when pulling
> from svn.  It would be nice to have such directories created since
> some projects will expect the empty dir to be there (no need to track
> it in git, IMO).

Git itself cannot easily track empty directories (at least as far as
update-index and checkout go).

What I *can* do is run mktree to force the creation of tree objects
with a 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (empty) sub tree and run
commit-tree on it, but checkout/checkout-index would still need to be
modified to support it.
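
For reference, that id is not arbitrary: it is the SHA-1 of an empty tree
object, i.e. the header "tree 0" followed by a NUL byte and no entries.
A minimal sketch that recomputes it, assuming a POSIX shell with sha1sum
from coreutils on the PATH (the helper name empty_tree_id is made up for
illustration):

```shell
#!/bin/sh
# Hypothetical helper: recompute the object name of git's empty tree.
# A tree with no entries serializes as the header "tree 0" plus one NUL byte.
empty_tree_id() {
    printf 'tree 0\0' | sha1sum | cut -d' ' -f1
}

empty_tree_id   # prints 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

With git installed, "git hash-object -t tree /dev/null" should report the
same id without writing anything to the object database.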

Is that something the git community wants?


* Re: [PATCH] Add support for commit.signoff config option
From: Junio C Hamano @ 2006-11-28 20:17 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git
In-Reply-To: <200611281202.43394.andyparkins@gmail.com>

Andy Parkins <andyparkins@gmail.com> writes:

> Whether patches require signing off or not is probably a per-project
> setting rather than a per-commit setting.  Therefore as a convenience to
> the user, the commit.signoff setting will automatically add --signoff to
> commits.
>
> Signed-off-by: Andy Parkins <andyparkins@gmail.com>

I muttered something about commit templates which would make
this change a moot point, but independent of that...

> +# Config
> +case "$(git-repo-config --get commit.signoff)" in
> +1|on|yes|true)
> +	signoff=t
> +	;;
> +esac

this is ugly; please use --bool and check only for 'true'.
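
A sketch of the requested form, assuming the present-day spelling
"git config" (the patch uses the older git-repo-config) and wrapping the
check in a hypothetical helper, signoff_from_config, that takes the
repository directory as its argument. With --bool, git normalizes values
such as 1, on and yes to the single word 'true', so only that one pattern
is needed:

```shell
#!/bin/sh
# Hypothetical helper: print "t" when commit.signoff is enabled in the
# repository at directory $1, and nothing otherwise.
signoff_from_config() {
    # --bool makes git print the canonical "true" or "false";
    # an unset key prints nothing, so no pattern matches.
    case "$(git -C "$1" config --bool --get commit.signoff 2>/dev/null)" in
    true)
        echo t
        ;;
    esac
}

# Usage: signoff=$(signoff_from_config .)
```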




* [BUG] git shortlog: need a range!
From: Junio C Hamano @ 2006-11-28 20:20 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

I just got this:

        $ git shortlog --since=Oct.20 --until=Nov.20 master
        fatal: Need a range!

Why isn't this a range?


* Re: git and bzr
From: Nicholas Allen @ 2006-11-28 20:37 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: bazaar-ng, git
In-Reply-To: <ekhtnt$rkk$1@sea.gmane.org>

Jakub Narebski wrote:
> Nicholas Allen wrote:
> 
>>> The reason this is a good example is simply the fact that it should 
>>> totally silence anybody who still thinks that tracking file identities is 
>>> a good thing. It explains well why tracking file identities is just 
>>> _stupid_.
>> I'm unfamiliar with git so I could be totally wrong here!
>>
>> I know that bzr supports file renames/moves very effectively and I 
> 
> This means: _usually_ works, doesn't it? Emphasis on "usually"?

Having not used git I can't really say whether git is better than bzr or
not in this regard. I know in the kind of development I do the case
where a file with the same name has been added independently in 2
different branches is a pretty rare one. Usually, when it has happened
the files should have been 2 separate files with different names anyway
- so bzr would have no problem with this.

However, renaming a file is pretty common and I would rather be explicit
about it and have file name changes easily visible/searchable in my log.

Just out of curiosity: How does git handle the case where one file is
renamed differently in 2 branches and then the branches are repeatedly
merged? I know that bzr handles this very well and in various tests I
did there were absolutely no repeated conflicts. Would git behave as
well in this scenario?



* Re: [RFC] Submodules in GIT
From: Daniel Barkalow @ 2006-11-28 20:41 UTC (permalink / raw)
  To: skimo; +Cc: Andreas Ericsson, Linus Torvalds, Yann Dirson, Steven Grimm, git
In-Reply-To: <20061128190618.GB12463MdfPADPa@greensroom.kotnet.org>

On Tue, 28 Nov 2006, Sven Verdoolaege wrote:

> On Tue, Nov 28, 2006 at 01:37:54PM -0500, Daniel Barkalow wrote:
> > If submodule was the only thing that got changed, it's not dirty; if it 
> > were dirty, some of its contents would also have gotten changed.
> 
> For me, the commit is the only "content" of the subproject that the
> superproject should care about, so the submodule being dirty or not
> is completely irrelevant (for committing), but it seems you see the
> subproject more as a (working) tree than as a commit.

I think we agree on the tree/commit/object database model part.

I think we disagree on how the working *directories* relate. I see the 
checked-out state of a submodule as being relevant to the checked-out 
state of the supermodule, such that dirty state in the submodule directory 
is dirty state in the supermodule directory.

> > Surely:
> > 
> > "git commit submodule/foo bar"
> 
> I wouldn't dream of doing such an operation, because it doesn't make
> sense to me.  (So as far as I'm concerned, you can make it do whatever
> you'd like it to do.)  You can only commit the subproject as a whole.

I'm thinking that users of subprojects will often want to work on the
subprojects rather than exclusively using commits prepared by other 
people, and it's too much trouble to have to do the work in a repository 
for just the subproject and pull it into the superproject's submodule to 
test it. So the submodule working directory needs to function as a working 
directory for the subproject. Then

  "cd submodule; git commit foo"

does the obvious thing, but that should be the same as

  "git commit submodule/foo" (since it normally is)

and then it makes sense to let you do multiple commits with a single 
command when the paths end in different modules, since that's obviously 
what you're requesting, and then -a must do all of them.

	-Daniel


* Re: [PATCH 1.2/2 (fixed)] git-svn: fix output reporting from the delta fetcher
From: Pazu @ 2006-11-28 20:47 UTC (permalink / raw)
  To: Eric Wong; +Cc: Seth Falcon, git
In-Reply-To: <20061128201605.GA1369@localdomain>

On 11/28/06, Eric Wong <normalperson@yhbt.net> wrote:

> Git itself cannot easily track empty directories (at least as far as
> update-index and checkout go).
> [...]
> Is that something the git community wants?

No, I guess not. I detailed the real problem in my previous message,
and it had nothing to do with empty directories, but with git-svn
recording broken revisions from svn. Did you get it, or did Trogdor eat my
email?



* Re: [RFC] Submodules in GIT
From: Shawn Pearce @ 2006-11-28 21:02 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Andy Parkins, git
In-Reply-To: <456C94E2.6010708@midwinter.com>

Steven Grimm <koreth@midwinter.com> wrote:
> Andy Parkins wrote:
> >Unfortunately, during development, you've switched libsubmodule1 to 
> >branch "development", but supermodule isn't tracking libsubmodule1/HEAD 
> >it's tracking libsubmodule1/master.  Your supermodule commit doesn't 
> >capture a snapshot of the tree you're using.
> >  
>
> Or maybe not a merge, but worse, you'd *replace* the 
> previously committed master with what's in your dev branch.

Right, you would be replacing the prior branch of that submodule with
the new submodule branch.

I think the safety valve you are looking for here is two things:

  * don't automatically update the submodule's HEAD into the
    supermodule's index.

  * make sure the submodule's HEAD is a fast-forward of the
    supermodule's index, with a --force option to force it
    anyway.
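
The second check can be sketched with plain plumbing: a candidate commit
is a fast-forward of the recorded one exactly when their merge base is the
recorded commit itself. This is only an illustration of that test, not a
proposed implementation; the helper name is_fast_forward and its arguments
(repository directory, recorded commit, candidate commit) are assumptions:

```shell
#!/bin/sh
# Hypothetical helper: succeed when commit $3 is a fast-forward of commit $2
# in the repository at directory $1.
is_fast_forward() {
    # Resolve the recorded commit to a full object name.
    old=$(git -C "$1" rev-parse --verify "$2^{commit}" 2>/dev/null) || return 1
    # The merge base of an ancestor and its descendant is the ancestor.
    base=$(git -C "$1" merge-base "$2" "$3" 2>/dev/null) || return 1
    [ "$base" = "$old" ]
}

# Usage (path and revision are placeholders):
#   is_fast_forward libsubmodule1 "$recorded" HEAD || echo "not a fast-forward"
```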

Otherwise the developer just has to know what he/she is doing.
Today you can put stuff that isn't ready for prime-time into a
repository on the wrong branch just by applying the wrong patch,
or cherry-picking the wrong commit, etc...  the user can (and
will) make mistakes.  But they can also easily recover from them
by rewinding history and redoing it.

> On a related note, it would be great from a usability point of view if 
> there were a way to say "I always want to be on the same branch in all 
> submodules and the supermodule."

That's not really an issue.

A branch doesn't exist just because you checked-out the branch, or
because you created it.  A branch exists because there were two or
more commits (B and C) which use the same parent (A) and two or more
of those commits survive, e.g. they have refs which point to them
(directly or indirectly) or they were merged into another commit
which itself survives.

Therefore if the supermodule is on the "development branch" the
submodules are also immediately on the same branch, because their
HEADs are derived from whatever is stored in the supermodule's tree.
And that tree is derived from whatever "development branch" means.

Really what you want/need is a special head in the submodule
which acts as the "branch that corresponds to the supermodule".
This probably should just be a naked SHA1 stored in HEAD, which
is committable only because a supermodule exists in a higher level
directory.

The fact that the submodule project has branches *at all* is
totally irrelevant once you start to speak about that submodule
within the supermodule, as it's the supermodule which determines
the branch of the submodule.

> But I think the Perforce-style 
> "compose a single workspace out of different bits of a larger project" 
> model is hugely useful

That's a mess.

You start to get into weird cases where the directory structure
expected by the build process is no longer intact, because the user
has sliced it apart in weird ways.  And there's no single version
which corresponds to that workspace as (if I recall correctly)
you can pick different tags or branches at will.  I believe that
ClearCase has the same bug.

You also can't version that now spliced workspace, aside from taking
the configuration file and putting that under version control too.

However I think the proposal on the table will support that to some
degree, in that you can take any version of any repository and embed
it at any directory of any other repository.  This means you can
for example embed the Linux kernel, glibc and gcc projects into
a larger "embedded device" repository, but you cannot alter the
structure of any of those three projects without making your own
locally developed branch of them.  Which is actually the correct
thing to do as any subslicing of a repository is exactly that:
a locally developed branch of that repository.


* Re: [RFC] Submodules in GIT
From: Shawn Pearce @ 2006-11-28 21:10 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: skimo, Andreas Ericsson, Linus Torvalds, Yann Dirson,
	Steven Grimm, git
In-Reply-To: <Pine.LNX.4.64.0611281407370.20138@iabervon.org>

Daniel Barkalow <barkalow@iabervon.org> wrote:
>   "cd submodule; git commit foo"
> 
> does the obvious thing, but that should be the same as
> 
>   "git commit submodule/foo" (since it normally is)
> 
> and then it makes sense to let you do multiple commits with a single 
> command when the paths end in different modules, since that's obviously 
> what you're requesting, and then -a must do all of them.

Except what if the submodules have different commit message
standards?  E.g. one requires signoff and another doesn't?  Or one
allows privately held information (e.g. it's your corporate project)
and one doesn't (e.g. it's an open source project you use/contribute
to)?

But slightly more practical: the change message for the superproject
might simply be "resolved bug X, caused by ...".  Which may make a
lot of sense to the top level project, but makes no sense at all
in a submodule involved in the fix as the submodule's developer
community doesn't even know what "X" is, let alone how "..." could
have caused it.

So you really need to think twice before you apply the same commit
message to every project, as each commit message needs to make sense
with that one submodule's limited scope, or within the supermodule's
larger scope.

But if you really still think that the same commit message makes
sense everywhere, we have 'git commit -F'.  Write it out in a file
and hand it off to -F in each module.  This would be easier if
git-ls-files grew a new option:

	vi ~/msg
	for m in $(git ls-files --submodules); do (cd "$m" && git commit -F ~/msg); done
	git commit -F ~/msg


* Re: [PATCH 1.2/2 (fixed)] git-svn: fix output reporting from the delta fetcher
From: Eric Wong @ 2006-11-28 21:15 UTC (permalink / raw)
  To: Pazu; +Cc: Seth Falcon, git
In-Reply-To: <9e7ab7380611281247h723a16fapc5a9898e8a4c7e1f@mail.gmail.com>

Pazu <pazu@pazu.com.br> wrote:
> On 11/28/06, Eric Wong <normalperson@yhbt.net> wrote:
> 
> >Git itself cannot easily track empty directories (at least as far as
> >update-index and checkout go).
> >[...]
> >Is that something the git community wants?
> 
> No, I guess not. I detailed the real problem in my previous message,
> and it had nothing to do with empty directories, but with git-svn
> recording broken revisions from svn. Did you get it, or did Trogdor eat my
> email?

Oops, I didn't notice the part about git-svn continuing despite a failed
connection.  Thanks for poking me again.
I'll look into how/if abort_edit/close_edit is called and how to deal
with a failed network connection.


* Re: git and bzr
From: Nicholas Allen @ 2006-11-28 21:26 UTC (permalink / raw)
  To: Nicholas Allen; +Cc: Jakub Narebski, bazaar-ng, git
In-Reply-To: <456C9DFF.1040407@onlinehome.de>

> 
> Just out of curiosity: How does git handle the case where one file is
> renamed differently in 2 branches and then the branches are repeatedly
> merged? I know that bzr handles this very well and in various tests I
> did there were absolutely no repeated conflicts. Would git behave as
> well in this scenario?
> 

Ok - I got curious and decided to install git and try this myself.

In this test I had a file hello.txt that got renamed to hello1.txt in
one branch and hello2.txt in another. Then I merged the changes between
the 2 branches.
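
For anyone who wants to repeat the git half of this experiment, a sketch
along these lines should do it. It assumes a reasonably recent git (for
"git merge --no-edit" and "git status --short"); the branch names a and b,
the throwaway identity, and the helper name rename_conflict_demo are
illustrative rather than what was actually typed:

```shell
#!/bin/sh
# Hypothetical helper: rename one file differently on two branches, merge
# them in a scratch repository, and report what git makes of it.
rename_conflict_demo() {
    repo=$(mktemp -d) || return 2
    (
        cd "$repo" || exit 2
        git init -q . || exit 2
        git config user.name "Demo User"
        git config user.email "demo@example.com"

        # Common ancestor: a single file, hello.txt.
        echo "hello" > hello.txt
        git add hello.txt
        git commit -q -m "Add hello" || exit 2
        base=$(git rev-parse HEAD)

        # Branch a renames the file to hello1.txt.
        git checkout -q -b a "$base"
        git mv hello.txt hello1.txt
        git commit -q -m "Renamed hello to hello1" || exit 2

        # Branch b renames the same file to hello2.txt.
        git checkout -q -b b "$base"
        git mv hello.txt hello2.txt
        git commit -q -m "Renamed hello to hello2" || exit 2

        # Merge a into b and say whether git stopped for a conflict.
        if git merge --no-edit a >/dev/null 2>&1; then
            echo "merged cleanly"
        else
            echo "merge stopped with conflicts"
        fi
        # Show the per-path state git recorded for the merge.
        git status --short
    )
    status=$?
    rm -rf "$repo"
    return $status
}

rename_conflict_demo
```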

Here is how it looked after the merge in bzr:

 bzr status
renamed:
  hello2.txt => hello1.txt
conflicts:
  Path conflict: hello2.txt / hello1.txt
pending merges:
  Nicholas Allen 2006-11-28 Renamed hello to hello1


and here's how it looked in git:

 git status
#
# Changed but not updated:
#   (use git-update-index to mark for commit)
#
#       unmerged: hello.txt
#       unmerged: hello1.txt
#       unmerged: hello2.txt
#       modified: hello2.txt
#
nothing to commit

So git is not telling me that I have a conflict due to the same file
being renamed differently in 2 branches - well at least not in a way I
can comprehend anyway! Whereas bzr made this very clear. Also, in git I
ended up with 2 files:

 ls
hello1.txt  hello2.txt

whereas in bzr there was only one file and I just had to decide which
name it was to be given to resolve the conflict.

I'm not sure how I should resolve the conflict in git but that's
probably just because I am not familiar with it yet and the message it
gave was not comprehensible or helpful to me in the slightest. In bzr it
was very easy and repeatedly merging caused no trouble at all - the name
conflict had to be resolved only once.

While it was good that git detected my file rename (although this was
not hard as the contents did not change at all) the process in bzr was
*much* smoother and more user friendly than it was in git. When you have
conflicts I think it's especially important that the RCS inform you of
what is really happening so you do not make mistakes. Bzr was much more
informative than git was and told me exactly why there was a conflict
and made it easy to resolve it.

This situation is a pretty common one and it seems to me that git's
content based approach is not as useful in this case as the file
identity approach that bzr uses.


Nick


* Re: [RFC] Submodules in GIT
From: Daniel Barkalow @ 2006-11-28 21:32 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: skimo, Andreas Ericsson, Linus Torvalds, Yann Dirson,
	Steven Grimm, git
In-Reply-To: <20061128211012.GJ28337@spearce.org>

On Tue, 28 Nov 2006, Shawn Pearce wrote:

> Daniel Barkalow <barkalow@iabervon.org> wrote:
> > and then it makes sense to let you do multiple commits with a single 
> > command when the paths end in different modules, since that's obviously 
> > what you're requesting, and then -a must do all of them.
> 
> Except what if the submodules have different commit message
> standards?  E.g. one requires signoff and another doesn't?  Or one
> allows privately held information (e.g. it's your corporate project)
> and one doesn't (e.g. it's an open source project you use/contribute
> to)?

I don't think you'd ever want the same commit message for commits in two 
projects. In any case where you'd commit a submodule in the process of 
committing a supermodule, git would do this by recursively calling 
git-commit, which would prompt for separate commit messages.

	-Daniel
