* git commit -a reports untracked files after a clone
@ 2011-05-15 0:46 Philipp Metzler
2011-05-15 3:43 ` Junio C Hamano
0 siblings, 1 reply; 13+ messages in thread
From: Philipp Metzler @ 2011-05-15 0:46 UTC (permalink / raw)
To: git
Hi,
I have a problem with git on OS X 10.6.7
5102c6173c5a1c683dfdd8ccd07528adddd51745 is the first bad commit
SHA: 5102c6173c5a1c683dfdd8ccd07528adddd51745
Author: Joshua Jensen <jjensen@workspacewhiz.com>
Date: Sun Oct 03 2010 11:56:43 GMT+0200 (CEST)
This is how you can reproduce the problem:
1. clone a repo
2. run the command "git commit -a"
I would expect: nothing to commit (working directory clean)
Instead it reports: nothing added to commit but untracked files present (use "git add" to track)
Starting with commit
SHA: 8c8674fc954d8c4bc46f303a141f510ecf264fcd
Author: Jeff King <peff@peff.net>
Date: Fri Mar 25 2011 19:13:31 GMT+0100 (CET)
the following behaviour can be observed when these three commands are run in this order:
1. git commit -a
nothing added to commit but untracked files present (use "git add" to track)
2. git status
nothing to commit (working directory clean)
3. git commit -a
nothing to commit (working directory clean)
So "git status" makes the untracked files "go away".
Cheers,
Philipp
_______________________________________________________________
DI Philipp Metzler
Goli.at GesbR.
Dorf Rieden 7/11
A-6900 Bregenz
EU - Austria
E-Mail: phil@goli.at
Skype: googol
Tel: +43 / 676 / 72 94 176
ICQ: 13950954
o www.philippmetzler.com - Softwareentwicklung und Websites mit Django und Typo3.
o www.goli.at - Ihr Speicherplatz im Netz. Messen Sie uns an unseren Daten.
o www.clickshopping.at - Wir bringen Ihre Produkte auf den Punkt.
o www.greencar.at - Elektroautos und mehr ...
_______________________________________________________________
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: git commit -a reports untracked files after a clone
2011-05-15 0:46 git commit -a reports untracked files after a clone Philipp Metzler
@ 2011-05-15 3:43 ` Junio C Hamano
2011-05-15 8:26 ` Philipp Metzler
0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2011-05-15 3:43 UTC (permalink / raw)
To: Philipp Metzler; +Cc: git
Philipp Metzler <phil@goli.at> writes:
> This is how you can reproduce the problem:
> 1. clone a repo
> 2. run the command "git commit -a"
Does it reproduce with _any_ repository, or just a particular one? If it
is the latter, then the above description is useless for anybody to start
formulating any theory on what goes wrong. Sorry.
You being on OS X, I would guess that you may have a pathname in the
project that HFS+ does not like.
On HFS+, when a program creates a file with "open(filename, O_CREAT)",
reading the directory the created path is in with readdir() does not
return the string given when it was created but something else for certain
pathnames; you may be seeing that git is confused by that behaviour.
But that is just a wild guess.
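A guess like this can be probed outside of git. The following is only a minimal sketch, not git code: `name_roundtrips` is a hypothetical helper that creates a file and then scans the directory with readdir() for a byte-identical name. On a filesystem that rewrites names on creation it should report a mismatch.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical probe, not part of git: create `name` inside `dir`, then
 * report whether readdir() hands back exactly the same bytes.
 * Returns 1 if the name is found unchanged, 0 if it came back altered
 * (or not at all), and -1 on error. The probe file is removed again.
 */
int name_roundtrips(const char *dir, const char *name)
{
	char path[4096];
	struct dirent *de;
	DIR *d;
	int fd, found = 0;

	assert(dir != NULL && name != NULL);
	if (snprintf(path, sizeof(path), "%s/%s", dir, name) >= (int)sizeof(path))
		return -1;

	fd = open(path, O_CREAT | O_WRONLY, 0600);
	if (fd < 0)
		return -1;
	close(fd);

	d = opendir(dir);
	if (!d) {
		unlink(path);
		return -1;
	}
	while ((de = readdir(d)) != NULL) {
		if (!strcmp(de->d_name, name)) {
			found = 1;
			break;
		}
	}
	closedir(d);
	unlink(path);
	return found;
}
```

A plain ASCII name should round-trip everywhere. The interesting input is a name containing a precomposed character such as "\xc3\xa4" (UTF-8 for "ä"), for which HFS+ is expected to return a decomposed form, so the helper would return 0 there.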
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: git commit -a reports untracked files after a clone
2011-05-15 3:43 ` Junio C Hamano
@ 2011-05-15 8:26 ` Philipp Metzler
2011-05-16 10:38 ` Jeff King
0 siblings, 1 reply; 13+ messages in thread
From: Philipp Metzler @ 2011-05-15 8:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Hi,
On OS X 10.6.7 it seems to happen with any git repository, including the git.git repository itself:
[phil@Silberpfeil tmp]$ git --version
git version 1.7.5.1
[phil@Silberpfeil tmp]$ git clone git://git.kernel.org/pub/scm/git/git.git
Cloning into git...
remote: Counting objects: 140383, done.
remote: Compressing objects: 100% (33498/33498), done.
remote: Total 140383 (delta 105777), reused 139383 (delta 104980)
Receiving objects: 100% (140383/140383), 27.61 MiB | 642 KiB/s, done.
Resolving deltas: 100% (105777/105777), done.
[phil@Silberpfeil tmp]$ cd git
[phil@Silberpfeil git]$ git commit -a
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# vcs-svn/
# xdiff/
nothing added to commit but untracked files present (use "git add" to track)
[phil@Silberpfeil git]$ git status
# On branch master
nothing to commit (working directory clean)
[phil@Silberpfeil git]$ git commit -a
# On branch master
nothing to commit (working directory clean)
Cheers,
Philipp
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: git commit -a reports untracked files after a clone
2011-05-15 8:26 ` Philipp Metzler
@ 2011-05-16 10:38 ` Jeff King
2011-05-16 10:49 ` Philipp Metzler
0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2011-05-16 10:38 UTC (permalink / raw)
To: Philipp Metzler; +Cc: Junio C Hamano, git
On Sun, May 15, 2011 at 10:26:01AM +0200, Philipp Metzler wrote:
> [phil@Silberpfeil tmp]$ git --version
> git version 1.7.5.1
> [phil@Silberpfeil tmp]$ git clone git://git.kernel.org/pub/scm/git/git.git
> Cloning into git...
> remote: Counting objects: 140383, done.
> remote: Compressing objects: 100% (33498/33498), done.
> remote: Total 140383 (delta 105777), reused 139383 (delta 104980)
> Receiving objects: 100% (140383/140383), 27.61 MiB | 642 KiB/s, done.
> Resolving deltas: 100% (105777/105777), done.
> [phil@Silberpfeil tmp]$ cd git
> [phil@Silberpfeil git]$ git commit -a
> # On branch master
> # Untracked files:
> # (use "git add <file>..." to include in what will be committed)
> #
> # vcs-svn/
> # xdiff/
Can you try this again with "git commit -uall" so we can see which
specific files in those directories are causing the issue?
-Peff
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: git commit -a reports untracked files after a clone
2011-05-16 10:38 ` Jeff King
@ 2011-05-16 10:49 ` Philipp Metzler
2011-05-16 12:08 ` Jeff King
0 siblings, 1 reply; 13+ messages in thread
From: Philipp Metzler @ 2011-05-16 10:49 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
Hi,
I hope this helps. Just let me know if you need any additional information.
[phil@Silberpfeil tmp]$ git clone git://git.kernel.org/pub/scm/git/git.git
Cloning into git...
remote: Counting objects: 140472, done.
remote: Compressing objects: 100% (33586/33586), done.
remote: Total 140472 (delta 105841), reused 139396 (delta 104981)
Receiving objects: 100% (140472/140472), 27.67 MiB | 638 KiB/s, done.
Resolving deltas: 100% (105841/105841), done.
[phil@Silberpfeil tmp]$ cd git
[phil@Silberpfeil git]$ git commit -uall
# On branch master
nothing to commit (working directory clean)
[phil@Silberpfeil git]$ git commit -a
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# t/t9154/
# t/t9601/
# t/t9602/
# t/t9603/
# t/t9700/
# t/valgrind/
# templates/
# vcs-svn/
# xdiff/
nothing added to commit but untracked files present (use "git add" to track)
[phil@Silberpfeil git]$ git commit -uall
# On branch master
nothing to commit (working directory clean)
[phil@Silberpfeil git]$ git commit -u
# On branch master
nothing to commit (working directory clean)
[phil@Silberpfeil git]$ git commit --all
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# t/t9154/
# t/t9601/
# t/t9602/
# t/t9603/
# t/t9700/
# t/valgrind/
# templates/
# vcs-svn/
# xdiff/
nothing added to commit but untracked files present (use "git add" to track)
Cheers,
Philipp
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: git commit -a reports untracked files after a clone
2011-05-16 10:49 ` Philipp Metzler
@ 2011-05-16 12:08 ` Jeff King
2011-05-16 12:25 ` Philipp Metzler
2011-05-16 12:38 ` Philipp Metzler
0 siblings, 2 replies; 13+ messages in thread
From: Jeff King @ 2011-05-16 12:08 UTC (permalink / raw)
To: Philipp Metzler; +Cc: Junio C Hamano, git
On Mon, May 16, 2011 at 12:49:07PM +0200, Philipp Metzler wrote:
> [phil@Silberpfeil git]$ git commit -uall
> # On branch master
> nothing to commit (working directory clean)
Hmm, nothing. That's odd.
> [phil@Silberpfeil git]$ git commit -a
> # On branch master
> # Untracked files:
> # (use "git add <file>..." to include in what will be committed)
> #
> # t/t9154/
> # t/t9601/
> # t/t9602/
> # t/t9603/
> # t/t9700/
> # t/valgrind/
> # templates/
> # vcs-svn/
> # xdiff/
> nothing added to commit but untracked files present (use "git add" to track)
And now totally different output from before, and from the previous run.
So this is really strange. The fact that the list of directories is
_different_ from your previous posting implies to me that it is not
something about those particular files, but rather some weird race
condition in the creation of those directories or the index.
But then the fact that we see them with no "-u", but don't see them with
"-uall" implies some weird heisenbug in git's directory traversal. What
happens if you do "git commit --all -uall"? I'd like to see if the thing
that switches the behavior is the presence of "--all" or the absence of
a "-u" option.
-Peff
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: git commit -a reports untracked files after a clone
2011-05-16 12:08 ` Jeff King
@ 2011-05-16 12:25 ` Philipp Metzler
2011-05-16 12:38 ` Philipp Metzler
1 sibling, 0 replies; 13+ messages in thread
From: Philipp Metzler @ 2011-05-16 12:25 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
Hi,
I created two directories, tmp and tmp2, and ran the commands in a different order in each, with different results:
[phil@silberpfeil tmp]$ git clone git://git.kernel.org/pub/scm/git/git.git
Cloning into git...
remote: Counting objects: 140472, done.
remote: Compressing objects: 100% (33586/33586), done.
remote: Total 140472 (delta 105843), reused 139396 (delta 104981)
Receiving objects: 100% (140472/140472), 27.66 MiB | 641 KiB/s, done.
Resolving deltas: 100% (105843/105843), done.
[phil@silberpfeil tmp]$ cd git
[phil@silberpfeil git]$ git commit -a
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# t/t9150/
# t/t9151/
# t/t9153/
# t/t9154/
# t/t9601/
# t/t9602/
# t/t9603/
# t/t9700/
# t/valgrind/
# templates/
# vcs-svn/
# xdiff/
nothing added to commit but untracked files present (use "git add" to track)
[phil@silberpfeil git]$ git commit --all -uall
# On branch master
nothing to commit (working directory clean)
[phil@silberpfeil git]$ git commit -a
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# t/t9150/
# t/t9151/
# t/t9153/
# t/t9154/
# t/t9601/
# t/t9602/
# t/t9603/
# t/t9700/
# t/valgrind/
# templates/
# vcs-svn/
# xdiff/
nothing added to commit but untracked files present (use "git add" to track)
******************************************************************************
[phil@silberpfeil tmp2]$ git clone git://git.kernel.org/pub/scm/git/git.git
Cloning into git...
remote: Counting objects: 140472, done.
remote: Compressing objects: 100% (33586/33586), done.
remote: Total 140472 (delta 105841), reused 139396 (delta 104981)
Receiving objects: 100% (140472/140472), 27.67 MiB | 602 KiB/s, done.
Resolving deltas: 100% (105841/105841), done.
[phil@silberpfeil tmp2]$ cd git
[phil@silberpfeil git]$ git commit --all -uall
# On branch master
nothing to commit (working directory clean)
[phil@silberpfeil git]$ git commit -a
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# vcs-svn/
# xdiff/
nothing added to commit but untracked files present (use "git add" to track)
[phil@silberpfeil git]$ git commit --all -uall
# On branch master
nothing to commit (working directory clean)
[phil@silberpfeil git]$ git commit -a
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# vcs-svn/
# xdiff/
nothing added to commit but untracked files present (use "git add" to track)
[phil@silberpfeil git]$ git status
# On branch master
nothing to commit (working directory clean)
[phil@silberpfeil git]$ git commit -a
# On branch master
nothing to commit (working directory clean)
Cheers,
Philipp
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: git commit -a reports untracked files after a clone
2011-05-16 12:08 ` Jeff King
2011-05-16 12:25 ` Philipp Metzler
@ 2011-05-16 12:38 ` Philipp Metzler
2011-05-16 14:55 ` Jeff King
1 sibling, 1 reply; 13+ messages in thread
From: Philipp Metzler @ 2011-05-16 12:38 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
Hi,
Yes, it could be a race condition or a heisenbug. The output of "git commit -a" varies between runs: the directories vcs-svn and xdiff are always listed, but the others are not. The only constant is that the command "git status" always "cleans up" everything. Another run:
[phil@silberpfeil git]$ git --version
git version 1.7.5.1
[phil@silberpfeil tmp3]$ git clone git://git.kernel.org/pub/scm/git/git.git
Cloning into git...
remote: Counting objects: 140472, done.
remote: Compressing objects: 100% (33586/33586), done.
remote: Total 140472 (delta 105843), reused 139396 (delta 104981)
Receiving objects: 100% (140472/140472), 27.66 MiB | 642 KiB/s, done.
Resolving deltas: 100% (105843/105843), done.
[phil@silberpfeil tmp3]$ git commit --all -uall
fatal: Not a git repository (or any parent up to mount parent /Users)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
[phil@silberpfeil tmp3]$ cd git
[phil@silberpfeil git]$ git commit --all -uall
# On branch master
nothing to commit (working directory clean)
[phil@silberpfeil git]$ git commit -a
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# t/t9601/
# t/t9602/
# t/t9603/
# t/t9700/
# t/valgrind/
# templates/
# vcs-svn/
# xdiff/
nothing added to commit but untracked files present (use "git add" to track)
[phil@silberpfeil git]$ git commit -a
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# t/t9601/
# t/t9602/
# t/t9603/
# t/t9700/
# t/valgrind/
# templates/
# vcs-svn/
# xdiff/
nothing added to commit but untracked files present (use "git add" to track)
[phil@silberpfeil git]$ git commit --all -uall
# On branch master
nothing to commit (working directory clean)
[phil@silberpfeil git]$ git commit -a
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# t/t9601/
# t/t9602/
# t/t9603/
# t/t9700/
# t/valgrind/
# templates/
# vcs-svn/
# xdiff/
nothing added to commit but untracked files present (use "git add" to track)
[phil@silberpfeil git]$ git status
# On branch master
nothing to commit (working directory clean)
[phil@silberpfeil git]$ git commit -a
# On branch master
nothing to commit (working directory clean)
[phil@silberpfeil git]$ git commit --all -uall
# On branch master
nothing to commit (working directory clean)
Cheers,
Philipp
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: git commit -a reports untracked files after a clone
2011-05-16 12:38 ` Philipp Metzler
@ 2011-05-16 14:55 ` Jeff King
2011-05-27 18:00 ` Jeff King
0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2011-05-16 14:55 UTC (permalink / raw)
To: Philipp Metzler; +Cc: Junio C Hamano, git
On Mon, May 16, 2011 at 02:38:33PM +0200, Philipp Metzler wrote:
> Could be a race condition / heisenbug yep. The result of "git commit
> -a" differs - the directories vcs-svn and xdiff are there all the time
> but not the others. The only constant thing is that the command "git
> status" always "cleans up" everything. Another run:
OK, I'm making some progress. I can replicate on Linux with:
$ git config --global core.ignorecase true
$ git clone git foo
$ cd foo && git commit -a
which gives a bunch of directories which contain tracked contents. Doing
"git status" makes the problem go away, but then doing this makes it
come back:
$ rm .git/index
$ git read-tree --reset HEAD
And like you, it doesn't trigger with "-uall".
So it is something about:
1. A fresh index that is perhaps missing stat information for some
entries.
2. Stopping the traversal at the directory boundary rather than
looking at details of each directory ("-uall").
3. core.ignorecase
But it is definitely repeatable. Which is good, because that will make
it easier to track down. :)
I'll see what I can do, but it may not be today.
-Peff
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: git commit -a reports untracked files after a clone
2011-05-16 14:55 ` Jeff King
@ 2011-05-27 18:00 ` Jeff King
2011-05-27 18:13 ` Jeff King
0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2011-05-27 18:00 UTC (permalink / raw)
To: Philipp Metzler; +Cc: Joshua Jensen, Junio C Hamano, git
[+cc Joshua, as the problem is in his 5102c61 (Add case insensitivity
support for directories when using git status, 2010-10-03)]
On Mon, May 16, 2011 at 10:55:35AM -0400, Jeff King wrote:
> OK, I'm making some progress. I can replicate on Linux with:
>
> $ git config --global core.ignorecase true
> $ git clone git foo
> $ cd foo && git commit -a
> # On branch private
> # Untracked files:
> # Documentation/
> # block-sha1/
> # builtin/
> # compat/
> # contrib/
> # git-gui/
> # git_remote_helpers/
> # gitk-git/
> # gitweb/
> # perl/
> # po/
> # ppc/
> # t/
> # templates/
> # vcs-svn/
> # xdiff/
> nothing added to commit but untracked files present
>
> [i.e., all of those directories are listed as containing untracked
> contents even though they clearly have tracked files in them]
OK, I figured it out. Unfortunately, the fix is non-trivial. Basically
we are getting bogus data out of the index's name_hash for directories.
What is happening is this:
1. We load the index, and for each entry, insert it into the index's
name_hash. In addition, if ignorecase is turned on, we make an
entry in the name_hash for the directory (e.g., "contrib/"), which
uses the following code from 5102c61's hash_index_entry_directories:
    hash = hash_name(ce->name, ptr - ce->name);
    if (!lookup_hash(hash, &istate->name_hash)) {
            pos = insert_hash(hash, &istate->name_hash);
            ce->next = *pos;
            *pos = ce;
    }
Note that we only add the directory entry if there is not already an
entry.
2. We run add_files_to_cache, which gets updated information for each
cache entry. It helpfully inserts this information into the cache,
which calls replace_index_entry. This in turn calls
remove_name_hash() on the old entry, and add_name_hash() on the new
one. But remove_name_hash doesn't actually remove from the hash, it
only marks it as "no longer interesting" (from cache.h):
    /*
     * We don't actually *remove* it, we can just mark it invalid so that
     * we won't find it in lookups.
     *
     * Not only would we have to search the lists (simple enough), but
     * we'd also have to rehash other hash buckets in case this makes the
     * hash bucket empty (common). So it's much better to just mark
     * it.
     */
    static inline void remove_name_hash(struct cache_entry *ce)
    {
            ce->ce_flags |= CE_UNHASHED;
    }
This is OK in the specific-file case, since the entries in the hash
form a linked list, and we can just skip the "not here anymore"
entries during lookup.
But for the directory hash entry, we will _not_ write a new entry,
because there is already one there: the old one that is actually no
longer interesting!
3. While traversing the directories, we end up in the
directory_exists_in_index_icase function to see if a directory is
interesting. This in turn checks index_name_exists, which will
look up the directory in the index's name_hash. We see the old,
deleted record, and assume there is nothing interesting. The
directory gets marked as untracked, even though there are index
entries in it.
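The three steps can be replayed in a toy model. This is only a sketch under simplifying assumptions: it models a single directory bucket (say "contrib/") as a bare list head instead of a real hash table, and `struct entry`, `add_dir_buggy`, `dir_exists` and `replay_bug` are hypothetical names, not git's.

```c
#include <assert.h>
#include <stddef.h>

#define UNHASHED 1u	/* stands in for CE_UNHASHED */

struct entry {
	const char *name;	/* full path, e.g. "contrib/README" */
	unsigned int flags;
	struct entry *next;
};

/* Mimics the buggy registration: only the first entry ever seen for the
 * directory becomes its representative; later ones are skipped. */
static void add_dir_buggy(struct entry **dir_bucket, struct entry *ce)
{
	assert(ce != NULL);
	if (!*dir_bucket) {
		ce->next = *dir_bucket;
		*dir_bucket = ce;
	}
}

/* Mimics the lookup: walk the chain, ignoring entries marked as removed. */
static int dir_exists(const struct entry *dir_bucket)
{
	const struct entry *ce;

	for (ce = dir_bucket; ce; ce = ce->next)
		if (!(ce->flags & UNHASHED))
			return 1;
	return 0;
}

/* Returns what the directory lookup reports after a remove-and-replace. */
int replay_bug(void)
{
	struct entry old = { "contrib/README", 0, NULL };
	struct entry fresh = { "contrib/README", 0, NULL };
	struct entry *dir_bucket = NULL;

	add_dir_buggy(&dir_bucket, &old);	/* step 1: index is loaded */
	old.flags |= UNHASHED;			/* step 2: old entry marked removed... */
	add_dir_buggy(&dir_bucket, &fresh);	/* ...and the new one is never linked */
	return dir_exists(dir_bucket);		/* step 3: only the stale entry is seen */
}
```

In this model `replay_bug()` returns 0, i.e. the directory looks untracked even though `fresh` is a live entry inside it.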
And that explains the three prerequisites I needed for replicating the
bug:
1. You must be using ignore-case. The bug is in that code path.
2. You must have an index with stale stat information; this is what
provokes add_files_to_cache to update the index entries.
3. You must stop traversal at the directory boundary (i.e., not using
"-uall"). Otherwise we end up looking for the exact filenames,
which do get found.
The problem is in the code I showed above:
    hash = hash_name(ce->name, ptr - ce->name);
    if (!lookup_hash(hash, &istate->name_hash)) {
            pos = insert_hash(hash, &istate->name_hash);
            ce->next = *pos;
            *pos = ce;
    }
Having a single cache entry that represents the directory is not enough;
that entry may go away if the index is changed. It may be tempting to
say that the problem is in our removal method; if we removed the entry
entirely instead of simply marking it as "not here anymore", then we
would know we need to insert a new entry. But that only covers this
particular case of remove-replace. In the more general case, consider
something like this:
1. We add "foo/bar" and "foo/baz" to the index. Each gets their own
entry in name_hash, plus we make a "foo/" entry that points to
"foo/bar".
2. We remove the "foo/bar" entry from the index, and from the
name_hash.
3. We ask if "foo/" exists, and see no entry, even though "foo/baz"
exists.
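The same scenario as a toy model, to show that real removal does not help either. This is a sketch only: the "foo/" bucket is a bare list head, and `add_dir_single`, `unlink_entry` and `dir_survives_removal` are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct entry {
	const char *name;
	struct entry *next;
};

/* As in the code under discussion: register a directory representative
 * only if the bucket is still empty. */
static void add_dir_single(struct entry **bucket, struct entry *ce)
{
	assert(ce != NULL);
	if (!*bucket) {
		ce->next = NULL;
		*bucket = ce;
	}
}

/* A "proper" removal that really unlinks the entry from the chain. */
static void unlink_entry(struct entry **bucket, struct entry *ce)
{
	struct entry **p;

	for (p = bucket; *p; p = &(*p)->next) {
		if (*p == ce) {
			*p = ce->next;
			ce->next = NULL;
			return;
		}
	}
}

/* Returns 1 if "foo/" is still found after removing foo/bar. */
int dir_survives_removal(void)
{
	struct entry bar = { "foo/bar", NULL };
	struct entry baz = { "foo/baz", NULL };
	struct entry *foo_dir = NULL;

	add_dir_single(&foo_dir, &bar);	/* "foo/" points at foo/bar */
	add_dir_single(&foo_dir, &baz);	/* ignored: bucket already set */
	unlink_entry(&foo_dir, &bar);	/* foo/bar leaves the index */
	return foo_dir != NULL;		/* foo/baz is live, but nobody knows */
}
```

Here `dir_survives_removal()` returns 0: the bucket is empty although "foo/baz" is still tracked, which is why a single representative per directory cannot be enough.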
So we need that directory entry to have the list of _all_ cache entries
that indicate that the directory is tracked. So that implies making a
linked list as we do for other entries, like:
    hash = hash_name(ce->name, ptr - ce->name);
    pos = insert_hash(hash, &istate->name_hash);
    ce->next = *pos;
    *pos = ce;
But that's not right either. In fact, it shows a second bug in the
current code, which is that the "ce->next" pointer is supposed to be
linking entries for a specific filename entry, but here we are
overwriting it for the directory entry. I _think_ this can't be
triggered as a bug, because:
1. This is the first entry in the directory (otherwise lookup_hash
would not have returned NULL), and is therefore the first entry
for this specific file. So ce->next must already be NULL.
2. lookup_hash returned NULL, which means "*pos" is going to be NULL.
So even though it looks like we might be truncating an existing list,
it's not possible to do so in practice. But if we start actually keeping
a directory list, we will run into problems, because we'll be splicing
unrelated lists together.
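The splicing problem can be made concrete with two toy lists that share one link field. This is a sketch under assumptions: both buckets are bare list heads, `x1` and `x2` are simply two entries that landed in the same file bucket, and all names (`push`, `file_chain_intact_after_dir_push`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct entry {
	const char *name;
	struct entry *next;	/* the only link field, shared by both lists */
};

/* Push onto the front of a chain, overwriting ce->next. */
static void push(struct entry **bucket, struct entry *ce)
{
	assert(ce != NULL);
	ce->next = *bucket;
	*bucket = ce;
}

/* Returns 1 if x1 is still reachable from the file bucket afterwards. */
int file_chain_intact_after_dir_push(void)
{
	struct entry x1 = { "foo/bar", NULL };
	struct entry x2 = { "foo/BAR", NULL };
	struct entry other = { "foo/baz", NULL };
	struct entry *file_bucket = NULL, *dir_bucket = NULL;
	const struct entry *ce;

	push(&file_bucket, &x1);
	push(&file_bucket, &x2);	/* file chain: x2 -> x1 */
	push(&dir_bucket, &other);	/* dir chain:  other */
	push(&dir_bucket, &x2);		/* rewrites x2.next: x2 -> other */

	/* The file chain now reads x2 -> other; x1 has been cut off. */
	for (ce = file_bucket; ce; ce = ce->next)
		if (ce == &x1)
			return 1;
	return 0;
}
```

The function returns 0: pushing `x2` onto the directory chain silently rewired the file chain as well.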
So we need to have a separate next pointer for the list in the directory
bucket, and we need to traverse that list in index_name_exists when we
are looking up a directory.
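A toy version of that idea, as a sketch rather than the actual patch: every entry is always linked into the directory chain through its own `dir_next` field, and the directory lookup follows `dir_next` while skipping entries marked as removed. The directory bucket is modeled as a bare list head, and `add_dir`, `dir_exists` and `replay_with_dir_next` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define UNHASHED 1u	/* stands in for CE_UNHASHED */

struct entry {
	const char *name;
	unsigned int flags;
	struct entry *next;	/* chain for the exact filename (unused here) */
	struct entry *dir_next;	/* separate chain for the directory bucket */
};

/* Always link the entry into the directory chain, with no "already there" check. */
static void add_dir(struct entry **dir_bucket, struct entry *ce)
{
	assert(ce != NULL);
	ce->dir_next = *dir_bucket;
	*dir_bucket = ce;
}

/* The directory exists if any entry on its chain is still live. */
static int dir_exists(const struct entry *dir_bucket)
{
	const struct entry *ce;

	for (ce = dir_bucket; ce; ce = ce->dir_next)
		if (!(ce->flags & UNHASHED))
			return 1;
	return 0;
}

/* Same remove-and-replace sequence that triggers the bug. */
int replay_with_dir_next(void)
{
	struct entry old = { "contrib/README", 0, NULL, NULL };
	struct entry fresh = { "contrib/README", 0, NULL, NULL };
	struct entry *dir_bucket = NULL;

	add_dir(&dir_bucket, &old);	/* index is loaded */
	old.flags |= UNHASHED;		/* old entry marked as removed */
	add_dir(&dir_bucket, &fresh);	/* replacement is linked in as well */
	return dir_exists(dir_bucket);	/* stale entry skipped, fresh one found */
}
```

With the separate chain, `replay_with_dir_next()` returns 1, and the per-file `next` links are never touched by directory bookkeeping.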
The patch below seems to fix it for me. I'm not 100% happy with adding
extra icase-only cruft to "struct cache_entry", but I don't really see a
way around it, short of separating out the "next" pointers from
cache_entry entirely (i.e., having a separate "cache_entry_list"
struct that gets stored in the name_hash). In practice, it probably
doesn't matter; we have thousands of cache entries, compared to the
millions of objects (where adding 4 bytes to the struct _does_ impact
performance).
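The per-entry cost is easy to measure. The sketch below uses cut-down stand-ins for the struct (only the fields visible in the diff context; the real `struct cache_entry` has more), so the absolute sizes are not git's, but the difference should be one pointer: 4 bytes on a 32-bit build, 8 on a typical 64-bit one. The names `entry_before`, `entry_after` and `dir_next_cost` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed-down stand-in for the struct before the patch. */
struct entry_before {
	unsigned int ce_flags;
	unsigned char sha1[20];
	struct entry_before *next;
	char name[];		/* flexible array member */
};

/* The same struct with the extra directory link added. */
struct entry_after {
	unsigned int ce_flags;
	unsigned char sha1[20];
	struct entry_after *next;
	struct entry_after *dir_next;
	char name[];
};

/* Extra bytes each entry pays for the new field. */
size_t dir_next_cost(void)
{
	assert(sizeof(struct entry_after) >= sizeof(struct entry_before));
	return sizeof(struct entry_after) - sizeof(struct entry_before);
}
```

Multiplied by a few thousand index entries this stays in the tens of kilobytes, which is the basis for expecting the overhead not to matter in practice.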
---
diff --git a/cache.h b/cache.h
index 96cfc9a..2868dac 100644
--- a/cache.h
+++ b/cache.h
@@ -153,6 +153,7 @@ struct cache_entry {
 	unsigned int ce_flags;
 	unsigned char sha1[20];
 	struct cache_entry *next;
+	struct cache_entry *dir_next;
 	char name[FLEX_ARRAY]; /* more */
 };
 
diff --git a/name-hash.c b/name-hash.c
index 1a8c619..30cb2e3 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -57,11 +57,9 @@ static void hash_index_entry_directories(struct index_state *istate, struct cach
 		if (*ptr == '/') {
 			++ptr;
 			hash = hash_name(ce->name, ptr - ce->name);
-			if (!lookup_hash(hash, &istate->name_hash)) {
-				pos = insert_hash(hash, &istate->name_hash);
-				ce->next = *pos;
-				*pos = ce;
-			}
+			pos = insert_hash(hash, &istate->name_hash);
+			ce->dir_next = *pos;
+			*pos = ce;
 		}
 	}
 }
@@ -162,7 +160,10 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na
 			if (same_name(ce, name, namelen, icase))
 				return ce;
 		}
-		ce = ce->next;
+		if (icase && name[namelen - 1] == '/')
+			ce = ce->dir_next;
+		else
+			ce = ce->next;
 	}
 
 	/*
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: git commit -a reports untracked files after a clone
2011-05-27 18:00 ` Jeff King
@ 2011-05-27 18:13 ` Jeff King
2011-10-05 14:26 ` Joerg Rosenkranz
0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2011-05-27 18:13 UTC (permalink / raw)
To: Philipp Metzler; +Cc: Joshua Jensen, Junio C Hamano, git
On Fri, May 27, 2011 at 02:00:45PM -0400, Jeff King wrote:
> 1. We load the index, and for each entry, insert it into the index's
> name_hash. In addition, if ignorecase is turned on, we make an
> entry in the name_hash for the directory (e.g., "contrib/"), which
> uses the following code from 5102c61's hash_index_entry_directories:
>
> hash = hash_name(ce->name, ptr - ce->name);
> if (!lookup_hash(hash, &istate->name_hash)) {
> pos = insert_hash(hash, &istate->name_hash);
> ce->next = *pos;
> *pos = ce;
> }
Urgh, sorry, I was looking at this on one of my topic branches which
tweaked the calling convention of the hash code. All of my analysis is still
valid, but the code in question actually looks like this:
    hash = hash_name(ce->name, ptr - ce->name);
    if (!lookup_hash(hash, &istate->name_hash)) {
            pos = insert_hash(hash, ce, &istate->name_hash);
            if (pos) {
                    ce->next = *pos;
                    *pos = ce;
            }
    }
And therefore my patch needs to be tweaked slightly, too. The updated version is below.
diff --git a/cache.h b/cache.h
index 009b365..54f8c05 100644
--- a/cache.h
+++ b/cache.h
@@ -153,6 +153,7 @@ struct cache_entry {
 	unsigned int ce_flags;
 	unsigned char sha1[20];
 	struct cache_entry *next;
+	struct cache_entry *dir_next;
 	char name[FLEX_ARRAY]; /* more */
 };
 
diff --git a/name-hash.c b/name-hash.c
index c6b6a3f..225dd76 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -57,12 +57,10 @@ static void hash_index_entry_directories(struct index_state *istate, struct cach
 		if (*ptr == '/') {
 			++ptr;
 			hash = hash_name(ce->name, ptr - ce->name);
-			if (!lookup_hash(hash, &istate->name_hash)) {
-				pos = insert_hash(hash, ce, &istate->name_hash);
-				if (pos) {
-					ce->next = *pos;
-					*pos = ce;
-				}
+			pos = insert_hash(hash, ce, &istate->name_hash);
+			if (pos) {
+				ce->dir_next = *pos;
+				*pos = ce;
 			}
 		}
 	}
@@ -166,7 +164,10 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na
 			if (same_name(ce, name, namelen, icase))
 				return ce;
 		}
-		ce = ce->next;
+		if (icase && name[namelen - 1] == '/')
+			ce = ce->dir_next;
+		else
+			ce = ce->next;
 	}
 
 	/*
--
1.7.5.3.12.g99e25.dirty
* Re: git commit -a reports untracked files after a clone
2011-05-27 18:13 ` Jeff King
@ 2011-10-05 14:26 ` Joerg Rosenkranz
2011-10-06 16:06 ` Jeff King
0 siblings, 1 reply; 13+ messages in thread
From: Joerg Rosenkranz @ 2011-10-05 14:26 UTC (permalink / raw)
To: git
> On Fri, May 27, 2011 at 02:00:45PM -0400, Jeff King wrote:
> 1. We load the index, and for each entry, insert it into the index's
> name_hash. In addition, if ignorecase is turned on, we make an
> entry in the name_hash for the directory (e.g., "contrib/"), which
> uses the following code from 5102c61's hash_index_entry_directories:
Sorry for reactivating this old thread.
We are running into this problem too. The behavior is the same in msysgit and on
Linux. Your patch resolves that problem for us.
Is there any chance to drive this patch forward?
Thanks,
Joerg
* Re: git commit -a reports untracked files after a clone
2011-10-05 14:26 ` Joerg Rosenkranz
@ 2011-10-06 16:06 ` Jeff King
0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2011-10-06 16:06 UTC (permalink / raw)
To: Joerg Rosenkranz; +Cc: Junio C Hamano, git
On Wed, Oct 05, 2011 at 02:26:30PM +0000, Joerg Rosenkranz wrote:
> > On Fri, May 27, 2011 at 02:00:45PM -0400, Jeff King wrote:
> > 1. We load the index, and for each entry, insert it into the index's
> > name_hash. In addition, if ignorecase is turned on, we make an
> > entry in the name_hash for the directory (e.g., "contrib/"), which
> > uses the following code from 5102c61's hash_index_entry_directories:
>
> Sorry for reactivating this old thread.
> We are running into this problem too. The behavior is the same in msysgit and on
> Linux. Your patch resolves that problem for us.
>
> Is there any chance to drive this patch forward?
Thanks for prodding. I wrote a big analysis at the end of that thread,
but didn't get much response. I've been meaning to pick it up again.
So I spent a few minutes revisiting the topic today. I think it's
definitely a bug, and the fix is definitely correct. The patch is below,
with what I hope is a coherent analysis adapted from the previous
thread.
The only question is whether or not it's OK to add a few bytes to
"struct cache_entry" to cater to an uncommon case (see the comments at
the end of the commit message). Junio might have thoughts on that.
-- >8 --
Subject: [PATCH] fix phantom untracked files when core.ignorecase is set
When core.ignorecase is turned on and there are stale index
entries, "git commit" can sometimes report directories as
untracked, even though they contain tracked files.
You can see an example of this with:
  # make a case-insensitive repo
  git init repo && cd repo &&
  git config core.ignorecase true &&

  # with some tracked files in a subdir
  mkdir subdir &&
  > subdir/one &&
  > subdir/two &&
  git add . &&
  git commit -m base &&

  # now make the index entries stale
  touch subdir/* &&

  # and then ask commit to update those entries and show
  # us the status template
  git commit -a
which will report "subdir/" as untracked, even though it
clearly contains two tracked files. What is happening in the
commit program is this:
1. We load the index, and for each entry, insert it into the index's
name_hash. In addition, if ignorecase is turned on, we make an
entry in the name_hash for the directory (e.g., "contrib/"), which
uses the following code from 5102c61's hash_index_entry_directories:
	hash = hash_name(ce->name, ptr - ce->name);
	if (!lookup_hash(hash, &istate->name_hash)) {
		pos = insert_hash(hash, ce, &istate->name_hash);
		if (pos) {
			ce->next = *pos;
			*pos = ce;
		}
	}
Note that we only add the directory entry if there is not already an
entry.
2. We run add_files_to_cache, which gets updated information for each
cache entry. It helpfully inserts this information into the cache,
which calls replace_index_entry. This in turn calls
remove_name_hash() on the old entry, and add_name_hash() on the new
one. But remove_name_hash doesn't actually remove from the hash, it
only marks it as "no longer interesting" (from cache.h):
	/*
	 * We don't actually *remove* it, we can just mark it invalid so that
	 * we won't find it in lookups.
	 *
	 * Not only would we have to search the lists (simple enough), but
	 * we'd also have to rehash other hash buckets in case this makes the
	 * hash bucket empty (common). So it's much better to just mark
	 * it.
	 */
	static inline void remove_name_hash(struct cache_entry *ce)
	{
		ce->ce_flags |= CE_UNHASHED;
	}
This is OK in the specific-file case, since the entries in the hash
form a linked list, and we can just skip the "not here anymore"
entries during lookup.
But for the directory hash entry, we will _not_ write a new entry,
because there is already one there: the old one that is actually no
longer interesting!
3. While traversing the directories, we end up in the
directory_exists_in_index_icase function to see if a directory is
interesting. This in turn checks index_name_exists, which will
look up the directory in the index's name_hash. We see the old,
deleted record, and assume there is nothing interesting. The
directory gets marked as untracked, even though there are index
entries in it.
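The three steps above can be replayed in a toy model. This is only a
sketch under simplifying assumptions, not git's code: toy_entry,
toy_dir_slot and the toy_* functions are made-up names, the hash table is
reduced to a single directory slot, and TOY_UNHASHED stands in for
CE_UNHASHED.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNHASHED 1u /* stand-in for CE_UNHASHED */

/* Hypothetical, heavily simplified stand-in for struct cache_entry. */
struct toy_entry {
	unsigned int flags;
	const char *name;
};

/* One directory slot holding at most one representative entry. */
struct toy_dir_slot {
	struct toy_entry *rep;
};

/* Step 1: record an entry only if the slot is still empty. */
static void toy_add_dir(struct toy_dir_slot *slot, struct toy_entry *ce)
{
	if (!slot->rep)
		slot->rep = ce;
}

/* Step 2: lazy removal; the slot keeps pointing at the entry. */
static void toy_remove(struct toy_entry *ce)
{
	ce->flags |= TOY_UNHASHED;
}

/* Step 3: lookup ignores entries that were marked as removed. */
static int toy_dir_exists(const struct toy_dir_slot *slot)
{
	return slot->rep && !(slot->rep->flags & TOY_UNHASHED);
}

/* Replays load, refresh, lookup; returns 1 if the directory is found. */
int toy_replay_refresh(void)
{
	struct toy_entry old_one = { 0, "subdir/one" };
	struct toy_entry new_one = { 0, "subdir/one" };
	struct toy_dir_slot subdir = { NULL };

	toy_add_dir(&subdir, &old_one); /* index load */
	toy_remove(&old_one);           /* refresh drops the old entry... */
	toy_add_dir(&subdir, &new_one); /* ...but the slot is still occupied */

	assert(!(new_one.flags & TOY_UNHASHED)); /* the new entry is live */
	return toy_dir_exists(&subdir);
}
```

In this model toy_replay_refresh() reports the directory as missing even
though a live entry for "subdir/one" exists, which mirrors the phantom
untracked directory.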
The problem is in the code I showed above:
	hash = hash_name(ce->name, ptr - ce->name);
	if (!lookup_hash(hash, &istate->name_hash)) {
		pos = insert_hash(hash, ce, &istate->name_hash);
		if (pos) {
			ce->next = *pos;
			*pos = ce;
		}
	}
Having a single cache entry that represents the directory is
not enough; that entry may go away if the index is changed.
It may be tempting to say that the problem is in our removal
method; if we removed the entry entirely instead of simply
marking it as "not here anymore", then we would know we need
to insert a new entry. But that only covers this particular
case of remove-replace. In the more general case, consider
something like this:
1. We add "foo/bar" and "foo/baz" to the index. Each gets
its own entry in name_hash, plus we make a "foo/"
entry that points to "foo/bar".
2. We remove the "foo/bar" entry from the index, and from
the name_hash.
3. We ask if "foo/" exists, and see no entry, even though
"foo/baz" exists.
So we need that directory entry to have the list of _all_
cache entries that indicate that the directory is tracked.
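The "foo/bar" and "foo/baz" scenario can be sketched the same way. The
code below assumes a hypothetical removal routine that really clears the
slot (gen_remove_dir, which git does not have), just to show that true
removal alone would not be enough while the directory keeps a single
representative entry. All gen_* names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not git's data structures. */
struct gen_entry {
	const char *name;
};

struct gen_dir_slot {
	struct gen_entry *rep; /* single representative for the directory */
};

/* Insert only when the slot is empty, like the lookup_hash guard. */
static void gen_add_dir(struct gen_dir_slot *slot, struct gen_entry *ce)
{
	if (!slot->rep)
		slot->rep = ce;
}

/* Assumed "real" removal: clear the slot if it points at ce. */
static void gen_remove_dir(struct gen_dir_slot *slot, struct gen_entry *ce)
{
	if (slot->rep == ce)
		slot->rep = NULL;
}

/* Returns 1 if "foo/" is still considered present after the removal. */
int gen_replay_removal(void)
{
	struct gen_entry bar = { "foo/bar" };
	struct gen_entry baz = { "foo/baz" };
	struct gen_dir_slot foo = { NULL };

	gen_add_dir(&foo, &bar); /* slot now points at foo/bar */
	gen_add_dir(&foo, &baz); /* no-op: slot already occupied */
	assert(foo.rep == &bar);

	gen_remove_dir(&foo, &bar); /* foo/bar leaves the index */

	return foo.rep != NULL; /* foo/baz was never recorded */
}
```

Here the slot ends up empty although "foo/baz" is still tracked, so the
directory needs a list of entries rather than one pointer.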
So that implies making a linked list as we do for other
entries, like:
	hash = hash_name(ce->name, ptr - ce->name);
	pos = insert_hash(hash, ce, &istate->name_hash);
	if (pos) {
		ce->next = *pos;
		*pos = ce;
	}
But that's not right either. In fact, it shows a second bug
in the current code, which is that the "ce->next" pointer is
supposed to be linking entries for a specific filename
entry, but here we are overwriting it for the directory
entry. So the same cache entry ends up in two linked lists,
but they share the same "next" pointer.
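To make the shared-pointer problem concrete, here is a small
illustration that pushes one node onto two singly linked lists through
the same "next" field. The sp_* names and the two bucket variables are
hypothetical, and the insertion order is chosen for the demo; it is not
meant to mirror the exact order in name-hash.c.

```c
#include <assert.h>
#include <stddef.h>

struct sp_node {
	const char *name;
	struct sp_node *next; /* the only link field */
};

/* Push n at the head of a list, overwriting n->next. */
static void sp_push(struct sp_node **head, struct sp_node *n)
{
	n->next = *head;
	*head = n;
}

/* Is target reachable by following next pointers from head? */
static int sp_reachable(const struct sp_node *head, const struct sp_node *target)
{
	for (; head; head = head->next)
		if (head == target)
			return 1;
	return 0;
}

struct sp_result {
	int file_lost;  /* an entry vanished from the file bucket */
	int dir_leaked; /* a directory-bucket entry appears in the file bucket */
};

struct sp_result sp_demo(void)
{
	struct sp_node c = { "collides-with-foo/bar", NULL };
	struct sp_node a = { "foo/bar", NULL };
	struct sp_node b = { "foo/baz", NULL };
	struct sp_node *file_bucket = NULL, *dir_bucket = NULL;
	struct sp_result r;

	sp_push(&file_bucket, &c);
	sp_push(&file_bucket, &a); /* file bucket: a -> c */
	sp_push(&dir_bucket, &b);  /* "foo/" bucket: b */
	sp_push(&dir_bucket, &a);  /* rewrites a.next, so a -> b */

	r.file_lost = !sp_reachable(file_bucket, &c);
	r.dir_leaked = sp_reachable(file_bucket, &b);
	return r;
}
```

After the second push of "a", walking the file bucket reaches "b" and
never reaches "c": the two lists have been spliced together.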
As it turns out, this second bug can't be triggered in the
current code. The "if (pos)" conditional is totally dead
code; pos will only be non-NULL if there was an existing
hash entry, and we already checked that there wasn't one
through our call to lookup_hash.
But fixing the first bug means taking out that call to
lookup_hash, which is going to activate the buggy dead code,
and we'll end up splicing the two linked lists together.
So we need to have a separate next pointer for the list in
the directory bucket, and we need to traverse that list in
index_name_exists when we are looking up a directory.
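A minimal sketch of that idea, again as a toy model rather than the
actual patch: fx_entry, fx_add_dir and fx_dir_lookup are hypothetical
names, the directory bucket is a plain pointer instead of a hash table
slot, and case folding is left out (strncmp is used for the prefix
comparison).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FX_UNHASHED 1u /* stand-in for CE_UNHASHED */

struct fx_entry {
	unsigned int flags;
	const char *name;
	struct fx_entry *next;     /* chain for the full-path bucket */
	struct fx_entry *dir_next; /* separate chain for the directory bucket */
};

/* Always chain the entry in; there is no "already there?" check. */
static void fx_add_dir(struct fx_entry **bucket, struct fx_entry *ce)
{
	ce->dir_next = *bucket;
	*bucket = ce;
}

/* Walk the directory chain, skipping lazily removed entries. */
static struct fx_entry *fx_dir_lookup(struct fx_entry *bucket, const char *dir)
{
	size_t len = strlen(dir);
	struct fx_entry *ce;

	for (ce = bucket; ce; ce = ce->dir_next) {
		if (ce->flags & FX_UNHASHED)
			continue;
		if (!strncmp(ce->name, dir, len))
			return ce;
	}
	return NULL;
}

/* Same load, refresh, lookup sequence as in the bug; returns 1 on success. */
int fx_replay_refresh(void)
{
	struct fx_entry old_one = { 0, "subdir/one", NULL, NULL };
	struct fx_entry new_one = { 0, "subdir/one", NULL, NULL };
	struct fx_entry *subdir_bucket = NULL;

	fx_add_dir(&subdir_bucket, &old_one);
	old_one.flags |= FX_UNHASHED;         /* lazy removal */
	fx_add_dir(&subdir_bucket, &new_one); /* chained in front of old_one */

	return fx_dir_lookup(subdir_bucket, "subdir/") == &new_one;
}
```

Because every entry is chained in, the stale entry is simply skipped and
the live one is found, and the file-bucket chain through "next" is left
untouched.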
This bloats "struct cache_entry" by a few bytes. Which is
annoying, because it's only necessary when core.ignorecase
is enabled. There's not an easy way around it, short of
separating out the "next" pointers from cache_entry entirely
(i.e., having a separate "cache_entry_list" struct that gets
stored in the name_hash). In practice, it probably doesn't
matter; we have thousands of cache entries, compared to the
millions of objects (where adding 4 bytes to the struct
actually does impact performance).
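For a rough feel of the cost, the following sketch compares two cut-down
struct layouts. These are hypothetical reductions (ce_before and
ce_after contain only the fields visible in the hunk below; the real
struct cache_entry has more), so the numbers are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down layout without the new pointer. */
struct ce_before {
	unsigned int ce_flags;
	unsigned char sha1[20];
	struct ce_before *next;
	char name[]; /* flexible array member, like name[FLEX_ARRAY] */
};

/* Same layout with the extra directory chain pointer. */
struct ce_after {
	unsigned int ce_flags;
	unsigned char sha1[20];
	struct ce_after *next;
	struct ce_after *dir_next;
	char name[];
};

/* Extra bytes per cache entry on the platform this is compiled for. */
size_t bloat_per_entry(void)
{
	return sizeof(struct ce_after) - sizeof(struct ce_before);
}

/* Extra bytes for an index with nr_entries entries. */
size_t bloat_total(size_t nr_entries)
{
	return nr_entries * bloat_per_entry();
}
```

On a typical 64-bit build the extra pointer costs 8 bytes per entry
rather than 4, so bloat_total(100000) would come to roughly 800 KB of
additional memory.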
Signed-off-by: Jeff King <peff@peff.net>
---
cache.h | 1 +
name-hash.c | 15 ++++++++-------
2 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/cache.h b/cache.h
index 607c2ea..1334fbf 100644
--- a/cache.h
+++ b/cache.h
@@ -168,6 +168,7 @@ struct cache_entry {
 	unsigned int ce_flags;
 	unsigned char sha1[20];
 	struct cache_entry *next;
+	struct cache_entry *dir_next;
 	char name[FLEX_ARRAY]; /* more */
 };
diff --git a/name-hash.c b/name-hash.c
index c6b6a3f..225dd76 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -57,12 +57,10 @@ static void hash_index_entry_directories(struct index_state *istate, struct cach
 		if (*ptr == '/') {
 			++ptr;
 			hash = hash_name(ce->name, ptr - ce->name);
-			if (!lookup_hash(hash, &istate->name_hash)) {
-				pos = insert_hash(hash, ce, &istate->name_hash);
-				if (pos) {
-					ce->next = *pos;
-					*pos = ce;
-				}
+			pos = insert_hash(hash, ce, &istate->name_hash);
+			if (pos) {
+				ce->dir_next = *pos;
+				*pos = ce;
 			}
 		}
 	}
@@ -166,7 +164,10 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na
 			if (same_name(ce, name, namelen, icase))
 				return ce;
 		}
-		ce = ce->next;
+		if (icase && name[namelen - 1] == '/')
+			ce = ce->dir_next;
+		else
+			ce = ce->next;
 	}
 	/*
--
1.7.7.37.g0e376
Thread overview: 13+ messages
2011-05-15 0:46 git commit -a reports untracked files after a clone Philipp Metzler
2011-05-15 3:43 ` Junio C Hamano
2011-05-15 8:26 ` Philipp Metzler
2011-05-16 10:38 ` Jeff King
2011-05-16 10:49 ` Philipp Metzler
2011-05-16 12:08 ` Jeff King
2011-05-16 12:25 ` Philipp Metzler
2011-05-16 12:38 ` Philipp Metzler
2011-05-16 14:55 ` Jeff King
2011-05-27 18:00 ` Jeff King
2011-05-27 18:13 ` Jeff King
2011-10-05 14:26 ` Joerg Rosenkranz
2011-10-06 16:06 ` Jeff King