* Bug report about symlinks @ 2014-07-31 19:50 Nikolay Avdeev 2014-07-31 22:04 ` René Scharfe 0 siblings, 1 reply; 10+ messages in thread From: Nikolay Avdeev @ 2014-07-31 19:50 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 683 bytes --] Greetings from Russia, comrads! I've noticed something strange with git status when replacing a folder with symlink to another folder. There is a git repo with script with demo in the attachment. Yours sincerely, NickKolok aka Nikolay Avdeev. Доброго времени суток, товарищи! При замене папки на симлинк на другую папку с похожими, но не одинаковыми файлами git status ведёт себя странно. Прилагаю архив с репозиторием. В скрипте - пример и пояснения. С уважением, Николай Авдеев aka NickKolok. [-- Attachment #2: git-bug-demo.tar.bz2 --] [-- Type: application/x-bzip, Size: 10366 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug report about symlinks 2014-07-31 19:50 Bug report about symlinks Nikolay Avdeev @ 2014-07-31 22:04 ` René Scharfe 2014-08-01 16:23 ` Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: René Scharfe @ 2014-07-31 22:04 UTC (permalink / raw) To: Nikolay Avdeev; +Cc: git Am 31.07.2014 um 21:50 schrieb Nikolay Avdeev: > I've noticed something strange with git status when replacing a folder with > symlink to another folder. > There is a git repo with script with demo in the attachment. Let's try and make this a bit easier for folks to follow along. # Create test repo with two directories with two files each. $ git init Initialized empty Git repository in /tmp/.git/ $ mkdir a b $ echo x >a/equal $ echo x >b/equal $ echo y >a/different $ echo z >b/different $ git add a b $ git commit -minitial [master (root-commit) 6d36895] initial 4 files changed, 4 insertions(+) create mode 100644 a/different create mode 100644 a/equal create mode 100644 b/different create mode 100644 b/equal # Replace one directory with a symlink to the other. $ rm -rf b $ ln -s a b # First git status call. $ git status On branch master Changes not staged for commit: (use "git add/rm <file>..." to update what will be committed) (use "git checkout -- <file>..." to discard changes in working directory) deleted: b/different Untracked files: (use "git add <file>..." to include in what will be committed) b no changes added to commit (use "git add" and/or "git commit -a") # Stage the changes. $ git add b # Second git status call. $ git status On branch master Changes to be committed: (use "git reset HEAD <file>..." to unstage) new file: b deleted: b/different deleted: b/equal # Commit the staged changes. $ git commit -msymlinked [master 4498f28] symlinked 3 files changed, 1 insertion(+), 2 deletions(-) create mode 120000 b delete mode 100644 b/different delete mode 100644 b/equal That the first and second status call report different results is a feature; staging changes using git add alters the status. A commit after the first status call would be empty. It could be debated if the first git status call should also report b/equal as deleted. René ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug report about symlinks 2014-07-31 22:04 ` René Scharfe @ 2014-08-01 16:23 ` Junio C Hamano 2014-08-02 14:10 ` René Scharfe 0 siblings, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2014-08-01 16:23 UTC (permalink / raw) To: René Scharfe; +Cc: Nikolay Avdeev, git René Scharfe <l.s.r@web.de> writes: > # Create test repo with two directories with two files each. > $ git init > Initialized empty Git repository in /tmp/.git/ > $ mkdir a b > $ echo x >a/equal > $ echo x >b/equal > $ echo y >a/different > $ echo z >b/different > $ git add a b > $ git commit -minitial > [master (root-commit) 6d36895] initial > 4 files changed, 4 insertions(+) > create mode 100644 a/different > create mode 100644 a/equal > create mode 100644 b/different > create mode 100644 b/equal > > # Replace one directory with a symlink to the other. > $ rm -rf b > $ ln -s a b > > # First git status call. > $ git status > On branch master > Changes not staged for commit: > (use "git add/rm <file>..." to update what will be committed) > (use "git checkout -- <file>..." to discard changes in working directory) > > deleted: b/different > > Untracked files: > (use "git add <file>..." to include in what will be committed) > > b > > no changes added to commit (use "git add" and/or "git commit -a") > ... > > It could be debated if the first git status call should also report > b/equal as deleted. Hmph. I wonder if "could be" is an understatement. The difference of reporting between b/equal and b/different may indicate that somebody is mistakenly comparing the index with these paths, without first checking each path with has_symlink_leading_path(), or something, no? Or am I misreading the report and codepath? Puzzled... ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug report about symlinks 2014-08-01 16:23 ` Junio C Hamano @ 2014-08-02 14:10 ` René Scharfe 2014-08-03 17:19 ` Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: René Scharfe @ 2014-08-02 14:10 UTC (permalink / raw) To: Junio C Hamano; +Cc: Nikolay Avdeev, git, Nguyễn Thái Ngọc Duy Am 01.08.2014 um 18:23 schrieb Junio C Hamano: > René Scharfe <l.s.r@web.de> writes: > >> # Create test repo with two directories with two files each. >> $ git init >> Initialized empty Git repository in /tmp/.git/ >> $ mkdir a b >> $ echo x >a/equal >> $ echo x >b/equal >> $ echo y >a/different >> $ echo z >b/different >> $ git add a b >> $ git commit -minitial >> [master (root-commit) 6d36895] initial >> 4 files changed, 4 insertions(+) >> create mode 100644 a/different >> create mode 100644 a/equal >> create mode 100644 b/different >> create mode 100644 b/equal >> >> # Replace one directory with a symlink to the other. >> $ rm -rf b >> $ ln -s a b >> >> # First git status call. >> $ git status >> On branch master >> Changes not staged for commit: >> (use "git add/rm <file>..." to update what will be committed) >> (use "git checkout -- <file>..." to discard changes in working directory) >> >> deleted: b/different >> >> Untracked files: >> (use "git add <file>..." to include in what will be committed) >> >> b >> >> no changes added to commit (use "git add" and/or "git commit -a") >> ... >> >> It could be debated if the first git status call should also report >> b/equal as deleted. > > Hmph. > > I wonder if "could be" is an understatement. The difference of > reporting between b/equal and b/different may indicate that somebody > is mistakenly comparing the index with these paths, without first > checking each path with has_symlink_leading_path(), or something, > no? How about the patch below? Before it checks if an index entry exists in the work tree, it checks if its path includes a symlink. After the patch, git status reports b/equal as deleted, too. The test suite still passes. And do we need to use the threaded_ variant of the function here? diff --git a/read-cache.c b/read-cache.c index 5d3c8bd..6f0057f 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1064,6 +1064,14 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate, return ce; } + if (has_symlink_leading_path(ce->name, ce_namelen(ce))) { + if (ignore_missing) + return ce; + if (err) + *err = ENOENT; + return NULL; + } + if (lstat(ce->name, &st) < 0) { if (ignore_missing && errno == ENOENT) return ce; ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Bug report about symlinks 2014-08-02 14:10 ` René Scharfe @ 2014-08-03 17:19 ` Junio C Hamano 2014-08-03 22:59 ` René Scharfe 2014-08-04 11:03 ` Bug report about symlinks Duy Nguyen 0 siblings, 2 replies; 10+ messages in thread From: Junio C Hamano @ 2014-08-03 17:19 UTC (permalink / raw) To: René Scharfe Cc: Nikolay Avdeev, git, Nguyễn Thái Ngọc Duy René Scharfe <l.s.r@web.de> writes: > How about the patch below? Before it checks if an index entry exists > in the work tree, it checks if its path includes a symlink. Honestly, I didn't expect the fix to be in the refresh-index code path, but doing this there sort of makes sense. I think you, who dug to find out where to add the check, already know this, and I am writing this mainly for myself and for the list archive, but when the knee-jerk "has-syjmlink-leading-path missing?" reaction came to me, two obvious optimizations also came to my mind: - When checking a cache entry "a/b/c/d/e" and we find out "a/b/c" is a symbolic link, we mark it as ~CE_VALID, but at the same time, we learn "a/b/c/any/thing" are also ~CE_VALID with that check, so we _could_, after the has_symlink_leading_path once returns true, so there may be an opportunity to fast-forward the scan, marking all paths under "a/b/c" as ~CE_VALID. - After finding out "a/b/c" is a symbolic link to cause anything underneath marked as ~CE_VALID, we also know "a/" and "a/b/" exists as real directories, so a later check for "a/b/some/thing" can start checking at "a/b/some/" without checking "a/" and "a/b/". The latter is more important as it is a much more common case that the shape of the working tree not to change. But I think the implementation of has_symlink_leading_path() already has these optimizations built inside around the (unfortunately named) "struct cache_def", so it probably would not give us much boost to implement such a fast-forwarding of the scan. > And do we need to use the threaded_ variant of the function here? Hmmm, this is a tangent, but you comment made me wonder if we also need to adjust preload_thread() in preload-index.c somehow, but we do not touch CE_UPTODATE there, so it probably is not necessary. The caller of refresh_cache_ent() is walking an array of sorted pathnames aka istate->cache[] in a single-threaded fashion, possibly with a pathspec to limit the scan. Do you mean that we may want to make istate->cache[] scanned by multiple threads? I am not sure. > diff --git a/read-cache.c b/read-cache.c > index 5d3c8bd..6f0057f 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -1064,6 +1064,14 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate, > return ce; > } > > + if (has_symlink_leading_path(ce->name, ce_namelen(ce))) { > + if (ignore_missing) > + return ce; > + if (err) > + *err = ENOENT; > + return NULL; > + } > + > if (lstat(ce->name, &st) < 0) { > if (ignore_missing && errno == ENOENT) > return ce; ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug report about symlinks 2014-08-03 17:19 ` Junio C Hamano @ 2014-08-03 22:59 ` René Scharfe 2014-08-04 16:34 ` Junio C Hamano 2014-08-04 11:03 ` Bug report about symlinks Duy Nguyen 1 sibling, 1 reply; 10+ messages in thread From: René Scharfe @ 2014-08-03 22:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: Nikolay Avdeev, git, Nguyễn Thái Ngọc Duy Am 03.08.2014 um 19:19 schrieb Junio C Hamano: > René Scharfe <l.s.r@web.de> writes: > >> How about the patch below? Before it checks if an index entry exists >> in the work tree, it checks if its path includes a symlink. > > Honestly, I didn't expect the fix to be in the refresh-index code > path, but doing this there sort of makes sense. I found it through observation, not understanding. Just looked for stat/lstat calls executed by git status for b/different and b/equal using strace; debugging printfs told me where they came from. >> And do we need to use the threaded_ variant of the function here? > > Hmmm, this is a tangent, but you comment made me wonder if we also > need to adjust preload_thread() in preload-index.c somehow, but we > do not touch CE_UPTODATE there, so it probably is not necessary. The function calls ce_mark_uptodate(), which does set CE_UPTODATE. It calls threaded_has_symlink_leading_path() before lstat() already, however. (Since f62ce3de: Make index preloading check the whole path to the file.) > The caller of refresh_cache_ent() is walking an array of sorted > pathnames aka istate->cache[] in a single-threaded fashion, possibly > with a pathspec to limit the scan. There are two direct callers (refresh_index(), refresh_cache_entry()) and several indirect ones. Do we have a way to detect unsynchronized parallel access to the has_symlink_leading_path()-cache? Checking the full callers-of-callers tree manually looks a bit scary to me. > Do you mean that we may want to > make istate->cache[] scanned by multiple threads? I am not sure. No, I didn't want to suggest any performance improvements. I'm only interested in correctness for now. René ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug report about symlinks 2014-08-03 22:59 ` René Scharfe @ 2014-08-04 16:34 ` Junio C Hamano 2014-08-09 17:43 ` [PATCH] read-cache: check for leading symlinks when refreshing index René Scharfe 0 siblings, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2014-08-04 16:34 UTC (permalink / raw) To: René Scharfe Cc: Nikolay Avdeev, git, Nguyễn Thái Ngọc Duy René Scharfe <l.s.r@web.de> writes: > Am 03.08.2014 um 19:19 schrieb Junio C Hamano: > >>> And do we need to use the threaded_ variant of the function here? >> >> Hmmm, this is a tangent, but you comment made me wonder if we also >> need to adjust preload_thread() in preload-index.c somehow, but we >> do not touch CE_UPTODATE there, so it probably is not necessary. > > The function calls ce_mark_uptodate(), which does set CE_UPTODATE. It > calls threaded_has_symlink_leading_path() before lstat() already, > however. (Since f62ce3de: Make index preloading check the whole path > to the file.) Yeah, by "we do not touch", I meant "for paths that is beyond a symlink, we do not touch" (i.e. we have that "continue" before lstat-match-then-mark sequence). >> The caller of refresh_cache_ent() is walking an array of sorted >> pathnames aka istate->cache[] in a single-threaded fashion, possibly >> with a pathspec to limit the scan. > > There are two direct callers (refresh_index(), refresh_cache_entry()) > and several indirect ones. Do we have a way to detect unsynchronized > parallel access to the has_symlink_leading_path()-cache? Checking the > full callers-of-callers tree manually looks a bit scary to me. The threaded variant is not used anybody outside preload-index, so currently we should be OK, I would think. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] read-cache: check for leading symlinks when refreshing index 2014-08-04 16:34 ` Junio C Hamano @ 2014-08-09 17:43 ` René Scharfe 0 siblings, 0 replies; 10+ messages in thread From: René Scharfe @ 2014-08-09 17:43 UTC (permalink / raw) To: Junio C Hamano; +Cc: Nikolay Avdeev, git, Nguyễn Thái Ngọc Duy Don't add paths with leading symlinks to the index while refreshing; we only track those symlinks themselves. We already ignore them while preloading (see read_index_preload.c). Reported-by: Nikolay Avdeev <avdeev@math.vsu.ru> Signed-off-by: Rene Scharfe <l.s.r@web.de> --- read-cache.c | 8 ++++++++ t/t7515-status-symlinks.sh | 43 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) create mode 100755 t/t7515-status-symlinks.sh diff --git a/read-cache.c b/read-cache.c index 5d3c8bd..6f0057f 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1064,6 +1064,14 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate, return ce; } + if (has_symlink_leading_path(ce->name, ce_namelen(ce))) { + if (ignore_missing) + return ce; + if (err) + *err = ENOENT; + return NULL; + } + if (lstat(ce->name, &st) < 0) { if (ignore_missing && errno == ENOENT) return ce; diff --git a/t/t7515-status-symlinks.sh b/t/t7515-status-symlinks.sh new file mode 100755 index 0000000..9f989be --- /dev/null +++ b/t/t7515-status-symlinks.sh @@ -0,0 +1,43 @@ +#!/bin/sh + +test_description='git status and symlinks' + +. ./test-lib.sh + +test_expect_success 'setup' ' + echo .gitignore >.gitignore && + echo actual >>.gitignore && + echo expect >>.gitignore && + mkdir dir && + echo x >dir/file1 && + echo y >dir/file2 && + git add dir && + git commit -m initial && + git tag initial +' + +test_expect_success SYMLINKS 'symlink to a directory' ' + test_when_finished "rm symlink" && + ln -s dir symlink && + echo "?? symlink" >expect && + git status --porcelain >actual && + test_cmp expect actual +' + +test_expect_success SYMLINKS 'symlink replacing a directory' ' + test_when_finished "rm -rf copy && git reset --hard initial" && + mkdir copy && + cp dir/file1 copy/file1 && + echo "changed in copy" >copy/file2 && + git add copy && + git commit -m second && + rm -rf copy && + ln -s dir copy && + echo " D copy/file1" >expect && + echo " D copy/file2" >>expect && + echo "?? copy" >>expect && + git status --porcelain >actual && + test_cmp expect actual +' + +test_done -- 2.0.2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Bug report about symlinks 2014-08-03 17:19 ` Junio C Hamano 2014-08-03 22:59 ` René Scharfe @ 2014-08-04 11:03 ` Duy Nguyen 2014-08-04 16:36 ` Junio C Hamano 1 sibling, 1 reply; 10+ messages in thread From: Duy Nguyen @ 2014-08-04 11:03 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, Nikolay Avdeev, Git Mailing List On Mon, Aug 4, 2014 at 12:19 AM, Junio C Hamano <gitster@pobox.com> wrote: > I think you, who dug to find out where to add the check, already > know this, and I am writing this mainly for myself and for the list > archive, but when the knee-jerk "has-syjmlink-leading-path missing?" > reaction came to me, two obvious optimizations also came to my mind: > > - When checking a cache entry "a/b/c/d/e" and we find out "a/b/c" > is a symbolic link, we mark it as ~CE_VALID, but at the same > time, we learn "a/b/c/any/thing" are also ~CE_VALID with that > check, so we _could_, after the has_symlink_leading_path once > returns true, so there may be an opportunity to fast-forward the > scan, marking all paths under "a/b/c" as ~CE_VALID. > > - After finding out "a/b/c" is a symbolic link to cause anything > underneath marked as ~CE_VALID, we also know "a/" and "a/b/" > exists as real directories, so a later check for "a/b/some/thing" > can start checking at "a/b/some/" without checking "a/" and "a/b/". Just checking, you meant CE_UPTODATE, not CE_VALID, right? CE_VALID is only used with --assume-unchanged > The latter is more important as it is a much more common case that > the shape of the working tree not to change. > > But I think the implementation of has_symlink_leading_path() already > has these optimizations built inside around the (unfortunately named) > "struct cache_def", so it probably would not give us much boost to > implement such a fast-forwarding of the scan. Yeah my first thought is another flood of lstat(). But it looks like has_symlink_leading_path() tries hard to reduce lstat() already. We could disable this call in filesytems that do not support symlinks (e.g. vfat) but I guess that's just a minority. -- Duy ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug report about symlinks 2014-08-04 11:03 ` Bug report about symlinks Duy Nguyen @ 2014-08-04 16:36 ` Junio C Hamano 0 siblings, 0 replies; 10+ messages in thread From: Junio C Hamano @ 2014-08-04 16:36 UTC (permalink / raw) To: Duy Nguyen; +Cc: René Scharfe, Nikolay Avdeev, Git Mailing List Duy Nguyen <pclouds@gmail.com> writes: > Just checking, you meant CE_UPTODATE, not CE_VALID, right? CE_VALID is > only used with --assume-unchanged Yup. Thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-08-09 17:44 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-07-31 19:50 Bug report about symlinks Nikolay Avdeev 2014-07-31 22:04 ` René Scharfe 2014-08-01 16:23 ` Junio C Hamano 2014-08-02 14:10 ` René Scharfe 2014-08-03 17:19 ` Junio C Hamano 2014-08-03 22:59 ` René Scharfe 2014-08-04 16:34 ` Junio C Hamano 2014-08-09 17:43 ` [PATCH] read-cache: check for leading symlinks when refreshing index René Scharfe 2014-08-04 11:03 ` Bug report about symlinks Duy Nguyen 2014-08-04 16:36 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).