* Behavior of git rm @ 2013-04-03 14:50 jpinheiro 2013-04-03 15:58 ` Jeff King 0 siblings, 1 reply; 18+ messages in thread From: jpinheiro @ 2013-04-03 14:50 UTC (permalink / raw) To: git Hi all, We are students from Universidade do Minho in Portugal, and we are using git in project as a case study. While experimenting with git we found an unexpected behavior with git rm. Here is a trace of the unexpected behavior: $ git init $ mkdir D $ echo "Hi" > D/F $ git add D/F $ rm -r D $ echo "Hey" > D $ git rm D/F warning: 'D/F': Not a directory rm 'D/F' fatal: git rm: 'D/F': Not a directory If the file D created with last echo did not exist or was named differently then no error would occur as expected. For example: $ git init $ mkdir D $ echo "Hi" > D/F $ git add D/F $ rm -r D $ echo "Hey" > F $ git rm D/F This works as expected, and the only difference is the name of the file of the last echo. Is this the expected behavior of git rm? -- View this message in context: http://git.661346.n2.nabble.com/Behavior-of-git-rm-tp7581485.html Sent from the git mailing list archive at Nabble.com. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Behavior of git rm 2013-04-03 14:50 Behavior of git rm jpinheiro @ 2013-04-03 15:58 ` Jeff King 2013-04-03 17:35 ` Junio C Hamano 0 siblings, 1 reply; 18+ messages in thread From: Jeff King @ 2013-04-03 15:58 UTC (permalink / raw) To: jpinheiro; +Cc: git On Wed, Apr 03, 2013 at 07:50:24AM -0700, jpinheiro wrote: > While experimenting with git we found an unexpected behavior with git rm. > Here is a trace of the unexpected behavior: > > $ git init > $ mkdir D > $ echo "Hi" > D/F > $ git add D/F > $ rm -r D > $ echo "Hey" > D > $ git rm D/F > warning: 'D/F': Not a directory > rm 'D/F' > fatal: git rm: 'D/F': Not a directory We drop the D/F entry from the index, but then fail to actually remove it from the filesystem, because it has already been replaced. It is impossible to tell from this toy example what the true intent was, but in such a situation, there is a reasonable chance that the user should have invoked "rm --cached" in the first place. That being said, we do try to handle files which have already gone missing; when unlink() fails, we do not consider it an error if we got ENOENT. We could perhaps add ENOTDIR to that list, as it also indicates that the file is gone (it just happens that one of its prefix directories was replaced with something else). The opposite case is also interesting: $ git init $ echo 1 >D $ git add D $ rm D $ mkdir D $ echo 2 >D/F $ git rm D rm 'D' fatal: git rm: 'D': Is a directory We expect to see 'D' as a file, but it is now a directory. We _could_ recursively remove the directory, but that has the potential to delete files that the user does not expect. So in both cases, "git rm" could certainly detect the situation and proceed with the destructive operation. But when there is such a conflict between what's in the working tree and what's in the index, I think we may be better off erring on the conservative side and bailing, and letting the user reconcile the differences themselves (using either "git add" or "git rm --cached" to update the index, or deciding how to handle the working tree contents themselves with regular "rm"). Of the two situations, I think the first one is less likely to be destructive (noticing that a file is already gone via ENOTDIR), as we are only proceeding with the index deletion, and we end up not touching the filesystem at all. That patch would look something like: diff --git a/builtin/rm.c b/builtin/rm.c index dabfcf6..7b91d52 100644 --- a/builtin/rm.c +++ b/builtin/rm.c @@ -110,7 +110,7 @@ static int check_local_mod(unsigned char *head, int index_only) ce = active_cache[pos]; if (lstat(ce->name, &st) < 0) { - if (errno != ENOENT) + if (errno != ENOENT && errno != ENOTDIR) warning("'%s': %s", ce->name, strerror(errno)); /* It already vanished from the working tree */ continue; diff --git a/dir.c b/dir.c index 57394e4..f9e7355 100644 --- a/dir.c +++ b/dir.c @@ -1603,7 +1603,7 @@ int remove_path(const char *name) { char *slash; - if (unlink(name) && errno != ENOENT) + if (unlink(name) && errno != ENOENT && errno != ENOTDIR) return -1; slash = strrchr(name, '/'); ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: Behavior of git rm 2013-04-03 15:58 ` Jeff King @ 2013-04-03 17:35 ` Junio C Hamano 2013-04-03 20:36 ` Jeff King 2013-04-04 19:02 ` Jeff King 0 siblings, 2 replies; 18+ messages in thread From: Junio C Hamano @ 2013-04-03 17:35 UTC (permalink / raw) To: Jeff King; +Cc: jpinheiro, git Jeff King <peff@peff.net> writes: > Of the two situations, I think the first one is less likely to be > destructive (noticing that a file is already gone via ENOTDIR), as we > are only proceeding with the index deletion, and we end up not touching > the filesystem at all. Nice to see sound reasoning. > > diff --git a/builtin/rm.c b/builtin/rm.c > index dabfcf6..7b91d52 100644 > --- a/builtin/rm.c > +++ b/builtin/rm.c > @@ -110,7 +110,7 @@ static int check_local_mod(unsigned char *head, int index_only) > ce = active_cache[pos]; > > if (lstat(ce->name, &st) < 0) { > - if (errno != ENOENT) > + if (errno != ENOENT && errno != ENOTDIR) OK. We may be running lstat() on D/F but there may be D that is not a directory. If it is a file, we get ENOTDIR. By the way, if D is a dangling symlink, we get ENOENT; in such a case, we report "rm 'D/F'" on the output and remove the index entry. $ rm -f .git/index && rm -fr D E $ mkdir D && >D/F && git add D && rm -fr D $ ln -s erewhon D && git rm D/F && git ls-files rm 'D/F' Also if D is a symlink that point at a directory E, "git rm" does something interesting. (1) Perhaps we want a complaint in this case. $ rm -f .git/index && rm -fr D E $ mkdir D && >D/F && git add D && rm -fr D $ mkdir E && ln -s E D && git rm D/F (2) Perhaps we want to make sure D/F is not beyond a symlink in this case. $ rm -f .git/index && rm -fr D E $ mkdir D && >D/F && git add D && rm -fr D $ mkdir E && ln -s E D && date >E/F && git rm D/F $ git rm -f D/F > diff --git a/dir.c b/dir.c > index 57394e4..f9e7355 100644 > --- a/dir.c > +++ b/dir.c > @@ -1603,7 +1603,7 @@ int remove_path(const char *name) > { > char *slash; > > - if (unlink(name) && errno != ENOENT) > + if (unlink(name) && errno != ENOENT && errno != ENOTDIR) > return -1; Ditto. > > slash = strrchr(name, '/'); ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Behavior of git rm 2013-04-03 17:35 ` Junio C Hamano @ 2013-04-03 20:36 ` Jeff King 2013-04-04 19:02 ` Jeff King 1 sibling, 0 replies; 18+ messages in thread From: Jeff King @ 2013-04-03 20:36 UTC (permalink / raw) To: Junio C Hamano; +Cc: jpinheiro, git On Wed, Apr 03, 2013 at 10:35:52AM -0700, Junio C Hamano wrote: > > diff --git a/builtin/rm.c b/builtin/rm.c > > index dabfcf6..7b91d52 100644 > > --- a/builtin/rm.c > > +++ b/builtin/rm.c > > @@ -110,7 +110,7 @@ static int check_local_mod(unsigned char *head, int index_only) > > ce = active_cache[pos]; > > > > if (lstat(ce->name, &st) < 0) { > > - if (errno != ENOENT) > > + if (errno != ENOENT && errno != ENOTDIR) > > OK. We may be running lstat() on D/F but there may be D that is not > a directory. If it is a file, we get ENOTDIR. > > By the way, if D is a dangling symlink, we get ENOENT; in such a > case, we report "rm 'D/F'" on the output and remove the index entry. > > $ rm -f .git/index && rm -fr D E > $ mkdir D && >D/F && git add D && rm -fr D > $ ln -s erewhon D && git rm D/F && git ls-files > rm 'D/F' That seems sane to me, and makes me feel like handling ENOTDIR here is the right direction. What that conditional is trying to say is "if it is because the file is not there...", and so far we know of three conditions where it is not there: 1. There is no entry at that path. 2. There is a non-directory in the prefix of that path. 3. There is a dangling symlink in the prefix of that path. (1) and (3) we already handle via ENOENT. I think it is sane to handle (2) the same as (3), but we do not do so currently. > Also if D is a symlink that point at a directory E, "git rm" does > something interesting. > > (1) Perhaps we want a complaint in this case. > > $ rm -f .git/index && rm -fr D E > $ mkdir D && >D/F && git add D && rm -fr D > $ mkdir E && ln -s E D && git rm D/F I think that is OK without complaint; the user asked to get rid of D/F, and it is indeed gone (as well as its index entry) after the call finishes. And we did not even need to delete anything, so we cannot be losing data. I am much more concerned about this case: > (2) Perhaps we want to make sure D/F is not beyond a symlink in this > case. > > $ rm -f .git/index && rm -fr D E > $ mkdir D && >D/F && git add D && rm -fr D > $ mkdir E && ln -s E D && date >E/F && git rm D/F where the user is deleting something that may or may not be related to the original D/F. On the other hand, I don't have that much sympathy; "rm" would make the same deletion. But hmm...shouldn't we be doing an up-to-date check? Indeed: $ git rm D/F error: 'D/F' has staged content different from both the file and the HEAD (use -f to force removal) $ git commit -m foo && git rm D/F $ git rm D/F error: 'D/F' has local modifications (use --cached to keep the file, or -f to force removal) So I do not think we need any extra safety; the content-level checks should be enough to make sure we are not losing anything. -Peff ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Behavior of git rm 2013-04-03 17:35 ` Junio C Hamano 2013-04-03 20:36 ` Jeff King @ 2013-04-04 19:02 ` Jeff King 2013-04-04 19:03 ` [PATCH 1/3] rm: do not complain about d/f conflicts during deletion Jeff King ` (2 more replies) 1 sibling, 3 replies; 18+ messages in thread From: Jeff King @ 2013-04-04 19:02 UTC (permalink / raw) To: Junio C Hamano; +Cc: jpinheiro, git On Wed, Apr 03, 2013 at 10:35:52AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Of the two situations, I think the first one is less likely to be > > destructive (noticing that a file is already gone via ENOTDIR), as we > > are only proceeding with the index deletion, and we end up not touching > > the filesystem at all. > > Nice to see sound reasoning. Here's a patch series which I think covers what we've discussed. [1/3]: rm: do not complain about d/f conflicts during deletion [2/3]: t3600: test behavior of reverse-d/f conflict [3/3]: t3600: test rm of path with changed leading symlinks The first one is the code change, and the rest just documents the cases we discussed. The third one is a little subtle. For the most part is it just testing the normal "changed content requires --force" behavior of rm. But I think it is worth having because it also makes sure that after deleting "d/f" when "d" is a symlink to "e", that we do not remove the new directory "e" nor the symlink "d". I do not think this case was explicitly planned for, but it does do the right thing now, and given the subtlety, I'd rather somebody who changes it notice the breakage in the test suite. -Peff ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/3] rm: do not complain about d/f conflicts during deletion 2013-04-04 19:02 ` Jeff King @ 2013-04-04 19:03 ` Jeff King 2013-04-04 19:03 ` [PATCH 2/3] t3600: test behavior of reverse-d/f conflict Jeff King 2013-04-04 19:06 ` [PATCH 3/3] t3600: test rm of path with changed leading symlinks Jeff King 2 siblings, 0 replies; 18+ messages in thread From: Jeff King @ 2013-04-04 19:03 UTC (permalink / raw) To: Junio C Hamano; +Cc: jpinheiro, git If we used to have an index entry "d/f", but "d" has been replaced by a non-directory entry, the user may still want to run "git rm" to delete the stale index entry. They could use "git rm --cached" to just touch the index, but "git rm" should also work: we explicitly try to handle the case that the file has already been removed from the working tree. However, because unlinking "d/f" in this case will not yield ENOENT, but rather ENOTDIR, we do not notice that the file is already gone. Instead, we report it as an error. The simple solution is to treat ENOTDIR in this case exactly like ENOENT; all we want to know is whether the file is already gone, and if a leading path is no longer a directory, then by definition the sub-path is gone. Reported-by: jpinheiro <7jpinheiro@gmail.com> Signed-off-by: Jeff King <peff@peff.net> --- builtin/rm.c | 2 +- dir.c | 2 +- t/t3600-rm.sh | 25 +++++++++++++++++++++++++ 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/builtin/rm.c b/builtin/rm.c index dabfcf6..7b91d52 100644 --- a/builtin/rm.c +++ b/builtin/rm.c @@ -110,7 +110,7 @@ static int check_local_mod(unsigned char *head, int index_only) ce = active_cache[pos]; if (lstat(ce->name, &st) < 0) { - if (errno != ENOENT) + if (errno != ENOENT && errno != ENOTDIR) warning("'%s': %s", ce->name, strerror(errno)); /* It already vanished from the working tree */ continue; diff --git a/dir.c b/dir.c index 57394e4..f9e7355 100644 --- a/dir.c +++ b/dir.c @@ -1603,7 +1603,7 @@ int remove_path(const char *name) { char *slash; - if (unlink(name) && errno != ENOENT) + if (unlink(name) && errno != ENOENT && errno != ENOTDIR) return -1; slash = strrchr(name, '/'); diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh index 37bf5f1..73772b2 100755 --- a/t/t3600-rm.sh +++ b/t/t3600-rm.sh @@ -622,4 +622,29 @@ test_expect_success 'rm of a populated nested submodule with a nested .git direc rm -rf submod ' +test_expect_success 'rm of d/f when d has become a non-directory' ' + rm -rf d && + mkdir d && + >d/f && + git add d && + rm -rf d && + >d && + git rm d/f && + test_must_fail git rev-parse --verify :d/f && + test_path_is_file d +' + +test_expect_success SYMLINKS 'rm of d/f when d has become a dangling symlink' ' + rm -rf d && + mkdir d && + >d/f && + git add d && + rm -rf d && + ln -s nonexistent d && + git rm d/f && + test_must_fail git rev-parse --verify :d/f && + test -h d && + test_path_is_missing d +' + test_done -- 1.8.2.rc0.33.gd915649 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/3] t3600: test behavior of reverse-d/f conflict 2013-04-04 19:02 ` Jeff King 2013-04-04 19:03 ` [PATCH 1/3] rm: do not complain about d/f conflicts during deletion Jeff King @ 2013-04-04 19:03 ` Jeff King 2013-04-04 19:06 ` [PATCH 3/3] t3600: test rm of path with changed leading symlinks Jeff King 2 siblings, 0 replies; 18+ messages in thread From: Jeff King @ 2013-04-04 19:03 UTC (permalink / raw) To: Junio C Hamano; +Cc: jpinheiro, git The previous commit taught "rm" that it is safe to consider "d/f" removed when "d" has become a non-directory. This patch adds a test for the opposite: a file "d" that becomes a directory. In this case, "git rm" does need to complain, because we should not be removing arbitrary content under "d". Git already behaves correctly, but let's make sure that remains the case by protecting the behavior with a test. Signed-off-by: Jeff King <peff@peff.net> --- t/t3600-rm.sh | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh index 73772b2..a2e1a03 100755 --- a/t/t3600-rm.sh +++ b/t/t3600-rm.sh @@ -647,4 +647,16 @@ test_expect_success SYMLINKS 'rm of d/f when d has become a dangling symlink' ' test_path_is_missing d ' +test_expect_success 'rm of file when it has become a directory' ' + rm -rf d && + >d && + git add d && + rm -f d && + mkdir d && + >d/f && + test_must_fail git rm d && + git rev-parse --verify :d && + test_path_is_file d/f +' + test_done -- 1.8.2.rc0.33.gd915649 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/3] t3600: test rm of path with changed leading symlinks 2013-04-04 19:02 ` Jeff King 2013-04-04 19:03 ` [PATCH 1/3] rm: do not complain about d/f conflicts during deletion Jeff King 2013-04-04 19:03 ` [PATCH 2/3] t3600: test behavior of reverse-d/f conflict Jeff King @ 2013-04-04 19:06 ` Jeff King 2013-04-04 19:42 ` Junio C Hamano 2 siblings, 1 reply; 18+ messages in thread From: Jeff King @ 2013-04-04 19:06 UTC (permalink / raw) To: Junio C Hamano; +Cc: jpinheiro, git If we have a path "d/f" but replace "d" with a symlink to a new directory "e", how should we handle "git rm d/f"? It may seem at first like we need new protections to make sure that we do not delete random content from "e/f". However, we are already covered by git-rm's existing protections: it is happy if the working tree file is either already deleted, or if its content matches that of the index and HEAD (and otherwise requires "-f"). Let's add some tests to make sure that these protections remain in place when used across symlinks. We also want to make sure that neither the symlink nor the pointed-to directory is accidentally removed in an attempt to clean up empty elements of the leading path. Signed-off-by: Jeff King <peff@peff.net> --- t/t3600-rm.sh | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh index a2e1a03..9eaec08 100755 --- a/t/t3600-rm.sh +++ b/t/t3600-rm.sh @@ -659,4 +659,47 @@ test_expect_success 'rm of file when it has become a directory' ' test_path_is_file d/f ' +test_expect_success 'set up commit with d/f' ' + rm -rf d e && + mkdir d && + echo content >d/f && + git add d && + git commit -m d +' + +test_expect_success SYMLINKS 'replace dir with symlink to dir (file missing)' ' + git reset --hard && + rm -rf d e && + mkdir e && + ln -s e d && + git rm d/f && + test_must_fail git rev-parse --verify :d/f && + test -h d && + test_path_is_dir e +' + +test_expect_success SYMLINKS 'replace dir with symlink to dir (same content)' ' + git reset --hard && + rm -rf d e && + mkdir e && + echo content >e/f && + ln -s e d && + git rm d/f && + test_must_fail git rev-parse --verify :d/f && + test -h d && + test_path_is_dir e +' + +test_expect_success SYMLINKS 'replace dir with symlink to dir (new content)' ' + git reset --hard && + rm -rf d e && + mkdir e && + echo changed >e/f && + ln -s e d && + test_must_fail git rm d/f && + git rev-parse --verify :d/f && + test -h d && + test_path_is_file e/f +' + test_done -- 1.8.2.rc0.33.gd915649 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks 2013-04-04 19:06 ` [PATCH 3/3] t3600: test rm of path with changed leading symlinks Jeff King @ 2013-04-04 19:42 ` Junio C Hamano 2013-04-04 19:55 ` Jeff King 0 siblings, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2013-04-04 19:42 UTC (permalink / raw) To: Jeff King; +Cc: jpinheiro, git Jeff King <peff@peff.net> writes: > +test_expect_success SYMLINKS 'replace dir with symlink to dir (same content)' ' > + git reset --hard && > + rm -rf d e && > + mkdir e && > + echo content >e/f && > + ln -s e d && > + git rm d/f && > + test_must_fail git rev-parse --verify :d/f && > + test -h d && > + test_path_is_dir e > +' This does not check if e/f still exists in the working tree, and I suspect "git rm d/f" removes it. If you do this: rm -fr d e mkdir e >e/f ln -s e d git add d/f we do complain that d/f is beyond a symlink (meaning that all you can add is the symlink d that may happen to point at something). Silent removal of e/f that is unrelated to the current project's tracked contents feels very wrong, and at the same time it looks to me that it is inconsistent with what we do when adding. I need a bit more persuading to understand why it is not a bug, I think. > +test_expect_success SYMLINKS 'replace dir with symlink to dir (new content)' ' > + git reset --hard && > + rm -rf d e && > + mkdir e && > + echo changed >e/f && > + ln -s e d && > + test_must_fail git rm d/f && > + git rev-parse --verify :d/f && > + test -h d && > + test_path_is_file e/f > +' > + > test_done ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks 2013-04-04 19:42 ` Junio C Hamano @ 2013-04-04 19:55 ` Jeff King 2013-04-04 20:31 ` Junio C Hamano 0 siblings, 1 reply; 18+ messages in thread From: Jeff King @ 2013-04-04 19:55 UTC (permalink / raw) To: Junio C Hamano; +Cc: jpinheiro, git On Thu, Apr 04, 2013 at 12:42:54PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > +test_expect_success SYMLINKS 'replace dir with symlink to dir (same content)' ' > > + git reset --hard && > > + rm -rf d e && > > + mkdir e && > > + echo content >e/f && > > + ln -s e d && > > + git rm d/f && > > + test_must_fail git rev-parse --verify :d/f && > > + test -h d && > > + test_path_is_dir e > > +' > > This does not check if e/f still exists in the working tree, and I > suspect "git rm d/f" removes it. I guess I should have been more clear in my test; I think it _should_ be removed (and it is). You do not necessarily care that "d" is now the symlink and not the actual path; it is safe to remove d/f even though it is behind a symlink now, because it has the exact same content that it had before (it is of course important that we still remove the actual d/f index entry, but as far as the working tree goes, we only care that it is safe to remove, and that we remove it). IOW, I should have been more explicit like this: diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh index 9eaec08..3b51a63 100755 --- a/t/t3600-rm.sh +++ b/t/t3600-rm.sh @@ -687,7 +687,8 @@ test_expect_success SYMLINKS 'replace dir with symlink to dir (same content)' ' git rm d/f && test_must_fail git rev-parse --verify :d/f && test -h d && - test_path_is_dir e + test_path_is_dir e && + test_path_is_missing e/f ' test_expect_success SYMLINKS 'replace dir with symlink to dir (new content)' ' > If you do this: > > rm -fr d e > mkdir e > >e/f > ln -s e d > git add d/f > > we do complain that d/f is beyond a symlink (meaning that all you > can add is the symlink d that may happen to point at something). Right, but that is because you are adding a bogus entry to the index; we cannot have both 'd' as a symlink and 'd/f' as a path in our git tree. But in the removal case, the index manipulation is perfectly reasonable. You are deleting the existing "d/f" entry. The only confusion comes from the fact that the working tree does not match that anymore. > Silent removal of e/f that is unrelated to the current project's > tracked contents feels very wrong, and at the same time it looks to > me that it is inconsistent with what we do when adding. > > I need a bit more persuading to understand why it is not a bug, I > think. But that's the point of the two content tests. It _isn't_ unrelated to the current project's tracked contents; it's the exact same content at the same path (albeit accessed via symlinks now). The likely case for this is something like: mv dir somewhere/else ln -s somewhere/else/dir dir I do not mind if you want to insert extra protection to not cross symlink boundaries (which would obviously invalidate my test). But I don't think it is necessary because of the existing content-level protections. Adding extra protections would disallow "git rm dir/file" in the above case, but I don't think it's that inconvenient; the user just has to make the index aware of the typechange first via "git add". -Peff ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks 2013-04-04 19:55 ` Jeff King @ 2013-04-04 20:31 ` Junio C Hamano 2013-04-04 21:03 ` Jeff King 0 siblings, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2013-04-04 20:31 UTC (permalink / raw) To: Jeff King; +Cc: jpinheiro, git Jeff King <peff@peff.net> writes: >> If you do this: >> >> rm -fr d e >> mkdir e >> >e/f >> ln -s e d >> git add d/f >> >> we do complain that d/f is beyond a symlink (meaning that all you >> can add is the symlink d that may happen to point at something). > > Right, but that is because you are adding a bogus entry to the index; we > cannot have both 'd' as a symlink and 'd/f' as a path in our git tree. > But in the removal case, the index manipulation is perfectly reasonable. I think you misread me. I am not adding 'd' as a symlink at all. IIRC, ancient versions of Git got this case wrong and added d/f to the index, which we later fixed. I have been hinting that we should do the same safety not to touch (even compare the contents of) e/f, because the only reason we even look at it is because it appears beyond a symbolic link 'd'. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks 2013-04-04 20:31 ` Junio C Hamano @ 2013-04-04 21:03 ` Jeff King 2013-04-04 23:12 ` Junio C Hamano 0 siblings, 1 reply; 18+ messages in thread From: Jeff King @ 2013-04-04 21:03 UTC (permalink / raw) To: Junio C Hamano; +Cc: jpinheiro, git On Thu, Apr 04, 2013 at 01:31:43PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > >> If you do this: > >> > >> rm -fr d e > >> mkdir e > >> >e/f > >> ln -s e d > >> git add d/f > >> > >> we do complain that d/f is beyond a symlink (meaning that all you > >> can add is the symlink d that may happen to point at something). > > > > Right, but that is because you are adding a bogus entry to the index; we > > cannot have both 'd' as a symlink and 'd/f' as a path in our git tree. > > But in the removal case, the index manipulation is perfectly reasonable. > > I think you misread me. I am not adding 'd' as a symlink at all. > IIRC, ancient versions of Git got this case wrong and added d/f to > the index, which we later fixed. I think I just spoke sloppily. What is bogus about "d/f" is not that "d" is necessarily in the index right now, but that adding "d/f" implies that "d" is a tree, which it clearly is not. Git maps filesystem symlinks into the index and into its trees without dereferencing them. So whether we have "d" in the index right now or not, "d/f" is wrong conceptually. But I do not think the "we are mis-representing the filesystem" problem applies to this "rm" case. We are not adding anything bogus into the index; on the contrary, we are deleting something that no longer matches the filesystem representation (and is actually the same bogosity that we avoid adding under the rule above). I do agree that it violates git's general behavior with symlinks (i.e., that they are not dereferenced). > I have been hinting that we should do the same safety not to touch > (even compare the contents of) e/f, because the only reason we even > look at it is because it appears beyond a symbolic link 'd'. I can certainly see the safety argument that crossing a symlink at "d" means the resulting "d/f" is not necessarily related to the original "d/f" that is in the index. As I said, I do not mind having the extra protection; my argument was only that the content-check already protects us, so the extra protection is not necessary. And the implication is that I do not feel like working on it. :) I do not mind at all if you drop my third patch (and that is part of the reason I split it out from patch 2, which I do think is a no-brainer), and I am happy to review patches to do the symlink check if you want to work on it. Having made the argument that the content-check is enough, though, I think there is an interesting corner case where it might not be. I don't mind "git rm d/f" deleting "e/f" inside the repository when "d" is a symlink to "e". But what would happen with: rm -rf d ln -s /path/outside/repo d git rm d/f Deleting across symlinks inside the repo can be brushed aside with "eh, well, it is just another way to mention the same name in the filesystem". But deleting anything outside of the repo seems actively wrong. And more on that "brushed aside". I think it is easy in the cases we have been talking about, namely where "d/f" still exists in the index, to think that "git rm d/f" is useful and the question is one of safety: should we delete e/f if it is pointed to? But let us imagine that d/f is _not_ in the index, but "d" is a symlink pointing to some/long/path". The user wants to be lazy and say "git rm d/f", because typing "some/long/path" is too much work. But what happens to the index? We should probably not be removing "some/long/path". Hmm. I think you have convinced me (or perhaps I have convinced myself) that we should generally not be crossing symlink boundaries in translating names between the filesystem and index. I still don't want to work on it, though. :) -Peff ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks 2013-04-04 21:03 ` Jeff King @ 2013-04-04 23:12 ` Junio C Hamano 2013-04-04 23:29 ` Jeff King 0 siblings, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2013-04-04 23:12 UTC (permalink / raw) To: Jeff King; +Cc: jpinheiro, git Jeff King <peff@peff.net> writes: > Deleting across symlinks inside the repo can be brushed aside with "eh, > well, it is just another way to mention the same name in the > filesystem". But deleting anything outside of the repo seems actively > wrong. Yup, you finally caught up ;-) IIRC, such an outside repository target was the case people realized that "git add" shouldn't see across symlinks. > Hmm. I think you have convinced me (or perhaps I have convinced myself) > that we should generally not be crossing symlink boundaries in > translating names between the filesystem and index. I still don't want > to work on it, though. :) That is OK. Just let's not etch a wrong behaviour in stone with that test. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks 2013-04-04 23:12 ` Junio C Hamano @ 2013-04-04 23:29 ` Jeff King 2013-04-04 23:33 ` Junio C Hamano 0 siblings, 1 reply; 18+ messages in thread From: Jeff King @ 2013-04-04 23:29 UTC (permalink / raw) To: Junio C Hamano; +Cc: jpinheiro, git On Thu, Apr 04, 2013 at 04:12:01PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Deleting across symlinks inside the repo can be brushed aside with "eh, > > well, it is just another way to mention the same name in the > > filesystem". But deleting anything outside of the repo seems actively > > wrong. > > Yup, you finally caught up ;-) IIRC, such an outside repository > target was the case people realized that "git add" shouldn't see > across symlinks. It would help if you spelled it out rather than making me come to it while arguing against you. ;P > > Hmm. I think you have convinced me (or perhaps I have convinced myself) > > that we should generally not be crossing symlink boundaries in > > translating names between the filesystem and index. I still don't want > > to work on it, though. :) > > That is OK. Just let's not etch a wrong behaviour in stone with > that test. So let's drop patch 3. Do we want instead to have an expect_failure documenting the correct behavior? -Peff ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks 2013-04-04 23:29 ` Jeff King @ 2013-04-04 23:33 ` Junio C Hamano 2013-04-05 0:00 ` Jeff King 0 siblings, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2013-04-04 23:33 UTC (permalink / raw) To: Jeff King; +Cc: jpinheiro, git Jeff King <peff@peff.net> writes: > So let's drop patch 3. Do we want instead to have an expect_failure > documenting the correct behavior? I think that is very much preferred. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks 2013-04-04 23:33 ` Junio C Hamano @ 2013-04-05 0:00 ` Jeff King 2013-04-05 4:59 ` Junio C Hamano 0 siblings, 1 reply; 18+ messages in thread From: Jeff King @ 2013-04-05 0:00 UTC (permalink / raw) To: Junio C Hamano; +Cc: jpinheiro, git On Thu, Apr 04, 2013 at 04:33:39PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > So let's drop patch 3. Do we want instead to have an expect_failure > > documenting the correct behavior? > > I think that is very much preferred. Here's a replacement for patch 3, then. I wasn't sure if the editorializing in the last 2 paragraphs should go in the commit message or the cover letter; feel free to tweak as you see fit. -- >8 -- Subject: [PATCH] t3600: document failure of rm across symbolic links If we have a symlink "d" that points to a directory, we should not be able to remove "d/f". In the normal case, where "d/f" does not exist in the index, we already disallow this, as we only remove things that git knows about in the index. So for something like: ln -s /outside/repo foo git add foo git rm foo/bar we will properly produce an error (as there is no index entry for foo/bar). However, if there is an index entry for the path (e.g., because the movement is due to working tree changes that have not yet been reflected in the index), we will happily delete it, even though the path we delete from the filesystem is not the same as the path in the index. This patch documents that failure with a test. While this is a bug, it should not be possible to cause serious data loss with it. For any path that does not have an index entry, we will complain and bail. For a path which does have an index entry, we will do the usual up-to-date content check. So even if the deleted path in the filesystem is not the same as the one we are removing from the index, we do know that they at least have the same content, and that the content is included in HEAD. That means the worst case is not the accidental loss of content, but rather confusion by the user when a copy of a file another part of the tree is removed. Which makes this bug a minor and hard-to-trigger annoyance rather than a data-loss bug (and hence the fix can be saved for a rainy day when somebody feels like working on it). Signed-off-by: Jeff King <peff@peff.net> --- t/t3600-rm.sh | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh index a2e1a03..0c44e9f 100755 --- a/t/t3600-rm.sh +++ b/t/t3600-rm.sh @@ -659,4 +659,32 @@ test_expect_success 'rm of file when it has become a directory' ' test_path_is_file d/f ' +test_expect_success SYMLINKS 'rm across a symlinked leading path (no index)' ' + rm -rf d e && + mkdir e && + echo content >e/f && + ln -s e d && + git add -A e d && + git commit -m "symlink d to e, e/f exists" && + test_must_fail git rm d/f && + git rev-parse --verify :d && + git rev-parse --verify :e/f && + test -h d && + test_path_is_file e/f +' + +test_expect_failure SYMLINKS 'rm across a symlinked leading path (w/ index)' ' + rm -rf d e && + mkdir d && + echo content >d/f && + git add -A e d && + git commit -m "d/f exists" && + mv d e && + ln -s e d && + test_must_fail git rm d/f && + git rev-parse --verify :d/f && + test -h d && + test_path_is_file e/f +' + test_done -- 1.8.2.rc0.33.gd915649 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks 2013-04-05 0:00 ` Jeff King @ 2013-04-05 4:59 ` Junio C Hamano 2013-04-05 5:04 ` Jeff King 0 siblings, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2013-04-05 4:59 UTC (permalink / raw) To: Jeff King; +Cc: jpinheiro, git Jeff King <peff@peff.net> writes: > Here's a replacement for patch 3, then. I wasn't sure if the > editorializing in the last 2 paragraphs should go in the commit message > or the cover letter; feel free to tweak as you see fit. They look fine as they are. > That means the worst case is not the accidental loss of content, > but rather confusion by the user when a copy of a file another > part of the tree is removed. A copy of a file that is on the filesystem that may not be related to the project at all may be lost, and the user may not notice the lossage for quite a while. A symlink that points at /etc/passwd may cause the file to be removed and the user will hopefully notice, but if the pointed-at file is something in $HOME/tmp/ that you occasionally use, you may not notice the lossage immediately, and when you notice the loss, the only assurance you have is that there is a blob that records what was lost _somewhere_ in _some_ of your project that had a symlink that points at $HOME/tmp/ at some point in the past. "Exists somewhere, not lost" is not a very useful assurance, if you ask me ;-) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks 2013-04-05 4:59 ` Junio C Hamano @ 2013-04-05 5:04 ` Jeff King 0 siblings, 0 replies; 18+ messages in thread From: Jeff King @ 2013-04-05 5:04 UTC (permalink / raw) To: Junio C Hamano; +Cc: jpinheiro, git On Thu, Apr 04, 2013 at 09:59:49PM -0700, Junio C Hamano wrote: > > That means the worst case is not the accidental loss of content, > > but rather confusion by the user when a copy of a file another > > part of the tree is removed. > > A copy of a file that is on the filesystem that may not be related > to the project at all may be lost, and the user may not notice the > lossage for quite a while. A symlink that points at /etc/passwd may > cause the file to be removed and the user will hopefully notice, but > if the pointed-at file is something in $HOME/tmp/ that you occasionally > use, you may not notice the lossage immediately, and when you notice > the loss, the only assurance you have is that there is a blob that > records what was lost _somewhere_ in _some_ of your project that had > a symlink that points at $HOME/tmp/ at some point in the past. It's actually quite hard to lose those files. We will only remove the file if it has a matching index entry. So you cannot do: ln -s /etc foo git rm foo/passwd because there is no index entry for foo/passwd. You would have to do: mkdir foo echo content >foo/passwd git add foo/passwd rm -rf foo ln -s /etc foo git rm foo/passwd and then you only lose it if it matches exactly "content". And recovering it, you know that the original path that held the content was called "passwd". So yes, technically you could lose a file outside of the repo and have trouble finding which path it came from later. But in practice, not really. Anyway, it is academic at this point. :) -Peff ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2013-04-05 5:05 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-04-03 14:50 Behavior of git rm jpinheiro 2013-04-03 15:58 ` Jeff King 2013-04-03 17:35 ` Junio C Hamano 2013-04-03 20:36 ` Jeff King 2013-04-04 19:02 ` Jeff King 2013-04-04 19:03 ` [PATCH 1/3] rm: do not complain about d/f conflicts during deletion Jeff King 2013-04-04 19:03 ` [PATCH 2/3] t3600: test behavior of reverse-d/f conflict Jeff King 2013-04-04 19:06 ` [PATCH 3/3] t3600: test rm of path with changed leading symlinks Jeff King 2013-04-04 19:42 ` Junio C Hamano 2013-04-04 19:55 ` Jeff King 2013-04-04 20:31 ` Junio C Hamano 2013-04-04 21:03 ` Jeff King 2013-04-04 23:12 ` Junio C Hamano 2013-04-04 23:29 ` Jeff King 2013-04-04 23:33 ` Junio C Hamano 2013-04-05 0:00 ` Jeff King 2013-04-05 4:59 ` Junio C Hamano 2013-04-05 5:04 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).