* [PATCH] Fix "inside work tree" detection on case-insensitive filesystems
@ 2015-09-28 16:12 Johannes Schindelin
2015-12-29 14:47 ` Michael Haggerty
0 siblings, 1 reply; 4+ messages in thread
From: Johannes Schindelin @ 2015-09-28 16:12 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Git has a config variable to indicate that it is operating on a file
system that is case-insensitive: core.ignoreCase. But the
`dir_inside_of()` function did not respect that. As a result, if Git's
idea of the current working directory disagreed in its upper/lower case
with the `GIT_WORK_TREE` variable (e.g. `C:\test` vs `c:\test`) the
user would be greeted by the error message
fatal: git-am cannot be used without a working tree.
when trying to run a rebase.
This fixes https://github.com/git-for-windows/git/issues/402 (reported by
Daniel Harding).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
dir.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/dir.c b/dir.c
index b90484a..fba938b 100644
--- a/dir.c
+++ b/dir.c
@@ -2107,6 +2107,15 @@ int file_exists(const char *f)
return lstat(f, &sb) == 0;
}
+static int cmp_icase(char a, char b)
+{
+ if (a == b)
+ return 0;
+ if (ignore_case)
+ return toupper(a) - toupper(b);
+ return a - b;
+}
+
/*
* Given two normalized paths (a trailing slash is ok), if subdir is
* outside dir, return -1. Otherwise return the offset in subdir that
@@ -2118,7 +2127,7 @@ int dir_inside_of(const char *subdir, const char *dir)
assert(dir && subdir && *dir && *subdir);
- while (*dir && *subdir && *dir == *subdir) {
+ while (*dir && *subdir && !cmp_icase(*dir, *subdir)) {
dir++;
subdir++;
offset++;
--
2.5.3.windows.1.3.gc322723
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] Fix "inside work tree" detection on case-insensitive filesystems
2015-09-28 16:12 [PATCH] Fix "inside work tree" detection on case-insensitive filesystems Johannes Schindelin
@ 2015-12-29 14:47 ` Michael Haggerty
2016-01-01 14:58 ` Johannes Schindelin
0 siblings, 1 reply; 4+ messages in thread
From: Michael Haggerty @ 2015-12-29 14:47 UTC (permalink / raw)
To: Johannes Schindelin, Junio C Hamano; +Cc: git
On 09/28/2015 06:12 PM, Johannes Schindelin wrote:
> Git has a config variable to indicate that it is operating on a file
> system that is case-insensitive: core.ignoreCase. But the
> `dir_inside_of()` function did not respect that. As a result, if Git's
> idea of the current working directory disagreed in its upper/lower case
> with the `GIT_WORK_TREE` variable (e.g. `C:\test` vs `c:\test`) the
> user would be greeted by the error message
>
> fatal: git-am cannot be used without a working tree.
>
> when trying to run a rebase.
>
> This fixes https://github.com/git-for-windows/git/issues/402 (reported by
> Daniel Harding).
I was just going through the 2.7 release notes when I saw this patch. My
understanding was that many of the case-insensitive filesystems also
support Unicode. Is the byte-oriented code in this patch adequate? I
would have thought it necessary to use a Unicode-aware algorithm here,
that knows:
* that bytes != characters
* how to do a case-insensitive comparison of strings that include
non-ASCII characters
* (possibly) insensitivity to NFC vs. NFD vs. non-normalized forms
I suppose that such OSs have built-in functions for deciding whether two
paths are equivalent. Possibly these could be used?
Michael
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> dir.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index b90484a..fba938b 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2107,6 +2107,15 @@ int file_exists(const char *f)
> return lstat(f, &sb) == 0;
> }
>
> +static int cmp_icase(char a, char b)
> +{
> + if (a == b)
> + return 0;
> + if (ignore_case)
> + return toupper(a) - toupper(b);
> + return a - b;
> +}
> +
> /*
> * Given two normalized paths (a trailing slash is ok), if subdir is
> * outside dir, return -1. Otherwise return the offset in subdir that
> @@ -2118,7 +2127,7 @@ int dir_inside_of(const char *subdir, const char *dir)
>
> assert(dir && subdir && *dir && *subdir);
>
> - while (*dir && *subdir && *dir == *subdir) {
> + while (*dir && *subdir && !cmp_icase(*dir, *subdir)) {
> dir++;
> subdir++;
> offset++;
>
--
Michael Haggerty
mhagger@alum.mit.edu
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Fix "inside work tree" detection on case-insensitive filesystems
2015-12-29 14:47 ` Michael Haggerty
@ 2016-01-01 14:58 ` Johannes Schindelin
2016-01-01 16:24 ` Michael Haggerty
0 siblings, 1 reply; 4+ messages in thread
From: Johannes Schindelin @ 2016-01-01 14:58 UTC (permalink / raw)
To: Michael Haggerty; +Cc: Junio C Hamano, git
Hi Michael,
On Tue, 29 Dec 2015, Michael Haggerty wrote:
> On 09/28/2015 06:12 PM, Johannes Schindelin wrote:
> > Git has a config variable to indicate that it is operating on a file
> > system that is case-insensitive: core.ignoreCase. But the
> > `dir_inside_of()` function did not respect that. As a result, if Git's
> > idea of the current working directory disagreed in its upper/lower case
> > with the `GIT_WORK_TREE` variable (e.g. `C:\test` vs `c:\test`) the
> > user would be greeted by the error message
> >
> > fatal: git-am cannot be used without a working tree.
> >
> > when trying to run a rebase.
> >
> > This fixes https://github.com/git-for-windows/git/issues/402 (reported by
> > Daniel Harding).
>
> I was just going through the 2.7 release notes when I saw this patch.
Thanks for your diligence!
> My understanding was that many of the case-insensitive filesystems also
> support Unicode. Is the byte-oriented code in this patch adequate? I
> would have thought it necessary to use a Unicode-aware algorithm here,
> that knows:
>
> * that bytes != characters
I am not sure that we can in general assume that the file name is UTF-8...
Or does Git always assume that?
> * how to do a case-insensitive comparison of strings that include
> non-ASCII characters
I was worrying about that, too, but decided to punt on it when I realized
that no other case-insensitive code in Git bothers about those characters.
> * (possibly) insensitivity to NFC vs. NFD vs. non-normalized forms
Whoa... I really would like to stay away from that collection of
potholes...
> I suppose that such OSs have built-in functions for deciding whether two
> paths are equivalent. Possibly these could be used?
We could, in theory, try to do that, but what about the OSes that do *not*
have those functions? We would need our own fallback anyway, so why not
guarantee consistency and use our own functions only?
Ciao,
Dscho
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Fix "inside work tree" detection on case-insensitive filesystems
2016-01-01 14:58 ` Johannes Schindelin
@ 2016-01-01 16:24 ` Michael Haggerty
0 siblings, 0 replies; 4+ messages in thread
From: Michael Haggerty @ 2016-01-01 16:24 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, git
On 01/01/2016 03:58 PM, Johannes Schindelin wrote:
> Hi Michael,
>
> On Tue, 29 Dec 2015, Michael Haggerty wrote:
>
>> On 09/28/2015 06:12 PM, Johannes Schindelin wrote:
>>> Git has a config variable to indicate that it is operating on a file
>>> system that is case-insensitive: core.ignoreCase. But the
>>> `dir_inside_of()` function did not respect that. As a result, if Git's
>>> idea of the current working directory disagreed in its upper/lower case
>>> with the `GIT_WORK_TREE` variable (e.g. `C:\test` vs `c:\test`) the
>>> user would be greeted by the error message
>>>
>>> fatal: git-am cannot be used without a working tree.
>>>
>>> when trying to run a rebase.
>>>
>>> This fixes https://github.com/git-for-windows/git/issues/402 (reported by
>>> Daniel Harding).
>>
>> I was just going through the 2.7 release notes when I saw this patch.
>
> Thanks for your diligence!
>
>> My understanding was that many of the case-insensitive filesystems also
>> support Unicode. Is the byte-oriented code in this patch adequate? I
>> would have thought it necessary to use a Unicode-aware algorithm here,
>> that knows:
>>
>> * that bytes != characters
>
> I am not sure that we can in general assume that the file name is UTF-8...
> Or does Git always assume that?
No, it does not. I would say that Git resolutely refuses to take a stand
on whether/what encoding is used for filenames within its object database.
But as far as I understand, here we have a slightly different situation:
we have a string from an environment variable, and we have the current
working directory. Neither of these things are paths from the Git ODB,
and we don't have to take a stand about whether two paths from the ODB
are equivalent.
>> * how to do a case-insensitive comparison of strings that include
>> non-ASCII characters
>
> I was worrying about that, too, but decided to punt on it when I realized
> that no other case-insensitive code in Git bothers about those characters.
>
>> * (possibly) insensitivity to NFC vs. NFD vs. non-normalized forms
>
> Whoa... I really would like to stay away from that collection of
> potholes...
>
>> I suppose that such OSs have built-in functions for deciding whether two
>> paths are equivalent. Possibly these could be used?
>
> We could, in theory, try to do that, but what about the OSes that do *not*
> have those functions? We would need our own fallback anyway, so why not
> guarantee consistency and use our own functions only?
I don't see why consistency across OSes is an issue here. Whenever we
are talking about paths on a particular filesystem, it is the
filesystem's idea of equivalence that is relevant.
Suppose we had a wrapper `filesystem_paths_equivalent(path1, path2)`. If
an OS provides a function that can be used for such a comparison, we can
use that. If not, but `ignore_case` is set, we can use your
`cmp_icase()`-based function. Otherwise, the default implementation
would be `!strcmp()`.
I don't think it would be worth going to all this trouble for this one
callsite (at least under my assumption that this has no security
implications). But I would have expected this issue to come up in enough
places that such an abstraction would be helpful.
Michael
--
Michael Haggerty
mhagger@alum.mit.edu
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2016-01-01 16:32 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-28 16:12 [PATCH] Fix "inside work tree" detection on case-insensitive filesystems Johannes Schindelin
2015-12-29 14:47 ` Michael Haggerty
2016-01-01 14:58 ` Johannes Schindelin
2016-01-01 16:24 ` Michael Haggerty
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).