git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix "inside work tree" detection on case-insensitive filesystems
@ 2015-09-28 16:12 Johannes Schindelin
  2015-12-29 14:47 ` Michael Haggerty
  0 siblings, 1 reply; 4+ messages in thread
From: Johannes Schindelin @ 2015-09-28 16:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Git has a config variable to indicate that it is operating on a file
system that is case-insensitive: core.ignoreCase. But the
`dir_inside_of()` function did not respect that. As a result, if Git's
idea of the current working directory disagreed in its upper/lower case
with the `GIT_WORK_TREE` variable (e.g. `C:\test` vs `c:\test`) the
user would be greeted by the error message

	fatal: git-am cannot be used without a working tree.

when trying to run a rebase.

This fixes https://github.com/git-for-windows/git/issues/402 (reported by
Daniel Harding).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 dir.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index b90484a..fba938b 100644
--- a/dir.c
+++ b/dir.c
@@ -2107,6 +2107,15 @@ int file_exists(const char *f)
 	return lstat(f, &sb) == 0;
 }
 
+static int cmp_icase(char a, char b)
+{
+	if (a == b)
+		return 0;
+	if (ignore_case)
+		return toupper(a) - toupper(b);
+	return a - b;
+}
+
 /*
  * Given two normalized paths (a trailing slash is ok), if subdir is
  * outside dir, return -1.  Otherwise return the offset in subdir that
@@ -2118,7 +2127,7 @@ int dir_inside_of(const char *subdir, const char *dir)
 
 	assert(dir && subdir && *dir && *subdir);
 
-	while (*dir && *subdir && *dir == *subdir) {
+	while (*dir && *subdir && !cmp_icase(*dir, *subdir)) {
 		dir++;
 		subdir++;
 		offset++;
-- 
2.5.3.windows.1.3.gc322723

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix "inside work tree" detection on case-insensitive filesystems
  2015-09-28 16:12 [PATCH] Fix "inside work tree" detection on case-insensitive filesystems Johannes Schindelin
@ 2015-12-29 14:47 ` Michael Haggerty
  2016-01-01 14:58   ` Johannes Schindelin
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Haggerty @ 2015-12-29 14:47 UTC (permalink / raw)
  To: Johannes Schindelin, Junio C Hamano; +Cc: git

On 09/28/2015 06:12 PM, Johannes Schindelin wrote:
> Git has a config variable to indicate that it is operating on a file
> system that is case-insensitive: core.ignoreCase. But the
> `dir_inside_of()` function did not respect that. As a result, if Git's
> idea of the current working directory disagreed in its upper/lower case
> with the `GIT_WORK_TREE` variable (e.g. `C:\test` vs `c:\test`) the
> user would be greeted by the error message
> 
> 	fatal: git-am cannot be used without a working tree.
> 
> when trying to run a rebase.
> 
> This fixes https://github.com/git-for-windows/git/issues/402 (reported by
> Daniel Harding).

I was just going through the 2.7 release notes when I saw this patch. My
understanding was that many of the case-insensitive filesystems also
support Unicode. Is the byte-oriented code in this patch adequate? I
would have thought it necessary to use a Unicode-aware algorithm here,
that knows:

* that bytes != characters
* how to do a case-insensitive comparison of strings that include
non-ASCII characters
* (possibly) insensitivity to NFC vs. NFD vs. non-normalized forms

I suppose that such OSs have built-in functions for deciding whether two
paths are equivalent. Possibly these could be used?

Michael

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  dir.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/dir.c b/dir.c
> index b90484a..fba938b 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2107,6 +2107,15 @@ int file_exists(const char *f)
>  	return lstat(f, &sb) == 0;
>  }
>  
> +static int cmp_icase(char a, char b)
> +{
> +	if (a == b)
> +		return 0;
> +	if (ignore_case)
> +		return toupper(a) - toupper(b);
> +	return a - b;
> +}
> +
>  /*
>   * Given two normalized paths (a trailing slash is ok), if subdir is
>   * outside dir, return -1.  Otherwise return the offset in subdir that
> @@ -2118,7 +2127,7 @@ int dir_inside_of(const char *subdir, const char *dir)
>  
>  	assert(dir && subdir && *dir && *subdir);
>  
> -	while (*dir && *subdir && *dir == *subdir) {
> +	while (*dir && *subdir && !cmp_icase(*dir, *subdir)) {
>  		dir++;
>  		subdir++;
>  		offset++;
> 


-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix "inside work tree" detection on case-insensitive filesystems
  2015-12-29 14:47 ` Michael Haggerty
@ 2016-01-01 14:58   ` Johannes Schindelin
  2016-01-01 16:24     ` Michael Haggerty
  0 siblings, 1 reply; 4+ messages in thread
From: Johannes Schindelin @ 2016-01-01 14:58 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, git

Hi Michael,

On Tue, 29 Dec 2015, Michael Haggerty wrote:

> On 09/28/2015 06:12 PM, Johannes Schindelin wrote:
> > Git has a config variable to indicate that it is operating on a file
> > system that is case-insensitive: core.ignoreCase. But the
> > `dir_inside_of()` function did not respect that. As a result, if Git's
> > idea of the current working directory disagreed in its upper/lower case
> > with the `GIT_WORK_TREE` variable (e.g. `C:\test` vs `c:\test`) the
> > user would be greeted by the error message
> > 
> > 	fatal: git-am cannot be used without a working tree.
> > 
> > when trying to run a rebase.
> > 
> > This fixes https://github.com/git-for-windows/git/issues/402 (reported by
> > Daniel Harding).
> 
> I was just going through the 2.7 release notes when I saw this patch.

Thanks for your diligence!

> My understanding was that many of the case-insensitive filesystems also
> support Unicode. Is the byte-oriented code in this patch adequate? I
> would have thought it necessary to use a Unicode-aware algorithm here,
> that knows:
> 
> * that bytes != characters

I am not sure that we can in general assume that the file name is UTF-8...
Or does Git always assume that?

> * how to do a case-insensitive comparison of strings that include
> non-ASCII characters

I was worrying about that, too, but decided to punt on it when I realized
that no other case-insensitive code in Git bothers about those characters.

> * (possibly) insensitivity to NFC vs. NFD vs. non-normalized forms

Whoa... I really would like to stay away from that collection of
potholes...

> I suppose that such OSs have built-in functions for deciding whether two
> paths are equivalent. Possibly these could be used?

We could, in theory, try to do that, but what about the OSes that do *not*
have those functions? We would need our own fallback anyway, so why not
guarantee consistency and use our own functions only?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix "inside work tree" detection on case-insensitive filesystems
  2016-01-01 14:58   ` Johannes Schindelin
@ 2016-01-01 16:24     ` Michael Haggerty
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Haggerty @ 2016-01-01 16:24 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

On 01/01/2016 03:58 PM, Johannes Schindelin wrote:
> Hi Michael,
> 
> On Tue, 29 Dec 2015, Michael Haggerty wrote:
> 
>> On 09/28/2015 06:12 PM, Johannes Schindelin wrote:
>>> Git has a config variable to indicate that it is operating on a file
>>> system that is case-insensitive: core.ignoreCase. But the
>>> `dir_inside_of()` function did not respect that. As a result, if Git's
>>> idea of the current working directory disagreed in its upper/lower case
>>> with the `GIT_WORK_TREE` variable (e.g. `C:\test` vs `c:\test`) the
>>> user would be greeted by the error message
>>>
>>> 	fatal: git-am cannot be used without a working tree.
>>>
>>> when trying to run a rebase.
>>>
>>> This fixes https://github.com/git-for-windows/git/issues/402 (reported by
>>> Daniel Harding).
>>
>> I was just going through the 2.7 release notes when I saw this patch.
> 
> Thanks for your diligence!
> 
>> My understanding was that many of the case-insensitive filesystems also
>> support Unicode. Is the byte-oriented code in this patch adequate? I
>> would have thought it necessary to use a Unicode-aware algorithm here,
>> that knows:
>>
>> * that bytes != characters
> 
> I am not sure that we can in general assume that the file name is UTF-8...
> Or does Git always assume that?

No, it does not. I would say that Git resolutely refuses to take a stand
on whether/what encoding is used for filenames within its object database.

But as far as I understand, here we have a slightly different situation:
we have a string from an environment variable, and we have the current
working directory. Neither of these things are paths from the Git ODB,
and we don't have to take a stand about whether two paths from the ODB
are equivalent.

>> * how to do a case-insensitive comparison of strings that include
>> non-ASCII characters
> 
> I was worrying about that, too, but decided to punt on it when I realized
> that no other case-insensitive code in Git bothers about those characters.
> 
>> * (possibly) insensitivity to NFC vs. NFD vs. non-normalized forms
> 
> Whoa... I really would like to stay away from that collection of
> potholes...
> 
>> I suppose that such OSs have built-in functions for deciding whether two
>> paths are equivalent. Possibly these could be used?
> 
> We could, in theory, try to do that, but what about the OSes that do *not*
> have those functions? We would need our own fallback anyway, so why not
> guarantee consistency and use our own functions only?

I don't see why consistency across OSes is an issue here. Whenever we
are talking about paths on a particular filesystem, it is the
filesystem's idea of equivalence that is relevant.

Suppose we had a wrapper `filesystem_paths_equivalent(path1, path2)`. If
an OS provides a function that can be used for such a comparison, we can
use that. If not, but `ignore_case` is set, we can use your
`cmp_icase()`-based function. Otherwise, the default implementation
would be `!strcmp()`.

I don't think it would be worth going to all this trouble for this one
callsite (at least under my assumption that this has no security
implications). But I would have expected this issue to come up in enough
places that such an abstraction would be helpful.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-01-01 16:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-28 16:12 [PATCH] Fix "inside work tree" detection on case-insensitive filesystems Johannes Schindelin
2015-12-29 14:47 ` Michael Haggerty
2016-01-01 14:58   ` Johannes Schindelin
2016-01-01 16:24     ` Michael Haggerty

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).