* [PATCH v2 0/6] Extensions of core.ignorecase=true support
@ 2010-10-03 4:32 Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable Joshua Jensen
` (7 more replies)
0 siblings, 8 replies; 45+ messages in thread
From: Joshua Jensen @ 2010-10-03 4:32 UTC (permalink / raw)
To: git; +Cc: j6t
The second version of this patch series fixes the problematic case
insensitive fnmatch call in patch 1 that relied on an apparently GNU-only
extension. Instead, the pattern and string are lowercased into
temporary buffers, and the standard fnmatch is called without relying
on the GNU extension.
Patches 2-6 received no modifications.
The original cover for the patch series follows as posted by Johannes Sixt:
The following patch series extends the core.ignorecase=true support to
handle case insensitive comparisons for the .gitignore file, git status,
and git ls-files. git add and git fast-import will fold the case of the
file being added, matching that of an already added directory entry. Case
folding is also applied to git fast-import for renames, copies, and deletes.
The most notable benefit, IMO, is that the case of directories in the
worktree does not matter if, and only if, the directory exists already in
the index with some different case variant. This helps applications on
Windows that change the case even of directories in unpredictable ways.
Joshua mentioned Perforce as the primary example.
Concerning the implementation, Joshua explained when he initially submitted
the series to the msysgit mailing list:
git status and add both use an update made to name-hash.c where
directories, specifically names with a trailing slash, can be looked up
in a case insensitive manner. After trying a myriad of solutions, this
seemed to be the cleanest. Does anyone see a problem with embedding the
directory names in the same hash as the file names? I couldn't find one,
especially since I append a slash to each directory name.
The git add path case folding functionality is a somewhat radical
departure from what Git does now. It is described in detail in patch 5.
Does anyone have any concerns?
I support the idea of this patch, and I can confirm that it works: I've
used this series in production both with core.ignorecase set to true and
to false, and in the former case, with directories and files with case
different from the index.
Joshua Jensen (6):
Add string comparison functions that respect the ignore_case variable.
Case insensitivity support for .gitignore via core.ignorecase
Add case insensitivity support for directories when using git status
Add case insensitivity support when using git ls-files
Support case folding for git add when core.ignorecase=true
Support case folding in git fast-import when core.ignorecase=true
dir.c | 152 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
dir.h | 4 ++
fast-import.c | 7 ++-
name-hash.c | 72 +++++++++++++++++++++++++++
read-cache.c | 23 +++++++++
5 files changed, 235 insertions(+), 23 deletions(-)
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable.
2010-10-03 4:32 [PATCH v2 0/6] Extensions of core.ignorecase=true support Joshua Jensen
@ 2010-10-03 4:32 ` Joshua Jensen
2010-10-03 8:30 ` Ævar Arnfjörð Bjarmason
2010-10-03 4:32 ` [PATCH v2 2/6] Case insensitivity support for .gitignore via core.ignorecase Joshua Jensen
` (6 subsequent siblings)
7 siblings, 1 reply; 45+ messages in thread
From: Joshua Jensen @ 2010-10-03 4:32 UTC (permalink / raw)
To: git; +Cc: j6t
Multiple locations within this patch series alter a case sensitive
string comparison call such as strcmp() to be a call to a string
comparison call that selects case comparison based on the global
ignore_case variable. Behaviorally, when core.ignorecase=false, the
*_icase() versions are functionally equivalent to their C runtime
counterparts. When core.ignorecase=true, the *_icase() versions perform
a case insensitive comparison.
Like Linus' earlier ignorecase patch, these may ignore filename
conventions on certain file systems. By isolating filename comparisons
to certain functions, support for those filename conventions may be more
easily met.
Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
---
dir.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
dir.h | 4 ++++
2 files changed, 66 insertions(+), 0 deletions(-)
diff --git a/dir.c b/dir.c
index d1e5e5e..ffa410d 100644
--- a/dir.c
+++ b/dir.c
@@ -18,6 +18,68 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, in
int check_only, const struct path_simplify *simplify);
static int get_dtype(struct dirent *de, const char *path, int len);
+/* helper string functions with support for the ignore_case flag */
+int strcmp_icase(const char *a, const char *b)
+{
+ return ignore_case ? strcasecmp(a, b) : strcmp(a, b);
+}
+
+int strncmp_icase(const char *a, const char *b, size_t count)
+{
+ return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
+}
+
+int fnmatch_casefold(const char *pattern, const char *string, int flags)
+{
+ char lowerPatternBuf[MAX_PATH];
+ char lowerStringBuf[MAX_PATH];
+ char* lowerPattern;
+ char* lowerString;
+ size_t patternLen;
+ size_t stringLen;
+ char* out;
+ int ret;
+
+ /*
+ * Use the provided stack buffer, if possible. If the string is too
+ * large, allocate buffer space.
+ */
+ patternLen = strlen(pattern);
+ if (patternLen + 1 > sizeof(lowerPatternBuf))
+ lowerPattern = xmalloc(patternLen + 1);
+ else
+ lowerPattern = lowerPatternBuf;
+
+ stringLen = strlen(string);
+ if (stringLen + 1 > sizeof(lowerStringBuf))
+ lowerString = xmalloc(stringLen + 1);
+ else
+ lowerString = lowerStringBuf;
+
+ /* Make the pattern and string lowercase to pass to fnmatch. */
+ for (out = lowerPattern; *pattern; ++out, ++pattern)
+ *out = tolower(*pattern);
+ *out = 0;
+
+ for (out = lowerString; *string; ++out, ++string)
+ *out = tolower(*string);
+ *out = 0;
+
+ ret = fnmatch(lowerPattern, lowerString, flags);
+
+ /* Free the pattern or string if it was allocated. */
+ if (lowerPattern != lowerPatternBuf)
+ free(lowerPattern);
+ if (lowerString != lowerStringBuf)
+ free(lowerString);
+ return ret;
+}
+
+int fnmatch_icase(const char *pattern, const char *string, int flags)
+{
+ return ignore_case ? fnmatch_casefold(pattern, string, flags) : fnmatch(pattern, string, flags);
+}
+
static int common_prefix(const char **pathspec)
{
const char *path, *slash, *next;
diff --git a/dir.h b/dir.h
index 278d84c..b3e2104 100644
--- a/dir.h
+++ b/dir.h
@@ -101,4 +101,8 @@ extern int remove_dir_recursively(struct strbuf *path, int flag);
/* tries to remove the path with empty directories along it, ignores ENOENT */
extern int remove_path(const char *path);
+extern int strcmp_icase(const char *a, const char *b);
+extern int strncmp_icase(const char *a, const char *b, size_t count);
+extern int fnmatch_icase(const char *pattern, const char *string, int flags);
+
#endif
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 2/6] Case insensitivity support for .gitignore via core.ignorecase
2010-10-03 4:32 [PATCH v2 0/6] Extensions of core.ignorecase=true support Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable Joshua Jensen
@ 2010-10-03 4:32 ` Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 3/6] Add case insensitivity support for directories when using git status Joshua Jensen
` (5 subsequent siblings)
7 siblings, 0 replies; 45+ messages in thread
From: Joshua Jensen @ 2010-10-03 4:32 UTC (permalink / raw)
To: git; +Cc: j6t
This is especially beneficial when using Windows and Perforce and the
git-p4 bridge. Internally, Perforce preserves a given file's full path
including its case at the time it was added to the Perforce repository.
When syncing a file down via Perforce, missing directories are created,
if necessary, using the case as stored with the filename. Unfortunately,
two files in the same directory can have differing cases for their
respective paths, such as /diRa/file1.c and /DirA/file2.c. Depending on
sync order, DirA/ may get created instead of diRa/.
It is possible to handle directory names in a case insensitive manner
without this patch, but it is highly inconvenient, requiring each
character to be specified like so: [Bb][Uu][Ii][Ll][Dd]. With this patch, the
gitignore exclusions honor the core.ignorecase=true configuration
setting and make the process less error prone. The above is specified
like so: Build
Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
---
dir.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/dir.c b/dir.c
index ffa410d..63d7b41 100644
--- a/dir.c
+++ b/dir.c
@@ -436,14 +436,14 @@ int excluded_from_list(const char *pathname,
if (x->flags & EXC_FLAG_NODIR) {
/* match basename */
if (x->flags & EXC_FLAG_NOWILDCARD) {
- if (!strcmp(exclude, basename))
+ if (!strcmp_icase(exclude, basename))
return to_exclude;
} else if (x->flags & EXC_FLAG_ENDSWITH) {
if (x->patternlen - 1 <= pathlen &&
- !strcmp(exclude + 1, pathname + pathlen - x->patternlen + 1))
+ !strcmp_icase(exclude + 1, pathname + pathlen - x->patternlen + 1))
return to_exclude;
} else {
- if (fnmatch(exclude, basename, 0) == 0)
+ if (fnmatch_icase(exclude, basename, 0) == 0)
return to_exclude;
}
}
@@ -458,14 +458,14 @@ int excluded_from_list(const char *pathname,
if (pathlen < baselen ||
(baselen && pathname[baselen-1] != '/') ||
- strncmp(pathname, x->base, baselen))
+ strncmp_icase(pathname, x->base, baselen))
continue;
if (x->flags & EXC_FLAG_NOWILDCARD) {
- if (!strcmp(exclude, pathname + baselen))
+ if (!strcmp_icase(exclude, pathname + baselen))
return to_exclude;
} else {
- if (fnmatch(exclude, pathname+baselen,
+ if (fnmatch_icase(exclude, pathname+baselen,
FNM_PATHNAME) == 0)
return to_exclude;
}
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 3/6] Add case insensitivity support for directories when using git status
2010-10-03 4:32 [PATCH v2 0/6] Extensions of core.ignorecase=true support Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 2/6] Case insensitivity support for .gitignore via core.ignorecase Joshua Jensen
@ 2010-10-03 4:32 ` Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 4/6] Add case insensitivity support when using git ls-files Joshua Jensen
` (4 subsequent siblings)
7 siblings, 0 replies; 45+ messages in thread
From: Joshua Jensen @ 2010-10-03 4:32 UTC (permalink / raw)
To: git; +Cc: j6t
When using a case preserving but case insensitive file system, directory
case can differ but still refer to the same physical directory. git
status reports the directory with the alternate case as an Untracked
file. (That is, when mydir/filea.txt is added to the repository and
then the directory on disk is renamed from mydir/ to MyDir/, git status
shows MyDir/ as being untracked.)
Support has been added in name-hash.c for hashing directories with a
terminating slash into the name hash. When index_name_exists() is called
with a directory (a name with a terminating slash), the name is not
found via the normal cache_name_compare() call, but it is found in the
slow_same_name() function.
Additionally, in dir.c, directory_exists_in_index_icase() allows newly
added directories deeper in the directory chain to be identified.
Ultimately, it would be better if the file list was read in case
insensitive alphabetical order from disk, but this change seems to
suffice for now.
The end result is the directory is looked up in a case insensitive
manner and does not show in the Untracked files list.
Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
---
dir.c | 40 ++++++++++++++++++++++++++++++++-
name-hash.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 110 insertions(+), 2 deletions(-)
diff --git a/dir.c b/dir.c
index 63d7b41..86768fb 100644
--- a/dir.c
+++ b/dir.c
@@ -531,6 +531,39 @@ enum exist_status {
};
/*
+ * Do not use the alphabetically stored index to look up
+ * the directory name; instead, use the case insensitive
+ * name hash.
+ */
+static enum exist_status directory_exists_in_index_icase(const char *dirname, int len)
+{
+ struct cache_entry *ce = index_name_exists(&the_index, dirname, len + 1, ignore_case);
+ unsigned char endchar;
+
+ if (!ce)
+ return index_nonexistent;
+ endchar = ce->name[len];
+
+ /*
+ * The cache_entry structure returned will contain this dirname
+ * and possibly additional path components.
+ */
+ if (endchar == '/')
+ return index_directory;
+
+ /*
+ * If there are no additional path components, then this cache_entry
+ * represents a submodule. Submodules, despite being directories,
+ * are stored in the cache without a closing slash.
+ */
+ if (!endchar && S_ISGITLINK(ce->ce_mode))
+ return index_gitdir;
+
+ /* This should never be hit, but it exists just in case. */
+ return index_nonexistent;
+}
+
+/*
* The index sorts alphabetically by entry name, which
* means that a gitlink sorts as '\0' at the end, while
* a directory (which is defined not as an entry, but as
@@ -539,7 +572,12 @@ enum exist_status {
*/
static enum exist_status directory_exists_in_index(const char *dirname, int len)
{
- int pos = cache_name_pos(dirname, len);
+ int pos;
+
+ if (ignore_case)
+ return directory_exists_in_index_icase(dirname, len);
+
+ pos = cache_name_pos(dirname, len);
if (pos < 0)
pos = -pos-1;
while (pos < active_nr) {
diff --git a/name-hash.c b/name-hash.c
index 0031d78..c6b6a3f 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -32,6 +32,42 @@ static unsigned int hash_name(const char *name, int namelen)
return hash;
}
+static void hash_index_entry_directories(struct index_state *istate, struct cache_entry *ce)
+{
+ /*
+ * Throw each directory component in the hash for quick lookup
+ * during a git status. Directory components are stored with their
+ * closing slash. Despite submodules being a directory, they never
+ * reach this point, because they are stored without a closing slash
+ * in the cache.
+ *
+ * Note that the cache_entry stored with the directory does not
+ * represent the directory itself. It is a pointer to an existing
+ * filename, and its only purpose is to represent existence of the
+ * directory in the cache. It is very possible multiple directory
+ * hash entries may point to the same cache_entry.
+ */
+ unsigned int hash;
+ void **pos;
+
+ const char *ptr = ce->name;
+ while (*ptr) {
+ while (*ptr && *ptr != '/')
+ ++ptr;
+ if (*ptr == '/') {
+ ++ptr;
+ hash = hash_name(ce->name, ptr - ce->name);
+ if (!lookup_hash(hash, &istate->name_hash)) {
+ pos = insert_hash(hash, ce, &istate->name_hash);
+ if (pos) {
+ ce->next = *pos;
+ *pos = ce;
+ }
+ }
+ }
+ }
+}
+
static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
{
void **pos;
@@ -47,6 +83,9 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
ce->next = *pos;
*pos = ce;
}
+
+ if (ignore_case)
+ hash_index_entry_directories(istate, ce);
}
static void lazy_init_name_hash(struct index_state *istate)
@@ -97,7 +136,21 @@ static int same_name(const struct cache_entry *ce, const char *name, int namelen
if (len == namelen && !cache_name_compare(name, namelen, ce->name, len))
return 1;
- return icase && slow_same_name(name, namelen, ce->name, len);
+ if (!icase)
+ return 0;
+
+ /*
+ * If the entry we're comparing is a filename (no trailing slash), then compare
+ * the lengths exactly.
+ */
+ if (name[namelen - 1] != '/')
+ return slow_same_name(name, namelen, ce->name, len);
+
+ /*
+ * For a directory, we point to an arbitrary cache_entry filename. Just
+ * make sure the directory portion matches.
+ */
+ return slow_same_name(name, namelen, ce->name, namelen < len ? namelen : len);
}
struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int icase)
@@ -115,5 +168,22 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na
}
ce = ce->next;
}
+
+ /*
+ * Might be a submodule. Despite submodules being directories,
+ * they are stored in the name hash without a closing slash.
+ * When ignore_case is 1, directories are stored in the name hash
+ * with their closing slash.
+ *
+ * The side effect of this storage technique is we have need to
+ * remove the slash from name and perform the lookup again without
+ * the slash. If a match is made, S_ISGITLINK(ce->mode) will be
+ * true.
+ */
+ if (icase && name[namelen - 1] == '/') {
+ ce = index_name_exists(istate, name, namelen - 1, icase);
+ if (ce && S_ISGITLINK(ce->ce_mode))
+ return ce;
+ }
return NULL;
}
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 4/6] Add case insensitivity support when using git ls-files
2010-10-03 4:32 [PATCH v2 0/6] Extensions of core.ignorecase=true support Joshua Jensen
` (2 preceding siblings ...)
2010-10-03 4:32 ` [PATCH v2 3/6] Add case insensitivity support for directories when using git status Joshua Jensen
@ 2010-10-03 4:32 ` Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 5/6] Support case folding for git add when core.ignorecase=true Joshua Jensen
` (3 subsequent siblings)
7 siblings, 0 replies; 45+ messages in thread
From: Joshua Jensen @ 2010-10-03 4:32 UTC (permalink / raw)
To: git; +Cc: j6t
When mydir/filea.txt is added, mydir/ is renamed to MyDir/, and
MyDir/fileb.txt is added, running git ls-files mydir only shows
mydir/filea.txt. Running git ls-files MyDir shows MyDir/fileb.txt.
Running git ls-files mYdIR shows nothing.
With this patch running git ls-files for mydir, MyDir, and mYdIR shows
mydir/filea.txt and MyDir/fileb.txt.
Wildcards are not handled case insensitively in this patch. Example:
MyDir/aBc/file.txt is added. git ls-files MyDir/a* works fine, but git
ls-files mydir/a* does not.
Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
---
dir.c | 38 ++++++++++++++++++++++++++------------
1 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/dir.c b/dir.c
index 86768fb..6e2505d 100644
--- a/dir.c
+++ b/dir.c
@@ -153,16 +153,30 @@ static int match_one(const char *match, const char *name, int namelen)
if (!*match)
return MATCHED_RECURSIVELY;
- for (;;) {
- unsigned char c1 = *match;
- unsigned char c2 = *name;
- if (c1 == '\0' || is_glob_special(c1))
- break;
- if (c1 != c2)
- return 0;
- match++;
- name++;
- namelen--;
+ if (ignore_case) {
+ for (;;) {
+ unsigned char c1 = tolower(*match);
+ unsigned char c2 = tolower(*name);
+ if (c1 == '\0' || is_glob_special(c1))
+ break;
+ if (c1 != c2)
+ return 0;
+ match++;
+ name++;
+ namelen--;
+ }
+ } else {
+ for (;;) {
+ unsigned char c1 = *match;
+ unsigned char c2 = *name;
+ if (c1 == '\0' || is_glob_special(c1))
+ break;
+ if (c1 != c2)
+ return 0;
+ match++;
+ name++;
+ namelen--;
+ }
}
@@ -171,8 +185,8 @@ static int match_one(const char *match, const char *name, int namelen)
* we need to match by fnmatch
*/
matchlen = strlen(match);
- if (strncmp(match, name, matchlen))
- return !fnmatch(match, name, 0) ? MATCHED_FNMATCH : 0;
+ if (strncmp_icase(match, name, matchlen))
+ return !fnmatch_icase(match, name, 0) ? MATCHED_FNMATCH : 0;
if (namelen == matchlen)
return MATCHED_EXACTLY;
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 5/6] Support case folding for git add when core.ignorecase=true
2010-10-03 4:32 [PATCH v2 0/6] Extensions of core.ignorecase=true support Joshua Jensen
` (3 preceding siblings ...)
2010-10-03 4:32 ` [PATCH v2 4/6] Add case insensitivity support when using git ls-files Joshua Jensen
@ 2010-10-03 4:32 ` Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 6/6] Support case folding in git fast-import " Joshua Jensen
` (2 subsequent siblings)
7 siblings, 0 replies; 45+ messages in thread
From: Joshua Jensen @ 2010-10-03 4:32 UTC (permalink / raw)
To: git; +Cc: j6t
When MyDir/ABC/filea.txt is added to Git, the disk directory MyDir/ABC/
is renamed to mydir/aBc/, and then mydir/aBc/fileb.txt is added, the
index will contain MyDir/ABC/filea.txt and mydir/aBc/fileb.txt. Although
the earlier portions of this patch series account for those differences
in case, this patch makes the pathing consistent by folding the case of
newly added files against the first file added with that path.
In read-cache.c's add_to_index(), the index_name_exists() support used
for git status's case insensitive directory lookups is used to find the
proper directory case according to what the user already checked in.
That is, MyDir/ABC/'s case is used to alter the stored path for
fileb.txt to MyDir/ABC/fileb.txt (instead of mydir/aBc/fileb.txt).
This is especially important when cloning a repository to a case
sensitive file system. MyDir/ABC/ and mydir/aBc/ exist in the same
directory on a Windows machine, but on Linux, the files exist in two
separate directories. The update to add_to_index(), in effect, treats a
Windows file system as case sensitive by making path case consistent.
Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
---
read-cache.c | 23 +++++++++++++++++++++++
1 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/read-cache.c b/read-cache.c
index 1f42473..379862c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -608,6 +608,29 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
ce->ce_mode = ce_mode_from_stat(ent, st_mode);
}
+ /* When core.ignorecase=true, determine if a directory of the same name but differing
+ * case already exists within the Git repository. If it does, ensure the directory
+ * case of the file being added to the repository matches (is folded into) the existing
+ * entry's directory case.
+ */
+ if (ignore_case) {
+ const char *startPtr = ce->name;
+ const char *ptr = startPtr;
+ while (*ptr) {
+ while (*ptr && *ptr != '/')
+ ++ptr;
+ if (*ptr == '/') {
+ struct cache_entry *foundce;
+ ++ptr;
+ foundce = index_name_exists(&the_index, ce->name, ptr - ce->name, ignore_case);
+ if (foundce) {
+ memcpy((void*)startPtr, foundce->name + (startPtr - ce->name), ptr - startPtr);
+ startPtr = ptr;
+ }
+ }
+ }
+ }
+
alias = index_name_exists(istate, ce->name, ce_namelen(ce), ignore_case);
if (alias && !ce_stage(alias) && !ie_match_stat(istate, alias, st, ce_option)) {
/* Nothing changed, really */
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 6/6] Support case folding in git fast-import when core.ignorecase=true
2010-10-03 4:32 [PATCH v2 0/6] Extensions of core.ignorecase=true support Joshua Jensen
` (4 preceding siblings ...)
2010-10-03 4:32 ` [PATCH v2 5/6] Support case folding for git add when core.ignorecase=true Joshua Jensen
@ 2010-10-03 4:32 ` Joshua Jensen
2010-10-03 13:00 ` Sverre Rabbelier
2010-10-03 8:17 ` [PATCH v2 0/6] Extensions of core.ignorecase=true support Johannes Sixt
2010-10-03 11:48 ` Robert Buck
7 siblings, 1 reply; 45+ messages in thread
From: Joshua Jensen @ 2010-10-03 4:32 UTC (permalink / raw)
To: git; +Cc: j6t
When core.ignorecase=true, imported file paths will be folded to match
existing directory case.
Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
---
fast-import.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/fast-import.c b/fast-import.c
index 2317b0f..e214048 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -156,6 +156,7 @@ Format of STDIN stream:
#include "csum-file.h"
#include "quote.h"
#include "exec_cmd.h"
+#include "dir.h"
#define PACK_ID_BITS 16
#define MAX_PACK_ID ((1<<PACK_ID_BITS)-1)
@@ -1461,7 +1462,7 @@ static int tree_content_set(
for (i = 0; i < t->entry_count; i++) {
e = t->entries[i];
- if (e->name->str_len == n && !strncmp(p, e->name->str_dat, n)) {
+ if (e->name->str_len == n && !strncmp_icase(p, e->name->str_dat, n)) {
if (!slash1) {
if (!S_ISDIR(mode)
&& e->versions[1].mode == mode
@@ -1527,7 +1528,7 @@ static int tree_content_remove(
for (i = 0; i < t->entry_count; i++) {
e = t->entries[i];
- if (e->name->str_len == n && !strncmp(p, e->name->str_dat, n)) {
+ if (e->name->str_len == n && !strncmp_icase(p, e->name->str_dat, n)) {
if (slash1 && !S_ISDIR(e->versions[1].mode))
/*
* If p names a file in some subdirectory, and a
@@ -1585,7 +1586,7 @@ static int tree_content_get(
for (i = 0; i < t->entry_count; i++) {
e = t->entries[i];
- if (e->name->str_len == n && !strncmp(p, e->name->str_dat, n)) {
+ if (e->name->str_len == n && !strncmp_icase(p, e->name->str_dat, n)) {
if (!slash1) {
memcpy(leaf, e, sizeof(*leaf));
if (e->tree && is_null_sha1(e->versions[1].sha1))
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH v2 0/6] Extensions of core.ignorecase=true support
2010-10-03 4:32 [PATCH v2 0/6] Extensions of core.ignorecase=true support Joshua Jensen
` (5 preceding siblings ...)
2010-10-03 4:32 ` [PATCH v2 6/6] Support case folding in git fast-import " Joshua Jensen
@ 2010-10-03 8:17 ` Johannes Sixt
2010-10-03 23:34 ` Junio C Hamano
2010-10-03 11:48 ` Robert Buck
7 siblings, 1 reply; 45+ messages in thread
From: Johannes Sixt @ 2010-10-03 8:17 UTC (permalink / raw)
To: Joshua Jensen; +Cc: git, Junio C Hamano
Thank you for the resend.
On Sonntag, 3. Oktober 2010, Joshua Jensen wrote:
> git status and add both use an update made to name-hash.c where
> directories, specifically names with a trailing slash, can be looked up
> in a case insensitive manner. After trying a myriad of solutions, this
> seemed to be the cleanest. Does anyone see a problem with embedding the
> directory names in the same hash as the file names? I couldn't find one,
> especially since I append a slash to each directory name.
>
> The git add path case folding functionality is a somewhat radical
> departure from what Git does now. It is described in detail in patch 5.
> Does anyone have any concerns?
Since I'm not an expert in the area that is touched by this series, I'd like
to draw the list's attention to the questions in these two paragraphs.
Junio, IIRC, the series appeared in next for some time before the 1.7.3
release. Does this imply that you reviewed the series and deemed the
implementation sound?
-- Hannes
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable.
2010-10-03 4:32 ` [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable Joshua Jensen
@ 2010-10-03 8:30 ` Ævar Arnfjörð Bjarmason
2010-10-03 9:07 ` Joshua Jensen
0 siblings, 1 reply; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-03 8:30 UTC (permalink / raw)
To: Joshua Jensen; +Cc: git, j6t
On Sun, Oct 3, 2010 at 04:32, Joshua Jensen <jjensen@workspacewhiz.com> wrote:
> diff --git a/dir.c b/dir.c
> index d1e5e5e..ffa410d 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -18,6 +18,68 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, in
> int check_only, const struct path_simplify *simplify);
> static int get_dtype(struct dirent *de, const char *path, int len);
>
> +/* helper string functions with support for the ignore_case flag */
> +int strcmp_icase(const char *a, const char *b)
> +{
> + return ignore_case ? strcasecmp(a, b) : strcmp(a, b);
> +}
> +
> +int strncmp_icase(const char *a, const char *b, size_t count)
> +{
> + return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
> +}
> +
> +int fnmatch_casefold(const char *pattern, const char *string, int flags)
> +{
> + char lowerPatternBuf[MAX_PATH];
> + char lowerStringBuf[MAX_PATH];
> + char* lowerPattern;
> + char* lowerString;
> + size_t patternLen;
> + size_t stringLen;
> + char* out;
> + int ret;
> +
> + /*
> + * Use the provided stack buffer, if possible. If the string is too
> + * large, allocate buffer space.
> + */
> + patternLen = strlen(pattern);
> + if (patternLen + 1 > sizeof(lowerPatternBuf))
> + lowerPattern = xmalloc(patternLen + 1);
> + else
> + lowerPattern = lowerPatternBuf;
> +
> + stringLen = strlen(string);
> + if (stringLen + 1 > sizeof(lowerStringBuf))
> + lowerString = xmalloc(stringLen + 1);
> + else
> + lowerString = lowerStringBuf;
> +
> + /* Make the pattern and string lowercase to pass to fnmatch. */
> + for (out = lowerPattern; *pattern; ++out, ++pattern)
> + *out = tolower(*pattern);
> + *out = 0;
> +
> + for (out = lowerString; *string; ++out, ++string)
> + *out = tolower(*string);
> + *out = 0;
> +
> + ret = fnmatch(lowerPattern, lowerString, flags);
> +
> + /* Free the pattern or string if it was allocated. */
> + if (lowerPattern != lowerPatternBuf)
> + free(lowerPattern);
> + if (lowerString != lowerStringBuf)
> + free(lowerString);
> + return ret;
> +}
> +
> +int fnmatch_icase(const char *pattern, const char *string, int flags)
> +{
> + return ignore_case ? fnmatch_casefold(pattern, string, flags) : fnmatch(pattern, string, flags);
> +}
I liked v1 of this patch better, although it obviously had portability
issues. But I think it would be better to handle this with:
#ifndef FNM_CASEFOLD
int fnmatch_casefold(const char *pattern, const char *string, int flags)
{
...
}
#endf
int fnmatch_icase(const char *pattern, const char *string, int flags)
{
#ifndef FNM_CASEFOLD
return ignore_case ? fnmatch_casefold(pattern, string,
flags) : fnmatch(pattern, string, flags);
#else
return fnmatch(pattern, string, flags | (ignore_case ?
FNM_CASEFOLD : 0));
#endif
}
Or simply use fnmatch(..., FNM_CASEFOLD) everywhere and include
compat/fnmatch/* on platforms like Solaris that don't have the GNU
extension.
That would allow the GNU libc, FreeBSD libc and others that implement
the GNU extension to do case folding for us, and we wouldn't have to
maintain our own fnmatch_casefold.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable.
2010-10-03 8:30 ` Ævar Arnfjörð Bjarmason
@ 2010-10-03 9:07 ` Joshua Jensen
2010-10-03 9:56 ` [PATCH/RFC v3 0/8] ab/icase-directory: jj/icase-directory with Makefile + configure checks Ævar Arnfjörð Bjarmason
` (9 more replies)
0 siblings, 10 replies; 45+ messages in thread
From: Joshua Jensen @ 2010-10-03 9:07 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, j6t
----- Original Message -----
From: Ævar Arnfjörð Bjarmason
Date: 10/3/2010 2:30 AM
> On Sun, Oct 3, 2010 at 04:32, Joshua Jensen<jjensen@workspacewhiz.com> wrote
>> +int fnmatch_casefold(const char *pattern, const char *string, int flags)
>> +{
>> + char lowerPatternBuf[MAX_PATH];
>> + char lowerStringBuf[MAX_PATH];
>> + char* lowerPattern;
>> + char* lowerString;
>> + size_t patternLen;
>> + size_t stringLen;
>> + char* out;
>> + int ret;
>> +
>> + /*
>> + * Use the provided stack buffer, if possible. If the string is too
>> + * large, allocate buffer space.
>> + */
>> + patternLen = strlen(pattern);
>> + if (patternLen + 1> sizeof(lowerPatternBuf))
>> + lowerPattern = xmalloc(patternLen + 1);
>> + else
>> + lowerPattern = lowerPatternBuf;
>> +
>> + stringLen = strlen(string);
>> + if (stringLen + 1> sizeof(lowerStringBuf))
>> + lowerString = xmalloc(stringLen + 1);
>> + else
>> + lowerString = lowerStringBuf;
>> +
>> + /* Make the pattern and string lowercase to pass to fnmatch. */
>> + for (out = lowerPattern; *pattern; ++out, ++pattern)
>> + *out = tolower(*pattern);
>> + *out = 0;
>> +
>> + for (out = lowerString; *string; ++out, ++string)
>> + *out = tolower(*string);
>> + *out = 0;
>> +
>> + ret = fnmatch(lowerPattern, lowerString, flags);
>> +
>> + /* Free the pattern or string if it was allocated. */
>> + if (lowerPattern != lowerPatternBuf)
>> + free(lowerPattern);
>> + if (lowerString != lowerStringBuf)
>> + free(lowerString);
>> + return ret;
>> +}
>> +
>> +int fnmatch_icase(const char *pattern, const char *string, int flags)
>> +{
>> + return ignore_case ? fnmatch_casefold(pattern, string, flags) : fnmatch(pattern, string, flags);
>> +}
>
> I liked v1 of this patch better, although it obviously had portability
> issues. But I think it would be better to handle this with:
>
> #ifndef FNM_CASEFOLD
> int fnmatch_casefold(const char *pattern, const char *string, int flags)
> {
> ...
> }
> #endf
>
> int fnmatch_icase(const char *pattern, const char *string, int flags)
> {
> #ifndef FNM_CASEFOLD
> return ignore_case ? fnmatch_casefold(pattern, string,
> flags) : fnmatch(pattern, string, flags);
> #else
> return fnmatch(pattern, string, flags | (ignore_case ?
> FNM_CASEFOLD : 0));
> #endif
> }
>
> Or simply use fnmatch(..., FNM_CASEFOLD) everywhere and include
> compat/fnmatch/* on platforms like Solaris that don't have the GNU
> extension.
The real problem with compat/fnmatch is determining which random
platforms need that support and updating the makefile accordingly.
Further, the compat/fnmatch/* code would need to be rejigged somewhat,
so there is no possible conflict (now or in the future) with the
provided symbols. We discussed this as a potential problem developers
would need to be aware of if the system fnmatch.h (or whatever it is
called) gets #included.
Anyway, what you describe above creates two code paths. I would imagine
that would be harder to debug; that is, on some platforms, it uses
fnmatch_casefold and on others, it hands it off to fnmatch(...,
FNM_CASEFOLD).
In any case, I'd like to find a solution to get this series working for
everyone. I've been out of commission for a month (deploying Git to 80+
programmers at an organization, by the way), but I'm back now and can
work this until it is complete.
Josh
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH/RFC v3 0/8] ab/icase-directory: jj/icase-directory with Makefile + configure checks
2010-10-03 9:07 ` Joshua Jensen
@ 2010-10-03 9:56 ` Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 1/8] Makefile & configure: add a NO_FNMATCH flag Ævar Arnfjörð Bjarmason
` (8 subsequent siblings)
9 siblings, 0 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-03 9:56 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Joshua Jensen, Johannes Sixt, Brandon Casey,
Ævar Arnfjörð Bjarmason
On Sun, Oct 3, 2010 at 09:07, Joshua Jensen <jjensen@workspacewhiz.com> wrote:
> ----- Original Message -----
> From: Ævar Arnfjörð Bjarmason
> Date: 10/3/2010 2:30 AM
>>
>> On Sun, Oct 3, 2010 at 04:32, Joshua Jensen<jjensen@workspacewhiz.com>
>> wrote
>>>
>>> +int fnmatch_casefold(const char *pattern, const char *string, int flags)
>>> +{
>>> + char lowerPatternBuf[MAX_PATH];
>>> + char lowerStringBuf[MAX_PATH];
>>> + char* lowerPattern;
>>> + char* lowerString;
>>> + size_t patternLen;
>>> + size_t stringLen;
>>> + char* out;
>>> + int ret;
>>> +
>>> + /*
>>> + * Use the provided stack buffer, if possible. If the string is
>>> too
>>> + * large, allocate buffer space.
>>> + */
>>> + patternLen = strlen(pattern);
>>> + if (patternLen + 1> sizeof(lowerPatternBuf))
>>> + lowerPattern = xmalloc(patternLen + 1);
>>> + else
>>> + lowerPattern = lowerPatternBuf;
>>> +
>>> + stringLen = strlen(string);
>>> + if (stringLen + 1> sizeof(lowerStringBuf))
>>> + lowerString = xmalloc(stringLen + 1);
>>> + else
>>> + lowerString = lowerStringBuf;
>>> +
>>> + /* Make the pattern and string lowercase to pass to fnmatch. */
>>> + for (out = lowerPattern; *pattern; ++out, ++pattern)
>>> + *out = tolower(*pattern);
>>> + *out = 0;
>>> +
>>> + for (out = lowerString; *string; ++out, ++string)
>>> + *out = tolower(*string);
>>> + *out = 0;
>>> +
>>> + ret = fnmatch(lowerPattern, lowerString, flags);
>>> +
>>> + /* Free the pattern or string if it was allocated. */
>>> + if (lowerPattern != lowerPatternBuf)
>>> + free(lowerPattern);
>>> + if (lowerString != lowerStringBuf)
>>> + free(lowerString);
>>> + return ret;
>>> +}
>>> +
>>> +int fnmatch_icase(const char *pattern, const char *string, int flags)
>>> +{
>>> + return ignore_case ? fnmatch_casefold(pattern, string, flags) :
>>> fnmatch(pattern, string, flags);
>>> +}
>>
>> I liked v1 of this patch better, although it obviously had portability
>> issues. But I think it would be better to handle this with:
>>
>> #ifndef FNM_CASEFOLD
>> int fnmatch_casefold(const char *pattern, const char *string, int
>> flags)
>> {
>> ...
>> }
>> #endf
>>
>> int fnmatch_icase(const char *pattern, const char *string, int flags)
>> {
>> #ifndef FNM_CASEFOLD
>> return ignore_case ? fnmatch_casefold(pattern, string,
>> flags) : fnmatch(pattern, string, flags);
>> #else
>> return fnmatch(pattern, string, flags | (ignore_case ?
>> FNM_CASEFOLD : 0));
>> #endif
>> }
>>
>> Or simply use fnmatch(..., FNM_CASEFOLD) everywhere and include
>> compat/fnmatch/* on platforms like Solaris that don't have the GNU
>> extension.
I offered before to help with making this portable, so I've gone ahead
and done it. This series is like your v1, but it has two of my patches
at the front to add Makefile & configure checks & fallbacks for
fnmatch if the function either doesn't exist, or it doesn't support
the FNM_CASEFOLD flag.
> The real problem with compat/fnmatch is determining which random platforms
> need that support and updating the makefile accordingly.
We already do this for a bunch of NO_WHATEVER= flags. Adding one more
isn't going to be too hard to maintain.
> Further, the compat/fnmatch/* code would need to be rejigged
> somewhat, so there is no possible conflict (now or in the future)
> with the provided symbols. We discussed this as a potential problem
> developers would need to be aware of if the system fnmatch.h (or
> whatever it is called) gets #included.
Since we do -Icompat/fnmatch it's going to be our fnmatch.h that's
picked up, so we aren't going to get a symbol conflict I should think.
> Anyway, what you describe above creates two code paths. I would imagine
> that would be harder to debug; that is, on some platforms, it uses
> fnmatch_casefold and on others, it hands it off to fnmatch(...,
> FNM_CASEFOLD).
My ad-hoc example pseudocode created two codepaths, but this version
doesn't.
> In any case, I'd like to find a solution to get this series working for
> everyone. I've been out of commission for a month (deploying Git to 80+
> programmers at an organization, by the way), but I'm back now and can work
> this until it is complete.
This is all from your v1:
Joshua Jensen (6):
Add string comparison functions that respect the ignore_case
variable.
Case insensitivity support for .gitignore via core.ignorecase
Add case insensitivity support for directories when using git status
Add case insensitivity support when using git ls-files
Support case folding for git add when core.ignorecase=true
Support case folding in git fast-import when core.ignorecase=true
These two are new:
Ævar Arnfjörð Bjarmason (2):
Makefile & configure: add a NO_FNMATCH flag
This one is a good idea in general. We shouldn't be duplicating setup
code in both the Windows and MinGW portions of the Makefile, and if we
ever get another odd system that doesn't have fnmatch() this will make
things just work there.
Needs testing from someone with Windows.
Makefile & configure: add a NO_FNMATCH_CASEFOLD flag
The code needed to make Joshua's code portable. On Solaris this
returns with the configure script:
$ make configure && ./configure | grep -i -e fnmatch && grep -i -e fnmatch config.mak.autogen
GEN configure
checking for fnmatch... yes
checking for library containing fnmatch... none required
checking whether the fnmatch function supports the FNMATCH_CASEFOLD GNU extension... no
NO_FNMATCH=
NO_FNMATCH_CASEFOLD=YesPlease
And on Linux:
$ make configure && ./configure | grep -i -e fnmatch && grep -i -e fnmatch config.mak.autogen
GEN configure
checking for fnmatch... yes
checking for library containing fnmatch... none required
checking whether the fnmatch function supports the FNMATCH_CASEFOLD GNU extension... yes
NO_FNMATCH=
NO_FNMATCH_CASEFOLD=
Makefile | 27 ++++++++++++---
config.mak.in | 2 +
configure.ac | 28 +++++++++++++++
dir.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++----------
dir.h | 4 ++
fast-import.c | 7 ++--
name-hash.c | 72 ++++++++++++++++++++++++++++++++++++++-
read-cache.c | 23 ++++++++++++
8 files changed, 241 insertions(+), 28 deletions(-)
--
1.7.3.159.g610493
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH/RFC v3 1/8] Makefile & configure: add a NO_FNMATCH flag
2010-10-03 9:07 ` Joshua Jensen
2010-10-03 9:56 ` [PATCH/RFC v3 0/8] ab/icase-directory: jj/icase-directory with Makefile + configure checks Ævar Arnfjörð Bjarmason
@ 2010-10-03 9:56 ` Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 2/8] Makefile & configure: add a NO_FNMATCH_CASEFOLD flag Ævar Arnfjörð Bjarmason
` (7 subsequent siblings)
9 siblings, 0 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-03 9:56 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Joshua Jensen, Johannes Sixt, Brandon Casey,
Ævar Arnfjörð Bjarmason
Windows and MinGW both lack fnmatch() in their C library and needed
compat/fnmatch, but they had duplicate code for adding the compat
function, and there was no Makefile flag or configure check for
fnmatch.
Change the Makefile it so that it's now possible to compile the compat
function with a NO_FNMATCH=YesPlease flag, and add a configure probe
for it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
Makefile | 18 +++++++++++++-----
config.mak.in | 1 +
configure.ac | 6 ++++++
3 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/Makefile b/Makefile
index 8a56b9a..f7c4383 100644
--- a/Makefile
+++ b/Makefile
@@ -70,6 +70,8 @@ all::
#
# Define NO_STRTOK_R if you don't have strtok_r in the C library.
#
+# Define NO_FNMATCH if you don't have fnmatch in the C library.
+#
# Define NO_LIBGEN_H if you don't have libgen.h.
#
# Define NEEDS_LIBGEN if your libgen needs -lgen when linking
@@ -1052,6 +1054,7 @@ ifeq ($(uname_S),Windows)
NO_STRCASESTR = YesPlease
NO_STRLCPY = YesPlease
NO_STRTOK_R = YesPlease
+ NO_FNMATCH = YesPlease
NO_MEMMEM = YesPlease
# NEEDS_LIBICONV = YesPlease
NO_ICONV = YesPlease
@@ -1081,8 +1084,8 @@ ifeq ($(uname_S),Windows)
AR = compat/vcbuild/scripts/lib.pl
CFLAGS =
BASIC_CFLAGS = -nologo -I. -I../zlib -Icompat/vcbuild -Icompat/vcbuild/include -DWIN32 -D_CONSOLE -DHAVE_STRING_H -D_CRT_SECURE_NO_WARNINGS -D_CRT_NONSTDC_NO_DEPRECATE
- COMPAT_OBJS = compat/msvc.o compat/fnmatch/fnmatch.o compat/winansi.o compat/win32/pthread.o
- COMPAT_CFLAGS = -D__USE_MINGW_ACCESS -DNOGDI -DHAVE_STRING_H -DHAVE_ALLOCA_H -Icompat -Icompat/fnmatch -Icompat/regex -Icompat/fnmatch -Icompat/win32 -DSTRIP_EXTENSION=\".exe\"
+ COMPAT_OBJS = compat/msvc.o compat/winansi.o compat/win32/pthread.o
+ COMPAT_CFLAGS = -D__USE_MINGW_ACCESS -DNOGDI -DHAVE_STRING_H -DHAVE_ALLOCA_H -Icompat -Icompat/regex -Icompat/win32 -DSTRIP_EXTENSION=\".exe\"
BASIC_LDFLAGS = -IGNORE:4217 -IGNORE:4049 -NOLOGO -SUBSYSTEM:CONSOLE -NODEFAULTLIB:MSVCRT.lib
EXTLIBS = advapi32.lib shell32.lib wininet.lib ws2_32.lib
PTHREAD_LIBS =
@@ -1107,6 +1110,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
NO_STRCASESTR = YesPlease
NO_STRLCPY = YesPlease
NO_STRTOK_R = YesPlease
+ NO_FNMATCH = YesPlease
NO_MEMMEM = YesPlease
NEEDS_LIBICONV = YesPlease
OLD_ICONV = YesPlease
@@ -1128,10 +1132,9 @@ ifneq (,$(findstring MINGW,$(uname_S)))
NO_PYTHON = YesPlease
BLK_SHA1 = YesPlease
ETAGS_TARGET = ETAGS
- COMPAT_CFLAGS += -D__USE_MINGW_ACCESS -DNOGDI -Icompat -Icompat/fnmatch -Icompat/win32
+ COMPAT_CFLAGS += -D__USE_MINGW_ACCESS -DNOGDI -Icompat -Icompat/win32
COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
- COMPAT_OBJS += compat/mingw.o compat/fnmatch/fnmatch.o compat/winansi.o \
- compat/win32/pthread.o
+ COMPAT_OBJS += compat/mingw.o compat/winansi.o compat/win32/pthread.o
EXTLIBS += -lws2_32
PTHREAD_LIBS =
X = .exe
@@ -1342,6 +1345,11 @@ ifdef NO_STRTOK_R
COMPAT_CFLAGS += -DNO_STRTOK_R
COMPAT_OBJS += compat/strtok_r.o
endif
+ifdef NO_FNMATCH
+ COMPAT_CFLAGS += -Icompat/fnmatch
+ COMPAT_CFLAGS += -DNO_FNMATCH
+ COMPAT_OBJS += compat/fnmatch/fnmatch.o
+endif
ifdef NO_SETENV
COMPAT_CFLAGS += -DNO_SETENV
COMPAT_OBJS += compat/setenv.o
diff --git a/config.mak.in b/config.mak.in
index a0c34ee..aaa70a8 100644
--- a/config.mak.in
+++ b/config.mak.in
@@ -47,6 +47,7 @@ NO_C99_FORMAT=@NO_C99_FORMAT@
NO_HSTRERROR=@NO_HSTRERROR@
NO_STRCASESTR=@NO_STRCASESTR@
NO_STRTOK_R=@NO_STRTOK_R@
+NO_FNMATCH=@NO_FNMATCH@
NO_MEMMEM=@NO_MEMMEM@
NO_STRLCPY=@NO_STRLCPY@
NO_UINTMAX_T=@NO_UINTMAX_T@
diff --git a/configure.ac b/configure.ac
index cc55b6d..7715f6c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -818,6 +818,12 @@ GIT_CHECK_FUNC(strtok_r,
[NO_STRTOK_R=YesPlease])
AC_SUBST(NO_STRTOK_R)
#
+# Define NO_FNMATCH if you don't have fnmatch
+GIT_CHECK_FUNC(fnmatch,
+[NO_FNMATCH=],
+[NO_FNMATCH=YesPlease])
+AC_SUBST(NO_FNMATCH)
+#
# Define NO_MEMMEM if you don't have memmem.
GIT_CHECK_FUNC(memmem,
[NO_MEMMEM=],
--
1.7.3.159.g610493
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH/RFC v3 2/8] Makefile & configure: add a NO_FNMATCH_CASEFOLD flag
2010-10-03 9:07 ` Joshua Jensen
2010-10-03 9:56 ` [PATCH/RFC v3 0/8] ab/icase-directory: jj/icase-directory with Makefile + configure checks Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 1/8] Makefile & configure: add a NO_FNMATCH flag Ævar Arnfjörð Bjarmason
@ 2010-10-03 9:56 ` Ævar Arnfjörð Bjarmason
2010-10-03 17:58 ` Johannes Sixt
2010-10-03 9:56 ` [PATCH/RFC v3 3/8] Add string comparison functions that respect the ignore_case variable Ævar Arnfjörð Bjarmason
` (6 subsequent siblings)
9 siblings, 1 reply; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-03 9:56 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Joshua Jensen, Johannes Sixt, Brandon Casey,
Ævar Arnfjörð Bjarmason
On some platforms (like Solaris) there is a fnmatch, but it doesn't
support the GNU FNM_CASEFOLD extension that's used by the
jj/icase-directory series' fnmatch_icase wrapper.
Change the Makefile so that it's now possible to set
NO_FNMATCH_CASEFOLD=YesPlease on those systems, and add a configure
probe for it.
Unlike the NO_REGEX check we don't add AC_INCLUDES_DEFAULT to our
headers. This is because on a GNU system the definition of
FNM_CASEFOLD in fnmatch.h is guarded by:
#if !defined _POSIX_C_SOURCE || _POSIX_C_SOURCE < 2 || defined _GNU_SOURCE
One of the headers AC_INCLUDES_DEFAULT includes ends up defining one
of those, so if we'd use it we'd always get
NO_FNMATCH_CASEFOLD=YesPlease on GNU systems, even though they have
FNM_CASEFOLD.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
Makefile | 9 +++++++++
config.mak.in | 1 +
configure.ac | 22 ++++++++++++++++++++++
3 files changed, 32 insertions(+), 0 deletions(-)
diff --git a/Makefile b/Makefile
index f7c4383..5fe0523 100644
--- a/Makefile
+++ b/Makefile
@@ -72,6 +72,9 @@ all::
#
# Define NO_FNMATCH if you don't have fnmatch in the C library.
#
+# Define NO_FNMATCH_CASEFOLD if your fnmatch function doesn't have the
+# FNM_CASEFOLD GNU extension.
+#
# Define NO_LIBGEN_H if you don't have libgen.h.
#
# Define NEEDS_LIBGEN if your libgen needs -lgen when linking
@@ -848,6 +851,7 @@ ifeq ($(uname_S),SunOS)
NO_MKDTEMP = YesPlease
NO_MKSTEMPS = YesPlease
NO_REGEX = YesPlease
+ NO_FNMATCH_CASEFOLD = YesPlease
ifeq ($(uname_R),5.6)
SOCKLEN_T = int
NO_HSTRERROR = YesPlease
@@ -1350,6 +1354,11 @@ ifdef NO_FNMATCH
COMPAT_CFLAGS += -DNO_FNMATCH
COMPAT_OBJS += compat/fnmatch/fnmatch.o
endif
+ifdef NO_FNMATCH_CASEFOLD
+ COMPAT_CFLAGS += -Icompat/fnmatch
+ COMPAT_CFLAGS += -DNO_FNMATCH_CASEFOLD
+ COMPAT_OBJS += compat/fnmatch/fnmatch.o
+endif
ifdef NO_SETENV
COMPAT_CFLAGS += -DNO_SETENV
COMPAT_OBJS += compat/setenv.o
diff --git a/config.mak.in b/config.mak.in
index aaa70a8..56343ba 100644
--- a/config.mak.in
+++ b/config.mak.in
@@ -48,6 +48,7 @@ NO_HSTRERROR=@NO_HSTRERROR@
NO_STRCASESTR=@NO_STRCASESTR@
NO_STRTOK_R=@NO_STRTOK_R@
NO_FNMATCH=@NO_FNMATCH@
+NO_FNMATCH_CASEFOLD=@NO_FNMATCH_CASEFOLD@
NO_MEMMEM=@NO_MEMMEM@
NO_STRLCPY=@NO_STRLCPY@
NO_UINTMAX_T=@NO_UINTMAX_T@
diff --git a/configure.ac b/configure.ac
index 7715f6c..6dd9241 100644
--- a/configure.ac
+++ b/configure.ac
@@ -824,6 +824,28 @@ GIT_CHECK_FUNC(fnmatch,
[NO_FNMATCH=YesPlease])
AC_SUBST(NO_FNMATCH)
#
+# Define NO_FNMATCH_CASEFOLD if your fnmatch function doesn't have the
+# FNM_CASEFOLD GNU extension.
+AC_CACHE_CHECK([whether the fnmatch function supports the FNMATCH_CASEFOLD GNU extension],
+ [ac_cv_c_excellent_fnmatch], [
+AC_EGREP_CPP(yippeeyeswehaveit,
+ AC_LANG_PROGRAM([
+#include <fnmatch.h>
+],
+[#ifdef FNM_CASEFOLD
+yippeeyeswehaveit
+#endif
+]),
+ [ac_cv_c_excellent_fnmatch=yes],
+ [ac_cv_c_excellent_fnmatch=no])
+])
+if test $ac_cv_c_excellent_fnmatch = yes; then
+ NO_FNMATCH_CASEFOLD=
+else
+ NO_FNMATCH_CASEFOLD=YesPlease
+fi
+AC_SUBST(NO_FNMATCH_CASEFOLD)
+#
# Define NO_MEMMEM if you don't have memmem.
GIT_CHECK_FUNC(memmem,
[NO_MEMMEM=],
--
1.7.3.159.g610493
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH/RFC v3 3/8] Add string comparison functions that respect the ignore_case variable.
2010-10-03 9:07 ` Joshua Jensen
` (2 preceding siblings ...)
2010-10-03 9:56 ` [PATCH/RFC v3 2/8] Makefile & configure: add a NO_FNMATCH_CASEFOLD flag Ævar Arnfjörð Bjarmason
@ 2010-10-03 9:56 ` Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 4/8] Case insensitivity support for .gitignore via core.ignorecase Ævar Arnfjörð Bjarmason
` (5 subsequent siblings)
9 siblings, 0 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-03 9:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Joshua Jensen, Johannes Sixt, Brandon Casey
From: Joshua Jensen <jjensen@workspacewhiz.com>
Multiple locations within this patch series alter a case sensitive
string comparison call such as strcmp() to be a call to a string
comparison call that selects case comparison based on the global
ignore_case variable. Behaviorally, when core.ignorecase=false, the
*_icase() versions are functionally equivalent to their C runtime
counterparts. When core.ignorecase=true, the *_icase() versions perform
a case insensitive comparison.
Like Linus' earlier ignorecase patch, these may ignore filename
conventions on certain file systems. By isolating filename comparisons
to certain functions, support for those filename conventions may be more
easily met.
Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
dir.c | 16 ++++++++++++++++
dir.h | 4 ++++
2 files changed, 20 insertions(+), 0 deletions(-)
diff --git a/dir.c b/dir.c
index d1e5e5e..3432d58 100644
--- a/dir.c
+++ b/dir.c
@@ -18,6 +18,22 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, in
int check_only, const struct path_simplify *simplify);
static int get_dtype(struct dirent *de, const char *path, int len);
+/* helper string functions with support for the ignore_case flag */
+int strcmp_icase(const char *a, const char *b)
+{
+ return ignore_case ? strcasecmp(a, b) : strcmp(a, b);
+}
+
+int strncmp_icase(const char *a, const char *b, size_t count)
+{
+ return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
+}
+
+int fnmatch_icase(const char *pattern, const char *string, int flags)
+{
+ return fnmatch(pattern, string, flags | (ignore_case ? FNM_CASEFOLD : 0));
+}
+
static int common_prefix(const char **pathspec)
{
const char *path, *slash, *next;
diff --git a/dir.h b/dir.h
index 278d84c..b3e2104 100644
--- a/dir.h
+++ b/dir.h
@@ -101,4 +101,8 @@ extern int remove_dir_recursively(struct strbuf *path, int flag);
/* tries to remove the path with empty directories along it, ignores ENOENT */
extern int remove_path(const char *path);
+extern int strcmp_icase(const char *a, const char *b);
+extern int strncmp_icase(const char *a, const char *b, size_t count);
+extern int fnmatch_icase(const char *pattern, const char *string, int flags);
+
#endif
--
1.7.3.159.g610493
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH/RFC v3 4/8] Case insensitivity support for .gitignore via core.ignorecase
2010-10-03 9:07 ` Joshua Jensen
` (3 preceding siblings ...)
2010-10-03 9:56 ` [PATCH/RFC v3 3/8] Add string comparison functions that respect the ignore_case variable Ævar Arnfjörð Bjarmason
@ 2010-10-03 9:56 ` Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 5/8] Add case insensitivity support for directories when using git status Ævar Arnfjörð Bjarmason
` (4 subsequent siblings)
9 siblings, 0 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-03 9:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Joshua Jensen, Johannes Sixt, Brandon Casey
From: Joshua Jensen <jjensen@workspacewhiz.com>
This is especially beneficial when using Windows and Perforce and the
git-p4 bridge. Internally, Perforce preserves a given file's full path
including its case at the time it was added to the Perforce repository.
When syncing a file down via Perforce, missing directories are created,
if necessary, using the case as stored with the filename. Unfortunately,
two files in the same directory can have differing cases for their
respective paths, such as /diRa/file1.c and /DirA/file2.c. Depending on
sync order, DirA/ may get created instead of diRa/.
It is possible to handle directory names in a case insensitive manner
without this patch, but it is highly inconvenient, requiring each
character to be specified like so: [Bb][Uu][Ii][Ll][Dd]. With this patch, the
gitignore exclusions honor the core.ignorecase=true configuration
setting and make the process less error prone. The above is specified
like so: Build
Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
dir.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/dir.c b/dir.c
index 3432d58..02c2a82 100644
--- a/dir.c
+++ b/dir.c
@@ -390,14 +390,14 @@ int excluded_from_list(const char *pathname,
if (x->flags & EXC_FLAG_NODIR) {
/* match basename */
if (x->flags & EXC_FLAG_NOWILDCARD) {
- if (!strcmp(exclude, basename))
+ if (!strcmp_icase(exclude, basename))
return to_exclude;
} else if (x->flags & EXC_FLAG_ENDSWITH) {
if (x->patternlen - 1 <= pathlen &&
- !strcmp(exclude + 1, pathname + pathlen - x->patternlen + 1))
+ !strcmp_icase(exclude + 1, pathname + pathlen - x->patternlen + 1))
return to_exclude;
} else {
- if (fnmatch(exclude, basename, 0) == 0)
+ if (fnmatch_icase(exclude, basename, 0) == 0)
return to_exclude;
}
}
@@ -412,14 +412,14 @@ int excluded_from_list(const char *pathname,
if (pathlen < baselen ||
(baselen && pathname[baselen-1] != '/') ||
- strncmp(pathname, x->base, baselen))
+ strncmp_icase(pathname, x->base, baselen))
continue;
if (x->flags & EXC_FLAG_NOWILDCARD) {
- if (!strcmp(exclude, pathname + baselen))
+ if (!strcmp_icase(exclude, pathname + baselen))
return to_exclude;
} else {
- if (fnmatch(exclude, pathname+baselen,
+ if (fnmatch_icase(exclude, pathname+baselen,
FNM_PATHNAME) == 0)
return to_exclude;
}
--
1.7.3.159.g610493
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH/RFC v3 5/8] Add case insensitivity support for directories when using git status
2010-10-03 9:07 ` Joshua Jensen
` (4 preceding siblings ...)
2010-10-03 9:56 ` [PATCH/RFC v3 4/8] Case insensitivity support for .gitignore via core.ignorecase Ævar Arnfjörð Bjarmason
@ 2010-10-03 9:56 ` Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files Ævar Arnfjörð Bjarmason
` (3 subsequent siblings)
9 siblings, 0 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-03 9:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Joshua Jensen, Johannes Sixt, Brandon Casey
From: Joshua Jensen <jjensen@workspacewhiz.com>
When using a case preserving but case insensitive file system, directory
case can differ but still refer to the same physical directory. git
status reports the directory with the alternate case as an Untracked
file. (That is, when mydir/filea.txt is added to the repository and
then the directory on disk is renamed from mydir/ to MyDir/, git status
shows MyDir/ as being untracked.)
Support has been added in name-hash.c for hashing directories with a
terminating slash into the name hash. When index_name_exists() is called
with a directory (a name with a terminating slash), the name is not
found via the normal cache_name_compare() call, but it is found in the
slow_same_name() function.
Additionally, in dir.c, directory_exists_in_index_icase() allows newly
added directories deeper in the directory chain to be identified.
Ultimately, it would be better if the file list was read in case
insensitive alphabetical order from disk, but this change seems to
suffice for now.
The end result is the directory is looked up in a case insensitive
manner and does not show in the Untracked files list.
Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
dir.c | 40 ++++++++++++++++++++++++++++++++-
name-hash.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 110 insertions(+), 2 deletions(-)
diff --git a/dir.c b/dir.c
index 02c2a82..cf8f65c 100644
--- a/dir.c
+++ b/dir.c
@@ -485,6 +485,39 @@ enum exist_status {
};
/*
+ * Do not use the alphabetically stored index to look up
+ * the directory name; instead, use the case insensitive
+ * name hash.
+ */
+static enum exist_status directory_exists_in_index_icase(const char *dirname, int len)
+{
+ struct cache_entry *ce = index_name_exists(&the_index, dirname, len + 1, ignore_case);
+ unsigned char endchar;
+
+ if (!ce)
+ return index_nonexistent;
+ endchar = ce->name[len];
+
+ /*
+ * The cache_entry structure returned will contain this dirname
+ * and possibly additional path components.
+ */
+ if (endchar == '/')
+ return index_directory;
+
+ /*
+ * If there are no additional path components, then this cache_entry
+ * represents a submodule. Submodules, despite being directories,
+ * are stored in the cache without a closing slash.
+ */
+ if (!endchar && S_ISGITLINK(ce->ce_mode))
+ return index_gitdir;
+
+ /* This should never be hit, but it exists just in case. */
+ return index_nonexistent;
+}
+
+/*
* The index sorts alphabetically by entry name, which
* means that a gitlink sorts as '\0' at the end, while
* a directory (which is defined not as an entry, but as
@@ -493,7 +526,12 @@ enum exist_status {
*/
static enum exist_status directory_exists_in_index(const char *dirname, int len)
{
- int pos = cache_name_pos(dirname, len);
+ int pos;
+
+ if (ignore_case)
+ return directory_exists_in_index_icase(dirname, len);
+
+ pos = cache_name_pos(dirname, len);
if (pos < 0)
pos = -pos-1;
while (pos < active_nr) {
diff --git a/name-hash.c b/name-hash.c
index 0031d78..c6b6a3f 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -32,6 +32,42 @@ static unsigned int hash_name(const char *name, int namelen)
return hash;
}
+static void hash_index_entry_directories(struct index_state *istate, struct cache_entry *ce)
+{
+ /*
+ * Throw each directory component in the hash for quick lookup
+ * during a git status. Directory components are stored with their
+ * closing slash. Despite submodules being a directory, they never
+ * reach this point, because they are stored without a closing slash
+ * in the cache.
+ *
+ * Note that the cache_entry stored with the directory does not
+ * represent the directory itself. It is a pointer to an existing
+ * filename, and its only purpose is to represent existence of the
+ * directory in the cache. It is very possible multiple directory
+ * hash entries may point to the same cache_entry.
+ */
+ unsigned int hash;
+ void **pos;
+
+ const char *ptr = ce->name;
+ while (*ptr) {
+ while (*ptr && *ptr != '/')
+ ++ptr;
+ if (*ptr == '/') {
+ ++ptr;
+ hash = hash_name(ce->name, ptr - ce->name);
+ if (!lookup_hash(hash, &istate->name_hash)) {
+ pos = insert_hash(hash, ce, &istate->name_hash);
+ if (pos) {
+ ce->next = *pos;
+ *pos = ce;
+ }
+ }
+ }
+ }
+}
+
static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
{
void **pos;
@@ -47,6 +83,9 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
ce->next = *pos;
*pos = ce;
}
+
+ if (ignore_case)
+ hash_index_entry_directories(istate, ce);
}
static void lazy_init_name_hash(struct index_state *istate)
@@ -97,7 +136,21 @@ static int same_name(const struct cache_entry *ce, const char *name, int namelen
if (len == namelen && !cache_name_compare(name, namelen, ce->name, len))
return 1;
- return icase && slow_same_name(name, namelen, ce->name, len);
+ if (!icase)
+ return 0;
+
+ /*
+ * If the entry we're comparing is a filename (no trailing slash), then compare
+ * the lengths exactly.
+ */
+ if (name[namelen - 1] != '/')
+ return slow_same_name(name, namelen, ce->name, len);
+
+ /*
+ * For a directory, we point to an arbitrary cache_entry filename. Just
+ * make sure the directory portion matches.
+ */
+ return slow_same_name(name, namelen, ce->name, namelen < len ? namelen : len);
}
struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int icase)
@@ -115,5 +168,22 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na
}
ce = ce->next;
}
+
+ /*
+ * Might be a submodule. Despite submodules being directories,
+ * they are stored in the name hash without a closing slash.
+ * When ignore_case is 1, directories are stored in the name hash
+ * with their closing slash.
+ *
+ * The side effect of this storage technique is we have need to
+ * remove the slash from name and perform the lookup again without
+ * the slash. If a match is made, S_ISGITLINK(ce->mode) will be
+ * true.
+ */
+ if (icase && name[namelen - 1] == '/') {
+ ce = index_name_exists(istate, name, namelen - 1, icase);
+ if (ce && S_ISGITLINK(ce->ce_mode))
+ return ce;
+ }
return NULL;
}
--
1.7.3.159.g610493
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-03 9:07 ` Joshua Jensen
` (5 preceding siblings ...)
2010-10-03 9:56 ` [PATCH/RFC v3 5/8] Add case insensitivity support for directories when using git status Ævar Arnfjörð Bjarmason
@ 2010-10-03 9:56 ` Ævar Arnfjörð Bjarmason
2010-10-03 11:54 ` Thomas Adam
2010-10-04 16:02 ` Robin Rosenberg
2010-10-03 9:56 ` [PATCH/RFC v3 7/8] Support case folding for git add when core.ignorecase=true Ævar Arnfjörð Bjarmason
` (2 subsequent siblings)
9 siblings, 2 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-03 9:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Joshua Jensen, Johannes Sixt, Brandon Casey
From: Joshua Jensen <jjensen@workspacewhiz.com>
When mydir/filea.txt is added, mydir/ is renamed to MyDir/, and
MyDir/fileb.txt is added, running git ls-files mydir only shows
mydir/filea.txt. Running git ls-files MyDir shows MyDir/fileb.txt.
Running git ls-files mYdIR shows nothing.
With this patch running git ls-files for mydir, MyDir, and mYdIR shows
mydir/filea.txt and MyDir/fileb.txt.
Wildcards are not handled case insensitively in this patch. Example:
MyDir/aBc/file.txt is added. git ls-files MyDir/a* works fine, but git
ls-files mydir/a* does not.
Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
dir.c | 38 ++++++++++++++++++++++++++------------
1 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/dir.c b/dir.c
index cf8f65c..53aa4f3 100644
--- a/dir.c
+++ b/dir.c
@@ -107,16 +107,30 @@ static int match_one(const char *match, const char *name, int namelen)
if (!*match)
return MATCHED_RECURSIVELY;
- for (;;) {
- unsigned char c1 = *match;
- unsigned char c2 = *name;
- if (c1 == '\0' || is_glob_special(c1))
- break;
- if (c1 != c2)
- return 0;
- match++;
- name++;
- namelen--;
+ if (ignore_case) {
+ for (;;) {
+ unsigned char c1 = tolower(*match);
+ unsigned char c2 = tolower(*name);
+ if (c1 == '\0' || is_glob_special(c1))
+ break;
+ if (c1 != c2)
+ return 0;
+ match++;
+ name++;
+ namelen--;
+ }
+ } else {
+ for (;;) {
+ unsigned char c1 = *match;
+ unsigned char c2 = *name;
+ if (c1 == '\0' || is_glob_special(c1))
+ break;
+ if (c1 != c2)
+ return 0;
+ match++;
+ name++;
+ namelen--;
+ }
}
@@ -125,8 +139,8 @@ static int match_one(const char *match, const char *name, int namelen)
* we need to match by fnmatch
*/
matchlen = strlen(match);
- if (strncmp(match, name, matchlen))
- return !fnmatch(match, name, 0) ? MATCHED_FNMATCH : 0;
+ if (strncmp_icase(match, name, matchlen))
+ return !fnmatch_icase(match, name, 0) ? MATCHED_FNMATCH : 0;
if (namelen == matchlen)
return MATCHED_EXACTLY;
--
1.7.3.159.g610493
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH/RFC v3 7/8] Support case folding for git add when core.ignorecase=true
2010-10-03 9:07 ` Joshua Jensen
` (6 preceding siblings ...)
2010-10-03 9:56 ` [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files Ævar Arnfjörð Bjarmason
@ 2010-10-03 9:56 ` Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 8/8] Support case folding in git fast-import " Ævar Arnfjörð Bjarmason
2010-10-07 4:13 ` [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable Junio C Hamano
9 siblings, 0 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-03 9:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Joshua Jensen, Johannes Sixt, Brandon Casey
From: Joshua Jensen <jjensen@workspacewhiz.com>
When MyDir/ABC/filea.txt is added to Git, the disk directory MyDir/ABC/
is renamed to mydir/aBc/, and then mydir/aBc/fileb.txt is added, the
index will contain MyDir/ABC/filea.txt and mydir/aBc/fileb.txt. Although
the earlier portions of this patch series account for those differences
in case, this patch makes the pathing consistent by folding the case of
newly added files against the first file added with that path.
In read-cache.c's add_to_index(), the index_name_exists() support used
for git status's case insensitive directory lookups is used to find the
proper directory case according to what the user already checked in.
That is, MyDir/ABC/'s case is used to alter the stored path for
fileb.txt to MyDir/ABC/fileb.txt (instead of mydir/aBc/fileb.txt).
This is especially important when cloning a repository to a case
sensitive file system. MyDir/ABC/ and mydir/aBc/ exist in the same
directory on a Windows machine, but on Linux, the files exist in two
separate directories. The update to add_to_index(), in effect, treats a
Windows file system as case sensitive by making path case consistent.
Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
read-cache.c | 23 +++++++++++++++++++++++
1 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/read-cache.c b/read-cache.c
index 1f42473..379862c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -608,6 +608,29 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
ce->ce_mode = ce_mode_from_stat(ent, st_mode);
}
+ /* When core.ignorecase=true, determine if a directory of the same name but differing
+ * case already exists within the Git repository. If it does, ensure the directory
+ * case of the file being added to the repository matches (is folded into) the existing
+ * entry's directory case.
+ */
+ if (ignore_case) {
+ const char *startPtr = ce->name;
+ const char *ptr = startPtr;
+ while (*ptr) {
+ while (*ptr && *ptr != '/')
+ ++ptr;
+ if (*ptr == '/') {
+ struct cache_entry *foundce;
+ ++ptr;
+ foundce = index_name_exists(&the_index, ce->name, ptr - ce->name, ignore_case);
+ if (foundce) {
+ memcpy((void*)startPtr, foundce->name + (startPtr - ce->name), ptr - startPtr);
+ startPtr = ptr;
+ }
+ }
+ }
+ }
+
alias = index_name_exists(istate, ce->name, ce_namelen(ce), ignore_case);
if (alias && !ce_stage(alias) && !ie_match_stat(istate, alias, st, ce_option)) {
/* Nothing changed, really */
--
1.7.3.159.g610493
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH/RFC v3 8/8] Support case folding in git fast-import when core.ignorecase=true
2010-10-03 9:07 ` Joshua Jensen
` (7 preceding siblings ...)
2010-10-03 9:56 ` [PATCH/RFC v3 7/8] Support case folding for git add when core.ignorecase=true Ævar Arnfjörð Bjarmason
@ 2010-10-03 9:56 ` Ævar Arnfjörð Bjarmason
2010-10-07 4:13 ` [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable Junio C Hamano
9 siblings, 0 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-03 9:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Joshua Jensen, Johannes Sixt, Brandon Casey
From: Joshua Jensen <jjensen@workspacewhiz.com>
When core.ignorecase=true, imported file paths will be folded to match
existing directory case.
Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
fast-import.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/fast-import.c b/fast-import.c
index 2317b0f..e214048 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -156,6 +156,7 @@ Format of STDIN stream:
#include "csum-file.h"
#include "quote.h"
#include "exec_cmd.h"
+#include "dir.h"
#define PACK_ID_BITS 16
#define MAX_PACK_ID ((1<<PACK_ID_BITS)-1)
@@ -1461,7 +1462,7 @@ static int tree_content_set(
for (i = 0; i < t->entry_count; i++) {
e = t->entries[i];
- if (e->name->str_len == n && !strncmp(p, e->name->str_dat, n)) {
+ if (e->name->str_len == n && !strncmp_icase(p, e->name->str_dat, n)) {
if (!slash1) {
if (!S_ISDIR(mode)
&& e->versions[1].mode == mode
@@ -1527,7 +1528,7 @@ static int tree_content_remove(
for (i = 0; i < t->entry_count; i++) {
e = t->entries[i];
- if (e->name->str_len == n && !strncmp(p, e->name->str_dat, n)) {
+ if (e->name->str_len == n && !strncmp_icase(p, e->name->str_dat, n)) {
if (slash1 && !S_ISDIR(e->versions[1].mode))
/*
* If p names a file in some subdirectory, and a
@@ -1585,7 +1586,7 @@ static int tree_content_get(
for (i = 0; i < t->entry_count; i++) {
e = t->entries[i];
- if (e->name->str_len == n && !strncmp(p, e->name->str_dat, n)) {
+ if (e->name->str_len == n && !strncmp_icase(p, e->name->str_dat, n)) {
if (!slash1) {
memcpy(leaf, e, sizeof(*leaf));
if (e->tree && is_null_sha1(e->versions[1].sha1))
--
1.7.3.159.g610493
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH v2 0/6] Extensions of core.ignorecase=true support
2010-10-03 4:32 [PATCH v2 0/6] Extensions of core.ignorecase=true support Joshua Jensen
` (6 preceding siblings ...)
2010-10-03 8:17 ` [PATCH v2 0/6] Extensions of core.ignorecase=true support Johannes Sixt
@ 2010-10-03 11:48 ` Robert Buck
2010-10-03 18:12 ` Johannes Sixt
7 siblings, 1 reply; 45+ messages in thread
From: Robert Buck @ 2010-10-03 11:48 UTC (permalink / raw)
To: Joshua Jensen; +Cc: git, j6t
Referring back to my earlier comment to this patch series, which was
proposed on August 17, while I tend to agree to the changes that help
listing- operations, those changes that fold case concern me. Let me
explain...
You may find there is a strong contingent of people that would approve
of, and would want to use, case insensitivity for gitignore and ls,
for example; but by tying the same single property (core.ignorecase)
to the case folding behaviors some people would avoid the feature all
together, which would be unfortunate, when they otherwise could
benefit from at least one part of the new behavior.
There were several key things that went wrong in early git
development, this and the eol support were two casualties. The eol
support, as you recall, deprecated the old property in favor of a
couple new superior ones. I would recommend that the same thing be
done here, deprecate the old ignorecase property by introducing two
better ones.
So I could we please separate the behaviors that change intent
(folding) from the behaviors that merely alter how things are
displayed (listing) by splitting this into two separate properties?
For example,
core.casepreserving=true|false
core.caseinsensitive=true|false
The former property would control folding, the latter property would
apply to listing and pattern matching. Then people could opt out of
the folding behaviors (add, import), while continuing to adopt listing
and pattern matching (status, ls, ignore).
Again, deprecate core.ignorecase by making it default to {false,false}
for the new properties if unspecified, which would also be the default
if all three of the properties are unspecified. If ignorecase is
specified to be true, then default to {false, true}, respectively.
Would this be possible?
On Sun, Oct 3, 2010 at 12:32 AM, Joshua Jensen
<jjensen@workspacewhiz.com> wrote:
> The second version of this patch series fixes the problematic case
> insensitive fnmatch call in patch 1 that relied on an apparently GNU-only
> extension. Instead, the pattern and string are lowercased into
> temporary buffers, and the standard fnmatch is called without relying
> on the GNU extension.
>
> Patches 2-6 received no modifications.
>
> The original cover for the patch series follows as posted by Johannes Sixt:
>
> The following patch series extends the core.ignorecase=true support to
> handle case insensitive comparisons for the .gitignore file, git status,
> and git ls-files. git add and git fast-import will fold the case of the
> file being added, matching that of an already added directory entry. Case
> folding is also applied to git fast-import for renames, copies, and deletes.
>
> The most notable benefit, IMO, is that the case of directories in the
> worktree does not matter if, and only if, the directory exists already in
> the index with some different case variant. This helps applications on
> Windows that change the case even of directories in unpredictable ways.
> Joshua mentioned Perforce as the primary example.
>
> Concerning the implementation, Joshua explained when he initially submitted
> the series to the msysgit mailing list:
>
> git status and add both use an update made to name-hash.c where
> directories, specifically names with a trailing slash, can be looked up
> in a case insensitive manner. After trying a myriad of solutions, this
> seemed to be the cleanest. Does anyone see a problem with embedding the
> directory names in the same hash as the file names? I couldn't find one,
> especially since I append a slash to each directory name.
>
> The git add path case folding functionality is a somewhat radical
> departure from what Git does now. It is described in detail in patch 5.
> Does anyone have any concerns?
>
> I support the idea of this patch, and I can confirm that it works: I've
> used this series in production both with core.ignorecase set to true and
> to false, and in the former case, with directories and files with case
> different from the index.
>
> Joshua Jensen (6):
> Add string comparison functions that respect the ignore_case variable.
> Case insensitivity support for .gitignore via core.ignorecase
> Add case insensitivity support for directories when using git status
> Add case insensitivity support when using git ls-files
> Support case folding for git add when core.ignorecase=true
> Support case folding in git fast-import when core.ignorecase=true
>
>
> dir.c | 152 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
> dir.h | 4 ++
> fast-import.c | 7 ++-
> name-hash.c | 72 +++++++++++++++++++++++++++
> read-cache.c | 23 +++++++++
> 5 files changed, 235 insertions(+), 23 deletions(-)
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-03 9:56 ` [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files Ævar Arnfjörð Bjarmason
@ 2010-10-03 11:54 ` Thomas Adam
2010-10-03 18:19 ` Johannes Sixt
2010-10-04 16:02 ` Robin Rosenberg
1 sibling, 1 reply; 45+ messages in thread
From: Thomas Adam @ 2010-10-03 11:54 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Joshua Jensen, Johannes Sixt, Brandon Casey
Hi --
On 3 October 2010 10:56, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> + if (ignore_case) {
> + for (;;) {
> + unsigned char c1 = tolower(*match);
> + unsigned char c2 = tolower(*name);
> + if (c1 == '\0' || is_glob_special(c1))
> + break;
> + if (c1 != c2)
> + return 0;
> + match++;
> + name++;
> + namelen--;
> + }
> + } else {
> + for (;;) {
> + unsigned char c1 = *match;
> + unsigned char c2 = *name;
> + if (c1 == '\0' || is_glob_special(c1))
> + break;
> + if (c1 != c2)
> + return 0;
> + match++;
> + name++;
> + namelen--;
> + }
> }
It's a real shame about the code duplication here. Can we not avoid
it just by doing:
unsigned char c1 = (ignore_case) ? tolower(*match) : *match;
unisgned char c2 = (ignore_case) ? tolower(*name) : *name;
I appreciate that to some it might look like perl golf, but...
-- Thomas Adam
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 6/6] Support case folding in git fast-import when core.ignorecase=true
2010-10-03 4:32 ` [PATCH v2 6/6] Support case folding in git fast-import " Joshua Jensen
@ 2010-10-03 13:00 ` Sverre Rabbelier
0 siblings, 0 replies; 45+ messages in thread
From: Sverre Rabbelier @ 2010-10-03 13:00 UTC (permalink / raw)
To: Joshua Jensen; +Cc: git, j6t
Heya,
On Sun, Oct 3, 2010 at 06:32, Joshua Jensen <jjensen@workspacewhiz.com> wrote:
> When core.ignorecase=true, imported file paths will be folded to match
> existing directory case.
I haven't checked if you got all the relevant cases in fast-import.c,
but the cases you do address look good.
How would this handle incremental imports? How does this handle
someone adding a file twice, with the same name but different case?
--
Cheers,
Sverre Rabbelier
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 2/8] Makefile & configure: add a NO_FNMATCH_CASEFOLD flag
2010-10-03 9:56 ` [PATCH/RFC v3 2/8] Makefile & configure: add a NO_FNMATCH_CASEFOLD flag Ævar Arnfjörð Bjarmason
@ 2010-10-03 17:58 ` Johannes Sixt
2010-10-04 2:48 ` [PATCH/RFC v4 " Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 45+ messages in thread
From: Johannes Sixt @ 2010-10-03 17:58 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Joshua Jensen, Brandon Casey
On Sonntag, 3. Oktober 2010, Ævar Arnfjörð Bjarmason wrote:
> @@ -1350,6 +1354,11 @@ ifdef NO_FNMATCH
> COMPAT_CFLAGS += -DNO_FNMATCH
> COMPAT_OBJS += compat/fnmatch/fnmatch.o
> endif
> +ifdef NO_FNMATCH_CASEFOLD
> + COMPAT_CFLAGS += -Icompat/fnmatch
> + COMPAT_CFLAGS += -DNO_FNMATCH_CASEFOLD
> + COMPAT_OBJS += compat/fnmatch/fnmatch.o
> +endif
I think you should protect against defining both NO_FNMATCH and
NO_FNMATCH_CASEFOLD (your version would link compat/fnmatch/fnmatch.o twice
in this case):
ifdef NO_FNMATCH
...
else
ifdef NO_FNMATCH_CASEFOLD
...
endif
endif
Otherwise, the patch looks fine.
-- Hannes
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 0/6] Extensions of core.ignorecase=true support
2010-10-03 11:48 ` Robert Buck
@ 2010-10-03 18:12 ` Johannes Sixt
2010-10-06 22:04 ` Robert Buck
0 siblings, 1 reply; 45+ messages in thread
From: Johannes Sixt @ 2010-10-03 18:12 UTC (permalink / raw)
To: Robert Buck; +Cc: Joshua Jensen, git
On Sonntag, 3. Oktober 2010, Robert Buck wrote:
> So I could we please separate the behaviors that change intent
> (folding) from the behaviors that merely alter how things are
> displayed (listing) by splitting this into two separate properties?
> For example,
>
> core.casepreserving=true|false
> core.caseinsensitive=true|false
>
> The former property would control folding, the latter property would
> apply to listing and pattern matching. Then people could opt out of
> the folding behaviors (add, import), while continuing to adopt listing
> and pattern matching (status, ls, ignore).
core.ignorecase has a very well-defined meaning: It describes whether the
worktree lives on a filesystem that is case-insensitive. Perhaps you could
help me understand your case if you gave examples and a use-case? I have a
slight suspicion that your wish is orthogonal to core.ignorecase.
-- Hannes
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-03 11:54 ` Thomas Adam
@ 2010-10-03 18:19 ` Johannes Sixt
2010-10-03 21:59 ` Thomas Adam
2010-10-04 7:49 ` Jonathan Nieder
0 siblings, 2 replies; 45+ messages in thread
From: Johannes Sixt @ 2010-10-03 18:19 UTC (permalink / raw)
To: Thomas Adam
Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
Joshua Jensen, Brandon Casey
On Sonntag, 3. Oktober 2010, Thomas Adam wrote:
> Hi --
>
> On 3 October 2010 10:56, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> > + if (ignore_case) {
> > + for (;;) {
> > + unsigned char c1 = tolower(*match);
> > + unsigned char c2 = tolower(*name);
> > + if (c1 == '\0' || is_glob_special(c1))
> > + break;
> > + if (c1 != c2)
> > + return 0;
> > + match++;
> > + name++;
> > + namelen--;
> > + }
> > + } else {
> > + for (;;) {
> > + unsigned char c1 = *match;
> > + unsigned char c2 = *name;
> > + if (c1 == '\0' || is_glob_special(c1))
> > + break;
> > + if (c1 != c2)
> > + return 0;
> > + match++;
> > + name++;
> > + namelen--;
> > + }
> > }
>
> It's a real shame about the code duplication here. Can we not avoid
> it just by doing:
>
> unsigned char c1 = (ignore_case) ? tolower(*match) : *match;
> unisgned char c2 = (ignore_case) ? tolower(*name) : *name;
>
> I appreciate that to some it might look like perl golf, but...
It has been discussed, and IIRC, the concensus was to keep the code
duplication because this is an inner loop.
-- Hannes
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-03 18:19 ` Johannes Sixt
@ 2010-10-03 21:59 ` Thomas Adam
2010-10-04 7:49 ` Jonathan Nieder
1 sibling, 0 replies; 45+ messages in thread
From: Thomas Adam @ 2010-10-03 21:59 UTC (permalink / raw)
To: Johannes Sixt
Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
Joshua Jensen, Brandon Casey
Hi --
On 3 October 2010 19:19, Johannes Sixt <j6t@kdbg.org> wrote:
> On Sonntag, 3. Oktober 2010, Thomas Adam wrote:
>> Hi --
>>
>> On 3 October 2010 10:56, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> > + if (ignore_case) {
>> > + for (;;) {
>> > + unsigned char c1 = tolower(*match);
>> > + unsigned char c2 = tolower(*name);
>> > + if (c1 == '\0' || is_glob_special(c1))
>> > + break;
>> > + if (c1 != c2)
>> > + return 0;
>> > + match++;
>> > + name++;
>> > + namelen--;
>> > + }
>> > + } else {
>> > + for (;;) {
>> > + unsigned char c1 = *match;
>> > + unsigned char c2 = *name;
>> > + if (c1 == '\0' || is_glob_special(c1))
>> > + break;
>> > + if (c1 != c2)
>> > + return 0;
>> > + match++;
>> > + name++;
>> > + namelen--;
>> > + }
>> > }
>>
>> It's a real shame about the code duplication here. Can we not avoid
>> it just by doing:
>>
>> unsigned char c1 = (ignore_case) ? tolower(*match) : *match;
>> unisgned char c2 = (ignore_case) ? tolower(*name) : *name;
>>
>> I appreciate that to some it might look like perl golf, but...
>
> It has been discussed, and IIRC, the concensus was to keep the code
> duplication because this is an inner loop.
I must have missed the discussion -- but why/how does making it an
inner-loop somehow prevent it from such an obvious (and readable)
optimisation, which would have fitted in well in other areas.
-- Thomas Adam
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 0/6] Extensions of core.ignorecase=true support
2010-10-03 8:17 ` [PATCH v2 0/6] Extensions of core.ignorecase=true support Johannes Sixt
@ 2010-10-03 23:34 ` Junio C Hamano
0 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2010-10-03 23:34 UTC (permalink / raw)
To: Johannes Sixt; +Cc: Joshua Jensen, git, Junio C Hamano
Johannes Sixt <j6t@kdbg.org> writes:
> Junio, IIRC, the series appeared in next for some time before the 1.7.3
> release. Does this imply that you reviewed the series and deemed the
> implementation sound?
Not really. I knew that I had an opportunity to rewind whatever crap I
throw in 'next' soon, and wanted to see if anybody screams upon stumbling
on breakages ;-)
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH/RFC v4 2/8] Makefile & configure: add a NO_FNMATCH_CASEFOLD flag
2010-10-03 17:58 ` Johannes Sixt
@ 2010-10-04 2:48 ` Ævar Arnfjörð Bjarmason
0 siblings, 0 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-04 2:48 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Joshua Jensen, Johannes Sixt, Brandon Casey,
Ævar Arnfjörð Bjarmason
On some platforms (like Solaris) there is a fnmatch, but it doesn't
support the GNU FNM_CASEFOLD extension that's used by the
jj/icase-directory series' fnmatch_icase wrapper.
Change the Makefile so that it's now possible to set
NO_FNMATCH_CASEFOLD=YesPlease on those systems, and add a configure
probe for it.
Unlike the NO_REGEX check we don't add AC_INCLUDES_DEFAULT to our
headers. This is because on a GNU system the definition of
FNM_CASEFOLD in fnmatch.h is guarded by:
#if !defined _POSIX_C_SOURCE || _POSIX_C_SOURCE < 2 || defined _GNU_SOURCE
One of the headers AC_INCLUDES_DEFAULT includes ends up defining one
of those, so if we'd use it we'd always get
NO_FNMATCH_CASEFOLD=YesPlease on GNU systems, even though they have
FNM_CASEFOLD.
When checking the flags we use:
ifdef NO_FNMATCH
...
else
ifdef NO_FNMATCH_CASEFOLD
...
endif
endif
The "else" so that we don't link against compat/fnmatch/fnmatch.o
twice if both NO_FNMATCH and NO_FNMATCH_CASEFOLD are defined.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
On Sun, Oct 3, 2010 at 17:58, Johannes Sixt <j6t@kdbg.org> wrote:
> I think you should protect against defining both NO_FNMATCH and
> NO_FNMATCH_CASEFOLD (your version would link compat/fnmatch/fnmatch.o twice
> in this case):
Well spotted. That's fixed in this version.
Makefile | 10 ++++++++++
config.mak.in | 1 +
configure.ac | 22 ++++++++++++++++++++++
3 files changed, 33 insertions(+), 0 deletions(-)
diff --git a/Makefile b/Makefile
index f7c4383..7bd0a2b 100644
--- a/Makefile
+++ b/Makefile
@@ -72,6 +72,9 @@ all::
#
# Define NO_FNMATCH if you don't have fnmatch in the C library.
#
+# Define NO_FNMATCH_CASEFOLD if your fnmatch function doesn't have the
+# FNM_CASEFOLD GNU extension.
+#
# Define NO_LIBGEN_H if you don't have libgen.h.
#
# Define NEEDS_LIBGEN if your libgen needs -lgen when linking
@@ -848,6 +851,7 @@ ifeq ($(uname_S),SunOS)
NO_MKDTEMP = YesPlease
NO_MKSTEMPS = YesPlease
NO_REGEX = YesPlease
+ NO_FNMATCH_CASEFOLD = YesPlease
ifeq ($(uname_R),5.6)
SOCKLEN_T = int
NO_HSTRERROR = YesPlease
@@ -1349,6 +1353,12 @@ ifdef NO_FNMATCH
COMPAT_CFLAGS += -Icompat/fnmatch
COMPAT_CFLAGS += -DNO_FNMATCH
COMPAT_OBJS += compat/fnmatch/fnmatch.o
+else
+ifdef NO_FNMATCH_CASEFOLD
+ COMPAT_CFLAGS += -Icompat/fnmatch
+ COMPAT_CFLAGS += -DNO_FNMATCH_CASEFOLD
+ COMPAT_OBJS += compat/fnmatch/fnmatch.o
+endif
endif
ifdef NO_SETENV
COMPAT_CFLAGS += -DNO_SETENV
diff --git a/config.mak.in b/config.mak.in
index aaa70a8..56343ba 100644
--- a/config.mak.in
+++ b/config.mak.in
@@ -48,6 +48,7 @@ NO_HSTRERROR=@NO_HSTRERROR@
NO_STRCASESTR=@NO_STRCASESTR@
NO_STRTOK_R=@NO_STRTOK_R@
NO_FNMATCH=@NO_FNMATCH@
+NO_FNMATCH_CASEFOLD=@NO_FNMATCH_CASEFOLD@
NO_MEMMEM=@NO_MEMMEM@
NO_STRLCPY=@NO_STRLCPY@
NO_UINTMAX_T=@NO_UINTMAX_T@
diff --git a/configure.ac b/configure.ac
index 7715f6c..6dd9241 100644
--- a/configure.ac
+++ b/configure.ac
@@ -824,6 +824,28 @@ GIT_CHECK_FUNC(fnmatch,
[NO_FNMATCH=YesPlease])
AC_SUBST(NO_FNMATCH)
#
+# Define NO_FNMATCH_CASEFOLD if your fnmatch function doesn't have the
+# FNM_CASEFOLD GNU extension.
+AC_CACHE_CHECK([whether the fnmatch function supports the FNMATCH_CASEFOLD GNU extension],
+ [ac_cv_c_excellent_fnmatch], [
+AC_EGREP_CPP(yippeeyeswehaveit,
+ AC_LANG_PROGRAM([
+#include <fnmatch.h>
+],
+[#ifdef FNM_CASEFOLD
+yippeeyeswehaveit
+#endif
+]),
+ [ac_cv_c_excellent_fnmatch=yes],
+ [ac_cv_c_excellent_fnmatch=no])
+])
+if test $ac_cv_c_excellent_fnmatch = yes; then
+ NO_FNMATCH_CASEFOLD=
+else
+ NO_FNMATCH_CASEFOLD=YesPlease
+fi
+AC_SUBST(NO_FNMATCH_CASEFOLD)
+#
# Define NO_MEMMEM if you don't have memmem.
GIT_CHECK_FUNC(memmem,
[NO_MEMMEM=],
--
1.7.3.159.g610493
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-03 18:19 ` Johannes Sixt
2010-10-03 21:59 ` Thomas Adam
@ 2010-10-04 7:49 ` Jonathan Nieder
2010-10-04 8:02 ` Ævar Arnfjörð Bjarmason
2010-10-04 14:03 ` Erik Faye-Lund
1 sibling, 2 replies; 45+ messages in thread
From: Jonathan Nieder @ 2010-10-04 7:49 UTC (permalink / raw)
To: Johannes Sixt
Cc: Thomas Adam, Ævar Arnfjörð Bjarmason, git,
Junio C Hamano, Joshua Jensen, Brandon Casey
Johannes Sixt wrote:
> On Sonntag, 3. Oktober 2010, Thomas Adam wrote:
>> It's a real shame about the code duplication here. Can we not avoid
>> it just by doing:
>>
>> unsigned char c1 = (ignore_case) ? tolower(*match) : *match;
>> unisgned char c2 = (ignore_case) ? tolower(*name) : *name;
>>
>> I appreciate that to some it might look like perl golf, but...
>
> It has been discussed, and IIRC, the concensus was to keep the code
> duplication because this is an inner loop.
Did anyone time it? If it really is not dwarfed by other computation,
then how about (warning: ugly!)
static inline int step(unsigned char c1, unsigned char c2,
const char **match, const char **name, int *namelen)
{
if (c1 == '\0' || is_glob_special(c1))
return 1; /* break */
if (c1 != c2)
return 0; /* found mismatch! */
(*match)++;
(*name)++;
(*namelen)--;
return 2; /* continue */
}
...
int r = 1;
if (!ignore_case) {
while ((r = step(*match, *name, &match, &name, &namelen)) == 2)
; /* matches so far */
} else {
while ((r = step(tolower(*match), tolower(*name),
&match, &name, &namelen)) == 2)
; /* matches so far */
}
if (!r) /* found mismatch! */
return 0;
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-04 7:49 ` Jonathan Nieder
@ 2010-10-04 8:02 ` Ævar Arnfjörð Bjarmason
2010-10-04 14:03 ` Erik Faye-Lund
1 sibling, 0 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-04 8:02 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Johannes Sixt, Thomas Adam, git, Junio C Hamano, Joshua Jensen,
Brandon Casey
On Mon, Oct 4, 2010 at 07:49, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Did anyone time it?
> static inline int step(unsigned char c1, unsigned char c2,
If it hasn't been timed isn't it better to leave it up to the compiler
whether it wants to inline that or not?
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-04 7:49 ` Jonathan Nieder
2010-10-04 8:02 ` Ævar Arnfjörð Bjarmason
@ 2010-10-04 14:03 ` Erik Faye-Lund
2010-10-04 14:58 ` Joshua Jensen
1 sibling, 1 reply; 45+ messages in thread
From: Erik Faye-Lund @ 2010-10-04 14:03 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Johannes Sixt, Thomas Adam, Ævar Arnfjörð, git,
Junio C Hamano, Joshua Jensen, Brandon Casey
On Mon, Oct 4, 2010 at 9:49 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Johannes Sixt wrote:
>> On Sonntag, 3. Oktober 2010, Thomas Adam wrote:
>
>>> It's a real shame about the code duplication here. Can we not avoid
>>> it just by doing:
>>>
>>> unsigned char c1 = (ignore_case) ? tolower(*match) : *match;
>>> unisgned char c2 = (ignore_case) ? tolower(*name) : *name;
>>>
>>> I appreciate that to some it might look like perl golf, but...
>>
>> It has been discussed, and IIRC, the concensus was to keep the code
>> duplication because this is an inner loop.
>
> Did anyone time it? If it really is not dwarfed by other computation,
> then how about (warning: ugly!)
>
I believe it was timed. I was the one who reacted on this the first
time around, and I seem to remember that the performance impact was
indeed significant. This function is used all the time when updating
the index etc IIRC.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-04 14:03 ` Erik Faye-Lund
@ 2010-10-04 14:58 ` Joshua Jensen
2010-10-04 17:03 ` Jonathan Nieder
0 siblings, 1 reply; 45+ messages in thread
From: Joshua Jensen @ 2010-10-04 14:58 UTC (permalink / raw)
To: kusmabite
Cc: Jonathan Nieder, Johannes Sixt, Thomas Adam,
Ævar Arnfjörð, git, Junio C Hamano, Brandon Casey
----- Original Message -----
From: Erik Faye-Lund
Date: 10/4/2010 8:03 AM
> On Mon, Oct 4, 2010 at 9:49 AM, Jonathan Nieder<jrnieder@gmail.com> wrote:
>> Johannes Sixt wrote:
>>> On Sonntag, 3. Oktober 2010, Thomas Adam wrote:
>>>> It's a real shame about the code duplication here. Can we not avoid
>>>> it just by doing:
>>>>
>>>> unsigned char c1 = (ignore_case) ? tolower(*match) : *match;
>>>> unisgned char c2 = (ignore_case) ? tolower(*name) : *name;
>>>>
>>>> I appreciate that to some it might look like perl golf, but...
>>> It has been discussed, and IIRC, the concensus was to keep the code
>>> duplication because this is an inner loop.
>> Did anyone time it? If it really is not dwarfed by other computation,
>> then how about (warning: ugly!)
>>
> I believe it was timed. I was the one who reacted on this the first
> time around, and I seem to remember that the performance impact was
> indeed significant. This function is used all the time when updating
> the index etc IIRC.
In a good sized repository I have in front of me now, running 'git
ls-files' through this code path results in 705,374 characters being
processed by this body of code. Given the code listed above, that means
we add 1,410,748 additional comparisons that everyone has to suffer
through, even those on a case sensitive file system. Sure, the code
could be optimized to not perform the double comparison, and the
compiler may actually perform that optimization. Still, it is hundreds
of thousands of additional comparisons and branches that were not there
before.
I'm running on a really, really fast machine, a Xeon X5560. The
difference in time for the above code versus what is in the patch seems
to average about 0.07 seconds. Remember, this is an incredibly fast
machine, and I imagine it will be worse on machines with slower
processors and less cache.
As discussed in the original thread (which, I believe, was on the
msysGit mailing list), one of Git's features is its speed. Maintaining
that speed in the core.ignorecase=false case is top priority for me, but
others with more know how can tell me I'm wrong.
Josh
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-03 9:56 ` [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files Ævar Arnfjörð Bjarmason
2010-10-03 11:54 ` Thomas Adam
@ 2010-10-04 16:02 ` Robin Rosenberg
2010-10-04 16:41 ` Ævar Arnfjörð Bjarmason
` (3 more replies)
1 sibling, 4 replies; 45+ messages in thread
From: Robin Rosenberg @ 2010-10-04 16:02 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Joshua Jensen, Johannes Sixt, Brandon Casey
söndagen den 3 oktober 2010 11.56.44 skrev Ævar Arnfjörð Bjarmason:
> From: Joshua Jensen <jjensen@workspacewhiz.com>
>
> When mydir/filea.txt is added, mydir/ is renamed to MyDir/, and
> MyDir/fileb.txt is added, running git ls-files mydir only shows
> mydir/filea.txt. Running git ls-files MyDir shows MyDir/fileb.txt.
> Running git ls-files mYdIR shows nothing.
>
> With this patch running git ls-files for mydir, MyDir, and mYdIR shows
> mydir/filea.txt and MyDir/fileb.txt.
>
> Wildcards are not handled case insensitively in this patch. Example:
> MyDir/aBc/file.txt is added. git ls-files MyDir/a* works fine, but git
> ls-files mydir/a* does not.
>
> Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> dir.c | 38 ++++++++++++++++++++++++++------------
> 1 files changed, 26 insertions(+), 12 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index cf8f65c..53aa4f3 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -107,16 +107,30 @@ static int match_one(const char *match, const char
> *name, int namelen) if (!*match)
> return MATCHED_RECURSIVELY;
>
> - for (;;) {
> - unsigned char c1 = *match;
> - unsigned char c2 = *name;
> - if (c1 == '\0' || is_glob_special(c1))
> - break;
> - if (c1 != c2)
> - return 0;
> - match++;
> - name++;
> - namelen--;
> + if (ignore_case) {
> + for (;;) {
> + unsigned char c1 = tolower(*match);
> + unsigned char c2 = tolower(*name);
Is anyone thinking "unicode" around here?
-- robin
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-04 16:02 ` Robin Rosenberg
@ 2010-10-04 16:41 ` Ævar Arnfjörð Bjarmason
2010-10-04 16:48 ` Erik Faye-Lund
` (2 subsequent siblings)
3 siblings, 0 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-04 16:41 UTC (permalink / raw)
To: Robin Rosenberg
Cc: git, Junio C Hamano, Joshua Jensen, Johannes Sixt, Brandon Casey
On Mon, Oct 4, 2010 at 16:02, Robin Rosenberg
<robin.rosenberg@dewire.com> wrote:
> Is anyone thinking "unicode" around here?
Thinking yeah, doing the massive work to implement it: no.
It's much harder when you don't know what encoding the data is in,
encoding is only by repository convention in Git.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-04 16:02 ` Robin Rosenberg
2010-10-04 16:41 ` Ævar Arnfjörð Bjarmason
@ 2010-10-04 16:48 ` Erik Faye-Lund
2010-10-04 16:49 ` Joshua Jensen
2010-10-04 19:02 ` Johannes Sixt
3 siblings, 0 replies; 45+ messages in thread
From: Erik Faye-Lund @ 2010-10-04 16:48 UTC (permalink / raw)
To: Robin Rosenberg
Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
Joshua Jensen, Johannes Sixt, Brandon Casey
On Mon, Oct 4, 2010 at 6:02 PM, Robin Rosenberg
<robin.rosenberg@dewire.com> wrote:
> söndagen den 3 oktober 2010 11.56.44 skrev Ævar Arnfjörð Bjarmason:
>> From: Joshua Jensen <jjensen@workspacewhiz.com>
>>
>> When mydir/filea.txt is added, mydir/ is renamed to MyDir/, and
>> MyDir/fileb.txt is added, running git ls-files mydir only shows
>> mydir/filea.txt. Running git ls-files MyDir shows MyDir/fileb.txt.
>> Running git ls-files mYdIR shows nothing.
>>
>> With this patch running git ls-files for mydir, MyDir, and mYdIR shows
>> mydir/filea.txt and MyDir/fileb.txt.
>>
>> Wildcards are not handled case insensitively in this patch. Example:
>> MyDir/aBc/file.txt is added. git ls-files MyDir/a* works fine, but git
>> ls-files mydir/a* does not.
>>
>> Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
>> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>> ---
>> dir.c | 38 ++++++++++++++++++++++++++------------
>> 1 files changed, 26 insertions(+), 12 deletions(-)
>>
>> diff --git a/dir.c b/dir.c
>> index cf8f65c..53aa4f3 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -107,16 +107,30 @@ static int match_one(const char *match, const char
>> *name, int namelen) if (!*match)
>> return MATCHED_RECURSIVELY;
>>
>> - for (;;) {
>> - unsigned char c1 = *match;
>> - unsigned char c2 = *name;
>> - if (c1 == '\0' || is_glob_special(c1))
>> - break;
>> - if (c1 != c2)
>> - return 0;
>> - match++;
>> - name++;
>> - namelen--;
>> + if (ignore_case) {
>> + for (;;) {
>> + unsigned char c1 = tolower(*match);
>> + unsigned char c2 = tolower(*name);
>
> Is anyone thinking "unicode" around here?
>
You're not the first to think about the combination of core.ignorecase
and unicode, but unfortunately way too few people have.
slow_same_name() (and index_name_exists() by proxy) already does the
Wrong Thing (tm), so the problem is already rooted in the index. The
consensus on the msysGit mailing list last time this was brought up
[1] was simply to ignore the combination of unicode and
core.ignorecase, but I'm not sure I'm convinced myself that it's a
good idea. We might end up painting our selves further into a corner,
in the end making it nearly impossible to fix.
One complicating factor is that Windows' definition of what
character-pairs compare as identical depends on a table stored
somewhere in NTFS[2]. The time your drive was formatted decides what
that table looks like, and I haven't been able to retrieve it. This
might be going a little too far, as this table is likely to be very
rarely changed, but I think it's worth noting.
[1]: http://groups.google.com/group/msysgit/browse_thread/thread/675ad16102f6233f/a25cd7bb8dfa2abb#a25cd7bb8dfa2abb
[2]: http://blogs.msdn.com/b/michkap/archive/2007/10/24/5641619.aspx
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-04 16:02 ` Robin Rosenberg
2010-10-04 16:41 ` Ævar Arnfjörð Bjarmason
2010-10-04 16:48 ` Erik Faye-Lund
@ 2010-10-04 16:49 ` Joshua Jensen
2010-10-04 17:08 ` Jonathan Nieder
2010-10-04 17:53 ` Ævar Arnfjörð Bjarmason
2010-10-04 19:02 ` Johannes Sixt
3 siblings, 2 replies; 45+ messages in thread
From: Joshua Jensen @ 2010-10-04 16:49 UTC (permalink / raw)
To: Robin Rosenberg
Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
Johannes Sixt, Brandon Casey
----- Original Message -----
From: Robin Rosenberg
Date: 10/4/2010 10:02 AM
> söndagen den 3 oktober 2010 11.56.44 skrev Ævar Arnfjörð Bjarmason:
>> From: Joshua Jensen<jjensen@workspacewhiz.com>
>>
>> When mydir/filea.txt is added, mydir/ is renamed to MyDir/, and
>> MyDir/fileb.txt is added, running git ls-files mydir only shows
>> mydir/filea.txt. Running git ls-files MyDir shows MyDir/fileb.txt.
>> Running git ls-files mYdIR shows nothing.
>>
>> With this patch running git ls-files for mydir, MyDir, and mYdIR shows
>> mydir/filea.txt and MyDir/fileb.txt.
>> ---
>> dir.c | 38 ++++++++++++++++++++++++++------------
>> 1 files changed, 26 insertions(+), 12 deletions(-)
>>
>> diff --git a/dir.c b/dir.c
>> index cf8f65c..53aa4f3 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -107,16 +107,30 @@ static int match_one(const char *match, const char
>> + if (ignore_case) {
>> + for (;;) {
>> + unsigned char c1 = tolower(*match);
>> + unsigned char c2 = tolower(*name);
> Is anyone thinking "unicode" around here?
On Windows, Unicode filenames are 16-bit wide characters. The current
code doesn't handle them at all.
I do not know about other file systems and what Git actually handles. I
was under the impression it didn't handle Unicode filenames well in
general... ?
Josh
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-04 14:58 ` Joshua Jensen
@ 2010-10-04 17:03 ` Jonathan Nieder
0 siblings, 0 replies; 45+ messages in thread
From: Jonathan Nieder @ 2010-10-04 17:03 UTC (permalink / raw)
To: Joshua Jensen
Cc: kusmabite, Johannes Sixt, Thomas Adam,
Ævar Arnfjörð, git, Junio C Hamano, Brandon Casey
Joshua Jensen wrote:
> I'm running on a really, really fast machine, a Xeon X5560. The
> difference in time for the above code versus what is in the patch
> seems to average about 0.07 seconds.
The useful information would be a percentage...
> Remember, this is an
> incredibly fast machine, and I imagine it will be worse on machines
> with slower processors and less cache.
... but the subarch and cache size may indeed also be relevant.
Here's a revised version of the ugly speed hack. Using a separate
function like this is probably a bad idea unless it speeds things up.
/* Returns match length, or -1 for mismatch. */
static inline int match_until_glob_special(const char *match, const char *name,
int namelen, int ignore_case)
{
int remaining = namelen;
for (;;) {
unsigned char c1 = (ignore_case) ? tolower(*match) : *match;
unsigned char c2 = (ignore_case) ? tolower(*name) : *name;
if (c1 == '\0' || is_glob_special(c1))
return namelen - remaining;
if (c1 != c2)
return -1;
match++;
name++;
remaining--;
}
}
[...]
int matched;
/* If the match was just the prefix, we matched */
if (!*match)
return MATCHED_RECURSIVELY;
/*
* Note: this funny "if" is to ensure each case gets inlined separately.
* Please don't optimize it away unless you've checked the assembler
* to ensure it wasn't helping.
*/
if (ignore_case)
matched = match_until_glob_special(match, name, namelen, 1);
else
matched = match_until_glob_special(match, name, namelen, 0);
if (matched == -1) /* mismatch! */
return 0;
match += matched;
name += matched;
remaining -= matched;
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-04 16:49 ` Joshua Jensen
@ 2010-10-04 17:08 ` Jonathan Nieder
2010-10-04 17:53 ` Ævar Arnfjörð Bjarmason
1 sibling, 0 replies; 45+ messages in thread
From: Jonathan Nieder @ 2010-10-04 17:08 UTC (permalink / raw)
To: Joshua Jensen
Cc: Robin Rosenberg, Ævar Arnfjörð Bjarmason, git,
Junio C Hamano, Johannes Sixt, Brandon Casey
Joshua Jensen wrote:
> I do not know about other file systems and what Git actually
> handles. I was under the impression it didn't handle Unicode
> filenames well in general... ?
Except in cases like ignorecase handling, git treats path components
as arbitrary streams of bytes ('\0' and path separators are forbidden,
of course). It works pretty well if that's what you need.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-04 16:49 ` Joshua Jensen
2010-10-04 17:08 ` Jonathan Nieder
@ 2010-10-04 17:53 ` Ævar Arnfjörð Bjarmason
1 sibling, 0 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-04 17:53 UTC (permalink / raw)
To: Joshua Jensen
Cc: Robin Rosenberg, git, Junio C Hamano, Johannes Sixt,
Brandon Casey
On Mon, Oct 4, 2010 at 16:49, Joshua Jensen <jjensen@workspacewhiz.com> wrote:
>> Is anyone thinking "unicode" around here?
>
> On Windows, Unicode filenames are 16-bit wide characters. The current code
> doesn't handle them at all.
>
> I do not know about other file systems and what Git actually handles. I was
> under the impression it didn't handle Unicode filenames well in general... ?
The only sane way of doing this sort of thing is to have a defined
*internal* encoding that gets converted to whatever the native
encoding is at the input/output points.
So Git could use Unicode represented by UTF-8, UTF-16 (whatever's
convenient) internally, but when you check out files those checked out
files can be in whatever encoding you choose.
So you could have a UTF-8 repository but check out UTF-8 filenames on
Windows. I.e. internally we'd have the file:
æab
Represented by UTF-8:
c3 a6 61 62 \0
But would check out UTF-16:
ff fe e6 00 61 00 62 00
Then when you add a new file it'll know it's in UTF-16 and convert it
to UTF-8 before writing to the repository. All invisible to the user.
Perl handles encoding issues like this, and it's awesome. The only
thing you have to do is make sure that the system knows the encoding
of data going into it, and what encoding you want out of it.
But any implementation of this is far off, and just storing raw byte
streams is Good Enough now that almost everyone uses UTF-8 anyway, so
nobody's seriously worked on this.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-04 16:02 ` Robin Rosenberg
` (2 preceding siblings ...)
2010-10-04 16:49 ` Joshua Jensen
@ 2010-10-04 19:02 ` Johannes Sixt
2010-10-04 19:17 ` Ævar Arnfjörð Bjarmason
3 siblings, 1 reply; 45+ messages in thread
From: Johannes Sixt @ 2010-10-04 19:02 UTC (permalink / raw)
To: Robin Rosenberg
Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
Joshua Jensen, Brandon Casey
On Montag, 4. Oktober 2010, Robin Rosenberg wrote:
> Is anyone thinking "unicode" around here?
My recommendation these days is that you should not use git if you care about
Unicode filenames: git is tied to POSIX in this regard, which defines
filenames as streams of *bytes*.
-- Hannes
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files
2010-10-04 19:02 ` Johannes Sixt
@ 2010-10-04 19:17 ` Ævar Arnfjörð Bjarmason
0 siblings, 0 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-04 19:17 UTC (permalink / raw)
To: Johannes Sixt
Cc: Robin Rosenberg, git, Junio C Hamano, Joshua Jensen,
Brandon Casey
On Mon, Oct 4, 2010 at 19:02, Johannes Sixt <j6t@kdbg.org> wrote:
> On Montag, 4. Oktober 2010, Robin Rosenberg wrote:
>> Is anyone thinking "unicode" around here?
>
> My recommendation these days is that you should not use git if you care about
> Unicode filenames: git is tied to POSIX in this regard, which defines
> filenames as streams of *bytes*.
A SCM is all about giving meaning to streams of bytes. Just because
POSIX only says that filenames are \0-delimited blobs that doesn't
mean you can't have some annotation elsewhere that says "hey, these
blobs are in $encoding".
<insert another disclaimer here about implementing this being a huge
task, but I'm just saying...>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 0/6] Extensions of core.ignorecase=true support
2010-10-03 18:12 ` Johannes Sixt
@ 2010-10-06 22:04 ` Robert Buck
2010-10-06 22:46 ` Joshua Jensen
0 siblings, 1 reply; 45+ messages in thread
From: Robert Buck @ 2010-10-06 22:04 UTC (permalink / raw)
To: Johannes Sixt; +Cc: Joshua Jensen, git
Hello Johannes,
Here is the use case I was thinking of.
Let's say I want to have .gitignore be case insensitive with respect
to matches so I can simplify the file by not having [D][d]ebug sorts
of messes. But let's say I also want to support files whose names only
differ by case (just like Unix supports). Can your current patch
series support this? Does the current patch series break this?
Could you share how this would or would not work, and if not, how you
might accomplish this?
Thanks,
Bob
On Sun, Oct 3, 2010 at 2:12 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> On Sonntag, 3. Oktober 2010, Robert Buck wrote:
>> So I could we please separate the behaviors that change intent
>> (folding) from the behaviors that merely alter how things are
>> displayed (listing) by splitting this into two separate properties?
>> For example,
>>
>> core.casepreserving=true|false
>> core.caseinsensitive=true|false
>>
>> The former property would control folding, the latter property would
>> apply to listing and pattern matching. Then people could opt out of
>> the folding behaviors (add, import), while continuing to adopt listing
>> and pattern matching (status, ls, ignore).
>
> core.ignorecase has a very well-defined meaning: It describes whether the
> worktree lives on a filesystem that is case-insensitive. Perhaps you could
> help me understand your case if you gave examples and a use-case? I have a
> slight suspicion that your wish is orthogonal to core.ignorecase.
>
> -- Hannes
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 0/6] Extensions of core.ignorecase=true support
2010-10-06 22:04 ` Robert Buck
@ 2010-10-06 22:46 ` Joshua Jensen
0 siblings, 0 replies; 45+ messages in thread
From: Joshua Jensen @ 2010-10-06 22:46 UTC (permalink / raw)
To: Robert Buck; +Cc: Johannes Sixt, git
----- Original Message -----
From: Robert Buck
Date: 10/6/2010 4:04 PM
> Let's say I want to have .gitignore be case insensitive with respect
> to matches so I can simplify the file by not having [D][d]ebug sorts
> of messes. But let's say I also want to support files whose names only
> differ by case (just like Unix supports). Can your current patch
> series support this? Does the current patch series break this?
>
> Could you share how this would or would not work, and if not, how you
> might accomplish this?
With this patch series, you are either on a case sensitive file system
(core.ignorecase = false) or a case insensitive file system
(core.ignorecase = true).
There is no specific configuration for .gitignore case insensitivity.
It only pays attention to core.ignorecase.
This is something that could be added, but I don't fully understand the
need. On case sensitive file systems, the case of the resultant
filename is guaranteed. If you have both a Debug/ and debug/ directory,
I would expect two entries in the .gitignore.
?
Josh
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable.
2010-10-03 9:07 ` Joshua Jensen
` (8 preceding siblings ...)
2010-10-03 9:56 ` [PATCH/RFC v3 8/8] Support case folding in git fast-import " Ævar Arnfjörð Bjarmason
@ 2010-10-07 4:13 ` Junio C Hamano
2010-10-07 5:48 ` Joshua Jensen
9 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2010-10-07 4:13 UTC (permalink / raw)
To: Joshua Jensen; +Cc: Ævar Arnfjörð Bjarmason, git, j6t
Joshua Jensen <jjensen@workspacewhiz.com> writes:
> In any case, I'd like to find a solution to get this series working
> for everyone. I've been out of commission for a month (deploying Git
> to 80+ programmers at an organization, by the way), but I'm back now
> and can work this until it is complete.
Thanks; I'll queue Ævar's v3 (with [v4 2/8]) for now.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable.
2010-10-07 4:13 ` [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable Junio C Hamano
@ 2010-10-07 5:48 ` Joshua Jensen
0 siblings, 0 replies; 45+ messages in thread
From: Joshua Jensen @ 2010-10-07 5:48 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, git, j6t
----- Original Message -----
From: Junio C Hamano
Date: 10/6/2010 10:13 PM
> Joshua Jensen<jjensen@workspacewhiz.com> writes:
>> In any case, I'd like to find a solution to get this series working
>> for everyone. I've been out of commission for a month (deploying Git
>> to 80+ programmers at an organization, by the way), but I'm back now
>> and can work this until it is complete.
> Thanks; I'll queue Ævar's v3 (with [v4 2/8]) for now.
That sounds great!
Josh
^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread, other threads:[~2010-10-07 5:48 UTC | newest]
Thread overview: 45+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-03 4:32 [PATCH v2 0/6] Extensions of core.ignorecase=true support Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable Joshua Jensen
2010-10-03 8:30 ` Ævar Arnfjörð Bjarmason
2010-10-03 9:07 ` Joshua Jensen
2010-10-03 9:56 ` [PATCH/RFC v3 0/8] ab/icase-directory: jj/icase-directory with Makefile + configure checks Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 1/8] Makefile & configure: add a NO_FNMATCH flag Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 2/8] Makefile & configure: add a NO_FNMATCH_CASEFOLD flag Ævar Arnfjörð Bjarmason
2010-10-03 17:58 ` Johannes Sixt
2010-10-04 2:48 ` [PATCH/RFC v4 " Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 3/8] Add string comparison functions that respect the ignore_case variable Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 4/8] Case insensitivity support for .gitignore via core.ignorecase Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 5/8] Add case insensitivity support for directories when using git status Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 6/8] Add case insensitivity support when using git ls-files Ævar Arnfjörð Bjarmason
2010-10-03 11:54 ` Thomas Adam
2010-10-03 18:19 ` Johannes Sixt
2010-10-03 21:59 ` Thomas Adam
2010-10-04 7:49 ` Jonathan Nieder
2010-10-04 8:02 ` Ævar Arnfjörð Bjarmason
2010-10-04 14:03 ` Erik Faye-Lund
2010-10-04 14:58 ` Joshua Jensen
2010-10-04 17:03 ` Jonathan Nieder
2010-10-04 16:02 ` Robin Rosenberg
2010-10-04 16:41 ` Ævar Arnfjörð Bjarmason
2010-10-04 16:48 ` Erik Faye-Lund
2010-10-04 16:49 ` Joshua Jensen
2010-10-04 17:08 ` Jonathan Nieder
2010-10-04 17:53 ` Ævar Arnfjörð Bjarmason
2010-10-04 19:02 ` Johannes Sixt
2010-10-04 19:17 ` Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 7/8] Support case folding for git add when core.ignorecase=true Ævar Arnfjörð Bjarmason
2010-10-03 9:56 ` [PATCH/RFC v3 8/8] Support case folding in git fast-import " Ævar Arnfjörð Bjarmason
2010-10-07 4:13 ` [PATCH v2 1/6] Add string comparison functions that respect the ignore_case variable Junio C Hamano
2010-10-07 5:48 ` Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 2/6] Case insensitivity support for .gitignore via core.ignorecase Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 3/6] Add case insensitivity support for directories when using git status Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 4/6] Add case insensitivity support when using git ls-files Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 5/6] Support case folding for git add when core.ignorecase=true Joshua Jensen
2010-10-03 4:32 ` [PATCH v2 6/6] Support case folding in git fast-import " Joshua Jensen
2010-10-03 13:00 ` Sverre Rabbelier
2010-10-03 8:17 ` [PATCH v2 0/6] Extensions of core.ignorecase=true support Johannes Sixt
2010-10-03 23:34 ` Junio C Hamano
2010-10-03 11:48 ` Robert Buck
2010-10-03 18:12 ` Johannes Sixt
2010-10-06 22:04 ` Robert Buck
2010-10-06 22:46 ` Joshua Jensen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).