git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] stop storing trailing slash in dir-hash
@ 2013-09-13 11:15 Eric Sunshine
  2013-09-13 11:15 ` [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() Eric Sunshine
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Eric Sunshine @ 2013-09-13 11:15 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Jeff King, Brian Gernhardt,
	Jonathan Nieder

This series alters name-hash so that it no longer stores the (redundant)
trailing slash with index_state.dir_hash entries. As an intentional
side-effect, the series fixes [1] in a cleaner way (suggested by Junio
[2]) than either [3] (680be044 in master) or [4].

As noted by Peff [5], this change is at a fairly fundamental level, so
care has been taken to ensure that all tests still pass (thus it at
least does not break anything covered by the tests).

[1]: http://thread.gmane.org/gmane.comp.version-control.git/232727
[2]: http://thread.gmane.org/gmane.comp.version-control.git/232727/focus=232813
[3]: http://thread.gmane.org/gmane.comp.version-control.git/232796
[4]: http://thread.gmane.org/gmane.comp.version-control.git/232833
[5]: http://thread.gmane.org/gmane.comp.version-control.git/232727/focus=232822

Eric Sunshine (3):
  name-hash: refactor polymorphic index_name_exists()
  name-hash: stop storing trailing '/' on paths in index_state.dir_hash
  dir: revert work-around for retired dangerous behavior

 cache.h      |  2 ++
 dir.c        | 25 +++++++------------------
 name-hash.c  | 54 ++++++++++++++++++++++++++++--------------------------
 read-cache.c |  2 +-
 4 files changed, 38 insertions(+), 45 deletions(-)

-- 
1.8.4.457.g424cb08

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] name-hash: refactor polymorphic index_name_exists()
  2013-09-13 11:15 [PATCH 0/3] stop storing trailing slash in dir-hash Eric Sunshine
@ 2013-09-13 11:15 ` Eric Sunshine
  2013-09-13 18:40   ` Junio C Hamano
  2013-09-13 11:15 ` [PATCH 2/3] name-hash: stop storing trailing '/' on paths in index_state.dir_hash Eric Sunshine
  2013-09-13 11:15 ` [PATCH 3/3] dir: revert work-around for retired dangerous behavior Eric Sunshine
  2 siblings, 1 reply; 9+ messages in thread
From: Eric Sunshine @ 2013-09-13 11:15 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Jeff King, Brian Gernhardt,
	Jonathan Nieder

Depending upon the absence or presence of a trailing '/' on the incoming
pathname, index_name_exists() checks either if a file is present in the
index or if a directory is represented within the index.  Each caller
explicitly chooses the mode of operation by adding or removing a
trailing '/' before invoking index_name_exists().

Since these two modes of operations are disjoint and have no code in
common (one searches index_state.name_hash; the other dir_hash), they
can be represented more naturally as distinct functions: one to search
for a file, and one for a directory.

Splitting index searching into two functions relieves callers of the
artificial burden of having to add or remove a slash to select the mode
of operation; instead they just call the desired function. A subsequent
patch will take advantage of this benefit in order to eliminate the
requirement that the incoming pathname for a directory search must have
a trailing slash.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 cache.h      |  2 ++
 dir.c        |  7 ++++---
 name-hash.c  | 47 ++++++++++++++++++++++++-----------------------
 read-cache.c |  2 +-
 4 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/cache.h b/cache.h
index 9ef778a..03ec24c 100644
--- a/cache.h
+++ b/cache.h
@@ -314,6 +314,7 @@ extern void free_name_hash(struct index_state *istate);
 #define refresh_cache(flags) refresh_index(&the_index, (flags), NULL, NULL, NULL)
 #define ce_match_stat(ce, st, options) ie_match_stat(&the_index, (ce), (st), (options))
 #define ce_modified(ce, st, options) ie_modified(&the_index, (ce), (st), (options))
+#define cache_dir_exists(name, namelen) index_dir_exists(&the_index, (name), (namelen))
 #define cache_name_exists(name, namelen, igncase) index_name_exists(&the_index, (name), (namelen), (igncase))
 #define cache_name_is_other(name, namelen) index_name_is_other(&the_index, (name), (namelen))
 #define resolve_undo_clear() resolve_undo_clear_index(&the_index)
@@ -463,6 +464,7 @@ extern int write_index(struct index_state *, int newfd);
 extern int discard_index(struct index_state *);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
+extern struct cache_entry *index_dir_exists(struct index_state *istate, const char *name, int namelen);
 extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int igncase);
 extern int index_name_pos(const struct index_state *, const char *name, int namelen);
 #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
diff --git a/dir.c b/dir.c
index b439ff0..0080673 100644
--- a/dir.c
+++ b/dir.c
@@ -860,7 +860,8 @@ static struct dir_entry *dir_entry_new(const char *pathname, int len)
 
 static struct dir_entry *dir_add_name(struct dir_struct *dir, const char *pathname, int len)
 {
-	if (cache_name_exists(pathname, len, ignore_case))
+	if ((len == 0 || pathname[len - 1] != '/') &&
+	    cache_name_exists(pathname, len, ignore_case))
 		return NULL;
 
 	ALLOC_GROW(dir->entries, dir->nr+1, dir->alloc);
@@ -885,11 +886,11 @@ enum exist_status {
 /*
  * Do not use the alphabetically sorted index to look up
  * the directory name; instead, use the case insensitive
- * name hash.
+ * directory hash.
  */
 static enum exist_status directory_exists_in_index_icase(const char *dirname, int len)
 {
-	const struct cache_entry *ce = cache_name_exists(dirname, len + 1, ignore_case);
+	const struct cache_entry *ce = cache_dir_exists(dirname, len + 1);
 	unsigned char endchar;
 
 	if (!ce)
diff --git a/name-hash.c b/name-hash.c
index 617c86c..5b01554 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -222,11 +222,35 @@ static int same_name(const struct cache_entry *ce, const char *name, int namelen
 	return slow_same_name(name, namelen, ce->name, len);
 }
 
+struct cache_entry *index_dir_exists(struct index_state *istate, const char *name, int namelen)
+{
+	struct cache_entry *ce;
+	struct dir_entry *dir;
+
+	lazy_init_name_hash(istate);
+	dir = find_dir_entry(istate, name, namelen);
+	if (dir && dir->nr)
+		return dir->ce;
+
+	/*
+	 * It might be a submodule. Unlike plain directories, which are stored
+	 * in the dir-hash, submodules are stored in the name-hash, so check
+	 * there, as well.
+	 */
+	ce = index_name_exists(istate, name, namelen - 1, 1);
+	if (ce && S_ISGITLINK(ce->ce_mode))
+		return ce;
+
+	return NULL;
+}
+
 struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int icase)
 {
 	unsigned int hash = hash_name(name, namelen);
 	struct cache_entry *ce;
 
+	assert(namelen == 0 || name[namelen - 1] != '/');
+
 	lazy_init_name_hash(istate);
 	ce = lookup_hash(hash, &istate->name_hash);
 
@@ -237,29 +261,6 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na
 		}
 		ce = ce->next;
 	}
-
-	/*
-	 * When looking for a directory (trailing '/'), it might be a
-	 * submodule or a directory. Despite submodules being directories,
-	 * they are stored in the name hash without a closing slash.
-	 * When ignore_case is 1, directories are stored in a separate hash
-	 * table *with* their closing slash.
-	 *
-	 * The side effect of this storage technique is we have need to
-	 * lookup the directory in a separate hash table, and if not found
-	 * remove the slash from name and perform the lookup again without
-	 * the slash.  If a match is made, S_ISGITLINK(ce->mode) will be
-	 * true.
-	 */
-	if (icase && name[namelen - 1] == '/') {
-		struct dir_entry *dir = find_dir_entry(istate, name, namelen);
-		if (dir && dir->nr)
-			return dir->ce;
-
-		ce = index_name_exists(istate, name, namelen - 1, icase);
-		if (ce && S_ISGITLINK(ce->ce_mode))
-			return ce;
-	}
 	return NULL;
 }
 
diff --git a/read-cache.c b/read-cache.c
index 885943a..a59644a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -643,7 +643,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 			if (*ptr == '/') {
 				struct cache_entry *foundce;
 				++ptr;
-				foundce = index_name_exists(istate, ce->name, ptr - ce->name, ignore_case);
+				foundce = index_dir_exists(istate, ce->name, ptr - ce->name);
 				if (foundce) {
 					memcpy((void *)startPtr, foundce->name + (startPtr - ce->name), ptr - startPtr);
 					startPtr = ptr;
-- 
1.8.4.457.g424cb08

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] name-hash: stop storing trailing '/' on paths in index_state.dir_hash
  2013-09-13 11:15 [PATCH 0/3] stop storing trailing slash in dir-hash Eric Sunshine
  2013-09-13 11:15 ` [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() Eric Sunshine
@ 2013-09-13 11:15 ` Eric Sunshine
  2013-09-13 11:15 ` [PATCH 3/3] dir: revert work-around for retired dangerous behavior Eric Sunshine
  2 siblings, 0 replies; 9+ messages in thread
From: Eric Sunshine @ 2013-09-13 11:15 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Jeff King, Brian Gernhardt,
	Jonathan Nieder

When 5102c617 (Add case insensitivity support for directories when using
git status, 2010-10-03) added directories to the name-hash there was
only a single hash table in which both real cache entries and leading
directory prefixes were registered. To distinguish between the two types
of entries, directories were stored with a trailing '/'.

2092678c (name-hash.c: fix endless loop with core.ignorecase=true,
2013-02-28), however, moved directories to a separate hash table
(index_state.dir_hash) but retained the now-redundant trailing '/', thus
callers still bear the burden of ensuring its presence before searching
via index_dir_exists(). Eliminate this redundancy by storing paths in
the dir-hash without the trailing '/'.

An important benefit of this change is that it eliminates undocumented
and dangerous behavior of dir.c:directory_exists_in_index_icase() in
which it assumes not only that it can validly access one character
beyond the end of its incoming directory argument, but also that that
character will unconditionally be a '/'. This perilous behavior was
"tolerated" because the string passed in by its lone caller always had a
'/' in that position, however, things broke [1] when 2eac2a4c (ls-files
-k: a directory only can be killed if the index has a non-directory,
2013-08-15) added a new caller which failed to respect the undocumented
assumption.

[1]: http://thread.gmane.org/gmane.comp.version-control.git/232727

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 dir.c        | 2 +-
 name-hash.c  | 9 +++++----
 read-cache.c | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/dir.c b/dir.c
index 0080673..69d04a0 100644
--- a/dir.c
+++ b/dir.c
@@ -890,7 +890,7 @@ enum exist_status {
  */
 static enum exist_status directory_exists_in_index_icase(const char *dirname, int len)
 {
-	const struct cache_entry *ce = cache_dir_exists(dirname, len + 1);
+	const struct cache_entry *ce = cache_dir_exists(dirname, len);
 	unsigned char endchar;
 
 	if (!ce)
diff --git a/name-hash.c b/name-hash.c
index 5b01554..2bae75d 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -58,9 +58,9 @@ static struct dir_entry *hash_dir_entry(struct index_state *istate,
 {
 	/*
 	 * Throw each directory component in the hash for quick lookup
-	 * during a git status. Directory components are stored with their
+	 * during a git status. Directory components are stored without their
 	 * closing slash.  Despite submodules being a directory, they never
-	 * reach this point, because they are stored without a closing slash
+	 * reach this point, because they are stored
 	 * in index_state.name_hash (as ordinary cache_entries).
 	 *
 	 * Note that the cache_entry stored with the dir_entry merely
@@ -78,6 +78,7 @@ static struct dir_entry *hash_dir_entry(struct index_state *istate,
 		namelen--;
 	if (namelen <= 0)
 		return NULL;
+	namelen--;
 
 	/* lookup existing entry for that directory */
 	dir = find_dir_entry(istate, ce->name, namelen);
@@ -97,7 +98,7 @@ static struct dir_entry *hash_dir_entry(struct index_state *istate,
 		}
 
 		/* recursively add missing parent directories */
-		dir->parent = hash_dir_entry(istate, ce, namelen - 1);
+		dir->parent = hash_dir_entry(istate, ce, namelen);
 	}
 	return dir;
 }
@@ -237,7 +238,7 @@ struct cache_entry *index_dir_exists(struct index_state *istate, const char *nam
 	 * in the dir-hash, submodules are stored in the name-hash, so check
 	 * there, as well.
 	 */
-	ce = index_name_exists(istate, name, namelen - 1, 1);
+	ce = index_name_exists(istate, name, namelen, 1);
 	if (ce && S_ISGITLINK(ce->ce_mode))
 		return ce;
 
diff --git a/read-cache.c b/read-cache.c
index a59644a..8990b61 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -643,7 +643,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 			if (*ptr == '/') {
 				struct cache_entry *foundce;
 				++ptr;
-				foundce = index_dir_exists(istate, ce->name, ptr - ce->name);
+				foundce = index_dir_exists(istate, ce->name, ptr - ce->name - 1);
 				if (foundce) {
 					memcpy((void *)startPtr, foundce->name + (startPtr - ce->name), ptr - startPtr);
 					startPtr = ptr;
-- 
1.8.4.457.g424cb08

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] dir: revert work-around for retired dangerous behavior
  2013-09-13 11:15 [PATCH 0/3] stop storing trailing slash in dir-hash Eric Sunshine
  2013-09-13 11:15 ` [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() Eric Sunshine
  2013-09-13 11:15 ` [PATCH 2/3] name-hash: stop storing trailing '/' on paths in index_state.dir_hash Eric Sunshine
@ 2013-09-13 11:15 ` Eric Sunshine
  2 siblings, 0 replies; 9+ messages in thread
From: Eric Sunshine @ 2013-09-13 11:15 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Jeff King, Brian Gernhardt,
	Jonathan Nieder

directory_exists_in_index_icase() dangerously assumed that it could
access one character beyond the end of its directory argument, and that
that character would unconditionally be '/'.  2eac2a4c (ls-files -k: a
directory only can be killed if the index has a non-directory,
2013-08-15) added a caller which did not respect this undocumented
assumption, and 680be044 (dir.c::test_one_path(): work around
directory_exists_in_index_icase() breakage, 2013-08-23) added a
work-around which temporarily appends a '/' before invoking
directory_exists_in_index_icase().

Since the dangerous behavior of directory_exists_in_index_icase() has
been eliminated, the work-around is now redundant, so retire it (but not
the tests added by the same commit).

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 dir.c | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/dir.c b/dir.c
index 69d04a0..ef95160 100644
--- a/dir.c
+++ b/dir.c
@@ -1161,21 +1161,9 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 	 */
 	if ((dir->flags & DIR_COLLECT_KILLED_ONLY) &&
 	    (dtype == DT_DIR) &&
-	    !has_path_in_index) {
-		/*
-		 * NEEDSWORK: directory_exists_in_index_icase()
-		 * assumes that one byte past the given path is
-		 * readable and has '/', which needs to be fixed, but
-		 * until then, work it around in the caller.
-		 */
-		strbuf_addch(path, '/');
-		if (directory_exists_in_index(path->buf, path->len - 1) ==
-		    index_nonexistent) {
-			strbuf_setlen(path, path->len - 1);
-			return path_none;
-		}
-		strbuf_setlen(path, path->len - 1);
-	}
+	    !has_path_in_index &&
+	    (directory_exists_in_index(path->buf, path->len) == index_nonexistent))
+		return path_none;
 
 	exclude = is_excluded(dir, path->buf, &dtype);
 
-- 
1.8.4.457.g424cb08

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] name-hash: refactor polymorphic index_name_exists()
  2013-09-13 11:15 ` [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() Eric Sunshine
@ 2013-09-13 18:40   ` Junio C Hamano
  2013-09-13 19:29     ` Eric Sunshine
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2013-09-13 18:40 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: git, Jeff King, Brian Gernhardt, Jonathan Nieder

Eric Sunshine <sunshine@sunshineco.com> writes:

> Depending upon the absence or presence of a trailing '/' on the incoming
> pathname, index_name_exists() checks either if a file is present in the
> index or if a directory is represented within the index.  Each caller
> explicitly chooses the mode of operation by adding or removing a
> trailing '/' before invoking index_name_exists().
>
> Since these two modes of operations are disjoint and have no code in
> common (one searches index_state.name_hash; the other dir_hash), they
> can be represented more naturally as distinct functions: one to search
> for a file, and one for a directory.
>
> Splitting index searching into two functions relieves callers of the
> artificial burden of having to add or remove a slash to select the mode
> of operation; instead they just call the desired function. A subsequent
> patch will take advantage of this benefit in order to eliminate the
> requirement that the incoming pathname for a directory search must have
> a trailing slash.

Good thinking.  Thanks for working on this; I agree with the general
direction this series is going.

I however wonder if it would be a good idea to rename the other one
to {cache|index}_file_exists(), and even keep the *_name_exists()
variant that switches between the two based on the trailing slash,
to avoid breaking other topics in flight that may have added new
callsites to *_name_exists().  Otherwise the new callsites will feed
a path with a trailing slash to *_name_exists() and get a false
result, no?


> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
> ---
>  cache.h      |  2 ++
>  dir.c        |  7 ++++---
>  name-hash.c  | 47 ++++++++++++++++++++++++-----------------------
>  read-cache.c |  2 +-
>  4 files changed, 31 insertions(+), 27 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 9ef778a..03ec24c 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -314,6 +314,7 @@ extern void free_name_hash(struct index_state *istate);
>  #define refresh_cache(flags) refresh_index(&the_index, (flags), NULL, NULL, NULL)
>  #define ce_match_stat(ce, st, options) ie_match_stat(&the_index, (ce), (st), (options))
>  #define ce_modified(ce, st, options) ie_modified(&the_index, (ce), (st), (options))
> +#define cache_dir_exists(name, namelen) index_dir_exists(&the_index, (name), (namelen))
>  #define cache_name_exists(name, namelen, igncase) index_name_exists(&the_index, (name), (namelen), (igncase))
>  #define cache_name_is_other(name, namelen) index_name_is_other(&the_index, (name), (namelen))
>  #define resolve_undo_clear() resolve_undo_clear_index(&the_index)
> @@ -463,6 +464,7 @@ extern int write_index(struct index_state *, int newfd);
>  extern int discard_index(struct index_state *);
>  extern int unmerged_index(const struct index_state *);
>  extern int verify_path(const char *path);
> +extern struct cache_entry *index_dir_exists(struct index_state *istate, const char *name, int namelen);
>  extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int igncase);
>  extern int index_name_pos(const struct index_state *, const char *name, int namelen);
>  #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
> diff --git a/dir.c b/dir.c
> index b439ff0..0080673 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -860,7 +860,8 @@ static struct dir_entry *dir_entry_new(const char *pathname, int len)
>  
>  static struct dir_entry *dir_add_name(struct dir_struct *dir, const char *pathname, int len)
>  {
> -	if (cache_name_exists(pathname, len, ignore_case))
> +	if ((len == 0 || pathname[len - 1] != '/') &&
> +	    cache_name_exists(pathname, len, ignore_case))
>  		return NULL;
>  
>  	ALLOC_GROW(dir->entries, dir->nr+1, dir->alloc);
> @@ -885,11 +886,11 @@ enum exist_status {
>  /*
>   * Do not use the alphabetically sorted index to look up
>   * the directory name; instead, use the case insensitive
> - * name hash.
> + * directory hash.
>   */
>  static enum exist_status directory_exists_in_index_icase(const char *dirname, int len)
>  {
> -	const struct cache_entry *ce = cache_name_exists(dirname, len + 1, ignore_case);
> +	const struct cache_entry *ce = cache_dir_exists(dirname, len + 1);
>  	unsigned char endchar;
>  
>  	if (!ce)
> diff --git a/name-hash.c b/name-hash.c
> index 617c86c..5b01554 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -222,11 +222,35 @@ static int same_name(const struct cache_entry *ce, const char *name, int namelen
>  	return slow_same_name(name, namelen, ce->name, len);
>  }
>  
> +struct cache_entry *index_dir_exists(struct index_state *istate, const char *name, int namelen)
> +{
> +	struct cache_entry *ce;
> +	struct dir_entry *dir;
> +
> +	lazy_init_name_hash(istate);
> +	dir = find_dir_entry(istate, name, namelen);
> +	if (dir && dir->nr)
> +		return dir->ce;
> +
> +	/*
> +	 * It might be a submodule. Unlike plain directories, which are stored
> +	 * in the dir-hash, submodules are stored in the name-hash, so check
> +	 * there, as well.
> +	 */
> +	ce = index_name_exists(istate, name, namelen - 1, 1);
> +	if (ce && S_ISGITLINK(ce->ce_mode))
> +		return ce;
> +
> +	return NULL;
> +}
> +
>  struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int icase)
>  {
>  	unsigned int hash = hash_name(name, namelen);
>  	struct cache_entry *ce;
>  
> +	assert(namelen == 0 || name[namelen - 1] != '/');
> +
>  	lazy_init_name_hash(istate);
>  	ce = lookup_hash(hash, &istate->name_hash);
>  
> @@ -237,29 +261,6 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na
>  		}
>  		ce = ce->next;
>  	}
> -
> -	/*
> -	 * When looking for a directory (trailing '/'), it might be a
> -	 * submodule or a directory. Despite submodules being directories,
> -	 * they are stored in the name hash without a closing slash.
> -	 * When ignore_case is 1, directories are stored in a separate hash
> -	 * table *with* their closing slash.
> -	 *
> -	 * The side effect of this storage technique is we have need to
> -	 * lookup the directory in a separate hash table, and if not found
> -	 * remove the slash from name and perform the lookup again without
> -	 * the slash.  If a match is made, S_ISGITLINK(ce->mode) will be
> -	 * true.
> -	 */
> -	if (icase && name[namelen - 1] == '/') {
> -		struct dir_entry *dir = find_dir_entry(istate, name, namelen);
> -		if (dir && dir->nr)
> -			return dir->ce;
> -
> -		ce = index_name_exists(istate, name, namelen - 1, icase);
> -		if (ce && S_ISGITLINK(ce->ce_mode))
> -			return ce;
> -	}
>  	return NULL;
>  }
>  
> diff --git a/read-cache.c b/read-cache.c
> index 885943a..a59644a 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -643,7 +643,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
>  			if (*ptr == '/') {
>  				struct cache_entry *foundce;
>  				++ptr;
> -				foundce = index_name_exists(istate, ce->name, ptr - ce->name, ignore_case);
> +				foundce = index_dir_exists(istate, ce->name, ptr - ce->name);
>  				if (foundce) {
>  					memcpy((void *)startPtr, foundce->name + (startPtr - ce->name), ptr - startPtr);
>  					startPtr = ptr;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] name-hash: refactor polymorphic index_name_exists()
  2013-09-13 18:40   ` Junio C Hamano
@ 2013-09-13 19:29     ` Eric Sunshine
       [not found]       ` <CAPc5daVtDByrA6yakk_1fq9g5Hv3naNDzEho5G4Ghxc6jzpawg@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sunshine @ 2013-09-13 19:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Jeff King, Brian Gernhardt, Jonathan Nieder

On Fri, Sep 13, 2013 at 2:40 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
>
>> Depending upon the absence or presence of a trailing '/' on the incoming
>> pathname, index_name_exists() checks either if a file is present in the
>> index or if a directory is represented within the index.  Each caller
>> explicitly chooses the mode of operation by adding or removing a
>> trailing '/' before invoking index_name_exists().
>>
>> Since these two modes of operations are disjoint and have no code in
>> common (one searches index_state.name_hash; the other dir_hash), they
>> can be represented more naturally as distinct functions: one to search
>> for a file, and one for a directory.
>>
>> Splitting index searching into two functions relieves callers of the
>> artificial burden of having to add or remove a slash to select the mode
>> of operation; instead they just call the desired function. A subsequent
>> patch will take advantage of this benefit in order to eliminate the
>> requirement that the incoming pathname for a directory search must have
>> a trailing slash.
>
> Good thinking.  Thanks for working on this; I agree with the general
> direction this series is going.
>
> I however wonder if it would be a good idea to rename the other one
> to {cache|index}_file_exists(), and even keep the *_name_exists()
> variant that switches between the two based on the trailing slash,
> to avoid breaking other topics in flight that may have added new
> callsites to *_name_exists().  Otherwise the new callsites will feed
> a path with a trailing slash to *_name_exists() and get a false
> result, no?

An assert("no trailing /") was added to index_name_exists() precisely
for the purpose of catching cases of code incorrectly calling
index_name_exists() to search for a directory. No existing code in
'master' trips the assertion (at least according to the tests),
however, the assertion may be annoying for topics in flight which do.

Other than plopping these patches atop 'pu' and seeing if anything
breaks, what would be the "git way" of detecting in-flight topics
which add callers of index_name_exists()? (Excuse my git ignorance.)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] name-hash: refactor polymorphic index_name_exists()
       [not found]       ` <CAPc5daVtDByrA6yakk_1fq9g5Hv3naNDzEho5G4Ghxc6jzpawg@mail.gmail.com>
@ 2013-09-13 21:44         ` Eric Sunshine
  2013-09-13 22:16           ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sunshine @ 2013-09-13 21:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Jeff King, Brian Gernhardt, Jonathan Nieder

On Fri, Sep 13, 2013 at 3:41 PM, Junio C Hamano <gitster@pobox.com> wrote:
> On Fri, Sep 13, 2013 at 12:29 PM, Eric Sunshine <sunshine@sunshineco.com> wrote:
>> On Fri, Sep 13, 2013 at 2:40 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> Eric Sunshine <sunshine@sunshineco.com> writes:
>>>
>>>> Since these two modes of operations are disjoint and have no code in
>>>> common (one searches index_state.name_hash; the other dir_hash), they
>>>> can be represented more naturally as distinct functions: one to search
>>>> for a file, and one for a directory.
>>>
>>> Good thinking.  Thanks for working on this; I agree with the general
>>> direction this series is going.
>>>
>>> I however wonder if it would be a good idea to rename the other one
>>> to {cache|index}_file_exists(), and even keep the *_name_exists()
>>> variant that switches between the two based on the trailing slash,
>>> to avoid breaking other topics in flight that may have added new
>>> callsites to *_name_exists().  Otherwise the new callsites will feed
>>> a path with a trailing slash to *_name_exists() and get a false
>>> result, no?
>>
>> An assert("no trailing /") was added to index_name_exists() precisely
>> for the purpose of catching cases of code incorrectly calling
>> index_name_exists() to search for a directory. No existing code in
>> 'master' trips the assertion (at least according to the tests),
>> however, the assertion may be annoying for topics in flight which do.
>>
>> Other than plopping these patches atop 'pu' and seeing if anything
>> breaks, what would be the "git way" of detecting in-flight topics
>> which add callers of index_name_exists()? (Excuse my git ignorance.)
>
> I would do a quick
>
> $ git diff master..pu | grep '^+.*_name_exists'
>
> which would be more conservative (it would also show an invocation
> moved from one place to another).

There appear to be no topics in flight which add new
index_name_exists() callers (which is not to say that no new callers
will be introduced before this topic lands in 'master').

I also plopped the patches atop 'pu' and there are no new tests
failures on Linux or Mac OS X, so the series does not seem to break
anything in flight. (There are a few test failures on 'pu' on Mac OS
X, but they are not related to this series.)

Given the above. How should I proceed? Do you still feel that it is
advisable to keep an index_name_exists() around for compatibility
reasons in case any new callers are introduced? Regardless of that
answer, do you want index_name_exists() renamed to
index_file_exists()?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] name-hash: refactor polymorphic index_name_exists()
  2013-09-13 21:44         ` Eric Sunshine
@ 2013-09-13 22:16           ` Junio C Hamano
  2013-09-13 22:20             ` Eric Sunshine
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2013-09-13 22:16 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, Jeff King, Brian Gernhardt, Jonathan Nieder

Eric Sunshine <sunshine@sunshineco.com> writes:

> Given the above. How should I proceed? Do you still feel that it is
> advisable to keep an index_name_exists() around for compatibility
> reasons in case any new callers are introduced? Regardless of that
> answer, do you want index_name_exists() renamed to
> index_file_exists()?

Renaming *_name_exists() to *_file_exists() without keeping a
compatibility one will force new topics to be rebased on this
series.  Alternatively we could merge them to 'pu' (and later 'next'
and 'master') with evil merges to adjust the change in the semantics
of the called function.  That increases the risk of accidental
breakages, I think.

It is safer to keep index_name_exists() around with the older
semantics, if we can, and rename your "file only" one to a different
name.  That way, even if a new topic still uses index_name_exists()
expecting the traditional behaviour, it will not break immediately
and we do not need to risk evil merges making mistakes.

Later, we can "git grep _name_exists" to spot them and convert such
old-style calls to either "directory only" or "file only" variants
after this series and these follow-on topics hit 'master' (and we do
not know at this point in what order that happens).

Hmm?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] name-hash: refactor polymorphic index_name_exists()
  2013-09-13 22:16           ` Junio C Hamano
@ 2013-09-13 22:20             ` Eric Sunshine
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Sunshine @ 2013-09-13 22:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Jeff King, Brian Gernhardt, Jonathan Nieder

On Fri, Sep 13, 2013 at 6:16 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
>
>> Given the above. How should I proceed? Do you still feel that it is
>> advisable to keep an index_name_exists() around for compatibility
>> reasons in case any new callers are introduced? Regardless of that
>> answer, do you want index_name_exists() renamed to
>> index_file_exists()?
>
> Renaming *_name_exists() to *_file_exists() without keeping a
> compatibility one will force new topics to be rebased on this
> series.  Alternatively we could merge them to 'pu' (and later 'next'
> and 'master') with evil merges to adjust the change in the semantics
> of the called function.  That increases the risk of accidental
> breakages, I think.
>
> It is safer to keep index_name_exists() around with the older
> semantics, if we can, and rename your "file only" one to a different
> name.  That way, even if a new topic still uses index_name_exists()
> expecting the traditional behaviour, it will not break immediately
> and we do not need to risk evil merges making mistakes.
>
> Later, we can "git grep _name_exists" to spot them and convert such
> old-style calls to either "directory only" or "file only" variants
> after this series and these follow-on topics hit 'master' (and we do
> not know at this point in what order that happens).

Thanks. That's what I needed to know. I'll re-roll with the suggested changes.

(And, I'm looking into the Mac-only test breakages not related to this
patch series.)

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-09-13 22:20 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-13 11:15 [PATCH 0/3] stop storing trailing slash in dir-hash Eric Sunshine
2013-09-13 11:15 ` [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() Eric Sunshine
2013-09-13 18:40   ` Junio C Hamano
2013-09-13 19:29     ` Eric Sunshine
     [not found]       ` <CAPc5daVtDByrA6yakk_1fq9g5Hv3naNDzEho5G4Ghxc6jzpawg@mail.gmail.com>
2013-09-13 21:44         ` Eric Sunshine
2013-09-13 22:16           ` Junio C Hamano
2013-09-13 22:20             ` Eric Sunshine
2013-09-13 11:15 ` [PATCH 2/3] name-hash: stop storing trailing '/' on paths in index_state.dir_hash Eric Sunshine
2013-09-13 11:15 ` [PATCH 3/3] dir: revert work-around for retired dangerous behavior Eric Sunshine

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).